Hi all,
This patch set adds to libosmocore an optimized Viterbi decoder with both an architecture-specific (Intel SSE) and a generic implementation. It covers codes with constraint lengths K=5 and K=7 and rates 1/4 to 3/4, which make up the majority of GSM use cases. The speedup over the current implementation ranges from 5x to 20x depending on the processor and code type. The API is unchanged.
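For readers less familiar with the algorithm, here is a minimal hard-decision Viterbi sketch for a flushed rate-1/2, K=5 code. It is not the code from the patch set (no soft decisions, no SSE), and everything in it is an assumption made for illustration: the function names, the MAX_LEN cap, and the generator polynomials G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define K        5                /* constraint length (branch_out is written for K=5) */
#define NSTATES  (1 << (K - 1))   /* 16 trellis states */
#define MAX_LEN  512              /* arbitrary cap on input bits for this sketch */

/* Encoder output for state s (bit i holds the input delayed by i+1) and input u.
 * Assumed polynomials: G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4. */
static void branch_out(unsigned s, unsigned u, uint8_t out[2])
{
	unsigned d1 = s & 1, d3 = (s >> 2) & 1, d4 = (s >> 3) & 1;

	out[0] = (uint8_t)(u ^ d3 ^ d4);
	out[1] = (uint8_t)(u ^ d1 ^ d3 ^ d4);
}

/* Encode len bits plus K-1 zero tail bits; out must hold 2 * (len + K - 1) bits. */
void sketch_conv_encode(const uint8_t *in, int len, uint8_t *out)
{
	unsigned s = 0;

	assert(in && out && len >= 0);
	for (int t = 0; t < len + K - 1; t++) {
		unsigned u = (t < len) ? (in[t] & 1u) : 0u;

		branch_out(s, u, &out[2 * t]);
		s = ((s << 1) | u) & (NSTATES - 1);
	}
}

/* Hard-decision decode of 2 * (len + K - 1) received bits into len bits.
 * Returns the Hamming distance of the winning path, or -1 on bad length. */
int sketch_viterbi_decode(const uint8_t *rx, int len, uint8_t *out)
{
	uint8_t surv[MAX_LEN + K - 1][NSTATES]; /* MSB of the surviving predecessor */
	int pm[NSTATES], next[NSTATES];
	int steps = len + K - 1;
	unsigned state = 0;

	if (len < 0 || len > MAX_LEN)
		return -1;

	for (int s = 0; s < NSTATES; s++)
		pm[s] = s ? INT_MAX / 2 : 0;    /* the encoder starts in state 0 */

	for (int t = 0; t < steps; t++) {
		for (unsigned ns = 0; ns < NSTATES; ns++) {
			next[ns] = INT_MAX;
			for (unsigned b = 0; b < 2; b++) {
				/* The two predecessors of ns differ only in their MSB. */
				unsigned s = (ns >> 1) | (b << (K - 2));
				uint8_t e[2];
				int m;

				branch_out(s, ns & 1, e);
				m = pm[s] + (e[0] != (rx[2 * t] & 1))
					  + (e[1] != (rx[2 * t + 1] & 1));
				if (m < next[ns]) {
					next[ns] = m;
					surv[t][ns] = (uint8_t)b;
				}
			}
		}
		memcpy(pm, next, sizeof(pm));
	}

	/* Trace back from state 0, which the tail bits force the encoder into. */
	for (int t = steps - 1; t >= 0; t--) {
		if (t < len)
			out[t] = (uint8_t)(state & 1);  /* LSB of a state is the bit just shifted in */
		state = (state >> 1) | ((unsigned)surv[t][state] << (K - 2));
	}
	return pm[0];
}
```

Encoding 290 bits with this sketch produces 2 * (290 + 4) = 588 coded bits. The inner add-compare-select loop over the 16 states is the part that a SIMD implementation would be expected to parallelize.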
Tested on Haswell (i7-4770K) and Atom (D2550). Additional test codes from osmo-bts are included. Further tests for AWGN bit-error-rate and benchmarks can be found in the following repository.
https://github.com/ttsou/osmo-conv-test
Here are some examples.
Bit error test for GPRS CS2 with SNR of 5 dB and 100000 bursts.
$ ./conv_test -c 2 -e -r 5 -i 100000
=================================================
[+] Testing: GPRS CS2
[.] Specs: (N=2, K=5, non-recursive, flushed, not punctured)
[.] Input length : ret = 290 exp = 290 -> OK
[.] Output length : ret = 588 exp = 588 -> OK
[.] BER tests:
[..] Testing base:
[..] Input BER.......................... 0.042443
[..] Output BER......................... 0.000006
[..] Output FER......................... 0.001350 (135)
[..] Testing SIMD:
[..] Input BER.......................... 0.042460
[..] Output BER......................... 0.000005
[..] Output FER......................... 0.001240 (124)
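For reference, BER and FER figures like the ones above are plain ratios: bits in error over bits compared, and frames with at least one bit error over frames compared. The number in parentheses after the FER appears to be the raw count of frames in error (135 out of 100000 gives 0.001350). A minimal sketch of such bookkeeping follows; the struct and function names are hypothetical and not taken from conv_test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator for bit and frame error statistics. */
struct err_stats {
	unsigned long bits;        /* bits compared so far */
	unsigned long bit_errs;    /* bits that differed */
	unsigned long frames;      /* frames compared so far */
	unsigned long frame_errs;  /* frames with at least one bit error */
};

/* Compare one frame of unpacked bits (one bit per byte) and update the counters. */
void err_stats_add(struct err_stats *st, const uint8_t *sent,
		   const uint8_t *recv, size_t len)
{
	unsigned long errs = 0;

	assert(st && sent && recv);
	for (size_t i = 0; i < len; i++)
		errs += (sent[i] ^ recv[i]) & 1;

	st->bits += len;
	st->bit_errs += errs;
	st->frames++;
	if (errs)
		st->frame_errs++;
}

/* Bit error rate; 0.0 if nothing has been compared yet. */
double err_stats_ber(const struct err_stats *st)
{
	return st->bits ? (double)st->bit_errs / (double)st->bits : 0.0;
}

/* Frame error rate; 0.0 if nothing has been compared yet. */
double err_stats_fer(const struct err_stats *st)
{
	return st->frames ? (double)st->frame_errs / (double)st->frames : 0.0;
}
```

Calling err_stats_add() once on the coded bits before decoding and once on the decoded bits, with separate accumulators, would give the input and output rates respectively.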
Timed AFS benchmark with 8 threads and 100000 bursts per thread.
$ ./conv_test -b -c 10 -j 8 -i 100000
=================================================
[+] Testing: GSM TCH/AFS 6.7
[.] Specs: (N=4, K=5, recursive, flushed, punctured)
[.] Input length : ret = 140 exp = 140 -> OK
[.] Output length : ret = 448 exp = 448 -> OK
[.] Performance benchmark:
[..] Encoding / Decoding 800000 bursts on 8 thread(s):
[..] Testing base:
[..] Elapsed time....................... 4.320001 secs
[..] Rate............................... 25.925920 Mbps
[..] Testing SIMD:
[..] Elapsed time....................... 0.458272 secs
[..] Rate............................... 244.396341 Mbps
[..] Speedup............................ 9.426718
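The reported figures are consistent with a rate counted in information bits (the 140-bit input length, not the 448 coded bits) and with a speedup equal to the ratio of the two elapsed times. The sketch below reproduces that arithmetic; the function names are made up here, and the formula is inferred from the numbers rather than read from the benchmark source.

```c
#include <assert.h>

/* Throughput in Mbit/s, counting information bits per burst. */
double rate_mbps(unsigned long bursts, unsigned bits_per_burst, double secs)
{
	assert(secs > 0.0);
	return (double)bursts * (double)bits_per_burst / secs / 1e6;
}

/* Ratio of baseline to SIMD elapsed time. */
double speedup(double base_secs, double simd_secs)
{
	assert(simd_secs > 0.0);
	return base_secs / simd_secs;
}
```

With the values above, rate_mbps(800000, 140, 4.320001) comes out at about 25.93, rate_mbps(800000, 140, 0.458272) at about 244.40, and speedup(4.320001, 0.458272) at about 9.43.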
-TT
Hi Thomas,
Thank you for submitting this patch. Finally :) It promises a very good improvement in OsmoBTS performance.
May I suggest that you check the test cases into a test or contrib directory of libosmocore? This will help ensure that we do not lose them and that they stay in sync with the code. Ideally, we should also run validity checks as part of "make check".
On 29 Apr 2014 at 8:13, "Thomas Tsou" tom@tsou.cc wrote:
On Tue, Apr 29, 2014 at 2:09 AM, Alexander Chemeris alexander.chemeris@gmail.com wrote:
May I suggest that you check the test cases into a test or contrib directory of libosmocore? This will help ensure that we do not lose them and that they stay in sync with the code. Ideally, we should also run validity checks as part of "make check".
The patch with the additional test codes was already posted and is run with 'make check'. Those cases mostly serve as regression tests. For the AWGN tests, I am less certain that wireless channel modeling belongs in libosmocore. The error rate simulations will not fit the current make check model since they are probabilistic and non-deterministic by definition.
-TT
On 01 May 2014, at 20:02, Tom Tsou tom@tsou.cc wrote:
Dear Tom,
The patch with the additional test codes was already posted and is run with 'make check'. Those cases mostly serve as regression tests. For the AWGN tests, I am less certain that wireless channel modeling belongs in libosmocore. The error rate simulations will not fit the current make check model since they are probabilistic and non-deterministic by definition.
I am going through patchwork and it doesn’t seem this was applied? I assumed you and Sylvain would figure it out and merge it. What should we do with the four patches?
kind regards
holger
Hi Holger,
On Sun, May 17, 2015 at 10:05 AM, Holger Freyther holger@freyther.de wrote:
I am going through patchwork and it doesn’t seem this was applied? I assumed you and Sylvain would figure it out and merge it. What should we do with the four patches?
There was some discussion, but we never got around to completing the review. If there is still interest, we can revive that discussion.
-TT
On 18 May 2015, at 21:23, Tom Tsou tom@tsou.cc wrote:
Dear Tom,
There was some discussion, but we never got around to completing the review. If there is still interest, we can revive that discussion.
I am not a user of the Viterbi decoding but I assume a speed-up will benefit osmo-trx and it would be a shame to lose it.
holger
On Tue, May 19, 2015 at 2:20 AM, Holger Freyther holger@freyther.de wrote:
I am not a user of the Viterbi decoding but I assume a speed-up will benefit osmo-trx and it would be a shame to lose it.
Strictly speaking, it speeds up lower parts of osmo-bts.
And yes, it's a great improvement and I hope the review can be finished and the patch merged. Viterbi is the most CPU-hungry part of osmo-bts on UmTRX and other SDR platforms.
On 20 May 2015, at 08:28, Alexander Chemeris alexander.chemeris@gmail.com wrote:
Strictly speaking, it speeds up lower parts of osmo-bts.
And yes, it's a great improvement and I hope the review can be finished and the patch merged. Viterbi is the most CPU-hungry part of osmo-bts on UmTRX and other SDR platforms.
then please find some time to review/include it. I can set barriers like (requires a testcase, ABI/API changes need a TODO-RELEASE entry) but I would like the people that end up using these routines to comment and review them.
thanks
holger
Yeah, it's my bad ...
But I'm travelling for the next 10 days, so it won't happen immediately.
I'll at least try to re-read the discussion to see exactly what needs to be done to get to a state where it can be merged even if it's not perfect.
Cheers,
Sylvain
On 23 May 2015, at 20:56, Sylvain Munaut 246tnt@gmail.com wrote:
Good Evening.
Yeah, it's my bad …
don’t worry and no pressure. I just went through patchwork and noticed what nice things we didn’t merge yet (e.g. the external libtalloc patch).
holger
Hi Holger,
On Thu, May 21, 2015 at 2:53 AM, Holger Freyther holger@freyther.de wrote:
then please find some time to review/include it. I can set barriers like (requires a testcase, ABI/API changes need a TODO-RELEASE entry) but I would like the people that end up using these routines to comment and review them.
Unfortunately, only Sylvain can do a meaningful review of that patch. Even though this patch was funded by us and I guess we're its primary users, I don't have enough expertise in that part of the code to meaningfully contribute to the discussion. Otherwise I would have done it already.
Ok, so I finally took some time to re-read this.
First a quick comment on each patch.
Patch 1/4: See below
Patch 2/4: Didn't really look deeply yet, but at first glance looks OK, I'd say let's focus on getting the infra of patch 1 merged first.
Patch 3/4: I'm pretty sure -march=native will wreck cross-compiling
Patch 4/4: Looks good
For patch 1:
* About the state persistence, let's just ignore it for now. It does cost some performance but there is no easy fix, the perf improvement even with this overhead is still massive, and we can deal with it internally later.
* The symbol visibility and naming comments raised in the thread need to be dealt with. To sum up, there are 3 kinds of symbols/names:
  - Global functions / structures: Those that either appear in the header or that are visible when doing an objdump on the resulting .so. Those all need the osmo_ prefix.
  - Internal to the lib: Those that are accessed between several files in the lib. Those don't need an osmo_ prefix, for brevity, but they need something less conflict-prone than 'gen_' ... include 'viterbi' or 'conv' or something to that effect in the name. They need to be hidden from the outside (so they shouldn't show up in the resulting .so). Not sure about the current -fvisibility situation in libosmocore (don't know if we explicitly mark exported functions or explicitly mark non-exported ones).
  - Internal to the file: Make sure they're static, naming is fairly irrelevant.
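To make the three categories concrete, here is a small sketch of how they could look in a single source file. All names are invented for illustration (none come from the patch), and the hidden-visibility attribute is the GCC/Clang one; whether libosmocore would mark the hidden symbols or the exported ones is exactly the open question above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper macro: keeps a symbol out of the shared object's
 * dynamic symbol table (GCC/Clang attribute, ELF targets). */
#define CONV_INTERNAL __attribute__((visibility("hidden")))

/* 3) Internal to the file: static, so the name never leaves this object. */
static int hamming2(uint8_t a, uint8_t b)
{
	uint8_t x = (uint8_t)(a ^ b);

	return (x & 1) + ((x >> 1) & 1);
}

/* 2) Internal to the lib: callable from other files of the library, hidden
 *    from users, and named with 'conv' instead of a generic 'gen_'. */
CONV_INTERNAL int conv_branch_metric(uint8_t expected, uint8_t received)
{
	return hamming2(expected, received);
}

/* 1) Global: declared in a public header, so it carries the osmo_ prefix. */
int osmo_conv_example_metric(const uint8_t *expected,
			     const uint8_t *received, int n)
{
	int sum = 0;

	assert(expected && received && n >= 0);
	for (int i = 0; i < n; i++)
		sum += conv_branch_metric(expected[i], received[i]);
	return sum;
}
```

After building the shared library, something like objdump -T on the .so should list only the osmo_-prefixed function from this file.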
So I'd say re-submit patch 1 and 4 first, forward ported to apply cleanly and pass a make distcheck on current master. We'll get that merged first, then look at integrating the SSE stuff.
Cheers,
Sylvain
Hi Thomas, Sylvain,
Recent activity around convolutional coding has reminded me about this lost diamond. Just wondering if there is a chance to resolve the critical issues and get this merged?
On Sat, Aug 1, 2015 at 12:16 AM, Sylvain Munaut 246tnt@gmail.com wrote:
On Wed, Apr 20, 2016 at 3:55 AM, Alexander Chemeris alexander.chemeris@gmail.com wrote:
Recent activity around convolutional coding has reminded me about this lost diamond. Just wondering if there is a chance to resolve the critical issues and get this merged?
I'm quite backed up at this point, but, yes, I'm aware that the Viterbi patchset is outstanding. I've been working on EDGE coding schemes lately, so this is a good time to get the patch issues addressed.
-TT