[PATCH 0/4] core/conv: Fast Viterbi decoding


Thomas Tsou

29 Apr 2014, 2:13 p.m.

Hi all,

This patch set adds to libosmocore an optimized Viterbi decoder, with both an architecture-specific (Intel SSE) and a generic implementation. It covers codes with constraint lengths of K=5 and K=7 and rates 1/4 to 3/4, which make up the majority of GSM use cases. Speedup over the current implementation ranges from 5x to 20x depending on the processor and code type. The API is unchanged.
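For readers who want to see what the decoder computes, here is a minimal hard-decision Viterbi sketch in Python. It is not the code from the patch set (which is C, with an SSE path); the function names are made up for illustration, and the generator taps are an assumption, taken to be the usual GSM K=5, rate-1/2 pair G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4.

```python
K = 5                        # constraint length
N_STATES = 1 << (K - 1)      # 16 trellis states
# Assumed generator taps (bit 4 = current input bit, bit 0 = oldest bit):
# G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4.
GENERATORS = (0b10011, 0b11011)
N_OUT = len(GENERATORS)


def branch_output(state, bit):
    """Return (coded output bits, next state) for one trellis branch."""
    window = (bit << (K - 1)) | state
    out = tuple(bin(window & g).count("1") & 1 for g in GENERATORS)
    return out, window >> 1


def encode(bits):
    """Encode and flush the encoder with K-1 zero bits."""
    state, coded = 0, []
    for bit in list(bits) + [0] * (K - 1):
        out, state = branch_output(state, bit)
        coded.extend(out)
    return coded


def decode(coded, n_bits):
    """Hard-decision Viterbi decoding of a flushed code word."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    history = []
    for t in range(n_bits + K - 1):
        rx = coded[N_OUT * t:N_OUT * (t + 1)]
        new_metrics = [inf] * N_STATES
        decisions = [None] * N_STATES
        inputs = (0, 1) if t < n_bits else (0,)   # flush bits are known zeros
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue
            for bit in inputs:
                out, nxt = branch_output(state, bit)
                # Add-compare-select with a Hamming-distance branch metric.
                cost = metrics[state] + sum(a != b for a, b in zip(out, rx))
                if cost < new_metrics[nxt]:
                    new_metrics[nxt] = cost
                    decisions[nxt] = (state, bit)
        metrics = new_metrics
        history.append(decisions)
    # Trace back from state 0, where a flushed encoder must end.
    state, bits = 0, []
    for decisions in reversed(history):
        state, bit = decisions[state]
        bits.append(bit)
    bits.reverse()
    return bits[:n_bits]


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1]
noisy = encode(message)
noisy[5] ^= 1    # inject two isolated bit errors
noisy[30] ^= 1
print(decode(noisy, len(message)) == message)
```

With these assumptions a 290-bit input yields 2 * (290 + 4) = 588 coded bits, which agrees with the input and output lengths reported for GPRS CS2 in the test log.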

Tested on Haswell (i7-4770K) and Atom (D2550). Additional test codes from osmo-bts are included. Further tests for AWGN bit-error-rate and benchmarks can be found in the following repository.

https://github.com/ttsou/osmo-conv-test

Here are some examples.

Bit error test for GPRS CS2 with SNR of 5 dB and 100000 bursts.

$ ./conv_test -c 2 -e -r 5 -i 100000

=================================================
[+] Testing: GPRS CS2
[.] Specs: (N=2, K=5, non-recursive, flushed, not punctured)
[.] Input length : ret = 290 exp = 290 -> OK
[.] Output length : ret = 588 exp = 588 -> OK

[.] BER tests:
[..] Testing base:
[..] Input BER.......................... 0.042443
[..] Output BER......................... 0.000006
[..] Output FER......................... 0.001350 (135)
[..] Testing SIMD:
[..] Input BER.......................... 0.042460
[..] Output BER......................... 0.000005
[..] Output FER......................... 0.001240 (124)
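As a rough guide to reading these figures, the sketch below shows how bit and frame error rates are commonly computed. It assumes that a frame counts as errored when at least one of its decoded bits is wrong and that the number in parentheses is the count of errored frames; both are inferences from the log (135 / 100000 = 0.00135), not statements from the test tool. The function name and the sample frames are made up.

```python
def error_rates(sent_frames, decoded_frames):
    """Return (bit error rate, frame error rate, number of frames in error)."""
    bit_errors = total_bits = frame_errors = 0
    for sent, decoded in zip(sent_frames, decoded_frames):
        errors = sum(a != b for a, b in zip(sent, decoded))
        bit_errors += errors
        total_bits += len(sent)
        frame_errors += errors > 0   # any wrong bit marks the whole frame
    return bit_errors / total_bits, frame_errors / len(sent_frames), frame_errors


# Four made-up 4-bit frames; the second one is decoded with two bit errors.
sent = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 0]]
decoded = [[0, 1, 1, 0], [1, 0, 0, 1], [0, 0, 0, 0], [1, 0, 1, 0]]
print(error_rates(sent, decoded))
```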

Timed AFS benchmark with 8 threads and 100000 bursts per thread.

$ ./conv_test -b -c 10 -j 8 -i 100000

=================================================
[+] Testing: GSM TCH/AFS 6.7
[.] Specs: (N=4, K=5, recursive, flushed, punctured)
[.] Input length : ret = 140 exp = 140 -> OK
[.] Output length : ret = 448 exp = 448 -> OK

[.] Performance benchmark:
[..] Encoding / Decoding 800000 bursts on 8 thread(s):
[..] Testing base:
[..] Elapsed time....................... 4.320001 secs
[..] Rate............................... 25.925920 Mbps
[..] Testing SIMD:
[..] Elapsed time....................... 0.458272 secs
[..] Rate............................... 244.396341 Mbps
[..] Speedup............................ 9.426718
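The reported rate and speedup can be reproduced from the other numbers in the log. The short check below assumes that the rate counts the 140 uncoded input bits per burst (an inference that happens to match the figures, not something the tool documents) and that speedup is simply the ratio of elapsed times.

```python
BURSTS = 800_000          # 8 threads x 100000 bursts
BITS_PER_BURST = 140      # input length from the log
BASE_SECS = 4.320001
SIMD_SECS = 0.458272


def rate_mbps(seconds):
    """Throughput in Mbps, counting uncoded input bits (assumption)."""
    return BURSTS * BITS_PER_BURST / seconds / 1e6


speedup = BASE_SECS / SIMD_SECS
print(f"base {rate_mbps(BASE_SECS):.6f} Mbps")
print(f"SIMD {rate_mbps(SIMD_SECS):.6f} Mbps")
print(f"speedup {speedup:.6f}")
```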

-TT


Alexander Chemeris

29 Apr 2014, 4:09 p.m.

Hi Thomas,

Thank you for submitting this patch set. Finally :) It promises a substantial performance improvement for OsmoBTS.

May I suggest that you check the test cases into a test or contrib directory of libosmocore? This will help ensure that we do not lose them and that they stay in sync with the code. Ideally, we should also run validity checks as part of "make check".


Tom Tsou

2 May 2014, 4:02 a.m.

On Tue, Apr 29, 2014 at 2:09 AM, Alexander Chemeris alexander.chemeris@gmail.com wrote:

> May I suggest that you check the test cases into a test or contrib directory of libosmocore? This will help ensure that we do not lose them and that they stay in sync with the code. Ideally, we should also run validity checks as part of "make check".

The patch for additional test codes was already posted and is run with 'make check'. Those cases mostly serve as regression tests. For AWGN tests, I feel less certain that wireless channel modeling belongs in libosmocore. The error rate simulations will not fit the current make check model since they are probabilistic and non-deterministic by definition.
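The concern above applies to unseeded simulations. As a sketch of one possible workaround (an assumption for illustration, not something proposed in this thread), a simulation can be made repeatable by fixing the random seed and asserting only a loose bound. The example uses uncoded BPSK over AWGN, treats the SNR as Eb/N0, and uses a made-up function name.

```python
import math
import random


def simulate_ber(snr_db, n_bits, seed):
    """Uncoded BPSK over AWGN with a fixed seed, so results are repeatable."""
    rng = random.Random(seed)
    ebn0 = 10 ** (snr_db / 10)            # SNR treated as Eb/N0 (assumption)
    sigma = math.sqrt(1 / (2 * ebn0))     # noise std dev for unit-energy symbols
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0
        received = symbol + rng.gauss(0.0, sigma)
        errors += (received > 0) != bool(bit)
    return errors / n_bits


ber = simulate_ber(5.0, 20000, seed=1)
print(ber)
```

Two runs with the same seed return identical values, so a check can compare against a wide tolerance band around the theoretical rate (about 0.006 at 5 dB for this uncoded case) without flaking.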

-TT

Holger Freyther

18 May 2015, 3:05 a.m.

On 01 May 2014, at 20:02, Tom Tsou tom@tsou.cc wrote:

Dear Tom,

> The patch for additional test codes was already posted and is run with 'make check'. Those cases mostly serve as regression tests. For AWGN tests, I feel less certain that wireless channel modeling belongs in libosmocore. The error rate simulations will not fit the current make check model since they are probabilistic and non-deterministic by definition.

I am going through patchwork and it doesn’t seem this was applied? I assumed you and Sylvain would figure it out and merge it. What should we do with the four patches?

kind regards
holger

Tom Tsou

19 May 2015, 5:24 a.m.

Hi Holger,

On Sun, May 17, 2015 at 10:05 AM, Holger Freyther holger@freyther.de wrote:

> I am going through patchwork and it doesn’t seem this was applied? I assumed you and Sylvain would figure it out and merge it. What should we do with the four patches?

There was some discussion, but we never got around to completing the review. If there is still interest, we can revive that discussion.

-TT

Holger Freyther

19 May 2015, 4:20 p.m.

On 18 May 2015, at 21:23, Tom Tsou tom@tsou.cc wrote:

Dear Tom,

> There was some discussion, but we never got around to completing the review. If there is still interest, we can revive that discussion.

I am not a user of the Viterbi decoder, but I assume a speed-up will benefit osmo-trx, and it would be a shame to lose it.

holger

Alexander Chemeris

20 May 2015, 10:28 a.m.

On Tue, May 19, 2015 at 2:20 AM, Holger Freyther holger@freyther.de wrote:

> I am not a user of the Viterbi decoder, but I assume a speed-up will benefit osmo-trx, and it would be a shame to lose it.

Strictly speaking, it speeds up lower parts of osmo-bts.

And yes, it's a great improvement and I hope the review can be finished and the patch merged. Viterbi is the most CPU-hungry part of osmo-bts on UmTRX and other SDR platforms.

-- Regards, Alexander Chemeris. CEO, Fairwaves, Inc. https://fairwaves.co

Holger Freyther

21 May 2015, 4:53 p.m.

On 20 May 2015, at 08:28, Alexander Chemeris alexander.chemeris@gmail.com wrote:

> Strictly speaking, it speeds up lower parts of osmo-bts.
>
> And yes, it's a great improvement and I hope the review can be finished and the patch merged. Viterbi is the most CPU-hungry part of osmo-bts on UmTRX and other SDR platforms.

then please find some time to review/include it. I can set barriers like (requires a testcase, ABI/API changes need a TODO-RELEASE entry) but I would like the people that end up using these routines to comment and review them.

thanks
holger

Sylvain Munaut

23 May 2015, 10:56 p.m.

> then please find some time to review/include it. I can set barriers like (requires a testcase, ABI/API changes need a TODO-RELEASE entry) but I would like the people that end up using these routines to comment and review them.

Yeah, it's my bad ...

But I'm travelling for the next 10 days, so it won't happen immediately.

I'll at least try to re-read the discussion to see exactly what needs to be done to get to a state where it can be merged even if it's not perfect.

Cheers,

Sylvain

Holger Freyther

23 May 2015, 11 p.m.

On 23 May 2015, at 20:56, Sylvain Munaut 246tnt@gmail.com wrote:

Good Evening.

> Yeah, it's my bad …

don’t worry and no pressure. I just went through patchwork and noticed what nice things we didn’t merge yet (e.g. the external libtalloc patch).

holger

Alexander Chemeris

25 May 2015, 11:01 a.m.

Hi Holger,

On Thu, May 21, 2015 at 2:53 AM, Holger Freyther holger@freyther.de wrote:

> then please find some time to review/include it. I can set barriers like (requires a testcase, ABI/API changes need a TODO-RELEASE entry) but I would like the people that end up using these routines to comment and review them.

Unfortunately, only Sylvain can do a meaningful review of that patch. Even though this patch was funded by us and I guess we're the primary users of it, I don't have enough expertise in that part of the code to meaningfully contribute to the discussion. Otherwise I would have done it already.

-- Regards, Alexander Chemeris. CEO, Fairwaves, Inc. https://fairwaves.co

Sylvain Munaut

1 Aug 2015, 7:16 a.m.

Ok, so I finally took some time to re-read this.

First a quick comment on each patch.

Patch 1/4: See below
Patch 2/4: Didn't really look deeply yet, but at first glance looks OK, I'd say let's focus on getting the infra of patch 1 merged first.
Patch 3/4: I'm pretty sure -march=native will wreck cross-compiling
Patch 4/4: Looks good

For patch 1:

* About the state persistence, let's just ignore it for now. It does cost some performance, but there is no easy fix; the perf improvement even with this overhead is still massive, and we can deal with it internally later.

* The symbol visibility and naming comments raised in the thread need to be dealt with. To sum up, there are 3 kinds of symbols/names:

- Global functions / structures: Those that either appear in the header or that are visible when doing an objdump on the resulting .so. Those all need the osmo_ prefix.

- Internal to the lib: Those that are accessed between several files in the lib. Those don't need an osmo_ prefix, for brevity, but they need something less conflict-prone than 'gen_' ... include 'viterbi' or 'conv' or something to that effect in the name. They need to be hidden from the outside (so they shouldn't show up in the resulting .so); not sure about the current -fvisibility situation in libosmocore (don't know if we explicitly mark exported functions or explicitly mark non-exported ones).

- Internal to the file: Make sure they're static, naming is fairly irrelevant
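The first rule in the list above lends itself to an automated check. The sketch below scans text in the style of `nm -D` output and reports exported symbols that lack the osmo_ prefix. The sample listing is made up for illustration (the addresses, 'gen_metrics' and 'osmo_conv_example_table' are hypothetical), and the helper name is an assumption, not part of libosmocore.

```python
# Made-up sample in the style of `nm -D` output for a shared library.
SAMPLE_NM_OUTPUT = """\
0000000000001139 T osmo_conv_decode
0000000000001180 T osmo_conv_encode
00000000000011c0 T gen_metrics
0000000000004010 D osmo_conv_example_table
                 U malloc
"""

# Symbol types treated as globally defined: text, data, BSS, read-only data.
EXPORTED_TYPES = {"T", "D", "B", "R"}


def unprefixed_exports(nm_text, prefix="osmo_"):
    """Return exported symbol names that do not start with the prefix."""
    offenders = []
    for line in nm_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue   # undefined symbols have no address column
        _address, sym_type, name = fields
        if sym_type in EXPORTED_TYPES and not name.startswith(prefix):
            offenders.append(name)
    return offenders


print(unprefixed_exports(SAMPLE_NM_OUTPUT))
```

A symbol flagged this way would either need the osmo_ prefix or would have to be hidden from the dynamic symbol table, as described for the lib-internal case.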

So I'd say re-submit patch 1 and 4 first, forward ported to apply cleanly and pass a make distcheck on current master. We'll get that merged first, then look at integrating the SSE stuff.

Cheers,

Sylvain

Alexander Chemeris

20 Apr 2016, 8:55 p.m.

Hi Thomas, Sylvain,

Recent activity around convolutional coding has reminded me about this lost diamond. Just wondering if there is a chance to resolve the critical issues and get this merged?


-- Regards, Alexander Chemeris. CEO, Fairwaves, Inc. https://fairwaves.co

Tom Tsou

27 Apr 2016, 4:44 a.m.

On Wed, Apr 20, 2016 at 3:55 AM, Alexander Chemeris alexander.chemeris@gmail.com wrote:

> Recent activity around convolutional coding has reminded me about this lost diamond. Just wondering if there is a chance to resolve the critical issues and get this merged?

I'm quite backed up at this point, but, yes, I'm aware that the Viterbi patchset is outstanding. I've been working on EDGE coding schemes lately, so this is a good time to get the patch issues addressed.

-TT


openbsc@lists.osmocom.org
