[PATCH 1/4] core/conv: Add optimized Viterbi decoding

This is merely a historical archive of years 2008-2021, before the migration to mailman3.

A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/OpenBSC@lists.osmocom.org/.

Tom Tsou tom at tsou.cc
Fri May 9 18:23:03 UTC 2014


On Fri, May 9, 2014 at 4:22 AM, Sylvain Munaut <246tnt at gmail.com> wrote:
> Among all those not marked 'static' and don't have an osmo_ prefix, I didn't
> yet do an exhaustive check to see what was an oversight and what's really
> needed to be global.

Overall, I don't think there should be _any_ non-static or global
variable as that was the original intent. Admittedly, there were some
oversights in the architecture specific BMU/PMU calls.

>>>  - Did you check (and test in make test) that it works for all codes
>>> that matches
>>>    "if ((code->N <= 4) && ((code->K == 5) || (code->K == 7)))" ?
>>
>> Almost all codes used by osmo-bts and a few others are covered by make
>> check. Rates N=5 are skipped and TCH/HR passes with the posted
>> puncturing fix for osmo-bts.
>
> I was more thinking about random codes. Like if someone invents a random
> code with N=4 and K=5, will it work ? When implementing it, you didnt see
> any other limitations that those right ?

The original implementation used octal based generator polynomials, so
all non-degenerate cases are handled. By degenerate case I mean
something like a K=7 code with generator polynomial of 0104, which is
really a shifted K=5 code. No practical implementation will handle
those cases of bad codes because the symmetric/anti-symmetric ACS
properties of the trellis vanish.

Similarly for recursive codes, the register feedback is based on the
generator polynomial with the allowance that the systematic bit can be
on any output. The AFS codes require that, which was really annoying.

>> 1. Single file for generic and architecture specific implementations
>> with large ifdef blocks.
>> 2. Separate files with exported symbols (current patch).
>> 3. Separate files with restricted visibility with
>> __attribute__((__visibility__("hidden"))) or other combinations with
>> the gcc -fvisibility flag.
>
> I think we'll need to play with 3. Still probably need a prefix (not
> the full "osmo_conv_", but something less conflicty thatn gen_)

Agreed on internal prefixing.

Currently, libosmocore checks for the availability of hidden symbol
visibility, sets SYMBOL_VISIBILITY="-fvisibility=hidden", but ignores
use the result. Without making library wide changes or using ifdef
blocks, the only approach is to mark the osmo_conv_gen calls with
hidden visibility

> When I designed the API, I did do benchmark and just repeated them, and
> there is virtually no difference between the two versions. ( < 1% )

The allocation effects are case specific and more significant as the
compute load drops. Also, you may have noticed the bitswap operations,
which only exist because the trellis and API have conflicting trellis
endian formats. Those calls are only used during initialization.
Parallel decoding threads also seems affected.

Overall though, for GSM where the data rates are quite low, the
allocate and tear-down effects, while measurable, are probably not
significant enough to justify an API change. There may be a case on
ARM or Epiphany, but I have not tested those platforms enough to
comment.

> But I guess that's because in the current version, the 'decoder' struct
> stores virtually nothing that can be re-used between different bursts.
> (and malloc itself is pretty fast nowadays)

The main memory allocation is the path metric storage, which is
unavoidable, and similar in both cases. The current version does store
paths with 8-bits instead of 16-bits, so that allocation is halved.
The SSE outputs are 16-bit so converting to 8-bit at every stage
wasn't worth the overhead. Still, cache performance on the submitted
code codes out far ahead.

$ perf stat -e cache-misses,page-faults ./conv_test -c 3 -b -i 500000 -o

=================================================
[+] Testing: GPRS CS3
[.] Specs: (N=2, K=5, non-recursive, flushed, not punctured)
[.] Input length  : ret = 334  exp = 334 -> OK
[.] Output length : ret = 676  exp = 676 -> OK

[.] Performance benchmark:
[..] Encoding / Decoding 500000 bursts on 1 thread(s):
[..] Testing base:
[..] Elapsed time....................... 18.189684 secs
[..] Rate............................... 9.181028 Mbps

 Performance counter stats for './conv_test -c 3 -b -i 500000 -o':

            65,116      cache-misses
               284      page-faults

      18.191569647 seconds time elapsed


$ perf stat -e cache-misses,page-faults ./conv_test -c 3 -b -i 500000 -s

=================================================
[+] Testing: GPRS CS3
[.] Specs: (N=2, K=5, non-recursive, flushed, not punctured)
[.] Input length  : ret = 334  exp = 334 -> OK
[.] Output length : ret = 676  exp = 676 -> OK

[.] Performance benchmark:
[..] Encoding / Decoding 500000 bursts on 1 thread(s):
[..] Testing SIMD:
[..] Elapsed time....................... 1.820236 secs
[..] Rate............................... 91.746345 Mbps

 Performance counter stats for './conv_test -c 3 -b -i 500000 -s':

            11,508      cache-misses
               288      page-faults

       1.822036339 seconds time elapsed

> In your case there is the treillis derived from the 'code' struct. And I
> guess this is where the performance hits comes from. An option would be
> to have an internal 'cache' of codes that have been used in the past
> so those data would be re-used.

Internal caching was in the original implementation, but stripped from
the submitted version mainly for simplicity and avoiding the need for
global variables, though we seem to be having that discussion anyway
;-) The trellis values can be cached based on pointer or hashed code.
That works well until threading is involved and cache access needs to
be locked. Those are features I need, but can probably be ignored in
this case.

Again, I think the API should be kept intact. Internal caching, can be
a topic for later discussion.

 -TT




More information about the OpenBSC mailing list