[PATCH 1/4] core/conv: Add optimized Viterbi decoding

historical

Hi Tom,

>>  - Some things that should be 'static' are not and they pollute the
>> global namespace
>
> Indeed. Internal calls in viterbi_gen.c should be static with leading
> underscores stripped since that seems to be the preferred style. If
> you were referring to others, please let me know.

I don't mind the leading underscore honestly, especially in static, but if
Holger prefers it that way, it's fine with me too.

When I do a full review of the patch, I will look in more details into
each call.

>>  - Anything that is in the global namespace needs the osmo_ (or even
>> osmo_conv_ here) prefix
>
> Ok. I purposely kept the internal struct definitions out of the public
> interface, but prefixing and exposing is fine.

Not what I meant.  'structs' don't end up in the global namespace as long
as they're not used in global function signatures, so you can keep them as
is. I was thinking about functions.

Among all those not marked 'static' and don't have an osmo_ prefix, I didn't
yet do an exhaustive check to see what was an oversight and what's really
needed to be global.

>>  - Did you check (and test in make test) that it works for all codes
>> that matches
>>    "if ((code->N <= 4) && ((code->K == 5) || (code->K == 7)))" ?
>
> Almost all codes used by osmo-bts and a few others are covered by make
> check. Rates N=5 are skipped and TCH/HR passes with the posted
> puncturing fix for osmo-bts.

I was more thinking about random codes. Like if someone invents a random
code with N=4 and K=5, will it work ? When implementing it, you didnt see
any other limitations that those right ?

>>    Also, to avoid exporting those internal symbols, I'm wondering if
>> #including the viterbi_gen.c from viterbi.c instead of compiling
>> separately wouldn't be better. But that's just a random idea ...
>
> I see three approaches:
>
> 1. Single file for generic and architecture specific implementations
> with large ifdef blocks.
> 2. Separate files with exported symbols (current patch).
> 3. Separate files with restricted visibility with
> __attribute__((__visibility__("hidden"))) or other combinations with
> the gcc -fvisibility flag.

I think we'll need to play with 3. Still probably need a prefix (not
the full "osmo_conv_", but something less conflicty thatn gen_)

> I was just noting performance observations. For example, the SSE K=5
> case on my machine is memory bound and not limited by compute
> resources. Given the effort to squeeze the current version behind the
> existing API, I won't be making breaking changes anytime soon.

When I designed the API, I did do benchmark and just repeated them, and
there is virtually no difference between the two versions. ( < 1% )

But I guess that's because in the current version, the 'decoder' struct
stores virtually nothing that can be re-used between different bursts.
(and malloc itself is pretty fast nowadays)

In your case there is the treillis derived from the 'code' struct. And I
guess this is where the performance hits comes from. An option would be
to have an internal 'cache' of codes that have been used in the past
so those data would be re-used.

Basic flow of a decoder_init would be:

{
  decoder = calloc(sizeof(decoder));

  treillis = get_from_cache(code)
  if (!treillis)
    treillis = put_in_cache(gen_treillis(code));

  decoder->treillis = treillis;
}

Since the number of different codes used in a single software is not
big, this should be
fairly easy to implement.

Cheers,

    Sylvain