core/conv: further Viterbi decoder optimizations

Hi Vadim,

On Thu, Jun 15, 2017 at 2:43 PM, Vadim Yanitskiy <axilirator at gmail.com> wrote:
> Returning to the subject: we allocate and free some memory
> on every osmo_conv_decode_acc() call, which may happen very
> frequently and degrade performance on some hardware. The
> following suggestions were made:

Max has a valid point in asking whether there is significant value in
further optimization when weighed against the cost of a non-trivial
API change. There are fielded deployments running the baseline code.
Will further optimization make a significant difference in those
cases? The answer depends on the system and the situation; I can't
answer that question in general.

> 1) Use static memory allocation where it's possible.
> 2) Use talloc for dynamic allocation.
> 3) Internal caching:

Persistent allocation is the best solution. Talloc will affect
performance minimally, if at all. Internal caching is something of a
hack to hide static allocation behind an API that does not allow it.
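
To make the persistent-allocation option concrete, here is a rough
sketch. All names (vdec_ctx, vdec_ctx_alloc, vdec_ctx_reset,
vdec_ctx_free) are hypothetical and not part of the existing API, the
buffer layout is an assumption, and the decode loop itself is left out.
The only point is that the working memory is allocated once and reused
on every call:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder context; NOT the current libosmocore API. */
struct vdec_ctx {
	int num_states;  /* 2^(K-1) trellis states */
	int max_len;     /* longest block this context can decode */
	int16_t *sums;   /* path metrics, one per state */
	int16_t *paths;  /* decision history, num_states * max_len */
};

/* Release everything owned by the context; NULL is accepted. */
void vdec_ctx_free(struct vdec_ctx *ctx)
{
	if (!ctx)
		return;
	free(ctx->sums);
	free(ctx->paths);
	free(ctx);
}

/* Allocate all working memory once. Returns NULL on failure. */
struct vdec_ctx *vdec_ctx_alloc(int k, int max_len)
{
	struct vdec_ctx *ctx;

	if (k < 2 || k > 15 || max_len <= 0)
		return NULL;

	ctx = calloc(1, sizeof(*ctx));
	if (!ctx)
		return NULL;

	ctx->num_states = 1 << (k - 1);
	ctx->max_len = max_len;
	ctx->sums = calloc(ctx->num_states, sizeof(*ctx->sums));
	ctx->paths = calloc((size_t)ctx->num_states * max_len,
			    sizeof(*ctx->paths));
	if (!ctx->sums || !ctx->paths) {
		vdec_ctx_free(ctx);
		return NULL;
	}
	return ctx;
}

/* Per-call setup: clears the path metrics, performs no allocation.
 * Returns -1 if the block does not fit into the context. */
int vdec_ctx_reset(struct vdec_ctx *ctx, int len)
{
	assert(ctx != NULL);
	if (len <= 0 || len > ctx->max_len)
		return -1;
	memset(ctx->sums, 0, ctx->num_states * sizeof(*ctx->sums));
	return 0;
}
```

A caller would create one context per code (and per thread, if
threads are involved), call vdec_ctx_reset() before each block, and
free the context at shutdown. The cost is exactly the API change
mentioned above: the caller has to own and pass around that context.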

> Fri May 9 18:23:03 UTC 2014, Tom Tsou wrote:
>> Internal caching was in the original implementation, but
>> stripped from the submitted version mainly for simplicity
>> and avoiding the need for global variables, though we seem
>> to be having that discussion anyway ;-) The trellis values
>> can be cached based on pointer or hashed code. That works well
>> until threading is involved and cache access needs to be locked.
>> Those are features I need, but can probably be ignored in this
>> case.
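
For reference, pointer-keyed caching with a lock could look roughly
like the sketch below. This is not the stripped-out original code;
struct trellis, trellis_get() and the fixed slot count are hypothetical
names and choices made up for illustration, and a pthread mutex is
assumed as the locking primitive:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 4

/* Hypothetical stand-in for the precomputed trellis of one code. */
struct trellis {
	int num_states;
};

struct cache_entry {
	const void *code;         /* key: address of the code description */
	struct trellis *trellis;  /* value: trellis built for that code */
};

static struct cache_entry cache[CACHE_SLOTS];
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static int cache_builds;  /* how many times a trellis was generated */

/* Placeholder for the real (expensive) trellis generation. */
static struct trellis *trellis_build(int k)
{
	struct trellis *t = calloc(1, sizeof(*t));

	if (t)
		t->num_states = 1 << (k - 1);
	return t;
}

/* Return the cached trellis for 'code', building it on first use.
 * Returns NULL if k is out of range, the cache is full, or
 * allocation fails. Entries are never evicted in this sketch. */
struct trellis *trellis_get(const void *code, int k)
{
	struct trellis *t = NULL;
	int i, free_slot = -1;

	assert(code != NULL);
	if (k < 2 || k > 15)
		return NULL;

	pthread_mutex_lock(&cache_lock);
	for (i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].trellis && cache[i].code == code) {
			t = cache[i].trellis;  /* cache hit */
			break;
		}
		if (!cache[i].trellis && free_slot < 0)
			free_slot = i;
	}
	if (!t && free_slot >= 0) {
		t = trellis_build(k);
		if (t) {
			cache[free_slot].code = code;
			cache[free_slot].trellis = t;
			cache_builds++;
		}
	}
	pthread_mutex_unlock(&cache_lock);
	return t;
}
```

Keying on the pointer only works if the code description outlives the
cache and is never modified in place; hashing the code contents avoids
that assumption at the cost of computing the hash on every lookup.
Either way the lock sits on the hot path of every decode call.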

Three years ago, though it feels like much longer, I was performing
brute-force RNTI searches on LTE control channels. Convolutional
decoding was the limiting factor and maxed out multiple threads; every
optimization helped. That was a very different set of requirements.

  -TT