Dear all,

I would like to hear your opinions on some optimizations of the
Viterbi decoder that were discussed previously.

First of all, I would like to share some benchmarking results.
I used the test cases ("osmo-conv-test"), written by Tom Tsou,
to ensure that the SIMD optimization is integrated correctly.
In short, the results are almost equal. The older version of
the decoder is slightly faster, but I think that is because
it is compiled with "-march=native".

Returning to the subject: we allocate and free memory on every
osmo_conv_decode_acc() call. This may happen very frequently
and degrade performance on some hardware, so the following
suggestions were made:

1) Use static memory allocation where possible.
2) Use talloc for dynamic allocation.
3) Internal caching:

Fri May 9 18:23:03 UTC 2014, Tom Tsou wrote:
> Internal caching was in the original implementation, but
> stripped from the submitted version mainly for simplicity
> and avoiding the need for global variables, though we seem
> to be having that discussion anyway ;-) The trellis values
> can be cached based on pointer or hashed code. That works well
> until threading is involved and cache access needs to be locked.
> Those are features I need, but can probably be ignored in this
> case.
>
> Again, I think the API should be kept intact. Internal caching,
> can be a topic for later discussion.
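
To make the caching idea a bit more concrete, here is a minimal
sketch of a cache keyed by the code pointer, so that the buffers
are allocated once per code instead of on every decode call.
It is only an illustration: all names (demo_code, demo_decoder,
demo_decoder_get, DEMO_CACHE_SIZE) are hypothetical and are not
part of libosmocore, the buffer layout is an assumption, and it
uses plain calloc()/free(). It is also not thread-safe, which is
exactly the limitation Tom mentions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a convolutional code description. */
struct demo_code {
	int k;   /* constraint length */
	int len; /* number of trellis steps */
};

/* Hypothetical decoder state holding the per-code buffers. */
struct demo_decoder {
	const struct demo_code *code; /* cache key (pointer identity) */
	int num_states;
	int16_t *sums;  /* accumulated path metrics, num_states entries */
	int16_t *paths; /* path decisions, len * num_states entries */
};

#define DEMO_CACHE_SIZE 4

/* Static storage is zero-initialized, so all slots start empty. */
static struct demo_decoder demo_cache[DEMO_CACHE_SIZE];
static unsigned demo_cache_next;  /* round-robin eviction index */
static unsigned demo_alloc_count; /* real allocations, for illustration */

/* Free the buffers of a slot and mark it as empty. */
static void demo_decoder_release(struct demo_decoder *dec)
{
	assert(dec != NULL);
	free(dec->sums);
	free(dec->paths);
	dec->sums = NULL;
	dec->paths = NULL;
	dec->code = NULL;
}

/* Return a cached decoder for 'code', allocating only on a miss. */
static struct demo_decoder *demo_decoder_get(const struct demo_code *code)
{
	struct demo_decoder *dec;
	unsigned i;

	if (!code || code->k < 1 || code->k > 15 || code->len < 1)
		return NULL;

	/* Cache hit: reuse the buffers, no malloc()/free() involved. */
	for (i = 0; i < DEMO_CACHE_SIZE; i++) {
		if (demo_cache[i].code == code)
			return &demo_cache[i];
	}

	/* Cache miss: evict the next slot in round-robin order. */
	dec = &demo_cache[demo_cache_next];
	demo_cache_next = (demo_cache_next + 1) % DEMO_CACHE_SIZE;
	demo_decoder_release(dec);

	dec->num_states = 1 << (code->k - 1);
	dec->sums = calloc((size_t)dec->num_states, sizeof(*dec->sums));
	dec->paths = calloc((size_t)code->len * (size_t)dec->num_states,
			    sizeof(*dec->paths));
	if (!dec->sums || !dec->paths) {
		demo_decoder_release(dec);
		return NULL;
	}

	dec->code = code;
	demo_alloc_count++;
	return dec;
}
```

With such a scheme, repeated calls for the same code return the
same decoder object, and an allocation happens only on the first
call or after an eviction. Note that a pointer key goes stale if
the code structure is modified or freed and its address reused,
and that any multi-threaded use would need locking around the
cache lookup.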

So, I am open to your ideas, opinions and remarks.

With best regards,
Vadim Yanitskiy.