Dear all,
I would like to hear your opinions on some optimizations
of the Viterbi decoder that were already discussed earlier.
First of all, I would like to share some benchmarking results.
I used the test cases ("osmo-conv-test") written by Tom Tsou
to make sure that the SIMD optimization is integrated correctly.
In short, the results are almost equal. The older version
of the decoder is a little bit faster, but I think that is
because it is compiled with "-march=native".
Returning to the subject: since we allocate and free some
memory on every osmo_conv_decode_acc() call, which may happen
very frequently and degrade performance on some hardware,
the following suggestions were made:
1) Use static memory allocation where possible.
2) Use talloc for dynamic allocation.
3) Internal caching:
Fri May 9 18:23:03 UTC 2014, Tom Tsou wrote:
Internal caching was in the original implementation, but
stripped from the submitted version mainly for simplicity
and avoiding the need for global variables, though we seem
to be having that discussion anyway ;-) The trellis values
can be cached based on pointer or hashed code. That works well
until threading is involved and cache access needs to be locked.
Those are features I need, but can probably be ignored in this
case.
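To make option 3 a bit more concrete, here is a minimal sketch in C of caching decoder buffers keyed on the code pointer, with a mutex around cache access as Tom describes. Everything in it is hypothetical and not part of libosmocore: struct conv_code stands in for struct osmo_conv_code, and struct vdec_state, vdec_acquire(), vdec_release() and the fixed cache size are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the code description (not the real
 * struct osmo_conv_code): rate 1/N, constraint length K, len bits. */
struct conv_code {
	int N;
	int K;
	int len;
};

/* Hypothetical per-code decoder buffers that would otherwise be
 * allocated and freed on every decode call. */
struct vdec_state {
	const struct conv_code *code; /* cache key (pointer identity) */
	int16_t *path_metrics;        /* one metric per trellis state */
	uint8_t *paths;               /* survivor decisions, states * len */
};

#define VDEC_CACHE_SIZE 8

static struct vdec_state *vdec_cache[VDEC_CACHE_SIZE];
static pthread_mutex_t vdec_cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate zeroed buffers sized for 'code'; NULL on failure. */
static struct vdec_state *vdec_state_alloc(const struct conv_code *code)
{
	size_t nstates = (size_t)1 << (code->K - 1);
	struct vdec_state *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->code = code;
	s->path_metrics = calloc(nstates, sizeof(*s->path_metrics));
	s->paths = calloc(nstates * (size_t)code->len, sizeof(*s->paths));
	if (!s->path_metrics || !s->paths) {
		free(s->path_metrics);
		free(s->paths);
		free(s);
		return NULL;
	}
	return s;
}

/* Look up (or create) the buffers for 'code'. On success the cache
 * lock stays held, so the caller has exclusive use of the returned
 * state until vdec_release(). On a hit the buffers still hold data
 * from the previous call; the decoder must re-initialize them.
 * Returns NULL (with the lock released) if the cache is full or
 * memory runs out. */
static struct vdec_state *vdec_acquire(const struct conv_code *code)
{
	int i, free_slot = -1;

	assert(code != NULL);
	pthread_mutex_lock(&vdec_cache_lock);
	for (i = 0; i < VDEC_CACHE_SIZE; i++) {
		if (vdec_cache[i] && vdec_cache[i]->code == code)
			return vdec_cache[i]; /* hit: reuse buffers */
		if (!vdec_cache[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot >= 0)
		vdec_cache[free_slot] = vdec_state_alloc(code);
	if (free_slot < 0 || !vdec_cache[free_slot]) {
		pthread_mutex_unlock(&vdec_cache_lock);
		return NULL;
	}
	return vdec_cache[free_slot];
}

/* End exclusive use of the state returned by vdec_acquire(). */
static void vdec_release(void)
{
	pthread_mutex_unlock(&vdec_cache_lock);
}
```

A decode call would be bracketed by vdec_acquire(code) and vdec_release(), so only the first call for a given code pays for allocation. Holding the lock for the whole decode is the simplest safe choice but serializes decoders; per-thread caches would avoid that contention. Keying on the pointer also assumes that code descriptions are long-lived (e.g. static const tables), otherwise a freed and reallocated struct could hit a stale entry.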
Again, I think the API should be kept intact. Internal caching
can be a topic for later discussion.
So, I am open to your ideas, opinions and remarks.
With best regards,
Vadim Yanitskiy.