Dear all,
I would like to hear your opinions on some optimizations of the Viterbi decoder that have already been discussed.
First of all, I would like to share some benchmark results. I used the test cases ("osmo-conv-test") written by Tom Tsou to make sure that the SIMD optimization is integrated correctly. In short, the results are almost equal. The older version of the decoder is slightly faster, but I think that is because the older one is compiled with "-march=native".
Returning to the subject: since we allocate and free some memory on every osmo_conv_decode_acc() call, which may happen very frequently and degrade performance on some hardware, the following suggestions were made:
1) Use static memory allocation where possible.
2) Use talloc for dynamic allocation.
3) Internal caching:
Fri May 9 18:23:03 UTC 2014, Tom Tsou wrote:
Internal caching was in the original implementation, but stripped from the submitted version mainly for simplicity and avoiding the need for global variables, though we seem to be having that discussion anyway ;-) The trellis values can be cached based on pointer or hashed code. That works well until threading is involved and cache access needs to be locked. Those are features I need, but can probably be ignored in this case.
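To make the caching idea from the quote a bit more concrete, here is a minimal sketch of a trellis cache keyed by the code pointer and guarded by a mutex, i.e. the locking Tom mentions once threading is involved. Everything in it (struct conv_code_stub, struct trellis_stub, trellis_cache_get(), trellis_cache_flush(), CACHE_SLOTS) is hypothetical and not part of the libosmocore API; the real trellis contents are replaced by a placeholder array.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a convolutional code definition. */
struct conv_code_stub {
	int N; /* outputs per input bit */
	int K; /* constraint length */
};

/* Hypothetical stand-in for precomputed trellis data. */
struct trellis_stub {
	int num_states;
	int *outputs; /* placeholder for per-state tables */
};

#define CACHE_SLOTS 4

/* Global cache: one trellis per code pointer. */
static struct {
	const struct conv_code_stub *code;
	struct trellis_stub *trellis;
} cache[CACHE_SLOTS];

/* Every cache access takes this lock, so concurrent decoders are safe. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Number of trellises actually built (only to observe cache hits). */
static int build_count;

/* Expensive part we want to do once per code; called with the lock held. */
static struct trellis_stub *trellis_build(const struct conv_code_stub *code)
{
	struct trellis_stub *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->num_states = 1 << (code->K - 1);
	t->outputs = calloc(t->num_states, sizeof(int));
	if (!t->outputs) {
		free(t);
		return NULL;
	}
	build_count++;
	return t;
}

/* Return the cached trellis for 'code', building it on first use.
 * Returns NULL if the cache is full or allocation fails; the caller
 * would then fall back to per-call allocation. */
static struct trellis_stub *trellis_cache_get(const struct conv_code_stub *code)
{
	struct trellis_stub *t = NULL;
	int i;

	pthread_mutex_lock(&cache_lock);
	for (i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].code == code) {
			t = cache[i].trellis; /* cache hit */
			goto out;
		}
	}
	for (i = 0; i < CACHE_SLOTS; i++) {
		if (!cache[i].code) {
			t = trellis_build(code); /* miss: fill a free slot */
			if (t) {
				cache[i].code = code;
				cache[i].trellis = t;
			}
			goto out;
		}
	}
out:
	pthread_mutex_unlock(&cache_lock);
	return t;
}

/* Free all entries. Must not run while any decoder still uses a
 * trellis returned by trellis_cache_get(). */
static void trellis_cache_flush(void)
{
	int i;

	pthread_mutex_lock(&cache_lock);
	for (i = 0; i < CACHE_SLOTS; i++) {
		if (cache[i].trellis) {
			free(cache[i].trellis->outputs);
			free(cache[i].trellis);
		}
		cache[i].code = NULL;
		cache[i].trellis = NULL;
	}
	pthread_mutex_unlock(&cache_lock);
}
```

A second call with the same code pointer returns the same trellis without rebuilding it. Keying by pointer assumes code definitions are long-lived (for example static const tables); hashing the code parameters instead would also match identical codes defined at different addresses.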
Again, I think the API should be kept intact. Internal caching can be a topic for later discussion.
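As an illustration of option 1 without touching the public API, here is a minimal sketch where the decoder entry point keeps its signature but reuses a static scratch buffer, growing it only when a longer frame arrives. The names toy_decode(), scratch_get(), scratch_release() and alloc_count are hypothetical, this is not osmo_conv_decode_acc(), and the loop body is a placeholder (a hard decision where a negative soft value decodes to 1) standing in for the real trellis computation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scratch memory kept across calls instead of malloc()/free() per call. */
static int16_t *scratch;
static size_t scratch_len;
static int alloc_count; /* how many times the heap was actually touched */

/* Grow the scratch buffer only if the requested length exceeds it. */
static int16_t *scratch_get(size_t len)
{
	if (len > scratch_len) {
		int16_t *p = realloc(scratch, len * sizeof(*p));

		if (!p)
			return NULL;
		scratch = p;
		scratch_len = len;
		alloc_count++;
	}
	return scratch;
}

/* Optional cleanup, e.g. at program exit. */
static void scratch_release(void)
{
	free(scratch);
	scratch = NULL;
	scratch_len = 0;
}

/* Hypothetical decoder entry point with the same signature a
 * per-call-allocating version would have. Returns 0 on success,
 * -1 on allocation failure. NOT thread-safe: the scratch buffer
 * is shared static state. */
int toy_decode(const int8_t *in, size_t len, uint8_t *out)
{
	int16_t *metrics;
	size_t i;

	if (len == 0)
		return 0;
	metrics = scratch_get(len);
	if (!metrics)
		return -1;

	/* Placeholder for the real metric computation. */
	for (i = 0; i < len; i++)
		metrics[i] = in[i];
	for (i = 0; i < len; i++)
		out[i] = metrics[i] < 0 ? 1 : 0;
	return 0;
}
```

Repeated calls with frames of the same or smaller length reuse the buffer, so the heap is touched only when a longer frame arrives. The price is shared static state, so this is not thread-safe without a lock or per-thread buffers, which is the same trade-off Tom mentions for caching.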
So, I am open to your ideas, opinions and remarks.
With best regards, Vadim Yanitskiy.