core/conv: further Viterbi decoder optimizations

15 Jun 2017


Dear all,
I would like to hear your opinions on some Viterbi decoder
optimizations that have been discussed previously.
First of all, I would like to share some benchmarking results.
I used the test cases ("osmo-conv-test") written by Tom Tsou
to ensure that the SIMD optimization is integrated correctly.
In short, the results are almost equal. The older version of
the decoder is slightly faster, but I think that is because
it is being compiled with "-march=native".
Returning to the subject: we allocate and free some memory on
every osmo_conv_decode_acc() call, which may happen very
frequently and degrade performance on some hardware. The
following suggestions were made:
1) Use static memory allocation where it's possible.
2) Use talloc for dynamic allocation.
3) Internal caching:
Fri May 9 18:23:03 UTC 2014, Tom Tsou wrote:
...
Internal caching was in the original implementation, but
stripped from the submitted version mainly for simplicity
and avoiding the need for global variables, though we seem
to be having that discussion anyway ;-) The trellis values
can be cached based on pointer or hashed code. That works well
until threading is involved and cache access needs to be locked.
Those are features I need, but can probably be ignored in this
case.
Again, I think the API should be kept intact. Internal caching,
can be a topic for later discussion.
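To make the caching idea more concrete, here is a minimal sketch of a pointer-keyed scratch buffer that is allocated once and reused across calls. It is not the libosmocore implementation: every name in it (demo_code, demo_scratch, demo_scratch_prepare, demo_scratch_release) is hypothetical, and the buffer sizes are only an assumption about what a decoder might need. The caller owns the scratch structure, so no global variables are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a convolutional code description. */
struct demo_code {
	int n;   /* output bits per input bit */
	int k;   /* constraint length */
	int len; /* number of decoded bits */
};

/* Scratch memory a decoder would otherwise allocate on every call. */
struct demo_scratch {
	const struct demo_code *owner; /* code the buffers were sized for */
	int16_t *sums;                 /* path metrics, one per state */
	int16_t *paths;                /* decisions, states * len entries */
	unsigned allocs;               /* number of (re)allocations so far */
};

static size_t demo_num_states(const struct demo_code *code)
{
	return (size_t)1 << (code->k - 1);
}

/* Make sure the scratch buffers fit the given code.
 * Buffers are reused when the same code pointer is seen again.
 * Returns 0 on success, -1 on allocation failure. */
int demo_scratch_prepare(struct demo_scratch *s, const struct demo_code *code)
{
	size_t states;
	int16_t *sums, *paths;

	assert(s != NULL && code != NULL && code->k >= 1 && code->len > 0);

	if (s->owner == code)
		return 0; /* cache hit: nothing to allocate */

	states = demo_num_states(code);
	sums = malloc(states * sizeof(*sums));
	paths = malloc(states * (size_t)code->len * sizeof(*paths));
	if (!sums || !paths) {
		free(sums);
		free(paths);
		return -1; /* old buffers are left untouched */
	}

	/* Cache miss: replace buffers sized for the previous code. */
	free(s->sums);
	free(s->paths);
	s->sums = sums;
	s->paths = paths;
	s->owner = code;
	s->allocs++;
	return 0;
}

/* Free the buffers and reset the cache key. */
void demo_scratch_release(struct demo_scratch *s)
{
	free(s->sums);
	free(s->paths);
	s->sums = NULL;
	s->paths = NULL;
	s->owner = NULL;
}
```

A zero-initialized struct demo_scratch can be passed to demo_scratch_prepare() before each decode; repeated calls with the same code pointer do not touch the allocator. Keying on the pointer assumes the code description is not modified in place, and a scratch structure shared between threads would still need locking, as noted above.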
So, I am open to your ideas, opinions and remarks.
With best regards,
Vadim Yanitskiy.
