Re: DSP optimization

13 Jul 2013

Hi,

...
  The optimized portions, PMU and BMU, can stand
independently, and my
 preferred approach would be to integrate those pieces with the
 existing trellis parts intact. That assumes that the accumulated path
 metrics, branch metrics, and path decisions are each represented by
 contiguously allocated 16-bit values. 
Well, the internal state is "private" to the implementation so if
there is need for change to better match the SSE and make it easier to
match SSE/non-SSE code, that shouldn't be an issue.

...
  The downside of such an approach
 is that it would probably take far more time to integrate than the
 time to actually create the optimizations. 
Well, that's often the case. Getting some proof of concept is "easy".
The cleanup and proper integration is definitely a significant part of
the work.

...
   The treillis
scan is very similar in all cases. The only thing to
 support is the puncturing during the scan (I wanted to avoid having
 the de-puncture first). 
 What is the benefit of this? I explicitly separated puncturing because
 puncturing is inherently not part of the Viterbi algorithm. 
It was done inside the "common code" because all the protocols that
were targeted but the initial code used it and it allowed better code
re-use.
Now some actually do it externally themselves (TETRA) and just set the
'puncture' to NULL.

Now, it was done in-line vs two-steps to avoid an allocation and data
movement. IIRC I tested both cases and the perf difference weren't
significant.
But I'm not strictly attached to it. If we have to change it into two
steps were it's depunctured  in a first loop then processed, it would
be acceptable.

...
   It also
currently supports progressive decoding (i.e. you can start
 decoding a message before you have all of it if you want, you just
 feed it chunk by chunk and it can resume where it left off). 
 If not enough bits exist to decode a message, why attempt to partially
 decode it instead of waiting for the other bits? I could add support
 for this; it's just not a use case that I ever encountered. 
It was an effort to support "continuous" streams rather than single messages.
Part of this capability also useful to support tail-biting codes where
you need to basically feeding the data twice without resetting
everything.

But all that needed is to store offsets and support not resetting path
costs, so nothing touching the 'core' of the loop so it shouldn't be
an issue.

Cheers,

     Sylvain

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: DSP optimization