DSP optimization

historical

Hi,

> The optimized portions, PMU and BMU, can stand independently, and my
> preferred approach would be to integrate those pieces with the
> existing trellis parts intact. That assumes that the accumulated path
> metrics, branch metrics, and path decisions are each represented by
> contiguously allocated 16-bit values.

Well, the internal state is "private" to the implementation, so if it
needs to change to better suit SSE and to make it easier for the SSE
and non-SSE code to match, that shouldn't be an issue.

> The downside of such an approach
> is that it would probably take far more time to integrate than the
> time to actually create the optimizations.

Well, that's often the case. Getting some proof of concept is "easy".
The cleanup and proper integration is definitely a significant part of
the work.

>> The trellis scan is very similar in all cases. The only thing to
>> support is the puncturing during the scan (I wanted to avoid having
>> the de-puncture first).
>
> What is the benefit of this? I explicitly separated puncturing because
> puncturing is inherently not part of the Viterbi algorithm.

It was done inside the "common code" because all the protocols that
were targeted by the initial code used it, and it allowed better code
re-use.
Now some actually do it externally themselves (TETRA) and just set the
'puncture' to NULL.

Now, it was done in-line rather than in two steps to avoid an
allocation and data movement. IIRC I tested both cases and the perf
difference wasn't significant.
But I'm not strictly attached to it. If we have to change it into two
steps, where the data is depunctured in a first loop and then
processed, that would be acceptable.
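
For reference, here is a rough sketch of what the separate depuncture
pass could look like. This is not the existing code: the function name,
the -1 terminated list of punctured positions, and the convention that
a soft bit of 0 means "erasure" are all assumptions made for the sake
of the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the existing code): expand a punctured
 * soft-bit stream to full length by inserting erasures.
 *
 * Assumptions:
 *  - 'puncture' is a sorted list of output positions that were removed
 *    by the transmitter, terminated by -1; NULL means "nothing to do".
 *  - soft bits are signed and 0 carries no information (erasure).
 *
 * Returns the number of soft bits written to 'out'. */
static size_t depuncture(const int8_t *in, size_t n_in,
                         const int *puncture,
                         int8_t *out, size_t out_cap)
{
	size_t i = 0;	/* read index in the punctured input */
	size_t o = 0;	/* write index in the full-length output */
	size_t p = 0;	/* index in the puncture list */

	while (o < out_cap) {
		if (puncture && puncture[p] >= 0 && (size_t)puncture[p] == o) {
			out[o++] = 0;		/* punctured position: erasure */
			p++;
		} else if (i < n_in) {
			out[o++] = in[i++];	/* received soft bit */
		} else {
			break;			/* input exhausted */
		}
	}

	return o;
}
```

The extra output buffer and the copy are exactly the allocation and
data movement mentioned above; the decoder loop afterwards no longer
needs to know about puncturing at all.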

>> It also currently supports progressive decoding (i.e. you can start
>> decoding a message before you have all of it if you want, you just
>> feed it chunk by chunk and it can resume where it left off).
>
> If not enough bits exist to decode a message, why attempt to partially
> decode it instead of waiting for the other bits? I could add support
> for this; it's just not a use case that I ever encountered.

It was an effort to support "continuous" streams rather than single
messages.
Part of this capability is also useful to support tail-biting codes,
where you basically need to feed the data twice without resetting
everything.

But all that's needed is to store offsets and to support not resetting
the path costs, so nothing touches the 'core' of the loop and it
shouldn't be an issue.
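
To illustrate, a minimal sketch of a resumable scan, assuming a toy
K=3, rate 1/2 code (polynomials 7 and 5) and 16-bit path metrics. All
the names here are made up for the example, and traceback and metric
normalization are left out. The only point is that the resume logic
lives entirely outside the add-compare-select step.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_STATES 4	/* assumption: toy K=3 code, polynomials 7 and 5 */
#define N_OUT    2	/* rate 1/2: two soft bits per trellis step */

/* Hypothetical decoder state kept between calls */
struct scan_state {
	int16_t path_metric[N_STATES];	/* contiguous accumulated costs */
	size_t  offset;			/* trellis steps processed so far */
};

static int parity(unsigned v)
{
	v ^= v >> 2;
	v ^= v >> 1;
	return v & 1;
}

/* Core add-compare-select for one step. It knows nothing about
 * offsets or chunking. Positive soft values are assumed to mean
 * "bit 0"; lower metric is better. No normalization is done, so
 * this is only valid for short inputs. */
static void acs_step(int16_t *pm, const int8_t *sym)
{
	int16_t next[N_STATES];

	for (unsigned ns = 0; ns < N_STATES; ns++) {
		int best = INT16_MAX;
		for (unsigned hi = 0; hi < 2; hi++) {
			unsigned ps  = (ns >> 1) | (hi << 1);	/* predecessor */
			unsigned reg = (ps << 1) | (ns & 1);	/* shift register */
			int cost = pm[ps];
			cost += parity(reg & 7) ? sym[0] : -sym[0];
			cost += parity(reg & 5) ? sym[1] : -sym[1];
			if (cost < best)
				best = cost;
		}
		next[ns] = (int16_t)best;
	}
	memcpy(pm, next, sizeof(next));
}

/* Start a new message: all states equally likely, offset at zero */
static void scan_reset(struct scan_state *st)
{
	memset(st, 0, sizeof(*st));
}

/* Process n_steps more trellis steps, resuming from the stored
 * offset and path costs instead of resetting them. */
static void scan_feed(struct scan_state *st, const int8_t *in, size_t n_steps)
{
	for (size_t k = 0; k < n_steps; k++) {
		acs_step(st->path_metric, &in[k * N_OUT]);
		st->offset++;
	}
}
```

Feeding a message in one call or in several chunks leaves the state
identical. For tail-biting, the same data would simply be fed a second
time without calling scan_reset() in between.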

Cheers,

     Sylvain