Hi,
The optimized portions, PMU and BMU, can stand
independently, and my
preferred approach would be to integrate those pieces with the
existing trellis parts intact. That assumes that the accumulated path
metrics, branch metrics, and path decisions are each represented by
contiguously allocated 16-bit values.
Well, the internal state is "private" to the implementation so if
there is need for change to better match the SSE and make it easier to
match SSE/non-SSE code, that shouldn't be an issue.
The downside of such an approach
is that it would probably take far more time to integrate than the
time to actually create the optimizations.
Well, that's often the case. Getting some proof of concept is "easy".
The cleanup and proper integration is definitely a significant part of
the work.
The treillis
scan is very similar in all cases. The only thing to
support is the puncturing during the scan (I wanted to avoid having
the de-puncture first).
What is the benefit of this? I explicitly separated puncturing because
puncturing is inherently not part of the Viterbi algorithm.
It was done inside the "common code" because all the protocols that
were targeted but the initial code used it and it allowed better code
re-use.
Now some actually do it externally themselves (TETRA) and just set the
'puncture' to NULL.
Now, it was done in-line vs two-steps to avoid an allocation and data
movement. IIRC I tested both cases and the perf difference weren't
significant.
But I'm not strictly attached to it. If we have to change it into two
steps were it's depunctured in a first loop then processed, it would
be acceptable.
It also
currently supports progressive decoding (i.e. you can start
decoding a message before you have all of it if you want, you just
feed it chunk by chunk and it can resume where it left off).
If not enough bits exist to decode a message, why attempt to partially
decode it instead of waiting for the other bits? I could add support
for this; it's just not a use case that I ever encountered.
It was an effort to support "continuous" streams rather than single messages.
Part of this capability also useful to support tail-biting codes where
you need to basically feeding the data twice without resetting
everything.
But all that needed is to store offsets and support not resetting path
costs, so nothing touching the 'core' of the loop so it shouldn't be
an issue.
Cheers,
Sylvain