Hi Vadim,
On Fri, Jun 16, 2017 at 04:43:12AM +0700, Vadim Yanitskiy wrote:
> I used the test cases ("osmo-conv-test") written by Tom Tsou to ensure that the SIMD optimization is integrated correctly. In short, the results are almost equal. The older version of the decoder is a little bit faster, but I think that's because it is compiled with "-march=native".
You can compile the new code with the same optimization flags (specified in CFLAGS at configure time). We simply cannot make -march=native the default, as that is not safe: the resulting binary may use instructions that are unavailable on CPUs other than the build machine. But you can use it to verify your assumption.
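For a like-for-like comparison, pass the flags at configure time; this is standard autoconf behaviour, nothing libosmocore-specific:

  ./configure CFLAGS="-O3 -march=native"
  make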
> Returning to the subject: since we allocate and free some memory on every osmo_conv_decode_acc() call, which may happen very frequently and hurt performance on some hardware, there were the following suggestions:
> - Use static memory allocation where possible.
That's generally what we do a lot in osmocom code. If it's predictable how much memory a given process takes, we allocate that memory once (statically or dynamically), use that all the time and then release it once that process/procedure/instance is destroyed.
When I look at the current API, the decode_init()/decode_deinit() functions are great candidates to perform such allocation/release of memory. If applications use the osmo_conv_decode() convenience wrapper, it simply means they don't care about efficiency, but if they use the real init/scan/flush/deinit API, they would benefit from it.
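To illustrate, here is a minimal sketch of that usage pattern against the existing API, assuming a code terminated with CONV_TERM_FLUSH (error handling omitted):

  #include <osmocom/core/bits.h>
  #include <osmocom/core/conv.h>

  /* Decode 'num' blocks of the same code while reusing one decoder:
   * all per-decoder memory is allocated once in _init() and released
   * once in _deinit(), instead of once per block. */
  static void decode_blocks(const struct osmo_conv_code *code,
                            const sbit_t **in, ubit_t **out, int num)
  {
          struct osmo_conv_decoder decoder;
          int i;

          osmo_conv_decode_init(&decoder, code, 0, 0);

          for (i = 0; i < num; i++) {
                  osmo_conv_decode_reset(&decoder, 0);
                  osmo_conv_decode_scan(&decoder, in[i], code->len);
                  osmo_conv_decode_flush(&decoder,
                                         &in[i][code->len * code->N]);
                  osmo_conv_decode_get_output(&decoder, out[i], 1, 0);
          }

          osmo_conv_decode_deinit(&decoder);
  }

The loop also shows why scan and flush are separate calls today: the flush symbols simply follow the data symbols in the input buffer.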
Not sure if scan+flush should/could become one call for simplicity?
The viterbi.c code (which, by the way, should be renamed to conv_acc.c or the like) would then have to be restructured to plug into init/deinit for allocation and release.
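In other words (purely illustrative, all names made up, this is not the actual code): the scratch buffers that the accelerated decoder currently allocates on every call would move into a context that init sets up once and deinit tears down:

  #include <stdlib.h>
  #include <string.h>

  struct vdec_ctx {
          int n_states;
          unsigned int *metrics;      /* path metrics, reused per call */
          unsigned int *metrics_next;
  };

  /* on failure, the caller runs vdec_ctx_fini(); free(NULL) is a no-op */
  static int vdec_ctx_init(struct vdec_ctx *ctx, int n_states)
  {
          ctx->n_states = n_states;
          ctx->metrics = malloc(n_states * sizeof(*ctx->metrics));
          ctx->metrics_next = malloc(n_states * sizeof(*ctx->metrics_next));
          if (!ctx->metrics || !ctx->metrics_next)
                  return -1;
          return 0;
  }

  /* hot path only resets, never allocates */
  static void vdec_ctx_reset(struct vdec_ctx *ctx)
  {
          memset(ctx->metrics, 0, ctx->n_states * sizeof(*ctx->metrics));
  }

  static void vdec_ctx_fini(struct vdec_ctx *ctx)
  {
          free(ctx->metrics);
          free(ctx->metrics_next);
  }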
> - Use talloc for dynamic allocation.
I doubt this will improve your speed. Feel free to try, but this is more of a clean-up to ensure all the memory allocated within osmocom projects is tracked/accounted for, and you can get talloc usage reports, debug memory leaks, etc.
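For illustration, a minimal standalone sketch of what this buys you (the context name "conv" is made up; in a real integration the context would typically be handed in by the application):

  #include <stdio.h>
  #include <talloc.h>

  int main(void)
  {
          /* one named root context per subsystem */
          void *ctx = talloc_named_const(NULL, 0, "conv");

          /* allocations hang off the context instead of plain malloc() */
          unsigned int *metrics = talloc_array(ctx, unsigned int, 64);
          (void) metrics;

          /* this is the gain: a full usage report ... */
          talloc_report_full(ctx, stderr);

          /* ... and one call that releases the whole hierarchy */
          talloc_free(ctx);
          return 0;
  }

(Build with -ltalloc.)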