Hi,
I have submitted a pull request on GitHub [1] that improves the performance of the hackrf source. With this patch, the int8_t -> float conversion is computed on the fly instead of by looking up values in the "_lut" lookup table. This lets the compiler auto-vectorize the conversion with AVX/SSE instructions (-O3 -march=native).
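A minimal sketch of the two approaches, assuming the hackrf delivers interleaved signed 8-bit I/Q pairs and assuming a scale of 1/128 (the driver's actual LUT formula may differ). The function names `make_lut`, `convert_lut` and `convert_arith` are hypothetical, not taken from the patch. The point is the loop shape: a data-indexed table load versus a plain convert-and-multiply that GCC/Clang can auto-vectorize.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

// Assumed scaling: int8 value v maps to v / 128, i.e. [-1.0, 127/128].
// (Illustrative only; the driver's actual LUT formula may differ.)
constexpr float kScale = 1.0f / 128.0f;

// Build one float per possible byte value (256 entries).
static std::array<float, 256> make_lut()
{
    std::array<float, 256> lut{};
    for (int i = 0; i < 256; ++i) {
        const int v = (i < 128) ? i : i - 256;  // byte pattern read as int8_t
        lut[i] = static_cast<float>(v) * kScale;
    }
    return lut;
}

// LUT approach: every value becomes a table load indexed by the data itself,
// which compilers usually cannot turn into profitable SIMD code.
void convert_lut(const int8_t* in, std::complex<float>* out, std::size_t n_samples)
{
    static const std::array<float, 256> lut = make_lut();
    for (std::size_t i = 0; i < n_samples; ++i) {
        out[i] = std::complex<float>(lut[static_cast<uint8_t>(in[2 * i])],
                                     lut[static_cast<uint8_t>(in[2 * i + 1])]);
    }
}

// On-the-fly approach: a plain int8 -> float convert and multiply, a loop
// shape GCC/Clang can auto-vectorize (e.g. SSE/AVX at -O3 -march=native).
void convert_arith(const int8_t* in, std::complex<float>* out, std::size_t n_samples)
{
    assert(n_samples == 0 || (in != nullptr && out != nullptr));
    // std::complex<float> is layout-compatible with float[2], so the output
    // can be written as 2 * n_samples interleaved floats: I, Q, I, Q, ...
    float* f = reinterpret_cast<float*>(out);
    for (std::size_t i = 0; i < 2 * n_samples; ++i)
        f[i] = static_cast<float>(in[i]) * kScale;
}
```

Both functions produce identical values; only the generated machine code, and therefore the speed, differs.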
The benchmark in [2] shows the difference. With -Ofast (or -O3) and -march=native, I get a ~2.8x speedup on an Intel i7 (4th gen) and ~3.1x on a Core i5 (6th gen).
With the patched library, Gqrx can play WFM radio without any interruption, even with just 3 buffers (option="hackrf=0,buffers=3").
Best regards, Alain
[1] https://github.com/osmocom/gr-osmosdr/pull/14
[2] https://gist.github.com/carpikes/cad029c338605f70d9f687aeee447db4
I would just point out that gr-osmosdr is not intended only for x86, but also for ARM and other architectures that do not even have SIMD. Also, some x86 CPUs in use today do not support AVX. The LUT solution is the "generic" one that performs reasonably on all platforms (calculating on the fly on any architecture without SIMD conversion would cause a large performance loss).
Lucas
On 09/07/2018 13:35, Alain Carlucci wrote:
Hi Lucas,
On Tue, Jul 10, 2018 at 05:05:26AM -0300, Lucas Teske wrote:
I would just point out that gr-osmosdr is not intended only for x86, but also for ARM and other architectures that do not even have SIMD. Also, some x86 CPUs in use today do not support AVX. The LUT solution is the "generic" one that performs reasonably on all platforms (calculating on the fly on any architecture without SIMD conversion would cause a large performance loss).
I agree with the above statement. Compile-time flags are not a solution, as most people will just use a distribution package, and distributions have to build the most generic, non-CPU-specific code.
The proper approach to this kind of problem is run-time CPU feature detection, then using the best code path for the CPU at hand. This is what we're doing in osmo-trx, for example.
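A minimal sketch of run-time dispatch, not the osmo-trx or VOLK code: it assumes GCC or Clang on x86 and uses their `__builtin_cpu_init`/`__builtin_cpu_supports` builtins and the `target("avx2")` function attribute. The same loop is compiled twice, once for the generic baseline and once with AVX2 enabled, and the binary picks one when it runs. The names `convert_fn`, `convert_generic`, `convert_avx2` and `select_convert` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Signature shared by all variants of the (hypothetical) conversion kernel.
using convert_fn = void (*)(const int8_t*, float*, std::size_t);

// Portable baseline: built for the distribution's generic target.
static void convert_generic(const int8_t* in, float* out, std::size_t n)
{
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<float>(in[i]) * (1.0f / 128.0f);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Same loop, but only this function is compiled with AVX2 enabled, so the
// compiler may use 256-bit instructions here while the rest of the binary
// stays generic.
__attribute__((target("avx2")))
static void convert_avx2(const int8_t* in, float* out, std::size_t n)
{
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<float>(in[i]) * (1.0f / 128.0f);
}
#endif

// Pick an implementation based on what the running CPU supports.
convert_fn select_convert()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return convert_avx2;
#endif
    return convert_generic;  // non-x86, non-GNU compilers, or no AVX2
}
```

A caller would typically resolve the pointer once (e.g. `static const convert_fn convert = select_convert();`) and reuse it for every buffer, so the feature check is not repeated per sample.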
GNU Radio has the whole VOLK infrastructure in place to provide generic code, optimized code per architecture, run-time selection, and even benchmarking to select the fastest version.
So I would suggest just using the volk_8i_s32f_convert_32f kernel and being done with it. If the proposed code is faster than the VOLK implementations currently available, a second patch can be submitted to VOLK to add it as an architecture-specific implementation so that it can be selected.
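A minimal sketch of what calling the VOLK kernel could look like, assuming the interleaved int8 I/Q input from the hackrf and a scale of 128. The VOLK kernel is `volk_8i_s32f_convert_32f(float* out, const int8_t* in, float scalar, unsigned int num_points)`, which scales each value by 1/scalar. The wrapper name `iq8_to_complex` and the `USE_VOLK` macro are hypothetical: without `-DUSE_VOLK` (and `-lvolk`), an equivalent plain loop is compiled instead so the snippet builds anywhere.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

#ifdef USE_VOLK
#include <volk/volk.h>
#endif

// Convert n_samples interleaved int8 I/Q pairs to complex floats.
// With -DUSE_VOLK this calls the VOLK kernel, whose dispatcher selects the
// fastest implementation available on the running CPU.
void iq8_to_complex(const int8_t* in, std::complex<float>* out, std::size_t n_samples)
{
    assert(n_samples == 0 || (in != nullptr && out != nullptr));
    // std::complex<float> is layout-compatible with float[2].
    float* f = reinterpret_cast<float*>(out);
    const std::size_t n_points = 2 * n_samples;  // I and Q values
#ifdef USE_VOLK
    // Each output value is in[i] / 128.0f.
    volk_8i_s32f_convert_32f(f, in, 128.0f, static_cast<unsigned int>(n_points));
#else
    // Fallback computing the same result without VOLK.
    for (std::size_t i = 0; i < n_points; ++i)
        f[i] = static_cast<float>(in[i]) / 128.0f;
#endif
}
```

Keeping the driver on one VOLK call means any faster SIMD version only has to be contributed once, to VOLK, and every user benefits through its run-time selection.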
Cheers,
Sylvain