Profiling data of decoding EPDAN compressed bitmap
holger at freyther.de
Wed Jul 13 09:53:03 UTC 2016
> On 13 Jul 2016, at 11:43, Holger Freyther <holger at freyther.de> wrote:
> perf report -i perf.data.tree
> 33.00% TbfTest.TREE libosmocore.so.7.0.0 [.] bitvec_set_bit_pos
> 20.46% TbfTest.TREE TbfTest.TREE [.] bitvec_write_field(bitvec*, unsigned int&, unsigned long long, unsigned int)
> 14.30% TbfTest.TREE libosmocore.so.7.0.0 [.] bitvec_set_bit
So, crazily, neither the original C++ bitvec_write_field nor the C version ends up inlining bitvec_set_bit/bitvec_set_bit_pos.
1st) The C++ bitvec_write_field that takes a reference should be an inline function that calls the C version and passes the parameter as a pointer.
2nd) We need to get bitvec_set_bit_pos and bitvec_set_bit inlined into bitvec_write_field. The wall-clock time of my benchmark run goes from ~24s to ~13s if these routines are inlined.
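
To illustrate the two points above, here is a minimal sketch. It is not the libosmocore or osmo-pcu code: the struct and all sketch_* names are hypothetical stand-ins, and the MSB-first bit layout is an assumption. The idea is that the bit setter lives in a header as static inline so the compiler can fold it into the field writer, and that the C++ reference overload is only a thin forwarder to the pointer-based version.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a bit vector; field names are assumptions.
struct sketch_bitvec {
    unsigned int data_len;  // buffer length in bytes
    uint8_t *data;          // packed bits, MSB first (assumed layout)
};

// Header-style static inline setter: visible to every caller, so the
// compiler is free to inline it into the loop below.
static inline int sketch_set_bit_pos(sketch_bitvec *bv, unsigned int pos, bool bit)
{
    if (pos / 8 >= bv->data_len)
        return -1;  // out of range
    const uint8_t mask = static_cast<uint8_t>(0x80u >> (pos % 8));
    if (bit)
        bv->data[pos / 8] |= mask;
    else
        bv->data[pos / 8] &= static_cast<uint8_t>(~mask);
    return 0;
}

// Pointer-based ("C style") writer: stores the low `len` bits of `val`,
// most significant bit first, and advances *pos on success.
static inline int sketch_write_field(sketch_bitvec *bv, unsigned int *pos,
                                     uint64_t val, unsigned int len)
{
    assert(len <= 64);
    for (unsigned int i = 0; i < len; i++) {
        const bool bit = (val >> (len - 1 - i)) & 1;
        if (sketch_set_bit_pos(bv, *pos + i, bit) < 0)
            return -1;
    }
    *pos += len;
    return 0;
}

// Thin C++ overload taking the position by reference; it only forwards
// to the pointer-based version.
static inline int sketch_write_field(sketch_bitvec *bv, unsigned int &pos,
                                     uint64_t val, unsigned int len)
{
    return sketch_write_field(bv, &pos, val, len);
}
```

Whether the compiler actually inlines these still depends on optimisation level and its own heuristics, so the perf output remains the thing to check.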
> 9.94% TbfTest.TREE TbfTest.TREE [.] search_runlen(node*, unsigned char const*, unsigned char, unsigned char*, unsigned short*)
> 5.27% TbfTest.TREE TbfTest.TREE [.] Decoding::decompress_crbb(signed char, unsigned char, unsigned char const*, bitvec*)
> 57.51% TbfTest libosmocore.so.7.0.0 [.] osmo_t4_decode
That is osmo_t4_decode with the runlen step and such inlined into it. What I think decompress_crbb is doing better is, first, not using a bitvec as input but iterating over the bits itself, and also being more direct in applying the codeword to the result. What I am missing and have to check is whether search_runlen can be implemented around the "table" we have, and what the performance difference would be. I have asked Max for help.
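
As a rough picture of what "iterating over the bits itself" and "being more direct" could look like, here is a sketch under stated assumptions. It is not the decompress_crbb code: read_bit and write_run are hypothetical helpers, the layout is assumed MSB-first, and the destination is assumed zero-initialised and large enough.

```cpp
#include <cstddef>
#include <cstdint>

// Read bit `pos` (MSB first) straight from a byte buffer, with no
// bit-vector wrapper and no per-bit function call once inlined.
static inline unsigned read_bit(const uint8_t *src, size_t pos)
{
    return (src[pos / 8] >> (7 - pos % 8)) & 1u;
}

// Apply a decoded run in one step: mark `run_len` bits starting at *pos
// with `colour`. Assumes dst starts zeroed, so a run of zeros only has
// to advance the position.
static inline void write_run(uint8_t *dst, size_t *pos, bool colour,
                             unsigned run_len)
{
    if (colour) {
        for (unsigned i = 0; i < run_len; i++) {
            const size_t p = *pos + i;
            dst[p / 8] |= static_cast<uint8_t>(0x80u >> (p % 8));
        }
    }
    *pos += run_len;
}
```

A decoder built this way would call write_run once per codeword instead of going through a generic bit-setting call for every output bit.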
I will follow up after I have seen the performance difference.