Profiling data of decoding EPDAN compressed bitmap


> On 13 Jul 2016, at 11:43, Holger Freyther <holger at freyther.de> wrote:
> 

Hi all,

> perf report -i perf.data.tree 
> 
> 33.00%  TbfTest.TREE  libosmocore.so.7.0.0  [.] bitvec_set_bit_pos
> 20.46%  TbfTest.TREE  TbfTest.TREE          [.] bitvec_write_field(bitvec*, unsigned int&, unsigned long long, unsigned int)
> 14.30%  TbfTest.TREE  libosmocore.so.7.0.0  [.] bitvec_set_bit

So, crazily, neither the original C++ bitvec_write_field nor the C version ends up inlining bitvec_set_bit/bitvec_set_bit_pos.

1st) The C++ bitvec_write_field that takes a reference should be an inline function that calls the C version and passes the parameter as a pointer.

2nd) We need to get bitvec_set_bit_pos and bitvec_set_bit inlined into bitvec_write_field. The wall clock time of my benchmark run goes from ~24s to ~13s if these routines are inlined.
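
To illustrate both points, here is a minimal sketch of what I have in mind. It is not the libosmocore or osmo-pcu code: sketch_bitvec, sketch_set_bit_pos, sketch_write_field_c and sketch_write_field are hypothetical stand-ins, and the MSB-first bit order is an assumption made for the example. The idea is that the bit setter is defined as static inline in a header, so the compiler can see its body at the call site (a function that lives only in the shared library cannot be inlined into the caller), and that the C++ reference variant is just a thin inline wrapper around the pointer variant.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a bit vector; NOT the real libosmocore struct.
struct sketch_bitvec {
    unsigned int data_len;  // size of data in bytes
    uint8_t *data;          // backing buffer, bits stored MSB first (assumption)
};

// Defined in the header as static inline, so callers can inline the body.
static inline int sketch_set_bit_pos(sketch_bitvec *bv, unsigned int bitnr, bool bit)
{
    unsigned int bytenum = bitnr / 8;
    if (bytenum >= bv->data_len)
        return -1;  // position is outside the buffer

    uint8_t mask = static_cast<uint8_t>(0x80u >> (bitnr % 8));
    if (bit)
        bv->data[bytenum] = static_cast<uint8_t>(bv->data[bytenum] | mask);
    else
        bv->data[bytenum] = static_cast<uint8_t>(bv->data[bytenum] & ~mask);
    return 0;
}

// C-style variant: the write index is passed as a pointer.
static inline int sketch_write_field_c(sketch_bitvec *bv, unsigned int *write_index,
                                       uint64_t val, unsigned int len)
{
    assert(len <= 64);
    for (unsigned int i = 0; i < len; i++) {
        // Emit the most significant of the len bits first.
        bool bit = (val >> (len - 1 - i)) & 1u;
        if (sketch_set_bit_pos(bv, *write_index + i, bit) < 0)
            return -1;  // the index is left unchanged on failure
    }
    *write_index += len;
    return 0;
}

// C++ convenience wrapper: takes a reference and forwards to the C-style variant.
static inline int sketch_write_field(sketch_bitvec *bv, unsigned int &write_index,
                                     uint64_t val, unsigned int len)
{
    return sketch_write_field_c(bv, &write_index, val, len);
}
```

With this layout there is only one real implementation to maintain, and the per-bit call overhead can disappear once the setter is visible to the compiler.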

>  9.94%  TbfTest.TREE  TbfTest.TREE          [.] search_runlen(node*, unsigned char const*, unsigned char, unsigned char*, unsigned short*)

>  5.27%  TbfTest.TREE  TbfTest.TREE          [.] Decoding::decompress_crbb(signed char, unsigned char, unsigned char const*, bitvec*)

> 57.51%  TbfTest  libosmocore.so.7.0.0  [.] osmo_t4_decode

osmo_t4_decode has the runlen step and such inlined, so its share includes them. What I think decompress_crbb does better is, first, not using a bitvec as input but iterating over the bits itself and, second, being more direct in applying the codeword to the result. What I am still missing and have to check is whether search_runlen can be implemented around the "table" we have, and what the performance difference would be. I have asked Max for help.
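
As a rough illustration of what "more direct" could mean, here is a sketch under my own assumptions. It is not how decompress_crbb or osmo_t4_decode are actually written: sketch_get_bit and sketch_apply_run are hypothetical helpers, the bit order is assumed to be MSB first, and the output buffer is assumed to be zero-initialised. Input bits are read straight from the byte buffer, and a decoded run length is applied to the result in one call instead of going through a bitvec setter for every bit.

```cpp
#include <cassert>
#include <cstdint>

// Read one bit, MSB first, directly from a byte buffer (no bitvec involved).
// The caller must make sure bitnr is inside the buffer.
static inline unsigned int sketch_get_bit(const uint8_t *src, unsigned int bitnr)
{
    return (src[bitnr / 8] >> (7 - (bitnr % 8))) & 1u;
}

// Apply a decoded run of run_len bits with value `color` to dst, starting at
// bit position *pos. dst is assumed to be zero-initialised, so a run of zeros
// only advances the position.
static inline int sketch_apply_run(uint8_t *dst, unsigned int dst_bits,
                                   unsigned int *pos, unsigned int run_len, bool color)
{
    if (*pos > dst_bits || run_len > dst_bits - *pos)
        return -1;  // the run would not fit into the output

    if (color) {
        for (unsigned int p = *pos; p < *pos + run_len; p++)
            dst[p / 8] = static_cast<uint8_t>(dst[p / 8] | (0x80u >> (p % 8)));
    }
    *pos += run_len;
    return 0;
}
```

A real decoder would still need the codeword lookup in front of this, which is the part I have to check against the "table".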

I will follow up after I have seen the performance difference.

holger