On 13 Jul 2016, at 11:43, Holger Freyther <holger(a)freyther.de> wrote:
Hi all,
perf report -i perf.data.tree
33.00% TbfTest.TREE libosmocore.so.7.0.0 [.] bitvec_set_bit_pos
20.46% TbfTest.TREE TbfTest.TREE [.] bitvec_write_field(bitvec*, unsigned int&, unsigned long long, unsigned int)
14.30% TbfTest.TREE libosmocore.so.7.0.0 [.] bitvec_set_bit
So, surprisingly, neither the original C++ bitvec_write_field nor the C version ends up
inlining bitvec_set_bit/bitvec_set_bit_pos.
1st) The C++ bitvec_write_field that takes the index by reference should become an inline
function that calls the C version and passes the parameter as a pointer.
2nd) We need to get bitvec_set_bit_pos and bitvec_set_bit inlined into bitvec_write_field.
The wall-clock time of my benchmark run drops from ~24 s to ~13 s when these routines are inlined.
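The two points above could look roughly like the following sketch. It assumes a simplified stand-in for libosmocore's `struct bitvec` with MSB-first bit order; `sketch_bitvec`, `sketch_set_bit_pos` and `sketch_write_field` are hypothetical names, not the real libosmocore or osmo-pcu API. The bit setter is `static inline` and defined in the same translation unit as the field writer, so the compiler is free to fold it into the loop instead of calling into the shared library, and the reference-taking C++ overload is only a thin inline wrapper that forwards the address to the pointer-based version.

```cpp
#include <cerrno>
#include <cstdint>

// Hypothetical, simplified stand-in for libosmocore's struct bitvec.
struct sketch_bitvec {
    unsigned int cur_bit;   // next bit for sequential writes (unused here)
    unsigned int data_len;  // buffer length in bytes
    uint8_t *data;          // packed bits, bit 0 is the MSB of data[0]
};

// Defined in this translation unit and marked inline, so the compiler
// may fold it into the caller's loop instead of emitting a call.
static inline int sketch_set_bit_pos(sketch_bitvec *bv, unsigned int bitnr, int bit)
{
    unsigned int bytenum = bitnr / 8;
    uint8_t mask = (uint8_t)(0x80u >> (bitnr % 8));
    if (bytenum >= bv->data_len)
        return -EINVAL;
    if (bit)
        bv->data[bytenum] |= mask;
    else
        bv->data[bytenum] &= (uint8_t)~mask;
    return 0;
}

// C-style field writer: writes the low `len` bits of `val` (len <= 64),
// MSB first, advancing *write_index; stops at the first out-of-range bit.
static int sketch_write_field(sketch_bitvec *bv, unsigned int *write_index,
                              uint64_t val, unsigned int len)
{
    for (unsigned int i = 0; i < len; i++) {
        int bit = (int)((val >> (len - i - 1)) & 1u);
        int rc = sketch_set_bit_pos(bv, *write_index, bit);
        if (rc < 0)
            return rc;
        (*write_index)++;
    }
    return 0;
}

// C++ overload taking the index by reference: a thin inline wrapper
// that passes its address on to the pointer-based version.
inline int sketch_write_field(sketch_bitvec *bv, unsigned int &write_index,
                              uint64_t val, unsigned int len)
{
    return sketch_write_field(bv, &write_index, val, len);
}
```

Whether the setter is actually inlined still depends on the optimization flags, so re-running `perf report` after such a change is the way to confirm it.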
9.94% TbfTest.TREE TbfTest.TREE [.] search_runlen(node*, unsigned char const*, unsigned char, unsigned char*, unsigned short*)
5.27% TbfTest.TREE TbfTest.TREE [.] Decoding::decompress_crbb(signed char, unsigned char, unsigned char const*, bitvec*)
57.51% TbfTest libosmocore.so.7.0.0 [.] osmo_t4_decode
osmo_t4_decode got the run-length step and related code inlined. What I think decompress_crbb
does better is, first, not using a bitvec as input but iterating over the bits itself, and
second, applying the codeword more directly to the result. What I still have to check is
whether search_runlen can be implemented around the "table" we have, and what the
performance difference is. I have asked Max for help.
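One possible reading of "iterating over the bits itself", sketched under the assumption of an MSB-first packed input buffer: `get_bit`, `put_bit`, `count_run` and `append_run` are hypothetical helpers, not the code in decompress_crbb or osmo_t4_decode. The run length is counted straight from the byte buffer without a bitvec, and a decoded run is written into the output bit by bit only up to the next byte boundary, with whole bytes filled by `memset`. The table-driven codeword lookup itself is not shown.

```cpp
#include <cstdint>
#include <cstring>

// Read bit `pos` of an MSB-first packed buffer (bit 0 = MSB of buf[0]).
static inline int get_bit(const uint8_t *buf, unsigned int pos)
{
    return (buf[pos / 8] >> (7 - pos % 8)) & 1;
}

// Set or clear bit `pos` of an MSB-first packed buffer.
static inline void put_bit(uint8_t *out, unsigned int pos, int color)
{
    uint8_t mask = (uint8_t)(0x80u >> (pos % 8));
    if (color)
        out[pos / 8] |= mask;
    else
        out[pos / 8] &= (uint8_t)~mask;
}

// Count consecutive bits equal to `color` starting at `pos`,
// never reading at or beyond `limit` (number of valid input bits).
static unsigned int count_run(const uint8_t *buf, unsigned int pos,
                              unsigned int limit, int color)
{
    unsigned int n = 0;
    while (pos + n < limit && get_bit(buf, pos + n) == color)
        n++;
    return n;
}

// Append `len` bits of `color` at bit offset *pos: single bits up to a
// byte boundary, whole bytes via memset, then the remaining tail bits.
// The caller must make sure `out` is large enough.
static void append_run(uint8_t *out, unsigned int *pos, unsigned int len, int color)
{
    while (len > 0 && (*pos % 8) != 0) {
        put_bit(out, (*pos)++, color);
        len--;
    }
    unsigned int full = len / 8;
    if (full) {
        memset(out + *pos / 8, color ? 0xFF : 0x00, full);
        *pos += full * 8;
        len -= full * 8;
    }
    while (len > 0) {
        put_bit(out, (*pos)++, color);
        len--;
    }
}
```

Filling whole bytes at once is what makes long runs cheap here; whether that matches what decompress_crbb actually does would need checking against its source.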
I will follow up after I have seen the performance difference.
holger