This is merely a historical archive of years 2008-2021, before the migration to mailman3.
A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/osmocom-net-gprs@lists.osmocom.org/.
Holger Freyther holger at freyther.de

> On 13 Jul 2016, at 11:43, Holger Freyther <holger at freyther.de> wrote:
>
> Hi all,

> perf report -i perf.data.tree
>
> 33.00% TbfTest.TREE libosmocore.so.7.0.0 [.] bitvec_set_bit_pos
> 20.46% TbfTest.TREE TbfTest.TREE [.] bitvec_write_field(bitvec*, unsigned int&, unsigned long long, unsigned int)
> 14.30% TbfTest.TREE libosmocore.so.7.0.0 [.] bitvec_set_bit

so crazily neither the original C++ bitvec_write_field nor the C version ends up inlining bitvec_set_bit/bitvec_set_bit_pos.

1st) the C++ bitvec_write_field with the reference should be an inline function that calls the C version and passes the parameter as a pointer
2nd) We need to get set_bit_pos and set_bit inlined into bitvec_write_field.

The wall clock time of my benchmark run goes from ~24s to ~13s if these routines are inlined.

> 9.94% TbfTest.TREE TbfTest.TREE [.] search_runlen(node*, unsigned char const*, unsigned char, unsigned char*, unsigned short*)
> 5.27% TbfTest.TREE TbfTest.TREE [.] Decoding::decompress_crbb(signed char, unsigned char, unsigned char const*, bitvec*)

> 57.51% TbfTest libosmocore.so.7.0.0 [.] osmo_t4_decode

osmo_t4_decode (got the runlen step and such inlined). What I think decompress_crbb is doing better is 1st not using bitvec as input but iterating over the bits itself, and being more direct in applying the codeword in the result.

What I am missing and have to check is whether search_runlen can be implemented around the "table" we have, and what the performance difference is. I have asked Max for help. I will follow up after I have seen the performance difference.

holger
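
For illustration only, the two points above (a thin inline C++ wrapper around the pointer-based version, and bit setters that are visible to the compiler so they can be inlined into the write loop) could look roughly like the sketch below. Every name here (bitvec_sketch, sketch_set_bit_pos, sketch_write_field) is a hypothetical stand-in and not the libosmocore or osmo-pcu API; the struct layout and the MSB-first bit order are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a bit vector; the field names are assumed
// for this sketch and are not taken from libosmocore.
struct bitvec_sketch {
    unsigned int cur_bit;   // unused here, kept to resemble a cursor field
    unsigned int data_len;  // length of data in bytes
    uint8_t *data;          // backing storage
};

// Defined "static inline" in a header-like position so the compiler can
// inline it into the loop below instead of calling across a library boundary.
static inline int sketch_set_bit_pos(bitvec_sketch *bv, unsigned int pos, bool bit)
{
    if (pos / 8 >= bv->data_len)
        return -1;  // position is past the end of the buffer
    const uint8_t mask = static_cast<uint8_t>(0x80 >> (pos % 8));  // MSB first (assumed)
    if (bit)
        bv->data[pos / 8] |= mask;
    else
        bv->data[pos / 8] &= static_cast<uint8_t>(~mask);
    return 0;
}

// "C-style" version: the write index is passed as a pointer.
static inline int sketch_write_field(bitvec_sketch *bv, unsigned int *write_index,
                                     uint64_t val, unsigned int len)
{
    assert(write_index != nullptr);
    if (len > 64)
        return -1;  // val only carries 64 bits
    for (unsigned int i = 0; i < len; i++) {
        const bool bit = (val >> (len - 1 - i)) & 1;  // most significant bit first
        if (sketch_set_bit_pos(bv, *write_index, bit) < 0)
            return -1;
        (*write_index)++;
    }
    return 0;
}

// C++ convenience overload taking a reference: a thin inline wrapper that
// forwards to the pointer version, so only one implementation exists.
static inline int sketch_write_field(bitvec_sketch *bv, unsigned int &write_index,
                                     uint64_t val, unsigned int len)
{
    return sketch_write_field(bv, &write_index, val, len);
}
```

A caller holding an `unsigned int idx` can pass either `idx` (reference overload) or `&idx` (pointer version); both end up in the same loop. Whether the compiler actually inlines the setter still depends on optimisation flags, so it is worth re-checking with perf after such a change.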