Hi all,
I've been doing some profiling on osmo-bts recently (on sysmobts hardware, which has only a relatively slow ARM926 CPU core), and the two things that show up most are:
* msgb_alloc() -> talloc_zero() -> malloc(). This can be alleviated somewhat by using talloc pools, although for some reason the pools don't remove all of the malloc() calls.
* vfprintf() and friends, from logp() statements. The sad part is that calls like gsm_lchan_name() are of course executed before the call into logp(), so by then the vfprintf/sprintf/... for the arguments has already been executed, and only the last/final one hasn't happened yet.
Here we can do two things. First, calls like gsm_lchan_name() don't need to happen all the time, as the lchan name is static and can be generated once at the time the gsm_lchan is created. I implemented that in osmo-bts (and openbsc, as it's from gsm_data_shared).
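To illustrate the idea of generating the name once, here is a minimal sketch. It is not the actual osmo-bts change: struct demo_lchan, demo_lchan_init() and demo_lchan_name() are hypothetical stand-ins, and the name format is made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct gsm_lchan; the real
 * structure in gsm_data_shared has many more fields. */
struct demo_lchan {
	int bts_nr, trx_nr, ts_nr, ss_nr;
	char name[32];	/* formatted once, reused by every log statement */
};

/* Format the name a single time, when the lchan is set up. */
static void demo_lchan_init(struct demo_lchan *lchan,
			    int bts, int trx, int ts, int ss)
{
	assert(lchan != NULL);
	lchan->bts_nr = bts;
	lchan->trx_nr = trx;
	lchan->ts_nr = ts;
	lchan->ss_nr = ss;
	/* snprintf() truncates instead of overflowing the buffer */
	snprintf(lchan->name, sizeof(lchan->name),
		 "(bts=%d,trx=%d,ts=%d,ss=%d)", bts, trx, ts, ss);
}

/* Cheap accessor: no formatting on the hot path, just a pointer. */
static const char *demo_lchan_name(const struct demo_lchan *lchan)
{
	return lchan->name;
}
```

With this layout a log statement pays only for a pointer load, whether or not the line ends up being printed.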
The second idea would be to expand the LOGP() macro a bit so that the check whether the log line is enabled happens _before_ the arguments (and thus the associated function calls) are evaluated. Any ideas on that?
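A rough sketch of what such a macro could look like, assuming a single global log threshold. DEMO_LOGP(), demo_log_enabled() and the level values are hypothetical and not the libosmocore API, where levels are kept per target and per category.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical levels and threshold, for illustration only. */
enum { DEMO_DEBUG = 1, DEMO_ERROR = 7 };
static int demo_log_level = 5;

/* Counts how often the expensive helper actually ran. */
static int demo_name_calls;

static const char *demo_expensive_name(void)
{
	demo_name_calls++;
	return "lchan-0";
}

/* Cheap check that runs before any argument is evaluated. */
static int demo_log_enabled(int level)
{
	return level >= demo_log_level;
}

/* The arguments only appear inside the if-body, so they are
 * evaluated only when the line is enabled. */
#define DEMO_LOGP(level, ...) \
	do { \
		if (demo_log_enabled(level)) \
			fprintf(stderr, __VA_ARGS__); \
	} while (0)

static void demo_run(void)
{
	/* Filtered out: demo_expensive_name() is never called. */
	DEMO_LOGP(DEMO_DEBUG, "debug: %s\n", demo_expensive_name());
	assert(demo_name_calls == 0);

	/* Enabled: the helper runs exactly once. */
	DEMO_LOGP(DEMO_ERROR, "error: %s\n", demo_expensive_name());
	assert(demo_name_calls == 1);
}
```

The do/while (0) wrapper keeps the macro safe to use as a single statement, for example in an if/else without braces.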
After a brief look at osmo-pcu profiling, the result looks like the attached picture. We cannot do much about the __copy_to_user_std, do_select and core_sys_select, as those are kernel side.
However, there again we see vfprintf and friends, mostly via gprs_rlcmac_tbf::name() - and of course the msgb_alloc() and msgb_free() going through talloc and finally malloc.
So the same strategies as above could (and probably should) be applied to osmo-pcu.
Regards, Harald
On 07 Dec 2015, at 09:12, Harald Welte laforge@gnumonks.org wrote:
Hi all,
- vfprintf() and friends, from logp() statements. The sad part is that calls like gsm_lchan_name() are of course executed before the call into logp(), so by then the vfprintf/sprintf/... for the arguments has already been executed, and only the last/final one hasn't happened yet.
Jacob has proposed (and I think maybe included) a patch to check if logging is enabled for any target before making the call (and passing the arguments)
After a brief look at osmo-pcu profiling, the result looks like the attached picture. We cannot do much about the __copy_to_user_std, do_select and core_sys_select, as those are kernel side.
epoll. With it we avoid transferring the FD set to the kernel all the time (and arming them in the driver). But I assume the bigger cost of copy to user is the indications we copy from the DSP to userspace.
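For reference, a minimal Linux-only sketch of the epoll pattern: the file descriptor is registered once, and each wait only returns the ready events instead of re-submitting the whole FD set. The demo_* helper names are made up for this example and are not part of libosmocore.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd once; the kernel keeps the interest list across waits.
 * Returns the epoll descriptor, or -1 on error. */
static int demo_epoll_setup(int fd)
{
	struct epoll_event ev = { 0 };
	int epfd;

	assert(fd >= 0);
	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	ev.events = EPOLLIN;	/* wake up when fd becomes readable */
	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}
	return epfd;
}

/* Wait up to timeout_ms for one event. Returns the ready fd,
 * or -1 on timeout or error. */
static int demo_epoll_wait_one(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	return n == 1 ? ev.data.fd : -1;
}
```

Note that this only removes the per-call FD set handling; the payload itself is still copied from kernel to userspace on every read().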
Hi,
On 07.12.2015 09:25, Holger Freyther wrote:
On 07 Dec 2015, at 09:12, Harald Welte laforge@gnumonks.org wrote:
- vfprintf() and friends, from logp() statements. The sad part is that calls like gsm_lchan_name() are of course executed before the call into logp(), so by then the vfprintf/sprintf/... for the arguments has already been executed, and only the last/final one hasn't happened yet.
Jacob has proposed (and I think maybe included) a patch to check if logging is enabled for any target before making the call (and passing the arguments)
This is contained in the "log: Add log_check_level function" series of patches posted on Nov 17. In first tests it has reduced the process load by 10-20% (relative to PCU load).
Jacob
On 07.12.2015 09:25, Holger Freyther wrote:
epoll. With it we avoid transferring the FD set to the kernel all the time (and arming them in the driver). But I assume the bigger cost of copy to user is the indications we copy from the DSP to userspace.
There's an interesting-looking wrapper around epoll:
http://0pointer.net/blog/introducing-sd-event.html
Not sure if it would cover dsp2userspace transfers though.
cheers, Max.
On Mon, Dec 07, 2015 at 04:38:38PM +0100, Suraev wrote:
There's an interesting-looking wrapper around epoll:
I don't think we'd need a wrapper, as we don't need the timerfd and signalfd integration, child process state change events, unix process events, etc.
So the wrapper only adds a layer of abstraction and lots of features we don't need. The fundamental issues here are that select() is quite expensive (due to the large FD array copying/checking/zeroing), and that obviously all data always has to be copied between kernel memory and userspace memory.
Not sure if it would cover dsp2userspace transfers though.
From the userspace point of view, the DSP is just a character device that you open and read/write on, nothing fancy.