Dear list,
I did some more experiments.
On Sat, Sep 26, 2020 at 07:50:14PM +0200, Harald Welte wrote:
== launching the program in gnome-terminal
@usecs: [2, 4) 32387 |@@@ | [4, 8) 457668 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [8, 16) 44408 |@@@@@ | [16, 32) 466 | | [32, 64) 63 | | [64, 128) 9 | | [128, 256) 3 | | [256, 512) 53 | | [512, 1K) 0 | | [1K, 2K) 0 | | [2K, 4K) 0 | | [4K, 8K) 1 | | [8K, 16K) 0 | | [16K, 32K) 0 | | [32K, 64K) 39 | | [64K, 128K) 125 | | [128K, 256K) 0 | | [256K, 512K) 0 | | [512K, 1M) 0 | | [1M, 2M) 0 | | [2M, 4M) 0 | | [4M, 8M) 0 | | [8M, 16M) 0 | | [16M, 32M) 0 | | [32M, 64M) 0 | | [64M, 128M) 0 | | [128M, 256M) 0 | | [256M, 512M) 0 | | [512M, 1G) 0 | | [1G, 2G) 1 | |
So we can see that gnome-terminal tends to be faster on most stderr-writes, but at the expens of introducing a few long-delays in the 32..128ms bucket.
The other idea would be to do our own buffering using osmo_wqueue() on the raw, unbuffered stderr-fd, which would then be in non-blocking mode and handled vis osmocom select. We'd have to add some kind of limit for that buffer in order to avoid OOM situations.
I just implemented that. Results are promising.
This is with gnome-terminal where above we had the significant number of 64..128ms blocking delays. Now:
@usecs: [1] 404920 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [2, 4) 308383 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [4, 8) 3191 | | [8, 16) 1434 | | [16, 32) 1438 | | [32, 64) 65 | | [64, 128) 22 | | [128, 256) 9 | | [256, 512) 9 | |
so no more than 512us for any log output line (which is within the process, allocating a msgb + enqueueing it to the write_queue). We have to be a bit careful here, in order to prevent re-entrancy problems. For example, write_queue currently wants to log if the queue is full, which obviosly causes infinite recursion.
We make logging more expensive this way, as we add a msgb allocation + memcpy to each log statement. At the same time, we remove the potential of any write-blocking, which is probably worth it.
There are some other side-effects, though: stderr-logging no longer works for small programs tat don't ever call osmo_select_main(). So maybe we need to support both the old blocking as well as the new non-blocking method?
But how do we make sure everyone gets the best default behavior? Start with the blocking/buffered I/O and switch to the write_queue backend on the first call to osmo_select_main()?
What do you think?