Hi all,
we are currently having lots of discussions on (non-)blocking I/O. I'd like to
put these thoughts out there for that discussion, because there seems to be
some confusion of terms.
Blocking never goes away; it is just reduced to other orders of magnitude.
Types of "blocking":
- blocking or non-blocking pipes: will writing to a file or socket stop the
program until the pipe is ready for writing? (basic "OS level" I/O)
- synchronous or asynchronous event handling: will the program stop until a
remote side has responded, or can the program handle other events in the
meantime? (one job queue, one worker == osmo_select_main())
- sequential or parallelized event handling: can events be handled
concurrently, or just one after the other?
(one or more job queues, more than one worker)
- concurrent access of resources: a given resource is not thread-safe, hence
one thread needs to wait for the other to release a resource lock.
(This is always present, the aim is to hit a sweet spot of least locking.)
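To make the first item concrete, here is a minimal POSIX sketch, my own
illustration and not osmo code; `set_nonblock()` and `try_write()` are names
invented for this example:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch a file descriptor to non-blocking mode. */
static int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write without ever stopping the program: returns bytes written, 0 if
 * the pipe is full right now (the blocking variant would sleep here
 * until the reader catches up), or -1 on a real error. */
static ssize_t try_write(int fd, const void *buf, size_t len)
{
	ssize_t rc = write(fd, buf, len);
	if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;
	return rc;
}
```

A caller that gets 0 back keeps the data queued and retries when
select()/poll() reports the fd as writable again.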
In osmo programs I have worked on, we do the first two, but not the other two.
Asynchronous event handling is the bare minimum for a server program to be
functional. Non-blocking pipes are a common addition that is easy to do.
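For illustration, a bare-bones single-threaded event loop of the
one-queue-one-worker kind, using plain poll(2). This is my own sketch of the
pattern, not the actual osmo_select_main() implementation; `struct evt_fd`,
`evt_poll_once()` and `count_byte_cb()` are names invented for this example.

```c
#include <poll.h>
#include <unistd.h>

/* One registered file descriptor with its read callback. */
struct evt_fd {
	int fd;
	void (*read_cb)(struct evt_fd *efd);
	void *data;
};

/* Example handler: read one byte and count it via the data pointer. */
static void count_byte_cb(struct evt_fd *efd)
{
	char c;
	if (read(efd->fd, &c, 1) == 1)
		(*(int *)efd->data)++;
}

/* One loop iteration: wait for events on all registered fds, then
 * dispatch callbacks one by one.  There is only one "worker", so a slow
 * callback delays every other pending event -- asynchronous, but still
 * blocking on the time scale of a single handler. */
static int evt_poll_once(struct evt_fd **efds, int n, int timeout_ms)
{
	struct pollfd pfds[64];
	int i, rc, dispatched = 0;

	if (n > 64)
		return -1;
	for (i = 0; i < n; i++) {
		pfds[i].fd = efds[i]->fd;
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}
	rc = poll(pfds, n, timeout_ms);
	if (rc <= 0)
		return rc;
	for (i = 0; i < n; i++) {
		if (pfds[i].revents & POLLIN) {
			efds[i]->read_cb(efds[i]);
			dispatched++;
		}
	}
	return dispatched;
}
```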
From then on we enter the world of parallelization, and things get very complex
very quickly. It is possible to cause more blocking than before. It is possible
to significantly increase the load instead of improving performance.
I am familiar with parallelized non-blocking event handling and I/O, from
realtime audio+video+control hacking. We do not use any of these techniques in
osmo programs I have worked on -- for good reasons, I thought.
The spectrum, from most blocking to most non-blocking:
- single-threaded, single queue with async defer;
- task queue with multiple worker threads;
- scheduling based on fairness or urgency;
- map/reduce across a cloud, in a functional language.
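As an illustration of the second point on that spectrum, a minimal task queue
with worker threads in plain pthreads. This is a sketch of the pattern only,
with invented names (`struct taskq` etc.); note how even this simplest version
already needs a mutex, a condition variable and a shutdown protocol.

```c
#include <pthread.h>
#include <stdbool.h>

#define QLEN 128

struct task {
	void (*fn)(void *arg);
	void *arg;
};

/* A fixed-size job queue shared by all workers.  The mutex is exactly
 * the fourth kind of "blocking" from the list above: workers block on
 * it to access the queue, hopefully only for fractions of microseconds. */
struct taskq {
	struct task tasks[QLEN];
	int head, tail, count;
	bool shutdown;
	pthread_mutex_t lock;
	pthread_cond_t nonempty;
};

static void taskq_init(struct taskq *q)
{
	q->head = q->tail = q->count = 0;
	q->shutdown = false;
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->nonempty, NULL);
}

/* Returns false if the queue is full: the caller must handle overload. */
static bool taskq_push(struct taskq *q, void (*fn)(void *), void *arg)
{
	bool ok = false;
	pthread_mutex_lock(&q->lock);
	if (q->count < QLEN) {
		q->tasks[q->tail] = (struct task){ fn, arg };
		q->tail = (q->tail + 1) % QLEN;
		q->count++;
		ok = true;
		pthread_cond_signal(&q->nonempty);
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

/* Worker: pop and run tasks; after shutdown, drain the queue and exit. */
static void *taskq_worker(void *data)
{
	struct taskq *q = data;
	for (;;) {
		pthread_mutex_lock(&q->lock);
		while (q->count == 0 && !q->shutdown)
			pthread_cond_wait(&q->nonempty, &q->lock);
		if (q->count == 0 && q->shutdown) {
			pthread_mutex_unlock(&q->lock);
			return NULL;
		}
		struct task t = q->tasks[q->head];
		q->head = (q->head + 1) % QLEN;
		q->count--;
		pthread_mutex_unlock(&q->lock);
		t.fn(t.arg); /* run outside the lock */
	}
}

static void taskq_shutdown(struct taskq *q)
{
	pthread_mutex_lock(&q->lock);
	q->shutdown = true;
	pthread_cond_broadcast(&q->nonempty);
	pthread_mutex_unlock(&q->lock);
}

/* Example task: atomically count completed jobs. */
static void count_task(void *arg)
{
	__atomic_fetch_add((int *)arg, 1, __ATOMIC_SEQ_CST);
}
```

And this sketch still dodges the hard parts: tasks that touch shared program
state need further locking, and results must somehow be handed back to the
event loop.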
We're almost all the way to the blocking end of the parallelization spectrum.
So far I thought that this was a conscious choice. Async-but-blocking is low
complexity, with large benefits in maintainability and stability.
Example:
If we have, say, 10 incoming packets pending on three different links, we
handle each packet one by one when its turn comes. If one subscriber's incoming
measurement report triggers longish handover calculations, events like an
MGCP ACK or an SCCP CC for other subscribers have to wait in line, even
though they might take a thousandfold less time to complete.
OsmoBSC works well in that fashion, even for hundreds of cells and multiple
MSCs: compared to audio+video+control, CNI signalling has huge tolerances on
timing. This is why 3GPP separates the control plane from the user plane.
It is important to balance all of these aspects!
---
It was mentioned somewhere that our VTY is both asynchronous and non-blocking.
I do not agree at all and would like to explain, as an example of the above.
Our vty server is NOT asynchronous. When a VTY request comes in, the vty
function must directly vty_out() the response. We cannot defer the VTY response
asynchronously like any other protocol can (see example below).
Our VTY structures, and the program-specific internal state that VTY
manipulates and queries, are not thread-safe. The VTY server cannot be
parallelized as it is now.
A contrived example:
Let's say we wanted to query nft counters from a VTY command:
* read VTY command from user,
* do some nft command asynchronously,
* and print back the result when nft is done.
Naively, we could store the struct vty * somewhere, and exit the vty handling
function. When nft is done some time later, just vty_out() the result to that
struct vty * that we still have from earlier. But there are problems:
If the user closed the telnet session in the meantime, this struct vty * is
stale and the program will crash. We need a cancel mechanism to avoid that.
Also when a VTY command function is done, we directly transmit the next VTY
prompt. vty-test scripts (`expect`) won't function properly when more response
data arrives after the prompt is received; human users may be confused.
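To sketch what such a cancel mechanism could look like: everything below is
hypothetical, `struct session` merely stands in for struct vty, and none of
these names exist in libosmovty. The point is that closing the session must
invalidate the deferred handle, so the late callback never dereferences a
stale pointer.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for struct vty: a telnet session with an output buffer. */
struct session {
	char out[256];
	struct deferred *pending; /* so closing the session can cancel it */
};

/* Handle kept by the asynchronous job (e.g. the pending nft query). */
struct deferred {
	struct session *sess; /* NULL once the session is gone */
};

static void defer_response(struct session *s, struct deferred *d)
{
	d->sess = s;
	s->pending = d;
}

/* Closing the session invalidates the deferred handle, instead of
 * leaving a stale pointer behind. */
static void session_close(struct session *s)
{
	if (s->pending)
		s->pending->sess = NULL;
	s->pending = NULL;
}

/* Called when the async result arrives: returns 0 if delivered, -1 if
 * the session went away in the meantime (the result is dropped). */
static int deferred_out(struct deferred *d, const char *msg)
{
	if (!d->sess)
		return -1; /* cancelled: do NOT touch the old session */
	snprintf(d->sess->out, sizeof(d->sess->out), "%s", msg);
	d->sess->pending = NULL;
	d->sess = NULL;
	return 0;
}
```

Even with this in place, the prompt problem above remains unsolved: the next
prompt would have to be deferred together with the response.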
So our VTY server is *both* synchronous *and* blocking. It is not trivial to
make it asynchronous (as all of our other protocols are) and non-blocking
(which we have nowhere in osmo-cni yet).
---
These are the kinds of mechanisms I care about in our discussions:
- "blocking" on what time scale?
- tradeoff with code complexity and maintainability.
- tradeoff with code stability and determinism.
- tradeoff with system performance load due to additional management and caching.
One does not simply put things in threads.
There are very non-trivial aspects that *always* come with it:
one of them is super good, most of them are pretty bad.
~N
--
- Neels Hofmeyr <nhofmeyr(a)sysmocom.de>
https://www.sysmocom.de/
=======================================================================
* sysmocom - systems for mobile communications GmbH
* Siemensstr. 26a
* 10551 Berlin, Germany
* Sitz / Registered office: Berlin, HRB 134158 B
* Geschaeftsfuehrer / Managing Director: Harald Welte