Dear Pablo, Ciaby,
From what I see, we have the following OpenBSC/OS integration:
* OpenBSC wants to send some RSL data and calls abis_rsl_sendmsg
** abis_rsl_sendmsg enqueues the msgb and informs the lower-layer driver (ipa in our case)
** ipa sets when |= BSC_FD_WRITE
* the code returns
* OpenBSC runs select for all fds/timers...
* libosmocore dispatches the fds
** libosmo-abis/ipaccess.c tries to write a single message
** libosmo-abis/ipaccess.c sets BSC_FD_WRITE again if the queue is not empty
* Linux/TCP runs Nagle's algorithm to combine these messages
* OpenBSC runs select for all fds/timers again...
...
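The flow above can be modelled in a few lines of C. This is a simplified sketch, not the libosmocore/libosmo-abis code: the `conn` structure, `WANT_WRITE` (standing in for BSC_FD_WRITE), the fixed-size queue and both function names are hypothetical. It only illustrates the pattern of enqueueing, flagging interest in writability, and writing one message per writable event.

```c
#include <stddef.h>
#include <unistd.h>

#define WANT_WRITE 0x2  /* hypothetical stand-in for BSC_FD_WRITE */
#define QLEN 16

struct msg { const char *data; size_t len; };

struct conn {
    int fd;
    unsigned int when;      /* events we want select() to watch for */
    struct msg q[QLEN];     /* simple ring buffer of pending messages */
    size_t head, count;
};

/* Roughly what abis_rsl_sendmsg triggers: queue the message and
   ask to be told when the fd becomes writable. */
static int conn_enqueue(struct conn *c, const char *data, size_t len)
{
    if (c->count == QLEN)
        return -1;
    c->q[(c->head + c->count) % QLEN] = (struct msg){ data, len };
    c->count++;
    c->when |= WANT_WRITE;
    return 0;
}

/* Roughly the write callback: send ONE message, re-arm if more remain. */
static int conn_write_cb(struct conn *c)
{
    c->when &= ~WANT_WRITE;
    if (c->count == 0)
        return 0;
    struct msg *m = &c->q[c->head];
    ssize_t rc = write(c->fd, m->data, m->len);
    if (rc < 0) {
        c->when |= WANT_WRITE;  /* keep the message, retry on next writable event */
        return -1;
    }
    /* Simplification: a short write (rc < m->len) is treated as complete,
       so the unsent tail would be lost. */
    c->head = (c->head + 1) % QLEN;
    c->count--;
    if (c->count > 0)
        c->when |= WANT_WRITE;
    return 0;
}
```

Each pass through select() moves at most one message into the kernel, which is where the extra latency on busy links comes from.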
The integration does work, but it appears to be a bit painful for busy and high-latency links. For example, OpenBSC starts its timers when abis_rsl_sendmsg is invoked, while the message may only be transmitted somewhat later.
I wonder if there is a better way. For reading we could read until -EWOULDBLOCK, but this might not be fair to the other parts of the software. For writing, we might end up in a situation where a write only partially succeeds, and we need to remember how much of the msgb remains to be written. So we would need to use ioctls to check how much space is left in the send buffer.
Do you have ideas or comments? Is it a non-issue? Is it something we can do better? Is there a TCP mode where a "write" either fully succeeds or fails with -ENOSPC or such?
holger
Hi Holger, Pablo, Ciaby,
On Sat, Apr 04, 2015 at 09:05:20AM +0200, Holger Hans Peter Freyther wrote:
> - Linux/TCP will run nagle to combine these messages
We could easily disable Nagle if we want to avoid those latencies [see below].
> I wonder if there is a better way. For reading we could read until -EWOULDBLOCK, but this might not be fair to the other parts of the software.
In terms of msgb reading, I think a fixed upper limit might make sense, i.e. read up to [configurable] N messages in one BSC_FD_READ callback.
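A bounded read loop along those lines might look like the sketch below. It assumes a non-blocking socket and a framing modelled on the IPA header (a 16-bit big-endian payload length followed by one protocol byte); `rx_state`, `read_batch` and the handler type are hypothetical names. Because a partially received message is kept in `rx_state`, stopping after `max_msgs` is safe: the next readable event resumes where this one stopped.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define HDR_LEN 3            /* assumed: 2-byte big-endian length + 1-byte proto */
#define MAX_PAYLOAD 65535

struct rx_state {
    uint8_t buf[HDR_LEN + MAX_PAYLOAD];
    size_t have;             /* bytes of the current message received so far */
};

typedef void (*msg_handler)(const uint8_t *msg, size_t len, void *ctx);

/* Bytes the current message still needs: header first, then payload. */
static size_t rx_want(const struct rx_state *s)
{
    if (s->have < HDR_LEN)
        return HDR_LEN - s->have;
    size_t payload = ((size_t)s->buf[0] << 8) | s->buf[1];
    return HDR_LEN + payload - s->have;
}

/* Deliver at most max_msgs complete messages from a non-blocking fd, then
   return so other fds/timers get a turn. Returns the number of messages
   delivered, or -1 on EOF or a hard error. */
static int read_batch(int fd, struct rx_state *s, int max_msgs,
                      msg_handler cb, void *ctx)
{
    int done = 0;
    while (done < max_msgs) {
        ssize_t rc = read(fd, s->buf + s->have, rx_want(s));
        if (rc < 0 && errno == EINTR)
            continue;
        if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;           /* nothing more for now; keep partial state */
        if (rc <= 0)
            return -1;       /* EOF or hard error */
        s->have += (size_t)rc;
        if (s->have >= HDR_LEN && rx_want(s) == 0) {
            cb(s->buf, s->have, ctx);
            s->have = 0;
            done++;
        }
    }
    return done;
}
```

With a level-triggered select(), data left in the socket after the batch limit makes the fd readable again on the next iteration, so nothing is lost; other fds and timers simply get a turn first.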
> For writing, we might end up in a situation where a write only partially succeeds, and we need to remember how much of the msgb remains to be written. So we would need to use ioctls to check how much space is left in the send buffer.
Even if you do the ioctl(SIOCOUTQ) / getsockopt(SO_SNDBUF), what would you do next, i.e. once you have decided not to write the msgb due to insufficient socket buffer space? If you then leave BSC_FD_WRITE set, your next select will return immediately, and you risk busy-looping until buffer space finally becomes available. If the remote end or the network link is stalled, this would be quite dangerous.
If you don't set BSC_FD_WRITE, you wouldn't notice when the buffer has sufficient space again. If you did periodic checks instead, the latency introduced by those checks would probably be counter-productive.
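For reference, the check discussed above could be done roughly as follows on Linux. This is a hedged sketch with a hypothetical helper name (`tx_space_estimate`). Linux reports SO_SNDBUF as about double the value set with setsockopt(), and for TCP the SIOCOUTQ count may include data that was sent but not yet acknowledged, so the result is a heuristic at best. It also does not answer the question of what to do when the estimate says there is no room.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>   /* SIOCOUTQ (Linux-specific) */

/* Rough estimate of free send-buffer space: SO_SNDBUF minus bytes still
   queued. Not a guarantee that a write of that size will succeed. */
static int tx_space_estimate(int fd, int *sndbuf, int *outq, int *space)
{
    socklen_t len = sizeof(*sndbuf);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, sndbuf, &len) < 0)
        return -1;
    if (ioctl(fd, SIOCOUTQ, outq) < 0)
        return -1;
    *space = *sndbuf > *outq ? *sndbuf - *outq : 0;
    return 0;
}
```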
So I think we will have to deal with partial writes from the IPA msgb sending code, just like the IPA msgb receiving code has to deal with partial reads.
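Handling partial writes mainly means remembering an offset into the message being sent. The sketch below assumes a non-blocking socket; `tx_conn`, `tx_msg`, `WANT_WRITE` (a stand-in for BSC_FD_WRITE) and `tx_write_cb` are hypothetical and not part of libosmo-abis. On a short write it keeps the offset and waits for the next writable event.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define WANT_WRITE 0x2   /* hypothetical stand-in for BSC_FD_WRITE */

struct tx_msg { const unsigned char *data; size_t len; };

struct tx_conn {
    int fd;
    unsigned int when;
    struct tx_msg *cur;  /* message currently being sent, or NULL */
    size_t off;          /* bytes of *cur already handed to the kernel */
};

/* Continue the current message from where the last (partial) write stopped.
   Returns 1 when it is fully written, 0 if more is pending, -1 on error. */
static int tx_write_cb(struct tx_conn *c)
{
    while (c->cur && c->off < c->cur->len) {
        ssize_t rc = write(c->fd, c->cur->data + c->off, c->cur->len - c->off);
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                c->when |= WANT_WRITE;  /* resume on the next writable event */
                return 0;
            }
            return -1;
        }
        c->off += (size_t)rc;
    }
    c->when &= ~WANT_WRITE;  /* a real caller would dequeue the next msgb here */
    c->cur = NULL;
    c->off = 0;
    return 1;
}
```

Since the kernel buffer is actually full when write() reports EAGAIN, select() should not report the fd writable again until space is freed, so keeping the flag set here does not busy-loop the way polling for free space would.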
> Do you have ideas or comments? Is it a non-issue? Is it something we can do better?
It probably depends on the use case. On expensive (e.g. satellite) links you probably want Nagle to combine multiple packets. In all other cases, disabling Nagle is probably a good place to start.
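Disabling Nagle is a single socket option. A minimal sketch, assuming `fd` is the connected IPA TCP socket (`set_nodelay` is a hypothetical helper name); for a satellite link one would simply not call it, or pass 0 to keep Nagle enabled.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* on = 1 disables Nagle's algorithm (TCP_NODELAY), on = 0 re-enables it. */
static int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```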
> Is there a TCP mode where a "write" either fully succeeds or fails with -ENOSPC or such?
Not to my knowledge. The proper answer would probably be to use DCCP, SCTP or any other protocol that can provide reliable delivery of packets (possibly with ordering guarantees), rather than abusing a stream protocol like TCP. This would of course break IPA compatibility, but it might be an interesting experiment to compare the performance of different L4 protocols against TCP for this use case.
Hi!
On Sun, Apr 05, 2015 at 03:46:16PM +0200, Harald Welte wrote:
> Hi Holger, Pablo, Ciaby,
> On Sat, Apr 04, 2015 at 09:05:20AM +0200, Holger Hans Peter Freyther wrote:
>> - Linux/TCP will run nagle to combine these messages
> We could easily disable Nagle if we want to avoid those latencies [see below].
>> I wonder if there is a better way. For reading we could read until -EWOULDBLOCK, but this might not be fair to the other parts of the software.
> In terms of msgb reading, I think a fixed upper limit might make sense, i.e. read up to [configurable] N messages in one BSC_FD_READ callback.
I agree; batching seems like the way to go.
[...]
>> Is there a TCP mode where a "write" either fully succeeds or fails with -ENOSPC or such?
> Not to my knowledge. The proper answer would probably be to use DCCP, SCTP or any other protocol that can provide reliable delivery of packets (possibly with ordering guarantees), rather than abusing a stream protocol like TCP. This would of course break IPA compatibility, but it might be an interesting experiment to compare the performance of different L4 protocols against TCP for this use case.
Looking at the current state of the art of the Linux stack, TCP is very well tuned, and there are more people working on it than on those other protocols.