 
Hi Holger, Pablo, Ciaby,
On Sat, Apr 04, 2015 at 09:05:20AM +0200, Holger Hans Peter Freyther wrote:
> - Linux/TCP will run Nagle to combine these messages
We could easily disable Nagle (TCP_NODELAY) if avoiding latency is what we want [see below].
> I wonder if there is a better way? For reading we could read until -EWOULDBLOCK, but then this might not be too fair to the other parts of the software.
In terms of msgb reading, I think a fixed upper limit might make sense, i.e. read up to [configurable] N messages in one BSC_FD_READ callback.
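To make the bounded-read idea concrete, here is a rough sketch. It is Python rather than the C we would actually write, purely to keep it short. BoundedReader, max_msgs and on_readable() are made-up names, and the framing (16-bit big-endian payload length followed by an 8-bit stream identifier) is my assumption of the IPA header layout, not copied from the code.

```python
import socket
import struct

# Assumed IPA framing: 16-bit big-endian payload length, 8-bit stream id.
IPA_HDR = struct.Struct("!HB")


class BoundedReader:
    """Reads at most max_msgs framed messages per 'readable' callback."""

    def __init__(self, sock, max_msgs=8):
        sock.setblocking(False)
        self.sock = sock
        self.max_msgs = max_msgs
        self.pending = bytearray()  # bytes of a partially received message

    def _recv_one(self):
        """Return (proto, payload), or None if the socket would block."""
        while True:
            if len(self.pending) < IPA_HDR.size:
                # Still collecting the header.
                want = IPA_HDR.size - len(self.pending)
            else:
                length, proto = IPA_HDR.unpack_from(self.pending)
                want = IPA_HDR.size + length - len(self.pending)
                if want == 0:
                    payload = bytes(self.pending[IPA_HDR.size:])
                    self.pending.clear()
                    return proto, payload
            try:
                chunk = self.sock.recv(want)
            except BlockingIOError:
                return None  # partial message stays in self.pending
            if not chunk:
                raise ConnectionError("peer closed the connection")
            self.pending += chunk

    def on_readable(self):
        """Stand-in for the BSC_FD_READ callback: stop after max_msgs."""
        msgs = []
        for _ in range(self.max_msgs):
            msg = self._recv_one()
            if msg is None:
                break
            msgs.append(msg)
        return msgs
```

Since select is level-triggered, whatever is left in the socket after N messages simply makes the fd readable again on the next iteration, so other fds get their turn in between and nothing is lost.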
> For writing we might end up in the situation where a write only partially succeeds and we need to remember how much of the msgb to write next... So we would need to do ioctls to check how much space is left in the send buffer
Even if you do the ioctl(SIOCOUTQ) / getsockopt(SO_SNDBUF), what would you do next, i.e. once you have determined not to write the msgb due to insufficient socket buffer space? If you leave BSC_FD_WRITE set, your next select will return immediately, and you can end up busy-looping until the buffer space finally becomes available. If the remote end / network link is stalled, this would be quite dangerous.
If you don't set BSC_FD_WRITE, then you won't notice when the buffer has sufficient space again. If you did periodic checks instead, the latency introduced by those checks is probably counter-productive.
So I think we will have to deal with partial writes from the IPA msgb sending code, just like the IPA msgb receiving code has to deal with partial reads.
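A rough sketch of what I mean by dealing with partial writes, again in Python for brevity rather than the C we would use. WriteQueue, enqueue(), want_write and on_writable() are made-up names; want_write stands in for "keep BSC_FD_WRITE set".

```python
import collections
import socket


class WriteQueue:
    """Queues whole messages and remembers how far the head was written."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.queue = collections.deque()  # complete messages still to send
        self.offset = 0                   # bytes of queue[0] already written

    @property
    def want_write(self):
        # Ask for write notifications only while something is queued.
        return bool(self.queue)

    def enqueue(self, msg):
        self.queue.append(bytes(msg))

    def on_writable(self):
        """Stand-in for the BSC_FD_WRITE callback."""
        while self.queue:
            head = self.queue[0]
            try:
                sent = self.sock.send(memoryview(head)[self.offset:])
            except BlockingIOError:
                return  # send buffer full, wait for the next notification
            self.offset += sent
            if self.offset < len(head):
                return  # partial write, resume from self.offset later
            self.queue.popleft()
            self.offset = 0
```

The write flag is only set while the queue is non-empty, and select only reports the fd writable once the kernel has buffer space again, so there is neither a busy-loop on a stalled link nor any need for periodic polling.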
> Do you have ideas? Comments? Is it a non-issue? Is it something we can do better?
It probably depends on the use case. On expensive (e.g. satellite) links you probably want Nagle to combine multiple packets. In all other cases, disabling Nagle is probably a good starting point.
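For reference, a minimal sketch of making this a per-link choice. It is Python only to keep it short, and set_nagle() is a made-up helper name; TCP_NODELAY itself is the standard socket option.

```python
import socket


def set_nagle(sock, enabled):
    """Enable or disable Nagle's algorithm on a TCP socket.

    TCP_NODELAY=1 disables Nagle (small writes are sent immediately),
    TCP_NODELAY=0 re-enables it (small writes may be coalesced).
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0 if enabled else 1)


def nagle_enabled(sock):
    """Return True if Nagle's algorithm is currently active on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) == 0
```

A satellite link would then call set_nagle(sock, True), and everything else set_nagle(sock, False), presumably driven by a config option.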
> Is there a TCP mode where a "write" either fully succeeds or fails with -ENOSPC or such?
Not to my knowledge. The proper answer would probably be to use SCTP or any other protocol that can provide reliable delivery of packets (possibly with ordering guarantees), and not abuse a stream protocol like TCP. DCCP also preserves packet boundaries, but it does not retransmit lost packets, so reliability would have to be added on top. This would of course break IPA compatibility... but it might be an interesting experiment to compare the performance of different L4 protocols against TCP for the given use case.