Hi Tom,
thanks for your input.
On Fri, Jun 23, 2017 at 11:19:53AM -0700, Tom Tsou wrote:
On Fri, Jun 23, 2017 at 2:19 AM, Harald Welte laforge@gnumonks.org wrote:
I agree that the L1<->L0 socket interface is quite unusual. The historical reason for a distinct mid-PHY split was to create a license shim layer between commercially licensed OpenBTS code and GPL-based GNU Radio. I don't believe that there was ever a good technical reason in terms of code or structure for the separation.
ah, I didn't know (or remember) there was actually any gnuradio dependency in the OpenBTS transceiver.
Currently, the only reason that the socket layer needs to exist is for backwards compatibility with OpenBTS, and I'm not sure how much support there is for that option now.
I also don't think there is much value in this. I'm not aware of anyone regularly testing that configuration, and I don't think there is value in supporting something that nobody is testing.
Perhaps there are some fronthaul / C-RAN application benefits, but I'm not aware of that being a popular use case for osmo-trx.
not yet, at least. Maybe once such systems become more deployed in regions where GSM is not phased out. But even in such cases, I would expect that actual I/Q baseband samples are required (e.g. in CPRI or OBSAI), and not unmodulated/demodulated symbols.
So the justification for the existing TRX<->BTS interface for use with osmo-bts is not very strong.
Agreed. On the other hand, changing it would be quite some amount of work, so we might just as well keep it.
where is evidence of that?
- do we get underruns / overruns in reading/writing from/to the SDR?
 ** if this is not properly logged yet, we should make sure all such instances are properly logged, and that we have a counter that counts such events since process start. Printing of the related counters could be done when a signal is sent to the process, or at periodic intervals (every 10s?) on stdout
We do not have overrun / underrun counters in osmo-trx, but I agree that this is a good idea.
I think they should definitely be added. It might also make sense to do some runtime evaluation of how long it typically takes us to process a burst in uplink and downlink, to get an idea about how much margin there is. I'm thinking of something like taking a timestamp when we read from the UDP socket and another when we go back to sleep (and the same in the inverse direction, from samples to UDP).
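To illustrate what I mean, a rough sketch in plain C (not existing osmo-trx code; the names trx_stats_*, the SIGUSR1 choice and all numbers are made up): simple under/overrun counters plus per-burst processing-time statistics, dumped to stdout when the process receives a signal. A periodic dump (every 10s?) could be driven by a timer in the same way.

/*
 * Rough sketch, plain C, not existing osmo-trx code: all names
 * (trx_stats, trx_stats_*) and the SIGUSR1 choice are made up.
 * Keeps under/overrun counters and per-burst processing-time
 * statistics; dumps them to stdout when SIGUSR1 arrives.
 */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static struct {
	uint64_t underruns;	/* SDR reported it ran out of TX samples */
	uint64_t overruns;	/* SDR reported we were too slow reading RX */
	uint64_t bursts;	/* bursts processed since process start */
	double proc_min_us;	/* per-burst processing time statistics */
	double proc_max_us;
	double proc_sum_us;
} trx_stats = { .proc_min_us = 1e9 };

static volatile sig_atomic_t dump_requested;

static void sigusr1_handler(int sig)
{
	(void)sig;
	dump_requested = 1;	/* async-signal-safe: only set a flag */
}

static void trx_stats_init(void)
{
	signal(SIGUSR1, sigusr1_handler);
}

static double now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

static void trx_stats_dump(void)
{
	printf("underruns=%llu overruns=%llu bursts=%llu "
	       "proc_us min/avg/max=%.1f/%.1f/%.1f\n",
	       (unsigned long long)trx_stats.underruns,
	       (unsigned long long)trx_stats.overruns,
	       (unsigned long long)trx_stats.bursts,
	       trx_stats.proc_min_us,
	       trx_stats.bursts ? trx_stats.proc_sum_us / trx_stats.bursts : 0.0,
	       trx_stats.proc_max_us);
}

/*
 * Called once per burst, right before going back to sleep; t_start_us
 * was taken (via now_us()) right after reading from the UDP socket.
 */
static void trx_stats_account_burst(double t_start_us)
{
	double d = now_us() - t_start_us;

	trx_stats.bursts++;
	trx_stats.proc_sum_us += d;
	if (d < trx_stats.proc_min_us)
		trx_stats.proc_min_us = d;
	if (d > trx_stats.proc_max_us)
		trx_stats.proc_max_us = d;

	if (dump_requested) {
		dump_requested = 0;
		trx_stats_dump();
	}
}

The same kind of accounting would of course apply on the osmo-bts-trx side for the inverse (burst-to-decoded-frame) direction.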
Packet loss between TRX-BTS is definitely a concern, but I think that is unlikely. The skew between OS time and device time is likely driven by scheduling and transient delays in BTS burst processing and/or late UDP arrival from TRX. In that case, a faster machine certainly helps.
The problem I have is that right now there is no clear indication of what's happening. If a given machine is unable to provide sufficient CPU to operate, we should fail gracefully with some explicit message to that effect. Looking into under/overruns on the SDR side, as well as keeping an eye on (and exporting/reporting) the "margin" in terms of how soon we finish our processing before the next burst period starts, would improve the situation here. This is true for OsmoTRX doing modulation/demodulation, but probably even more so for osmo-bts-trx doing convolutional decoding, etc.
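Continuing the sketch above (again, not real code), the "margin" check could be as simple as comparing the measured per-burst processing time against the 4.615 ms TDMA frame period and complaining explicitly (or eventually giving up) instead of degrading silently. The 10% warning threshold and the consecutive-overrun limit below are arbitrary example values, nothing we have agreed on:

/*
 * Continuation of the sketch above, not real code: compare the measured
 * per-burst processing time against the 4.615 ms TDMA frame period and
 * complain explicitly instead of degrading silently. The 10% warning
 * threshold and the consecutive-overrun limit are arbitrary examples.
 */
#include <stdio.h>
#include <stdlib.h>

#define TDMA_FRAME_US		4615.0	/* one GSM TDMA frame */
#define MARGIN_WARN_RATIO	0.10	/* warn below 10% headroom */
#define MAX_CONSEC_OVERRUNS	50	/* give up after this many in a row */

static unsigned int consec_overruns;

static void check_processing_margin(double proc_us)
{
	double margin_us = TDMA_FRAME_US - proc_us;

	if (margin_us < 0) {
		consec_overruns++;
		fprintf(stderr, "processing took %.0f us, %.0f us more than "
			"one TDMA frame: insufficient CPU on this machine?\n",
			proc_us, -margin_us);
		if (consec_overruns > MAX_CONSEC_OVERRUNS) {
			fprintf(stderr, "persistent overload, giving up\n");
			exit(1);	/* fail explicitly; systemd can re-spawn us */
		}
		return;
	}

	consec_overruns = 0;
	if (margin_us < TDMA_FRAME_US * MARGIN_WARN_RATIO)
		fprintf(stderr, "only %.0f us (%.0f%%) of margin left before "
			"the next burst period\n",
			margin_us, 100.0 * margin_us / TDMA_FRAME_US);
}

check_processing_margin() would simply be fed the same per-burst duration that trx_stats_account_burst() computes above.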
It might also be worthwhile to consider whether short, occasional drop-outs are acceptable, and whether we can recover from that in a more meaningful way - short of exiting the process and having it re-spawned by systemd.
Occasional individual missing samples shouldn't be that critical, I guess? Sure, they will increase BER when they happen, but beyond that?
And in terms of osmo-bts-trx missing received UDP burst data, it is basically FER.
In both cases, it might make sense to accept this at rare intervals and raise a related OML ALERT to the BSC.
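For the OML part, something along the following lines (again only a sketch: oml_tx_alert() is a placeholder name, not an actual osmo-bts function, and the window size and thresholds are arbitrary): count missing bursts over a window and raise at most one rate-limited ALERT towards the BSC once the loss rate crosses a threshold.

/*
 * Sketch of the "accept rare drop-outs + OML ALERT" idea, not real
 * osmo-bts code: oml_tx_alert() is a placeholder name, and the window
 * size / thresholds are arbitrary.
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define WINDOW_BURSTS		2000	/* roughly a few seconds of bursts */
#define MISSING_ALERT_THRESH	20	/* alert above ~1% missing */
#define ALERT_HOLDOFF_SEC	60	/* at most one alert per minute */

static unsigned int window_count, window_missing;
static time_t last_alert;

static void oml_tx_alert(const char *msg)
{
	/* placeholder: would send an OML failure event report to the BSC */
	fprintf(stderr, "OML ALERT: %s\n", msg);
}

static void account_ul_burst(bool missing)
{
	window_count++;
	if (missing)
		window_missing++;

	if (window_count < WINDOW_BURSTS)
		return;

	if (window_missing >= MISSING_ALERT_THRESH) {
		time_t now = time(NULL);

		if (now - last_alert >= ALERT_HOLDOFF_SEC) {
			char buf[128];

			snprintf(buf, sizeof(buf),
				 "%u of %u bursts missing from TRX",
				 window_missing, window_count);
			oml_tx_alert(buf);
			last_alert = now;
		}
	}
	window_count = 0;
	window_missing = 0;
}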
I'm likely going to look a bit more into the osmo-bts-trx side soon (beyond the CLOCK_MONOTONIC patches under review), but for OsmoTRX I have no current plans to do any of the above improvements myself.
Regards, Harald