On Fri, Jun 23, 2017 at 2:19 AM, Harald Welte <laforge(a)gnumonks.org> wrote:
Yes, I think it's a sign of very poor design if we cannot even sync the
local wall clock to an NTP or GPS reference. CLOCK_MONOTONIC_RAW should
be used on Linux for use cases like the one in osmo-bts-trx, which has
to schedule bursts at specific time intervals.
In fact, I think the entire TRX<->BTS interface is not all that good an
idea to begin with.
I agree that the L1<->L0 socket interface is quite unusual. The
historical reason for a distinct mid-PHY split was to create a licensing
shim layer between commercially licensed OpenBTS code and GPL-based GNU
Radio. I don't believe that there was ever a good technical reason in
terms of code or structure for the separation.
Currently, the only reason that the socket layer needs to exist is for
backwards compatibility with OpenBTS, and I'm not sure how much
support there is for that option now. Perhaps there are some fronthaul
/ C-RAN application benefits, but I'm not aware of that being a
popular use case for osmo-trx. So the justification for the existing
TRX<->BTS interface for use with osmo-bts is not very strong.
Besides that, I have no idea what could cause the clock skews, except
maybe that the CPU or the USB are not fast enough?
Where is the evidence of that?
* do we get underruns / overruns in reading/writing from/to the SDR?
** if this is not properly logged yet, we should make sure all such
instances are properly logged, and that we have a counter counting such
events since process start. The related counters could be printed when
a signal is sent to the process, or at periodic intervals (every 10 s?)
on stdout
We do not have overrun / underrun counters in osmo-trx, but I agree
that this is a good idea.
* do we see indications of packet loss between TRX and BTS?
** each UDP packet on the per-TRX data interface contains the frame
number and timeslot index in its header, so detecting missing frames is
easy, whether or not this is currently implemented.
Packet loss between TRX and BTS is definitely a concern, but I think it
is unlikely. The skew between OS time and device time is likely driven
by scheduling and transient delays in BTS burst processing and/or late
UDP arrival from TRX. In that case, a faster machine certainly helps.
Another test could be running BTS and TRX on separate machines to
isolate process scheduling for each application.
-TT