osmo-bts-trx fails frequently on osmo-gsm-tester

Harald Welte laforge at gnumonks.org
Fri Jun 23 09:19:10 UTC 2017


Hi Neels,

On Fri, Jun 23, 2017 at 04:51:07AM +0200, Neels Hofmeyr wrote:
> We're still having massive stability problems with osmo-bts-trx on the osmo-gsm-tester.

I'm sorry, but I have to ask for more specifics:
What exactly is a 'massive stability problem'?  How does it manifest
itself in detail at the lowest possible interface (i.e. log output of
osmo-trx, osmo-bts-trx, ...)?

> I have run a tcpdump on the ntp port for the past days, and nothing is doing
> ntp besides the actual ntp service.

And that service was presumably disabled (before your test described in
the next paragraph)?

> Today I started ntp while an osmo-bts-trx run was active and what do you know,
> the osmo-bts-trx process exits immediately. I think this is bad, osmo-bts-trx
> shouldn't use wall clock time for precise timing needs.

Yes, I think it's a sign of very poor design if we cannot even sync the
local wall clock to an NTP or GPS reference.  CLOCK_MONOTONIC_RAW should
be used on Linux for use cases like the one in osmo-bts-trx, having to
schedule bursts at specific time intervals.
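
A minimal sketch of reading such a clock on Linux, assuming plain
clock_gettime() is used; the helper names mono_raw_ns() and elapsed_ns()
are hypothetical and not taken from osmo-bts-trx.  Unlike CLOCK_REALTIME,
this clock is not stepped when NTP adjusts the wall clock:

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC_RAW time in nanoseconds.
 * Returns 0 if the clock cannot be read. */
static uint64_t mono_raw_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: nanoseconds elapsed since an earlier reading. */
static uint64_t elapsed_ns(uint64_t start_ns)
{
	return mono_raw_ns() - start_ns;
}
```

Differences between two such readings are unaffected by starting or
stopping an NTP service, which is the property needed for burst scheduling.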

In fact, I think the entire TRX<->BTS interface is not all that good an
idea to begin with.

In OsmoTRX, we have the ADC/DAC sample clock that is driving
transmission of samples.  Normally, the entire PHY layer runs
synchronous to that, and it would drive the "clock" of L2 by means of
PH-RTS.ind, so the L2 knows whenever it wants to transmit something.

However, the OsmoTRX <-> osmo-bts-trx interface is not at the PHY<->L2
boundary, but it is at an inner boundary between the radio modem
(OsmoTRX) and the L1 (in osmo-bts-trx).  And those are two separate
processes, without any way to synchronously trigger some action based on
the ADC/DAC master sample clock.  As a result, osmo-bts-trx needs to
keep its own clock, based on whatever clock source is available in the
operating system / hardware, and make sure it sends bursts at the right
speed to OsmoTRX.  So OsmoTRX and osmo-bts-trx actually run
asynchronously, at something that is specified/designed to be a
synchronous interface in the GSM architecture.

But then, I guess we don't have the luxury of changing all of this, so
migrating to something like CLOCK_MONOTONIC_RAW or CLOCK_MONOTONIC
(instead of osmocom timers, using timer_create(CLOCK_MONOTONIC, ..))
sounds like a good idea, or even timerfd_create() which would integrate
with our select() loop.  The only problem is that those are about periodic
timers.  While we do want periodicity (once every burst period of
577us), the local clock of the Linux system is >= 1000 times less
accurate than the clock of the GSM transmitting hardware, i.e. we need
to adjust the expiration of our timer based on clock information
provided by osmo-trx.
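
One way to get an adjustable timer out of timerfd is to skip the periodic
mode and re-arm a one-shot absolute deadline for every burst, nudging each
deadline by a correction term.  The following is only a sketch under that
assumption: the function names and the corr_ns parameter are hypothetical,
and how the correction would be derived from the clock information sent by
osmo-trx is left open.  Note that timerfd_create() is used here with
CLOCK_MONOTONIC; to my knowledge it does not accept CLOCK_MONOTONIC_RAW.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* One GSM burst period: 15/26 ms, i.e. roughly 577 us. */
#define BURST_PERIOD_NS 576923LL

/* Advance an absolute deadline by one burst period plus a signed
 * correction in nanoseconds (hypothetical input derived from the
 * clock indications of the transceiver). */
static void next_deadline(struct timespec *t, int64_t corr_ns)
{
	int64_t ns = (int64_t)t->tv_nsec + BURST_PERIOD_NS + corr_ns;

	while (ns >= 1000000000LL) {
		ns -= 1000000000LL;
		t->tv_sec++;
	}
	while (ns < 0) {
		ns += 1000000000LL;
		t->tv_sec--;
	}
	t->tv_nsec = (long)ns;
}

/* Create a timerfd that can be added to a select() loop. */
static int burst_timer_create(void)
{
	return timerfd_create(CLOCK_MONOTONIC, 0);
}

/* Arm the timer as a one-shot at an absolute CLOCK_MONOTONIC time. */
static int burst_timer_arm(int fd, const struct timespec *deadline)
{
	struct itimerspec its = {
		.it_interval = { 0, 0 },	/* no automatic repetition */
		.it_value = *deadline,
	};

	return timerfd_settime(fd, TFD_TIMER_ABSTIME, &its, NULL);
}
```

When the descriptor becomes readable, the caller would read() the 8-byte
expiration count, send the burst, compute the next deadline and arm the
timer again.  Using absolute deadlines avoids accumulating the processing
delay of each iteration into the schedule.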

> Besides that, I have no idea what could cause the clock skews, except maybe
> that the CPU or the USB are not fast enough??

Where is the evidence of that?
* do we get underruns / overruns in reading/writing from/to the SDR?
** if this is not properly logged yet, we should make sure all such
   instances are properly logged, and that we have a counter that counts
   such events since the process start.  Printing of related counters
   could be done at time of sending a signal to the process, or in
   periodic intervals (every 10s?) on stdout
* do we see indications of packet loss between TRX and BTS?
** each UDP packet on the per-TRX data interface contains frame number and
   timeslot index in its header, so detecting missing frames is easy,
   whether or not this is currently already implemented.
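
The frame-number check from the last point could be sketched as follows.
The header layout used here (timeslot in the low three bits of the first
byte, followed by a 32-bit big-endian frame number) is an assumption for
illustration, as are all names; the code is not taken from osmo-bts-trx
or osmo-trx.  GSM frame numbers wrap at the hyperframe length of
26 * 51 * 2048 = 2715648.

```c
#include <stddef.h>
#include <stdint.h>

#define GSM_HYPERFRAME 2715648u	/* 26 * 51 * 2048 frames */

/* Hypothetical parsed burst header. */
struct burst_hdr {
	uint8_t tn;	/* timeslot index, 0..7 */
	uint32_t fn;	/* frame number, 0..GSM_HYPERFRAME-1 */
};

/* Parse the assumed 5-byte header; returns 0 on success, -1 if the
 * packet is too short or the frame number is out of range. */
static int parse_burst_hdr(const uint8_t *buf, size_t len, struct burst_hdr *h)
{
	if (len < 5)
		return -1;
	h->tn = buf[0] & 0x07;
	h->fn = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16) |
		((uint32_t)buf[3] << 8) | (uint32_t)buf[4];
	return h->fn < GSM_HYPERFRAME ? 0 : -1;
}

/* Number of frames skipped between two consecutive packets seen on the
 * same timeslot, taking the hyperframe wrap into account.  A repeated
 * frame number counts as zero missing frames. */
static uint32_t frames_missing(uint32_t prev_fn, uint32_t fn)
{
	uint32_t delta = (fn + GSM_HYPERFRAME - prev_fn) % GSM_HYPERFRAME;

	return delta ? delta - 1 : 0;
}
```

A receiver would keep the last frame number per timeslot and add the
result to a loss counter that is logged or printed periodically.  A
reordered packet shows up as a very large gap with this simple check, so
a real implementation would likely want to treat that case separately.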

Regards,
	Harald
-- 
- Harald Welte <laforge at gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)


