osmo-bts-trx fails frequently on osmo-gsm-tester


Harald Welte laforge at gnumonks.org
Sun Jun 25 15:20:49 UTC 2017


Hi Neels,

On Sun, Jun 25, 2017 at 03:22:06PM +0200, Neels Hofmeyr wrote:
> On Fri, Jun 23, 2017 at 11:19:10AM +0200, Harald Welte wrote:
> > On Fri, Jun 23, 2017 at 04:51:07AM +0200, Neels Hofmeyr wrote:
> > > We're still having massive stability problems with osmo-bts-trx on the osmo-gsm-tester.
> > 
> > I'm sorry, but I have to ask for more specifics:
> > What exactly is a 'massive stability problem'?  How does it manifest
> 
> To quantify: between 30 and 40% of all osmo-gsm-tester runs fail because of:
> 
> 20170625121036320 DL1P <0007> l1sap.c:423 Invalid condition detected: Frame difference is > 1!

that's the higher-layer code complaining that the frame number as
reported by the lower layer code (osmo-bts-trx) has not incremented by
+1.  The normal expectation is that osmo-bts-* feeds every FN into
the common layer (via l1sap).
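To illustrate the kind of check behind that message, here is a minimal sketch of a wrap-aware frame number comparison. The function names are hypothetical and this is not the actual l1sap.c code; the only fact assumed is that GSM frame numbers wrap at the hyperframe length of 26 * 51 * 2048 frames.

```c
#include <stdint.h>

/* GSM hyperframe length: 26 * 51 * 2048 frames. FNs wrap at this value. */
#define GSM_HYPERFRAME 2715648u

/* Hypothetical helper: frames elapsed from old_fn to new_fn,
 * taking the hyperframe wrap-around into account. */
uint32_t fn_delta(uint32_t old_fn, uint32_t new_fn)
{
	return (new_fn + GSM_HYPERFRAME - old_fn) % GSM_HYPERFRAME;
}

/* Hypothetical helper: 1 if new_fn directly follows old_fn, else 0.
 * A result of 0 is the condition a higher layer would complain about. */
int fn_is_consecutive(uint32_t old_fn, uint32_t new_fn)
{
	return fn_delta(old_fn, new_fn) == 1;
}
```

With such a check, a step from the last FN of a hyperframe to FN 0 counts as consecutive, while any larger jump is flagged.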

> 20170625121036320 DL1C <0006> scheduler_trx.c:1527 GSM clock skew: old fn=2289942, new fn=2290004

That's 62 frames "missed", which is quite a lot (translating to roughly
286ms at about 4.615ms per frame).
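The conversion from missed frames to wall-clock time can be sketched as follows. The helper name is hypothetical; the only assumption is the nominal GSM TDMA frame duration of 120/26 ms (about 4.615 ms).

```c
#include <stdint.h>

/* Hypothetical helper: duration of a number of GSM TDMA frames in
 * microseconds. 26 frames last exactly 120 ms, so one frame lasts
 * 120000/26 us. The result is truncated to whole microseconds. */
uint64_t gsm_frames_to_usec(uint32_t frames)
{
	return ((uint64_t)frames * 120000u) / 26u;
}
```

For the skew in the log line above (new fn 2290004 minus old fn 2289942, i.e. 62 frames), this gives a little over 286 ms.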

> I can of course test things in case anyone has more ideas.

As indicated in the related ticket, I have submitted a patch to gerrit
that switches from gettimeofday() based osmo_timer_list to a monotonic
timerfd based interval timer for the FN clock inside osmo-bts-trx.  It
would be good if you can see to this being tested. I am travelling more
than I'm at home or at the office (i.e. no access to related equipment),
nor do I have insight into how we could test a non-master patch in the
osmo-gsm-tester setup.

There are more odd parts in osmo-bts-trx that I could imagine having an
impact on this, but we should take it step-by-step.  One problem is for
example that the UDP sockets for the TRX/BTS communication are not set
to non-blocking-mode, so a blocking write could mess a lot with timing.
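As an illustration of what setting a UDP socket to non-blocking mode involves, here is a minimal sketch using plain POSIX calls. The helper names are hypothetical and this is not taken from the osmo-bts-trx sources.

```c
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the fd's status flags while
 * preserving the flags already set. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: open an IPv4 UDP socket in non-blocking mode.
 * Returns the socket fd, or -1 on error. */
int open_nonblocking_udp(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (set_nonblocking(fd) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

On such a socket, a send() that cannot proceed immediately fails with EAGAIN or EWOULDBLOCK instead of stalling the caller, so the caller has to handle that case explicitly (for example by dropping or queueing the message).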

> Tom mentioned the idea of running osmo-bts-trx on a different machine from
> osmo-trx -- that is certainly possible in a manual test, but I guess not really
> an option for the regular tests. 

It is an option.  However, we need to understand what exactly is the
problem here.  Rather than adding additional hardware to the
osmo-gsm-tester setup in a "trial and error" aka "stumbling in the dark"
fashion, I would use the opposite approach:  Set up osmo-bts-trx on the
same hardware (APU) next to your laptop on your personal desk, and then
try to see if and when the above problems can be reproduced, maybe by
putting some more CPU load on the APU, or I/O load, or whatever..

If osmo-bts-trx is too unstable for the "production" osmo-gsm-tester, I
would simply disable it until we have addressed related bugs.

Regards,
	Harald

-- 
- Harald Welte <laforge at gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)


