Hi Neels,
On Sun, Jun 25, 2017 at 03:22:06PM +0200, Neels Hofmeyr wrote:
On Fri, Jun 23, 2017 at 11:19:10AM +0200, Harald Welte wrote:
On Fri, Jun 23, 2017 at 04:51:07AM +0200, Neels Hofmeyr wrote:
We're still having massive stability problems
with osmo-bts-trx on the osmo-gsm-tester.
I'm sorry, but I have to ask for more specifics:
What exactly is a 'massive stability problem'? How does it manifest?
To quantify: between 30 and 40% of all osmo-gsm-tester runs fail because of:
20170625121036320 DL1P <0007> l1sap.c:423 Invalid condition detected: Frame difference is > 1!
that's the higher-layer code complaining that the frame number as
reported by the lower-layer code (osmo-bts-trx) has not incremented by
+1. The normal expectation is that osmo-bts-* feeds every FN into
the common layer (via l1sap).
20170625121036320 DL1C <0006> scheduler_trx.c:1527 GSM clock skew: old fn=2289942, new fn=2290004
That's 62 frames "missed", which is quite a lot (about 286 ms at 4.615 ms
per TDMA frame).
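To make the arithmetic concrete, here is a small C sketch of the kind of
check described above (every FN must be the successor of the previous one),
plus the conversion of a frame gap into wall-clock time. It is not the actual
l1sap.c code; the function names are hypothetical. It assumes FNs wrap at the
GSM hyperframe (26 * 51 * 2048 = 2715648 frames) and that one TDMA frame
lasts 120/26 ms.

```c
#include <stdbool.h>
#include <stdint.h>

/* GSM frame numbers wrap around after one hyperframe: 26 * 51 * 2048 frames. */
#define GSM_HYPERFRAME 2715648u

/* Hypothetical helper: forward distance from old_fn to new_fn, modulo the
 * hyperframe. Both inputs are assumed to be < GSM_HYPERFRAME. */
static uint32_t fn_distance(uint32_t old_fn, uint32_t new_fn)
{
	return (new_fn + GSM_HYPERFRAME - old_fn) % GSM_HYPERFRAME;
}

/* The common layer expects every FN in sequence, i.e. a distance of exactly 1. */
static bool fn_is_contiguous(uint32_t old_fn, uint32_t new_fn)
{
	return fn_distance(old_fn, new_fn) == 1;
}

/* Convert a number of TDMA frames into milliseconds (one frame = 120/26 ms). */
static double frames_to_ms(uint32_t frames)
{
	return frames * 120.0 / 26.0;
}
```

For the clock skew in the log above, fn_distance(2289942, 2290004) is 62,
and frames_to_ms(62) is roughly 286.15 ms. The modulo keeps the check
correct across the hyperframe wrap (FN 2715647 followed by FN 0).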
I can of course test things in case anyone has more ideas.
As indicated in the related ticket, I have submitted a patch to gerrit
that switches from a gettimeofday() based osmo_timer_list to a monotonic
timerfd based interval timer for the FN clock inside osmo-bts-trx. It
would be good if you could make sure this gets tested. I am travelling
more than I am at home or at the office (i.e. without access to the
related equipment), and I have no insight into how we could test a
non-master patch in the osmo-gsm-tester setup.
There are more odd parts in osmo-bts-trx that I could imagine having an
impact on this, but we should take it step by step. One problem is, for
example, that the UDP sockets for the TRX/BTS communication are not set
to non-blocking mode, so a blocking write could mess a lot with timing.
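Switching a socket to non-blocking mode is a standard POSIX fcntl()
operation. A minimal sketch follows; the helper names are hypothetical and
this is not existing osmo-bts-trx code. With O_NONBLOCK set, a send that
cannot proceed immediately fails with EAGAIN/EWOULDBLOCK instead of stalling
the scheduler, so the caller has to decide whether to drop or queue the
message.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: put an existing descriptor into non-blocking mode.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0)
		return -1;
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;
	return 0;
}

/* Hypothetical helper: create an IPv4 UDP socket and make it non-blocking.
 * Returns the socket descriptor, or -1 on error. */
static int open_nonblocking_udp_socket(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	if (set_nonblocking(fd) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```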
Tom mentioned the idea of running osmo-bts-trx on a different machine from
osmo-trx -- that is certainly possible in a manual test, but I guess not really
an option for the regular tests.
It is an option. However, we need to understand what exactly the
problem is here. Rather than adding additional hardware to the
osmo-gsm-tester setup in a "trial and error" aka "stumbling in the dark"
fashion, I would use the opposite approach: set up osmo-bts-trx on the
same hardware (APU) next to your laptop on your personal desk, and then
try to see if and when the above problems can be reproduced, maybe by
putting some more CPU load on the APU, or I/O load, or whatever.
If osmo-bts-trx is too unstable for the "production" osmo-gsm-tester, I
would simply disable it until we have addressed the related bugs.
Regards,
Harald
--
- Harald Welte <laforge(a)gnumonks.org>
http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)