This is merely a historical archive of years 2008-2021, before the migration to mailman3.
A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/OpenBSC@lists.osmocom.org/.
Neels Hofmeyr nhofmeyr at sysmocom.de

On Sun, Jun 25, 2017 at 05:20:49PM +0200, Harald Welte wrote:
> Hi Neels,
>
> On Sun, Jun 25, 2017 at 03:22:06PM +0200, Neels Hofmeyr wrote:
> > On Fri, Jun 23, 2017 at 11:19:10AM +0200, Harald Welte wrote:
> > > On Fri, Jun 23, 2017 at 04:51:07AM +0200, Neels Hofmeyr wrote:
> > > > We're still having massive stability problems with osmo-bts-trx on the osmo-gsm-tester.
> > >
> > > I'm sorry, but I have to ask for more specifics:
> > > What exactly is a 'massive stability problem'? How does it manifest
> >
> > To quantify: between 30 and 40% of all osmo-gsm-tester runs fail because of:
> >
> > 20170625121036320 DL1P <0007> l1sap.c:423 Invalid condition detected: Frame difference is > 1!
>
> that's the higher-layer code complaining that the frame number as
> reported by the lower layer code (osmo-bts-trx) has not incremented by
> +1. The normal expectation is that osmo-bts-* feeds every FN into
> the common layer (via l1sap).
>
> > 20170625121036320 DL1C <0006> scheduler_trx.c:1527 GSM clock skew: old fn=2289942, new fn=2290004
>
> That's 62 frames "missed", which is quite a lot (translating to 285ms).
>
> > I can of course test things in case anyone has more ideas.
>
> As indicated in the related ticket, I have submitted a patch to gerrit
> that switches from gettimeofday() based osmo_timer_list to a monotonic
> timerfd based interval timer for the FN clock inside osmo-bts-trx. It
> would be good if you can see to this being tested.

I have put your trx patches on a branch and built a binary from it. Starting from
http://jenkins.osmocom.org/jenkins/view/osmo-gsm-tester/job/osmo-gsm-tester_run/976
the patches are being tested on the gsm-tester.
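For reference, the "62 frames" figure quoted above can be reproduced from the two frame numbers in the "GSM clock skew" log line. The following is only a minimal sketch: the helper names `fn_delta` and `frames_to_ms` are hypothetical and not taken from osmo-bts, and it assumes the standard GSM timing of 26 TDMA frames per 120 ms and a hyperframe of 2048 * 26 * 51 frames. With these assumptions the gap comes out at 62 frames, or roughly 286 ms, in line with the approximate value quoted.

```python
from fractions import Fraction

# Frame numbers wrap at the end of a GSM hyperframe (2048 * 26 * 51 frames).
GSM_HYPERFRAME = 2048 * 26 * 51

# A 26-frame multiframe lasts 120 ms, so one TDMA frame is 120/26 ms (~4.615 ms).
FRAME_MS = Fraction(120, 26)


def fn_delta(old_fn: int, new_fn: int) -> int:
    """Frames elapsed from old_fn to new_fn, allowing for hyperframe wrap-around."""
    return (new_fn - old_fn) % GSM_HYPERFRAME


def frames_to_ms(frames: int) -> float:
    """Duration of the given number of TDMA frames in milliseconds."""
    return float(frames * FRAME_MS)


if __name__ == "__main__":
    # Values taken from the "GSM clock skew" log line quoted above.
    delta = fn_delta(2289942, 2290004)
    print(f"{delta} frames, about {frames_to_ms(delta):.0f} ms")
```

The modulo keeps the difference meaningful when the frame number wraps from the last frame of a hyperframe back to 0.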
branch: osmo-bts:neels/trx_test
(it actually started from 973, which failed because the 'settsc' config is removed by one of the patches but was still in the osmo-bts-trx config file)

976 has a different failure in *one* of two trx tests:

20170626171713445 DOML <0001> oml.c:333 OC=CHANNEL INST=(00,00,07) AVAIL STATE Dependency -> OK
20170626171713445 DOML <0001> oml.c:340 OC=CHANNEL INST=(00,00,07) OPER STATE Disabled -> Enabled
20170626171713445 DOML <0001> oml.c:301 OC=CHANNEL INST=(00,00,07) Tx STATE CHG REP
20170626171713513 DL1C <0006> scheduler_trx.c:1704 We were 47 FN faster than TRX, compensating
20170626171713514 DL1C <0006> scheduler_trx.c:1704 We were 47 FN faster than TRX, compensating
20170626171713515 DL1C <0006> scheduler_trx.c:1704 We were 44 FN faster than TRX, compensating
20170626171713517 DL1C <0006> scheduler_trx.c:1704 We were 44 FN faster than TRX, compensating
20170626171713517 DL1C <0006> scheduler_trx.c:1704 We were 44 FN faster than TRX, compensating
20170626171713518 DL1C <0006> scheduler_trx.c:1704 We were 44 FN faster than TRX, compensating
20170626171713518 DL1C <0006> scheduler_trx.c:1704 We were 44 FN faster than TRX, compensating
20170626171713518 DL1C <0006> scheduler_trx.c:1704 We were 44 FN faster than TRX, compensating
20170626171713518 DL1C <0006> scheduler_trx.c:1704 We were 44 FN faster than TRX, compensating
20170626171713519 DL1C <0006> scheduler_trx.c:1704 We were 44 FN faster than TRX, compensating
20170626171713519 DL1C <0006> scheduler_trx.c:1704 We were 44 FN faster than TRX, compensating
20170626171713727 DL1C <0006> scheduler_trx.c:1600 PC clock skew: elapsed_us=614659, error_us=610044
20170626171713727 DOML <0001> bts.c:208 Shutting down BTS 0, Reason No clock from osmo-trx
[...]
Shutdown timer expired

The next run, 977, is successful. All following runs until now (982) are failing.
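As a side note on the "PC clock skew" line in the log above: the two figures differ by exactly 4615 us, which is what one would expect if the error is measured against one nominal TDMA frame period. The sketch below only illustrates that arithmetic. It assumes, inferred purely from the logged numbers, that error_us = elapsed_us - 4615 and that elapsed_us is the time since the previous FN timer tick; the function names are hypothetical and not taken from osmo-bts.

```python
# Nominal TDMA frame period in microseconds (120/26 ms rounded down).
# Assumption: inferred from the difference between the two logged values.
FRAME_DURATION_US = 4615


def clock_error_us(elapsed_us: int) -> int:
    """Deviation of the measured tick interval from one nominal frame period."""
    return elapsed_us - FRAME_DURATION_US


def frames_elapsed(elapsed_us: int) -> int:
    """Whole nominal frame periods that fit into the measured interval."""
    return elapsed_us // FRAME_DURATION_US


if __name__ == "__main__":
    # elapsed_us value taken from the "PC clock skew" log line above.
    elapsed = 614659
    print(f"error_us={clock_error_us(elapsed)}, frames={frames_elapsed(elapsed)}")
```

Under these assumptions, the logged interval of about 0.6 s corresponds to roughly 133 frame periods passing before the timer fired again.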
See http://jenkins.osmocom.org/jenkins/view/osmo-gsm-tester/job/osmo-gsm-tester_run/test_results_analyzer/
and click once on the (+) to expand one level of child nodes.

So at first glance it appears that the patches make things worse.

Starting from build #983, we are testing an osmo-bts-trx with *only* the
CLOCK_MONOTONIC patch applied. Notably, we have removed the settsc config option
from the osmo-bts-trx config, but then again settsc does not seem to have any
effect in the code.

> fashion, I would use the opposite approach: Set up osmo-bts-trx on the
> same hardware (APU) next to your laptop on your personal desk, and then
> try to see if and when the above problems can be reproduced, maybe by
> putting some more CPU load on the APU, or I/O load, or whatever..

Yes, maybe something Pau should take on?

> If osmo-bts-trx is too unstable for the "production" osmo-gsm-tester, I
> would simply disable it until we have addressed related bugs.

We'll see about disabling soon. We *did* catch a regression with it recently...

~N