Neels Hofmeyr nhofmeyr at sysmocom.de

On Fri, Jun 23, 2017 at 11:19:10AM +0200, Harald Welte wrote:
> Hi Neels,
>
> On Fri, Jun 23, 2017 at 04:51:07AM +0200, Neels Hofmeyr wrote:
> > We're still having massive stability problems with osmo-bts-trx on the osmo-gsm-tester.
>
> I'm sorry, but I have to ask for more specifics:
> What exactly is a 'massive stability problem'? How does it manifest

To quantify: between 30 and 40% of all osmo-gsm-tester runs fail because of:

20170625121036320 DL1P <0007> l1sap.c:423 Invalid condition detected: Frame difference is > 1!
20170625121036320 DL1C <0006> scheduler_trx.c:1527 GSM clock skew: old fn=2289942, new fn=2290004
20170625121036320 DL1P <0007> l1sap.c:423 Invalid condition detected: Frame difference is > 1!

Detailed logs in
http://jenkins.osmocom.org/jenkins/view/osmo-gsm-tester/job/osmo-gsm-tester_run/940/artifact/trial-940-run.tgz
in /run.2017-06-25_12-05-43/sms:trx/mo_mt_sms.py/osmo-bts-trx/osmo-bts-trx/stderr

Related osmo-trx output is in the same tgz in
/run.2017-06-25_12-05-43/sms:trx/mo_mt_sms.py/osmo-bts-trx/osmo-trx/stderr

(Number crunching: if 30% of the test runs fail, where each run contains two
osmo-bts-trx tests, it means that roughly 15% of osmo-bts-trx tests fail.)

(The reason why I say "massive": it's really annoying to have this rate of
sporadic failure. Instead of investigating upon first failure, we will only
notice a regression when runs fail consistently, i.e. when there are no
successful runs for, say, 5 or more runs. We don't take action immediately, yet
we have to be careful not to be too late and lose the jenkins run logs of the
last successful run. The first failing runs in a series can well be just trx
failures, so it needs more effort to find out which run introduced an actual
regression.)

> > I have run a tcpdump on the ntp port for the past days, and nothing is doing
> > ntp besides the actual ntp service.
>
> And that service was presumably disabled (before your test described in
> the next paragraph)?

Yes, started the tcpdump filtering on the ntp port, saw ntp packets (to verify
that it works), disabled the ntp service, saw that packets cease, restarted the
tcpdump in a tmux, forgot about it for a couple of days, then came back to the
tmux and saw that the tcpdump was completely empty. Then again I started the
ntp service, immediately saw ntp packets in the tcpdump and the osmo-bts-trx
test run failed promptly.

Let me mention that I see myself as "the messenger", relaying the results I see
on the tester setup; I will pursue a solution in a limited fashion, to not
neglect other tasks. I can of course test things in case anyone has more ideas.

Tom mentioned the idea of running osmo-bts-trx on a different machine from
osmo-trx -- that is certainly possible in a manual test, but I guess not really
an option for the regular tests. It would be a lot of manual supervision to
perform a series of tests, like 20 or more, to find out the success rate; or a
code and jenkins config change to run the osmo-bts-trx binary on a different
build slave, not trivial. It would be much preferred to stay on a single host
computer...

~N
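
For reference, the numbers quoted in this mail can be reproduced with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, the skew calculation relies on the standard GSM TDMA frame duration of 120/26 ms, and the per-test failure rate assumes that the two osmo-bts-trx tests in a run fail independently and with equal probability, which the logs do not confirm.

```python
GSM_FRAME_MS = 120 / 26      # one GSM TDMA frame lasts 120/26 ms (about 4.615 ms)
GSM_HYPERFRAME = 2715648     # GSM frame numbers wrap around at this value


def fn_skew(old_fn, new_fn):
    """Frames elapsed from old_fn to new_fn, allowing for frame number wraparound."""
    return (new_fn - old_fn) % GSM_HYPERFRAME


def per_test_failure_rate(run_failure_rate, tests_per_run=2):
    """Per-test failure probability if a run fails when any of its tests fails.

    Assumption (hypothetical): tests fail independently with equal probability.
    """
    return 1 - (1 - run_failure_rate) ** (1 / tests_per_run)


# Frame numbers taken from the "GSM clock skew" log line quoted above.
skew = fn_skew(2289942, 2290004)
print(f"skew: {skew} frames, about {skew * GSM_FRAME_MS:.0f} ms")

# Run failure rates of 30% and 40% as reported for the osmo-gsm-tester.
for rate in (0.30, 0.40):
    print(f"run failure {rate:.0%} -> per-test failure {per_test_failure_rate(rate):.1%}")
```

Under these assumptions the logged jump is 62 frames, or roughly 286 ms, where l1sap.c complains about any difference greater than 1. The per-test failure rate comes out slightly above the "roughly 15%" estimate, since halving the run failure rate is the small-probability approximation of the same formula.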