osmo-bts-trx fails frequently on osmo-gsm-tester

Alexander Chemeris alexander.chemeris at gmail.com
Mon Jun 26 10:25:29 UTC 2017


Hi Neels,

Are you running osmo-trx in a single TRX or dual-TRX configuration?

Do you have CPU usage information from the system?

Could you try disabling all timeslots but the ones needed? It won't
completely disable them with the current code, but IIRC it will somewhat
help with the CPU load, which I think is the real issue here.
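
One way to collect the CPU usage figures asked about above is to sample the aggregate "cpu" line of /proc/stat twice and compare the counters. The following is only a minimal sketch, assuming a Linux host; the function names are hypothetical and not part of osmo-trx or osmo-gsm-tester.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Field order: user nice system idle iowait irq softirq steal.
    # The trailing guest fields are skipped because the kernel already
    # accounts for them inside user and nice.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    total = sum(values)
    return total - idle, total


def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two 'cpu' samples."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return (busy1 - busy0) / elapsed if elapsed > 0 else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat_file:
        return stat_file.readline()
```

Calling read_cpu_line() twice, a second or so apart while a test is running, and passing both lines to busy_fraction() gives the share of time the machine was busy in between. This is a system-wide figure, so it does not show which process is responsible.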

Please excuse typos. Written with a touchscreen keyboard.

--
Regards,
Alexander Chemeris
CTO/Founder Fairwaves, Inc.
https://fairwaves.co

On Jun 25, 2017 22:23, "Neels Hofmeyr" <nhofmeyr at sysmocom.de> wrote:

> On Fri, Jun 23, 2017 at 11:19:10AM +0200, Harald Welte wrote:
> > Hi Neels,
> >
> > On Fri, Jun 23, 2017 at 04:51:07AM +0200, Neels Hofmeyr wrote:
> > > We're still having massive stability problems with osmo-bts-trx on the
> > > osmo-gsm-tester.
> >
> > I'm sorry, but I have to ask for more specifics:
> > What exactly is a 'massive stability problem'?  How does it manifest
>
> To quantify: between 30 and 40% of all osmo-gsm-tester runs fail because
> of:
>
> 20170625121036320 DL1P <0007> l1sap.c:423 Invalid condition detected: Frame difference is > 1!
> 20170625121036320 DL1C <0006> scheduler_trx.c:1527 GSM clock skew: old fn=2289942, new fn=2290004
> 20170625121036320 DL1P <0007> l1sap.c:423 Invalid condition detected: Frame difference is > 1!
>
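
To put the frame numbers in the "GSM clock skew" line into perspective, the gap between the old and new fn can be converted to time. This is a minimal sketch that only does the arithmetic; the frame duration (120/26 ms) and the frame number wrap-around (2048 * 26 * 51) are the standard GSM values, not something stated in this thread, and the function names are made up for illustration.

```python
# Standard GSM TDMA timing constants (assumed, not taken from the thread).
GSM_HYPERFRAME = 2048 * 26 * 51   # frame numbers wrap at 2715648
GSM_FRAME_MS = 120.0 / 26.0       # one TDMA frame lasts about 4.615 ms


def frame_delta(old_fn, new_fn):
    """Number of frames from old_fn to new_fn, allowing for wrap-around."""
    return (new_fn - old_fn) % GSM_HYPERFRAME


def delta_ms(old_fn, new_fn):
    """The same gap expressed in milliseconds."""
    return frame_delta(old_fn, new_fn) * GSM_FRAME_MS


if __name__ == "__main__":
    old_fn, new_fn = 2289942, 2290004   # values from the log line above
    print(frame_delta(old_fn, new_fn), "frames")
    print(round(delta_ms(old_fn, new_fn), 1), "ms")
```

For the logged values this comes to 62 frames, or roughly 286 ms. The calculation says nothing about the cause of the jump.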
> Detailed logs in
> http://jenkins.osmocom.org/jenkins/view/osmo-gsm-tester/job/osmo-gsm-tester_run/940/artifact/trial-940-run.tgz
> in /run.2017-06-25_12-05-43/sms:trx/mo_mt_sms.py/osmo-bts-trx/osmo-bts-trx/stderr
>
> Related osmo-trx output is in the same tgz at
> /run.2017-06-25_12-05-43/sms:trx/mo_mt_sms.py/osmo-bts-trx/osmo-trx/stderr
>
> (Number crunching: if 30% of the test runs fail, where each run contains two
> osmo-bts-trx tests, it means that roughly 15% of osmo-bts-trx tests fail.)
>
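
The "roughly 15%" figure above halves the run failure rate. As a cross-check, here is a small sketch of the same estimate under two assumptions that are not stated in the thread: the two osmo-bts-trx tests in a run fail independently, and a run fails exactly when at least one of them fails. The function name is hypothetical.

```python
def per_test_failure_rate(run_failure_rate, tests_per_run):
    """Solve 1 - (1 - p) ** n = run_failure_rate for the per-test rate p."""
    return 1.0 - (1.0 - run_failure_rate) ** (1.0 / tests_per_run)


if __name__ == "__main__":
    for run_rate in (0.30, 0.40):   # the 30 to 40% range quoted above
        simple = run_rate / 2       # the halving used in the mail
        exact = per_test_failure_rate(run_rate, 2)
        print(f"run rate {run_rate:.0%}: halved {simple:.1%}, "
              f"independent-failure estimate {exact:.1%}")
```

Under these assumptions a 30% run failure rate corresponds to about 16% per test, so the halved figure is a reasonable approximation at this level.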
> (The reason why I say "massive": it's really annoying to have this rate of
> sporadic failure. Instead of investigating upon first failure, we will only
> notice a regression when runs fail consistently, i.e. when there are no
> successful runs for, say, 5 or more runs. We don't take action immediately,
> yet we have to be careful not to be too late and lose the jenkins run logs
> of the last successful run. The first failing runs in a series can well be
> just trx failures, so it needs more effort to find out which run introduced
> an actual regression.)
>
> > > I have run a tcpdump on the ntp port for the past days, and nothing is
> > > doing ntp besides the actual ntp service.
> >
> > And that service was presumably disabled (before your test described in
> > the next paragraph)?
>
> Yes, started the tcpdump filtering on the ntp port, saw ntp packets (to
> verify that it works), disabled the ntp service, saw that packets cease,
> restarted the tcpdump in a tmux, forgot about it for a couple of days, then
> came back to the tmux and saw that the tcpdump was completely empty. Then
> again I started the ntp service, immediately saw ntp packets in the tcpdump
> and the osmo-bts-trx test run failed promptly.
>
>
> Let me mention that I see myself as "the messenger", relaying the results
> I see on the tester setup; I will pursue a solution in a limited fashion,
> to not neglect other tasks.
>
> I can of course test things in case anyone has more ideas.
>
> Tom mentioned the idea of running osmo-bts-trx on a different machine from
> osmo-trx -- that is certainly possible in a manual test, but I guess not
> really an option for the regular tests. It would be a lot of manual
> supervision to perform a series of tests, like 20 or more, to find out the
> success rate; or a code and jenkins config change to run the osmo-bts-trx
> binary on a different build slave, not trivial. It would be much preferred
> to stay on a single host computer...
>
> ~N
>