osmo-bts-trx: "No clock from osmo-trx"


Neels Hofmeyr

14 Jun 2017, 3:33 a.m.

Hi Tom and others,

In our testing setup, we have sporadic failures (~2 out of 10 times) with:

DOML <0001> bts.c:208 Shutting down BTS 0, Reason No clock from osmo-trx

What would be possible reasons for this failure, and how can we go about fixing it? Some more logging around it:

20170614032014399 DRSL <0000> rsl.c:2333 (bts=0,trx=0,ts=0,ss=2) Fwd RLL msg EST_IND from LAPDm to A-bis
20170614032018533 DL1C <0006> scheduler_trx.c:1451 PC clock skew: elapsed uS 4136730
20170614032018533 DOML <0001> bts.c:208 Shutting down BTS 0, Reason No clock from osmo-trx
20170614032018533 DL1C <0006> scheduler.c:240 Exit scheduler for trx=0
20170614032018533 DL1C <0006> scheduler.c:216 Init scheduler for trx=0
20170614032018533 DOML <0001> oml.c:280 OC=RADIO-CARRIER INST=(00,00,ff) AVAIL STATE OK -> Off line
[...]
Shutdown timer expired

(We're using an external 10MHz OCXO timing source)

It appears there are about four seconds of nothing from osmo-trx?
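
For reference, the gap can be read straight off the timestamps in the log excerpt above. A minimal Python sketch, assuming the 17-digit prefix is YYYYMMDDHHMMSS followed by milliseconds (the helper names are made up for illustration and are not part of any Osmocom tool):

```python
from datetime import datetime


def parse_log_timestamp(stamp: str) -> datetime:
    """Parse a 17-digit timestamp such as 20170614032014399.

    Assumption: the first 14 digits are date and time, the last 3 are milliseconds.
    """
    base = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    return base.replace(microsecond=int(stamp[14:17]) * 1000)


def gap_seconds(earlier: str, later: str) -> float:
    """Return the time between two log timestamps in seconds."""
    delta = parse_log_timestamp(later) - parse_log_timestamp(earlier)
    return delta.total_seconds()


# Timestamps copied from the excerpt: last RSL message, then the skew report.
last_rsl = "20170614032014399"
skew_report = "20170614032018533"
print(f"{gap_seconds(last_rsl, skew_report):.3f} s")
```

That works out to roughly 4.13 s between the last RSL message and the skew report, in line with the 4136730 uS that scheduler_trx.c reports as elapsed.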

Most curious is that the next run will be completely fine, until some time later we get this same failure.

We wait until osmo-trx logs

-- Transceiver active with 1 channel(s)

and then we launch osmo-bts-trx "immediately" or up to a second later. Would it help to give it more grace time?

Thanks!


Alexander Chemeris

15 Jun, 12:47 a.m.

Neels,

A reason could be that osmo-trx is losing connection with the SDR. Are you running this on bare metal or a VM?

USB-based SDRs like the USRP B2x0 have a hard time keeping the Tx/Rx alignment when there are any disturbances. So osmo-trx features a sophisticated algorithm to maintain this alignment for USB-based devices. Thomas has spent tremendous effort tuning it to perform well, but maybe there are edge cases which are not handled there yet. Let's wait for his comments.

(That's one of the primary reasons we use Ethernet in UmTRX, btw: it's much more robust to issues like this.)

Please excuse typos. Written with a touchscreen keyboard.

--
Regards,
Alexander Chemeris
CTO/Founder Fairwaves, Inc.
https://fairwaves.co


Tom Tsou

1:03 a.m.

On Wed, Jun 14, 2017 at 3:47 PM, Alexander Chemeris alexander.chemeris@gmail.com wrote:

...

USB-based SDRs like the USRP B2x0 have a hard time keeping the Tx/Rx alignment when there are any disturbances. So osmo-trx features a sophisticated algorithm to maintain this alignment for USB-based devices. Thomas has spent tremendous effort tuning it to perform well, but maybe there are edge cases which are not handled there yet. Let's wait for his comments.

The 'clock' issue is occurring between osmo-bts and osmo-trx and not between osmo-trx and the device. For the latter, irregular packet timing would appear as underruns, overflows, late packets, etc. - errors non-specific to GSM numerology.

There are timing considerations at startup because the device needs time to initialize. In the case of the B200 on first boot, the startup time is especially long because of the FPGA load. Running the uhd_usrp_probe utility will give an indication of the device initialization time. On top of that delay, osmo-trx could add another second for Tx/Rx synchronization purposes.

If clock skew is not occurring at startup, then process scheduling is probably related. If the flow of CLK IND stops entirely, as in the case when osmo-trx stops running, the message would be "No clock from osmo-trx". Clock skew could also occur because of variability in calling gettimeofday(), but I have not encountered that on any systems that I run.

-TT

Neels Hofmeyr

16 Jun, 8:20 p.m.

On Wed, Jun 14, 2017 at 04:03:44PM -0700, Tom Tsou wrote:

...

There are timing considerations at startup because the device needs time to initialize. In the case of the B200 on first boot, the startup time is especially long because of the FPGA load.

We are specifically waiting for the "Transceiver active" message on stdout, so that the FPGA load is done before we continue.

...

Running the uhd_usrp_probe utility will give an indication of the device initialization time. On top of that delay, osmo-trx could add another second for Tx/Rx synchronization purposes.

OK, I'll try adding a little headroom after receiving the "Transceiver active" message.
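
One way such a delay could be scripted is sketched below in Python. It is only an illustration: the function name, the 2-second default, and the injectable sleep are assumptions and are not taken from the actual test setup or from osmo-trx itself.

```python
import io
import time
from typing import Callable, Iterable

# Marker text as quoted earlier in this thread.
MARKER = "Transceiver active"


def wait_for_marker(lines: Iterable[str],
                    marker: str = MARKER,
                    grace_seconds: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Read lines until one contains the marker, then wait a grace period.

    Returns True if the marker was seen, False if the stream ended first.
    The 2-second default is an arbitrary starting point, not a measured value.
    """
    for line in lines:
        if marker in line:
            sleep(grace_seconds)  # extra headroom before starting the BTS
            return True
    return False


# Demo on canned output. In a real script, `lines` could be the stdout of a
# subprocess.Popen(..., stdout=subprocess.PIPE, text=True) running osmo-trx.
sample = io.StringIO("starting up\n-- Transceiver active with 1 channel(s)\n")
if wait_for_marker(sample, grace_seconds=0.1):
    print("marker seen, grace period elapsed; osmo-bts-trx could be launched now")
```

Keeping the grace period as a parameter makes it easy to try a few values and see whether the failure rate changes.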

One point may be that it's not a very powerful machine: an APU with an 800MHz dual core.

...

If clock skew is not occurring at startup, then process scheduling is probably related. If the flow of CLK IND stops entirely, as in the case when osmo-trx stops running, the message would be "No clock from osmo-trx". Clock skew could also occur because of variability in calling gettimeofday(), but I have not encountered that on any systems that I run.

With NTP switched off I have no idea why the system clock could jump around. I also looked in the root crontab and so on, maybe something is still calling ntpdate on that system...
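
A quick way to check for jumps without first hunting for the culprit is to compare the wall clock against a monotonic clock, which is not affected when the system time is stepped. A minimal Python sketch, assuming a simple polling approach; the function names and the 0.5 s threshold are hypothetical, and this is not how osmo-bts-trx itself measures skew:

```python
import time
from typing import Callable


def wall_clock_drift(interval: float,
                     wall: Callable[[], float] = time.time,
                     mono: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> float:
    """Return how far the wall clock moved relative to the monotonic clock.

    A value near zero means both clocks agree over the interval. A large
    positive or negative value suggests the system time was stepped.
    """
    wall_start, mono_start = wall(), mono()
    sleep(interval)
    wall_elapsed = wall() - wall_start
    mono_elapsed = mono() - mono_start
    return wall_elapsed - mono_elapsed


def clock_jumped(drift: float, threshold: float = 0.5) -> bool:
    """Flag drifts larger than the (arbitrarily chosen) threshold in seconds."""
    return abs(drift) > threshold


if __name__ == "__main__":
    drift = wall_clock_drift(0.2)
    print(f"drift over 0.2 s: {drift * 1e6:.0f} us, jumped: {clock_jumped(drift)}")
```

Run in a loop alongside the BTS, a check like this could show whether a "PC clock skew" log line coincides with the system time actually moving.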

Could also make sense to wipe that OS to be sure. A lot has happened on there in the past...

Will keep you posted.

Harald Welte

11:03 p.m.

Hi Neels and Tom,

On Fri, Jun 16, 2017 at 08:20:35PM +0200, Neels Hofmeyr wrote:

...

One point may be that it's not a very powerful machine: an APU with an 800MHz dual core.

That actually means an AMD Embedded G-Series T40E APU. We like to use passively cooled, low-end processors whenever possible.

...

If clock skew is not occurring at startup, then process scheduling is probably related. If the flow of CLK IND stops entirely, as in the case when osmo-trx stops running, the message would be "No clock from osmo-trx". Clock skew could also occur because of variability in calling gettimeofday(), but I have not encountered that on any systems that I run.

With NTP switched off I have no idea why the system clock could jump around. I also looked in the root crontab and so on, maybe something is still calling ntpdate on that system...

You can always run tcpdump on the NTP port (123) if you're worried about that.

--
- Harald Welte laforge@gnumonks.org http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)


openbsc@lists.osmocom.org
