Hello Gullik,
If possible, please set up some kind of monitoring of key system parameters (load, memory,
etc.) so we can determine whether any irregularities occur before the issue.
It would also help us see whether the issue recurs at regular time intervals.
I use Zabbix for this on my servers, but I think it would be far too complicated for this
purpose.
So any simple monitoring system that retains historical data (for, let's say, the last 24
hours) would be good.
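As a rough illustration of what such a simple monitor could look like, here is a minimal Python sketch. It assumes a Linux /proc filesystem; the names History, sample_once and run are made up for this example, and the 10-second interval is an arbitrary choice.

```python
import time
from collections import deque

RETENTION_SECONDS = 24 * 60 * 60  # keep roughly the last 24 hours


def parse_loadavg(text):
    """Return the 1-minute load average from /proc/loadavg content."""
    return float(text.split()[0])


def parse_meminfo(text):
    """Return MemAvailable in kB from /proc/meminfo content, or None if absent."""
    for line in text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


class History:
    """In-memory samples, pruned to the retention window on every add."""

    def __init__(self, retention=RETENTION_SECONDS):
        self.retention = retention
        self.samples = deque()  # (timestamp, load1, mem_available_kb)

    def add(self, timestamp, load1, mem_kb):
        self.samples.append((timestamp, load1, mem_kb))
        while self.samples and self.samples[0][0] < timestamp - self.retention:
            self.samples.popleft()


def sample_once(history, now=None):
    """Read /proc (Linux only) and store one sample."""
    now = time.time() if now is None else now
    with open("/proc/loadavg") as f:
        load1 = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem_kb = parse_meminfo(f.read())
    history.add(now, load1, mem_kb)


def run(history, interval=10):
    """Sample forever, printing one line per sample so it can be redirected to a file."""
    while True:
        sample_once(history)
        ts, load1, mem_kb = history.samples[-1]
        stamp = time.strftime("%b %d %H:%M:%S", time.localtime(ts))
        print(f"{stamp} load1={load1} mem_available_kb={mem_kb}", flush=True)
        time.sleep(interval)
```

Calling run(History()) with the output redirected to a file gives timestamps that can be lined up against syslog. The debian-sa1 cron entries in the syslog below suggest sysstat may already be collecting similar data on this box, so sar might be worth checking first.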
Just my 2 cents...
Domi
On 22 Dec 2018, at 12:25, Gullik Webjorn
<gullik.webjorn(a)corevalue.se> wrote:
Hello Harald, long time.....
This is the syslog for that time...
Dec 22 03:15:01 localhost CRON[25895]: (root) CMD (command -v debian-sa1 > /dev/null
&& debian-sa1 1 1)
Dec 22 03:16:22 localhost osmo-bts-trx[18200]: #033[0;m<0007> l1sap.c:510
1338262/1009/16/22/46 Invalid condition detected: Frame difference is 1338262-1338205=57
> 1!
Dec 22 03:17:01 localhost CRON[25911]: (root) CMD ( cd / && run-parts --report
/etc/cron.hourly)
At this point trx uhd logged lots of lost packets at
03:19:36, i.e. 2 seconds before the next syslog entry. They all arrived within the same second
and filled the terminal buffer, so I cannot see what was logged immediately before.
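For scale, the frame difference of 57 reported in the 03:16:22 entry above can be converted to time. A GSM TDMA frame lasts 120/26 ms (about 4.615 ms); the helper name frame_gap_ms is made up for this sketch, and frame-number wrap-around is ignored.

```python
from fractions import Fraction

# A GSM TDMA frame lasts 120/26 ms (about 4.615 ms).
FRAME_MS = Fraction(120, 26)


def frame_gap_ms(fn_now, fn_prev):
    """Air time in milliseconds between two frame numbers (no wrap handling)."""
    return float((fn_now - fn_prev) * FRAME_MS)


# Values taken from the l1sap.c:510 log line: 1338262 - 1338205 = 57 frames.
print(round(frame_gap_ms(1338262, 1338205), 1))  # 263.1
```

So that one jump corresponds to roughly a quarter of a second of air time.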
Dec 22 03:19:38 localhost osmo-bts-trx[18200]: #033[0;m#033[1;36m<0001> bts.c:250
Shutting down BTS 0, Reason No clock from osmo-trx
Dec 22 03:19:41 localhost osmo-bts-trx[18200]: #033[0;mShutdown timer expired
Dec 22 03:19:41 localhost osmo-bts-trx[18200]: ((*))
Dec 22 03:19:41 localhost osmo-bts-trx[18200]: |
Dec 22 03:19:41 localhost osmo-bts-trx[18200]: / \ OsmoBTS
Dec 22 03:19:41 localhost systemd[1]: osmo-bts-trx.service: Main process exited,
code=exited, status=42/n/a
Dec 22 03:19:41 localhost systemd[1]: osmo-bts-trx.service: Unit entered failed state.
Dec 22 03:19:41 localhost systemd[1]: osmo-bts-trx.service: Failed with result
'exit-code'.
Dec 22 03:19:43 localhost systemd[1]: osmo-bts-trx.service: Service hold-off time over,
scheduling restart.
Dec 22 03:19:43 localhost systemd[1]: Stopped Osmocom osmo-bts for osmo-trx.
Dec 22 03:19:43 localhost systemd[1]: Started Osmocom osmo-bts for osmo-trx.
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: <0017> control_if.c:911 CTRL at
127.0.0.1 4238
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0010>
telnet_interface.c:104 Available via telnet 127.0.0.1 4241
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipaccess.c:882
enabling ipaccess BTS mode, OML connecting to 127.0.0.1:3002
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;33m<000b>
trx_if.c:754 Open transceiver for phy0.0
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipa.c:128
127.0.0.1:3002 connection done
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipaccess.c:705
received ID get from 1801/0/0
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:229
O&M Get Attributes [0], Manufacturer Dependent State is unsupported by TRX.
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[0] (150 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[1] (180 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[2] (180 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[3] (1680 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[4] (520 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[5] (165 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[6] (1680 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:1055
ADM state already was Unlocked
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipa.c:128
127.0.0.1:3003 connection done
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipaccess.c:705
received ID get from 1801/0/0
Dec 22 03:19:45 localhost osmo-bts-trx[25938]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 03:19:47 localhost osmo-bts-trx[25938]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 03:19:49 localhost osmo-bts-trx[25938]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
I just had the same stop again; here is a snippet from the trx terminal:
Sat Dec 22 10:21:16 2018 DMAIN <0000> Transceiver.cpp:1016 [tid=3007554640] reduced
latency: 3:9
Sat Dec 22 10:21:17 2018 DMAIN <0000> Transceiver.cpp:1039 [tid=3007521872]
ClockInterface: sending IND CLOCK 292365
Sat Dec 22 10:21:17 2018 DMAIN <0000> Transceiver.cpp:1005 [tid=3007554640] new
latency: 3:10 (underrun 3:292376 vs 1:292229)
Sat Dec 22 10:21:18 2018 DMAIN <0000> Transceiver.cpp:1039 [tid=3007521872]
ClockInterface: sending IND CLOCK 292581
Sat Dec 22 10:21:18 2018 DMAIN <0000> Transceiver.cpp:1016 [tid=3007554640] reduced
latency: 2:10
Sat Dec 22 10:21:19 2018 DMAIN <0000> Transceiver.cpp:1039 [tid=3007521872]
ClockInterface: sending IND CLOCK 292797
Sat Dec 22 10:21:19 2018 DMAIN <0000> Transceiver.cpp:1016 [tid=3007554640] reduced
latency: 1:10
Sat Dec 22 10:21:19 2018 DDEV <0002> UHDDevice.cpp:861 [tid=3007521872] No packet
received, implementation timed-out
Sat Dec 22 10:21:19 2018 DDEV <0002> UHDDevice.cpp:865 [tid=3007521872] UHD:
Receive timed out
Sat Dec 22 10:21:19 2018 DMAIN <0000> radioInterfaceResamp.cpp:178 [tid=3007521872]
Receive error 0
Sat Dec 22 10:21:19 2018 DMAIN <0000> Transceiver.cpp:908 [tid=3007521872] radio
Interface receive failed, requesting stop.
Sat Dec 22 10:21:19 2018 DMAIN <0000> Transceiver.cpp:1005 [tid=3007554640] new
latency: 1:11 (underrun 0:292830 vs 0:292819)
Sat Dec 22 10:21:19 2018 DDEV <0002> UHDDevice.cpp:861 [tid=3007521872] An internal
receive buffer has filled at 3113.55 sec.
Sat Dec 22 10:21:20 2018 DDEV <0002> UHDDevice.cpp:1485 [tid=3007521872] Skipping
buffer data: timestamp=1245558064 time_end=1245418309
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:1005 [tid=3007554640] new
latency: 1:12 (underrun 3:292843 vs 2:292840)
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:1005 [tid=3007554640] new
latency: 1:13 (underrun 3:292854 vs 4:292853)
Sat Dec 22 10:21:20 2018 DMAIN <0000> osmo-trx.cpp:435 [tid=3025046000] Shutting
down transceiver...
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:307 [tid=3025046000] Stopping
the transceiver
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:320 [tid=3025046000] Stopping
the device
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:333 [tid=3025046000]
Transceiver stopped
This is the syslog from right before this; I cut a few lines before the first osmo-bts-trx
log entry at 10:00:26.
It complains a few times within those few seconds, then logs again at 10:11:08 and 10:11:11,
and then stops at 10:21:21.
Dec 22 09:57:12 localhost dbus[779]: [system] Successfully activated service
'org.freedesktop.nm_dispatcher'
Dec 22 09:57:12 localhost systemd[1]: Started Network Manager Script Dispatcher
Service.
Dec 22 09:57:12 localhost nm-dispatcher: req:1 'dhcp4-change' [eth0]: new request
(2 scripts)
Dec 22 09:57:12 localhost nm-dispatcher: req:1 'dhcp4-change' [eth0]: start
running ordered scripts...
Dec 22 10:00:01 localhost CRON[29739]: (root) CMD
(/usr/lib/armbian/armbian-truncate-logs)
Dec 22 10:00:26 localhost osmo-bts-trx[25938]: #033[0;m#033[1;35m<0000> rsl.c:741
(bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) (ss=0) SDCCH Tx CHAN ACT ACK
Dec 22 10:00:26 localhost osmo-bts-trx[25938]: #033[0;m<0011> lapd_core.c:920 Store
content res. (dl=0xb681b498)
Dec 22 10:00:27 localhost osmo-bts-trx[25938]: #033[0;m<0011> lapd_core.c:1556 N(S)
sequence error: N(S)=0, V(R)=1 (dl=0xb681b498 state LAPD_STATE_MF_EST)
Dec 22 10:00:27 localhost osmo-bts-trx[25938]: #033[0;m<0011> lapd_core.c:1556 N(S)
sequence error: N(S)=0, V(R)=1 (dl=0xb681b498 state LAPD_STATE_MF_EST)
Dec 22 10:00:28 localhost osmo-bts-trx[25938]: #033[0;m<0011> lapd_core.c:1556 N(S)
sequence error: N(S)=0, V(R)=1 (dl=0xb681b498 state LAPD_STATE_MF_EST)
Dec 22 10:00:29 localhost osmo-bts-trx[25938]: #033[0;m#033[1;35m<0000> rsl.c:720
(bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) (ss=0) SDCCH Tx CHAN REL ACK
Dec 22 10:05:01 localhost CRON[29783]: (root) CMD (command -v debian-sa1 > /dev/null
&& debian-sa1 1 1)
Dec 22 10:11:08 localhost osmo-bts-trx[25938]: #033[0;m#033[1;35m<0000> rsl.c:741
(bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) (ss=0) SDCCH Tx CHAN ACT ACK
Dec 22 10:11:11 localhost osmo-bts-trx[25938]: #033[0;m#033[1;35m<0000> rsl.c:720
(bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) (ss=0) SDCCH Tx CHAN REL ACK
Dec 22 10:15:01 localhost CRON[29858]: (root) CMD
(/usr/lib/armbian/armbian-truncate-logs)
Dec 22 10:15:02 localhost CRON[29859]: (root) CMD (command -v debian-sa1 > /dev/null
&& debian-sa1 1 1)
Dec 22 10:17:01 localhost CRON[29877]: (root) CMD ( cd / && run-parts --report
/etc/cron.hourly)
Dec 22 10:21:21 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> bts.c:250
Shutting down BTS 0, Reason No clock from osmo-trx
Dec 22 10:21:24 localhost osmo-bts-trx[25938]: #033[0;mShutdown timer expired
Dec 22 10:21:24 localhost osmo-bts-trx[25938]: ((*))
Dec 22 10:21:24 localhost osmo-bts-trx[25938]: |
Dec 22 10:21:24 localhost osmo-bts-trx[25938]: / \ OsmoBTS
Dec 22 10:21:24 localhost systemd[1]: osmo-bts-trx.service: Main process exited,
code=exited, status=42/n/a
Dec 22 10:21:24 localhost systemd[1]: osmo-bts-trx.service: Unit entered failed state.
Dec 22 10:21:24 localhost systemd[1]: osmo-bts-trx.service: Failed with result
'exit-code'.
Dec 22 10:21:26 localhost systemd[1]: osmo-bts-trx.service: Service hold-off time over,
scheduling restart.
Dec 22 10:21:26 localhost systemd[1]: Stopped Osmocom osmo-bts for osmo-trx.
Dec 22 10:21:26 localhost systemd[1]: Started Osmocom osmo-bts for osmo-trx.
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: <0017> control_if.c:911 CTRL at
127.0.0.1 4238
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m<0010>
telnet_interface.c:104 Available via telnet 127.0.0.1 4241
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipaccess.c:882
enabling ipaccess BTS mode, OML connecting to 127.0.0.1:3002
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:754 Open transceiver for phy0.0
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipa.c:128
127.0.0.1:3002 connection done
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipaccess.c:705
received ID get from 1801/0/0
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:229
O&M Get Attributes [0], Manufacturer Dependent State is unsupported by TRX.
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[0] (150 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[1] (180 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[2] (180 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[3] (1680 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[4] (520 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[5] (165 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681
Ignoring T200[6] (1680 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:1055
ADM state already was Unlocked
Dec 22 10:21:27 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipa.c:128
127.0.0.1:3003 connection done
Dec 22 10:21:27 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipaccess.c:705
received ID get from 1801/0/0
Dec 22 10:21:28 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:30 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:32 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:34 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:36 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:38 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:40 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:42 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:44 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:46 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:48 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:50 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:52 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:54 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:56 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:58 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:00 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:02 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:04 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:06 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:08 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:10 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:11 localhost dhclient[5395]: DHCPREQUEST of 192.168.1.170 on eth0 to
192.168.1.1 port 67
Dec 22 10:22:11 localhost dhclient[5395]: DHCPACK of 192.168.1.170 from 192.168.1.1
Dec 22 10:22:11 localhost NetworkManager[789]: <info> [1545470531.2789] dhcp4
(eth0): address 192.168.1.170
Dec 22 10:22:11 localhost NetworkManager[789]: <info> [1545470531.2791] dhcp4
(eth0): plen 24 (255.255.255.0)
Dec 22 10:22:11 localhost NetworkManager[789]: <info> [1545470531.2792] dhcp4
(eth0): gateway 192.168.1.1
Dec 22 10:22:11 localhost NetworkManager[789]: <info> [1545470531.2793] dhcp4
(eth0): server identifier 192.168.1.1
Dec 22 10:22:11 localhost NetworkManager[789]: <info> [1545470531.2793] dhcp4
(eth0): lease time 3600
Dec 22 10:22:11 localhost NetworkManager[789]: <info> [1545470531.2794] dhcp4
(eth0): hostname 'orangepizero'
Dec 22 10:22:11 localhost NetworkManager[789]: <info> [1545470531.2795] dhcp4
(eth0): nameserver '192.168.1.1'
Dec 22 10:22:11 localhost dbus[779]: [system] Activating via systemd: service
name='org.freedesktop.nm_dispatcher'
unit='dbus-org.freedesktop.nm-dispatcher.service'
Dec 22 10:22:11 localhost NetworkManager[789]: <info> [1545470531.2796] dhcp4
(eth0): domain name 'lan'
Dec 22 10:22:11 localhost NetworkManager[789]: <info> [1545470531.2797] dhcp4
(eth0): state changed bound -> bound
Dec 22 10:22:11 localhost systemd[1]: Starting Network Manager Script Dispatcher
Service...
Dec 22 10:22:11 localhost dhclient[5395]: bound to 192.168.1.170 -- renewal in 1598
seconds.
Dec 22 10:22:11 localhost dbus[779]: [system] Successfully activated service
'org.freedesktop.nm_dispatcher'
Dec 22 10:22:11 localhost systemd[1]: Started Network Manager Script Dispatcher
Service.
Dec 22 10:22:11 localhost nm-dispatcher: req:1 'dhcp4-change' [eth0]: new request
(2 scripts)
Dec 22 10:22:11 localhost nm-dispatcher: req:1 'dhcp4-change' [eth0]: start
running ordered scripts...
Dec 22 10:22:12 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:14 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:16 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:18 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:20 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:22 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b>
trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Then, as I restart trx-uhd again (by keyboard up-arrow <cr>), the carrier comes back and the
phone registers within a few seconds.
I see nothing upsetting in syslog that could explain a CPU outage. Besides, this is a
4-core Orange Pi Zero. I do not know, however, what quirks
in Armbian Linux could lock out processing when the total load is 95% or so out of 400%
(i.e. roughly 75% idle).
If there is anything I could do to bring clarity, please suggest it.
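One thing that might help with the question of regularity is to pull the shutdown times out of syslog and look at the gaps between them. The following is only a sketch: it assumes one log entry per line (the entries in this mail are wrapped), an English locale for the month name, and a year supplied by hand, since syslog timestamps carry none. The function names are made up for this example.

```python
import re
from datetime import datetime

# Matches the syslog timestamp of any line reporting a BTS shutdown.
SHUTDOWN = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*Shutting down BTS")


def shutdown_times(lines, year=2018):
    """Return the timestamps of all 'Shutting down BTS' entries."""
    times = []
    for line in lines:
        m = SHUTDOWN.match(line)
        if m:
            times.append(datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S"))
    return times


def intervals(times):
    """Return the time differences between consecutive events."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# The two shutdown entries from this mail, joined back onto single lines.
SAMPLE = [
    "Dec 22 03:19:38 localhost osmo-bts-trx[18200]: #033[0;m#033[1;36m<0001> bts.c:250 "
    "Shutting down BTS 0, Reason No clock from osmo-trx",
    "Dec 22 10:21:21 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> bts.c:250 "
    "Shutting down BTS 0, Reason No clock from osmo-trx",
]

print([str(d) for d in intervals(shutdown_times(SAMPLE))])  # ['7:01:43']
```

Two events are of course too few to show any pattern; the same functions could be run over a few days of /var/log/syslog.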
Regards,
Gullik
On 2018-12-22 10:42, Harald Welte wrote:
On Sat, Dec 22, 2018 at 10:00:50AM +0100, Gullik
Webjorn wrote:
At 3 am the trx stopped again. This time it exited (itself) after logging
large amounts of packet loss,
The interesting question is: Was there some kind of cron job or other activity
running at 3am on that system, which could cause a system load high enough to
make the flow between B100, kernel USB stack, libusb, UHD and osmo-trx-uhd
interrupt?
I will investigate as best I can.
Something like this is likely the root cause of the problem.
Sure, osmo-trx could "plaster around" it by having a more elegant recovery
mechanism, but failing fast due to exit and letting osmo-trx-uhd respawn
(normally executed via systemd) isn't actually all too bad.
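When the transceiver is started by hand from a terminal, as in this thread, nothing respawns it after such an exit. The loop below is a minimal sketch of the fail-fast-and-respawn idea, not a replacement for systemd; the function name respawn is made up, and the 2-second default hold-off merely mirrors the gap between exit and restart seen in the osmo-bts-trx syslog entries.

```python
import subprocess
import sys
import time


def respawn(cmd, holdoff=2.0, max_runs=None):
    """Run cmd, restarting it each time it exits; return the exit codes seen.

    With max_runs=None the loop never ends, which is what a supervisor wants.
    """
    codes = []
    while max_runs is None or len(codes) < max_runs:
        if codes:
            time.sleep(holdoff)  # short pause before each restart
        codes.append(subprocess.call(cmd))
    return codes


# Demonstration with a child process that exits immediately with status 42.
demo = respawn([sys.executable, "-c", "raise SystemExit(42)"], holdoff=0, max_runs=2)
print(demo)  # [42, 42]
```

For real use, cmd would be the osmo-trx-uhd command line exactly as it is currently typed in the terminal.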
What's definitely a real problem that needs immediate fixing is if we
somehow get stuck with osmo-trx continuing to run, but failing to
transmit a valid signal without exit + respawn.
That HAS happened for me, with the screen filled with logging of packet loss,
pointing at line 1319 in
uhd_device::recv_async_msg()
>
> Regards,
> Harald