Hello Gullik,

If possible please set up some kind of monitoring of key system parameters (load, memory etc.) so we can determine if there is any irregularities happening before the issue.
Also it would help us to see if there is any regularity in the time intervals the issue happens.
I for my servers use Zabbix for this, but it would be way too complicated I think for this purpose.
So any simple monitoring system that retains historic data (for the last, let’s say 24 hours) would be good.

Just my 2cents...

Domi


2018. dec. 22. dátummal, 12:25 időpontban Gullik Webjorn <gullik.webjorn@corevalue.se> írta:

Hello Harald, long time.....

This is syslog for the time...

Dec 22 03:15:01 localhost CRON[25895]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 22 03:16:22 localhost osmo-bts-trx[18200]: #033[0;m<0007> l1sap.c:510 1338262/1009/16/22/46 Invalid condition detected: Frame difference is 1338262-1338205=57 > 1!
Dec 22 03:17:01 localhost CRON[25911]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)

here trx uhd has logged lots of lost packets

03:36:19, so 2 seconds before next syslog entry, all within the same second, filling the terminal buffer, so I cannot see loggings immediately before


Dec 22 03:19:38 localhost osmo-bts-trx[18200]: #033[0;m#033[1;36m<0001> bts.c:250 Shutting down BTS 0, Reason No clock from osmo-trx

Dec 22 03:19:41 localhost osmo-bts-trx[18200]: #033[0;mShutdown timer expired
Dec 22 03:19:41 localhost osmo-bts-trx[18200]: ((*))
Dec 22 03:19:41 localhost osmo-bts-trx[18200]:   |
Dec 22 03:19:41 localhost osmo-bts-trx[18200]:  / \ OsmoBTS
Dec 22 03:19:41 localhost systemd[1]: osmo-bts-trx.service: Main process exited, code=exited, status=42/n/a
Dec 22 03:19:41 localhost systemd[1]: osmo-bts-trx.service: Unit entered failed state.
Dec 22 03:19:41 localhost systemd[1]: osmo-bts-trx.service: Failed with result 'exit-code'.
Dec 22 03:19:43 localhost systemd[1]: osmo-bts-trx.service: Service hold-off time over, scheduling restart.
Dec 22 03:19:43 localhost systemd[1]: Stopped Osmocom osmo-bts for osmo-trx.
Dec 22 03:19:43 localhost systemd[1]: Started Osmocom osmo-bts for osmo-trx.
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: <0017> control_if.c:911 CTRL at 127.0.0.1 4238
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0010> telnet_interface.c:104 Available via telnet 127.0.0.1 4241
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipaccess.c:882 enabling ipaccess BTS mode, OML connecting to 127.0.0.1:3002
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;33m<000b> trx_if.c:754 Open transceiver for phy0.0
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipa.c:128 127.0.0.1:3002 connection done
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipaccess.c:705 received ID get from 1801/0/0
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:229 O&M Get Attributes [0], Manufacturer Dependent State is unsupported by TRX.
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[0] (150 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[1] (180 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[2] (180 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[3] (1680 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[4] (520 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[5] (165 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[6] (1680 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> oml.c:1055 ADM state already was Unlocked
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipa.c:128 127.0.0.1:3003 connection done
Dec 22 03:19:43 localhost osmo-bts-trx[25938]: #033[0;m<0012> input/ipaccess.c:705 received ID get from 1801/0/0
Dec 22 03:19:45 localhost osmo-bts-trx[25938]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 03:19:47 localhost osmo-bts-trx[25938]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 03:19:49 localhost osmo-bts-trx[25938]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)


I just had the same stopping again, here is a snippet from that trx terminal.........

Sat Dec 22 10:21:16 2018 DMAIN <0000> Transceiver.cpp:1016 [tid=3007554640] reduced latency: 3:9
Sat Dec 22 10:21:17 2018 DMAIN <0000> Transceiver.cpp:1039 [tid=3007521872] ClockInterface: sending IND CLOCK 292365
Sat Dec 22 10:21:17 2018 DMAIN <0000> Transceiver.cpp:1005 [tid=3007554640] new latency: 3:10 (underrun 3:292376 vs 1:292229)
Sat Dec 22 10:21:18 2018 DMAIN <0000> Transceiver.cpp:1039 [tid=3007521872] ClockInterface: sending IND CLOCK 292581
Sat Dec 22 10:21:18 2018 DMAIN <0000> Transceiver.cpp:1016 [tid=3007554640] reduced latency: 2:10
Sat Dec 22 10:21:19 2018 DMAIN <0000> Transceiver.cpp:1039 [tid=3007521872] ClockInterface: sending IND CLOCK 292797
Sat Dec 22 10:21:19 2018 DMAIN <0000> Transceiver.cpp:1016 [tid=3007554640] reduced latency: 1:10
Sat Dec 22 10:21:19 2018 DDEV <0002> UHDDevice.cpp:861 [tid=3007521872] No packet received, implementation timed-out
Sat Dec 22 10:21:19 2018 DDEV <0002> UHDDevice.cpp:865 [tid=3007521872] UHD: Receive timed out
Sat Dec 22 10:21:19 2018 DMAIN <0000> radioInterfaceResamp.cpp:178 [tid=3007521872] Receive error 0
Sat Dec 22 10:21:19 2018 DMAIN <0000> Transceiver.cpp:908 [tid=3007521872] radio Interface receive failed, requesting stop.
Sat Dec 22 10:21:19 2018 DMAIN <0000> Transceiver.cpp:1005 [tid=3007554640] new latency: 1:11 (underrun 0:292830 vs 0:292819)
Sat Dec 22 10:21:19 2018 DDEV <0002> UHDDevice.cpp:861 [tid=3007521872] An internal receive buffer has filled at 3113.55 sec.
Sat Dec 22 10:21:20 2018 DDEV <0002> UHDDevice.cpp:1485 [tid=3007521872] Skipping buffer data: timestamp=1245558064 time_end=1245418309
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:1005 [tid=3007554640] new latency: 1:12 (underrun 3:292843 vs 2:292840)
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:1005 [tid=3007554640] new latency: 1:13 (underrun 3:292854 vs 4:292853)
Sat Dec 22 10:21:20 2018 DMAIN <0000> osmo-trx.cpp:435 [tid=3025046000] Shutting down transceiver...
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:307 [tid=3025046000] Stopping the transceiver
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:320 [tid=3025046000] Stopping the device
Sat Dec 22 10:21:20 2018 DMAIN <0000> Transceiver.cpp:333 [tid=3025046000] Transceiver stopped

This is syslog from right before this, I cut a few lines before the first osmo-bts-trx logging at 10:00:26

It complains a few times within that few seconds....then at 10:11:08 , 11, then stopping at 10:21:21

Dec 22 09:57:12 localhost dbus[779]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Dec 22 09:57:12 localhost systemd[1]: Started Network Manager Script Dispatcher Service.
Dec 22 09:57:12 localhost nm-dispatcher: req:1 'dhcp4-change' [eth0]: new request (2 scripts)
Dec 22 09:57:12 localhost nm-dispatcher: req:1 'dhcp4-change' [eth0]: start running ordered scripts...
Dec 22 10:00:01 localhost CRON[29739]: (root) CMD (/usr/lib/armbian/armbian-truncate-logs)
Dec 22 10:00:26 localhost osmo-bts-trx[25938]: #033[0;m#033[1;35m<0000> rsl.c:741 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) (ss=0) SDCCH Tx CHAN ACT ACK
Dec 22 10:00:26 localhost osmo-bts-trx[25938]: #033[0;m<0011> lapd_core.c:920 Store content res. (dl=0xb681b498)
Dec 22 10:00:27 localhost osmo-bts-trx[25938]: #033[0;m<0011> lapd_core.c:1556 N(S) sequence error: N(S)=0, V(R)=1 (dl=0xb681b498 state LAPD_STATE_MF_EST)
Dec 22 10:00:27 localhost osmo-bts-trx[25938]: #033[0;m<0011> lapd_core.c:1556 N(S) sequence error: N(S)=0, V(R)=1 (dl=0xb681b498 state LAPD_STATE_MF_EST)
Dec 22 10:00:28 localhost osmo-bts-trx[25938]: #033[0;m<0011> lapd_core.c:1556 N(S) sequence error: N(S)=0, V(R)=1 (dl=0xb681b498 state LAPD_STATE_MF_EST)
Dec 22 10:00:29 localhost osmo-bts-trx[25938]: #033[0;m#033[1;35m<0000> rsl.c:720 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) (ss=0) SDCCH Tx CHAN REL ACK
Dec 22 10:05:01 localhost CRON[29783]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 22 10:11:08 localhost osmo-bts-trx[25938]: #033[0;m#033[1;35m<0000> rsl.c:741 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) (ss=0) SDCCH Tx CHAN ACT ACK
Dec 22 10:11:11 localhost osmo-bts-trx[25938]: #033[0;m#033[1;35m<0000> rsl.c:720 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4) (ss=0) SDCCH Tx CHAN REL ACK
Dec 22 10:15:01 localhost CRON[29858]: (root) CMD (/usr/lib/armbian/armbian-truncate-logs)
Dec 22 10:15:02 localhost CRON[29859]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 22 10:17:01 localhost CRON[29877]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Dec 22 10:21:21 localhost osmo-bts-trx[25938]: #033[0;m#033[1;36m<0001> bts.c:250 Shutting down BTS 0, Reason No clock from osmo-trx
Dec 22 10:21:24 localhost osmo-bts-trx[25938]: #033[0;mShutdown timer expired
Dec 22 10:21:24 localhost osmo-bts-trx[25938]: ((*))
Dec 22 10:21:24 localhost osmo-bts-trx[25938]:   |
Dec 22 10:21:24 localhost osmo-bts-trx[25938]:  / \ OsmoBTS
Dec 22 10:21:24 localhost systemd[1]: osmo-bts-trx.service: Main process exited, code=exited, status=42/n/a
Dec 22 10:21:24 localhost systemd[1]: osmo-bts-trx.service: Unit entered failed state.
Dec 22 10:21:24 localhost systemd[1]: osmo-bts-trx.service: Failed with result 'exit-code'.
Dec 22 10:21:26 localhost systemd[1]: osmo-bts-trx.service: Service hold-off time over, scheduling restart.
Dec 22 10:21:26 localhost systemd[1]: Stopped Osmocom osmo-bts for osmo-trx.
Dec 22 10:21:26 localhost systemd[1]: Started Osmocom osmo-bts for osmo-trx.
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: <0017> control_if.c:911 CTRL at 127.0.0.1 4238
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m<0010> telnet_interface.c:104 Available via telnet 127.0.0.1 4241
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipaccess.c:882 enabling ipaccess BTS mode, OML connecting to 127.0.0.1:3002
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:754 Open transceiver for phy0.0
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipa.c:128 127.0.0.1:3002 connection done
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipaccess.c:705 received ID get from 1801/0/0
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:229 O&M Get Attributes [0], Manufacturer Dependent State is unsupported by TRX.
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[0] (150 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[1] (180 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[2] (180 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[3] (1680 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[4] (520 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[5] (165 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:681 Ignoring T200[6] (1680 ms) as sent by BSC due to suspected LAPDm bug!
Dec 22 10:21:26 localhost osmo-bts-trx[29913]: #033[0;m#033[1;36m<0001> oml.c:1055 ADM state already was Unlocked
Dec 22 10:21:27 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipa.c:128 127.0.0.1:3003 connection done
Dec 22 10:21:27 localhost osmo-bts-trx[29913]: #033[0;m<0012> input/ipaccess.c:705 received ID get from 1801/0/0
Dec 22 10:21:28 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:30 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:32 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:34 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:36 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:38 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:40 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:42 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:44 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:46 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:48 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:50 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:52 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:54 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:56 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:21:58 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:00 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:02 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:04 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:06 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:08 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:10 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:11 localhost dhclient[5395]: DHCPREQUEST of 192.168.1.170 on eth0 to 192.168.1.1 port 67
Dec 22 10:22:11 localhost dhclient[5395]: DHCPACK of 192.168.1.170 from 192.168.1.1
Dec 22 10:22:11 localhost NetworkManager[789]: <info>  [1545470531.2789] dhcp4 (eth0):   address 192.168.1.170
Dec 22 10:22:11 localhost NetworkManager[789]: <info>  [1545470531.2791] dhcp4 (eth0):   plen 24 (255.255.255.0)
Dec 22 10:22:11 localhost NetworkManager[789]: <info>  [1545470531.2792] dhcp4 (eth0):   gateway 192.168.1.1
Dec 22 10:22:11 localhost NetworkManager[789]: <info>  [1545470531.2793] dhcp4 (eth0):   server identifier 192.168.1.1
Dec 22 10:22:11 localhost NetworkManager[789]: <info>  [1545470531.2793] dhcp4 (eth0):   lease time 3600
Dec 22 10:22:11 localhost NetworkManager[789]: <info>  [1545470531.2794] dhcp4 (eth0):   hostname 'orangepizero'
Dec 22 10:22:11 localhost NetworkManager[789]: <info>  [1545470531.2795] dhcp4 (eth0):   nameserver '192.168.1.1'
Dec 22 10:22:11 localhost dbus[779]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Dec 22 10:22:11 localhost NetworkManager[789]: <info>  [1545470531.2796] dhcp4 (eth0):   domain name 'lan'
Dec 22 10:22:11 localhost NetworkManager[789]: <info>  [1545470531.2797] dhcp4 (eth0): state changed bound -> bound
Dec 22 10:22:11 localhost systemd[1]: Starting Network Manager Script Dispatcher Service...
Dec 22 10:22:11 localhost dhclient[5395]: bound to 192.168.1.170 -- renewal in 1598 seconds.
Dec 22 10:22:11 localhost dbus[779]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Dec 22 10:22:11 localhost systemd[1]: Started Network Manager Script Dispatcher Service.
Dec 22 10:22:11 localhost nm-dispatcher: req:1 'dhcp4-change' [eth0]: new request (2 scripts)
Dec 22 10:22:11 localhost nm-dispatcher: req:1 'dhcp4-change' [eth0]: start running ordered scripts...
Dec 22 10:22:12 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:14 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:16 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:18 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:20 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)
Dec 22 10:22:22 localhost osmo-bts-trx[29913]: #033[0;m#033[1;33m<000b> trx_if.c:178 No satisfactory response from transceiver for phy0.0 (CMD POWEROFF)

Then, as I restart trx-uhd again, ( by kbd uparrow <cr> ) carrier comes back and phone registers within a few seconds.....


I see nothing upsetting in syslog that could explain cpu outage, besides, this is a 4-core Orange Pi Zero, I do not however know what quirks

in Armbian Linux could lock out processing, when total load is 95% or so out of 400% ( i.e. 75% idle )

If there is anything I could do to bring clarity, please suggest....

Regards,

Gullik


On 2018-12-22 10:42, Harald Welte wrote:
On Sat, Dec 22, 2018 at 10:00:50AM +0100, Gullik Webjorn wrote:
At 3 am the trx stopped again. This time it exited (itself) after logging
large amounts of packet loss,
The interesting question is: Was there some kind of cron job or other activity
running at 3am on that system, which could cause a system load high enough to
make the flow between B100, kernel USB stack, libusb, UHD and osmo-trx-uhd
interrupt?
I will investigate as I possibly can
Something like this is likely the root cause of the problem.

Sure, osmo-trx could "plaster around" it by having a more elegant recovery
mechanism, but failing fast due to exit and letting osmo-trx-uhd respawn
(normally executed via systemd) isn't actually all too bad.

What's definitely a real problem that needs immediate fixing is if we
somehow get stack with osmo-trx continuing to run, but failing to
transmit a valid signal wihout exit + respawn.
That HAS happened for me, with the screen filled with logging of packet loss, pointing at line 1319 in
uhd_device::recv_async_msg()

Regards,
	Harald