Hi.
The current OsmoPCU master has a segfault in the tbf test - at least on my machine:
Destroying MS object, TLLI = 0x00000000
=== start test_tbf_dl_llc_loss ===
Assert failed !check_element_exists(cnode, cmd->string) command.c:626
backtrace() returned 9 addresses
/usr/lib/x86_64-linux-gnu/libosmovty.so.3(install_element+0xce) [0x7ffff77a5f9e]
/usr/lib/x86_64-linux-gnu/libosmovty.so.3(install_element_ve+0x11) [0x7ffff77a6411]
/usr/lib/x86_64-linux-gnu/libosmogb.so.4(gprs_ns_vty_init+0x17) [0x7ffff79c1da7]
/home/max/source/gsm/osmo-pcu/tests/tbf/TbfTest(+0x22640) [0x555555576640]
/home/max/source/gsm/osmo-pcu/tests/tbf/TbfTest(+0x1a73a) [0x55555556e73a]
/home/max/source/gsm/osmo-pcu/tests/tbf/TbfTest(+0x1697f) [0x55555556a97f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7ffff67e53f1]
/home/max/source/gsm/osmo-pcu/tests/tbf/TbfTest(+0x16eda) [0x55555556aeda]
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
58 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1 0x00007ffff67fc3ea in __GI_abort () at abort.c:89
#2 0x00007ffff77a5fa3 in cmd_make_descvec (descstr=<optimized out>, string=<optimized out>) at command.c:327
#3 install_element (ntype=ntype@entry=1, cmd=cmd@entry=0x7ffff7bcf680 <show_ns_cmd>) at command.c:630
#4 0x00007ffff77a6411 in install_element_ve (cmd=cmd@entry=0x7ffff7bcf680 <show_ns_cmd>) at command.c:637
#5 0x00007ffff79c1da7 in gprs_ns_vty_init (nsi=<optimized out>) at gprs_ns_vty.c:578
#6 0x0000555555576640 in gprs_bssgp_create_and_connect (bts=<optimized out>, local_port=<optimized out>, sgsn_ip=<optimized out>, sgsn_port=<optimized out>, nsei=<optimized out>, nsvci=<optimized out>, bvci=1234, mcc=1, mnc=1, lac=0, rac=0, cell_id=0) at gprs_bssgp_pcu.cpp:847
#7 0x000055555556e73a in test_tbf_dl_llc_loss () at tbf/TbfTest.cpp:499
#8 0x000055555556a97f in main (argc=<optimized out>, argv=<optimized out>) at tbf/TbfTest.cpp:2883
Which also seems to be reproducible in jenkins: http://jenkins.osmocom.org/jenkins/job/osmo-pcu-gerrit/477/label=linux_amd64...
That's odd cause it should have been caught by jenkins way before.
Anyone else seeing this?
It could be that the error was triggered by addeaa39b172b4114bffbbfdd3dd09a029eb37b3 in libosmocore.
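The assertion text in the output above, `!check_element_exists(cnode, cmd->string)`, reads as a guard against installing a command string that is already present in a node. Below is a rough sketch of that kind of guard, not the libosmovty code: every `toy_` name is hypothetical, the command string is a placeholder, and the sketch returns an error where the trace shows the real code aborting. It illustrates one way such a check could trip, namely an init function that registers its commands being called a second time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CMDS 16

/* Hypothetical stand-in for a VTY node: a flat list of command strings. */
struct toy_node {
	const char *cmds[TOY_MAX_CMDS];
	size_t count;
};

/* True if the node already holds a command with exactly this string. */
static bool toy_element_exists(const struct toy_node *node, const char *string)
{
	for (size_t i = 0; i < node->count; i++)
		if (strcmp(node->cmds[i], string) == 0)
			return true;
	return false;
}

/* Returns 0 on success, -1 on a duplicate or a full node.
 * Assumption: the code in the trace asserts here instead of returning. */
static int toy_install_element(struct toy_node *node, const char *string)
{
	if (toy_element_exists(node, string) || node->count >= TOY_MAX_CMDS)
		return -1;
	node->cmds[node->count++] = string;
	return 0;
}

/* Hypothetical init function that registers its command on every call. */
static int toy_ns_vty_init(struct toy_node *view_node)
{
	return toy_install_element(view_node, "show placeholder");
}
```

Calling `toy_ns_vty_init()` twice on the same node succeeds the first time and is rejected the second time, which is the situation a duplicate-element assertion would turn into an abort.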
On Thu, Jan 12, 2017 at 11:56:32AM +0100, Max wrote:
> Subject: Re: tbf test segfault
It's a SIGABRT, not a segfault. SIGABRT is the intended result of a failed assertion.
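To illustrate the distinction: a failed assertion ends in `abort()`, which raises SIGABRT, whereas a segfault is SIGSEGV delivered on an invalid memory access. The sketch below uses the standard C `assert()` as a stand-in for the assertion macro seen in the trace, forks a child that fails an assertion, and reports which signal terminated it. The function name is hypothetical, and the code assumes a POSIX system and a build without `NDEBUG`.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that fails an assertion.
 * Returns the number of the signal that killed the child,
 * 0 if the child exited normally, or -1 on fork/waitpid errors. */
static int signal_from_failed_assert(void)
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: volatile keeps the compiler from folding the check away. */
		volatile int already_installed = 1;
		assert(!already_installed);	/* fails, so abort() is called */
		_exit(0);			/* only reached if NDEBUG is defined */
	}

	/* Parent: collect the child's termination status. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With assertions enabled, `signal_from_failed_assert()` should return `SIGABRT` (6 on Linux, matching `sig=6` in the gdb output), not `SIGSEGV`.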
> Anyone else seeing this?
Yes, I can reproduce the same.
> That's odd cause it should have been caught by jenkins way before.
It is possible that two patches on gerrit pass on their own, but when merged the combination of them causes a fault. That is due to the "rebase-if-necessary" policy we're using on gerrit, which does not enforce another build verification when the master has moved on. In that case we can look at the master branch verification build on jenkins.osmocom.org (the one without "gerrit" in its name).
The first failure in our master verification job is https://jenkins.osmocom.org/jenkins/job/osmo-pcu/1007/ suggesting that the causing commit was
commit b3df58660f6e965799b18b5b87892a3272c4ccf1
Author: Max <msuraev@sysmocom.de>
    Log socket path on connection
which doesn't make sense to me, because that is a log message tweak. Could the local.sun_path somehow cause stack corruption?? :
--- a/src/osmobts_sock.cpp
+++ b/src/osmobts_sock.cpp
@@ -282,7 +282,8 @@ int pcu_l1if_open(void)
 		return rc;
 	}

-	LOGP(DL1IF, LOGL_NOTICE, "osmo-bts PCU socket has been connected\n");
+	LOGP(DL1IF, LOGL_NOTICE, "osmo-bts PCU socket %s has been connected\n",
+	     local.sun_path);

 	pcu_sock_state = state;
I suspect some other hidden issue that coincidentally shows its effect only after this commit. Someone (TM) should fire asan and valgrind on it.
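On the `local.sun_path` question, as general C background rather than a statement about what osmo-pcu does: `sun_path` is a fixed-size `char` array inside `struct sockaddr_un`, and formatting it with `%s` only reads memory. If the array were not NUL-terminated, `%s` could read past its end, which is an over-read rather than a write to the stack. A defensive sketch follows; the helper names and the example path are hypothetical, and `snprintf()` stands in for the logging macro.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill addr with an AF_UNIX path, always NUL-terminated.
 * Returns 0 on success, -1 if the path does not fit into sun_path. */
static int set_sun_path(struct sockaddr_un *addr, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof(addr->sun_path))
		return -1;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path, path, len + 1);	/* copies the trailing NUL too */
	return 0;
}

/* Format the log line, reading at most sizeof(sun_path) bytes from the
 * array, so even an unterminated sun_path cannot cause an over-read. */
static int format_connected_msg(char *buf, size_t buflen,
				const struct sockaddr_un *addr)
{
	return snprintf(buf, buflen,
			"osmo-bts PCU socket %.*s has been connected\n",
			(int)sizeof(addr->sun_path), addr->sun_path);
}
```

The `%.*s` precision caps how many bytes are read from `sun_path`, so the format call stays within the array regardless of how the structure was filled.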
It appears osmo-pcu devel is now blocked until this issue is fixed.
~N
Fix has been submitted in https://gerrit.osmocom.org/#/c/1579/ - feel free to upvote.
osmocom-net-gprs@lists.osmocom.org