Hi Pablo!
I am sometimes getting osmo-nitb segfaults, and sometimes random bytes as output of log messages.
I've quickly started it in valgrind, and even during connection of the BTS I get the following issues:
<000d> input/ipaccess.c:693 accept()ed new OML link from 192.168.200.117 ==21897== Conditional jump or move depends on uninitialised value(s) ==21897== at 0x60D4BD4: vfprintf (vfprintf.c:1620) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x435E5C: listen_fd_cb (ipaccess.c:692) ==21897== by 0x4E32419: osmo_select_main (select.c:132) ==21897== by 0x40555B: main (bsc_hack.c:306) ==21897== ==21897== Use of uninitialised value of size 8 ==21897== at 0x60D5008: vfprintf (vfprintf.c:1620) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x435E5C: listen_fd_cb (ipaccess.c:692) ==21897== by 0x4E32419: osmo_select_main (select.c:132) ==21897== by 0x40555B: main (bsc_hack.c:306) ==21897== ==21897== Use of uninitialised value of size 8 ==21897== at 0x6101CE0: _IO_default_xsputn (genops.c:480) ==21897== by 0x60D4ED3: vfprintf (vfprintf.c:1620) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x435E5C: listen_fd_cb (ipaccess.c:692) ==21897== by 0x4E32419: osmo_select_main (select.c:132) ==21897== by 0x40555B: main (bsc_hack.c:306) ==21897== ==21897== Conditional jump or move depends on uninitialised value(s) ==21897== at 0x6101CF2: _IO_default_xsputn (genops.c:479) ==21897== by 0x60D4ED3: vfprintf (vfprintf.c:1620) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x435E5C: listen_fd_cb (ipaccess.c:692) ==21897== by 0x4E32419: osmo_select_main (select.c:132) ==21897== by 0x40555B: main (bsc_hack.c:306) ==21897== <000d> input/ipaccess.c:755 accept()ed new RSL link from 192.168.200.117 <0004> bsc_init.c:251 bootstrapping RSL for BTS/TRX (0/0) on ARFCN 871 using MCC=602 MNC=42 LAC=1 CID=0 BSIC=63 TSC=7 ==21897== Use of uninitialised value of size 8 ==21897== at 0x60D2EDB: _itoa_word (_itoa.c:195) ==21897== by 0x60D5386: vfprintf (vfprintf.c:1619) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x410141: inp_sig_cb (bsc_init.c:247)
Can you please investigate? I suspect it to be related to your recent _output() related changes in libosmocore.
Regards, Harald
Hi again,
On Tue, May 17, 2011 at 09:15:31AM +0200, Harald Welte wrote:
I am sometimes getting osmo-nitb segfaults, and sometimes random bytes as output of log messages.
I've quickly started it in valgrind, and even during connection of the BTS I get the following issues:
Some more info:
On the stdout it looks fine, but on telnet it is wrong:
Same log message, first on stdout: <0004> abis_rsl.c:1283 BTS 0 CHAN RQD: no resources for TCH/F 0xe3 <0004> abis_rsl.c:915 (bts=0,trx=0,ts=7,ss=0) CONNECTION FAIL: RELEASING CAUSE=0x01(Radio Link Failure)
same message, but on telnet: <0004> abis_rsl.c:1283 BTS 109771072 CHAN RQD: no resources for 0x1 <0004> abis_rsl.c:915 (null) CONNECTION FAIL: RELEASING CAUSE=0x743ecab()
Regards, Harald
On 17/05/11 09:26, Harald Welte wrote:
Hi again,
On Tue, May 17, 2011 at 09:15:31AM +0200, Harald Welte wrote:
I am sometimes getting osmo-nitb segfaults, and sometimes random bytes as output of log messages.
I've quickly started it in valgrind, and even during connection of the BTS I get the following issues:
Some more info:
On the stdout it looks fine, but on telnet it is wrong:
Same log message, first on stdout: <0004> abis_rsl.c:1283 BTS 0 CHAN RQD: no resources for TCH/F 0xe3 <0004> abis_rsl.c:915 (bts=0,trx=0,ts=7,ss=0) CONNECTION FAIL: RELEASING CAUSE=0x01(Radio Link Failure)
same message, but on telnet: <0004> abis_rsl.c:1283 BTS 109771072 CHAN RQD: no resources for 0x1 <0004> abis_rsl.c:915 (null) CONNECTION FAIL: RELEASING CAUSE=0x743ecab()
Sorry, I'm completely lost here, how can I set up log messages over telnet?
On 17/05/11 20:37, Holger Hans Peter Freyther wrote:
On 05/17/2011 08:04 PM, Pablo Neira Ayuso wrote:
Sorry, I'm completely lost here, how can I set up log messages over telnet?
logging enable logging filter all 1
should be enough for that (on the telnet interface)
thanks, I'll keep investigating this tomorrow. This tastes like a memory corruption, but not sure that this occurs in the _output() function.
I'll try to get a fix for this by tomorrow.
On Wed, May 18, 2011 at 01:01:58AM +0200, Pablo Neira Ayuso wrote:
On 17/05/11 20:37, Holger Hans Peter Freyther wrote:
On 05/17/2011 08:04 PM, Pablo Neira Ayuso wrote:
Sorry, I'm completely lost here, how can I set up log messages over telnet?
logging enable logging filter all 1
should be enough for that (on the telnet interface)
thanks, I'll keep investigating this tomorrow. This tastes like a memory corruption, but not sure that this occurs in the _output() function.
thanks a lot. I am currently busy in Egypt giving all-day training courses, so I don't have much time for testing. I've re-compiled libosmocore, openbsc and everything (with make clean)
I also have some occasional crashes regarding to log/debug messages, see the attached file... I suspect they might be related.
On 05/18/2011 01:01 AM, Pablo Neira Ayuso wrote:
thanks, I'll keep investigating this tomorrow. This tastes like a memory corruption, but not sure that this occurs in the _output() function.
I'll try to get a fix for this by tomorrow.
The easiest guess is that we either need the va_start/va_end inside the loop of _output, or need to copy the arguments again. to reproduce we should probably use the same arch and GCC/(E)GLIBC version that harald is using.
holger
Hi all,
On Wed, May 18, 2011 at 04:10:20PM +0200, Holger Freyther wrote:
The easiest guess is that we either need the va_start/va_end inside the loop of _output, or need to copy the arguments again. to reproduce we should probably use the same arch and GCC/(E)GLIBC version that harald is using.
My testing is on Debian unstable on x86_64 (libc6-2.13-4). As indicated, reverting the commit Pablo indicated has fixed the problem.
I have not seen the problem manifset itself on Ubuntu 11.04 on i386.
Regards, Harald
On 19/05/11 01:12, Harald Welte wrote:
Hi all,
On Wed, May 18, 2011 at 04:10:20PM +0200, Holger Freyther wrote:
The easiest guess is that we either need the va_start/va_end inside the loop of _output, or need to copy the arguments again. to reproduce we should probably use the same arch and GCC/(E)GLIBC version that harald is using.
My testing is on Debian unstable on x86_64 (libc6-2.13-4). As indicated, reverting the commit Pablo indicated has fixed the problem.
I have not seen the problem manifset itself on Ubuntu 11.04 on i386.
While reading the manpages, I noticed this:
"The functions vprintf(), vfprintf(), vsprintf(), vsnprintf() are equivalent to the functions printf(), fprintf(), sprintf(), snprintf(), respectively, except that they are called with a va_list instead of a variable number of arguments. These functions do not call the va_end macro. Consequently, the value of ap is undefined after the call. The application should call va_end(ap) itself afterwards."
I have attached a patch, it basically restores: http://cgit.osmocom.org/cgit/libosmocore/commit/?id=81e9636454294ae10ef9bc8b...
But it includes a better notice.
Please, apply. Thanks!
On 17/05/11 09:15, Harald Welte wrote:
Hi Pablo!
I am sometimes getting osmo-nitb segfaults, and sometimes random bytes as output of log messages.
I've quickly started it in valgrind, and even during connection of the BTS I get the following issues:
<000d> input/ipaccess.c:693 accept()ed new OML link from 192.168.200.117 ==21897== Conditional jump or move depends on uninitialised value(s) ==21897== at 0x60D4BD4: vfprintf (vfprintf.c:1620) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x435E5C: listen_fd_cb (ipaccess.c:692) ==21897== by 0x4E32419: osmo_select_main (select.c:132) ==21897== by 0x40555B: main (bsc_hack.c:306) ==21897== ==21897== Use of uninitialised value of size 8 ==21897== at 0x60D5008: vfprintf (vfprintf.c:1620) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x435E5C: listen_fd_cb (ipaccess.c:692) ==21897== by 0x4E32419: osmo_select_main (select.c:132) ==21897== by 0x40555B: main (bsc_hack.c:306) ==21897== ==21897== Use of uninitialised value of size 8 ==21897== at 0x6101CE0: _IO_default_xsputn (genops.c:480) ==21897== by 0x60D4ED3: vfprintf (vfprintf.c:1620) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x435E5C: listen_fd_cb (ipaccess.c:692) ==21897== by 0x4E32419: osmo_select_main (select.c:132) ==21897== by 0x40555B: main (bsc_hack.c:306) ==21897== ==21897== Conditional jump or move depends on uninitialised value(s) ==21897== at 0x6101CF2: _IO_default_xsputn (genops.c:479) ==21897== by 0x60D4ED3: vfprintf (vfprintf.c:1620) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x435E5C: listen_fd_cb (ipaccess.c:692) ==21897== by 0x4E32419: osmo_select_main (select.c:132) ==21897== by 0x40555B: main (bsc_hack.c:306) ==21897== <000d> input/ipaccess.c:755 accept()ed new RSL link from 192.168.200.117 <0004> bsc_init.c:251 bootstrapping RSL for BTS/TRX (0/0) on ARFCN 871 using MCC=602 MNC=42 LAC=1 CID=0 BSIC=63 TSC=7 ==21897== Use of uninitialised value of size 8 ==21897== at 0x60D2EDB: _itoa_word (_itoa.c:195) ==21897== by 0x60D5386: vfprintf (vfprintf.c:1619) ==21897== by 0x60F8F01: vsnprintf (vsnprintf.c:120) ==21897== by 0x4E33670: _output (logging.c:177) ==21897== by 0x4E33878: _logp (logging.c:227) ==21897== by 0x4E33BCB: logp2 (logging.c:246) ==21897== by 0x410141: inp_sig_cb (bsc_init.c:247)
Can you please investigate? I suspect it to be related to your recent _output() related changes in libosmocore.
Not sure what's wrong yet, I don't hit those problems here, probabling we're using a different config file?
You may want to revert this and give it a try:
http://cgit.osmocom.org/cgit/libosmocore/commit/?id=81e9636454294ae10ef9bc8b...
Hi Pablo,
On Tue, May 17, 2011 at 08:04:05PM +0200, Pablo Neira Ayuso wrote:
Can you please investigate? I suspect it to be related to your recent _output() related changes in libosmocore.
Not sure what's wrong yet, I don't hit those problems here, probabling we're using a different config file?
I will send you my config file in the attachment
You may want to revert this and give it a try:
http://cgit.osmocom.org/cgit/libosmocore/commit/?id=81e9636454294ae10ef9bc8b...
I've reverted it, and the crashes and erroneous printouts appear to be gone, at least after the first test.