Date: Mon, 28 Jun 2010 09:15:11 +0800
From: Holger Hans Peter Freyther holger@freyther.de
To: openbsc@lists.gnumonks.org
Subject: Re: Segmentation fault while sending sms via bsc_hack_VTY
Could you try two things? One is to build OpenBSC with -O0 (either by passing CFLAGS to configure or by changing the Makefile), and the other is to run OpenBSC under valgrind and report the line number.
On second look, this seems to be a week-or-two-old OpenBSC. Is that true? Would it be a lot of work to test the latest version of OpenBSC?
The new version does not seem to build correctly. make prints:
sgsn_libgtp.c: In function ‘sgsn_create_pdp_ctx’:
sgsn_libgtp.c:117: error: ‘struct pdp_t’ has no member named ‘priv’
sgsn_libgtp.c: In function ‘cb_data_ind’:
sgsn_libgtp.c:373: error: ‘struct pdp_t’ has no member named ‘priv’
sgsn_libgtp.c:396: warning: assignment makes pointer from integer without a cast
maybe this could be because I have installed openggsn?
Anyway, when using make -k (and ./configure CFLAGS="-O0"), bsc_hack builds and starts. It still "crashes" when I try to send an SMS from the bsc_hack_vty. There is no segmentation fault, but this:
<0008> paging.c:130 No slots available on bts nr 1
<0008> paging.c:130 No slots available on bts nr 0
and
<0004> abis_rsl.c:831 (bts=1,trx=0,ts=0,ss=0) CHANNEL ACTIVATE NACKCAUSE=0x6f(Protocol error, unspecified)
<0011> handover_logic.c:197 unable to find HO record
it repeats (endlessly?)
Valgrind reports:
==26461== Invalid read of size 4
==26461==    at 0x806DA60: subscr_paging_cb (linuxlist.h:163)
==26461==    by 0x806EE46: paging_T3113_expired (paging.c:209)
==26461==    by 0x403D3EF: bsc_update_timers (timer.c:160)
==26461==    by 0x403D8F6: bsc_select_main (select.c:94)
==26461==    by 0x804BC75: main (bsc_hack.c:271)
==26461==  Address 0x4731120 is 432 bytes inside a block of size 440 free'd
==26461==    at 0x4024B3A: free (vg_replace_malloc.c:366)
==26461==    by 0x40471AF: talloc_free (talloc.c:610)
==26461==    by 0x806DD34: subscr_put (gsm_subscriber_base.c:133)
==26461==    by 0x806E9F5: paging_remove_request (paging.c:77)
==26461==    by 0x806EE02: paging_T3113_expired (paging.c:204)
==26461==    by 0x403D3EF: bsc_update_timers (timer.c:160)
==26461==    by 0x403D8F6: bsc_select_main (select.c:94)
==26461==    by 0x804BC75: main (bsc_hack.c:271)
==26461==
and
==26524== Syscall param ioctl(TCSET{S,SW,SF}) points to uninitialised byte(s)
==26524==    at 0x4431A5F: tcsetattr (tcsetattr.c:88)
==26524==    by 0x4069865: vty_create (vty.c:1399)
==26524==    by 0x406A289: telnet_new_connection (telnet_interface.c:167)
==26524==    by 0x403D924: bsc_select_main (select.c:119)
==26524==    by 0x804BC75: main (bsc_hack.c:271)
==26524==  Address 0xbefa82c8 is on thread 1's stack
==26524==
==26524== Use of uninitialised value of size 4
==26524==    at 0x43A9288: _itoa_word (_itoa.c:196)
==26524==    by 0x43ACAE1: vfprintf (vfprintf.c:1613)
==26524==    by 0x444DBF3: __vsnprintf_chk (vsnprintf_chk.c:65)
==26524==    by 0x444DB13: __snprintf_chk (snprintf_chk.c:36)
==26524==    by 0x40417E4: hexdump (stdio2.h:65)
==26524==    by 0x8072538: ipaccess_fd_cb (ipaccess.c:566)
==26524==    by 0x403D924: bsc_select_main (select.c:119)
==26524==    by 0x804BC75: main (bsc_hack.c:271)
==26524==
==26524== Syscall param socketcall.send(msg) points to uninitialised byte(s)
==26524==    at 0x443BE78: send (socket.S:100)
==26524==    by 0x403D924: bsc_select_main (select.c:119)
==26524==    by 0x804BC75: main (bsc_hack.c:271)
==26524==  Address 0x4736d9d is 261 bytes inside a block of size 1,140 alloc'd
==26524==    at 0x4024F20: malloc (vg_replace_malloc.c:236)
==26524==    by 0x4045291: _talloc_zero (talloc.c:355)
==26524==    by 0x403DD66: msgb_alloc (msgb.c:37)
==26524==    by 0x8061FF9: rsl_msgb_alloc (msgb.h:159)
==26524==    by 0x806436E: rsl_chan_activate_lchan (abis_rsl.c:443)
==26524==    by 0x80653D0: abis_rsl_rcvmsg (abis_rsl.c:1228)
==26524==    by 0x80725F9: ipaccess_fd_cb (ipaccess.c:489)
==26524==    by 0x403D924: bsc_select_main (select.c:119)
==26524==    by 0x804BC75: main (bsc_hack.c:271)
==26524==
Best Regards,
Richard
On 06/30/2010 03:59 AM, Richard Zahoransky wrote:
Hi,
thanks a lot for starting to debug this. Could you help me a bit with your test setup? Which type of BTS do you use? Could you get us a pcap file for the Channel Activate NACK?
maybe this could be because I have installed openggsn?
Sounds likely. I would guess you need to update libgtp.
==26461== Invalid read of size 4
==26461==    at 0x806DA60: subscr_paging_cb (linuxlist.h:163)
==26461==    by 0x806EE46: paging_T3113_expired (paging.c:209)
==26461==    by 0x403D3EF: bsc_update_timers (timer.c:160)
==26461==    by 0x403D8F6: bsc_select_main (select.c:94)
==26461==    by 0x804BC75: main (bsc_hack.c:271)
==26461==  Address 0x4731120 is 432 bytes inside a block of size 440 free'd
==26461==    at 0x4024B3A: free (vg_replace_malloc.c:366)
==26461==    by 0x40471AF: talloc_free (talloc.c:610)
==26461==    by 0x806DD34: subscr_put (gsm_subscriber_base.c:133)
==26461==    by 0x806E9F5: paging_remove_request (paging.c:77)
==26461==    by 0x806EE02: paging_T3113_expired (paging.c:204)
==26461==    by 0x403D3EF: bsc_update_timers (timer.c:160)
==26461==    by 0x403D8F6: bsc_select_main (select.c:94)
==26461==    by 0x804BC75: main (bsc_hack.c:271)
Thanks a lot. So the ingredient I was missing for my test was the failing paging request. I am using code from subscr_get_channel which is not adding a subscr_get/subscr_put, so the callback param points to a deleted subscriber.
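The missing get/put pair described above can be sketched as follows. This is a minimal, self-contained illustration and not the OpenBSC code: `struct ex_subscr`, `ex_subscr_get`, `ex_subscr_put`, `ex_paging_start`, `ex_paging_expire` and `ex_record_cb` are all hypothetical stand-ins. The assumption is that the request and the callback parameter each hold their own reference, so dropping the request's reference on timer expiry cannot free the subscriber before the callback has run.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted subscriber (stand-in, not the OpenBSC type). */
struct ex_subscr {
    int use_count;
    int id;
};

typedef void (*ex_paging_cb)(struct ex_subscr *param);

/* Hypothetical paging request that calls back when its timer expires. */
struct ex_paging_request {
    struct ex_subscr *subscr;   /* reference owned by the request */
    struct ex_subscr *cb_param; /* reference owned on behalf of the callback */
    ex_paging_cb cb;
};

struct ex_subscr *ex_subscr_alloc(int id)
{
    struct ex_subscr *s = calloc(1, sizeof(*s));
    if (!s)
        return NULL;
    s->id = id;
    s->use_count = 1; /* reference owned by the caller */
    return s;
}

struct ex_subscr *ex_subscr_get(struct ex_subscr *s)
{
    assert(s->use_count > 0);
    s->use_count++;
    return s;
}

/* Drops one reference; returns 1 if the subscriber was freed. */
int ex_subscr_put(struct ex_subscr *s)
{
    assert(s->use_count > 0);
    if (--s->use_count == 0) {
        free(s);
        return 1;
    }
    return 0;
}

struct ex_paging_request *ex_paging_start(struct ex_subscr *s, ex_paging_cb cb)
{
    struct ex_paging_request *req = calloc(1, sizeof(*req));
    if (!req)
        return NULL;
    req->subscr = ex_subscr_get(s);   /* held until the request is removed */
    req->cb_param = ex_subscr_get(s); /* held until the callback has run */
    req->cb = cb;
    return req;
}

/* Timer expiry: remove the request first, then notify the caller. */
void ex_paging_expire(struct ex_paging_request *req)
{
    ex_paging_cb cb = req->cb;
    struct ex_subscr *param = req->cb_param;

    ex_subscr_put(req->subscr); /* the request's reference goes away */
    free(req);

    if (cb)
        cb(param);              /* param is still alive here */
    ex_subscr_put(param);       /* release the callback's reference */
}

/* Example callback: remembers which subscriber's paging expired. */
int ex_last_expired_id = -1;

void ex_record_cb(struct ex_subscr *param)
{
    ex_last_expired_id = param->id;
}
```

Without the second `ex_subscr_get` in `ex_paging_start`, the `ex_subscr_put` in `ex_paging_expire` could drop the last reference and the callback would then read freed memory, which is the pattern the valgrind trace shows.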
==26524== Use of uninitialised value of size 4
==26524==    at 0x43A9288: _itoa_word (_itoa.c:196)
==26524==    by 0x43ACAE1: vfprintf (vfprintf.c:1613)
==26524==    by 0x444DBF3: __vsnprintf_chk (vsnprintf_chk.c:65)
==26524==    by 0x444DB13: __snprintf_chk (snprintf_chk.c:36)
==26524==    by 0x40417E4: hexdump (stdio2.h:65)
==26524==    by 0x8072538: ipaccess_fd_cb (ipaccess.c:566)
==26524==    by 0x403D924: bsc_select_main (select.c:119)
==26524==    by 0x804BC75: main (bsc_hack.c:271)
==26524==
==26524== Syscall param socketcall.send(msg) points to uninitialised byte(s)
==26524==    at 0x443BE78: send (socket.S:100)
==26524==    by 0x403D924: bsc_select_main (select.c:119)
==26524==    by 0x804BC75: main (bsc_hack.c:271)
==26524==  Address 0x4736d9d is 261 bytes inside a block of size 1,140 alloc'd
==26524==    at 0x4024F20: malloc (vg_replace_malloc.c:236)
==26524==    by 0x4045291: _talloc_zero (talloc.c:355)
==26524==    by 0x403DD66: msgb_alloc (msgb.c:37)
==26524==    by 0x8061FF9: rsl_msgb_alloc (msgb.h:159)
==26524==    by 0x806436E: rsl_chan_activate_lchan (abis_rsl.c:443)
==26524==    by 0x80653D0: abis_rsl_rcvmsg (abis_rsl.c:1228)
==26524==    by 0x80725F9: ipaccess_fd_cb (ipaccess.c:489)
==26524==    by 0x403D924: bsc_select_main (select.c:119)
==26524==    by 0x804BC75: main (bsc_hack.c:271)
==26524==
These two are new as well. For the last one, it is either me or Harald doing it wrong. I will poke it a bit.
On 06/30/2010 09:16 AM, Holger Hans Peter Freyther wrote:
by 0x804BC75: main (bsc_hack.c:271)
Thanks a lot. So the ingredient I was missing for my test was the failing paging request. I am using code from subscr_get_channel which is not adding a subscr_get/subscr_put, so the callback param points to a deleted subscriber.
I should have fixed the SMS crash and will push it in a second....
==26524==    by 0x403D924: bsc_select_main (select.c:119)
==26524==    by 0x804BC75: main (bsc_hack.c:271)
==26524==
These two are new as well. For the last one, it is either me or Harald doing it wrong. I will poke it a bit.
The flood of Channel Activate NACKs and the Processing Failures (you didn't mention them) are due to Harald's latest change in the channel activate. Weirdly enough, we still have the same valgrind error in them...
I will debug this issue a bit more..
On 06/30/2010 09:50 AM, Holger Hans Peter Freyther wrote:
The flood of Channel Activate NACKs and the Processing Failures (you didn't mention them) are due to Harald's latest change in the channel activate. Weirdly enough, we still have the same valgrind error in them...
Hi,
I think I have fixed this for good now. I calculate the length based on how much we have stored in the msgb, so it will always be correct (as long as we store things correctly). I have also added a memset to make valgrind happy, and checked whether we would need more of that for the gsm48_chan_desc (but we don't).
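The idea of deriving the length from what has actually been stored, plus zeroing the buffer up front, can be sketched like this. It is a minimal illustration under stated assumptions, not the actual patch: `struct ex_buf`, `ex_buf_init`, `ex_buf_put`, `ex_ie_begin` and `ex_ie_end` are hypothetical helpers that only mimic the append-style interface of a message buffer, and the tag value used with them is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical append-only message buffer (stand-in for a msgb). */
struct ex_buf {
    uint8_t data[64];
    size_t len; /* number of octets stored so far */
};

/* Zero everything so no uninitialised octet can ever be sent or dumped. */
void ex_buf_init(struct ex_buf *b)
{
    memset(b, 0, sizeof(*b));
}

/* Reserve n octets at the tail and return a pointer to them. */
uint8_t *ex_buf_put(struct ex_buf *b, size_t n)
{
    assert(b->len + n <= sizeof(b->data));
    uint8_t *p = b->data + b->len;
    b->len += n;
    return p;
}

/* Start a tag-length-value element; returns the offset of the length octet. */
size_t ex_ie_begin(struct ex_buf *b, uint8_t tag)
{
    *ex_buf_put(b, 1) = tag;
    size_t len_pos = b->len;
    *ex_buf_put(b, 1) = 0; /* placeholder, patched in ex_ie_end() */
    return len_pos;
}

/* Finish the element: the length is whatever was stored after the length octet. */
void ex_ie_end(struct ex_buf *b, size_t len_pos)
{
    size_t stored = b->len - (len_pos + 1);
    assert(stored <= 0xff);
    b->data[len_pos] = (uint8_t)stored;
}
```

Because the length octet is patched from `b->len` after the value has been appended, it cannot drift out of sync with the payload when fields are added or removed later.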
The only open issue is... I think we do not need to add the MA at all... at least that is my impression after reading the GSM 08.58 doc about the Channel Identification.
please confirm that both the SMS crash and the NACKs are resolved.
thanks
Hi Zecke,
thanks for fixing the remaining issue,
On Wed, Jun 30, 2010 at 12:10:20PM +0800, Holger Hans Peter Freyther wrote:
The only open issue is... I think we do not need to add the MA at all... at least that is my impression after reading the GSM 08.58 doc about the Channel Identification.
The protocol traces I have for the Siemens BS-11 include the MA in hopping configurations, so this is why I decided to add them, too. I am not really sure which Phase the BS-11 implements - and if it is Phase 1, then the MA would still be required.
On 06/30/2010 02:49 PM, Harald Welte wrote:
The protocol traces I have for the Siemens BS-11 include the MA in hopping configurations, so this is why I decided to add them, too. I am not really sure which Phase the BS-11 implements - and if it is Phase 1, then the MA would still be required.
Hi LaF0rge,
this is the weird part. The whole Channel Identification is only needed for Phase1, and my copy of GSM 08.58 claims that MA should be present but have a size of zero.
It looks like we will need Nibbler's Phase1 phone to check whether we need that or not. :)
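For reference, "present but with a size of zero" in a tag-length-value encoding means that the tag and a length octet of 0 are emitted with no value octets following. A minimal sketch, assuming a simple one-octet tag and one-octet length: `ex_tlv_put` and `ex_encode_chan_ident_body` are hypothetical helpers, the tag values are placeholders rather than values taken from the specification, and the three-octet channel description is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder tag values for illustration only (not taken from the spec). */
#define EX_TAG_CHAN_DESC    0xA1
#define EX_TAG_MOBILE_ALLOC 0xA2

/* Append one TLV element at offset pos; returns the new write offset. */
size_t ex_tlv_put(uint8_t *out, size_t cap, size_t pos,
                  uint8_t tag, const uint8_t *val, uint8_t len)
{
    assert(pos + 2u + len <= cap);
    out[pos++] = tag;
    out[pos++] = len;
    if (len > 0)
        memcpy(out + pos, val, len);
    return pos + len;
}

/* Channel description followed by a mobile allocation of length zero. */
size_t ex_encode_chan_ident_body(uint8_t *out, size_t cap,
                                 const uint8_t chan_desc[3])
{
    size_t pos = 0;
    pos = ex_tlv_put(out, cap, pos, EX_TAG_CHAN_DESC, chan_desc, 3);
    pos = ex_tlv_put(out, cap, pos, EX_TAG_MOBILE_ALLOC, NULL, 0);
    return pos;
}
```

With these placeholder tags the encoded body is seven octets long: five for the channel description element and two (tag plus zero length) for the empty mobile allocation.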