Hi,
I've noticed a problem when delivering several SMS to a mobile back-to-back. The first one is ok, the second one is received by the MS but OpenBSC fails with "RX RP-ACK but no sms in transaction?!?" when receiving the "RX SMS RP-ACK (MO)" (full log below)
From the symptoms and looking at the code, I would say that the transaction of the first SMS delivery was not freed, therefore, when we receive the RP-ACK of the second SMS, we do a trans_find_by_id in gsm0411_rcv_sms, and we actually find the transaction of the _first_ deliver (which has its .sms field cleared since it was released already).
I think you wouldn't notice if you waited between SMS because SMC Timer TC1* would expire, calling trans_free().
The problem is that I don't really know _where_ trans_free should be called. There are commented out call to it at several places ... I would think the RP-ACK(MO) reception looks like a good place. But OTOH, this whole process can queue data, giving away a reference to lchan (without get) and trans_free does a put on lchan (so I guess that could invalidate the lchan ref that was queued ? didn't check that in details)
Sylvain
--- SNIP --- <0100> gsm_04_11.c:920 send_sms_lchan() <0002> transaction.c:69 subscr=0x1fe52d0, subscr->net=0x1f8fc30 <0002> gsm_subscriber_base.c:131 subscr 46332 usage increases usage to: 4 <0002> gsm_04_11.c:937 lchan (bts=0,trx=0,ts=0,ch=0) increases usage to: 1 <0100> gsm_04_11.c:972 TX: SMS DELIVER <0100> gsm_04_11.c:188 TX: CP-DATA trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 01 1e 01 2a 07 91 44 77 58 10 06 50 00 12 00 05 b9 72 65 f7 00 00 90 80 72 12 03 41 00 02 44 3a <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-ACK <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-DATA <0100> gsm_04_11.c:191 TX: CP-ACK trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 04 <0100> gsm_04_11.c:750 RX SMS RP-ACK (MO) <0002> gsm_subscriber_base.c:139 subscr 27567 usage decreased usage to: 0 <0002> gsm_subscriber_base.c:139 subscr 46332 usage decreased usage to: 3 DB: Found Subscriber: ID 2, IMSI 206205003327508, NAME '', TMSI 1666608304, EXTEN '27567', LAC 1, AUTH 1 DBI: -7: The requested variable type does not match what libdbi thinks it should be <0002> gsm_subscriber_base.c:131 subscr 46332 usage increases usage to: 4 <0100> gsm_04_11.c:920 send_sms_lchan() <0002> transaction.c:69 subscr=0x1fe52d0, subscr->net=0x1f8fc30 <0002> gsm_subscriber_base.c:131 subscr 46332 usage increases usage to: 5 <0002> gsm_04_11.c:937 lchan (bts=0,trx=0,ts=0,ch=0) increases usage to: 2 <0100> gsm_04_11.c:972 TX: SMS DELIVER <0100> gsm_04_11.c:188 TX: CP-DATA trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 01 1f 01 2a 07 91 44 77 58 10 06 50 00 13 00 05 b9 72 65 f7 00 00 90 80 72 12 03 41 00 03 47 39 19 <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-ACK <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-DATA <0100> gsm_04_11.c:191 TX: CP-ACK trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 04 <0100> gsm_04_11.c:750 RX SMS RP-ACK (MO) <0100> gsm_04_11.c:640 RX RP-ACK but no sms in transaction?!? <0100> gsm_04_11.c:561 TX: SMS RP ERROR, cause 111 (Protocol Error) <0100> gsm_04_11.c:188 TX: CP-DATA trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 01 04 05 2a 01 6f <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-ACK <0100> gsm_04_11.c:159 SMC Timer TC1* is expired, calling trans_free() <0002> transaction.c:101 lchan (bts=0,trx=0,ts=0,ch=0) decreases usage to: 1 <0002> gsm_subscriber_base.c:139 subscr 46332 usage decreased usage to: 4 --- /SNIP ---
Hi Harald,
I don't know if you saw that first email when I posted it (see below for reference), but I still can't decide where to put the trans_free. From the code, it's obvious you thought about it but apparently didn't decide yet since all call to it are commented :)
In gsm411_rx_rp_ack(...) there is a comment saying "/* do not free the transaction here, this is done by sending CP-ACK */", however this is no longer true, the method sending the cp_ack is now called _before_ gsm411_rx_rp_ack and so doesn't free up the transaction anymore (the trans_free call in gsm411_tx_cp_ack is commented out).
To me gsm411_rx_rp_ack sounds like the right place to do it but if you wrote that comment in the first place I guess you had reason not to do it there. Could you elaborate on that ?
Sylvain
I've noticed a problem when delivering several SMS to a mobile back-to-back.
The first one is ok, the second one is received by the MS but OpenBSC fails with "RX RP-ACK but no sms in transaction?!?" when receiving the "RX SMS RP-ACK (MO)" (full log below)
From the symptoms and looking at the code, I would say that the transaction of the first SMS delivery was not freed, therefore, when we receive the RP-ACK of the second SMS, we do a trans_find_by_id in gsm0411_rcv_sms, and we actually find the transaction of the _first_ deliver (which has its .sms field cleared since it was released already).
I think you wouldn't notice if you waited between SMS because SMC Timer TC1* would expire, calling trans_free().
The problem is that I don't really know _where_ trans_free should be called. There are commented out call to it at several places ... I would think the RP-ACK(MO) reception looks like a good place. But OTOH, this whole process can queue data, giving away a reference to lchan (without get) and trans_free does a put on lchan (so I guess that could invalidate the lchan ref that was queued ? didn't check that in details)
Sylvain
--- SNIP --- <0100> gsm_04_11.c:920 send_sms_lchan() <0002> transaction.c:69 subscr=0x1fe52d0, subscr->net=0x1f8fc30 <0002> gsm_subscriber_base.c:131 subscr 46332 usage increases usage to: 4 <0002> gsm_04_11.c:937 lchan (bts=0,trx=0,ts=0,ch=0) increases usage to: 1 <0100> gsm_04_11.c:972 TX: SMS DELIVER <0100> gsm_04_11.c:188 TX: CP-DATA trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 01 1e 01 2a 07 91 44 77 58 10 06 50 00 12 00 05 b9 72 65 f7 00 00 90 80 72 12 03 41 00 02 44 3a <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-ACK <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-DATA <0100> gsm_04_11.c:191 TX: CP-ACK trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 04 <0100> gsm_04_11.c:750 RX SMS RP-ACK (MO) <0002> gsm_subscriber_base.c:139 subscr 27567 usage decreased usage to: 0 <0002> gsm_subscriber_base.c:139 subscr 46332 usage decreased usage to: 3 DB: Found Subscriber: ID 2, IMSI 206205003327508, NAME '', TMSI 1666608304, EXTEN '27567', LAC 1, AUTH 1 DBI: -7: The requested variable type does not match what libdbi thinks it should be <0002> gsm_subscriber_base.c:131 subscr 46332 usage increases usage to: 4 <0100> gsm_04_11.c:920 send_sms_lchan() <0002> transaction.c:69 subscr=0x1fe52d0, subscr->net=0x1f8fc30 <0002> gsm_subscriber_base.c:131 subscr 46332 usage increases usage to: 5 <0002> gsm_04_11.c:937 lchan (bts=0,trx=0,ts=0,ch=0) increases usage to: 2 <0100> gsm_04_11.c:972 TX: SMS DELIVER <0100> gsm_04_11.c:188 TX: CP-DATA trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 01 1f 01 2a 07 91 44 77 58 10 06 50 00 13 00 05 b9 72 65 f7 00 00 90 80 72 12 03 41 00 03 47 39 19 <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-ACK <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-DATA <0100> gsm_04_11.c:191 TX: CP-ACK trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 04 <0100> gsm_04_11.c:750 RX SMS RP-ACK (MO) <0100> gsm_04_11.c:640 RX RP-ACK but no sms in transaction?!? <0100> gsm_04_11.c:561 TX: SMS RP ERROR, cause 111 (Protocol Error) <0100> gsm_04_11.c:188 TX: CP-DATA trans=4 <0100> gsm_04_11.c:149 GSM4.11 TX 49 01 04 05 2a 01 6f <0001> abis_rsl.c:1215 channel=(bts=0,trx=0,ts=0) chan_nr=0x20 sapi=3 DATA INDICATION <0100> gsm_04_11.c:817 trans_id=c RX SMS CP-ACK <0100> gsm_04_11.c:159 SMC Timer TC1* is expired, calling trans_free() <0002> transaction.c:101 lchan (bts=0,trx=0,ts=0,ch=0) decreases usage to: 1 <0002> gsm_subscriber_base.c:139 subscr 46332 usage decreased usage to: 4 --- /SNIP ---
Hi Sylvain,
thanks for looking into this issue.
On Wed, Sep 23, 2009 at 12:16:38AM +0200, Sylvain Munaut wrote:
I don't know if you saw that first email when I posted it (see below for reference), but I still can't decide where to put the trans_free. From the code, it's obvious you thought about it but apparently didn't decide yet since all call to it are commented :)
Yes, I also was not decided how to solve this problem properly. Most likely I first put it somewhere but then noticed that it doesn't work correctly and then placed a comment.
In gsm411_rx_rp_ack(...) there is a comment saying "/* do not free the transaction here, this is done by sending CP-ACK */", however this is no longer true, the method sending the cp_ack is now called _before_ gsm411_rx_rp_ack and so doesn't free up the transaction anymore (the trans_free call in gsm411_tx_cp_ack is commented out).
To me gsm411_rx_rp_ack sounds like the right place to do it but if you wrote that comment in the first place I guess you had reason not to do it there. Could you elaborate on that ?
I have played with so many different combinations and orders of the various commands/packets that I do not remember what worked how and with which phone.
Honestly, I have spent way too much time reading the specifications, and I think they are extremely vague in places and much more difficult than any other part of the GSM specs that I've read.
If you have a better understanding of them and can propose a fix, I'm happy to simply test and merge that, rather than again diving into this topic and wasting more than the days I have already put into it :(
Regards.
Hi Harald,
If you have a better understanding of them and can propose a fix, I'm happy to simply test and merge that, rather than again diving into this topic and wasting more than the days I have already put into it :(
I don't have a deep understanding of the spec but all the state description and diagram I saw indicated that once you have the RP-ACK, it's over: nothing else to do, so freeing the transaction there makes sense.
It's the patch I sent on the list a few days ago, along with the encryption fixes (4 patches submitted for merging in total). And it works with my setup (nanoBTS + tested 3 phones iPhone & cheap nokia & old T610).
Sylvain