Hi all,
I have rebased jolly/sms and split it up: first the changes that use the routines that were moved to libosmocore (and that otherwise create compile errors), then the switch to the DLSMS debug area, then the actual patches, arranged so that every one of them compiles and the series can be bisected. I am now testing them manually with the help of the sysmocom modem bank.
There are various issues that need to be resolved before 29C3 so that the code can be used. I am just listing them here for reference.
The new code now contains a double trans_free that leads to segfaults and various 'soft' failures. I am just including some log messages:
gsm0411_smc.c:308 Cannot release yet current state: WAIT_CP_ACK
gsm0411_smc.c:511 Message 0x332/1 unhandled at this state WAIT_CP_ACK.
gsm0411_smc.c:517 RX Unimplemented CP msg_type: 0x332
gsm0411_smc.c:134 TX CP-ERROR, cause 97 (Message Type doesn't exist)
gsm_04_11.c:913 Transaction contains SMS.
gsm_04_11.c:441 RP-DATA (MO) without DST or TPDU ?!?
gsm_04_11.c:420 TX: SMS RP ERROR, cause 96 (Invalid Mandatory Information)
cheers
holger
On Sun, Nov 11, 2012 at 11:43:31AM +0100, Holger Hans Peter Freyther wrote:
> Hi all,
Hi again,
> gsm0411_smc.c:308 Cannot release yet current state: WAIT_CP_ACK
> gsm0411_smc.c:511 Message 0x332/1 unhandled at this state WAIT_CP_ACK.
> gsm0411_smc.c:517 RX Unimplemented CP msg_type: 0x332
> gsm0411_smc.c:134 TX CP-ERROR, cause 97 (Message Type doesn't exist)
> gsm_04_11.c:913 Transaction contains SMS.
> gsm_04_11.c:441 RP-DATA (MO) without DST or TPDU ?!?
> gsm_04_11.c:420 TX: SMS RP ERROR, cause 96 (Invalid Mandatory Information)
I am now rebasing and starting by testing just the SMC rework, and I have seen a crash in the cp_timer_expired routine (NULL pointer + small offset). I do not yet see how this can happen, because the smc instance should be cleared at the end of an instance. I will continue to test with the modem bank and improve the debugging (sadly this is an ABI-incompatible change to the SMC/SMR structure).
holger
Hi,
> I am now rebasing and starting by testing just the SMC rework, and I have seen a crash in the cp_timer_expired routine (NULL pointer + small offset). I do not yet see how this can happen, because the smc instance should be cleared at the end of an instance. I will continue to test with the modem bank and improve the debugging (sadly this is an ABI-incompatible change to the SMC/SMR structure).
btw, how easy are those to reproduce?
Do you need an automated setup, or can just sending a couple of SMS from a phone trigger them?
Cheers,
Sylvain
Sylvain,
On Tue, Nov 13, 2012 at 1:05 AM, Sylvain Munaut <246tnt@gmail.com> wrote:
>> I am now rebasing and starting by testing just the SMC rework, and I have seen a crash in the cp_timer_expired routine (NULL pointer + small offset). I do not yet see how this can happen, because the smc instance should be cleared at the end of an instance. I will continue to test with the modem bank and improve the debugging (sadly this is an ABI-incompatible change to the SMC/SMR structure).
> btw, how easy are those to reproduce?
> Do you need an automated setup, or can just sending a couple of SMS from a phone trigger them?
Let me know if you need access to a setup with nanoBTS + sysmocom modem bank. I'll connect everything and give you access to our OpenVPN. We use our demo server to test OpenBTS, but it's idling most of the time anyway.
PS: I'm happy to share our test setup with other well-known developers as well. Feel free to contact me.
--
Regards,
Alexander Chemeris.
CEO, Fairwaves LLC / ООО УмРадио
http://fairwaves.ru
On Mon, Nov 12, 2012 at 10:05:07PM +0100, Sylvain Munaut wrote:
Hi,
>> I am now rebasing and starting by testing just the SMC rework, and I have seen a crash in the cp_timer_expired routine (NULL pointer + small offset). I do not yet see how this can happen, because the smc instance should be cleared at the end of an instance. I will continue to test with the modem bank and improve the debugging (sadly this is an ABI-incompatible change to the SMC/SMR structure).
> btw, how easy are those to reproduce?
the crash with the entire patch set is 'easy' to reproduce. I have four devices that SMS to each other but I am confident that only two can cause the same crash.
the cp_timer_expired is more difficult to reproduce but I think I know how it can happen.
1.) cp_timer expired..
2.) nmsg = gsm411_msgb_alloc();
    inst->mn_recv(inst, GSM411_MNSMS_ERROR_IND, nmsg);
    msgb_free(nmsg);
3.) case GSM411_MNSMS_ERROR_IND:
        if (gh)
            DEBUGP(DLSMS, "MNSMS-ERROR-IND, cause %d (%s)\n",
                   gh->data[0],
                   get_value_string(gsm411_cp_cause_strs, gh->data[0]));
        else
            DEBUGP(DLSMS, "MNSMS-ERROR-IND, no cause\n");
        trans_free(trans);
at this point the smc is gone... so thanks for asking to make me reflect on the crash. I wonder if I shouldn't just put the smc/smr patch together and debug the result.
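To make the sequence above concrete, here is a minimal, self-contained sketch of one possible way to avoid touching the smc after the handler has run: the handler only requests the release while SMC code is still on the stack, and the actual free happens once the timer callback is done. Every name in it (struct smc_inst, struct trans, trans_release, in_callback, release_pending) is a hypothetical stand-in and not the libosmocore or OpenBSC API; it only illustrates the pattern and is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* All names are hypothetical stand-ins, not the libosmocore/OpenBSC API. */
enum { MNSMS_ERROR_IND = 1 };

struct smc_inst {
	int cp_state;			/* 0 = idle, 1 = WAIT_CP_ACK */
	int (*mn_recv)(struct smc_inst *inst, int msg_type);
};

struct trans {
	struct smc_inst smc;		/* SMC state lives inside the transaction */
	bool in_callback;		/* true while SMC code is on the stack */
	bool release_pending;		/* a handler asked for the free */
};

static int freed_count;			/* demo only: counts real frees */

static void trans_free(struct trans *trans)
{
	freed_count++;
	free(trans);
}

/* Free immediately if safe, otherwise remember the request. */
static void trans_release(struct trans *trans)
{
	if (trans->in_callback)
		trans->release_pending = true;
	else
		trans_free(trans);
}

/* Upper-layer handler: mirrors step 3 above, but defers the free. */
static int mn_recv(struct smc_inst *inst, int msg_type)
{
	/* smc is the first member, so both pointers share one address */
	struct trans *trans = (struct trans *)inst;

	if (msg_type == MNSMS_ERROR_IND)
		trans_release(trans);
	return 0;
}

/* Timer callback: returns true if the transaction was freed. */
static bool cp_timer_expired(struct trans *trans)
{
	struct smc_inst *inst = &trans->smc;

	trans->in_callback = true;
	inst->mn_recv(inst, MNSMS_ERROR_IND);
	inst->cp_state = 0;		/* still valid: the free was deferred */
	trans->in_callback = false;

	if (trans->release_pending) {
		trans_free(trans);
		return true;
	}
	return false;
}

static struct trans *trans_alloc(void)
{
	struct trans *trans = calloc(1, sizeof(*trans));

	assert(trans != NULL);
	trans->smc.cp_state = 1;
	trans->smc.mn_recv = mn_recv;
	return trans;
}
```

With a direct trans_free in the handler, the write to inst->cp_state after mn_recv returns would hit freed memory; with the deferred variant the transaction is freed exactly once, after the last access.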
> Do you need an automated setup, or can just sending a couple of SMS from a phone trigger them?
I think it helps that the Wavecom module of our modem bank is generally not happy with our SMS protocol handling and I end up in all the error paths.
On Mon, Nov 12, 2012 at 11:54:40PM +0100, Holger Hans Peter Freyther wrote:
Hi,
> the cp_timer_expired is more difficult to reproduce but I think I know how it can happen.
zecke/smc-issues contains a testcase (that is crashing). Ideas on how to resolve the issue are welcome, as is help with checking where similar issues exist (e.g. the one leading to a double free in the smr code) and resolving them too.
holger
On Tue, Nov 13, 2012 at 10:47:40PM +0100, Holger Hans Peter Freyther wrote:
Hi Andreas,
> zecke/smc-issues contains a testcase (that is crashing). Ideas on how to resolve the issue are welcome, as is help with checking where similar issues exist (e.g. the one leading to a double free in the smr code) and resolving them too.
and the same issue exists with the SMR rp_timer_expired: the OpenBSC code calls trans_free from within the error indication, and then another message is received (the msg is empty, but the client code still casts it to a msg).
there is another part I don't fully understand:
* gsm411_rx_rp_ack will start a new transaction but not trans_free the old one.
* gsm0411_rcv_sms will search for a 'pending' transaction and then free it.
are these two supposed to work together? When was this tested the last time?
holger
On Tue, Nov 13, 2012 at 10:47:40PM +0100, Holger Hans Peter Freyther wrote:
Hi,
> zecke/smc-issues contains a testcase (that is crashing). Ideas on how to resolve the issue are welcome, as is help with checking where similar issues exist (e.g. the one leading to a double free in the smr code) and resolving them too.
after re-reading GSM 04.11 on my train trip I believe the only valid place to delete the transaction is either on RF failure or once the SMC entity is sending the RELEASE REQUEST to the lower service.
Andreas can you confirm that? It is certainly wrong to delete/destroy the SMC entity from within any 'error' handler.
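As a rough illustration of that ownership rule (still subject to confirmation against GSM 04.11), the sketch below frees the transaction only when the SMC sends a release request downwards or when an RF failure is reported, while upward error indications are merely recorded. All names (struct sms_trans, on_mn_indication, on_mm_request, on_rf_failure, the enum values) are invented for this example and do not exist in OpenBSC or libosmocore.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical message types; not the real GSM411_* constants. */
enum mn_msg { MN_ERROR_IND };
enum mm_msg { MM_DATA_REQ, MM_REL_REQ };

struct sms_trans {
	int errors_seen;		/* error indications received so far */
};

static int live_transactions;		/* demo only: allocated minus freed */

static struct sms_trans *sms_trans_alloc(void)
{
	struct sms_trans *trans = calloc(1, sizeof(*trans));

	assert(trans != NULL);
	live_transactions++;
	return trans;
}

static void sms_trans_free(struct sms_trans *trans)
{
	live_transactions--;
	free(trans);
}

/* Indication from the SMC to the upper layer: record it, never free. */
static void on_mn_indication(struct sms_trans *trans, enum mn_msg msg)
{
	if (msg == MN_ERROR_IND)
		trans->errors_seen++;
}

/*
 * Request from the SMC to the lower layer. Returns the transaction if
 * it is still alive, or NULL once the release request destroyed it.
 */
static struct sms_trans *on_mm_request(struct sms_trans *trans,
				       enum mm_msg msg)
{
	if (msg == MM_REL_REQ) {
		sms_trans_free(trans);
		return NULL;
	}
	return trans;
}

/* RF failure: the other place where the transaction may be destroyed. */
static struct sms_trans *on_rf_failure(struct sms_trans *trans)
{
	sms_trans_free(trans);
	return NULL;
}
```

Returning NULL from the two freeing paths makes it harder for a caller to keep using a stale pointer, which is the class of bug seen in the timer callbacks.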
holger