Hi LAPDm guys,
while I was digging into the LAPDm code to find out, why a broken SABM message is sent on a SAPI3 establish SACCH request during an active call (leading to a delay of 3s between RSL EST request and response), I stumbled over the following:
The initial SABM message (not the retransmitted one after T200) has a non-zero length and ends with 3 bytes that have been taken from the end of the RSL EST REQ message. The MS does not answer to this. Interestingly the second SABM message that gets sent after T200 (2s) has a length field of 0 and no trailing garbage.
The difference lies in the way the msgb is handled. In rslms_rx_rll_est_req() the msg buffer passed from RSL is being used. In the case of IPA, the l2h is prepended by the 3 byte IPA header.
Since all code in lapdm.c that handles RSL message seem to assume, that msg->data == msg->l2h, length computation is done based on that assumption in some places. In addition, msgb_pull() is used in a way that would also lead to undefined results in this case, e.g. in msgb_pull_l2h() just the difference between l2h and l3h is pulled from the beginning (msg->data).
The main difficulty to find this, was that much msgb handling is done by manual access to the msgb fields. I'd really favor the use of the predefined macros/inlines instead of meddling around with the fields.
I've added a function msgb_pull_to_l3() to msgb.h which just skips over everything in front of l3 (and therefore invalidates l2h and l1h) and replaced all calls to msgb_pull_l2h() by calls to msgb_pull_to_l3(). In addition, I replaced manual l3 length adjustment by calls to msgb_trim(). That alone fixed the SABM issue described above. See the jerlbeck/fixes/lapd-sms branch for details.
But AFAICS there is still something to do:
- The remaining msgb_pull() need to be checked (at least the one in l2_ph_data_ind() looks suspicious to me. - L3 length computation should be done with the macros, how it is done in lapdm_send_ph_data_req() is broken. - Why does lapd_msg_ctx have a length field that is not used? - ladp_test.c should be extended to check other execution paths, too. I've tried it for dummy packets and I didn't get it working without failing assertions (see below) - How l2/data/.. in a msg are expected to be used/set should be documented somewhere. - It should be clarified, whether all abis driver should reset the msg to start with l2.
Cheers
Jacob
======
The following patch still breaks the assertions, at least changing the l3len computation in lapdm_send_ph_data_req() influences but doesn't fix it.
--- a/tests/lapd/lapd_test.c +++ b/tests/lapd/lapd_test.c @@ -123,8 +123,10 @@ static struct msgb *create_empty_msg(void) static struct msgb *create_dummy_data_req(void) { struct msgb *msg; + const int dummy_l1len = 3;
msg = msgb_from_array(dummy1, sizeof(dummy1)); + msgb_push(msg, dummy_l1len); rsl_rll_push_l3(msg, RSL_MT_DATA_REQ, 0, 0, 1); return msg; }
On Tue, Jan 28, 2014 at 03:15:06PM +0100, Jacob Erlbeck wrote:
Dear Andras,
could you please have a look at the code, the branch by Jacob and the outstanding issues.
The initial SABM message (not the retransmitted one after T200) has a non-zero length and ends with 3 bytes that have been taken from the end of the RSL EST REQ message. The MS does not answer to this. Interestingly the second SABM message that gets sent after T200 (2s) has a length field of 0 and no trailing garbage.
This is a very subtle bug. In the end SMS transfer works but it needed one timeout/retransmission cycle for it to work. It was also non obvious to find as lapd_core/lapdm use two/three different ways to describe the payload size (it is the kind of problems I want to avoid with the trx_id in libosmo-abis).
Jacob will be out of office this week but you might catch up with him next week to resolve the defects/invariant violations.
cheers holger
Holger Hans Peter Freyther wrote:
could you please have a look at the code, the branch by Jacob and the outstanding issues.
The initial SABM message (not the retransmitted one after T200) has a non-zero length and ends with 3 bytes that have been taken from the end of the RSL EST REQ message. The MS does not answer to this. Interestingly the second SABM message that gets sent after T200 (2s) has a length field of 0 and no trailing garbage.
This is a very subtle bug. In the end SMS transfer works but it needed one timeout/retransmission cycle for it to work. It was also non obvious to find as lapd_core/lapdm use two/three different ways to describe the payload size (it is the kind of problems I want to avoid with the trx_id in libosmo-abis).
dear holger, dear jacob,
i 'wiresharked' this bug. there is actually 3 bytes of payload within the first SABM message. with the jerlbeck/fixes/ladp-sms branch, the message is correct and accepted by the phone. (no retransmission)
it is clear to me that removing everything in front of the l3h (msg) must be performed. we can't just remove data between l2h and l3h and assume that there is nothing in front of the l2h. so i agree with the patch in that branch.
best regards,
andreas
Dear Andreas,
what do you think about the other issues? My patch has not fixed all possible problems that could arise from additional l1 headers (see below).
On 28.01.2014 15:15, Jacob Erlbeck wrote:
But AFAICS there is still something to do:
- The remaining msgb_pull() need to be checked (at least the one in
l2_ph_data_ind() looks suspicious to me.
- L3 length computation should be done with the macros, how it is done
in lapdm_send_ph_data_req() is broken.
- Why does lapd_msg_ctx have a length field that is not used?
- ladp_test.c should be extended to check other execution paths, too.
I've tried it for dummy packets and I didn't get it working without failing assertions (see below)
The following applies to most sub-projects, but I find it hard to diagnose and use function calls if their API is not documented beyond parameter names and types. Especially when passing a msgb to a function it would really help if there was some documentation telling about which parts of the msgb are used (as in "l2 is expected to point to the start of the XYZ message"):
- How l2/data/.. in a msg are expected to be used/set should be
documented somewhere.
Best wishes
Jacob
======
The following patch still breaks the assertions, at least changing the l3len computation in lapdm_send_ph_data_req() influences but doesn't fix it.
--- a/tests/lapd/lapd_test.c +++ b/tests/lapd/lapd_test.c @@ -123,8 +123,10 @@ static struct msgb *create_empty_msg(void) static struct msgb *create_dummy_data_req(void) { struct msgb *msg;
const int dummy_l1len = 3; msg = msgb_from_array(dummy1, sizeof(dummy1));msgb_push(msg, dummy_l1len); rsl_rll_push_l3(msg, RSL_MT_DATA_REQ, 0, 0, 1); return msg;}
hi jacob,
Jacob Erlbeck wrote:
- The remaining msgb_pull() need to be checked (at least the one in
l2_ph_data_ind() looks suspicious to me.
the msgb_pull() should actually pull the space between data and l2h (after adding 2), but it assumes that there is no l1 header in front. i suggest to remove the pull:
- msgb_pull(msg, 2); msg->l2h += 2;
no pull, because the space between data and l3h is removed afterwards in the same function.
- L3 length computation should be done with the macros, how it is done
in lapdm_send_ph_data_req() is broken.
what is actually broken? i see that l3_len is the complete msgb space between data and tail.
- Why does lapd_msg_ctx have a length field that is not used?
i guess it was used in early code (when osmocombb project started), but it is actually not required.
- ladp_test.c should be extended to check other execution paths, too.
I've tried it for dummy packets and I didn't get it working without failing assertions (see below) The following applies to most sub-projects, but I find it hard to diagnose and use function calls if their API is not documented beyond parameter names and types. Especially when passing a msgb to a function it would really help if there was some documentation telling about which parts of the msgb are used (as in "l2 is expected to point to the start of the XYZ message"):
- How l2/data/.. in a msg are expected to be used/set should be
documented somewhere.
each msg has l1h, l2h and l3h pointers. the logical way is (even if it is currently not implemented correctly):
frames that are received from lower layer 1 should have l2h set and may have l1 header or not. (layer 2 is not responsible of l1 header, so we should ignore it.) l3h must not be set, since layer 1 does not know about any l3 header start. the l2_ph_data_ind() function should parse the l2h, set l3h and remove everything up to l3h.
frames that are sent to lower layer 1 should have l2h and l3h set before sending to layer 1. see lapdm_send_ph_data_req().
The following patch still breaks the assertions, at least changing the l3len computation in lapdm_send_ph_data_req() influences but doesn't fix it.
--- a/tests/lapd/lapd_test.c +++ b/tests/lapd/lapd_test.c @@ -123,8 +123,10 @@ static struct msgb *create_empty_msg(void) static struct msgb *create_dummy_data_req(void) { struct msgb *msg;
const int dummy_l1len = 3; msg = msgb_from_array(dummy1, sizeof(dummy1));msgb_push(msg, dummy_l1len); rsl_rll_push_l3(msg, RSL_MT_DATA_REQ, 0, 0, 1); return msg;}
push the dummy_l1 header after pushing RLL header:
msg = msgb_from_array(dummy1, sizeof(dummy1)); rsl_rll_push_l3(msg, RSL_MT_DATA_REQ, 0, 0, 1); + msgb_push(msg, dummy_l1len);
- first the msg is allocated and filled with array. l3h is set. - rsl_rll_push_l3 pushes the layer 3 information element and the RLL header in front of the msg and sets l2h. - at least msgb_push adds the dummy_l1 header in front of everything. l1h is not set, but it is not relevant, since l1 header is ignored by lapdm.
best regards,
andreas
Hello Andreas
On 15.02.2014 10:25, Andreas Eversberg wrote:
Jacob Erlbeck wrote:
- L3 length computation should be done with the macros, how it is done
in lapdm_send_ph_data_req() is broken.
what is actually broken? i see that l3_len is the complete msgb space between data and tail.
The line uint8_t l3_len = msg->tail - msg->data; delivers the correct result, that's not the point. But here are 2 different things being mixed: - the l3 length definition: l3_len = msg->tail - msg->l3h (AKA msgb_l3len()) - the assumption that msg->data == msg->l3h Mixing things this way is IMO conceptually broken and hard to maintain. According to your statement below ("frames that are sent to lower layer 1 should have l2h and l3h set [...]"), l3h must have been set when lapdm_send_ph_data_req() is entered. The requirement data == l3h on the other hand is more than requested in general, and it would be helpful if it was documented and/or asserted.
- Why does lapd_msg_ctx have a length field that is not used?
i guess it was used in early code (when osmocombb project started), but it is actually not required.
So it can be removed completely?
- How l2/data/.. in a msg are expected to be used/set should be
documented somewhere.
each msg has l1h, l2h and l3h pointers. the logical way is (even if it is currently not implemented correctly):
frames that are received from lower layer 1 should have l2h set and may have l1 header or not. (layer 2 is not responsible of l1 header, so we should ignore it.) l3h must not be set, since layer 1 does not know about any l3 header start. the l2_ph_data_ind() function should parse the l2h, set l3h and remove everything up to l3h.
frames that are sent to lower layer 1 should have l2h and l3h set before sending to layer 1. see lapdm_send_ph_data_req().
What about putting such definitions somewhere central (e.g. wiki or top of msgb.h)?
push the dummy_l1 header after pushing RLL header:
msg = msgb_from_array(dummy1, sizeof(dummy1)); rsl_rll_push_l3(msg, RSL_MT_DATA_REQ, 0, 0, 1);
msgb_push(msg, dummy_l1len);
- first the msg is allocated and filled with array. l3h is set.
- rsl_rll_push_l3 pushes the layer 3 information element and the RLL
header in front of the msg and sets l2h.
- at least msgb_push adds the dummy_l1 header in front of everything.
l1h is not set, but it is not relevant, since l1 header is ignored by lapdm.
Thank you, that makes sense.
Jacob