Hi
Our plan is to offer an subscription to get our mensa menu via sms. If just one MS get the sms everything works fine. But if two MSs are online and OpenBSC sends the sms paralell (for each TS) something goes wrong. The first two parts for both SMS will be transmitted correctly. But than we receive a malformed packet from the BTS (see No. 103 from wrong-sms.xml). After that the MS seems to be "confused" or something else is wrong, because the following parts (and part 2) will be transmitted several times. This is because the MS don't ack the sms correctly. On a previous test we get 14 sms parts instead of just 5 parts: 1. part: 1 times 2. part: 2 times 3. part: 3 times 4. part: 4 times 5. part: 4 times
Attached are two gammu-logfiles. right-sms.xml: Only one MS is online and the sms is transmitted correctly. wrong-sms.xml: Two MSs are online. In the logfile you can only see one MS. Until packet No. 102 everything seems to be OK.
ATM i don't know were to search on the code. I don't think there is a sms bug. I think there is a memory bug, because the packet No 103 seems to be random (or contains previous data).
Maybe someone can give me an advice.
Thanks in advance Dennis Wehrle
Hi Dennis,
thanks for your bug report, I'd be happy to investigate it - but the first question is of course: Do you have the pcap log files of the Abis interface?
Also, regarding your suspicion of memory related issues: One idea might be to try the same test with openbsc under valgrind.
Regards, Harald
Hi Harald
Am 24.11.2011 15:45, schrieb Harald Welte:
Hi Dennis,
thanks for your bug report, I'd be happy to investigate it - but the first question is of course: Do you have the pcap log files of the Abis interface?
I've tested it once more and here are the abis log-files. I hope i have recorded everything (port 3002 and 3003). The wrong-sms.pcap and wrong-sms-gammu.xml are recorded at the same time.
Another test was to send the same multipart sms from one MS to the other (and vice versa) on the same time. This also fails. Additionally the 3310 complains about transmission error. But the other MS (Siemens S55) received several parts of the sms, but not the whole sms.
Also, regarding your suspicion of memory related issues: One idea might be to try the same test with openbsc under valgrind.
I will make the valgrind stuff next week.
Best regards and have a nice weekend, Dennis Wehrle
Hi
Today i had a closer look at the abis pcap file (which i sent on Friday). For me one thing is "strange" but i don't know if this is a normal behavior. Is it OK that sometimes a packet contains 2 sub-packets? For instance:
01. IPv4 02. TCP --------------- 03. IPA 04. RSL 05. CP-DATA 06. RP-DATA 07. SMS-DELIVER --------------- 08. IPA 09. RSL 10. CP-DATA 11. RP-DATA 12. SMS-DELIVER ----------------
As you can see No. 3-7 and 8-12 are both on the same frame. Each sub-packet contains a sms-part (but not for the same sms). Maybe this could be the problem, but i don't know. One (potentially) related issue is that the TIO (transaction identifier) is changing wrongly. The messages are sent as follows (from abis pcap):
01. SMS_1 part 1/5: TIO: 0 02. SMS_2 part 1/4: TIO: 0 03. SMS_1 part 2/5: TIO: 1 04. SMS_2 part 2/4: TIO: 2 (SMS_1) + TIO: 1 (SMS_2) 05. SMS_1 part 3/5: TIO: 2 (SMS_2) + TIO: 3 (SMS_1) 06. SMS_2 part 3/4: TIO: 4 (SMS_1) + TIO: 3 (SMS_2) 07. SMS_2 part 3/4: TIO: 4 (SMS_2) 08. SMS_1 part 4/5: TIO: 5 (SMS_2) + TIO: 5 (SMS_1) 09. SMS_2 part 4/4: TIO: 6 (SMS_1) + TIO: 6 (SMS_2) 10. SMS_2 part 4/4: TIO: 0 (SMS_2) 11. SMS_1 part 5/5: TIO: 1 (SMS_2) + TIO: 0 (SMS_1) 12. SMS_2 part 4/4: TIO: 2 (SMS_2) 13. SMS_1 part 5/5: TIO: 1 (SMS_1)
After messages 06, 09 and 13 i get a CP-ERROR message with cause 98: "Message not compatible with short message protocol state" And after messages 07, 08, and 12 i get a CP-ERROR message with cause 81: "Invalid Transaction Identifier value"
I hope i'm on the right path because the usage of valgrind was useless (for me).
Best regards Dennis Wehrle
Am 24.11.2011 15:45, schrieb Harald Welte:
Hi Dennis,
thanks for your bug report, I'd be happy to investigate it - but the first question is of course: Do you have the pcap log files of the Abis interface?
Also, regarding your suspicion of memory related issues: One idea might be to try the same test with openbsc under valgrind.
Regards, Harald
Hi,
AFAIK It should just not send them in // at all ... it all should be sequential (on the same connection, but still one after the other).
Cheers,
Sylvain
Hi
Thx for the answer. If i'm right, than the packets will be processed on the __handle_ts1_write-function on ipaccess.c (libosmo-abis). Here the IPA header is added. Then the msg is sent via the send-function. Until here everything looks good. This assumption is made on basis of the debug output for DLMI. The degub output for the msg->l2h / msg-data contains only one complete packet-stack (IPA + RSL + CP-DATA + RP-DATA + SMS-DELIVER). Therefore i think until here everything is OK, except that the send-function should return the number of bytes which are sent, but the value is always too large for three. ATM i don't know whats happen on the send-function. Or maybe lcr is the problem, but i don't know either. Maybe someone can give me an advice where the tcp packets are build up.
Best regards Dennis Wehrle
Am 28.11.2011 17:14, schrieb Sylvain Munaut:
Hi,
AFAIK It should just not send them in // at all ... it all should be sequential (on the same connection, but still one after the other).
Cheers,
Sylvain
On 11/29/2011 07:42 PM, Dennis Wehrle wrote:
Hi
until here everything is OK, except that the send-function should return the number of bytes which are sent, but the value is always too large for three.
so it returns less than what we wrote? You can set a delay timer to not things as fast as possible and probably get IPA packets into different IP packets.
(thunderbird will wrap this, you can try to apply it by hand) diff --git a/src/input/ipaccess.c b/src/input/ipaccess.c index 2621290..24e85e3 100644 --- a/src/input/ipaccess.c +++ b/src/input/ipaccess.c @@ -512,6 +512,9 @@ static int __handle_ts1_write(struct osmo_fd *bfd, struct e1inp_line *line) DEBUGP(DLMI, "TX %u: %s\n", ts_nr, osmo_hexdump(msg->l2h, msgb_l2len(msg)));
ret = send(bfd->fd, msg->data, msg->len, 0); + if (ret != msg->len) + LOGP(DLINP, LOGL_ERROR, "Failed to write: %d/%d errno: %d\n", + ret, msgb_l2len(msg), errno); msgb_free(msg);
/* set tx delay timer for next event */ @@ -558,7 +561,7 @@ struct e1inp_driver ipaccess_driver = { .want_write = ts_want_write, .line_update = ipaccess_line_update, .close = ipaccess_close, - .default_delay = 0, + .default_delay = 50000, };
/* callback of the OML listening filedescriptor */
ATM i don't know whats happen on the send-function. Or maybe lcr is the problem, but i don't know either. Maybe someone can give me an advice where the tcp packets are build up.
TCP packets are built in the kernel. with nagle the kernel waits a bit to fill the frame[1].
On 11/28/2011 04:11 PM, Dennis Wehrle wrote:
Hi
Today i had a closer look at the abis pcap file (which i sent on Friday). For me one thing is "strange" but i don't know if this is a normal behavior. Is it OK that sometimes a packet contains 2 sub-packets? For instance:
this is valid behavior (and you see it during RSL/OML bring up), one thing how this can break (and would break OpenBSC/libosmo-abis) is that we do:
1.) ipa_header = recv(3 bytes) 2.) data = recv(ipa_header->len)
this would fail in case header and data do not appear on the same frame. So the esoteric question is do you have MTU changes in your network (this assumes that the ipaccess would have similar broken code we have)?
holger
Hi
until here everything is OK, except that the send-function should
return the
number of bytes which are sent, but the value is always too large
for three.
so it returns less than what we wrote?
No msg->len and ret are equal. I think it was my fault. I had count the bytes manually. For example:
<001b> ipaccess.c:512 TX 2: 03 01 01 28 02 03 0b 00 02 49 04 <001b> ipaccess.c:514 msg->len 14 <001b> ipaccess.c:520 bfd->fd: 12 <001b> ipaccess.c:521 ret: 14
On line 512 i count manually 15 bytes, but the msg->len and ret value is 18. I think this is OK, than the bytes on line 512 doesn't contain the IPA protocol bytes. I could verify this on wireshark. There are 3 bytes more for the IPA header (00 0b 00). So i would say that i've misinterpreted that. On line 510 the function ipaccess_prepend_header is called and therefore i thought that the bytes for the header is included at the output. I didn't concerned that only the msg->l2h is printed.
You can set a delay timer to not things as fast as possible and
probably get IPA packets into different IP packets.
Unfortunately this doesn't solve the problem.
this is valid behavior (and you see it during RSL/OML bring up), one
thing how
Ok. If this is a valid behavior, maybe i'm on the wrong track. Therefore i don't know how to investigate the problem further.
this can break (and would break OpenBSC/libosmo-abis) is that we do:
1.) ipa_header = recv(3 bytes) 2.) data = recv(ipa_header->len)
this would fail in case header and data do not appear on the same
frame. So
the esoteric question is do you have MTU changes in your network
(this assumes
that the ipaccess would have similar broken code we have)?
No the standard MTU (1500 Bytes) is set. Even if i connect it directly on my pc via a switch the problem occurs. My abis pcap file was created this way.
Best regards Dennis Wehrle
Hi,
I didn't look at the deug, but could you try the jolly/sms branch of openbsc ?
The SMS part has been rewritten and maybe just trying the new experimental code will fix your issue.
Cheers,
Sylvain
Hi
Thx for you answers.
@Sylvain I have tried to use the jolly/sms branch. But with this code i can't send multipart sms from vty.
@Harald I don't have the pcap log-file. I have never done this, but this is my next step. I will also test it with valgrind.
Best Regards Dennis
P.S.: Sorry for (possible) double posting ....
Am 24.11.2011 15:46, schrieb Sylvain Munaut:
Hi,
I didn't look at the deug, but could you try the jolly/sms branch of openbsc ?
The SMS part has been rewritten and maybe just trying the new experimental code will fix your issue.
Cheers,
Sylvain