This is merely a historical archive of years 2008-2021, before the migration to mailman3.
A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/OpenBSC@lists.osmocom.org/.
Neels Hofmeyr nhofmeyr at sysmocom.de

It's not really like we have properly reached the LU milestone yet.

I observe two MM failure modes: one fails after the first reply the MSC sends to the UE, the other fails one or two messages after that (depending on how I count them). Let's call them MM.1 and MM.2.

When I power cycle the hNodeB, I randomly get one of these two failures. I tend to have to power cycle it, because it seems that SCTP stops working (see below).

MM.1)

    UE   hNodeB   osmo-hnbgw   osmo-cscn
    |                             |
    | --- LU REQ ---------------> |
    | <--------- ID REQ (IMSI) -- |
    |       [seconds pass]        |
    | <------- LU REJ (timeout)-- |

    missing/expecting:
    | --- ID RESP --------------> |

This is what I called "probably a timing issue of the reply towards the UE".

MM.2)

    UE   hNodeB   osmo-hnbgw   osmo-cscn
    |                             |
    | --- LU REQ ---------------> |
    | <--------- ID REQ (IMSI) -- |
    | --- ID RESP --------------> |
    | <------------- LU ACCEPT -- |
    | <--------------- MM INFO -- |
    |       [seconds pass]        |
    | <------- LU REJ (timeout)-- |

    missing/expecting:
    | --- TMSI REALLOC COMPL----> |

SCTP) When one of the above failures has occurred, I no longer get the HEARTBEAT/HEARTBEAT_ACK messages that otherwise go through SCTP roughly every 6 seconds. Instead, wireshark shows a bunch of errors and retransmissions: "Destination unreachable (Protocol unreachable)", or even "ABORT [Malformed Packet]", or "ABORT" / "Protocol violation" with Cause Information "Association exceeded its max retans count" [sic: "retans"]. It seems SCTP itself has stopped working in that case.

GW-sctp_recvmsg) In addition to that, I get an osmo-hnbgw failure mode if, after testing the above cases (it doesn't matter which one), I let a few minutes pass.
After a little while, I get

    <0000> hnbgw.c:171 Error during sctp_recvmsg()

(-1 returned by sctp_recvmsg(), impossible to qualify further short of heading into kernel debugging.)

During local testing of the same situation with hnb-test via loopback (127.0.0.1 as well as the same machine's "public" IP), this SCTP error doesn't occur, and consequently osmo-hnbgw doesn't segfault. When I run hnb-test from a different box and connect it to the osmo-hnbgw running on my machine, it also works without problems. Only when the hNodeB does the same does the sctp_recvmsg() error occur.

GW.segf) Shortly after the SCTP error, osmo-hnbgw segfaults. This is probably due to wrong or missing osmo-fd/timer cleanup after the sctp_recvmsg() error.

MSC) I also get crashes of the MSC (i.e. osmo-cscn) in conjunction with a LU reject due to timeout and the invalidation of a subscriber conn. One time I got a CPU eater where two rb tree nodes pointed at each other via rb_left, and rb_erase() kept looping through those two. Mostly I get a plain segfault. This is not as reproducible as the others, though.

Solutions?
- Connect the hNodeB to a proper timing source? So far no GPS is connected.
- SCTP debugging?
- Rather concentrate on further development using hNodeB mocking test programs?

(Obviously we should catch the segfaults in the osmo code, but they are not the real problem. Once they are solved, the basic messaging problems will still exist.)

~Neels

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://lists.osmocom.org/pipermail/openbsc/attachments/20160229/8ed2f90a/attachment.bin>
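Regarding the GW-sctp_recvmsg and GW.segf observations: the -1 from sctp_recvmsg() carries no detail by itself, but the call also sets errno, so saving errno right after the call and logging strerror() would at least tell which class of failure occurred (e.g. ECONNRESET vs. EAGAIN). The suspected missing osmo-fd/timer cleanup similarly suggests a teardown order: unregister the fd and cancel timers before the per-hNodeB context is freed. The C sketch below illustrates both ideas under stated assumptions. `rx_action`, `classify_rx()`, `struct hnb_conn`, `hnb_conn_teardown()` and `hnb_conn_handle_rx()` are hypothetical names, not osmo-hnbgw's actual code, and the integer flags merely stand in for libosmocore's osmo_fd registration and osmo_timer state.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome of one receive attempt on an Iuh socket. */
enum rx_action {
	RX_OK,       /* rc > 0: a message arrived, process it */
	RX_RETRY,    /* transient: nothing to do until the next poll */
	RX_TEARDOWN, /* peer shut down or hard error: release the connection */
};

/* Classify the return value of a receive call such as sctp_recvmsg():
 * bytes read, 0 when the peer has shut down (one-to-one style socket),
 * or -1 with errno set. 'err' must be errno saved right after the call. */
static enum rx_action classify_rx(ssize_t rc, int err)
{
	if (rc > 0)
		return RX_OK;
	if (rc == 0)
		return RX_TEARDOWN;
	if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
		return RX_RETRY;
	return RX_TEARDOWN;
}

/* Hypothetical per-hNodeB state. The flags stand in for a registered
 * osmo_fd and a scheduled osmo_timer in libosmocore-based code. */
struct hnb_conn {
	int fd;            /* socket, -1 once closed */
	int fd_registered; /* assumption: would be an osmo_fd_register()ed fd */
	int timer_pending; /* assumption: would be an active osmo_timer */
};

/* Undo everything that could still call back into this connection before
 * the caller frees it; a stale fd in the select loop or a timer firing
 * later would otherwise touch freed memory. */
static void hnb_conn_teardown(struct hnb_conn *conn)
{
	assert(conn != NULL);
	conn->timer_pending = 0;     /* libosmocore code: osmo_timer_del() */
	if (conn->fd_registered)
		conn->fd_registered = 0; /* libosmocore code: osmo_fd_unregister() */
	if (conn->fd >= 0) {
		close(conn->fd);
		conn->fd = -1;
	}
}

/* Log the actual errno instead of only "-1", then tear down if needed. */
static enum rx_action hnb_conn_handle_rx(struct hnb_conn *conn,
					 ssize_t rc, int err)
{
	enum rx_action act = classify_rx(rc, err);

	if (act == RX_TEARDOWN) {
		if (rc < 0)
			fprintf(stderr, "receive failed: %s (errno %d)\n",
				strerror(err), err);
		else
			fprintf(stderr, "peer shut down the association\n");
		hnb_conn_teardown(conn);
	}
	return act;
}
```

In a read callback, one would save errno immediately after sctp_recvmsg(), pass it to such a handler, and return early after a teardown so that no code path touches the context afterwards. Keeping classification separate from teardown makes it easy to change which errno values count as transient without touching the cleanup order.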