Hi list,
I've just seen my first Location Update Accept from our UMTS UE+femtoCell using osmo-cscn as IuCS network core :D
The Identity Request (IMSI) that the UE previously left unanswered is now being answered, and I am getting a successful Location Update Accept from osmo-cscn after that.
The reason why it didn't work last time isn't entirely clear. All I have changed since then is the *test* program that simulates the UE+hNodeB, and with that I merely verified that osmo-cscn works (as far as LU is concerned).
One hint: Daniel told me that the IuPS setup had at one point stopped replying with a message that used to work before, and has since started working again. So it might've been a timing issue in the reply towards the UE.
Next steps:
I will briefly try to run Daniel's 3G SGSN together with CSCN to see whether the premature Iu Release (?) he sees still happens when CS is connected successfully.
There is also still a segfault in osmo-hnbgw, triggered by reconnecting a second time with our hnb-test that simulates an hNodeB. That's next on the hacking todo list.
According to the specs, proper Authentication is mandatory for UMTS, which osmo-cscn should initiate. That's not happening yet. It seems our testing UE is fine with that, so I'll see when I'll get to that.
~Neels
We haven't really reached the LU milestone properly yet.
I observe two MM failure modes: one fails after the first reply the MSC sends to the UE, the other fails one or two messages after that (depending on how I count them). Let's call them MM.1 and MM.2.
When I power cycle the hNodeB, I randomly get one of these two failures. I tend to have to power cycle it, because it seems that SCTP stops working (see below).
MM.1)
UE   hNodeB   osmo-hnbgw                    osmo-cscn
|      |          | --- LU REQ ---------------> |
                  | <--------- ID REQ (IMSI) -- |
                  |       [seconds pass]        |
                  | <------- LU REJ (timeout)-- |

missing/expecting:
                  | --- ID RESP --------------> |

This is what I called "probably a timing issue of the reply towards the UE".
MM.2)
UE   hNodeB   osmo-hnbgw                    osmo-cscn
|      |          | --- LU REQ ---------------> |
                  | <--------- ID REQ (IMSI) -- |
                  | --- ID RESP --------------> |
                  | <------------- LU ACCEPT -- |
                  | <--------------- MM INFO -- |
                  |       [seconds pass]        |
                  | <------- LU REJ (timeout)-- |

missing/expecting:
                  | --- TMSI REALLOC COMPL----> |
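
The two failure modes differ only in how far the uplink side of the exchange gets. A minimal sketch of that comparison, assuming the uplink sequence is exactly the three messages drawn in the diagrams; the string labels and the function name lu_first_missing() are made up for this illustration and do not come from the osmo code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Uplink messages osmo-cscn waits for during a Location Update, in order.
 * These are labels taken from the ladder diagrams, not osmo identifiers. */
static const char *const lu_expected_from_ue[] = {
	"LU REQ",
	"ID RESP",
	"TMSI REALLOC COMPL",
};

#define LU_EXPECTED_COUNT \
	(sizeof(lu_expected_from_ue) / sizeof(lu_expected_from_ue[0]))

/* Return the first expected uplink message that was not observed,
 * or NULL if the complete sequence was seen. */
static const char *lu_first_missing(const char *const *seen, size_t n_seen)
{
	size_t i;

	assert(seen != NULL || n_seen == 0);
	for (i = 0; i < LU_EXPECTED_COUNT; i++) {
		if (i >= n_seen || strcmp(seen[i], lu_expected_from_ue[i]) != 0)
			return lu_expected_from_ue[i];
	}
	return NULL;
}
```

Fed with the uplink messages of MM.1 (only "LU REQ") it reports "ID RESP" as missing; for MM.2 ("LU REQ", "ID RESP") it reports "TMSI REALLOC COMPL".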
SCTP)
When one of the above failures has occurred, I no longer get the HEARTBEAT/HEARTBEAT_ACK messages that go through SCTP roughly every 6 seconds. Instead, wireshark shows a bunch of errors and retransmissions "Destination unreachable (Protocol unreachable)" or even "ABORT [Malformed package]" or "ABORT" / "Protocol violation" with Cause Information "Association exceeded its max retans count" [sic: "retans"]
It seems SCTP itself has stopped working in that case.
GW-sctp_recvmsg)
In addition to that, I get an osmo-hnbgw failure mode if, after testing the above cases (it doesn't matter which one), I let a few minutes pass.
After a little while, I get

  <0000> hnbgw.c:171 Error during sctp_recvmsg()

(-1 returned by sctp_recvmsg(); impossible to qualify further short of heading into kernel debugging)
During local testing of the same situation with hnb-test via loopback (127.0.0.1 as well as the same machine's "public" IP), this SCTP error doesn't occur, and consequently osmo-hnbgw doesn't segfault.
When I run hnb-test from a different box and connect it to the osmo-hnbgw running on my machine, it also works without problems. The sctp_recvmsg() error occurs only when the real hNodeB does the same.
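
One cheap way to narrow down a -1 from sctp_recvmsg() before resorting to kernel debugging is to log errno, which the call is documented to set on failure, like recvmsg(2). A minimal sketch of such a log helper; describe_recv_result() is a hypothetical name, and which errno values count as transient is an assumption of this example, not something taken from hnbgw.c:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format a log line for the result of a recv-style call.
 *   rc:  return value of the call, e.g. from sctp_recvmsg()
 *   err: errno as captured immediately after the call
 * Returns whatever snprintf() returns. */
static int describe_recv_result(char *buf, size_t len, int rc, int err)
{
	assert(buf != NULL && len > 0);
	if (rc > 0)
		return snprintf(buf, len, "received %d bytes", rc);
	if (rc == 0)
		return snprintf(buf, len, "zero-length read (peer may have shut down)");
	/* Assumption: these three are worth retrying, everything else is fatal. */
	if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
		return snprintf(buf, len, "transient error, retry: %s", strerror(err));
	return snprintf(buf, len, "fatal error, errno=%d: %s", err, strerror(err));
}

/* Intended use (not compiled here, needs libsctp):
 *   rc = sctp_recvmsg(fd, msg, msg_len, NULL, NULL, &sinfo, &flags);
 *   err = errno;   // capture before any other call can overwrite it
 *   describe_recv_result(line, sizeof(line), rc, err);
 */
```

Capturing errno right after the call matters, because a logging function called in between may overwrite it.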
GW.segf)
Shortly after the SCTP error, osmo-hnbgw segfaults. This is probably due to wrong/missing osmo-fd/timer cleanup after the sctp_recvmsg() error code.
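
If missing cleanup is indeed the cause, the fix would follow the usual teardown pattern: stop the timers, take the fd out of the select loop, close it, and mark everything as gone so that a second teardown does nothing. A self-contained sketch of that pattern; struct conn_ctx and conn_teardown() are hypothetical stand-ins, the real osmo-hnbgw context looks different, and the real code would call osmo_timer_del() and osmo_fd_unregister() where the comments indicate:

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical per-connection state, for illustration only. */
struct conn_ctx {
	int fd;             /* socket, or -1 once closed */
	bool fd_registered; /* true while the fd is in the select loop */
	bool timer_active;  /* true while a timer for this connection is pending */
};

/* Tear down a connection after a fatal receive error. Safe to call twice. */
static void conn_teardown(struct conn_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->timer_active) {
		/* real code: osmo_timer_del() on the connection's timer */
		ctx->timer_active = false;
	}
	if (ctx->fd_registered) {
		/* real code: osmo_fd_unregister() before closing the fd */
		ctx->fd_registered = false;
	}
	if (ctx->fd >= 0) {
		close(ctx->fd);
		ctx->fd = -1; /* prevents a double close on re-entry */
	}
}
```

The point of the flags and the -1 marker is that a late timer callback or a second error path cannot touch a closed fd or freed state.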
MSC)
I also get crashes of the MSC, in the form of the CSCN, in conjunction with an LU reject due to timeout and the invalidation of a subscriber conn. One time I got a CPU eater where two rb tree nodes pointed at each other via rb_left, and rb_erase kept looping through those two. Mostly I get a plain segfault. This is not as reproducible as the others, though.
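
To turn the endless rb_erase loop into an immediate abort with a usable core dump, a debug build could check the left-child chain for a cycle before erasing. A sketch using Floyd's two-pointer cycle detection; struct node is a hypothetical minimal stand-in for the real rb tree node, and both function names are invented for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal node; the real rb tree node has more fields. */
struct node {
	struct node *left;
	struct node *right;
};

/* Return true if following ->left from start ever revisits a node. */
static bool left_chain_has_cycle(const struct node *start)
{
	const struct node *slow = start; /* advances one step per round */
	const struct node *fast = start; /* advances two steps per round */

	while (fast != NULL && fast->left != NULL) {
		slow = slow->left;
		fast = fast->left->left;
		if (slow == fast)
			return true;
	}
	return false;
}

/* Debug hook: abort instead of spinning forever on a corrupted tree. */
static void assert_left_chain_sane(const struct node *start)
{
	assert(!left_chain_has_cycle(start));
}
```

This only covers the rb_left corruption described above. It would not find the cause, but it would stop the process at the point where the corruption is first visible.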
Solutions?
- connect the hNodeB with a proper timing source? So far no GPS is connected.
- SCTP debugging?
- Rather concentrate on further development using hNodeB mocking test programs? (Obviously catch the segfaults in the osmo code, but they are not the real problem. Once they are solved, the basic messaging problems will still exist.)
~Neels