Trying to get the HNBGW to communicate with the MSC. I receive Iuh from the femto cell as usual and forward it to the MSC, but end up in an infinite loop where osmo-stp and osmo-hnbgw send the same message back and forth to each other.
osmo-hnbgw's local IuCS PC is 23, the MSC is PC 1.
It seems that when osmo-hnbgw receives a message for PC 23, i.e. for itself, it thinks that its own PC isn't local, finds its own as-clnt-CS in the route table, and feeds the message to m3ua_tx_xua_as(). osmo-stp then sends it back. Repeat.
It seems overkill that a "client" like osmo-hnbgw should be capable of routing in the first place. It could switch off routing entirely and receive & handle every message it receives itself. But assuming that this were nonsense:
IIUC it should rather decide that its own PC 23 is local, in:
	bool osmo_ss7_pc_is_local(struct osmo_ss7_instance *inst, uint32_t pc)
	{
		OSMO_ASSERT(ss7_initialized);
		if (pc == inst->cfg.primary_pc)
			return true;
		/* FIXME: Secondary and Capability Point Codes */
		return false;
	}
Apparently this global primary_pc is set in osmo_sccp_simple_client(). Since in the osmo-hnbgw I'm (so far) creating two simple clients with distinct PCs (IuCS is 23, IuPS is 24), the primary_pc is overwritten with 24. After that we always fail to notice 23 as a local PC and bounce messages for self right back down the SCCP stack, because apparently we find a route for 23 instead.
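To make the failure mode concrete, here is a minimal C model of it. All names are hypothetical and the structs are drastically reduced; it only mirrors the logic described above (both simple clients share one SS7 instance, and only the primary PC counts as local). It is not the libosmo-sigtran code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified SS7 instance: it holds a single
 * primary point code, like inst->cfg.primary_pc in the real code. */
struct model_ss7_instance {
	uint32_t primary_pc;
};

/* Models only the relevant side effect of osmo_sccp_simple_client():
 * every call on the shared instance overwrites its primary PC. */
static void model_simple_client(struct model_ss7_instance *inst, uint32_t local_pc)
{
	assert(inst != NULL);
	inst->primary_pc = local_pc;
}

/* Same logic as osmo_ss7_pc_is_local(): only the primary PC counts. */
static bool model_pc_is_local(const struct model_ss7_instance *inst, uint32_t pc)
{
	assert(inst != NULL);
	return pc == inst->primary_pc;
}
```

Calling model_simple_client(&inst, 23) and then model_simple_client(&inst, 24) leaves model_pc_is_local(&inst, 23) returning false, which is the state in which messages for PC 23 get routed instead of delivered locally.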
A "matching" route is recognised by osmo_ss7_route_find_dpc():

	if ((dpc & rt->cfg.mask) == rt->cfg.pc)

My own local client gets entered as a route with rt->cfg.mask == 0 and rt->cfg.pc == 0, so for any dpc, (dpc & 0) == 0 and the route always matches. It seems awfully easy to create a fatal infinite loop flooding the network and CPU -- simply send a message to a DPC that neither side sees as local. Anyway...
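A stripped-down C model of the match test, with hypothetical names, shows why an entry with mask 0 and PC 0 acts as a catch-all default route. The first-match lookup order is an assumption of this sketch, not necessarily how osmo_ss7_route_find_dpc() orders its table:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route table entry: a PC plus a mask, as in rt->cfg. */
struct model_route {
	uint32_t pc;
	uint32_t mask;
	const char *name;
};

/* Same comparison as in osmo_ss7_route_find_dpc(). With mask == 0 and
 * pc == 0 the left side is always 0, so the entry matches every DPC. */
static bool model_route_matches(const struct model_route *rt, uint32_t dpc)
{
	return (dpc & rt->mask) == rt->pc;
}

/* Return the first matching entry, or NULL if none matches. */
static const struct model_route *model_route_find_dpc(const struct model_route *tbl,
						      size_t n, uint32_t dpc)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (model_route_matches(&tbl[i], dpc))
			return &tbl[i];
	return NULL;
}
```

With a table holding a specific route to PC 1 (14-bit mask 0x3fff) plus the mask-0 client entry, any DPC other than 1, including our own 23, lands on the catch-all entry.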
Which way should we go to serve more than one PC in a program? It appears that either an osmo_sccp_simple_client() call's local PC or an osmo_sccp_user_bind_pc()'s PC should be stored and found as a local PC ... or both.
Is the "right" way to have two PCs: having two osmo_sccp_simple_client()s with an sccp_user each; or rather one osmo_sccp_simple_client with two users?
Or, should we for the OsmoHNBGW just have one osmo_sccp_simple_client() and one user, i.e. only one PC to receive both IuCS and IuPS messages? All we do is feed both of them out to Iuh anyway... (This seems an easy way out of the multi PC problem, but our MSC wants to serve one PC for BSSAP=2G and one PC for RANAP=3G, so we will need to solve the multi-PC problem anyway.)
There are many directions I could head in; is any one better than the others?
Thanks! ~N
On Wed, Jun 21, 2017 at 05:12:37AM +0200, Neels Hofmeyr wrote:
It seems overkill that a "client" like osmo-hnbgw should be capable of routing in the first place. It could switch off routing entirely and receive & handle every message it receives itself. But assuming that this were nonsense:
I would rather not restrict this. There are many possible useful configurations, and disabling PC routing "by force" completely right now, before we have any actual practical deployments of this could bite us in the back.
Apparently this global primary_pc is set in osmo_sccp_simple_client().
It is not truly global, but specific to the SS7 instance.
Since in the osmo-hnbgw I'm (so far) creating two simple clients with distinct PCs (IuCS is 23, IuPS is 24), the primary_pc is overwritten with 24.
This is where I think it is broken. The simple_client API is intended for a simple use case, and not for more complex configurations.
I think you have the following options:
a) use different ss7_instances, which is probably the right thing if you really want to assume that MSC and SGSN live in completely separate SS7 networks with no routing/STP in between. At that point, even the point codes may very well no longer be unique, so having two SS7 instances would completely isolate the two parts.
b) add the missing 'secondary point codes' and then have 23 as primary and 24 as secondary, or whatever
c) do away with the two separate signaling connections for PS and CS domain. I would assume that an operator typically has one signaling network, with unique point codes and routing in place. So the HNB-GW establishes M3UA to some kind of STP (or multiple STPs in fail-over mode) and then simply lets SCCP do its job of routing the respective messages to either the SGSN or the MSC. The HNB-GW then has only a single point code in that scenario.
d) move away from simple-client and introduce the ss7_vty with all its related configuration into the hnb-gw. This is required sooner or later anyway, as even a "simple" client in a real-world scenario would normally have M3UA links to multiple STPs for redundancy or load-sharing reasons.
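Option b) could look roughly like the following minimal sketch, which extends the pc_is_local() logic with a short list of secondary PCs. The struct layout, the fixed-size array and the limit of 4 are assumptions for illustration only, not existing libosmo-sigtran fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_SECONDARY_PC 4	/* arbitrary limit chosen for this sketch */

/* Hypothetical instance config: one primary PC plus secondary PCs. */
struct model_ss7_cfg {
	uint32_t primary_pc;
	uint32_t secondary_pc[MODEL_MAX_SECONDARY_PC];
	size_t num_secondary_pc;
};

/* Add a secondary PC; returns false if the list is full. */
static bool model_add_secondary_pc(struct model_ss7_cfg *cfg, uint32_t pc)
{
	assert(cfg != NULL);
	if (cfg->num_secondary_pc >= MODEL_MAX_SECONDARY_PC)
		return false;
	cfg->secondary_pc[cfg->num_secondary_pc++] = pc;
	return true;
}

/* Like osmo_ss7_pc_is_local(), but also checking the secondary PCs
 * that the FIXME in the original function refers to. */
static bool model_pc_is_local(const struct model_ss7_cfg *cfg, uint32_t pc)
{
	assert(cfg != NULL);
	if (pc == cfg->primary_pc)
		return true;
	for (size_t i = 0; i < cfg->num_secondary_pc; i++)
		if (pc == cfg->secondary_pc[i])
			return true;
	return false;
}
```

With primary PC 23 and secondary PC 24, both IuCS and IuPS destinations would be recognised as local, while PC 1 (the MSC) still gets routed.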
The hack of having two simple clients (and two separate M3UA connections at all) was introduced because we didn't have an STP at the time and still wanted to make progress on IuCS+IuPS. So my favorite would be option 'c'; if not that, option 'a', and only as a last resort option 'b'. 'd' can be done as an incremental step on top of any of the three above.
A "matching" route is recognised by osmo_ss7_route_find_dpc():

	if ((dpc & rt->cfg.mask) == rt->cfg.pc)

My own local client gets entered as a route with rt->cfg.mask == 0 and rt->cfg.pc == 0, so for any dpc, (dpc & 0) == 0 and the route always matches. It seems awfully easy to create a fatal infinite loop flooding the network and CPU -- simply send a message to a DPC that neither side sees as local. Anyway...
That's good to point out and should certainly be addressed. Unfortunately I think the hop counter information element is not mandatory :/
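For illustration, a minimal sketch of hop-counter loop protection in C. It assumes an initial value of 15 and a decrement on every forwarding hop, discarding the message once the counter reaches zero; all names are hypothetical. Messages that do not carry the counter at all remain unprotected, which is exactly the gap when the IE is optional:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INITIAL_HOP_COUNTER 15	/* assumed initial value for this sketch */

/* Hypothetical message with an optional hop counter. */
struct model_msg {
	bool has_hop_counter;
	uint8_t hop_counter;
};

/* Called whenever a node forwards (routes) the message instead of
 * consuming it. Returns false if the message must be discarded. */
static bool model_forward_hop(struct model_msg *msg)
{
	assert(msg != NULL);
	if (!msg->has_hop_counter)
		return true;	/* no counter: a loop cannot be detected this way */
	if (msg->hop_counter == 0)
		return false;	/* already exhausted */
	msg->hop_counter--;
	return msg->hop_counter > 0;	/* reaching zero means: discard */
}
```

In the HNBGW/STP ping-pong described above, a message carrying the counter would be dropped after 14 forwarding hops instead of looping forever.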
Which way to go to serve more than one PC in a program? It appears that either
It took me ages yesterday to figure out why the described message loop was happening. Earlier today I experienced the stark opposite: by simply setting both hnbgw clients to the same local PC, IuCS started working immediately :)
Of course it's still nonsense to have two clients with the same point code, so IuPS isn't working yet. The earlier questions are still valid.
The great success here: I have for the first time subscribed a UE via 3G using the AoIP code. USSD works, SMS works, voice calls work! Yay!!
Next: fix IuPS, and go on to test parallel operation of 2G + 3G. Exciting.
Apparently this global primary_pc is set in osmo_sccp_simple_client().
It is not truly global, but specific to the SS7 instance.
Yes, I see more clearly now that osmo_sccp_simple_client() calls osmo_ss7_instance_find_or_create(ctx, 1), so both simple clients end up using the same SS7 instance.
Instead of separate instances, I'm trying to go with:
c) do away with the two separate signaling connections for PS and CS domain. I would assume that an operator typically has one signaling network, with unique point codes and routing in place.
A problem I hit is that we send:
	if (cnlink->is_ps)
		domain = RANAP_CN_DomainIndicator_ps_domain;
	else
		domain = RANAP_CN_DomainIndicator_cs_domain;

	msg = ranap_new_msg_reset(domain, &cause);

	return osmo_sccp_tx_unitdata_msg(cnlink->sccp_user,
					 &cnlink->gw->sccp_local_addr,
					 &cnlink->remote_addr,
					 msg);
when T_RafC times out. So we send a CS or PS domain RANAP RESET message towards the core network when this timer for a CN link expires.
I will now have one SCCP link with one PC for the HNBGW, and will thus send both CS and PS domain RESET messages when this T_RafC timer expires. I can't claim to have understood this RESET message, though; I haven't seen it happen yet.
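A minimal sketch of what that could look like with a single link: on T_RafC expiry, one RESET per CN domain, each addressed to that domain's peer. Everything here is a hypothetical stand-in (the real code would use ranap_new_msg_reset() and osmo_sccp_tx_unitdata_msg()); actual sending is replaced by recording the destination, and SGSN PC 2 is just an example value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RANAP_CN_DomainIndicator_*. */
enum model_cn_domain { MODEL_DOMAIN_CS, MODEL_DOMAIN_PS, MODEL_NUM_DOMAINS };

struct model_hnbgw {
	uint32_t local_pc;			/* the single HNBGW point code */
	uint32_t remote_pc[MODEL_NUM_DOMAINS];	/* MSC for CS, SGSN for PS */
	/* record of what was "sent", instead of real transmission */
	uint32_t sent_to[MODEL_NUM_DOMAINS];
	enum model_cn_domain sent_domain[MODEL_NUM_DOMAINS];
	size_t num_sent;
};

/* Record a RESET for one domain, addressed to that domain's peer. */
static void model_tx_reset(struct model_hnbgw *gw, enum model_cn_domain domain)
{
	assert(gw->num_sent < MODEL_NUM_DOMAINS);
	gw->sent_to[gw->num_sent] = gw->remote_pc[domain];
	gw->sent_domain[gw->num_sent] = domain;
	gw->num_sent++;
}

/* With one link and one local PC, T_RafC expiry triggers one RESET per
 * CN domain instead of one RESET per cnlink. */
static void model_on_t_rafc_expiry(struct model_hnbgw *gw)
{
	gw->num_sent = 0;
	for (int d = 0; d < MODEL_NUM_DOMAINS; d++)
		model_tx_reset(gw, (enum model_cn_domain)d);
}
```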
This reset message is the only thing where we compose the domain in the HNBGW. Two functions in ranap_msg_factory.c also set a domain indicator, but these are not used in the hnbgw, only by libosmo-ranap users.
Both SGSN and MSC can send their RANAP to the same PC, because they have already encoded the domain indicator in their RANAP messages. We could technically still tell apart who sent it from the SCCP originating PC.
(I also tried to have two SCCP users with the same PC on one sccp_instance, which doesn't work because a) the code prevents me from registering two users on the same PC and b) it doesn't make sense anyway; wishful thinking was that a conn_id would trace replies back to the right user and we'd magically know whether it came from the MSC or the SGSN, which might even work in some cases but is surely not what we want, and apparently don't even need at all.)
I still wonder the same as with a BSC connecting to the MSC via an STP: we never really know whether the MSC or SGSN are actually present. When we have a link to the STP but the SGSN goes up in smoke, nothing notifies us of a connection being cleared. The STP will fail to route our messages, but we can't really get notified of a link change. IIUC we will only receive info on the first leg to the STP and not beyond.
So far it's not that easy to figure out how to properly use SCCP with an STP in the middle. Playing around with this and making wrong choices (based on the previous hnbgw code) I'm starting to gather some experience ... having direct hnbgw<->MSC and hnbgw<->SGSN links was easier :) You mentioned link redundancy, and I guess easy CN-side re-routing is what we get out of using an STP?
Aside: using only one link from the HNBGW to the STP kind of makes the HNBGW look useless :) All it does is "bounce back" the HNBAP stuff. Apart from that, it only translates the IuCS/IuPS domain indicator to an SCCP point code, sticks it onto an M3UA message and passes it on to the right. Before, it at least had two separate CN links to divide messages between; now it's just a bit of number crunching in a separate process.
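Reduced to its core, that translation is a small lookup in each direction, sketched below with hypothetical names and example PCs (MSC 1, SGSN 2). Uplink picks the SCCP called PC from the CN domain indicator; for downlink, the sketch tells the origin apart by the SCCP calling PC:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum model_cn_domain { MODEL_DOMAIN_CS, MODEL_DOMAIN_PS };

/* Hypothetical per-HNBGW mapping of CN domains to peer PCs. */
struct model_cn_map {
	uint32_t msc_pc;
	uint32_t sgsn_pc;
};

/* Uplink: choose the SCCP called PC from the CN domain indicator. */
static uint32_t model_called_pc_for_domain(const struct model_cn_map *map,
					   enum model_cn_domain d)
{
	assert(map != NULL);
	return d == MODEL_DOMAIN_PS ? map->sgsn_pc : map->msc_pc;
}

/* Downlink: infer the domain from the SCCP calling PC.
 * Returns false if the originator is unknown. */
static bool model_domain_for_calling_pc(const struct model_cn_map *map,
					uint32_t calling_pc, enum model_cn_domain *d)
{
	assert(map != NULL && d != NULL);
	if (calling_pc == map->msc_pc) {
		*d = MODEL_DOMAIN_CS;
		return true;
	}
	if (calling_pc == map->sgsn_pc) {
		*d = MODEL_DOMAIN_PS;
		return true;
	}
	return false;
}
```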
... doesn't make sense to let our OsmoSTP talk Iuh directly, does it? :)
~N
On Wed, Jun 21, 2017 at 10:45:03PM +0200, Neels Hofmeyr wrote:
This reset message is the only thing where we compose the domain in the HNBGW.
Damn, that's not true. I somehow missed the main message dispatch towards Iuh; not sure how I managed that...
Both SGSN and MSC can send their RANAP to the same PC, because they have already encoded the domain indicator in their RANAP messages.
And not true either of course.
~N
Hi Neels,
On Wed, Jun 21, 2017 at 10:45:03PM +0200, Neels Hofmeyr wrote:
It took me ages yesterday to figure out why the described message loop was happening. Earlier today I experienced the stark opposite: by simply setting both hnbgw clients to the same local PC, IuCS started working immediately :)
Strange. Maybe some uninitialized state somewhere making this non-reproducible?
The great success here: I have for the first time subscribed a UE via 3G using the AoIP code. USSD works, SMS work, voice calls work! Yay!!
great!
Instead of separate instances, I'm trying to go with:
c) do away with the two separate signaling connections for PS and CS domain. I would assume that an operator typically has one signaling network, with unique point codes and routing in place.
A problem I hit is that we send:
	if (cnlink->is_ps)
		domain = RANAP_CN_DomainIndicator_ps_domain;
	else
		domain = RANAP_CN_DomainIndicator_cs_domain;
I fear the entire 'cnlink' abstraction is at the wrong layer, or used wrongly. There's one signalling link at MTP-level, but there are multiple sccp_user and local/remote addresses inside.
Outlook: In networks with RAN sharing, a single hnb-gw may have to talk with multiple MSCs or SGSNs (of different operators). That could happen over one mtp-level signalling link[set] and separate sccp-addresses for the related MSC/SGSN peers. Or it could happen with separate mtp-level signalling link[sets], one for each operator.
I will now have one SCCP link with one PC for the HNBGW, and thus will send both CS and PS domain RESET messages when this T_RafC timer expires.
yes, I guess you basically need to split cnlink into something like an 'sccp_peer' kind of structure, which contains the related remote sccp_address and a reference to the signalling link. Then multiple sccp_peers can point to one link (or multiple links).
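A rough C sketch of that split, with hypothetical struct and field names: the signalling link is its own object, and each peer (MSC or SGSN, possibly of different operators) only references it. SSN 142 in the usage is the SSN conventionally assigned to RANAP:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MTP-level signalling link(set), e.g. M3UA towards an STP. */
struct model_sig_link {
	const char *name;
	uint32_t local_pc;
};

/* Hypothetical replacement for 'cnlink': one remote SCCP peer reached via
 * some signalling link. Several peers may share the same link. */
struct model_sccp_peer {
	const char *name;
	uint32_t remote_pc;
	uint8_t remote_ssn;
	int is_ps;
	struct model_sig_link *link;
};

/* Count how many peers are reached via the given link. */
static size_t model_peers_on_link(const struct model_sccp_peer *peers, size_t n,
				  const struct model_sig_link *link)
{
	size_t count = 0;

	assert(link != NULL);
	for (size_t i = 0; i < n; i++)
		if (peers[i].link == link)
			count++;
	return count;
}
```

In the RAN sharing case, the MSC and SGSN of one operator could share one link while a second operator's MSC sits behind its own link, without any change to the peer structure.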
(I also tried to have two SCCP users with the same PC on one sccp_instance, which doesn't work because a) the code prevents me from registering two users on the same PC and b) it doesn't make sense anyway; wishful thinking was that a conn_id would trace replies back to the right user and we'd magically know whether it came from the MSC or the SGSN, which might even work in some cases but is surely not what we want, and apparently don't even need at all.)
If there are separate SSNs, you can have multiple SCCP users with the same PC. But with same SSN + PC it's like trying to bind two sockets to the same IP-address (PC) and port (SSN): It's not possible.
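The socket analogy as a minimal C sketch, with hypothetical names and a fixed-size table: binding fails only if the exact (PC, SSN) pair is already taken. The SSN values 142 and 254 in the usage are examples (RANAP and BSSAP respectively):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_USERS 8	/* arbitrary table size for this sketch */

/* Hypothetical registry of bound SCCP users, keyed by (PC, SSN) like a
 * socket table keyed by (IP address, port). */
struct model_user_table {
	uint32_t pc[MODEL_MAX_USERS];
	uint8_t ssn[MODEL_MAX_USERS];
	size_t n;
};

/* Returns false if (pc, ssn) is already bound or the table is full. */
static bool model_user_bind(struct model_user_table *t, uint32_t pc, uint8_t ssn)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->n; i++)
		if (t->pc[i] == pc && t->ssn[i] == ssn)
			return false;
	if (t->n >= MODEL_MAX_USERS)
		return false;
	t->pc[t->n] = pc;
	t->ssn[t->n] = ssn;
	t->n++;
	return true;
}
```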
I still wonder the same as with a BSC connecting to the MSC via an STP: we never really know whether the MSC or SGSN are actually present. When we have a link to the STP but the SGSN goes up in smoke, nothing notifies us of a connection being cleared.
Philipp has worked on detecting unresponsive/disconnected MSCs, clearing local state and re-transmitting RESET messages on OsmoBSC recently. Maybe you should coordinate with him.
Also, a real STP would implement the actual management procedures and tell you
* when a route comes up or goes down
* when you send something to an unreachable destination
Both parts have not been implemented in OsmoSTP so far, as I didn't consider this the most important feature for us to get started.
As an SCCP user, you would get N-NOTICE.ind in such cases, as far as I remember.
In any case, passive detection of non-responding peers (as done by Philipp in the BSC) is something you would probably want to have anyway. I'm sure there are plenty of failure cases in which an explicit notification of destination availability/unavailability will not reach you.
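One simple form of passive detection, sketched in C with hypothetical names: count messages sent that expect an answer (e.g. RESET), declare the peer unavailable after an arbitrary threshold of 3 unanswered ones, and mark it available again on any received message. This only illustrates the general idea; it is not how OsmoBSC implements it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_UNANSWERED 3	/* arbitrary threshold for this sketch */

/* Hypothetical per-peer liveness state. */
struct model_peer_state {
	bool available;
	unsigned int unanswered;
};

/* Call when sending something that expects an answer. Once the threshold
 * is reached, the peer is flagged unavailable; the caller would then
 * clear its local connection state for that peer. */
static void model_peer_tx_expecting_answer(struct model_peer_state *p)
{
	assert(p != NULL);
	if (++p->unanswered >= MODEL_MAX_UNANSWERED)
		p->available = false;
}

/* Call on any message received from the peer (e.g. RESET ACK). */
static void model_peer_rx(struct model_peer_state *p)
{
	assert(p != NULL);
	p->unanswered = 0;
	p->available = true;
}
```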
The STP will fail to route our messages, but we can't really get notified of a link change. IIUC we will only receive info on the first leg to the STP and not beyond.
see above.
So far it's not that easy to figure out how to properly use SCCP with an STP in the middle. Playing around with this and making wrong choices (based on the previous hnbgw code) I'm starting to gather some experience ... having direct hnbgw<->MSC and hnbgw<->SGSN links was easier :) You mentioned link redundancy, and I guess easy CN-side re-routing is what we get out of using an STP?
Interoperability with other BSC/MSC/SGSN/HNB-GW on AoIP and IuCS/IuPS is what we get out of it primarily.
Aside: using only one link from the HNBGW to the STP kind of makes the HNBGW look useless :) All that it does is "bounce back" the HNBAP stuff.
It does exactly what the HNB-GW is specified to do in the architecture.
It translates from RUA to M3UA. The rationale is that the HNodeBs need neither an SS7 connection (security issues) nor an SS7 stack (cost issue, for all those people who ignore the fact that implementing the related stack as FOSS would remove those royalties). There are also operational questions like: "How do I assign point codes to all the HNodeBs in an organized fashion?" or "Do I actually have sufficient point codes for the large population of HNodeBs, which can easily go into the 100k or even millions of units?"