Hi Neels,
Thank you for pointing me in the direction of LU Reject cause values -
the problem is indeed there! Prior to the most recent change in
osmo-hlr (https://gerrit.osmocom.org/c/osmo-hlr/+/16808), the LU Reject
cause returned by OsmoCNI to MS was horribly bad, and even with this
patch applied, the default cause value is *still* horribly bad unless
the operator finds the vty setting that changes it away from the
default.
> 3GPP TS 24.008 4.4.1 describes LU Reject.
Thank you once again for pointing me in the right direction - but as I
studied that portion of the spec, I quickly realized that the section
of most relevance here is not 4.4.1, but 4.4.4.7. Right now the
default LU reject cause returned by OsmoCNI is #2 (IMSI unknown in
HLR), but the spec tells MS implementations to treat #2 the same as
#3 (Illegal MS) and #6 (Illegal ME), which deal with stolen phones and
the like. Thus by the spec, an MS that receives LU reject cause #2 is
supposed to assume that the SIM is intrinsically bad (bad for any
PLMN, not just the one it just tried) and abstain from ANY further
registration attempts on any PLMN until the phone is rebooted. That is
exactly the behaviour I see with my wife's Nokia C3-00.
Instead we need to return cause #11 (PLMN not allowed): this cause
tells phones to stop trying to connect to *our* PLMN specifically and
to go look for their home PLMN or some other PLMN that allows them to
roam, which ours does not.
I see that this topic has already been addressed (somewhat) in the
code review of the patch that just got merged into osmo-hlr mere days
ago, and there is this open issue too:
https://osmocom.org/issues/5865
I would argue that having this wrong default cause value in the code
is a *critical* bug that actively endangers innocent operators of test
GSM networks, causing them to inadvertently disrupt operation of live
commercial networks in their neighbourhoods and exposing them to risk
of being prosecuted for doing so. Those of us who do NOT have the
luxury of living in the middle of an ocean or a desert or a
Rhizomatica-style village have to operate our test GSM networks in
areas that are covered by regular commercial cell services, and there
the highest imperative needs to be ensuring that these test networks of
ours do NOT disrupt regular commercial networks or their phones in any
way.
Consider the super-common scenario of someone in our community operating
a test GSM network in some "inhabited" location where other people live
nearby, using their phones on their respective commercial networks.
If some passer-by, completely unrelated to your test network, happens
to be physically right next to your test BTS, such that the signal
from the test network is the strongest despite its very low transmit
power, the passer-by's phone WILL try to "jump ship" from the
commercial network to the test network. My practical experience
absolutely confirms this unfortunate fact. What happens next depends
on the LU Reject cause returned by the test network. If the test
network returns cause #2, a phone that behaves per the spec will
interpret it as "oops, my SIM is bad, I need to withdraw from all
further activity". A phone that instead behaves like Keith noted in
the CR and keeps retrying will keep retrying *to the test network*.
Either way the end result is the same: the phone will disconnect from
the legitimate operator's network, and the subscriber will become
unreachable at his or her phone number. Folks, that's interference
with or disruption of public telecom networks, and any one of us can
get into serious trouble for such behaviour. Thus I say that we need
to fix our default LU Reject cause ASAP!
> Maybe some new configuration for reject causes could help, but am
> not sure how.

> Seems to be the HLR's job, so, such a feature in MSC could be a layer
> violation.
The cause value is set in osmo-hlr, and osmo-msc merely passes it
through. To get the critical fix implemented as quickly as possible,
I simply patched the hard-coded value in osmo-hlr on my system; for
people who run the latest code from git: if your osmo-hlr includes
commit 268a33e58b9d (merged just days ago), you need to add this vty
setting under the hlr node:
reject-cause not-found plmn-not-allowed
This setting changes the LU Reject cause from #2 to #11, which should
(hopefully) stop the test network from interfering with commercial
networks by "luring away" their subscribers.
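For anyone unsure where that line goes, here is a minimal sketch of the
relevant part of the config file. I am assuming the usual osmo-hlr.cfg
layout with child settings indented by one space; only the reject-cause
line is the actual change, and whatever else you already have under
your hlr node stays as it is:

    hlr
     reject-cause not-found plmn-not-allowed

The same setting can presumably also be entered interactively from the
vty after moving into the hlr node, but I have only tested it via the
config file approach described above plus my own hard-coded patch.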
If no one beats me to it, I will see if I can submit a Gerrit patch to
osmo-hlr to change this default.
M~