Hello GSM community,
I have a question for those who operate their own GSM networks (be it for fun or for research or for any other purpose) in places that DO have regular commercial cell service, i.e., NOT ship-at-sea, middle of desert or Rhizomatica-type environments: how do you deal with, and ideally prevent, the highly undesirable situation of other people's phones, not related to your operation, "jumping ship" from being registered to their regular commercial network to trying to register to your test network instead?
I live and operate in an area where ONE commercial operator still provides GSM/2G service (although only to "grandfathered" customers, closed to new subscribers), plus there are super-strong 4G and 5G signals from all 3 USA-wide carriers. I also operate my own "pirate" GSM network on a test/experimental basis, meaning not always on, but only turned on for brief intervals when I am playing with it.
When I do turn on my test GSM network, I squat on an ARFCN in the middle of a 5 MHz wide "dead" spot (SA shows noise floor over the whole 5 MHz block in question), and most of the time I set my power output to the lowest possible setting: I set max_power_red to the maximum of 20, which should result in 3 dBm output from the sysmoBTS box. I also recently changed my MCC-MNC from 310-222 (an unallocated MNC within MCC 310) to 001-325 (MCC meaning test network, MNC is a feeble attempt to indicate that it's me - I got 00440325 as my test IMEI range in my "other" capacity as ME manuf), but the problematic behaviour of at least some phones erratically "jumping ship" from T-Mobile GSM to the test network still occurs.
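For anyone checking the power figure above, it is plain dB arithmetic. Here is a minimal Python sketch, assuming a nominal maximum output of 23 dBm; that number is only inferred from "3 dBm after 20 dB of reduction" and is not stated in this message. The function names are made up for illustration.

```python
# Sketch of the dB arithmetic behind the "3 dBm" figure.
# ASSUMPTION: nominal maximum output of 23 dBm, inferred from 3 dBm + 20 dB.

def output_dbm(nominal_dbm: float, reduction_db: float) -> float:
    """Output power after applying a power reduction given in dB."""
    return nominal_dbm - reduction_db


def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


if __name__ == "__main__":
    p = output_dbm(23, 20)  # max_power_red set to 20
    print(f"{p} dBm is about {dbm_to_mw(p):.2f} mW")
```

In other words, the BTS is radiating on the order of a couple of milliwatts at this setting.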
Here is a concrete example of inexplicable erratic behaviour I am seeing:
* Last night I powered up my test network at around 19:41 local time. My wife was with me the whole evening; her primary personal phone is Nokia C3-00 (circa 2011, late in GSM terms, but still 2G-only in terms of RAN support) with service on T-Mobile.
* About 3.5 hours later, at around 23:14 local time, my wife noticed that her phone "went into the black hole" (our term for times when phones show "no service" even though T-Mobile GSM signal is there just fine), and she alerted me. I looked at OsmoCNI logs in syslog, and I saw that just a little earlier, at around 22:38 local time, there was an attempt from my wife's phone (her T-Mobile IMSI) to register to our test network. Of course that registration attempt failed - I don't have a roaming agreement with T-Mobile, there is no MAP roaming support in OsmoCNI, I don't have any T-Mobile or other operators' IMSIs in my OsmoHLR, and I am NOT running with "create sub on demand" feature.
* At around 23:14 local time, when my wife noticed that her phone went into the black hole, she immediately proceeded to reboot it - such reflexive reboots are now an "autopilot" action for her - and on its next boot cycle, it immediately proceeded to make another attempt to register to our test network instead of T-Mobile, as evidenced by OsmoCNI logs!
* At that point I turned off the test network GSM signal, as there did not appear to be any other way to convince my wife's Nokia phone to go back to its rightful network of T-Mobile.
Now let me add some noteworthy details:
* The ARFCN on which I squat for my test network is NOT listed in the neighbor cell list advertised by the sole and single commercial GSM/2G operator we have around here.
* When I mentioned this issue previously in an OsmoDevCall USSE, I was asked if perhaps the ARFCN I squat on might be listed as a 2G neighbor in the neighbor list of some newer-G cell. I don't have any direct way to disprove this idea, but my wife's phone, the one that exhibits this inexplicable behaviour, is a 2G-only model, NOT supporting LTE or even UMTS. And the last 3G/UMTS service in our area was shut down last summer, leaving only LTE+5G for the masses and GSM for the tiny sliver of "grandfathered" users who won't give it up until we die.
* In last night's episode, my wife's phone sat quite happily within our dwelling, mere meters from the sysmoBTS antenna putting out its 3 dBm, for almost 3 hours before it made its first attempt to jump ship. During the entirety of this interval, the signal from our test network as received by the phone was overwhelmingly stronger than the commercial signal (being meters away from the BTS), yet the phone behaved like it should (listened to its serving cell and advertised neighbor cells, no searching around).
* The location update interval set by T-Mobile's network is 1 hour - thus periodic LU could not have been the trigger that told Nokia's bugger to abandon its serving cell and go into open-ended search of all possible ARFCNs. So what in the world could have been the trigger then, that caused the bugger to misbehave after almost 3 hours of behaving properly and correctly?
* Aside from whatever the trigger might be, once that Nokia bugger attempts to register to the test GSM network and fails, why in the bloody hell is it not going back to the weaker (in terms of RSSI) but working T-Mobile network, why does it "park" itself in no-service state instead?
I have heard of other people operating test GSM cells/networks in areas where commercial services do exist: I have heard that Neels, of the sysmocom team, operates a test cell under a test license, and when Keith gave an OsmoDevCall presentation on Rhizomatica back in 2021, that presentation was done from an office in some "big" city in Oaxaca, a place where test signals had to coexist peacefully with commercial operators' signals. So how do you guys do it? What additional magic are you doing, which I must be missing, to prevent phones from jumping ship from commercial networks to the test network when the signal from the test network is much stronger due to proximity?
Perplexed, Mother Mychaela
AFAIK it's up to the phone to try and connect to your network, and it is up to you to reject that. If it thinks it is going out of "own" coverage (as if passing a border) it will try to start roaming.
A phone can only "go into the black hole" away from another operator's coverage when your CN accepts the Location Updating Request.
3GPP TS 24.008 4.4.1 describes LU Reject. IIUC, if you reject a LU with cause 'PLMN not allowed' you will still see requests popping up, but once rejected, those MS should not try again.
If you're using osmo-msc, it seems the only remotely similar reject cause we implement is "IMSI Unknown in HLR", which happens when osmo-hlr responds with a LU NACK (i.e. doesn't know the IMSI).
"Roaming not allowed" could also be a good reject cause.
Maybe some new configuration for reject causes could help, but am not sure how. Seems to be the HLR's job, so, such a feature in MSC could be a layer violation. (osmo-bsc used to filter attach requests by IMSI, but we dropped that.)
For the record, it's also important to send in the System Information that you are not providing emergency calls, see bsc cfg.
~N
Hi Neels,
Thank you for pointing me in the direction of LU Reject cause values - it appears that the problem is there indeed! Prior to the most recent change in osmo-hlr (https://gerrit.osmocom.org/c/osmo-hlr/+/16808), the LU Reject cause returned by OsmoCNI to MS was horribly bad, and even with this patch applied, the default cause value is *still* horribly bad, unless the operator figures out the magic vty setting which needs to be changed away from the default.
> 3GPP TS 24.008 4.4.1 describes LU Reject.
Thank you once again for pointing me in the right direction - but as I studied that portion of the spec, I quickly realized that the section of most relevance here is not 4.4.1, but 4.4.4.7. Right now the default LU reject cause returned by OsmoCNI is #2 - but the spec tells MS implementations to treat #2 the same as #3 and #6, which deal with stolen phones and the like - thus by the spec, if an MS receives LU reject cause #2, it is supposed to assume that the SIM is intrinsically bad (meaning bad for any PLMN, not limited to the one it just tried), and basically abstain from ANY further registration attempts on any PLMN until the phone is rebooted, which is exactly the behaviour I see with my wife's Nokia C3-00.
Instead we need to return cause #11: this cause tells phones to stop trying to connect to *our* PLMN specifically, and advises them to go look for their home PLMN or some other PLMN that allows roaming, while ours does not.
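To make the difference between these cause values concrete, here is a minimal Python sketch of the MS-side behaviour as I read it from TS 24.008 section 4.4.4.7. It is a toy model, not code from any real phone or from Osmocom: the `Phone` class, its method names and the `COMMERCIAL_PLMN` placeholder are all hypothetical, and only the four cause values discussed here are modelled.

```python
from enum import IntEnum


class RejectCause(IntEnum):
    """LU Reject cause values discussed in this thread."""
    IMSI_UNKNOWN_IN_HLR = 2
    ILLEGAL_MS = 3
    ILLEGAL_ME = 6
    PLMN_NOT_ALLOWED = 11


# Causes after which the MS treats the SIM as invalid on every PLMN.
SIM_INVALIDATING = {
    RejectCause.IMSI_UNKNOWN_IN_HLR,
    RejectCause.ILLEGAL_MS,
    RejectCause.ILLEGAL_ME,
}

TEST_PLMN = "001-325"           # the test network's MCC-MNC from this thread
COMMERCIAL_PLMN = "commercial"  # placeholder label, not a real MCC-MNC


class Phone:
    """Toy model of how an MS reacts to a Location Updating Reject."""

    def __init__(self) -> None:
        self.sim_valid = True
        self.forbidden_plmns: set[str] = set()

    def on_lu_reject(self, plmn: str, cause: RejectCause) -> None:
        if cause in SIM_INVALIDATING:
            # SIM considered bad everywhere until power-off or SIM removal.
            self.sim_valid = False
        elif cause == RejectCause.PLMN_NOT_ALLOWED:
            # Only this PLMN is ruled out; others remain usable.
            self.forbidden_plmns.add(plmn)

    def power_cycle(self) -> None:
        # A reboot clears the "SIM invalid" state; in this model the
        # forbidden PLMN list is kept, since it is stored on the SIM.
        self.sim_valid = True

    def may_register(self, plmn: str) -> bool:
        return self.sim_valid and plmn not in self.forbidden_plmns


if __name__ == "__main__":
    for cause in (RejectCause.IMSI_UNKNOWN_IN_HLR, RejectCause.PLMN_NOT_ALLOWED):
        phone = Phone()
        phone.on_lu_reject(TEST_PLMN, cause)
        print(cause.name, "-> may use commercial network:",
              phone.may_register(COMMERCIAL_PLMN))
```

In this model a reject with #2 leaves the phone unable to register anywhere until it is power-cycled, after which nothing stops it from trying the test network again, while #11 rules out only the test PLMN.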
I see that this topic has already been addressed (somewhat) in the code review of the patch that just got merged into osmo-hlr mere days ago, and there is this open issue too:
https://osmocom.org/issues/5865
I would argue that having this wrong default cause value in the code is a *critical* bug that actively endangers innocent operators of test GSM networks, causing them to inadvertently disrupt operation of live commercial networks in their neighbourhoods and exposing them to risk of being prosecuted for doing so. For those who do NOT have the luxury of living in the middle of an ocean or a desert or a Rhizomatica-style village, for those of us who have to operate our test GSM networks in areas that are covered by regular commercial cell services, the highest imperative needs to be ensuring that these test networks of ours do NOT disrupt regular commercial networks or their phones in any way.
Consider the super-common scenario of someone in our community operating a test GSM network in some "inhabited" location where other people live nearby, using their phones on their respective commercial networks. If some passer-by, completely unrelated to your test network, happens to be physically right next to your test BTS, such that the signal from the test network is the strongest despite its very low output power, the passer-by's phone WILL try to "jump ship" from the commercial network to the test network - my practical experience absolutely confirms this unfortunate fact - and what happens next depends on the LU Reject cause returned by the test network. If the test network returns cause #2 and the phone behaves per the spec, it will interpret the reject as "oops, my SIM is bad, I need to withdraw from all further activity". If instead the phone behaves like Keith noted in the CR and keeps retrying, it will keep retrying *to the test network*, and the end result will be the same: the phone will disconnect from the legitimate operator's network, and the subscriber will become unreachable at his or her phone number. Folks, that's interference with or disruption of public telecom networks, and any one of us can get into serious trouble for such behaviour - thus I say that we need to fix our default LU Reject cause ASAP, stat!
> Maybe some new configuration for reject causes could help, but am not sure how. Seems to be the HLR's job, so, such a feature in MSC could be a layer violation.
The cause value is set in osmo-hlr, and osmo-msc merely passes it through. To get the critical fix implemented as quickly as possible, I simply patched the hard-coded value in osmo-hlr on my system; for people who run the latest code from git, if your osmo-hlr includes commit 268a33e58b9d (just days ago), you need to add this vty setting under the hlr node:
reject-cause not-found plmn-not-allowed
This setting changes the LU Reject cause from #2 to #11, (hopefully) causing the test network to no longer interfere with commercial networks by "luring away" their subscribers.
If no one beats me to it, I will see if I can submit a Gerrit patch to osmo-hlr to change this default.
M~