On 12/10/18 08:55, Harald Welte wrote:
Hi Keith,
Hi Harald, thanks for getting back on this so quickly.
On Thu, Oct 11, 2018 at 09:59:08PM +0200, Keith wrote:
The issue is about restarting the MSC or BSC (or both).
That's something that classic telecom doesn't typically consider very well.
In terms of the functional specs, it is assumed that restarting a network
element, particularly a core network element is a super rare occasion, and
hence not something to be considered as a general problem.
Understood, so I guess the question here is whether we want to and/or can
do anything differently.
I'm thinking that if I want to upgrade osmo-msc in place on a production
system, I obviously have to restart it. There's also the case of power
(AC) outage.
Currently with osmo-nitb, upgrade involves a temporary (some seconds)
loss of BCCH that some phones do not even appear to react to at all.
(I have a script that waits for 0 calls before restarting)
With the split setup it involves a complete loss of service, basically,
as you say, until location update timeout.
I still really haven't looked at the split setup enough yet,
OsmoNITB shouldn't really be any different. Is it?
It is, in that, if you have a network up, and phones camped, you kill
and restart osmo-nitb, the osmo-bts restarts, and basically from there
things continue as normal, that is to say, a camped phone still has
service as before.
Is there a plan to implement some kind of non-volatile state?
What you're referring to is basically the loss of VLR information.
Yes, those three call attempts I mentioned display different error
messages, which indicate progressive restoration of VLR info, although
the first one, IIRC, is something like "BSC unknown to this MSC".
Holger and I were brainstorming about this some time ago, and we came up
with the idea of using a System V shared memory segment for the actual
VLR data.
From reading what you wrote and the subsequent discussion, it sounds
feasible but complicated.
We explicitly don't want to use some kind of database system, as the VLR
data needs to be accessed all over the code
directly/synchronously/non-blockingly. We cannot wait for it to be
retrieved from somewhere. That's what is done with HLR data.
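To make the shared-memory idea a bit more concrete, here is a minimal sketch in C, assuming a fixed-size table of subscriber records living in a System V segment under a fixed key. Everything named here (VLR_SHM_KEY, struct vlr_shm, the vlr_shm_* helpers, the record fields) is hypothetical and not part of OsmoMSC. It only shows the property discussed above: the VLR data is plain memory, read and written synchronously, and because the kernel keeps the segment after the process exits, a restarted MSC can reattach and find the previous state.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical key, magic and layout; not part of any Osmocom program. */
#define VLR_SHM_KEY    ((key_t)0x4F534D56)
#define VLR_SHM_MAGIC  0x564C5231u
#define VLR_MAX_SUBSCR 1024

struct vlr_shm_subscr {
	char imsi[16];          /* up to 15 digits plus NUL */
	uint32_t tmsi;
	uint16_t lac;
	uint8_t in_use;
};

struct vlr_shm {
	uint32_t magic;         /* set once the segment has been initialised */
	uint32_t num_entries;
	struct vlr_shm_subscr subscr[VLR_MAX_SUBSCR];
};

/* Attach to the segment, creating it if needed. New segments are zero-filled
 * by the kernel, so a matching magic means state from a previous run. */
static struct vlr_shm *vlr_shm_attach(key_t key, int *recovered)
{
	assert(recovered != NULL);
	int id = shmget(key, sizeof(struct vlr_shm), IPC_CREAT | 0600);
	if (id < 0)
		return NULL;
	void *p = shmat(id, NULL, 0);
	if (p == (void *)-1)
		return NULL;
	struct vlr_shm *vlr = p;
	if (vlr->magic == VLR_SHM_MAGIC) {
		*recovered = 1;
	} else {
		memset(vlr, 0, sizeof(*vlr));
		vlr->num_entries = VLR_MAX_SUBSCR;
		vlr->magic = VLR_SHM_MAGIC;
		*recovered = 0;
	}
	return vlr;
}

/* Store a subscriber in the first free slot; NULL if full or IMSI too long. */
static struct vlr_shm_subscr *vlr_shm_subscr_add(struct vlr_shm *vlr, const char *imsi,
						 uint32_t tmsi, uint16_t lac)
{
	if (strlen(imsi) >= sizeof(vlr->subscr[0].imsi))
		return NULL;
	for (uint32_t i = 0; i < vlr->num_entries; i++) {
		struct vlr_shm_subscr *e = &vlr->subscr[i];
		if (!e->in_use) {
			strcpy(e->imsi, imsi);
			e->tmsi = tmsi;
			e->lac = lac;
			e->in_use = 1;
			return e;
		}
	}
	return NULL;
}

/* Synchronous lookup by IMSI: a plain linear scan over shared memory. */
static const struct vlr_shm_subscr *vlr_shm_subscr_find(const struct vlr_shm *vlr,
							const char *imsi)
{
	for (uint32_t i = 0; i < vlr->num_entries; i++) {
		const struct vlr_shm_subscr *e = &vlr->subscr[i];
		if (e->in_use && strcmp(e->imsi, imsi) == 0)
			return e;
	}
	return NULL;
}

/* Mark the segment for removal (e.g. for a deliberate cold start). */
static int vlr_shm_destroy(key_t key)
{
	int id = shmget(key, 0, 0);
	if (id < 0)
		return -1;
	return shmctl(id, IPC_RMID, NULL);
}
```

One design consequence worth noting: the segment may be attached at a different address after a restart, so records cannot hold raw pointers into each other (hence the flat array with a linear scan here). A real VLR would need offset-based links or an index rebuilt at startup, and a layout version check so that an upgraded osmo-msc does not misread an old segment.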
Would there be some way to have the MSC restore its VLR state on
startup from the HLR, or is that going to potentially cause too much
traffic on the GSUP i/f, or be just too far away from spec?
In the case of restarting the MSC with two phones connected, of course
the phones don't notice anything. One now needs to attempt a call setup
at least 3 times, including a call setup attempt from the "callee" phone,
before we can even call it.
The alternative is to wait for any location update, either due to the
periodic location update timer expiring, or due to the phones actually
moving geographic location.
Moving in our case is probably not going to happen, and waiting for the
LUR timeout.. well, it's possible of course. In the case of these common,
say, 5-minute losses of power, where the entire system reboots, the
phones are not all going to LUR when the BTS comes back on, so
restoration of service will be very gradual.
And yes, of course, there should be a UPS and all!! But you know..
batteries wear out, stuff stops working and doesn't get replaced.. it's
all extra cost :((
I'd love to just say, look, the UPS is as system critical as the
antenna.. but it's hard to make that stick. I guess maybe when people
start noticing the outage of service after power restoration, we could
say "fix your UPS!!".
A BSC restart will always lose all active connections/channels/calls,
and I think there's nothing wrong with that. Persisting that state
makes little sense, as the phones will all have closed their radio
channels at the time your BSC recovers.
Sure! I wasn't considering active channels, and of course you're right
there's nothing wrong with that, and I'm not suggesting in any way that we
try to persist that state.
keith.