BSC / MSC volatile state / restart handling

This is merely a historical archive of years 2008-2021, before the migration to mailman3.

A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/OpenBSC@lists.osmocom.org/.

Keith keith at rhizomatica.org
Fri Oct 12 10:10:25 UTC 2018



On 12/10/18 08:55, Harald Welte wrote:
> Hi Keith,
Hi Harald, thanks for getting back on this so quickly.
>
> On Thu, Oct 11, 2018 at 09:59:08PM +0200, Keith wrote:
>> The issue is about restarting MSC or BSC (or both).
> That's something that classic telecom doesn't typically consider very well.
>
> In terms of the functional specs, it is assumed that restarting a network
> element, particularly a core network element is a super rare occasion, and
> hence nothing to be really considering as a general problem.

Understood, so I guess the question here is do we want to and/or can we
do anything differently.

I'm thinking that if I want to upgrade osmo-msc in place on a production
system, I obviously have to restart it. There's also the case of a power
(AC) outage.
Currently with osmo-nitb, an upgrade involves a temporary (a few seconds)
loss of BCCH that some phones do not even appear to react to at all.
(I have a script that waits for 0 calls before restarting.)

With the split setup it involves a complete loss of service, basically,
as you say, until location update timeout.

>
>> I still really haven't looked at the split setup enough yet, 
> OsmoNITB shouldn't really be any different.  Is it?

It is, in that if you have a network up and phones camped, and you kill
and restart osmo-nitb, the osmo-bts restarts, and from there things
basically continue as normal; that is to say, a camped phone still has
service as before.
>
>> Is there a plan to implement some kind of non-volatile state?
> What you're referring to is basically the loss of VLR information. 
Yes, those three call attempts I mentioned display different error
messages, which indicate progressive restoration of VLR info, although
the 1st one, IIRC, is something like "BSC unknown to this MSC".

>  Holger and I
> were brainstorming about this some time ago, and we came up with the idea
> of using a System V shared memory segment for the actual VLR data.  
From reading what you wrote and the subsequent discussion it sounds
feasible but complicated.
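To make the shared-memory idea a bit more concrete, here is a minimal sketch, assuming a hypothetical fixed-size subscriber table. The struct and function names (`vlr_shm`, `vlr_shm_attach()`, `vlr_shm_find()`, etc.) and the record fields are purely illustrative and are not osmo-msc code. The point it shows: a System V segment outlives the process that created it, so a restarted MSC can re-attach by key and find its records still in place, while lookups stay plain, synchronous memory accesses with no database round-trip.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Hypothetical layout, NOT the real osmo-msc VLR structures. */
#define VLR_SHM_MAGIC      0x564c5231u /* "VLR1", detects foreign/stale layouts */
#define VLR_SHM_MAX_SUBSCR 1024

/* No pointers inside the segment: it may be mapped at a different
 * address after a restart, so only plain values/array slots are stored. */
struct vlr_shm_subscr {
	char imsi[16];   /* up to 15 digits + NUL */
	uint32_t tmsi;
	uint16_t lac;
	uint8_t valid;
};

struct vlr_shm {
	uint32_t magic;
	uint32_t n_max;
	struct vlr_shm_subscr subscr[VLR_SHM_MAX_SUBSCR];
};

/* Attach to the segment for 'key', creating it if needed.
 * *created = 1: cold start (empty VLR); 0: state from a previous process found.
 * Returns NULL on error (e.g. an existing segment smaller than this layout). */
static struct vlr_shm *vlr_shm_attach(key_t key, int *created)
{
	int id = shmget(key, sizeof(struct vlr_shm), IPC_CREAT | IPC_EXCL | 0600);
	*created = (id >= 0);
	if (id < 0) {
		if (errno != EEXIST)
			return NULL;
		id = shmget(key, sizeof(struct vlr_shm), 0600); /* re-use old segment */
		if (id < 0)
			return NULL;
	}
	void *p = shmat(id, NULL, 0);
	if (p == (void *) -1)
		return NULL;
	struct vlr_shm *vlr = p;
	if (*created || vlr->magic != VLR_SHM_MAGIC || vlr->n_max != VLR_SHM_MAX_SUBSCR) {
		/* fresh or incompatible contents: start with an empty VLR */
		memset(vlr, 0, sizeof(*vlr));
		vlr->magic = VLR_SHM_MAGIC;
		vlr->n_max = VLR_SHM_MAX_SUBSCR;
		*created = 1;
	}
	return vlr;
}

/* Plain in-memory scan: no I/O, so callable synchronously from anywhere. */
static struct vlr_shm_subscr *vlr_shm_find(struct vlr_shm *vlr, const char *imsi)
{
	for (uint32_t i = 0; i < vlr->n_max; i++)
		if (vlr->subscr[i].valid && strcmp(vlr->subscr[i].imsi, imsi) == 0)
			return &vlr->subscr[i];
	return NULL;
}

/* Update an existing record or take the first free slot; NULL if full. */
static struct vlr_shm_subscr *vlr_shm_store(struct vlr_shm *vlr, const char *imsi,
					    uint32_t tmsi, uint16_t lac)
{
	struct vlr_shm_subscr *s = vlr_shm_find(vlr, imsi);
	for (uint32_t i = 0; !s && i < vlr->n_max; i++)
		if (!vlr->subscr[i].valid)
			s = &vlr->subscr[i];
	if (!s)
		return NULL;
	strncpy(s->imsi, imsi, sizeof(s->imsi) - 1);
	s->imsi[sizeof(s->imsi) - 1] = '\0';
	s->tmsi = tmsi;
	s->lac = lac;
	s->valid = 1;
	return s;
}

/* Explicitly delete the segment (e.g. for a deliberate cold start). */
static int vlr_shm_destroy(key_t key)
{
	int id = shmget(key, 0, 0);
	return id < 0 ? -1 : shmctl(id, IPC_RMID, NULL);
}
```

A restart then amounts to `shmdt()` in the old process and `vlr_shm_attach()` with the same key in the new one. Note that such a segment lives in RAM only: it would cover an in-place osmo-msc upgrade or crash, but not the full power-loss reboot case, where the segment is gone and the VLR starts empty again.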

>
> We explicitly don't want to use some kind of database system, as the VLR
> data needs to be accessed all over the code
> directly/synchronously/non-blockingly.  We cannot wait for it to be
> retrieved from somewhere.  That's what is done with HLR data.

Would there be some way to have the MSC restore its VLR
state on startup from the HLR, or is that going to potentially cause too
much traffic on the GSUP i/f, or be just too far away from spec?

>
>> In the case of restarting the MSC with two phones connected, of course
>> the phones don't notice anything. One now needs to attempt a call setup
>> at least 3 times, including a call setup attempt from the "callee" phone
>> before we can even call it.
> The alternative is to wait for any location update, either due to the
> periodic location update timer expiring, or due to the phones actually
> moving geographic location.
Moving in our case is probably not going to happen, and waiting for the
LUR timeout... well, it's possible of course. In the case of these common,
say, 5-minute power losses, where the entire system reboots, the
phones are not all going to do a LUR when the BTS comes back on, so
restoration of service will be very gradual.

And yes, of course, there should be a UPS and all!!  But you know..
batteries wear out, stuff stops working and doesn't get replaced.. it's
all extra cost :((
I'd love to just say, look, the UPS is as system critical as the
antenna.. but it's hard to make that stick. I guess maybe when people
start noticing the outage of service after power restoration, we could
say "fix your UPS!!".
>
> A BSC restart will always lose all active connections/channels/calls,
> and I think there's nothing wrong with that.  Persisting that state
> makes little sense, as the phones will all have closed their radio
> channels at the time your BSC recovers.

Sure! I wasn't considering active channels, and of course you're right
there's nothing wrong with that, and I'm not suggesting in any way to try
and persist that state.

keith.






More information about the OpenBSC mailing list