Hi All.
This is very brief, but something that's been on my mind to make a ticket about.
I realise maybe it's better to ask first if there is a plan.
The issue is about restarting MSC or BSC (or both).
I still really haven't looked at the split setup enough yet, in terms of getting familiar with all the info available from logging in both programs, so I don't have a good analysis of what's going on. Sorry.
But my point here is not to describe a bug to be fixed, but just to ask in general what the strategy is for dealing with daemon restarts, intentional or not. Is there a plan to implement some kind of non-volatile state?
The problem in terms of usability, from simple observation:
In the case of restarting the MSC with two phones connected, of course the phones don't notice anything. One now needs to attempt a call setup at least 3 times, including a call setup attempt from the "callee" phone before we can even call it.
I tried restarting both BSC and MSC, in various orders, and I did not get a satisfactory result. Of course, restarting the BSC restarts osmo-bts, which causes (some) phones to notice the temporary loss of BCCH, but of course no LUR or anything happens as the LAC doesn't change, so there's some state that is not getting set right someplace on restart, as I said. Both phones need to interact before one of them can be called.
Yeah, so, sorry for the lame analysis, but I think this should be pretty easy to reproduce, and at this time there are people who are MUCH more familiar with osmo-msc and osmo-bsc than me, who probably know what I'm talking about and can comment.
Thanks!
k/
Hi Keith,
On Thu, Oct 11, 2018 at 09:59:08PM +0200, Keith wrote:
The issue is about restarting MSC or BSC (or both).
That's something that classic telecom doesn't typically consider very well.
In terms of the functional specs, it is assumed that restarting a network element, particularly a core network element, is a super rare occasion, and hence nothing that really needs to be considered as a general problem.
I still really haven't looked at the split setup enough yet,
OsmoNITB shouldn't really be any different. Is it?
Is there a plan to implement some kind of non-volatile state?
What you're referring to is basically the loss of VLR information. The 3GPP specs explicitly consider it volatile and non-persistent. Holger and I were brainstorming about this some time ago, and we came up with the idea of using a System V shared memory segment for the actual VLR data. SysV SHM has the nice property that it's regular mapped memory, but it is independent of processes, i.e. you can restart a process while keeping whatever is in that shared memory segment.
The devil is of course in the detail:
1) you may not be able to map it to the same address after each restart? Needs further investigation, and might require some pointer fix-up or the use of relative/offset addressing rather than absolute pointers (see the sketch after this list).
2) what do you do if you're actually upgrading the software and the VLR-related structures have been modified? There either needs to be an explicit conversion function, or at least a mechanism to detect this safely and discard all the data in such situations.
3) if the restart is a crash: what if some corrupted VLR data was actually the cause of the crash? In that case you'd end up in a restart loop. You'd have persistent crashes, rather than a one-off crash with recovery.
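To illustrate point 1, here is a minimal sketch of offset-based addressing inside a relocatable shared segment; every name in it is made up for illustration, nothing here is actual osmo-msc code:

#include <stdint.h>

/* Hypothetical record stored inside the shared segment. Instead of
 * absolute pointers, records reference each other by their offset from
 * the segment base, so the segment may be mapped at a different address
 * after every restart without any pointer fix-up. */
struct vlr_rec {
	uint32_t next_off;	/* offset from segment base; 0 means NULL */
	char imsi[16];
};

static inline struct vlr_rec *rec_at(void *base, uint32_t off)
{
	return off ? (struct vlr_rec *)((char *)base + off) : NULL;
}

static inline uint32_t off_of(void *base, struct vlr_rec *rec)
{
	return rec ? (uint32_t)((char *)rec - (char *)base) : 0;
}

Walking a list then becomes for (r = rec_at(base, head_off); r; r = rec_at(base, r->next_off)), at the cost of one addition per dereference.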
I did some research on the web at the time (maybe 2 years ago) but unfortunately couldn't find any library/tool/infrastructure for having persistent data in SysV SHM, and also no other FOSS programs that did so. Maybe I didn't look closely enough? To me, it seems like the most obvious solution to persist state across crashes/restarts of C programs on unix-type systems.
We explicitly don't want to use some kind of database system, as the VLR data needs to be accessed all over the code directly/synchronously/non-blockingly. We cannot wait for it to be retrieved from somewhere. That's what is done with HLR data.
In the case of restarting the MSC with two phones connected, of course the phones don't notice anything. One now needs to attempt a call setup at least 3 times, including a call setup attempt from the "callee" phone before we can even call it.
The alternative is to wait for any location update, either due to the periodic location update timer expiring, or due to the phones actually moving geographic location.
I tried restarting both BSC and MSC, in various orders, and I did not get a satisfactory result. Of course, restarting the BSC restarts osmo-bts, which causes (some) phones to notice the temporary loss of BCCH, but of course no LUR or anything happens as the LAC doesn't change, so there's some state that is not getting set right someplace on restart, as I said. Both phones need to interact before one of them can be called.
A BSC restart will always lose all active connections/channels/calls, and I think there's nothing wrong with that. Persisting that state makes little sense, as the phones will all have closed their radio channels by the time your BSC recovers.
Hi Harald,
We explicitly don't want to use some kind of database system, as the VLR data needs to be accessed all over the code directly/synchronously/non-blockingly. We cannot wait for it to be retrieved from somewhere. That's what is done with HLR data.
Lightning Memory-Mapped Database (https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database) is designed for exactly these requirements.
It gives non-blocking access to the data - directly in-memory. Yet it also persists the data to disk in the background. So in case the software is restarted, it can pick up exactly where it left off with its dataset.
LMDB is highly recommended.
BR, Michael A
Hi Michael,
On Fri, Oct 12, 2018 at 09:08:20AM +0200, Michael Andersen wrote:
We explicitly don't want to use some kind of database system, as the VLR data needs to be accessed all over the code directly/synchronously/non-blockingly. We cannot wait for it to be retrieved from somewhere. That's what is done with HLR data.
Lightning Memory-Mapped Database (https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database) is designed for exactly these requirements.
Thanks for your suggestion. After reading through the documentation I could find, I am not convinced it actually is applicable to our use case / requirements.
Osmocom programs like OsmoMSC are non-blocking, single-threaded, event-loop-driven designs. We cannot at any point run into a situation where there is a blocking read or blocking write to disk/file. Doing so would kill performance for all other concurrent transactions in progress.
It gives non-blocking access to the data - directly in-memory. Yet it also persists the data to disk in the background. So in case the software is restarted, it can pick up exactly where it left off with its dataset.
AFAICT from a very quick look at the source code, the writing-to-disk is not performed "in the background" but is rather performed as blocking write() calls in the process/context in which the data is accessed?
So to me, LMDB looks more like BerkeleyDB or SQLite, but with a layer of zero-copy memory cache/access at runtime on top of it?
We don't want blocking I/O. We don't want disk I/O at all. We don't want persistence across system reboots or the like, but only persistence across restarts/respawns of the specific process. Hence my argument in favor of a SysV SHM segment, which fulfills the criteria. If anyone has any pointers to examples of FOSS programs or libraries where that has been done before, it would be much appreciated.
Regards, Harald
Hi Harald,
It gives non-blocking access to the data - directly in-memory. Yet it also persists the data to disk in the background. So in case the software is restarted, it can pick up exactly where it left off with its dataset.
AFAICT from a very quick look at the source code, the writing-to-disk is not performed "in the background" but is rather performed as blocking write() calls in the process/context in which the data is accessed?
We use it in MDB_NOSYNC mode. In practice this seems to result in no waiting for writes in the user thread at any point. Waiting for reads is never an issue because the LMDB design does not support reading from disk.
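For the curious, here is a minimal sketch of that mode of operation against the stock liblmdb C API; the path, key and value are made up, and error handling is trimmed to a single check macro:

#include <lmdb.h>
#include <stdio.h>

#define CK(rc) do { int _r = (rc); if (_r) { \
	fprintf(stderr, "lmdb: %s\n", mdb_strerror(_r)); return 1; } } while (0)

int main(void)
{
	MDB_env *env;
	MDB_txn *txn;
	MDB_dbi dbi;
	MDB_val key = { .mv_size = 15, .mv_data = "262420000000001" };
	MDB_val val = { .mv_size = 6, .mv_data = "lac=23" };

	CK(mdb_env_create(&env));
	CK(mdb_env_set_mapsize(env, 16UL * 1024 * 1024));
	/* MDB_NOSYNC: commit does not fsync(); the kernel writes dirty
	 * pages back whenever it likes. MDB_NOSUBDIR: plain file path. */
	CK(mdb_env_open(env, "./vlr.mdb", MDB_NOSYNC | MDB_NOSUBDIR, 0644));

	CK(mdb_txn_begin(env, NULL, 0, &txn));
	CK(mdb_dbi_open(txn, NULL, 0, &dbi));
	CK(mdb_put(txn, dbi, &key, &val, 0));
	CK(mdb_txn_commit(txn));	/* returns without waiting for disk */

	/* Reads are zero-copy: val.mv_data points straight into the map. */
	CK(mdb_txn_begin(env, NULL, MDB_RDONLY, &txn));
	CK(mdb_get(txn, dbi, &key, &val));
	printf("%.*s\n", (int)val.mv_size, (char *)val.mv_data);
	mdb_txn_abort(txn);
	mdb_env_close(env);
	return 0;
}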
The LMDB benchmarks talk about many thousands of transactions per second. I can confirm that these benchmarks are not lying.
So to me, LMDB looks more like BerkeleyDB or SQLite, but with a layer of zero-copy memory cache/access at runtime on top of it?
The cool thing about LMDB is that it has no cache, no layers. The "database" is an area of memory that the LMDB functions help access using key/value functions.
BR, Michael A
Hi Michael,
On Fri, Oct 12, 2018 at 11:23:23AM +0200, Michael Andersen wrote:
We use it in MDB_NOSYNC mode. In practice this seems to result in no waiting for writes in the user thread at any point. Waiting for reads is never an issue because the LMDB design does not support reading from disk.
Thanks for your clarifications. Definitely worth further investigation.
No matter which mechanism: in general, using a key-value store, or having to rewire pointers, handle struct versions and recover corrupted state, could end up being considerable effort. The code would end up looking exactly like asking a DB layer for the data, even if that DB layer is super fast. Is an MSC restart really that common?
Restarting the BSC should be fine in that regard, BTW. As soon as it's back, and as soon as the LAC has been associated with that BSC (on the first L3 operation for any subscriber), paging should work out again. We could also add explicit LAC<->point-code mappings in the config to remove that gap, so that the BSC's LACs are known right from the time the BSC is connected.
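To make that concrete, a hypothetical osmo-msc.cfg stanza for such a mapping could look roughly like this; no such option exists today, the syntax is invented purely for illustration:

msc
 # hypothetical: pre-announce which LAC lives behind which BSC point-code
 bsc point-code 0.23.3 lac 23
 bsc point-code 0.24.3 lac 24

With something like that, the MSC could page into LAC 23 via point-code 0.23.3 as soon as the BSC's SCCP link is up, without waiting for a first L3 operation.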
On Fri, Oct 12, 2018 at 11:23:23AM +0200, Michael Andersen wrote:
We use it in MDB_NOSYNC mode. In practice this seems to result in no waiting for writes in the user thread at any point.
so... if LMDB uses disk for persistent storage, and if you never sync to disk... how can this possibly work?
https://symas.com/understanding-lmdb-database-file-sizes-and-memory-utilizat... explains that LMDB is using memory-mapped files. Elsewhere MDB_NOSYNC is explained to not do fsync() after a commit.
So I assume some file system I/O is still happening at some point, but presumably done by the kernel. I have no idea about mem-mapped files, whether that I/O is done blocking.
It also says "The memory area occupied by the database file is marked read-only"; so, writing a new entry requires immediate disk I/O? (Or disk I/O cached by the kernel until the next fsync).
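To illustrate the mechanics in question, a small stand-alone sketch of a write through a MAP_SHARED mapping (file name made up); the store itself does not issue a blocking write(), it merely dirties a page that the kernel flushes in the background:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("state.bin", O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return 1;
	if (ftruncate(fd, 4096) < 0)
		return 1;
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	/* No write() here: this just dirties a page in the page cache;
	 * the kernel writes it back to disk asynchronously, later. */
	strcpy(p, "VLR entry goes here");
	/* Only an MS_SYNC msync() would block until it hits the disk;
	 * MS_ASYNC merely schedules the write-back. */
	msync(p, 4096, MS_ASYNC);
	munmap(p, 4096);
	close(fd);
	return 0;
}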
What if we kept the LMDB db file on a ramdisk? :)
(I'm still reluctant to key-value-ize the VLR though.)
~N
On 12/10/18 08:55, Harald Welte wrote:
Hi Keith,
Hi Harald, thanks for getting back on this so quick.
On Thu, Oct 11, 2018 at 09:59:08PM +0200, Keith wrote:
The issue is about restarting MSC or BSC (or both).
That's something that classic telecom doesn't typically consider very well.
In terms of the functional specs, it is assumed that restarting a network element, particularly a core network element, is a super rare occasion, and hence nothing that really needs to be considered as a general problem.
Understood, so I guess the question here is do we want to and/or can we do anything differently.
I'm thinking that if I want to upgrade osmo-msc in place on a production system, I obviously have to restart it. There's also the case of power (AC) outage. Currently with osmo-nitb, upgrade involves a temporary (some seconds) loss of BCCH that some phones do not even appear to react to at all. (I have a script that waits for 0 calls before restarting)
With the split setup it involves a complete loss of service, basically, as you say, until location update timeout.
I still really haven't looked at the split setup enough yet,
OsmoNITB shouldn't really be any different. Is it?
It is, in that if you have a network up and phones camped, and you kill and restart osmo-nitb, then osmo-bts restarts and basically from there things continue as normal; that is to say, a camped phone still has service as before.
Is there a plan to implement some kind of non-volatile state?
What you're referring to is basically the loss of VLR information.
Yes, those three call attempts I mentioned display different error messages, which indicate progressive restoration of VLR info, although the 1st one, IIRC, is something like "BSC unknown to this MSC".
Holger and I were brainstorming about this some time ago, and we came up with the idea of using a System V shared memory segment for the actual VLR data.
From reading what you wrote and the subsequent discussion it sounds feasible but complicated.
We explicitly don't want to use some kind of database system, as the VLR data needs to be accessed all over the code directly/synchronously/non-blockingly. We cannot wait for it to be retrieved from somewhere. That's what is done with HLR data.
Would there be some way to have the MSC be able to restore its VLR state on startup from the HLR, or is that going to potentially cause too much traffic on the GSUP i/f, or be just too far away from spec?
In the case of restarting the MSC with two phones connected, of course the phones don't notice anything. One now needs to attempt a call setup at least 3 times, including a call setup attempt from the "callee" phone before we can even call it.
The alternative is to wait for any location update, either due to the periodic location update timer expiring, or due to the phones actually moving geographic location.
Moving in our case is probably not going to happen, and waiting for the LUR timeout.. well, it's possible of course. In the case of these common, say, 5-minute losses of power, where the entire system reboots, the phones are not all going to LUR when the BTS comes back on, so restoration of service will be very gradual.
And Yes, of course, there should be UPS and all!! But you know.. batteries wear out, stuff stops working and doesn't get replaced.. it's all extra cost :(( I'd love to just say, look, the UPS is as system critical as the antenna.. but it's hard to make that stick. I guess maybe when people start noticing the outage of service after power restoration, we could say "fix your UPS!!".
A BSC restart will always lose all active connections/channels/calls, and I think there's nothing wrong with that. Persisting that state makes little sense, as the phones will all have closed their radio channels by the time your BSC recovers.
Sure! I wasn't considering active channels, and of course you're right there's nothing wrong with that, and I'm not suggesting in anyway to try and persist that state.
keith.
Hi Keith,
On Fri, Oct 12, 2018 at 12:10:25PM +0200, Keith wrote:
Currently with osmo-nitb, upgrade involves a temporary (some seconds) loss of BCCH that some phones do not even appear to react to at all. (I have a script that waits for 0 calls before restarting)
With the split setup it involves a complete loss of service, basically, as you say, until location update timeout.
ah, that's because OsmoNITB stores the location area in the database, which is "wrong" as per GSM spec, as the VLR information is specified/defined as volatile, and only the HLR information is persistent.
Holger and I were brainstorming about this some time ago, and we came up with the idea of using a System V shared memory segment for the actual VLR data.
From reading what you wrote and the subsequent discussion it sounds feasible but complicated.
I'm not saying it necessarily is. There are just some considerations that need to be taken into account. Also, it wouldn't help for power outages. Our ideas were mostly related to recovery from crashes, where the hardware and OS keep running and we just want to keep state persistent in RAM for the fraction of a second between crash and respawn.
Using something like LMDB would allow for both options:
1) run it on a tmpfs and get only in-RAM persistence as long as the system runs
2) run it on an SSD/HDD and get persistence across reboots
However, LMDB will have blocking writes/flushes of pages on write access such as location updates, which will impact performance. I would guess/hope those would be insignificant when you run it on a tmpfs. So for large deployments where you care about performance: tmpfs. For small deployments where you care about power outages for a few subscribers: SSD/flash-based.
Would there be some way to have the MSC be able to restore its VLR state on startup from the HLR, or is that going to potentially cause too much traffic on the GSUP i/f, or be just too far away from spec?
It's not within spec, and the HLR doesn't know the location area in which a MS was camped. It should know/store the VLR address (global title in real networks) where the last LU was received.
The alternative is to wait for any location update, either due to the periodic location update timer expiring, or due to the phones actually moving geographic location.
Moving in our case is probably not going to happen, and waiting for the LUR timeout.. well, it's possible of course.
You could just increment the LAC of the cell[s] every time the MSC is disconnected/unavailable. Or in case of a "NITB style" setup, basically at every system boot. That would force all phones to perform a location update, which populates the VLR.
This will of course trigger a huge load spike at that time, but that should be manageable, particularly with the load mitigation mechanisms we have put in place such as ACC ramping, power ramping, ...
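To make the LAC-bump idea concrete: assuming current osmo-bsc config syntax, a start script could simply rewrite the value below (here 23) before launching the daemon, incrementing it modulo the valid range on every boot:

network
 bts 0
  location_area_code 23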
And Yes, of course, there should be UPS and all!! But you know.. batteries wear out, stuff stops working and doesn't get replaced.. it's all extra cost :((
I think it boils down to what kind of computer is used. Small, low-power embedded systems (think of beaglebone, raspi, ...) can easily bridge outages of way more than 5 minutes of power loss from small Li-Ion batteries. Of course you don't want a traditional UPS with AC inverters, lead-acid batteries, keeping all the equipment powered (including BTSs).
On Fri, Oct 12, 2018 at 12:10:25PM +0200, Keith wrote:
With the split setup it involves a complete loss of service, basically, as you say, until location update timeout.
Not complete loss of service:
* any phone that contacts the network to get something done will be told to re-attach. So sending things and calling out (MO) will work.
* paging (MT) will not work, so contacting a phone from the core is broken until periodic LU time has passed. But actually only because the VLR thinks the phone is not attached to any cell, and so we don't even try to page it.
An easy way out of this would be to enable a BSS-wide (MSC-wide?) paging mechanism until the first periodic location updating time has passed -- after osmo-msc startup, just page any subscriber everywhere, if we think it is not attached, until the first periodic LU time has passed. That could be enabled by config (not on by compile-time default, but on in Rhizomatica's cfg). Upon paging response, the phone would re-attach and things should work again.
That would be considerably less effort than persisting the VLR storage.
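A minimal sketch of that decision logic; every name in it (my_subscr, page_on_lac, page_all_lacs, cfg_blanket_page) is hypothetical and not actual osmo-msc code:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define T3212_SECS (6 * 3600)	/* example periodic LU interval */

struct my_subscr {
	const char *imsi;
	bool attached;
	int lac;
};

static time_t msc_started_at;		/* set once at startup */
static bool cfg_blanket_page = true;	/* imagine a vty config flag */

static void page_on_lac(const struct my_subscr *s, int lac)
{
	printf("paging %s on LAC %d\n", s->imsi, lac);
}

static void page_all_lacs(const struct my_subscr *s)
{
	printf("paging %s on ALL connected LACs\n", s->imsi);
}

static void page_subscriber(const struct my_subscr *s)
{
	if (s->attached) {
		page_on_lac(s, s->lac);
		return;
	}
	/* After a restart the VLR is empty, so "not attached" may just
	 * mean "we forgot". Page everywhere until one periodic LU
	 * interval has elapsed and the VLR can be trusted again. */
	if (cfg_blanket_page && time(NULL) - msc_started_at < T3212_SECS)
		page_all_lacs(s);
}

int main(void)
{
	msc_started_at = time(NULL);
	struct my_subscr s = { "262420000000001", false, 0 };
	page_subscriber(&s);
	return 0;
}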
Another (less elegant) idea I'm having is that we do have some notion of which IMSIs were attached in the HLR database: they have the VLR number stored, matching the MSC's identification. If we could employ some tool to graze the HLR db, and over a period of time invoke pagings for all those subscribers, we could blanket re-attach everyone within a short time. I think I'd make that an external tool using the CTRL interface, with some spread over time so as not to insanely load the network. The one mild problem with this is that it doesn't look like we notify the HLR when phones detach. That could be resolved by also storing in the HLR a timestamp of when a subscriber was last attached, which I suggest makes sense anyway.
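A rough sketch of what such an external tool could look like, reading osmo-hlr's SQLite database directly; the subscriber table with its vlr_number column does exist in osmo-hlr's schema, but the actual paging trigger is left as a stub, since no suitable CTRL command exists yet:

#include <sqlite3.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	sqlite3 *db;
	sqlite3_stmt *stmt;

	if (sqlite3_open_v2("hlr.db", &db, SQLITE_OPEN_READONLY, NULL) != SQLITE_OK)
		return 1;
	/* Anyone with a VLR number recorded was attached at some point. */
	if (sqlite3_prepare_v2(db,
			       "SELECT imsi FROM subscriber"
			       " WHERE vlr_number IS NOT NULL",
			       -1, &stmt, NULL) != SQLITE_OK)
		return 1;
	while (sqlite3_step(stmt) == SQLITE_ROW) {
		const char *imsi = (const char *)sqlite3_column_text(stmt, 0);
		/* Stub: replace with whatever CTRL/VTY command ends up
		 * being implemented for externally triggered paging. */
		printf("would page IMSI %s\n", imsi);
		usleep(200 * 1000);	/* ~5 pagings/s, to spread the load */
	}
	sqlite3_finalize(stmt);
	sqlite3_close(db);
	return 0;
}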
The first idea, "page on all cells until first periodic LU time has passed", is more elegant because we don't blanket page everyone and don't need external tools nor changes to the HLR.
~N
On 15/10/18 12:40, Neels Hofmeyr wrote:
On Fri, Oct 12, 2018 at 12:10:25PM +0200, Keith wrote:
With the split setup it involves a complete loss of service, basically, as you say, until location update timeout.
Not complete loss of service:
- any phone that contacts the network to get something done will be told to re-attach. So sending things and calling out (MO) will work.
Ah! Well that's something I did not know, I'm not sure it is what I observed (memory tells me not), but I'll check it all out again.
Another (less elegant) idea I'm having is that we do have some notion of which IMSIs were attached in the HLR database: they have the VLR number stored, matching the MSC's identification. If we could employ some tool to graze the HLR db, and over a period of time invoke pagings for all those subscribers, we could blanket re-attach everyone within a short time. I think I'd make that an external tool using the CTRL interface, with some spread over time so as not to insanely load the network. The one mild problem with this is that it doesn't look like we notify the HLR when phones detach. That could be resolved by also storing in the HLR a timestamp of when a subscriber was last attached, which I suggest makes sense anyway.
I suggest it does too. In fact, I think it is 100% necessary for the distributed GSM idea, where we send broadcast probes to various HLRs asking them for the timestamps of the last seen LUR of an apparently attached (t)IMSI.
thanks!
k/
Hi all,
I did some research on the web at the time (maybe 2 years ago) but unfortunately couldn't find any library/tool/infrastructure for having persistent data in SysV SHM, and also no other FOSS programs that did so. Maybe I didn't look closely enough? To me, it seems like the most obvious solution to persist state across crashes/restarts of C programs on unix-type systems.
We explicitly don't want to use some kind of database system, as the VLR data needs to be accessed all over the code directly/synchronously/non-blockingly. We cannot wait for it to be retrieved from somewhere. That's what is done with HLR data.
Maybe I'm missing something, but SysV SHM provides system calls with which you can certainly create a shared memory segment that is persistent. You just create / get a reference to a memory segment with shmget; then, having the shmid of the segment, it can be shmat'ed as many times as you want, attaching the memory segment to the address space of a process. You can do queries using the ipc* tools - ipcmk, ipcs, ipcrm - in the shell. I have already done this many times to load the state of the entire data segment of a process.
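For illustration, a minimal sketch of that flow; the key below is arbitrary, any fixed value that all runs agree on works:

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define STATE_KEY 0x4f534d4f	/* arbitrary fixed key ("OSMO") */
#define STATE_SIZE (1 << 20)

int main(void)
{
	/* Create the segment on first use, attach to the existing one on
	 * later runs. It survives process exit until ipcrm or reboot. */
	int shmid = shmget(STATE_KEY, STATE_SIZE, IPC_CREAT | 0600);
	if (shmid < 0) {
		perror("shmget");
		return 1;
	}
	char *state = shmat(shmid, NULL, 0);
	if (state == (void *)-1) {
		perror("shmat");
		return 1;
	}
	/* Fresh segments are zero-filled, so an empty string means this
	 * is the first run since the segment was created. */
	if (state[0])
		printf("recovered: %s\n", state);
	strcpy(state, "state written by the previous run");
	shmdt(state);	/* detach; the data stays in the segment */
	return 0;
}

Note that shmat(shmid, NULL, 0) may pick a different address on each run, which is exactly Harald's point 1) about offsets vs. absolute pointers.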
Regards, Rafael Diniz
Hi Rafael,
On Fri, Oct 12, 2018 at 10:00:14AM -0300, Rafael Diniz wrote:
Hi all,
I did some research on the web at the time (maybe 2 years ago) but unfortunately couldn't find any library/tool/infrastructure for having persistent data in SysV SHM, and also no other FOSS programs that did so. Maybe I didn't look closely enough? To me, it seems like the most obvious solution to persist state across crashes/restarts of C programs on unix-type systems.
We explicitly don't want to use some kind of database system, as the VLR data needs to be accessed all over the code directly/synchronously/non-blockingly. We cannot wait for it to be retrieved from somewhere. That's what is done with HLR data.
Maybe I'm missing something, but SysV SHM provides system calls with which you can certainly create a shared memory segment that is persistent.
yes, that is what I'm saying and why we have been brainstorming about this approach at all. That's what I've been talking about in the first paragraph you quoted, and what we've been pondering to do.
The second paragraph is about embedded or external databases, which we don't want to use and which are not useful within the current Osmocom architecture.
Hi Harald,
Thanks for clarifying, and sorry about myself missing some parts of the whole thread.
But if there is nothing implemented yet, I could do a proof of concept. I wrote a system for a company in the past, for a DTV ISDB-T encoder, which was multi-process and used shared memory with locks (residing inside the shared segment) to control access to it. We can talk in person next week.
Rafael
On 10/12/2018 10:28 AM, Harald Welte wrote:
Hi Rafael,
On Fri, Oct 12, 2018 at 10:00:14AM -0300, Rafael Diniz wrote:
Hi all,
I did some research on the web at the time (maybe 2 years ago) but unfortunately couldn't find any library/tool/infrastructure for having persistent data in SysV SHM, and also no other FOSS programs that did so. Maybe I didn't look closely enough? To me, it seems like the most obvious solution to persist state across crashes/restarts of C programs on unix-type systems.
We explicitly don't want to use some kind of database system, as the VLR data needs to be accessed all over the code directly/synchronously/non-blockingly. We cannot wait for it to be retrieved from somewhere. That's what is done with HLR data.
Maybe I'm missing something, but SysV SHM provides system calls with which you can certainly create a shared memory segment that is persistent.
yes, that is what I'm saying and why we have been brainstorming about this approach at all. That's what I've been talking about in the first paragraph you quoted, and what we've been pondering to do.
The second paragraph is about embedded or external databases, which we don't want to use and which are not useful within the current Osmocom architecture.
On 11. Oct 2018, at 20:59, Keith keith@rhizomatica.org wrote:
Hi All.
Hi!
This is very brief, but something that's been on my mind to make a ticket about.
I realise maybe it's better to ask first if there is a plan.
no plans but just some other generic approaches...
If we ignore "crashes" and think of managed restarts then one can...
* Have a BTS connect to one of the BSCs
* Have a BSC connect to one of the MSCs
* When wanting to update, block the BSC/MSC from taking new connections
* Restart once drained, or when impatient... ;)
The next iteration comes with separating state from a TCP connection between BSC and MSC.
* Segment IDs between two MSCs (upsizing would be difficult, but our software should scale to some degree).
* Have a level of indirection that routes SCCP connections to A or B
* Have a config to drain A, B (or both...)
I think these are easier to implement than fully externalizing the state (or more advanced state transfer topics; I do like how minix3 is updating a server, though).