Hi Keith,
On Thu, Oct 11, 2018 at 09:59:08PM +0200, Keith wrote:
> The issue is about restarting MSC or BSC (or both).
That's something that classic telecom doesn't typically consider very well.
In terms of the functional specs, it is assumed that restarting a network
element, particularly a core network element, is an extremely rare event, and
hence not something to be treated as a general problem.
> I still really haven't looked at the split setup enough yet,
> OsmoNITB shouldn't really be any different. Is it?
> Is there a plan to implement some kind of non-volatile state?
What you're referring to is basically the loss of VLR information. The 3GPP
specs explicitly consider it volatile and non-persistent. Holger and I
were brainstorming about this some time ago, and we came up with the idea
of using a System V shared memory segment for the actual VLR data. SysV
SHM has the nice property that it's regular mapped memory, but it is independent
of processes, i.e. you can restart a process while keeping whatever is in
that shared memory segment.
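
To illustrate the property, here is a minimal sketch of attaching to such a
segment, creating it only if it does not exist yet. The helper name
vlr_shm_attach(), the key and the permissions are assumptions made up for
this example; nothing like it exists in the Osmocom code. Only the standard
shmget()/shmat() calls are used.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: attach to the SysV SHM segment identified by 'key'.
 * If no such segment exists, create one of 'size' bytes (zero-filled on Linux).
 * '*created' is set to 1 for a fresh segment and to 0 if an existing
 * segment (e.g. left behind by a previous process) was found.
 * Returns the mapped address, or NULL on error. */
static void *vlr_shm_attach(key_t key, size_t size, int *created)
{
	assert(created != NULL);

	/* IPC_EXCL makes creation fail with EEXIST if the segment is already there */
	int id = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
	*created = (id >= 0);
	if (id < 0) {
		if (errno != EEXIST)
			return NULL;
		/* segment survived from an earlier run: just look it up */
		id = shmget(key, 0, 0600);
		if (id < 0)
			return NULL;
	}

	/* let the kernel choose the address; it may differ between runs */
	void *addr = shmat(id, NULL, 0);
	return addr == (void *)-1 ? NULL : addr;
}
```

The segment stays around until it is explicitly removed with
shmctl(id, IPC_RMID, NULL) or the machine reboots, so a restarted process
calling the helper with the same key gets created == 0 and sees the old
content.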
The devil is of course in the detail:
1) you may not be able to map it to the same address after each restart.
Needs further investigation and might require some pointer fix-up, or
the use of relative/offset addressing rather than absolute pointers.
2) what do you do if you're actually upgrading the software and the VLR
related structures have been modified? There either needs to be an
explicit conversion function, or at least a mechanism to detect this
safely and discard all the data in such situations.
3) if the restart is a crash: What if some corrupted VLR data was actually
the cause of the crash? In that case you'd end up in a restart loop.
You'd have persistent crashes, rather than a one-off crash with recovery.
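
The three points above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not an existing design: all names
(vlr_shm_hdr, vlr_shm_entry, vlr_shm_open(), ...), the field layout and the
limit of three unclean starts are hypothetical. Offsets relative to the
start of the region replace pointers (1), a magic/version/size header
detects incompatible layouts so the data can be discarded (2), and a
counter of starts without a confirmed healthy run breaks a crash loop (3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VLR_SHM_MAGIC       0x564c5231u /* hypothetical tag */
#define VLR_SHM_VERSION     1u          /* bump whenever a stored struct changes */
#define VLR_SHM_MAX_UNCLEAN 3u          /* assumed limit before data is dropped */

struct vlr_shm_hdr {
	uint32_t magic;
	uint32_t version;
	uint32_t entry_size;     /* sizeof(struct vlr_shm_entry) at write time */
	uint32_t num_entries;
	uint32_t first;          /* offset of first entry, 0 = empty list */
	uint32_t unclean_starts; /* starts since last confirmed healthy run */
};

struct vlr_shm_entry {
	char imsi[16];
	uint32_t tmsi;
	uint32_t next;           /* offset of next entry, 0 = end of list */
};

/* Offsets are relative to the region start, so they stay valid even if
 * the region gets mapped at a different address after a restart. */
static inline void *off2ptr(void *base, uint32_t off)
{
	return off ? (char *)base + off : NULL;
}

static inline uint32_t ptr2off(void *base, void *p)
{
	return p ? (uint32_t)((char *)p - (char *)base) : 0;
}

/* Returns 1 if existing data was kept, 0 if the region was (re)initialised. */
static int vlr_shm_open(void *base, size_t len)
{
	struct vlr_shm_hdr *h = base;
	assert(len >= sizeof(*h));

	int reused = h->magic == VLR_SHM_MAGIC &&
		     h->version == VLR_SHM_VERSION &&
		     h->entry_size == sizeof(struct vlr_shm_entry) &&
		     h->unclean_starts < VLR_SHM_MAX_UNCLEAN;
	if (!reused) {
		/* unknown layout or suspected crash loop: discard everything */
		memset(base, 0, len);
		h->magic = VLR_SHM_MAGIC;
		h->version = VLR_SHM_VERSION;
		h->entry_size = sizeof(struct vlr_shm_entry);
	}
	h->unclean_starts++;
	return reused;
}

/* To be called once the process has run long enough to be considered healthy. */
static void vlr_shm_mark_healthy(void *base)
{
	((struct vlr_shm_hdr *)base)->unclean_starts = 0;
}

/* Append an entry and link it at the head of the offset-based list. */
static struct vlr_shm_entry *vlr_shm_push(void *base, size_t len,
					  const char *imsi, uint32_t tmsi)
{
	struct vlr_shm_hdr *h = base;
	size_t off = sizeof(*h) + (size_t)h->num_entries * sizeof(struct vlr_shm_entry);
	if (off + sizeof(struct vlr_shm_entry) > len)
		return NULL;

	struct vlr_shm_entry *e = (struct vlr_shm_entry *)((char *)base + off);
	memset(e, 0, sizeof(*e));
	strncpy(e->imsi, imsi, sizeof(e->imsi) - 1);
	e->tmsi = tmsi;
	e->next = h->first;
	h->first = ptr2off(base, e);
	h->num_entries++;
	return e;
}
```

A real implementation would additionally need an explicit conversion
function if data is to survive a version change, and probably a checksum,
since a version check alone does not detect corrupted content.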
I did some research on the web at the time (maybe 2 years ago) but
unfortunately couldn't find any library/tool/infrastructure for having
persistent data in SysV SHM, and also no other FOSS programs that did
so. Maybe I didn't look closely enough? To me, it seems like the most
obvious solution to persist state across crashes/restarts of C programs
on unix-type systems.
We explicitly don't want to use some kind of database system, as the VLR
data needs to be accessed all over the code
directly/synchronously/non-blockingly. We cannot wait for it to be
retrieved from somewhere, which is how HLR data is handled.
> In the case of restarting the MSC with two phones connected, of course
> the phones don't notice anything. One now needs to attempt a call setup
> at least 3 times, including a call setup attempt from the "callee" phone
> before we can even call it.
The alternative is to wait for any location update, either due to the
periodic location update timer expiring, or due to the phones actually
moving geographic location.
> I tried restarting both BSC and MSC, in various orders, and I did not
> get a satisfactory result. Of course, restarting the BSC restarts
> osmo-bts which causes (some) phones to notice the temporary loss of
> BCCH, but of course no LUR or anything as LAC doesn't change, so there's
> some state that is not getting set right someplace on restart, as I
> said. Both phones need to interact before one of them can be called.
A BSC restart will always lose all active connections/channels/calls,
and I think there's nothing wrong with that. Persisting that state
makes little sense, as the phones will all have closed their radio
channels at the time your BSC recovers.
--
- Harald Welte <laforge(a)gnumonks.org>
http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)