On Tue, Dec 03, 2019 at 05:20:33PM +0100, Harald Welte wrote:
== DNS zone / .msisdn suffix ==
.{msisdn,imsi}.dgsm.osmocom.org ? Sure, the packets will get larger by
sure, we can do that. An idea is to do that merely on the DNS encoding, and
strip it off to have only *.imsi in the mslookup client? (If we add other
methods, we might not use the domain kind of representation at all there)
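A minimal sketch of that idea, assuming the zone is dgsm.osmocom.org as
floated above and that the client-side form is simply the name without the
zone. The function names are hypothetical, not existing mslookup API.

```python
DNS_ZONE = "dgsm.osmocom.org"  # assumption: the zone suggested above


def to_dns_name(query: str, zone: str = DNS_ZONE) -> str:
    """Append the zone for the on-the-wire DNS encoding only."""
    return f"{query}.{zone}"


def from_dns_name(name: str, zone: str = DNS_ZONE) -> str:
    """Strip the zone again so the client only sees '*.msisdn' / '*.imsi'."""
    name = name.rstrip(".")  # tolerate a trailing root dot
    suffix = "." + zone
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} is not in zone {zone!r}")
    return name[: -len(suffix)]
```

With this split, only the mDNS encoder/decoder would ever see the zone; any
other lookup method could skip both helpers entirely.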
== MSISDN format ==
Another thought is whether or not there are any concerns regarding the
MSISDN format. Historically, this is one of the weaknesses of OsmoMSC,
inherited from the OpenBSC days where we just thought in terms of PBX
extension numbers. In reality, an MSISDN consists of a TON (type of
number), NPI (numbering plan indicator) and the related digits. IIRC,
in the TON one can also specify if it's supposed to be national or
international, i.e. if it's prefixed with the country code or not.
It would be great to make sure that the format used in the mDNS queries
is somewhat standardized, if only by the documentation requiring that
all queries be done in fully qualified form with the country code
present. NPI is sort-of bogus as IMHO E.164 is the only
one applicable for MSISDN.
Any thoughts?
So far it just works (TM) ... we reflect the MSISDNs saved in the HLR DB 1:1,
not sure what a TON representation might change about that.
It always passes through the MSC and SIP first.
If it's any consolation, the PBX gets the TON in the SIP INVITE (if it does),
and it could choose to treat numbers in any fashion. For mslookup to work
though, the MSISDN must reflect whatever we find in the HLR database. If that
is unable to reflect a TON (like it currently is unable to) then handling
non-naive MSISDNs would have to happen in the SIP dialplan anyway.
As soon as a given string becomes parseable by an mslookup server, i.e. say
we implement some +123 notation in osmo-hlr, then it would suddenly start to
work out. mslookup itself doesn't care much about the msisdn, but I think we do
call osmo_msisdn_str_valid() on it. That could change easily, point being it
really is an arbitrary string (without dots) that gets sent as MSISDN.
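To illustrate the difference between "arbitrary string without dots" and a
digits-only check, a rough sketch. Both helpers are hypothetical Python
stand-ins, not the libosmocore code, and the 1..15 length bounds are an
assumption on my part.

```python
def is_opaque_msisdn(s: str) -> bool:
    """Loosest rule: non-empty and free of dots, so it stays one label."""
    return bool(s) and "." not in s


def is_digits_msisdn(s: str, min_len: int = 1, max_len: int = 15) -> bool:
    """Stricter digits-only rule (length bounds are an assumption)."""
    return min_len <= len(s) <= max_len and s.isascii() and s.isdigit()
```

A "+123" style number passes the first check but not the second, which is
the part that would have to change if osmo-hlr ever stored such notation.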
== The use of 'age' vs. absolute timestamp ==
Given that the delays we're talking about are probably all sub-second or
maybe possibly about 1s, it's probably not a problem.
I went through the same thoughts. When I do a first attach to a site, I find it
expected that a caller might not reach me for five more seconds. If it is even
that much, ever.
So I favored the elegance of 'age' vs absolute timestamp, because an entire
timezone/clockdrift/faulty GPS receiver family of problems simply vanishes
completely.
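Roughly how a receiver can work with 'age' alone, as a sketch: the Result
type and its field names are made up for illustration. The only clock
involved is the receiver's own (e.g. time.monotonic()), so no two sites ever
need to agree on wall-clock time.

```python
from dataclasses import dataclass


@dataclass
class Result:
    site: str    # hypothetical label for the responding site
    age: float   # seconds since the subscriber last attached there


def local_attach_time(result: Result, received_at: float) -> float:
    """Map a reported age onto the receiver's own clock."""
    return received_at - result.age


def freshest(results):
    """Pick the response with the smallest age, i.e. the latest attach."""
    return min(results, key=lambda r: r.age)
```

The error is bounded by the transit delay of the response, which is the
sub-second to about 1s range mentioned above.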
== GSUP keepalives / connection loss detection ==
In the presence of unreliable back-haul mesh between villages, the GSUP
connection can also not be seen as reliable. We would expect to see TCP
stalls due to packet loss, etc.
Have you considered this in your implementation and/or done any testing
based on simulated lossy networks to ensure we properly use either TCP
keepalives or IPA application-level PING/PONG to detect lost connections
and recover from such situations (by closing the old and
re-establishing)?
Unreliable networks can be easily simulated by Linux built-in 'tc netem'
for providing configurable packet loss / latency / jitter.
I also saw some comments / code related to "if a second connection using
the same IPA ID arrives, we're screwed" (paraphrasing here). I would
expect this not to be uncommon even if every MSC/HLR out there is
configured correctly, exactly because e.g. the remote MSC/HLR has already
decided that the TCP/GSUP is dead and starts to reconnect by performing
a local-end release, while the "local" MSC/HLR still thinks the old
connection is alive. If the old connection "wins" (i.e. is preferred)
I see potential trouble here.
Situations like that probably warrant some carefully designed tests to
create exactly those situations.
We haven't tested this at all.
Should become an issue on redmine.
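One possible policy to test against, sketched under the assumption that the
newest connection for an IPA ID should win and the stale one gets closed.
This is not a description of current osmo-hlr behaviour; the class and its
methods are hypothetical.

```python
class GsupPeers:
    """Tracks at most one live connection per IPA ID."""

    def __init__(self):
        self._by_id = {}

    def connect(self, ipa_id: str, conn) -> None:
        old = self._by_id.get(ipa_id)
        if old is not None and old is not conn:
            # Assumption: the peer already gave up on the old link.
            old.close()
        self._by_id[ipa_id] = conn

    def disconnect(self, ipa_id: str, conn) -> None:
        # Only drop the entry if it still refers to this connection, so a
        # late close of the stale link cannot evict its replacement.
        if self._by_id.get(ipa_id) is conn:
            del self._by_id[ipa_id]

    def lookup(self, ipa_id: str):
        return self._by_id.get(ipa_id)
```

The tests for the issue would then be: reconnect while the old TCP link is
still considered alive locally, and check which connection ends up serving
the ID.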
~N