Looking at sending GSUP messages between MSCs via an HLR acting as forwarding
agent, I see that the current decision for GSUP message consumption is
suboptimal:
Depending on the message type sent and received, libvlr of osmo-msc forwards
GSUP messages to the MSC code, and there, again, depending on the message type,
specific callbacks get invoked.
See vlr_gsupc_read_cb() and msc_vlr_route_gsup_msg().
In current osmo-msc it might seem to make sense to first resolve the IMSI to a
vlr_subscr in vlr.c. But if osmo-msc acts as a Handover target for an inter-MSC
Handover, it should be able to handle unknown IMSIs. Also, should we ever go
for a separate SMSC process, the VLR as first stage makes no sense. Finding a
vlr_subscr is a one-liner with vlr_subscr_find_by_imsi().
I would much rather have an explicit destination entity advertised in the GSUP
messages, and an explicit common GSUP MUX stage. In other words, the VLR of
osmo-msc shouldn't act as a GSUP forwarder, it should merely be one of the GSUP
consumers, and shouldn't even be involved when the messages are intended for
inter-MSC, for USSD or for SMS use.
And finally, for GSUP error responses, for example a report that a specific
target could not be reached, it may not be possible to trivially derive the
right GSUP message consumer from the GSUP message (like "Routing Error").
Going towards that idea, I have put in place the following in my temporary dev
source tree:
enum osmo_gsup_entity {
	OSMO_GSUP_ENTITY_NONE = 0,
	OSMO_GSUP_ENTITY_HLR,
	OSMO_GSUP_ENTITY_VLR,
	OSMO_GSUP_ENTITY_ESME,
	OSMO_GSUP_ENTITY_SMSC,
	OSMO_GSUP_ENTITY_USSD, // FIXME: what's an "ESME"/"SMSC" for USSD?
	OSMO_GSUP_ENTITY_MSC_A,
	OSMO_GSUP_ENTITY_MSC_B,
	OSMO_GSUP_ENTITY_COUNT,
};
struct osmo_gsup_message {
	[...]
	enum osmo_gsup_entity source_entity;
	enum osmo_gsup_entity destination_entity;
	[...]
};
For calling the right rx_cb, we would need only an explicit target kind, but
for returning errors it is better to also include the source entity kind
explicitly.
A gsup_client_mux API:
struct gsup_client_mux; /* forward declaration, used by the rx_cb signature */
struct gsup_client_mux_rx_cb {
	int (*func)(struct gsup_client_mux *gcm, void *data, const struct osmo_gsup_message *msg);
	void *data;
};
struct gsup_client_mux {
	struct osmo_gsup_client *gsup_client;
	/* Target clients by enum osmo_gsup_entity */
	struct gsup_client_mux_rx_cb rx_cb[OSMO_GSUP_ENTITY_COUNT];
};
int gsup_client_mux_init(struct gsup_client_mux *gcm, struct osmo_gsup_client *gsup_client);
int gsup_client_mux_tx(struct gsup_client_mux *gcm, const struct osmo_gsup_message *gsup_msg);
void gsup_client_mux_tx_error_reply(struct gsup_client_mux *gcm, const struct osmo_gsup_message *gsup_orig,
				    enum gsm48_gmm_cause cause);
For backwards compat, we would still need to do target classification by
message type, but only if no explicit destination_entity is set:
static enum osmo_gsup_entity gsup_client_mux_classify(struct gsup_client_mux *gcm,
						      const struct osmo_gsup_message *gsup)
{
	if (gsup->destination_entity)
		return gsup->destination_entity;
	/* Legacy message that lacks an explicit target entity. Guess by message type for backwards compat: */
	switch (gsup->message_type) {
	case OSMO_GSUP_MSGT_PROC_SS_REQUEST:
	case OSMO_GSUP_MSGT_PROC_SS_RESULT:
	case OSMO_GSUP_MSGT_PROC_SS_ERROR:
		return OSMO_GSUP_ENTITY_USSD;
	case OSMO_GSUP_MSGT_MO_FORWARD_SM_ERROR:
	case OSMO_GSUP_MSGT_MO_FORWARD_SM_RESULT:
	case OSMO_GSUP_MSGT_READY_FOR_SM_ERROR:
	case OSMO_GSUP_MSGT_READY_FOR_SM_RESULT:
	case OSMO_GSUP_MSGT_MT_FORWARD_SM_REQUEST:
		return OSMO_GSUP_ENTITY_SMSC;
	default:
		/* osmo-hlr capable of forwarding inter-MSC messages always includes the target entity, so any
		 * other legacy message is for the VLR. */
		return OSMO_GSUP_ENTITY_VLR;
	}
}
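To illustrate the rx side of such a mux, here is a minimal sketch of dispatching a received message to the consumer registered for its destination entity. It uses simplified stand-in types, not the real libosmocore structs: `struct gsup_msg`, `struct mux`, `mux_classify()`, `mux_rx()`, `count_rx()` and the trimmed enum are hypothetical names for illustration only, and the legacy fallback is reduced to "everything goes to the VLR".

```c
#include <stddef.h>

/* Simplified stand-ins for illustration; not the real libosmocore types. */
enum entity { ENTITY_NONE = 0, ENTITY_HLR, ENTITY_VLR, ENTITY_SMSC, ENTITY_COUNT };

struct gsup_msg {
	enum entity source_entity;
	enum entity destination_entity;
	int message_type;
};

struct mux;
typedef int (*mux_rx_func)(struct mux *mux, void *data, const struct gsup_msg *msg);

struct mux_rx_cb {
	mux_rx_func func;
	void *data;
};

struct mux {
	/* Consumers indexed by destination entity */
	struct mux_rx_cb rx_cb[ENTITY_COUNT];
};

/* Hypothetical, reduced legacy fallback: a message without an explicit
 * destination entity is assumed to be for the VLR. */
static enum entity mux_classify(const struct gsup_msg *msg)
{
	if (msg->destination_entity)
		return msg->destination_entity;
	return ENTITY_VLR;
}

/* Dispatch a received message to the consumer registered for its destination.
 * Returns the callback's return value, or -1 if no consumer is registered. */
static int mux_rx(struct mux *mux, const struct gsup_msg *msg)
{
	enum entity dest = mux_classify(msg);

	if (dest <= ENTITY_NONE || dest >= ENTITY_COUNT || !mux->rx_cb[dest].func)
		return -1;
	return mux->rx_cb[dest].func(mux, mux->rx_cb[dest].data, msg);
}

/* Example consumer: counts the messages it receives in *data. */
static int count_rx(struct mux *mux, void *data, const struct gsup_msg *msg)
{
	(void)mux;
	(void)msg;
	(*(int *)data)++;
	return 0;
}
```

In this sketch the VLR is just one entry in `rx_cb[]`; a message addressed to an entity without a registered consumer yields -1, which is the point where an error reply towards `source_entity` could be generated.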
We'd have:
HLR <-> VLR
ESME <-> SMSC
USSD <-> USSD (names??)
MSC_A <-> MSC_B
Thanks for your thoughts.
~N
While working on the talloc context patches, I was wondering if we should
spend a bit of time to further improve libosmocore and collect something
like a wishlist.
I would currently identify the following areas:
1) initialization of the various sub-systems is too complex, there are too
many functions an application has to call. I would like to move more
to a global "application initialization", where an application registers
some large struct [of structs, ...] at start-up and tells the library
the log configuration, the copyright statement, the VTY IP/port, the config
file name, ... (some of those can of course be NULL and hence not used)
2) have some kind of extensible command line options/arguments parser
It would be useful to have common/library parts register some common
command line arguments (like config file, logging, daemonization, ..)
while the actual application extends that with only its application-specific
options. I don't think this is possible with how getopt() works, so
it would require some new/different infrastructure for how applications
register their arguments.
3) move global select() state into some kind of structure. This would mean
that there could be multiple lists of file descriptors rather than the
one implicit global one. Alternatively, turn the state into thread-local
storage, so each thread would have its own set of registered file descriptors,
which probably makes most sense. Not sure if one would have different 'sets'
of registered file descriptors in a single thread. The same would apply
for timers: Have a list of timers for each thread; timeouts would then
also always execute on the same thread. This would put talloc context, select
and timers all in the same concept: Have one set of each on each thread,
used automatically.
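As a rough illustration of item 3, the following sketch keeps one list of registered file descriptors per thread instead of one implicit global list. `struct my_fd`, `my_fd_register()` and `my_fd_count()` are hypothetical names, not the libosmocore osmo_fd API, and `__thread` is the GCC/Clang storage-class extension (C11 spells it `_Thread_local`).

```c
#include <stddef.h>

/* Hypothetical minimal fd registration; not the libosmocore osmo_fd API. */
struct my_fd {
	int fd;
	struct my_fd *next;
};

/* One list head per thread instead of one implicit global list. */
static __thread struct my_fd *thread_fds = NULL;

/* Register an fd on the calling thread's list. */
static void my_fd_register(struct my_fd *f)
{
	f->next = thread_fds;
	thread_fds = f;
}

/* Count the fds registered on the calling thread. */
static int my_fd_count(void)
{
	int n = 0;

	for (const struct my_fd *f = thread_fds; f; f = f->next)
		n++;
	return n;
}
```

Every thread that calls `my_fd_register()` only ever sees its own list, so a per-thread select loop would need no locking around it. A per-thread timer list could follow the same pattern.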
Any other wishlist items?
--
- Harald Welte <laforge(a)gnumonks.org> http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)
The templates for GSUP in GSUP_Types.ttcn seem to expect the GSUP IEs in a
specific order, which shouldn't be required.
Also I cannot easily define certain GSUP IEs as not mattering.
Particularly, I added Source Entity and Destination Entity IEs, and now I would
have liked to just add some source_entity := *, destination_entity := *, to
trivially make all tr_GSUP() pass, whether these IEs are present or not.
Instead I have to now add them to a listing of IEs everywhere.
That means the test suite will only work with the new osmo-msc and even if
"nightly" works out, we will see "latest" failing, etc.
I'd like to use the semantics I am used to in e.g. BSSMAP messages:
- order doesn't matter
- easy ':= *' for items in the root tr_GSUP() template
What would it take to make GSUP messages match in this way?
~N
Hi all,
this idea has been floating around for quite some time, and I finally took
some time for a proposed implementation: The introduction of a "volatile"
talloc context which one can use to allocate whatever temporary things from,
and which would automatically be free'd once we leave the select dispatch.
The current proposal is in https://gerrit.osmocom.org/#/c/libosmocore/+/13312
and I'm happy to receive any review.
How this works:
* within a new osmo_select_main_ctx() function, we create a temporary talloc
context before calling the filedescriptor call-back functions, and we
free that after the call-backs have returned
* any of the code running from within the select loop dispatch (which for
"normal" osmocom CNI projects is virtually everything) can use that
temporary context for allocations. There's an OTC_SELECT #define for
convenience. So you could do something like talloc_zero(OTC_SELECT, ...)
which would be automatically free'd after the current select loop
iteration.
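The lifetime rule described above can be sketched without talloc. The following is only a stand-in to show the idea: `struct vol_ctx`, `vol_zalloc()`, `vol_free_all()`, `select_iteration()` and `example_cb()` are hypothetical names, and the real proposal uses a talloc context and OTC_SELECT rather than this hand-rolled list.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation; the max_align_t member keeps
 * the user memory that follows suitably aligned. */
union vol_chunk {
	union vol_chunk *next;
	max_align_t align;
};

/* Stand-in for a volatile talloc context: allocations that are freed together. */
struct vol_ctx {
	union vol_chunk *chunks;
	unsigned int live;	/* allocations not yet freed */
};

/* Allocate zeroed memory owned by ctx. */
static void *vol_zalloc(struct vol_ctx *ctx, size_t size)
{
	union vol_chunk *c = calloc(1, sizeof(*c) + size);

	if (!c)
		return NULL;
	c->next = ctx->chunks;
	ctx->chunks = c;
	ctx->live++;
	return c + 1;
}

/* Free everything that was allocated from ctx. */
static void vol_free_all(struct vol_ctx *ctx)
{
	while (ctx->chunks) {
		union vol_chunk *c = ctx->chunks;

		ctx->chunks = c->next;
		free(c);
		ctx->live--;
	}
}

typedef void (*fd_cb)(struct vol_ctx *ctx);

/* One loop iteration: run the callbacks, then drop all they allocated. */
static void select_iteration(struct vol_ctx *ctx, fd_cb *cbs, size_t n_cbs)
{
	for (size_t i = 0; i < n_cbs; i++)
		cbs[i](ctx);
	vol_free_all(ctx);
}

static unsigned int seen_live;

/* Example callback: takes a temporary buffer and never frees it itself. */
static void example_cb(struct vol_ctx *ctx)
{
	char *tmp = vol_zalloc(ctx, 32);

	if (tmp)
		tmp[0] = 'x';
	seen_live = ctx->live;
}
```

The callback never has to think about ownership of its temporary buffer; the price is one extra cleanup pass per loop iteration, which is the cost discussed further below.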
Where is this useful? There are at least two common use cases:
* allocation of message buffers without having to worry about msgb ownership
* various temporary buffers e.g. for something-to-string conversions where
currently we use library-internal static buffers with all their known problems
(not being able to use them twice within a single printf() statement, not
being thread-safe, ...)
To Neels' disappointment, this is not all automatic. You
a) have to call the _c-suffixed variant of the respective function, e.g.
osmo_hexdump_c() instead of osmo_hexdump(), with one extra initial argument:
OTC_SELECT. There's also msgb_alloc_c(), msgb_copy_c() and the like,
allowing msgb allocation from a context of your choice, as opposed
to the library-internal msgb_tall_ctx that we had so far.
b) have to use osmo_select_main_ctx() instead of osmo_select_main(). This is
an arbitrary limitation for optimization that Pau requested, to make sure
applications that don't want this can avoid all the extra talloc+free at
every select iteration. This is debatable, and we can make it automatic
in osmo_select_main(). It's probably worth a benchmark how expensive
that 'empty' allocation + free is.
However, I think that's a rather "OK" price to pay. Converting the existing
callers can be more or less done with "sed" (yes, spatch is of course better).
While introducing this feature, I also tried to address two other topics, which
are not strictly related. However, ignoring those two now would mean
we'd have API/ABI breaks if we'd care about them in the future. Hence I
think we should resolve them right away:
1) introduce another context: OTC_GLOBAL. This is basically a library-internal
replacement for the "g_tall_ctx" that we typically generate as one of the
first things in every application. You can use it anywhere where you'd want
to allocate something from the global talloc context
2) make those contexts thread-local. As you may know, talloc is not thread
safe. So you cannot allocate/free from a single context on multiple
threads. However, it is safe to have separate talloc contexts on each
thread. Making the new OTC_* contexts per-thread, we are enabling the
[limited] use of this functionality in multi-threaded programs. This
is of course very far from making libosmocore thread-safe. There are
many library-internal data structures like timers, file descriptors,
the VTY, ... which are absolutely not thread-safe. However, I think
it's a step in the right direction and I don't expect any performance
impact for single-threaded programs by marking the contexts as
__thread.
--
- Harald Welte <laforge(a)gnumonks.org> http://laforge.gnumonks.org/
Hi,
We have a question regarding osmo-msc: according to the documentation, it supports a scaled-down version of an SMSC. Can we bring up just the SMSC service so that it can talk to another vendor's MSC/HLR?
We are trying to perform CSFB SMS tests with an external MSC, the Osmocom SMSC and an Amarisoft EPC core, so we wanted to check whether the Osmocom SMSC can be used for this test.
Thanks
Alex
Hello,
openbsc.git doesn't build anymore against libosmocore master:
> ../../src/libcommon/libcommon.a(talloc_ctx.o): In function `talloc_ctx_init':
> /build/openbsc/src/libcommon/talloc_ctx.c:50: undefined reference to `tall_sigh_ctx'
Full build log:
https://jenkins.osmocom.org/jenkins/job/master-openbsc/IU=--disable-iu,MGCP…
Looks like this patch broke it:
https://gerrit.osmocom.org/#/c/libosmocore/+/13337/
Regards,
Oliver
--
- Oliver Smith <osmith(a)sysmocom.de> https://www.sysmocom.de/
=======================================================================
* sysmocom - systems for mobile communications GmbH
* Alt-Moabit 93
* 10559 Berlin, Germany
* Sitz / Registered office: Berlin, HRB 134158 B
* Geschaeftsfuehrer / Managing Director: Harald Welte
Hi.
While experimenting with the BTS TTCN-3 tests, I've got some of the tests
mysteriously failing with:
FBSB Failed with non-zero return code 255
Anyone hit this before? Any ideas what could be causing this?
--
- Max Suraev <msuraev(a)sysmocom.de> http://www.sysmocom.de/
Hi Neels,
> 21:14 < neeels> LaF0rge, I need a solution for picking a conn_id as an SCCP SAP user. The way osmo-bsc
> does it isn't much good either. There has to be some way to obtain an unused local
> reference without running into potential collisions
> 21:16 < neeels> (I submitted the patch expecting your -1, but just to make sure that I understood
> correctly I wanted to submit as code that has no room for misunderstandings)
> 21:18 < neeels> if I can't use the internal "next_id" thing, then what I would have to do is mimick the
> exact same in osmo-msc by looking up what conn_ids are already in use and pick an unused
> one. That's quite brain damaged IMHO, it's the same thing as that layer violation patch,
> only with much more effort and more chances for id collisions.
Nobody is suggesting that.
> 21:21 < neeels> ideas: add some primitive to hand out a local reference -- but that would be non-standard
> IIUC
I had mentioned and discarded that option in Message-ID: <20190311204817.GA725@nataraja>
here on this list. The reason is not that it's non-standard, but that it again wouldn't
work over any type of asynchronous / message based SAP [and that's how a SAP is conceptualized].
> 21:21 < neeels> give each application some unique number space for local references, like the highest
> byte is configured in .cfg file, and then each SCCP user cycles through its own number
> space of local references
> 21:23 < neeels> ...and libosmo-sccp would have to be constrain-able to its own number space for incoming
> connections
I was quite sure we've had that discussion before. Unfortunately it doesn't
seem to be on this mailing list or in gerrit. It may have been on IRC or I am
starting to be delusional :/
The correct solution from the spec / SCCP architecture point of view is rather clear, I think:
* don't recycle the SCCP local reference (protocol) from the SCCP connection ID (SAP),
but use distinct number spaces for that. It was an oversimplification to do that
in the original implementation. I simply missed the different scoping of those
two identifiers. The scoping is the root of our problems here.
* only once we have distinct identifiers, we can fix the scope: One of
the two identifiers has a per-SCCP-user scope, while the other has a
per-SCCP-instance scope
* this way, every SCCP user knows exactly which SCCP connection identifiers are
currently in use between this specific user and the SCCP provider, and it hence
can choose any unused ID for a new connection
* right now, the primitive exchange across the SCCP user SAP is synchronous,
so there can be no races where the SCCP provider and the SCCP user could
choose the same connection ID at the same time. If you want to prepare for
some kind of possible future asynchronous, message-queue based SAP, then you
could use the highest bit as "allocation flag", e.g. highest bit = 0, allocated
by the SCCP provider and highest bit = 1, allocated by SCCP user.
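The "allocation flag" idea from the last bullet can be sketched as follows. This is a minimal illustration under assumptions: a 32-bit connection ID, a hypothetical `struct conn_id_pool` that lists the IDs currently in use between one SCCP user and the provider, and a hypothetical `conn_id_alloc_user()`. None of these names exist in libosmo-sccp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Highest bit set: ID was allocated by the SCCP user.
 * Highest bit clear: ID was allocated by the SCCP provider. */
#define CONN_ID_USER_FLAG 0x80000000u

/* Hypothetical per-user bookkeeping of connection IDs currently in use. */
struct conn_id_pool {
	const uint32_t *in_use;
	size_t n_in_use;
	uint32_t next;	/* rolling counter, low 31 bits only */
};

static bool conn_id_in_use(const struct conn_id_pool *pool, uint32_t id)
{
	for (size_t i = 0; i < pool->n_in_use; i++)
		if (pool->in_use[i] == id)
			return true;
	return false;
}

/* Pick an unused user-side ID. Tries n_in_use + 1 consecutive candidates, so
 * at least one of them is free as long as fewer than 2^31 IDs are in use.
 * Returns 0 and writes *id on success, -1 otherwise. The caller is expected
 * to record the returned ID as in use. */
static int conn_id_alloc_user(struct conn_id_pool *pool, uint32_t *id)
{
	for (size_t tries = 0; tries <= pool->n_in_use; tries++) {
		uint32_t cand = CONN_ID_USER_FLAG | (pool->next & ~CONN_ID_USER_FLAG);

		pool->next = (pool->next + 1) & ~CONN_ID_USER_FLAG;
		if (!conn_id_in_use(pool, cand)) {
			*id = cand;
			return 0;
		}
	}
	return -1;
}
```

Because provider-allocated IDs never have the highest bit set, the user can pick an ID without asking the provider, even if the SAP were to become asynchronous.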
Let me know if you see any problem with above proposal.
Regards,
Harald
--
- Harald Welte <laforge(a)gnumonks.org> http://laforge.gnumonks.org/
Hi all,
master-libosmocore was running for 20 hours on jenkins, hanging here:
> make[7]: Entering directory '/home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/builddir/tests'
> osmo_verify_transcript_vty.py -v \
> -p 42042 \
> -r "../tests/tdef/tdef_vty_test_config_root" \
> /home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/tests/tdef/tdef_vty_test_config_root.vty
> <0000> /home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/src/socket.c:367 unable to bind socket:127.0.0.1:42042: Address already in use
> <0000> /home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/src/socket.c:378 no suitable addr found for: 127.0.0.1:42042
> <0000> /home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/src/vty/telnet_interface.c:100 Cannot bind telnet at 127.0.0.1 42042
https://jenkins.osmocom.org/jenkins/job/master-libosmocore/a2=default,a3=de…
I've stopped the job. Right after that, a new job spawned, and it also
failed to bind telnet at 42042. It did not hang this time, but stopped
there instead:
https://jenkins.osmocom.org/jenkins/job/master-libosmocore/a2=default,a3=de…
After triggering the job once more manually, it went through.
From a quick analysis, I cannot see why this has happened in the first
place. The master-libosmocore job is set to non-concurrent:
https://git.osmocom.org/osmo-ci/tree/jobs/master-builds.yml
And the gerrit verification jobs are running on another machine. Other
than that, none but libosmocore.git of the (almost all) Osmocom
repositories that I have checked out mention port 42042, so nothing else
should bind that in theory.
Regards,
Oliver
--
- Oliver Smith <osmith(a)sysmocom.de> https://www.sysmocom.de/
Avoiding msgb leaks is easiest if the caller retains ownership of the msgb.
Take this hypothetical chain where leaks are obviously avoided:
void send()
{
	msg = msgb_alloc();
	dispatch(msg);
	msgb_free(msg);
}
void dispatch(msg)
{
	osmo_fsm_inst_dispatch(fi, msg);
}
void fi_on_event(fi, data)
{
	if (socket_is_ok)
		socket_write((struct msgb*)data);
}
void socket_write(msg)
{
	if (!ok1)
		return;
	if (ok2) {
		if (!ok3)
			return;
		write(sock, msg->data);
	}
}
However, if the caller passes ownership down to the msgb consumer, things
become nightmarishly complex:
void send()
{
	msg = msgb_alloc();
	rc = dispatch(msg);
	/* dispatching event failed? */
	if (rc)
		msgb_free(msg);
}
int dispatch(msg)
{
	if (osmo_fsm_inst_dispatch(fi, msg))
		return -1;
	if (something_else())
		return -1; // <-- double free!
	return 0;
}
void fi_on_event(fi, data)
{
	if (socket_is_ok)
		socket_write((struct msgb*)data);
	else
		/* socket didn't consume? */
		msgb_free(data);
}
int socket_write(msg)
{
	if (!ok1)
		return -1; // <-- leak!
	if (ok2) {
		if (!ok3)
			goto out;
		write(sock, msg->data);
	}
out:
	msgb_free(msg);
	return -2;
}
If any link in this call chain fails to be aware of the importance to return a
failed RC or to free a msgb if the chain is broken, we have a hidden msgb leak.
This is the case with osmo_sccp_user_sap_down(). In new osmo-msc, passing data
through various FSM instances, there is high potential for leak/double-free
bugs. A very large brain is required to track down every msgb path.
Isn't it possible to provide osmo_sccp_user_sap_down() in the caller-owns
paradigm? Thinking about an osmo_sccp_user_sap_down2() that simply doesn't
msgb_free().
Passing ownership to the consumer is imperative if a msg queue is involved that
might send out asynchronously. (A workaround could be to copy the message
before passing into the wqueue.) However, looks to me like no osmo_wqueue is
involved in osmo_sccp_user_sap_down()? It already frees the msgb right upon
returning, so this should be perfectly fine -- right?
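The copy workaround mentioned in parentheses above can be sketched like this. It is not the proposed osmo_sccp_user_sap_down2(), only an illustration of wrapping a consumer that always frees its argument so that the caller keeps ownership. `struct buf` is a stand-in for struct msgb, and `consumer_takes_ownership()` and `send_caller_owns()` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct msgb; not the libosmocore type. */
struct buf {
	size_t len;
	unsigned char data[64];
};

static int consumed_count;

/* Hypothetical consumer that takes ownership: it frees its argument on
 * every path, whether or not sending succeeded. */
static int consumer_takes_ownership(struct buf *b)
{
	int rc = b->len ? 0 : -1;

	consumed_count++;
	free(b);
	return rc;
}

/* Caller-owns wrapper: hand a copy to the consumer, so the caller's buffer
 * stays valid and the caller frees it exactly once on every path. */
static int send_caller_owns(const struct buf *b)
{
	struct buf *copy = malloc(sizeof(*copy));

	if (!copy)
		return -1;
	memcpy(copy, b, sizeof(*copy));
	return consumer_takes_ownership(copy);
}
```

The caller can then follow the simple alloc, send, free pattern from the first example regardless of the return code, at the cost of one extra copy per message.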
I think I'll just try it, but if anyone knows a definite reason why this will
not work, please let me know.
(Remotely related, I also still have this potential xua msg leak fix lying
around, never got around to verify it:
https://gerrit.osmocom.org/#/c/libosmo-sccp/+/9957/ )
~N
--
- Neels Hofmeyr <nhofmeyr(a)sysmocom.de> http://www.sysmocom.de/