RFC: FSM instance IDs / log context / log filters

historical

This is an idea about logging context, the logging filter and FSM instances.

The idea is to make logging context a first-class citizen of osmo_fsm_inst and
of LOGP(), in order to get cleaner FSM ids, better configurable logging and a
more universal and reliable approach to logging filters.

At events like 35c3 I want to, and hence any production operator wants to have
useful filters for logging. The attempt with this RFC is to get this in an
elegant way that also improves current logging context and saves FSM IDs.

== Status quo ==

- Log filters:

Since a long time ago, we have a logging filter mechanism which requires
setting a global "context pointer" to some object, for every code path.  It is
not trivial to use and we mostly neglected using and testing it, so it doesn't
work well in practice. See enum log_filter_index, enum log_ctx_index,
log_set_context() in libosmocore and the filter_fn() in msc_main.c as an
example.

- Logging context:

For a long time, we added logging context to each and every LOGP() line, like:

  LOGP(DMSC, LOGL_DEBUG, "(%s %s %d %d) Actual message",
       subscr->ctx1, osmo_foo_name(subscr->ctx2), trans->ctx3, another->ctx);

Since the osmo_fsm_inst showed up, I have more and more used the FSM instance's
id as logging context:

  #define LOG_MY_OBJ(my_obj, loglevel, fmt, args...) \
  	LOGPFSML(my_obj->fi, loglevel, fmt, ##args)

  my_obj_update_id(my_obj)
  {
  	osmo_fsm_inst_update_id_f(my_obj->fi,
  		"%s-%s-%d-%d",
		my_obj->ctx1, osmo_foo_name(my_obj->ctx2), my_obj->ctx3, my_obj->ctx);
  }

  stuff() {
	  LOG_MY_OBJ(my_obj, LOGL_DEBUG, "Actual message");
  }

The advantages are obvious:
  - less time spent composing log message context when developing code.
  - context string is composed only once and re-used, less cpu cycles used to compose.
  - it is now made hard to forget adding useful log context.

But this also has disadvantages:

- Long lines

Since I want to be able to trace logging output by various items, my logging
context has often grown to a size that makes the log uncomfortable to read
unless the terminal is really wide.

For example, in osmo-msc, the context now often looks like this:

  DMSC DEBUG msc_a(IMSI-901700000014705:MSISDN-1001:TMSI-0x8D91D71B:GERAN-A-0:LU)[0x61200000b620]{MSC_A_ST_AUTH_CIPH}: Received Event MSC_A_EV_AUTHENTICATED (vlr_lu_fsm.c:741)

It includes the IMSI, MSISDN, TMSI, the RAN conn info and the complete-l3 type.
An extreme example is

  DCC DEBUG rtp_stream(IMSI-901700000014705:MSISDN-1001:TMSI-0x8D91D71B:GERAN-A-1:CM_SERVICE_REQ:trans-8:call-2147483649:RTP_TO_RAN:CI-BBBD5102:remote-port-not-sent:codec-not-sent:local-192-168-2-4-42000:remote-192-168-2-3-23004)[0x612000015e20]{ESTABLISHING}: Committing: Tx MDCX to update the MGW: updating remote-RTP-IP-port codec (rtp_stream.c:330)

WTF. But of course, for an RTP stream, it is interesting to see what subscriber
it is for, but also what callref and CC transaction, also the MGW's endpoint
CI, whether it is to RAN or CN, and which RTP address and ports are involved.

Now, a log line like this is pretty much guaranteed to be truncated in the
display of gsmtap_log in wireshark. Very irritating.

We would have to remove select items of logging context from the FSM instance
ID to strike a good balance of length and usefulness. Each time we tweak it,
all the logging output in test programs changes ... ugh.

- Special characters

Since an FSM instance can only have a limited set of characters for id, we
can't add IP addresses as 1.2.3.4, but need to use 1-2-3-4. And so on and so
on.

- ID obfuscation

If I want to address a specific osmo_fsm_inst on the CTRL interface, I need to
pass the entire long id. I don't know of anyone using this in practice, but
abusing the id as logging context makes this really hard.
I think this is the worst side effect of me adding elaborate logging context to
FSM instance ids.

- log reading

Various lengths of logging context make the log line's actual message show on
various "indentation" levels. Instead I would like to be able to configure the
log context part of a log message to show in the *end* of a log line instead of
at the start, so that the short messages like "New conn" or "Rx Location
Updating Request" show lined up one below each other; or alternatively
introduce a common chunk of indentation spacing to line things up. That's
currently not possible if both the log context and log message are just one
arbitrary string fed to LOGP().

== IDEA ==

Reflecting on this, I formed the idea of moving logging context to a separate
array of strings for each object. The most common usage would be to tie this
array of context information to an osmo_fsm_inst.

I'm not sure how far we could take this before we start producing too much
load. It would of course be the aim to find a good balance of effort and
usefulness, or think of nifty simplifications or optimizations.

It's just a vague idea, but I guess it would look something like this in pseudo
code:

  my_obj_update_id(my_obj)
  {
	set_log_context(my_obj->logctx, "IMSI", my_obj->imsi);
	set_log_context(my_obj->logctx, "MSISDN", my_obj->vlr->msisdn);
	set_log_context(my_obj->logctx, "RTP-local", osmo_sockaddr_str_name(my_obj->rtp_stream->rtp_local));
	...
  }

  set_log_context(ctx, name, val)
  {
  	slot = log_context_find_or_create_slot(ctx, name);
	osmo_talloc_replace_str(ctx, &ctx[slot], val);
	/* or osmo_strlcpy() to a fixed-size char[]? */
  }

  #define LOG_MY_OBJ(my_obj, level, fmt, args...) \
  	LOGP_CTX(DMSC, my_obj->ctx, level, fmt, ##args)

  use() {
  	LOG_MY_OBJ(my_obj, LOGL_DEBUG, "Actual message");
	/* i.e. most current log macros would be invoked in the same way */
  }

And then allow logging configuration like

	% hide all logging context in log output, except MSISDN and RTP-local:
	> logging print context none
	> logging print context MSISDN
	> logging print context RTP-local

	% show only logging lines where this exact IMSI is part of the context
	> logging filter-rule IMSI == 901700000014705

In gsmtap_log packets, we could add a listing of key-value items for log
context, hopefully allowing to show / hide each individual context by name in a
wireshark UI.

Instead of repeating context like the IMSI in various FSM instance IDs, the
logging context could be made chainable, i.e. internally, we could reference
one logging context from another. For example, the rtp_stream FSM instance
would only add the RTP address to its logging context, and then "upwards"
reference the call_leg FSM's context.  That would add the callref, and
"upwards" reference the subscriber's context, which adds IMSI, MSISDN and RAN
conn info to the rtp_stream logging context.  (We might be able to do this
implicitly from osmo_fsm_inst parent/child relations that we already have.)

It might even be thinkable to allow traversing all across related context info,
so that filtering for a given RTP port could cause even high level objects
(like the vlr_subscr) to search through all childrens' log ctx information to
see whether anyone defines this RTP-local context, thus show all logging for a
conn that currently uses this RTP port.

Premature optimization...  To run each log line through a set of filters would
potentially take up a lot of CPU cycles. For that I also have an idea to
resolve the decision to log or not to log only once per target object and cache
that result. For example, when I change the logging filter to pick a specific
MSISDN, an internal token is incremented. If I hit an object's logging, I
decide whether it passes the logging filter, and store the outcome as
true/false with the current filter's token value in the object. Next time I hit
that object and the token is still the same, the filter must also have the same
result, and I can just use the stored true/false. If the user changes the
filter, the token increments and on next logging, each object notices that the
filter needs to be re-run in full. If the object's context info changes, reset
the object's stored token to also rerun the filter on next logging.  This is
quite vague, especially not clear whether each object with logging context
would have to store tokens for each and every log target separately...

Premature optimization: require the log context names to be compile-time string
literals, so that we can just compare name_a == name_b without invoking
strcmp().

A general advantage is that we would always see the relevant context for
logging, and we would immediately notice bugs in the context, thus we would
always ensure reliable logging filters "for free".

An easy way out of implementing log filters in osmocom would be to simply not
have them, just provide well-defined log context in gsmtap_log, in a way that
wireshark can parse. Users can then use wireshark rules to filter the log.  The
disadvantage here is that we would have to first compose all logging, instead
of nicely skipping entire log lines before they are even composed.

So, this is so far just an idea I would like to run by you.
Even if the solutions might still be unworthy, I think it is worthwhile to be
aware of the implications I highlighted above.

~N
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.osmocom.org/pipermail/openbsc/attachments/20190411/4be6a3c0/attachment.bin>