RFC: FSM instance IDs / log context / log filters

historical

Hi Neels,

On Thu, Apr 11, 2019 at 03:02:17AM +0200, Neels Hofmeyr wrote:
> Since a long time ago, we have a logging filter mechanism which requires
> setting a global "context pointer" to some object, for every code path.  It is
> not trivial to use and we mostly neglected using and testing it, so it doesn't
> work well in practice.

This is true as the main developers typically test with networks/confiugrations
of rather limited size and hence don't need the feature during their day-by-day
work. However, it should be rather simple to add more log filters very quickly,
once somebody actually wants to improve there.  Also, support differs a lot between
programs.  I think in osmo-gbproxy it's quite "feature complete" as you rarely would
want to filter on anything beyond nsvcs?

> An extreme example is
> 
>   DCC DEBUG rtp_stream(IMSI-901700000014705:MSISDN-1001:TMSI-0x8D91D71B:GERAN-A-1:CM_SERVICE_REQ:trans-8:call-2147483649:RTP_TO_RAN:CI-BBBD5102:remote-port-not-sent:codec-not-sent:local-192-168-2-4-42000:remote-192-168-2-3-23004)[0x612000015e20]{ESTABLISHING}: Committing: Tx MDCX to update the MGW: updating remote-RTP-IP-port codec (rtp_stream.c:330)

;)

see my point below about what the ID is supposed to be.

> - Special characters
> 
> Since an FSM instance can only have a limited set of characters for id, we
> can't add IP addresses as 1.2.3.4, but need to use 1-2-3-4. And so on and so
> on.

yes, this restriction is very ugly, I really don't like it.  And all of that
just because we use the dot as separator in the CTRL interface.

> - ID obfuscation
> 
> If I want to address a specific osmo_fsm_inst on the CTRL interface, I need to
> pass the entire long id. I don't know of anyone using this in practice, but
> abusing the id as logging context makes this really hard.
> I think this is the worst side effect of me adding elaborate logging context to
> FSM instance ids.

Yes, this is a problem. The ID was intended to be *a* unique identifier
for the given resource, like the IMSI *or* the TMSI.  Something like a
"primary lookup key".  Not as a concatenation of all posssible
identifier or context.

The primary use case for the ID was the CTRL interface, not logging.

So I'm all for returning to the original model, irrespective of the
current proposal discussed here.

> == IDEA ==
> 
> Reflecting on this, I formed the idea of moving logging context to a separate
> array of strings for each object. The most common usage would be to tie this
> array of context information to an osmo_fsm_inst.
> 
> I'm not sure how far we could take this before we start producing too much
> load. It would of course be the aim to find a good balance of effort and
> usefulness, or think of nifty simplifications or optimizations.

So why not turn it the other way around?  If I understand the proposal,
you are pre-emptively preparing all kinds of string representations in
your application and providing it to the logging core (whether or not
they will be used).  I would argue that the other way around makes
sense:  Register something like a "context generator function"  which
takes a "raw" data structure like a lchan and which can
interpret/extract information from it.  This way, you *only* generate
the strings on demand which are enabled in the vty config.

So the string formatting/printing would only happen *if* we are sure
that the given object actually is going to be logged given the current
configuration of log levels/targets, and only for those parts of the
object that are requested.

Basically we're talking about an object-oriented approach, where each
object  provides some kind of poor man's serialization (for a few fields
like MSISDN, IMSI, TMSI, ... in our use cases).

This could then be used not only for string based logging but maybe also
for something like a "structured vty dump" of the object state, or even
for something like structured logging (json, extended GSMTAP, ...)

> In gsmtap_log packets, we could add a listing of key-value items for log
> context, hopefully allowing to show / hide each individual context by name in a
> wireshark UI.

I would argue in favor of not including the keys with every log line,
but rather send a periodic "GSMTAP log configuration" package which
contains something like a string-to-identifier mapping which can then be
sent only once every second, or every 10s, rather than include it in
every line. But well, that may be an over-optimization.

> Instead of repeating context like the IMSI in various FSM instance IDs, the
> logging context could be made chainable, i.e. internally, we could reference
> one logging context from another. For example, the rtp_stream FSM instance
> would only add the RTP address to its logging context, and then "upwards"
> reference the call_leg FSM's context.  That would add the callref, and
> "upwards" reference the subscriber's context, which adds IMSI, MSISDN and RAN
> conn info to the rtp_stream logging context.  (We might be able to do this
> implicitly from osmo_fsm_inst parent/child relations that we already have.)

Ok, that's what I think is a completely orthogonal proposal to the
key/value pairs.  This idea might not be as easy to combine with my
proposal of the serialization call-back function above.  But of course if every
FSM class had a serialization function pointer member, then one can also
traverse the chain of the FSM hierarchy and call that function for each
FSM instance in that hierarchy.

> It might even be thinkable to allow traversing all across related context info,
> so that filtering for a given RTP port could cause even high level objects
> (like the vlr_subscr) to search through all childrens' log ctx information to
> see whether anyone defines this RTP-local context, thus show all logging for a
> conn that currently uses this RTP port.

I don't like the idea of generating all that string-printed context a
priori all the time.  It seems like an incredible overhead.  For
filtering based on certain parameters I think we should rather think of
the log filters in existing or some extended form.  The point of those
log filters is that they operate on the native data type which exists
anyway, rather than to operate on stringified versions of everything.

> An easy way out of implementing log filters in osmocom would be to simply not
> have them, just provide well-defined log context in gsmtap_log, in a way that
> wireshark can parse. Users can then use wireshark rules to filter the log.  The
> disadvantage here is that we would have to first compose all logging, instead
> of nicely skipping entire log lines before they are even composed.

I really don't like those kind of ideas, sorry :)

I'm all for discussing possible improvements in whatever way. But I
think they should as far as possible be generic, elegant and efficient, even
if it means having to write some more code.  Hence my suggestion to
* generate related string representations only when actually needed
* think of general "serialization" of our data structures / objects
  that may have use cases in other contexts than logging.

Regards,
	Harald
-- 
- Harald Welte <laforge at gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)