RFC: FSM instance IDs / log context / log filters

historical

On Thu, Apr 11, 2019 at 12:06:03PM +0200, Harald Welte wrote:
> Hi Neels,
> 
> On Thu, Apr 11, 2019 at 03:02:17AM +0200, Neels Hofmeyr wrote:
> > Since a long time ago, we have a logging filter mechanism which requires
> > setting a global "context pointer" to some object, for every code path.  It is
> > not trivial to use and we mostly neglected using and testing it, so it doesn't
> > work well in practice.
> 
> This is true as the main developers typically test with networks/confiugrations
> of rather limited size and hence don't need the feature during their day-by-day
> work. However, it should be rather simple to add more log filters very quickly,
> once somebody actually wants to improve there.  Also, support differs a lot between

The problems I had when developing filters for this mechanism: it stores a
reference to a pointer. To make sure the filter doesn't cause use-after-free,
every talloc_free() of an object eligible for a filter needs to take care to
remove a filter. Furthermore, that makes it impossible to filter for a
subscriber that is not yet attached, which is a huge drawback.  If I want to
filter for an IMSI, I would typically write a filter that references a VLR
subscriber struct. That means the filter gets invalidated as soon as a VLR
subscriber is freed, or can't be put in place arbitrarily. The filter could
create a VLR subscriber if missing and keep a use count on it -- but a) that
affects the VLR state which we should rather not do, and b) isn't a generally
valid solution for everything else besides VLR subscribers in osmo-msc. In
summary, I find this approach impractical.

The second problem I had with this is that you need to carefully examine each
code path and set the log context to the correct object. Forget that or do it
too late in the code path, and the filter discards interesting logging. Also
take care to remove from the global context pointer if the object goes away.
This touches each and every code path and is error prone.

Finally, you don't notice bugs until you actually try to filter.

That's why my aim is rather towards a purely string based filtering, where you
can filter for arbitrary strings regardless of objects existing or not, and
where you don't need to take care to clean up context references.

> > - Special characters
> > 
> > Since an FSM instance can only have a limited set of characters for id, we
> > can't add IP addresses as 1.2.3.4, but need to use 1-2-3-4. And so on and so
> > on.
> 
> yes, this restriction is very ugly, I really don't like it.  And all of that
> just because we use the dot as separator in the CTRL interface.

> The primary use case for the ID was the CTRL interface, not logging.
> 
> So I'm all for returning to the original model, irrespective of the
> current proposal discussed here.

The most trivial solution is to add one more string to the osmo_fsm_inst: a
"log_context" string that is separate from the id but also gets printed on each
logging line. That's if we don't add more elaborate context features.

The osmo_fsm_inst->id could also become a numerical id that is implicity
increased for each new FSM instance per FSM kind. That would also implicitly
avoid having two FSM instances with the same id. I'm suggesting this because
so far fsm.c seems to not enforce a unique FSM instance ID at all, meaning that
if we poorly choose IDs then the objects will not be accessible from CTRL. We
could techincally search the context string to select an object e.g. by IMSI
instead of an arbitrary id... but well, probably a meaningful "primary key" as
id is more useful, even if not as safe as an implicit increasing numeric id.

> you are pre-emptively preparing all kinds of string representations in
> your application
> I don't like the idea of generating all that string-printed context a
> priori all the time.  It seems like an incredible overhead.

The overhead is pretty much similar to what I am doing now: composing the long
FSM instance IDs, typically only once per created object, and maybe once more
after it is connected up.

That is minuscule compared to msgb composition.
That is also minuscule compared to composing string formats for each log line.

> [key-value callback functions]

Providing key-value callback functions would mean that we need to fetch fresh
values for each and every logging line, because a value may have changed since
the last logging line. I'd like to avoid aggravating a situation where almost
everything is set to debug logging: completely re-composing the logging context
string for each log line might cause considerable load. I think this is more
important than saving one context string allocation per object.

If we have key-value callbacks, an object should have some "version token"
indicating whether any values changed since last queried, and a cache of last
composed logging context line, for rapid re-use of the same context.

> This could then be used not only for string based logging but maybe also
> for something like a "structured vty dump" of the object state, or even
> for something like structured logging (json, extended GSMTAP, ...)

Logging context that preserves the type for each item...

I also considered that, but I think that mostly complicates the API for little
benefit to logging. Ultimately all logging is string based, so I'd rather go
for strings.

Providing typed information actually starts to touch on things the CTRL
interface should provide, the whole CTRL/VTY refactoring idea.

If we play this nicely there could be a grand unified theory of log context,
log filtering *and* object property querying *and* object configuration by
CTRL and VTY...  like a grand unified String Theory ;)

Well, I doubt it. That topic will probably remain Schroedinger for a while.

> > In gsmtap_log packets, we could add a listing of key-value items for log
> 
> I would argue in favor of not including the keys with every log line,
> but rather send a periodic "GSMTAP log configuration" package which

The difficulty is that to show useful output, the dissector (and any custom
gsmtap_log client programs) have to wait for a "key frame" and cache
information before they can produce useful output... in unordered UDP, too.

> * generate related string representations only when actually needed
> * think of general "serialization" of our data structures / objects
>   that may have use cases in other contexts than logging.

I see this spectrum of proposals:

                        string based
                        ^ Y
                1       |                 < - - just one string
                        |
                4  3    |  2              < - - key-stringvalue pairs
                        |
     cache ctx ---------+--------> compose for each log line
                        |        X
                        |
                        |
                6       |       5         < - - key-type-value tuples
                        typed

                        (say, each axis ranging from -10 to 10)

1) Have one fixed string besides the FSM instance id, called "log_context",
which gets printed for each LOGPFSM() but doesn't affect the FSM instance ID.

2) Have purely string based key-stringvalue pairs that an object (an FSM instance)
stores as it gets allocated. A log line concatenates those readily composed
strings.

3) key-stringvalue pairs, but also cache the concatenated string. The object
re-creates the concatenated string from individual context strings when it
changed something.
Individual context items may be disabled by user cfg (per log target?).

4) key-stringvalue pairs with a cached complete log context string, but the
logging string is not created until the first logging for a given object
occurs. The string is returned by a callback function. A "version token"
indicates whether it should be updated on next logging.
Individual context items may be disabled by user cfg (per log target?).

5) Per object, provide a set of serialization callback functions that return
typed results (string or int or...). Each log line walks all/some of those
context functions and composes a log context string, every time.

6) Per-object typed callbacks, but cache the resulting logging context string
as soon as the first logging happens on an object. An object has a "version
token" that indicates the logging context should be refreshed.

I favor the top left.
(1) would be quite trivial to achieve "right now".
(3) or (4) would be the most trivial key-value variants.

As soon as we go to (6), we're opening up the need to keep separate API for
each type of context info, type enums, validity checking of user input...

On the right-hand side, like (5), composing log output would become a lot more
expensive than it is with the current FSM IDs, but of course would be optimal
for logging being switched off.

There is another "axis" to proposals:
Treat context separately in the logging API, or just insert in the log line.

            #define LOGP_CTX([...], ctx, fmt, args...)  LOGP([...], "<%s> " fmt, ctx, ##args)
            ^ Z
           /
          /
         +
        /
       /
      logp3([...], log_ctx, format, args)

Adding the context as argument to a logp3() function allows disabling /
enabling or moving-left-right of the context info by VTY config.

Considering the time/effort vs benefit tradeoff, if I were alone I would go for
(1) with logp3(), in coordinates:
        X=-10  Y=10        Z=-10
        cache, one string, logp3() with separate context arg
and implement log filtering as a strstr() on the single-string log_context:

- Quickly get back a useful FSM instance ID.
- Don't bother with key-value nor types. Cache a single string per fi.
- Get filtering trivially available for all cached context strings, in
  pseudocode:
          bool log_check_level()
          {
                  ...
                  if (!strstr("IMSI-123321:", log_context))
                        return false;
          }

This would be a fairly small patch, trivially backwards compatible, with
minimal impact on current osmo_fsm_inst allocation to make use of the new
features. I hope gsmtap_log is trivially extendible to add a ctx string.

~N
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.osmocom.org/pipermail/openbsc/attachments/20190411/75d74340/attachment.bin>