RFC: talloc contexts / automatic free at select loop

Neels Hofmeyr nhofmeyr at sysmocom.de
Wed Mar 20 18:01:48 UTC 2019


On Wed, Mar 20, 2019 at 10:52:46AM +0100, Harald Welte wrote:
> To Neels' disappointment, this is not all automatic.

Flabbergasted is the word. I thought things would get easier instead of more
convoluted and bloated.

To me the drawbacks of making it explicit are clearly humongous and cumbersome,
while it avoids fixing hidden bugs and keeps all the static buffers around.
Making it implicit has only positive effects AFAICT.

My bottom line is that I absolutely full on don't understand why we would
voluntarily burden ourselves with this huge patch-writing overhead and code
bloat, now and into the future. We could so trivially collapse all of it.

My single question is, why. What is the reason for accepting this huge tail?
(ง •̀_•́)ง -- why?

If you're interested, here are my opinions in detail...

> a) have to call the _c suffix of the respective function, e.g. osmo_hexdump_c()
>    instead of osmo_hexdump() with one extra initial argument:
>    OTC_SELECT. There's also msgb_alloc_c(), msgb_copy_c() and the like,
>    allowing msgb allocation from a context of your choice, as opposed
>    to the library-internal msgb_tall_ctx that we had so far.

These are the problems I have with this:

- complexity: constantly having to take care to choose a talloc context mostly
  burdens writing log statements:

  - single-threaded program: Is there a global pointer to a main()
    volatile talloc context we can always use? Ok, OTC_SELECT is passed
    everywhere.

  - multi-threaded program: Is OTC_SELECT implicitly distinct per thread? Then
    multi-threaded programs pass OTC_SELECT everywhere.

  - So everyone always passes OTC_SELECT everywhere, then what's the use of the
    ctx argument?
    In the super rare cases where something like osmo_plmn_name() wants to
    allocate from a different context, I would just talloc_strdup() the result
    to the other context.

- coding style:

  - bloat: we need to pass the same ctx argument to all those name() functions.
    That makes it longer, harder to read, more to type.

    With the implicit solution:

       LOGP(DMSC, LOGL_ERROR, "Peer's PLMN has changed: was %s, now is %s. MSC's PLMN is %s\n",
    	osmo_plmn_name(&parsed_plmn), osmo_plmn_name(&vsub->cgi.lai.plmn),
    	osmo_plmn_name(&net->plmn));

    With the explicit solution:

       LOGP(DMSC, LOGL_ERROR, "Peer's PLMN has changed: was %s, now is %s. MSC's PLMN is %s\n",
    	osmo_plmn_name_c(OTC_SELECT, &parsed_plmn), osmo_plmn_name_c(OTC_SELECT, &vsub->cgi.lai.plmn),
    	osmo_plmn_name_c(OTC_SELECT, &net->plmn));

    That would irritate me a lot, so I would usually just use the old ones instead.

  - complexity: then we are again in the situation where, when writing new code,
    I have to consciously decide where to allocate the strings from. That's
    something I wanted to get rid of. It would be so relieving to just know:
    "osmo_plmn_name() is safe" and just spam on using it without extra args,
    without the first call being osmo_plmn_name() and the second call
    osmo_plmn_name_c(OTC_SELECT, plmn), and then oh damn, I forgot that the
    third one also has to be _c, or some indirectly called function also fails
    to use the _c variant... If there is just one osmo_plmn_name() I cannot
    possibly do anything wrong, no hidden bugs anywhere, by design.

  - fewer arguments, fewer signatures: The reason why we have value_string[]
    shortcuts like osmo_gsup_message_type_name() is so that we can save an
    extra argument (the names[] array) when writing log statements. We didn't
    want to add so many bin symbols to the API just for that, so we agreed on
    using static inline functions in the .h files. Fair enough. So each
    value_string[] definition currently has a tail of such a static inline
    function attached. With this proposal we're re-adding a second static
    inline _c variant to each and every value_string[] tail, with another
    function arg again added to the _c function signature.

    We will also need a get_value_string_c() function because for unknown
    values it uses a static buffer to return a number string. We can so
    trivially avoid all of this mess.

  - fewer bugs: get_value_string() is a prime example for wanting to fix things
    implicitly. If a log statement uses two get_value_string() based
    implementations, even for completely different names, then it would print
    nonsense if both hit an undefined value.
    Pseudo code example:
       LOGP(DRSL, LOGL_ERROR, "Unknown message: proto %s message %s\n",
            proto_name(h->proto), msg_type_name(h->msg_type));
    If there is both an unknown proto 42 and msg_type 23, then only one of the
    "unknown ..." result strings from get_value_string() survives, because both
    calls share the same static buffer. Depending on argument evaluation order
    it would say e.g.
    "Unknown message: proto unknown (42) message unknown (42)"
    (see the sketch right after this list).
    These are hidden bugs which we would completely eradicate by making
    distinct buffer use implicit. It is not always obvious which functions use
    get_value_string() in their implementation. So we should always use the _c
    functions. But then why even have them as distinct signatures?

- static buffers: We have quite a number of static char[] around the code base,
  my idea was that we could get rid of those. With this proposal we have to
  keep them.

- API bloat:

  - This adds a lot of new API signatures to the existing ones. Both static
    inline for value_string[], and bin symbols in the library files for things
    like osmo_plmn_name_c().

  - Each future function needs a _c variant added to it. We may forget them,
    then more discussions on gerrit. More patches to fix that. I would rather
    avoid that by design.
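
To make the get_value_string() point above concrete, here is a minimal,
self-contained sketch of the effect. fake_value_string() below is a made-up,
simplified stand-in for get_value_string() (same idea: one static buffer for
the "unknown" case), not the real libosmocore code:

    #include <stdio.h>

    /* simplified stand-in for get_value_string(): known values map to a name,
     * unknown values are formatted into one shared static buffer */
    static const char *fake_value_string(int value)
    {
            static char unknown_buf[32];
            switch (value) {
            case 1: return "known-one";
            case 2: return "known-two";
            default:
                    snprintf(unknown_buf, sizeof(unknown_buf),
                             "unknown (%d)", value);
                    return unknown_buf;
            }
    }

    int main(void)
    {
            /* both arguments are evaluated before printf() prints anything,
             * so whichever call runs last overwrites the shared buffer and
             * both %s print the same "unknown (...)" text */
            printf("Unknown message: proto %s message %s\n",
                   fake_value_string(42), fake_value_string(23));
            return 0;
    }

Depending on the compiler's argument evaluation order the output shows either
42 or 23 twice, never both.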

I don't understand why you want to do it this way. What is the justification
for all this bloat, required patches now and future complexity in writing code?


> b) have to use osmo_select_main_ctx() instead of osmo_select_main().  This is
>    an arbitrary limitation for optimization that Pau requested, to make sure
>    applications that don't want this can avoid all the extra talloc+free at
>    every select iteration.

IMO that is the wrong place to hinge the argument.

- If nothing used the volatile context, nothing needs to be freed.
- Applications wanting to avoid dynamic allocations can avoid calling functions
  that allocate dynamically. Admittedly, with my under-the-hood proposal even
  functions like osmo_plmn_name() would do dynamic allocations, but:
- In the real world, I don't know why you would want to avoid dynamic
  allocations. AFAICT the performance impact is basically non-existent. How
  else do we get away with dynamically allocating *all* of our msgb and even
  copying them around quite often? We also dynamically allocate user contexts,
  transaction structs, basically all llist entries, strings that objects
  reference,...
- And finally, if dynamic allocations are identified as a problem, then there
  are ways to avoid them: keep N "volatile" buffers around and re-use them
  across osmo_select() iterations without freeing in-between. A dedicated
  function to return volatile buffers could take care of that.
  I mean, instead of passing a specific ctx, rather call a separate function
  like osmo_volatile_buffer(size), so we can slip in an optimisation like that
  at any point in the future (see the sketch right after this list).
- Finally finally, optimisations can be done by talloc, the kernel and whatnot,
  and I'm fairly sure a lot of thought already goes into memory management and
  optimising dynamic allocations, so applications can simply skip this whole
  argument entirely, which is what I propose we do.
- And triple finally, switch off debug logging to completely cut out almost all
  allocations for logging. If the performance argument holds, the improvement
  should be dramatic. So far a dramatic improvement is seen when switching off
  debug logging for low-level logging categories on a sysmobts, for osmo-bts
  and osmo-pcu, but that isn't related to dynamic allocations; rather it's due to
  plain fprintf() I/O -- which is much more perf critical than de/allocating.
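
To sketch the N-volatile-buffers idea from the list above (function names and
sizes below are made up for illustration, this is not an actual or proposed
libosmocore API): hand out scratch buffers from a fixed pool that gets recycled
once per select iteration, so callers never pass a ctx and no per-call
allocation or free happens at all:

    #include <stddef.h>

    #define VOLATILE_BUF_COUNT 32
    #define VOLATILE_BUF_SIZE  256

    /* could be __thread for the multi-threaded case */
    static char volatile_pool[VOLATILE_BUF_COUNT][VOLATILE_BUF_SIZE];
    static unsigned int volatile_next;

    /* hand out the next scratch buffer; valid until it is recycled,
     * i.e. roughly until the next select loop iteration */
    char *osmo_volatile_buffer(size_t size)
    {
            if (size > VOLATILE_BUF_SIZE)
                    return NULL; /* a real version could fall back to talloc */
            return volatile_pool[volatile_next++ % VOLATILE_BUF_COUNT];
    }

    /* called once per osmo_select_main() iteration instead of freeing */
    void osmo_volatile_reset(void)
    {
            volatile_next = 0;
    }

The point being: whether a pool like this, a talloc context or something else
sits behind osmo_volatile_buffer() could then change at any time without
touching any caller.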

> This is debatable, and we can make it automatic
> in osmo_select_main().

+1

> It's probably worth a benchmark how expensive that 'empty' allocation + free is.

If some (inline?) osmo_volatile_buffer() function creates such a talloc context
only on demand, then I don't see how this can possibly have any impact at all.

> However, I think that's a rather "OK" price to pay.  Converting the existing
> callers can be more or less done with "sed" (yes, spatch is of course better).

Ugh, really? I still argue that the API is then bloated, and, more importantly,
that writing new code becomes more complex and more bloated as well.
I don't really want to live in that API.

> While introducing this feature, I also tried to address two other topics, which
> are not strictly related.  However, ignoring those two now would mean
> we'd have API/ABI breaks if we'd care about them in the future.  Hence I
> think we should resolve them right away:
> 
> 1) introduce another context: OTC_GLOBAL.  This is basically a library-internal
>    replacement for the "g_tall_ctx" that we typically generate as one of the
>    first things in every application.  You can use it anywhere where you'd want
>    to allocate something form the global talloc context

Nice.

Shouldn't the name start with OSMO_ though?
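
For illustration, assuming OTC_GLOBAL ends up working as described in the
quoted point 1, application code could drop its own g_tall_ctx setup and just
do something like (sketch, struct and function names made up):

    /* assuming OTC_GLOBAL ends up declared via osmocom/core/talloc.h */
    #include <osmocom/core/talloc.h>

    struct my_app_state {
            int foo;
    };

    /* allocate a long-lived object from the library-provided global context
     * instead of from an application-created g_tall_ctx */
    static struct my_app_state *app_state_alloc(void)
    {
            return talloc_zero(OTC_GLOBAL, struct my_app_state);
    }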

> 2) make those contexts thread-local.

That's pretty cool.

Ok, so we already have distinct volatile talloc contexts per thread, implicitly.
Then why would you still want to pass this context to each function call explicitly?
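
As a sketch of what the implicit variant could look like (made-up names, not
actual libosmocore code): with the volatile context being thread-local anyway,
a string helper can fetch it internally, and no caller ever passes a ctx:

    #include <stdio.h>
    #include <talloc.h>

    /* assumed to be set up once per thread and freed by the select loop */
    static __thread void *volatile_select_ctx;

    /* made-up example helper: allocates its result from the implicit
     * per-thread volatile context, so callers need no ctx argument */
    const char *my_hexdump(const unsigned char *buf, size_t len)
    {
            char *out = talloc_size(volatile_select_ctx, len * 3 + 1);
            size_t i;

            if (!out)
                    return "";
            out[0] = '\0';
            for (i = 0; i < len; i++)
                    sprintf(out + i * 3, "%02x ", buf[i]);
            return out;
    }

Whether the first or the hundredth call site uses it, there is nothing to get
wrong, which is the "by design" point above.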

[[
Trying to read up on __thread, I noticed there are portability / C version
considerations. This macro, if we remove the windows bit, looks interesting:
https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably
i.e.

#ifndef thread_local
# if __STDC_VERSION__ >= 201112 && !defined __STDC_NO_THREADS__
#  define thread_local _Thread_local
# elif defined __GNUC__ || \
       defined __SUNPRO_C || \
       defined __xlC__
#  define thread_local __thread
# else
#  error "Cannot define thread_local"
# endif
#endif

?
]]

~N