Re: sccp_helpers: osmo_prim_init() and msgb

2 Apr 2016

      On Fri, Apr 01, 2016 at 11:45:03AM +0200, Harald Welte wrote:
...
Hi Neels,
On Fri, Apr 01, 2016 at 12:49:13AM +0200, Neels Hofmeyr wrote:
...
First off, we allocate 1024 bytes of msgb. Into it goes an osmo_scu_prim
struct (here 'prim').  We set some parameters in prim, fair enough. But
now what, prim->oph, the prim header, has a msg pointer, which points back
at the msgb that contains the prim. really?? The msgb containing a struct
pointing back at the same msgb has my head spinning.
I actually think it's a pretty straight-forward solution when thinking
of of the general problem and the constraints:

we already had 'msgb' handling everywhere in our message processing
code and were using it quite happily to pass messages around.

Disclaimer ahead: I won't get carried away too far. It's not like I'm
going to change the most central building block of the osmoverse any time
soon (ever). I just wish it were more explicit and instantly readable.
So let's read this not as a proposal to change, but just me getting used
to the way it is.
I understand that it's handy to use a generic data structure to pass
messages around, but with msgb you have to read just about *all* of the
code to find out what is stored where in the msgb for a given code path.
I think my main problem (again) is that I like things to be explicit and
well named in data structures. So much of the msgb use is implicit, like
the dst and lNh pointers, particularly the cb[], which makes it hard to
wrap your head around, let alone change its use safely. Also, I find the
names for data_len vs len, data+tail vs head, quite confusing. It's shaken
and stirred ;)
Adding a layer of self-reference on top of this makes it even harder to
keep a straight head on what's going on.
...

at inter-layer boundaries in the ISO/OSI model, you have primitives
that consist of the primitive itself, as well as the payload data.
The primitive consists of some operation like N-DATA, plus the
associatted request/indication/confirmation/response part.  A
primitive occors on a given SAP.  This was put into struct
osmo_prim_hdr

In order to not have the primitive separate and out-of-band, we store
it inside the msgb itself.  In early versions of some code, we had the
osmo_prim_hdr allocated separately and passed both it and the msgb to
every function.  That was a bit cumbersome and error-prone.

If a primitive refers to a msgb, it would be easier to understand to have
the msgb as member of the prim struct, allocate it in one go with enough
"tail room" for the msgb, and go on passing the prim around. I admit that
using the msgb API to allocate the memory in the first place is reducing
code duplication. But it is kind of twisted :)
...

To avoid all the primitive-handling code having to derive the pointer
to osmo_prim_hdr as the first line of code, we pass around a pointer
to the osmo_prim_hdr itself, rather than a pointer to msgb.  This
also makes the code more explicit in that you can clearly see that it
doesn't expect a generic msgb, but that it deals with a primitive as
argument.

As the osmo_prim_hdr is not stored at a fixed offset inside the msgb,
we cannot use tricks like 'offset_of()' to get back from the
osmo_prim_hdr to the msgb.  Thus we store the pointer to the msgb
header inside the osmo_prim_hdr.  The knowledge of the msgb is needed
particularly in case the primitive needs to be free()d.

free() is a good point.
...
...
Finally we add the data, l2h pointing at the start of it. I'm confused ...
how large is the msgb supposed to be? Is it len + sizeof(*prim)?
yes, at the very minimum it has to be that size.   If you need to
determine that frequently, we could add an inline function or macro to
the code.
I meant: that function is allocating 1024 bytes, even though it has a len
argument and knows sizeof(*prim). It has all the information to allocate
precisely the right amount. Can len+sizeof(*prim) > 1024 happen?
...
the problem particularly when interfacing with asn1c generated code is
that you end up doing so many copies and dynamic allocations all the
time, that it probably hardly matters anymore.  It's one hell of
inefficiency, at least for from where we come from.
Yes, and arguably it doesn't make that much of a difference in real world
terms. (Of course, this mindset taken too far gets you run times and
memory usage we know from Java.)
...
...
And, given the union prim->u has "unknown" size, why not point at the data
from prim->oph directly, instead of pointing back at the msgb? Or have
that union first? That would be easier to understand.
Now you are mixing up the SCCP-User pimitives (osmo_scu_prim) with the
generic underlying osmo_prim_hdr layer.  osmo_prim_hdr is used in
other contexts, including osmo-tetra, for which I first developed it,
AFAIR.  It is also used in libosmogb, osmocom-bb.  It should IMHO be
used much more generically at inter-layer boundaries, but a lot of the
exiting code pre-dates osmo_prim_hdr.  From my point of view, every
layer boundary should be using the osmo_prim_hdr construct and
associated service access points.
To get back to your question:

SCCP Use SAP: prim->u doesn't have unknown size, but has the size of
the largest member in the union.  You should always allocate memory
for sizeof(prim).

The image in my mind was that the data starts behind the header. If the
header has a union, the data starts behind the union, even though the way
the union is used may be smaller. "unknown" is the wrong word, yes...
So it's like
msgb {       <--+
                  |
     +- *l2h      |
     |            |
     |            |
     |  data      |
     |  |prim     |
     |  | oph     |
     |  |  *msgb -+
     |  | u
     +->|payload
        |...
        |
  }
But this would be easier to read:
prim {
    oph
     *payload ---+
    u            |
    msgb         |
      data    <--+
  }
I assume this is wishful thinking, because functions that currently only
know the oph and can free the entire prim via msgb would suddenly have to
know about prim too, right?
...

prim->oph (osmo_prim_hdr) pointing to the message data is way too
restrictive.  How would you go back to 'msg' in order to free it?  How
would yo get to all the other meta-data of a msg that it carries
around, like msgb->dst, the header pointers to variuos layers, the
msgb->cb information, ..?

Well, I'd kind of wish it didn't have to get back to all the generic bits
of meta-data implicitly used in different ways depending on the code path,
but that it would name its meta-data explicitly...
...
I find that much more convenient instead of having an extra msgb
argument to each and every function dealing with osmo_prim_hdr.  The
No, of course not a second pointer, but one plain structure without
self-reference, with names a human coder can relate to...
Which gets me back to the disclaimer. I'm not going to rewire the way
messages get passed through the layers. I even assume that I'll get used
to it and learn to like the current way, once I've seen all constraints.
Nevertheless, I find the code is hard to get used to for a newcomer.
Thanks for taking the time to write down the reasoning!
~Neels

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: sccp_helpers: osmo_prim_init() and msgb