sccp_helpers: osmo_prim_init() and msgb

historical

On Fri, Apr 01, 2016 at 11:45:03AM +0200, Harald Welte wrote:
> Hi Neels,
> 
> On Fri, Apr 01, 2016 at 12:49:13AM +0200, Neels Hofmeyr wrote:
> > First off, we allocate 1024 bytes of msgb. Into it goes an osmo_scu_prim
> > struct (here 'prim').  We set some parameters in prim, fair enough. But
> > now what, prim->oph, the prim header, has a msg pointer, which points back
> > at the msgb that contains the prim. really?? The msgb containing a struct
> > pointing back at the same msgb has my head spinning.
> 
> I actually think it's a pretty straight-forward solution when thinking
> of of the general problem and the constraints:
> 
> * we already had 'msgb' handling everywhere in our message processing
>   code and were using it quite happily to pass messages around.

Disclaimer ahead: I won't get carried away too far. It's not like I'm
going to change the most central building block of the osmoverse any time
soon (ever). I just wish it were more explicit and instantly readable.

So let's read this not as a proposal to change, but just me getting used
to the way it is.

I understand that it's handy to use a generic data structure to pass
messages around, but with msgb you have to read just about *all* of the
code to find out what is stored where in the msgb for a given code path.

I think my main problem (again) is that I like things to be explicit and
well named in data structures. So much of the msgb use is implicit, like
the dst and lNh pointers, particularly the cb[], which makes it hard to
wrap your head around, let alone change its use safely. Also, I find the
names for data_len vs len, data+tail vs head, quite confusing. It's shaken
and stirred ;)

Adding a layer of self-reference on top of this makes it even harder to
keep a straight head on what's going on.

> * at inter-layer boundaries in the ISO/OSI model, you have primitives
>   that consist of the primitive itself, as well as the payload data.
>   The primitive consists of some operation like N-DATA, plus the
>   associatted request/indication/confirmation/response part.  A
>   primitive occors on a given SAP.  This was put into struct
>   osmo_prim_hdr
> 
> * In order to not have the primitive separate and out-of-band, we store
>   it inside the msgb itself.  In early versions of some code, we had the
>   osmo_prim_hdr allocated separately and passed both it and the msgb to
>   every function.  That was a bit cumbersome and error-prone.

If a primitive refers to a msgb, it would be easier to understand to have
the msgb as member of the prim struct, allocate it in one go with enough
"tail room" for the msgb, and go on passing the prim around. I admit that
using the msgb API to allocate the memory in the first place is reducing
code duplication. But it is kind of twisted :)

> * To avoid all the primitive-handling code having to derive the pointer
>   to osmo_prim_hdr as the first line of code, we pass around a pointer
>   to the osmo_prim_hdr itself, rather than a pointer to msgb.  This
>   also makes the code more explicit in that you can clearly see that it
>   doesn't expect a generic msgb, but that it deals with a primitive as
>   argument.
> 
> * As the osmo_prim_hdr is not stored at a fixed offset inside the msgb,
>   we cannot use tricks like 'offset_of()' to get back from the
>   osmo_prim_hdr to the msgb.  Thus we store the pointer to the msgb
>   header inside the osmo_prim_hdr.  The knowledge of the msgb is needed
>   particularly in case the primitive needs to be free()d.

free() is a good point.

> > Finally we add the data, l2h pointing at the start of it. I'm confused ...
> > how large is the msgb supposed to be? Is it len + sizeof(*prim)?
> 
> yes, at the very minimum it has to be that size.   If you need to
> determine that frequently, we could add an inline function or macro to
> the code.

I meant: that function is allocating 1024 bytes, even though it has a len
argument and knows sizeof(*prim). It has all the information to allocate
precisely the right amount. Can len+sizeof(*prim) > 1024 happen?

> the problem particularly when interfacing with asn1c generated code is
> that you end up doing so many copies and dynamic allocations all the
> time, that it probably hardly matters anymore.  It's one hell of
> inefficiency, at least for from where we come from.

Yes, and arguably it doesn't make that much of a difference in real world
terms. (Of course, this mindset taken too far gets you run times and
memory usage we know from Java.)

> > And, given the union prim->u has "unknown" size, why not point at the data
> > from prim->oph directly, instead of pointing back at the msgb? Or have
> > that union first? That would be easier to understand.
> 
> Now you are mixing up the SCCP-User pimitives (osmo_scu_prim) with the
> generic underlying osmo_prim_hdr layer.  osmo_prim_hdr is used in
> other contexts, including osmo-tetra, for which I first developed it,
> AFAIR.  It is also used in libosmogb, osmocom-bb.  It should IMHO be
> used much more generically at inter-layer boundaries, but a lot of the
> exiting code pre-dates osmo_prim_hdr.  From my point of view, every
> layer boundary should be using the osmo_prim_hdr construct and
> associated service access points.
> 
> To get back to your question:
> 
> * SCCP Use SAP: prim->u doesn't have unknown size, but has the size of
>   the largest member in the union.  You should always allocate memory
>   for sizeof(prim).

The image in my mind was that the data starts behind the header. If the
header has a union, the data starts behind the union, even though the way
the union is used may be smaller. "unknown" is the wrong word, yes...

So it's like

  msgb {       <--+
                  |
     +- *l2h      |
     |            |
     |            |
     |  data      |
     |  |prim     |
     |  | oph     |
     |  |  *msgb -+
     |  | u
     +->|payload
        |...
        |
  }

But this would be easier to read:

  prim {
    oph
     *payload ---+
    u            |
    msgb         |
      data    <--+
  }

I assume this is wishful thinking, because functions that currently only
know the oph and can free the entire prim via msgb would suddenly have to
know about prim too, right?

> * prim->oph (osmo_prim_hdr) pointing to the message data is way too
>   restrictive.  How would you go back to 'msg' in order to free it?  How
>   would yo get to all the other meta-data of a msg that it carries
>   around, like msgb->dst, the header pointers to variuos layers, the
>   msgb->cb information, ..?

Well, I'd kind of wish it didn't have to get back to all the generic bits
of meta-data implicitly used in different ways depending on the code path,
but that it would name its meta-data explicitly...

> I find that much more convenient instead of having an extra msgb
> argument to each and every function dealing with osmo_prim_hdr.  The

No, of course not a second pointer, but one plain structure without
self-reference, with names a human coder can relate to...

Which gets me back to the disclaimer. I'm not going to rewire the way
messages get passed through the layers. I even assume that I'll get used
to it and learn to like the current way, once I've seen all constraints.

Nevertheless, I find the code is hard to get used to for a newcomer.

Thanks for taking the time to write down the reasoning!

~Neels

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://lists.osmocom.org/pipermail/openbsc/attachments/20160402/ee5be9d5/attachment.bin>