sccp_helpers: osmo_prim_init() and msgb

historical

Hi Neels,

On Sat, Apr 02, 2016 at 12:55:43AM +0200, Neels Hofmeyr wrote:
> I think my main problem (again) is that I like things to be explicit and
> well named in data structures. So much of the msgb use is implicit, like
> the dst and lNh pointers, particularly the cb[], which makes it hard to
> wrap your head around, let alone change its use safely. Also, I find the
> names for data_len vs len, data+tail vs head, quite confusing. It's shaken
> and stirred ;)

both the cb[] and the member names are inherited from 'struct sk_buff'
in the linux kernel, which is what I was used to in my former life (and
whihc some other early OpenBSC developers also had at least some
exposure to, I believe).

> Adding a layer of self-reference on top of this makes it even harder to
> keep a straight head on what's going on.

I think the main problem is that you have to worry so much about those
implementation details.  I'm actually surprised by this, as it points
out we must have done something wrong, and we should improve the APIs
and utilities in a way that you and other developers *using* msgb's and
primitivesit have to worry less about it.

You shouldn't have to care about the detailed usage of the cb[] as every
program / library / protocol stack should have it's own wrappers and
inline functions or macros to obtain the respective values, like
msgb_bvci() in case if the Gb protocol stack.  Similar analogy to the
kernel.  When you're inside the TCP code, you use the cb[] for a tcp
specific structure and there's an easy way to get to that structure or
its members.

The l2h/l3h should proably be used less and more with a wrapper macro,
too.  Like 'msgb_rslh()' to get the rsl header, or the like.  msgb->l3h
should really only be used in cases where there is a clear definiton
what the layer3 protocol is.  In GSM, it is RR/MM/CC.  The problem is
that depending on the interface (Abis/A/Iuh/Iub/IuCS), below can be
multiple sub-layes of 'layer 2'.

> > * In order to not have the primitive separate and out-of-band, we store
> >   it inside the msgb itself.  In early versions of some code, we had the
> >   osmo_prim_hdr allocated separately and passed both it and the msgb to
> >   every function.  That was a bit cumbersome and error-prone.
> 
> If a primitive refers to a msgb, it would be easier to understand to have
> the msgb as member of the prim struct, allocate it in one go with enough
> "tail room" for the msgb, and go on passing the prim around. I admit that
> using the msgb API to allocate the memory in the first place is reducing
> code duplication. But it is kind of twisted :)

At some point you receive data from a socket. At this point you only
have a msgb, and not yet a primitive.  Only later, after some header
parsing, you will know you need a primitive, and which type of
primitive.  So you push the osmo_prim_hdr + SAP specific struct to the
msgb, as a shim-header between the different layers in a stack.  Sounds
pretty straight-forward to me.  Think of the SAP and its detailed
primitive (including the osmo_prim_hader) as an additional prtocol layer
header between two elements of a protocol stack on the local host.

And once again my point is: Why would you want to or need to care how
the internals are twisted.  The only point that detail about the back
pointer matters is in case of msgb_copy(), see my previous mail.

> > yes, at the very minimum it has to be that size.   If you need to
> > determine that frequently, we could add an inline function or macro to
> > the code.
> 
> I meant: that function is allocating 1024 bytes, even though it has a len
> argument and knows sizeof(*prim). It has all the information to allocate
> precisely the right amount. Can len+sizeof(*prim) > 1024 happen?

Allocating blocks of memory in equal size is often/typically more
efficient and leads to less memory fragmentation than allocating odd
lengths.  This is particularly the case when using talloc pools, like we
recently started to do in osmo-bts.  In almost all OSmo* code modules,
you will find a wrapper around msgb_alloc() or msgb_alloc_headroom(),
whcih will typically allocate with a fixed size.  See RSL_ALLOC_SIZE,
L3_MSG_SIZE, MSGB_MAX, RR_ALLOC_SIZE, GSM_MACBLOCK_LEN, or sometimes
hard-coded magic numbers, like in the case of osmo-bts-* definitions of
l1p_msgb_alloc() or the like.

If the length should ever be larger, then msgb_push() beyond the size of
the msgb will OSMO_ASSERT and you see the problem at runtime.  I prefer
this generally to 'hidden' increase of allocations to a runtime-computed
length.

The idea is that a developer writing a particular module should
generally have an idea about what roughly is the maximum message size at
the given protocol that he implements and decide for a reasonable
allocation size, including sufficient space in the headroom for what he
might push to that msgb, including the respective primtive headers.

> > * SCCP Use SAP: prim->u doesn't have unknown size, but has the size of
> >   the largest member in the union.  You should always allocate memory
> >   for sizeof(prim).
> 
> The image in my mind was that the data starts behind the header. If the
> header has a union, the data starts behind the union, even though the way
> the union is used may be smaller. "unknown" is the wrong word, yes...
> 
> So it's like
> 
>   msgb {       <--+
>                   |
>      +- *l2h      |
>      |            |
>      |            |
>      |  data      |
>      |  |prim     |
>      |  | oph     |
>      |  |  *msgb -+
>      |  | u
>      +->|payload
>         |...
>         |
>   }
> 
> But this would be easier to read:
> 
>   prim {
>     oph
>      *payload ---+
>     u            |
>     msgb         |
>       data    <--+
>   }

Is it really that much more difficult to write "oph->msg->l2h" than to
your proposed "oph->data" ?  And if that's the case, why not have a
osmo_sua_prim_data(oph) that hides the additional pointer deref from
you (and which oould do consistency checking like if oph->sap really is
the SAP you expect)?

Why would you have to worry or care how things are ordered in memory?  I
really don't understand this part, sorry ;)  Why would you care?

Your typical primitive-consuming function gets a
'struct osmo_prim_hdr *' as input, and from there you

 a) know how to get to the msgb from that (oph->msgb)

 b) know how to get the SAP-specific larger primitive structure from it
    (they type-casting macro. You know to what to type-cast by knowing
     from which SAP you receive primitives.  You could even do it
     dynamically baesd on oph->sap value, or ASSERT to make sure your
     code is not making the wrong choice here)

 c) know how to get to a particular layer header at any level of your
    protocol stack inside the msgb (msgb->l2h/l3h/...)
    (as indicated, a protocol stack / application / SAP
     specificmacro/inline function should probably be used anyway to have
     a more expressive way of what the l2 header at this SAP really is,
     so think of msgb_suah(oph->msg), msgb_ruah(oph->msg),
     msgb_iuh(oph->msg) which then resolve to the respective lXh or cb
     pointer.)

And if you want to wrap a msgb in a primitive, the given SAP and related
code should contain a convenience function that does that for you, i.e.
pushing the SAP-specific structure, including the osmo_prim_hdr at its
beginning, to the head of the msgb.  Basically a "I want to turn this
msgb into a primitive at SAP X, using the following details related to
the primitive"

> > * prim->oph (osmo_prim_hdr) pointing to the message data is way too
> >   restrictive.  How would you go back to 'msg' in order to free it?  How
> >   would yo get to all the other meta-data of a msg that it carries
> >   around, like msgb->dst, the header pointers to variuos layers, the
> >   msgb->cb information, ..?
> 
> Well, I'd kind of wish it didn't have to get back to all the generic bits
> of meta-data implicitly used in different ways depending on the code path,
> but that it would name its meta-data explicitly...

but where should that meta-data be stored and named explicitly?  Whats'
wrong with having that in msgb->cb, where it is always passed along with
the mesasge?

Things that I think would improve the situation:

* include a 'magic value' in the msgb->cb[] that defines which structure
  is currently type-casted to it.  This way we could have assert-style
  macros at various places in the code, making sure that nobody is
  trying to dereference it to the wrong structure
* reduce the amount of direct lXh accesses, particularly for != l3h. In
  fact, they predate the cb, and if we had a cb from the beginning, they
  might not even have needed to exist.
* document the usage of msgb (particularly lXh and cb usage) for each
  prtoocol/interface/stack/sap.

> No, of course not a second pointer, but one plain structure without
> self-reference, with names a human coder can relate to...

but then you end up copying everything from a msgb (or it's cb) to that
new structure and vice versa.  I also think the question of naming is a
completely separate topic.  You can have (from your point of view)
better naming also in the current structures / members / apis.

Regards,
	Harald
-- 
- Harald Welte <laforge at gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)