Hi list,
osmo_prim_init() and msgb use firstly confuses me, and secondly is apparently "inefficient" in sccp_helpers.c.
First, let's look at osmo_sccp_tx_unitdata() (libosmo-sccp branch sysmocom/iu):
int osmo_sccp_tx_unitdata(struct osmo_sua_link *link, const struct osmo_sccp_addr *calling_addr, const struct osmo_sccp_addr *called_addr, uint8_t *data, unsigned int len) { struct msgb *msg = msgb_alloc(1024, "sccp_tx_unitdata"); struct osmo_scu_prim *prim; struct osmo_scu_unitdata_param *param;
prim = (struct osmo_scu_prim *) msgb_put(msg, sizeof(*prim)); param = &prim->u.unitdata; memcpy(¶m->calling_addr, calling_addr, sizeof(*calling_addr)); memcpy(¶m->called_addr, called_addr, sizeof(*called_addr)); osmo_prim_init(&prim->oph, SCCP_SAP_USER, OSMO_SCU_PRIM_N_UNITDATA, PRIM_OP_REQUEST, msg);
msg->l2h = msgb_put(msg, len); memcpy(msg->l2h, data, len);
return osmo_sua_user_link_down(link, &prim->oph); }
First off, we allocate 1024 bytes of msgb. Into it goes an osmo_scu_prim struct (here 'prim'). We set some parameters in prim, fair enough. But now what, prim->oph, the prim header, has a msg pointer, which points back at the msgb that contains the prim. really?? The msgb containing a struct pointing back at the same msgb has my head spinning.
Finally we add the data, l2h pointing at the start of it. I'm confused ... how large is the msgb supposed to be? Is it len + sizeof(*prim)?
Secondly, let's look at:
int osmo_sccp_tx_unitdata_msg(struct osmo_sua_link *link, const struct osmo_sccp_addr *calling_addr, const struct osmo_sccp_addr *called_addr, struct msgb *msg) { int rc;
rc = osmo_sccp_tx_unitdata(link, calling_addr, called_addr, msg->data, msgb_length(msg)); msgb_free(msg);
return rc; }
So, something out there has a msgb containing data. For example, from ranap_new_msg_paging_cmd(). That msgb has probably been allocated specifically to compose a message to go over the sccp link.
We go on, though, to allocate *another* msgb (size 1024, remember), and memcpy the first msgb into the second msgb, behind the osmo_scu_prim struct.
I'll first use this as-is, talking about premature optimization. But I'd prefer to have neither the wicked cunning nor the blunt dumbness in there; the combination strikes me as jolly mad ;)
Are there good reasons for this?
I would have envisioned something like: tell ranap_new_msg_paging_cmd() to allocate sizeof(*prim) of headroom; or first create a msgb of fixed known-max size, and add first the prim header and then compose the data.
And, given the union prim->u has "unknown" size, why not point at the data from prim->oph directly, instead of pointing back at the msgb? Or have that union first? That would be easier to understand.
If some code needs to know the msgb that contains the prim->oph, I find it weird to pass it the prim->oph with a backpointer, instead of just passing the msgb right away.
~Neels
Hi Neels,
On Fri, Apr 01, 2016 at 12:49:13AM +0200, Neels Hofmeyr wrote:
First off, we allocate 1024 bytes of msgb. Into it goes an osmo_scu_prim struct (here 'prim'). We set some parameters in prim, fair enough. But now what, prim->oph, the prim header, has a msg pointer, which points back at the msgb that contains the prim. really?? The msgb containing a struct pointing back at the same msgb has my head spinning.
I actually think it's a pretty straight-forward solution when thinking of of the general problem and the constraints:
* we already had 'msgb' handling everywhere in our message processing code and were using it quite happily to pass messages around.
* at inter-layer boundaries in the ISO/OSI model, you have primitives that consist of the primitive itself, as well as the payload data. The primitive consists of some operation like N-DATA, plus the associatted request/indication/confirmation/response part. A primitive occors on a given SAP. This was put into struct osmo_prim_hdr
* In order to not have the primitive separate and out-of-band, we store it inside the msgb itself. In early versions of some code, we had the osmo_prim_hdr allocated separately and passed both it and the msgb to every function. That was a bit cumbersome and error-prone.
* To avoid all the primitive-handling code having to derive the pointer to osmo_prim_hdr as the first line of code, we pass around a pointer to the osmo_prim_hdr itself, rather than a pointer to msgb. This also makes the code more explicit in that you can clearly see that it doesn't expect a generic msgb, but that it deals with a primitive as argument.
* As the osmo_prim_hdr is not stored at a fixed offset inside the msgb, we cannot use tricks like 'offset_of()' to get back from the osmo_prim_hdr to the msgb. Thus we store the pointer to the msgb header inside the osmo_prim_hdr. The knowledge of the msgb is needed particularly in case the primitive needs to be free()d.
Finally we add the data, l2h pointing at the start of it. I'm confused ... how large is the msgb supposed to be? Is it len + sizeof(*prim)?
yes, at the very minimum it has to be that size. If you need to determine that frequently, we could add an inline function or macro to the code.
Secondly, let's look at:
int osmo_sccp_tx_unitdata_msg(struct osmo_sua_link *link, const struct osmo_sccp_addr *calling_addr, const struct osmo_sccp_addr *called_addr, struct msgb *msg) { int rc;
rc = osmo_sccp_tx_unitdata(link, calling_addr, called_addr, msg->data, msgb_length(msg)); msgb_free(msg); return rc;}
So, something out there has a msgb containing data. For example, from ranap_new_msg_paging_cmd(). That msgb has probably been allocated specifically to compose a message to go over the sccp link.
We go on, though, to allocate *another* msgb (size 1024, remember), and memcpy the first msgb into the second msgb, behind the osmo_scu_prim struct.
Are there good reasons for this?
This function was just a quick wrapper around the already-existing osmo_sccp_tx_unitdata() function, avoiding a copy+paste of that funtion's body.
I would have envisioned something like: tell ranap_new_msg_paging_cmd() to allocate sizeof(*prim) of headroom; or first create a msgb of fixed known-max size, and add first the prim header and then compose the data.
the problem particularly when interfacing with asn1c generated code is that you end up doing so many copies and dynamic allocations all the time, that it probably hardly matters anymore. It's one hell of inefficiency, at least for from where we come from.
And, given the union prim->u has "unknown" size, why not point at the data from prim->oph directly, instead of pointing back at the msgb? Or have that union first? That would be easier to understand.
Now you are mixing up the SCCP-User pimitives (osmo_scu_prim) with the generic underlying osmo_prim_hdr layer. osmo_prim_hdr is used in other contexts, including osmo-tetra, for which I first developed it, AFAIR. It is also used in libosmogb, osmocom-bb. It should IMHO be used much more generically at inter-layer boundaries, but a lot of the exiting code pre-dates osmo_prim_hdr. From my point of view, every layer boundary should be using the osmo_prim_hdr construct and associated service access points.
To get back to your question:
* SCCP Use SAP: prim->u doesn't have unknown size, but has the size of the largest member in the union. You should always allocate memory for sizeof(prim).
* prim->oph (osmo_prim_hdr) pointing to the message data is way too restrictive. How would you go back to 'msg' in order to free it? How would yo get to all the other meta-data of a msg that it carries around, like msgb->dst, the header pointers to variuos layers, the msgb->cb information, ..?
If some code needs to know the msgb that contains the prim->oph, I find it weird to pass it the prim->oph with a backpointer, instead of just passing the msgb right away.
I find that much more convenient instead of having an extra msgb argument to each and every function dealing with osmo_prim_hdr. The goal should be to make it easy for the programmer to write code that is readable, rather than passing around a second pointer that points virtually to the same chunk of data. It also poses the danger of a caller passing a pointer to an oph not inside the msgb (or vice versa), with desastrous results.
The only case where you as a user of the primitives really have to worry about it is when you msgb_copy() the msgb holding a primitive. But then, if your get passed an osmo_prim_hdr as argument, you should use a (to be written) dedicated osmo_prim_copy() function, which understands the peculiarities of osmo_prim and performs a msgb_copy() with oph->msg pointer adjustment after the copy. calling msgb_copy() on a oph->msg is just not allowed. But I don't think we have ever needed that so far, that's why the osmo_prim_copy() doesn't exist yet.
Regards, Harald
On Fri, Apr 01, 2016 at 11:45:03AM +0200, Harald Welte wrote:
Hi Neels,
On Fri, Apr 01, 2016 at 12:49:13AM +0200, Neels Hofmeyr wrote:
First off, we allocate 1024 bytes of msgb. Into it goes an osmo_scu_prim struct (here 'prim'). We set some parameters in prim, fair enough. But now what, prim->oph, the prim header, has a msg pointer, which points back at the msgb that contains the prim. really?? The msgb containing a struct pointing back at the same msgb has my head spinning.
I actually think it's a pretty straight-forward solution when thinking of of the general problem and the constraints:
- we already had 'msgb' handling everywhere in our message processing code and were using it quite happily to pass messages around.
Disclaimer ahead: I won't get carried away too far. It's not like I'm going to change the most central building block of the osmoverse any time soon (ever). I just wish it were more explicit and instantly readable.
So let's read this not as a proposal to change, but just me getting used to the way it is.
I understand that it's handy to use a generic data structure to pass messages around, but with msgb you have to read just about *all* of the code to find out what is stored where in the msgb for a given code path.
I think my main problem (again) is that I like things to be explicit and well named in data structures. So much of the msgb use is implicit, like the dst and lNh pointers, particularly the cb[], which makes it hard to wrap your head around, let alone change its use safely. Also, I find the names for data_len vs len, data+tail vs head, quite confusing. It's shaken and stirred ;)
Adding a layer of self-reference on top of this makes it even harder to keep a straight head on what's going on.
at inter-layer boundaries in the ISO/OSI model, you have primitives that consist of the primitive itself, as well as the payload data. The primitive consists of some operation like N-DATA, plus the associatted request/indication/confirmation/response part. A primitive occors on a given SAP. This was put into struct osmo_prim_hdr
In order to not have the primitive separate and out-of-band, we store it inside the msgb itself. In early versions of some code, we had the osmo_prim_hdr allocated separately and passed both it and the msgb to every function. That was a bit cumbersome and error-prone.
If a primitive refers to a msgb, it would be easier to understand to have the msgb as member of the prim struct, allocate it in one go with enough "tail room" for the msgb, and go on passing the prim around. I admit that using the msgb API to allocate the memory in the first place is reducing code duplication. But it is kind of twisted :)
To avoid all the primitive-handling code having to derive the pointer to osmo_prim_hdr as the first line of code, we pass around a pointer to the osmo_prim_hdr itself, rather than a pointer to msgb. This also makes the code more explicit in that you can clearly see that it doesn't expect a generic msgb, but that it deals with a primitive as argument.
As the osmo_prim_hdr is not stored at a fixed offset inside the msgb, we cannot use tricks like 'offset_of()' to get back from the osmo_prim_hdr to the msgb. Thus we store the pointer to the msgb header inside the osmo_prim_hdr. The knowledge of the msgb is needed particularly in case the primitive needs to be free()d.
free() is a good point.
Finally we add the data, l2h pointing at the start of it. I'm confused ... how large is the msgb supposed to be? Is it len + sizeof(*prim)?
yes, at the very minimum it has to be that size. If you need to determine that frequently, we could add an inline function or macro to the code.
I meant: that function is allocating 1024 bytes, even though it has a len argument and knows sizeof(*prim). It has all the information to allocate precisely the right amount. Can len+sizeof(*prim) > 1024 happen?
the problem particularly when interfacing with asn1c generated code is that you end up doing so many copies and dynamic allocations all the time, that it probably hardly matters anymore. It's one hell of inefficiency, at least for from where we come from.
Yes, and arguably it doesn't make that much of a difference in real world terms. (Of course, this mindset taken too far gets you run times and memory usage we know from Java.)
And, given the union prim->u has "unknown" size, why not point at the data from prim->oph directly, instead of pointing back at the msgb? Or have that union first? That would be easier to understand.
Now you are mixing up the SCCP-User pimitives (osmo_scu_prim) with the generic underlying osmo_prim_hdr layer. osmo_prim_hdr is used in other contexts, including osmo-tetra, for which I first developed it, AFAIR. It is also used in libosmogb, osmocom-bb. It should IMHO be used much more generically at inter-layer boundaries, but a lot of the exiting code pre-dates osmo_prim_hdr. From my point of view, every layer boundary should be using the osmo_prim_hdr construct and associated service access points.
To get back to your question:
- SCCP Use SAP: prim->u doesn't have unknown size, but has the size of the largest member in the union. You should always allocate memory for sizeof(prim).
The image in my mind was that the data starts behind the header. If the header has a union, the data starts behind the union, even though the way the union is used may be smaller. "unknown" is the wrong word, yes...
So it's like
msgb { <--+ | +- *l2h | | | | | | data | | |prim | | | oph | | | *msgb -+ | | u +->|payload |... | }
But this would be easier to read:
prim { oph *payload ---+ u | msgb | data <--+ }
I assume this is wishful thinking, because functions that currently only know the oph and can free the entire prim via msgb would suddenly have to know about prim too, right?
- prim->oph (osmo_prim_hdr) pointing to the message data is way too restrictive. How would you go back to 'msg' in order to free it? How would yo get to all the other meta-data of a msg that it carries around, like msgb->dst, the header pointers to variuos layers, the msgb->cb information, ..?
Well, I'd kind of wish it didn't have to get back to all the generic bits of meta-data implicitly used in different ways depending on the code path, but that it would name its meta-data explicitly...
I find that much more convenient instead of having an extra msgb argument to each and every function dealing with osmo_prim_hdr. The
No, of course not a second pointer, but one plain structure without self-reference, with names a human coder can relate to...
Which gets me back to the disclaimer. I'm not going to rewire the way messages get passed through the layers. I even assume that I'll get used to it and learn to like the current way, once I've seen all constraints.
Nevertheless, I find the code is hard to get used to for a newcomer.
Thanks for taking the time to write down the reasoning!
~Neels
Hi Neels,
On Sat, Apr 02, 2016 at 12:55:43AM +0200, Neels Hofmeyr wrote:
I think my main problem (again) is that I like things to be explicit and well named in data structures. So much of the msgb use is implicit, like the dst and lNh pointers, particularly the cb[], which makes it hard to wrap your head around, let alone change its use safely. Also, I find the names for data_len vs len, data+tail vs head, quite confusing. It's shaken and stirred ;)
both the cb[] and the member names are inherited from 'struct sk_buff' in the linux kernel, which is what I was used to in my former life (and whihc some other early OpenBSC developers also had at least some exposure to, I believe).
Adding a layer of self-reference on top of this makes it even harder to keep a straight head on what's going on.
I think the main problem is that you have to worry so much about those implementation details. I'm actually surprised by this, as it points out we must have done something wrong, and we should improve the APIs and utilities in a way that you and other developers *using* msgb's and primitivesit have to worry less about it.
You shouldn't have to care about the detailed usage of the cb[] as every program / library / protocol stack should have it's own wrappers and inline functions or macros to obtain the respective values, like msgb_bvci() in case if the Gb protocol stack. Similar analogy to the kernel. When you're inside the TCP code, you use the cb[] for a tcp specific structure and there's an easy way to get to that structure or its members.
The l2h/l3h should proably be used less and more with a wrapper macro, too. Like 'msgb_rslh()' to get the rsl header, or the like. msgb->l3h should really only be used in cases where there is a clear definiton what the layer3 protocol is. In GSM, it is RR/MM/CC. The problem is that depending on the interface (Abis/A/Iuh/Iub/IuCS), below can be multiple sub-layes of 'layer 2'.
- In order to not have the primitive separate and out-of-band, we store it inside the msgb itself. In early versions of some code, we had the osmo_prim_hdr allocated separately and passed both it and the msgb to every function. That was a bit cumbersome and error-prone.
If a primitive refers to a msgb, it would be easier to understand to have the msgb as member of the prim struct, allocate it in one go with enough "tail room" for the msgb, and go on passing the prim around. I admit that using the msgb API to allocate the memory in the first place is reducing code duplication. But it is kind of twisted :)
At some point you receive data from a socket. At this point you only have a msgb, and not yet a primitive. Only later, after some header parsing, you will know you need a primitive, and which type of primitive. So you push the osmo_prim_hdr + SAP specific struct to the msgb, as a shim-header between the different layers in a stack. Sounds pretty straight-forward to me. Think of the SAP and its detailed primitive (including the osmo_prim_hader) as an additional prtocol layer header between two elements of a protocol stack on the local host.
And once again my point is: Why would you want to or need to care how the internals are twisted. The only point that detail about the back pointer matters is in case of msgb_copy(), see my previous mail.
yes, at the very minimum it has to be that size. If you need to determine that frequently, we could add an inline function or macro to the code.
I meant: that function is allocating 1024 bytes, even though it has a len argument and knows sizeof(*prim). It has all the information to allocate precisely the right amount. Can len+sizeof(*prim) > 1024 happen?
Allocating blocks of memory in equal size is often/typically more efficient and leads to less memory fragmentation than allocating odd lengths. This is particularly the case when using talloc pools, like we recently started to do in osmo-bts. In almost all OSmo* code modules, you will find a wrapper around msgb_alloc() or msgb_alloc_headroom(), whcih will typically allocate with a fixed size. See RSL_ALLOC_SIZE, L3_MSG_SIZE, MSGB_MAX, RR_ALLOC_SIZE, GSM_MACBLOCK_LEN, or sometimes hard-coded magic numbers, like in the case of osmo-bts-* definitions of l1p_msgb_alloc() or the like.
If the length should ever be larger, then msgb_push() beyond the size of the msgb will OSMO_ASSERT and you see the problem at runtime. I prefer this generally to 'hidden' increase of allocations to a runtime-computed length.
The idea is that a developer writing a particular module should generally have an idea about what roughly is the maximum message size at the given protocol that he implements and decide for a reasonable allocation size, including sufficient space in the headroom for what he might push to that msgb, including the respective primtive headers.
- SCCP Use SAP: prim->u doesn't have unknown size, but has the size of the largest member in the union. You should always allocate memory for sizeof(prim).
The image in my mind was that the data starts behind the header. If the header has a union, the data starts behind the union, even though the way the union is used may be smaller. "unknown" is the wrong word, yes...
So it's like
msgb { <--+ | +- *l2h | | | | | | data | | |prim | | | oph | | | *msgb -+ | | u +->|payload |... | }
But this would be easier to read:
prim { oph *payload ---+ u | msgb | data <--+ }
Is it really that much more difficult to write "oph->msg->l2h" than to your proposed "oph->data" ? And if that's the case, why not have a osmo_sua_prim_data(oph) that hides the additional pointer deref from you (and which oould do consistency checking like if oph->sap really is the SAP you expect)?
Why would you have to worry or care how things are ordered in memory? I really don't understand this part, sorry ;) Why would you care?
Your typical primitive-consuming function gets a 'struct osmo_prim_hdr *' as input, and from there you
a) know how to get to the msgb from that (oph->msgb)
b) know how to get the SAP-specific larger primitive structure from it (they type-casting macro. You know to what to type-cast by knowing from which SAP you receive primitives. You could even do it dynamically baesd on oph->sap value, or ASSERT to make sure your code is not making the wrong choice here)
c) know how to get to a particular layer header at any level of your protocol stack inside the msgb (msgb->l2h/l3h/...) (as indicated, a protocol stack / application / SAP specificmacro/inline function should probably be used anyway to have a more expressive way of what the l2 header at this SAP really is, so think of msgb_suah(oph->msg), msgb_ruah(oph->msg), msgb_iuh(oph->msg) which then resolve to the respective lXh or cb pointer.)
And if you want to wrap a msgb in a primitive, the given SAP and related code should contain a convenience function that does that for you, i.e. pushing the SAP-specific structure, including the osmo_prim_hdr at its beginning, to the head of the msgb. Basically a "I want to turn this msgb into a primitive at SAP X, using the following details related to the primitive"
- prim->oph (osmo_prim_hdr) pointing to the message data is way too restrictive. How would you go back to 'msg' in order to free it? How would yo get to all the other meta-data of a msg that it carries around, like msgb->dst, the header pointers to variuos layers, the msgb->cb information, ..?
Well, I'd kind of wish it didn't have to get back to all the generic bits of meta-data implicitly used in different ways depending on the code path, but that it would name its meta-data explicitly...
but where should that meta-data be stored and named explicitly? Whats' wrong with having that in msgb->cb, where it is always passed along with the mesasge?
Things that I think would improve the situation:
* include a 'magic value' in the msgb->cb[] that defines which structure is currently type-casted to it. This way we could have assert-style macros at various places in the code, making sure that nobody is trying to dereference it to the wrong structure * reduce the amount of direct lXh accesses, particularly for != l3h. In fact, they predate the cb, and if we had a cb from the beginning, they might not even have needed to exist. * document the usage of msgb (particularly lXh and cb usage) for each prtoocol/interface/stack/sap.
No, of course not a second pointer, but one plain structure without self-reference, with names a human coder can relate to...
but then you end up copying everything from a msgb (or it's cb) to that new structure and vice versa. I also think the question of naming is a completely separate topic. You can have (from your point of view) better naming also in the current structures / members / apis.
Regards, Harald
On Sat, Apr 02, 2016 at 01:47:48AM +0200, Harald Welte wrote:
You shouldn't have to care about the detailed usage of the cb[] as every program / library / protocol stack should have it's own wrappers and inline functions or macros to obtain the respective values
ok ... I see.
The l2h/l3h should proably be used less and more with a wrapper macro, too. Like 'msgb_rslh()' to get the rsl header, or the like. msgb->l3h should really only be used in cases where there is a clear definiton what the layer3 protocol is. In GSM, it is RR/MM/CC. The problem is that depending on the interface (Abis/A/Iuh/Iub/IuCS), below can be multiple sub-layes of 'layer 2'.
GSM is by far the most complex thing I've worked on in terms of layerings and specifications. It often seems so simple to just tell the other side what should happen, but to get it across takes layers of hard work ;)
Allocating blocks of memory in equal size is often/typically more efficient and leads to less memory fragmentation than allocating odd
ok, makes sense.
Is it really that much more difficult to write "oph->msg->l2h" than to
[...]
Why would you have to worry or care how things are ordered in memory? I really don't understand this part, sorry ;) Why would you care?
I think my problems start when I "briefly" want to understand everything about a given code path, but am tumbling down a rabbit hole of indirections and generic names used all over the place. Try to grep for 'cb' and you'd get huge amount of hits completely unrelated to a given 'cb' you might want to grok. If things are named explicitly, it's quite easy to grep for source and sinks of values put in it.
It's not harder to write things down, It's just often hard to find out on my own what to write down ;) My progress is often slow these days because I try to understand things, and that takes more work than I expect...
Well, but I'm getting there.
Your typical primitive-consuming function gets a 'struct osmo_prim_hdr *' as input, and from there you
a) know how to get to the msgb from that (oph->msgb)
b) know how to get the SAP-specific larger primitive structure from it
c) know how to get to a particular layer header at any level of your protocol stack inside the msgb (msgb->l2h/l3h/...)
With a firm map like this in your head, it sounds quite straightforward, yes. I've come from the other side: I saw a bare msgb struct and didn't know how to navigate it, first need to find the infrastructure around it.
Things that I think would improve the situation:
- include a 'magic value' in the msgb->cb[] that defines which structure is currently type-casted to it. This way we could have assert-style macros at various places in the code, making sure that nobody is trying to dereference it to the wrong structure
- reduce the amount of direct lXh accesses, particularly for != l3h. In fact, they predate the cb, and if we had a cb from the beginning, they might not even have needed to exist.
- document the usage of msgb (particularly lXh and cb usage) for each prtoocol/interface/stack/sap.
Sounds pretty good: a new reader should be guided to the higher level view and specific usage within a given protocol stack. Assertions for 'magic values' in the code could highlight that and guide to the documentation.
~Neels