From alex777evil at gmail.com Fri Mar 1 10:20:03 2019
From: alex777evil at gmail.com (Alex)
Date: Fri, 01 Mar 2019 13:20:03 +0300
Subject: GSUP <-> GSM MAP protocol conversion
Message-ID: <1551435603.28759.1.camel@gmail.com>

Hi!

I'm very interested in GSUP <-> GSM MAP protocol conversion to perform
HLR interrogation (at least for plastic roaming support) ...
Is there any related activity in the Osmocom project?

Wbr,
Alex

From laforge at gnumonks.org Sun Mar 3 14:55:38 2019
From: laforge at gnumonks.org (Harald Welte)
Date: Sun, 3 Mar 2019 15:55:38 +0100
Subject: Osmocom-Debian-install-test fails for 10 consecutive days!
In-Reply-To: <20190220180927.GC11466@nataraja>
References: <20190220084639.GA11466@nataraja>
 <9108471f-5684-0e31-a368-3c1ff9e5c9b0@sysmocom.de>
 <20190220180927.GC11466@nataraja>
Message-ID: <20190303145538.GE9019@nataraja>

On Wed, Feb 20, 2019 at 07:09:28PM +0100, Harald Welte wrote:
> I guess unless there's significant disagreement, I'll create something like
> jenkins-notifications at lists.osmocom.org ?

Done. I will add this address to our jenkins build jobs in a patch asap.

--
- Harald Welte http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)

From nhofmeyr at sysmocom.de Mon Mar 4 14:11:45 2019
From: nhofmeyr at sysmocom.de (Neels Hofmeyr)
Date: Mon, 4 Mar 2019 15:11:45 +0100
Subject: static strings -- little fun on the side
In-Reply-To: <20190226221914.GK27673@nataraja>
References: <20190226192153.GA21202@my.box> <20190226221914.GK27673@nataraja>
Message-ID: <20190304141145.GA3392@my.box>

On Tue, Feb 26, 2019 at 11:19:14PM +0100, Harald Welte wrote:
> So basically
> 1) every time we come out of select() and call a file descriptor callback,
>    we would create a new talloc context.
> 2) any code, including those that want to print messages to buffers, etc.
>    could allocate memory from that context without having to care about
>    releasing it
> 3) once we return from the file descriptor call-back, we talloc_free()
>    that "master" context, taking with it all the child allocations that may have
>    been allocated underneath.

I like that approach. If dyn allocations become a problem there could even be a
large block that is re-used across select() cycles, and enlarged if necessary
but otherwise staying in place. Probably will not be a problem though.

> One might be able to get similar but slightly different semantics by
> attaching those strings to the 'msgb' that's currently being processed.

That would complicate API signatures, e.g. things like osmo_plmn_name()?
I think that makes sense only for msgb_hexdump(), where a msgb is already part
of the signature. But I'd say just using the same mechanism would be fine.

> can make it OOM free by simply returning a static const char * ""

Yes, probably better than a program abort.

~N

From ralph at schmid.xxx Tue Mar 5 15:02:53 2019
From: ralph at schmid.xxx (Ralph A. Schmid, dk5ras)
Date: Tue, 5 Mar 2019 16:02:53 +0100
Subject: LimeNET Micro
In-Reply-To:
References:
Message-ID: <156201d4d364$82773db0$8765b910$@schmid.xxx>

Hi,

> I was wondering if anybody else has looked at it, has comments, or has or is
> thinking of ordering one?

Mine is to be shipped by the end of this month, and I also have some
expectations of it regarding GSM and MMDVM/DMR. We will see.

> Thanks!

Ralph.

From ralph at schmid.xxx Wed Mar 6 07:49:06 2019
From: ralph at schmid.xxx (Ralph A.
Schmid, dk5ras)
Date: Wed, 6 Mar 2019 08:49:06 +0100
Subject: LimeNET Micro
In-Reply-To: <156201d4d364$82773db0$8765b910$@schmid.xxx>
References: <156201d4d364$82773db0$8765b910$@schmid.xxx>
Message-ID: <010c01d4d3f1$138920c0$3a9b6240$@schmid.xxx>

I want to add: if someone can take me by the hand a little through installing
and setting up all the osmo stuff on this device, I could give access via ssh
for further tests and measurements, and I would write up a kind of installation
manual for setting this up from scratch. This thingy has the power to bring the
project to a larger audience, once setting it up becomes doable for the average
nerd.

My knowledge is more on the air interface side, so fine-tuning the interworking
of software modules is not exactly what I am able to do. However, the basic
stuff I should be able to manage: installing, compiling, configuring and
creating a remotely usable test bed.

Ralph.

> -----Original Message-----
> From: OpenBSC [mailto:openbsc-bounces at lists.osmocom.org] On Behalf Of
> Ralph A. Schmid, dk5ras
> Sent: Tuesday, March 5, 2019 4:03 PM
> To: 'openbsc at lists.osmocom.org'
> Subject: RE: LimeNET Micro
>
> Hi,
>
> > I was wondering if anybody else has looked at it, has comments, or has or is
> > thinking of ordering one?
>
> Mine is to be shipped by the end of this month, and I also have some
> expectations in it regarding GSM and MMDVM/DMR. We will see.
>
> > Thanks!
>
> Ralph.

From laforge at gnumonks.org Fri Mar 8 11:25:58 2019
From: laforge at gnumonks.org (Harald Welte)
Date: Fri, 8 Mar 2019 12:25:58 +0100
Subject: RFC: osmo_ip_port API proposal
Message-ID: <20190308112558.GI5882@nataraja>

Hi all,

Neels has recently proposed an osmo_ip_port API, see
https://gerrit.osmocom.org/#/c/libosmocore/+/13123

I'm somewhat reluctant to get this merged into libosmocore, as from my
point of view, it's reinventing what sockaddr_storage is doing in libc,
but storing the address in host byte order and string format.
So I would argue we should rather create helper/utility functions around
sockaddr_storage and do any string/binary and endianness conversions
hidden by/within that API.

Irrespective of the above, I would want to hear what other developers
think. Do you think that it's worthwhile to

1) have some utility functions / infrastructure (irrespective of the data type)
 1a) in libosmocore, or
 1b) keep it to osmo-mgw

2) prefer to
 2a) have strings for IP addresses and host-byte-order port numbers like
     the proposed patchset, or
 2b) go with native sockaddr_storage?

If others think it should be merged, I won't try to veto it. I just
want to hear some more voices rather than just my own point-of-view.

Regards,
        Harald

From nhofmeyr at sysmocom.de Sun Mar 10 03:31:05 2019
From: nhofmeyr at sysmocom.de (Neels Hofmeyr)
Date: Sun, 10 Mar 2019 04:31:05 +0100
Subject: easier way to avoid msgb leaks in libosmo-sccp callers?
Message-ID: <20190310033105.GA3276@my.box>

Avoiding msgb leaks is easiest if the caller retains ownership of the msgb.
Take this hypothetical chain where leaks are obviously avoided:

  void send()
  {
          msg = msgb_alloc();
          dispatch(msg);
          msgb_free(msg);
  }

  void dispatch(msg)
  {
          osmo_fsm_inst_dispatch(fi, msg);
  }

  void fi_on_event(fi, data)
  {
          if (socket_is_ok)
                  socket_write((struct msgb*)data);
  }

  void socket_write(msg)
  {
          if (!ok1)
                  return;
          if (ok2) {
                  if (!ok3)
                          return;
                  write(sock, msg->data);
          }
  }

However, if the caller passes ownership down to the msgb consumer, things
become nightmarishly complex:

  void send()
  {
          msg = msgb_alloc();
          rc = dispatch(msg);
          /* dispatching event failed? */
          if (rc)
                  msgb_free(msg);
  }

  int dispatch(msg)
  {
          if (osmo_fsm_inst_dispatch(fi, msg))
                  return -1;
          if (something_else())
                  return -1; // <-- double free!
  }

  void fi_on_event(fi, data)
  {
          if (socket_is_ok)
                  socket_write((struct msgb*)data);
          else
                  /* socket didn't consume? */
                  msgb_free(data);
  }

  int socket_write(msg)
  {
          if (!ok1)
                  return -1; // <-- leak!
          if (ok2) {
                  if (!ok3)
                          goto out;
                  write(sock, msg->data);
          }
  out:
          msgb_free(msg);
          return -2;
  }

If any link in this call chain fails to be aware of the importance to return a
failed RC or to free a msgb if the chain is broken, we have a hidden msgb leak.

This is the case with osmo_sccp_user_sap_down(). In new osmo-msc, passing data
through various FSM instances, there is high potential for leak/double-free
bugs. A very large brain is required to track down every msgb path.

Isn't it possible to provide osmo_sccp_user_sap_down() in the caller-owns
paradigm? Thinking about an osmo_sccp_user_sap_down2() that simply doesn't
msgb_free().

Passing ownership to the consumer is imperative if a msg queue is involved that
might send out asynchronously. (A workaround could be to copy the message
before passing into the wqueue.)

However, looks to me like no osmo_wqueue is involved in
osmo_sccp_user_sap_down()? It already frees the msgb right upon returning, so
this should be perfectly fine -- right?

I think I'll just try it, but if anyone knows a definite reason why this will
not work, please let me know.

(Remotely related, I also still have this potential xua msg leak fix lying
around, never got around to verifying it:
https://gerrit.osmocom.org/#/c/libosmo-sccp/+/9957/ )

~N

--
- Neels Hofmeyr http://www.sysmocom.de/
=======================================================================
* sysmocom - systems for mobile communications GmbH
* Alt-Moabit 93
* 10559 Berlin, Germany
* Sitz / Registered office: Berlin, HRB 134158 B
* Geschäftsführer / Managing Directors: Harald Welte
From mykola at pentonet.com Mon Mar 11 08:57:05 2019
From: mykola at pentonet.com (Mykola Shchetinin)
Date: Mon, 11 Mar 2019 10:57:05 +0200
Subject: 3G PS hack (RE: 35c3 feedback)
Message-ID:

Hello Keith,

> 2) 3G data instabilities
> ( Just a quick observation note, no pcap! )
> I happened to notice that pinging a 3G handset from the network side
> (default ping: 1 sec interval, 64 ICMP bytes) keeps the connection
> "alive" and using 3G data is then a pleasant experience, IM, email,
> browsing, SIP call all working nicely. peak max download speed of 16Mbps
> was reported at one time by m.speedof.me

Can I ask you to expand a bit more on the topic? Sorry, it's been a while
since you sent the original email.

From where did you perform the ping? From the machine where the GGSN is running?

And does that also mean that, to run a 3G PS network right now, one needs to set
up some kind of service which pings all the registered handsets?
(Actually, that is what I am going to do for now, until the SGSN is fixed by
somebody or by us)

Thank You!

Kind regards,
Mykola

From laforge at gnumonks.org Mon Mar 11 20:48:17 2019
From: laforge at gnumonks.org (Harald Welte)
Date: Mon, 11 Mar 2019 21:48:17 +0100
Subject: SCCP connection identifiers on the SCCP User SAP
Message-ID: <20190311204817.GA725@nataraja>

Hi Neels,

from your IRC question today:

> when opening a new conn on SCCP, what's the proper way to get a
> conn_id? I want to feed OSMO_SCU_PRIM_N_CONNECT into
> osmo_sccp_user_sap_down(), but it seems the caller needs to pick a
> conn_id??

Whatever way you can think of to cough up a unique identifier for that connection.

> I assumed that libosmo-sccp implicitly picks an unused local conn
> reference, but that's not the case.

Note that in this sentence you're now talking about the "SCCP local
reference", which is something communicated on the wire between two SCCP
providers (implementations).
Hence, it is managed inside the SCCP
provider[s] and can be seen in the source local reference / destination
local reference field of the SCCP messages.

That's *not strictly* the SCCP connection identifier which has
significance only across the SCCP User SAP (i.e. between SCCP User and
SCCP Provider on the same system), and which never is visible on any
SCCP message on the wire. It's just an implementation shortcut of the
Osmocom implementation that uses the same identifiers on both sides,
rather than allocating separate ones.

But getting back to your question: If the SCCP provider was to receive
a N-CONNECT.req without some kind of identifier, and simply allocate
one, how would that identifier be communicated back to the user? Those
primitives work asynchronously. You'd have to come up with
yet-another-identifier, like a "primitive tag" where that tag then would
be echoed back in the N-CONNECT.resp - and you end up again having to
allocate some unique identifier :P

> sccp_scoc.c has conn_create() which seems to pick an unused id, but
> that part is static in a .c file

> hnbgw just uses 1:1 the same conn_id from RUA to RANAP and thus doesn't invent new ones

Now I'm confused. RUA isn't running over SCCP, right?

> osmo-bsc goes through its list of &bsc_gsmnet->subscr_conns to pick an unused id

> I can do that in osmo-msc, but it seems to me libosmo-sccp should have common API for that

The SCCP User SAP is modelled strictly after the ITU specs. Always
imagine yourself in a situation where the SCCP user and SCCP provider
are running in different processes and they don't have access to each
other's state - and all they can exchange are the SCCP User SAP
primitives in some serialized form.

While libosmo-sccp doesn't work like this (so far), we should always
keep that in mind and keep the SAP boundary clean.

As there's no primitive in ITU-T Q.7xx for "allocate me a local
reference", we don't have one :/

I'm not sure what we should do here.
If we introduce that kind of SCU
primitive, then the question is how are they allocated/released? Who
is in charge of that? What kind of object would the SCCP provider use
to keep track of allocated IDs for which there is no connection yet, as
the N-CONNECT.req was not yet received?

The current situation is not great. After all, theoretically there
could be an incoming new SCCP connection for which the provider chooses
the same ID that the user at the same time chooses for a new outbound
connection -> boom.

One could use something like the highest-order bit to distinguish
between user-allocated and provider-allocated identifiers.

Regards,
        Harald

From nhofmeyr at sysmocom.de Tue Mar 12 04:04:09 2019
From: nhofmeyr at sysmocom.de (Neels Hofmeyr)
Date: Tue, 12 Mar 2019 05:04:09 +0100
Subject: ttcn3: more than one BSC
Message-ID: <20190312040409.GA15745@my.box>

I'm trying to test inter-BSC Handover in ttcn3.

At first I had problems grasping the concepts, but in the end it worked pretty
nicely to start two distinct BSC handlers like this:

  testcase TC_ho_inter_bsc() runs on MTC_CT {
          var BSC_ConnHdlr vc_conn0;
          var BSC_ConnHdlr vc_conn1;
          f_init(2);

          vc_conn0 := f_start_handler(refers(f_tc_ho_inter_bsc0), 53, 0);
          vc_conn1 := f_start_handler(refers(f_tc_ho_inter_bsc1), 53, 1);
          vc_conn0.done;
          vc_conn1.done;
  }

It's walking all the way through inter-BSC Handover now (!) up until the point
where I want to discard the call.

Now I'm facing the simple problem that I want to call f_call_hangup() in the
second f_tc_ho_inter_bsc1() -- but I have no cpars (CallParameters) with a
valid MNCC callref nor the CC transaction ID, those are in the first function.

How can I share cpars between those functions?
The transaction_id and callref determined by the MNCC and CC messages that
happened in f_tc_ho_inter_bsc0 need to move over to f_tc_ho_inter_bsc1, much
like the MS has moved to the other BSC.

So it would make sense to have some global struct representing the MS which
both BSC_ConnHdlr instances can access, if that is at all possible ... ?

As a bit of a weak workaround, I could inter-BSC handover right back to the
first BSC and then f_call_hangup() there :P

~N

From nhofmeyr at sysmocom.de Tue Mar 12 05:47:43 2019
From: nhofmeyr at sysmocom.de (Neels Hofmeyr)
Date: Tue, 12 Mar 2019 06:47:43 +0100
Subject: ttcn3: more than one BSC
In-Reply-To: <20190312040409.GA15745@my.box>
References: <20190312040409.GA15745@my.box>
Message-ID: <20190312054743.GB15745@my.box>

On Tue, Mar 12, 2019 at 05:04:09AM +0100, Neels Hofmeyr wrote:
> So it would make sense to have some global struct representing the MS which
> both BSC_ConnHdlr instances can access, if that is at all possible ... ?
>

The really interesting part then is the 24.007 11.2.3.2 N(SD) sequence number
patching. Took me a moment to realize that when the one BSC wants to send CC
DTAP while the other has advanced in N(SD) sequence numbering, the DTAP sent
from TTCN3 is rejected as duplicate because TTCN3 has lost the MS state. So
some way or other .. we need to manually tweak the TTCN3 N(SD) state to match
the actual DTAP flow, or keep a common MS state in TTCN3 that sorts this out
across BSCs.
> As a bit of a weak workaround, I could inter-BSC handover right back to the
> first BSC and then f_call_hangup() there :P

Which works! Except for the CC Release not getting recognised because of N(SD).

So, yes, I'm seeing the first actual inter-BSC Handovers happening in
ttcn3-msc-test :)

Next, I'll go for some real-hardware testing.

~N

From nhofmeyr at sysmocom.de Tue Mar 12 06:36:05 2019
From: nhofmeyr at sysmocom.de (Neels Hofmeyr)
Date: Tue, 12 Mar 2019 07:36:05 +0100
Subject: SCCP connection identifiers on the SCCP User SAP
In-Reply-To: <20190311204817.GA725@nataraja>
References: <20190311204817.GA725@nataraja>
Message-ID: <20190312063605.GC15745@my.box>

On Mon, Mar 11, 2019 at 09:48:17PM +0100, Harald Welte wrote:
> The SCCP User SAP is modelled strictly after the ITU specs. Always
> imagine yourself in a situation where the SCCP user and SCCP provider
> are running in different processes and they don't have access to each
> other's state - and all they can exchange are the SCCP User SAP
> primitives in some serialized form.

So the point is that this next_id used in the patch below could be in a
different process than the one composing the N_CONNECT prim and needing to
invent the conn_id?

How can individual users then possibly avoid treading all over each other's
scu_prim.u.*.conn_id numbers?? Assign a unique part of the number space to each
user or something?

So as long as we don't separate into different processes we're fine?
To me personally, it feels like we should be done separating by now :P

Using the current single-process status quo, I wrote in code what I mean.
Is this bad in any way?
http://git.osmocom.org/libosmo-sccp/commit/?h=neels/conn_id&id=9e735cc537dfe267aa399f58b3d79f07e590d12f

The point is that when a CONNECT comes in from the remote side, this bit of
code picks a conn_id that becomes visible in the scu_prim.u.connect.conn_id.
The reverse way, sending a CONNECT from here to the remote side, why not use
the exact same mechanism to cough up a prim.u.connect.conn_id?

BTW, what osmo-bsc does (picking any conn_id it isn't currently using) also
wouldn't work if there were multiple users for an SCCP provider in separate
processes. How does existing separate-process software solve this?

(In the above patch I also introduced the possibility to exhaust the conn_id
number space gracefully, however far fetched and completely unrealistic that
may be.)

This patch works for osmo-msc in ttcn3-msc-test for inter-BSC Handover.

> What kind of object would the SCCP provider use
> to keep track of allocated IDs

The one in the above patch is struct sccp_connection from sccp_scoc.c

> for which there is no connection yet, as
> the N-CONNECT.req was not yet received?

(yes, also a tricky one in allocating a new RAN connection in osmo-msc.)

In current single-process mode, sccp_scoc.c just does inst->next_id++ and hence
will only deal out the same ID again 0xffffffff allocations later, regardless of
whether the caller actually uses them or whether a struct sccp_connection was
created.

> The current situation is not great. After all, theoretically there
> could be an incoming new SCCP connection for which the provider chooses
> the same ID that the user at the same time chooses for a new outbound
> connection -> boom.

Currently, in a single process, sccp_scoc.c increases the inst->next_id for
outgoing and for incoming requests, which is synchronous, and there is no
problem. It would be a problem if this became asynchronous, though.
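
The single-counter scheme discussed in this message can be sketched in plain C. This is only an illustration, assuming a single process in which one counter serves both incoming and outgoing connections. The names `struct conn_id_pool`, `conn_id_alloc()` and `conn_id_release()` are hypothetical and are not the libosmo-sccp API; the small fixed-size table merely stands in for the provider's list of active connections.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 4	/* deliberately tiny so exhaustion is easy to reach */

/* Hypothetical stand-in for the provider's per-instance connection state. */
struct conn_id_pool {
	uint32_t next_id;		/* next candidate ID */
	uint32_t id[MAX_CONNS];		/* IDs of active connections */
	bool used[MAX_CONNS];		/* which slots of id[] are occupied */
};

static bool conn_id_in_use(const struct conn_id_pool *p, uint32_t id)
{
	for (size_t i = 0; i < MAX_CONNS; i++)
		if (p->used[i] && p->id[i] == id)
			return true;
	return false;
}

/* Hand out the next unused ID; 0 on success, -1 when every slot is taken. */
int conn_id_alloc(struct conn_id_pool *p, uint32_t *id_out)
{
	size_t slot = 0;

	assert(p && id_out);
	while (slot < MAX_CONNS && p->used[slot])
		slot++;
	if (slot == MAX_CONNS)
		return -1;	/* graceful exhaustion instead of looping forever */

	/* At most MAX_CONNS - 1 IDs are active here, so this loop terminates. */
	while (conn_id_in_use(p, p->next_id))
		p->next_id++;	/* uint32_t wraps from 0xffffffff to 0 */

	*id_out = p->next_id++;
	p->id[slot] = *id_out;
	p->used[slot] = true;
	return 0;
}

/* Mark an ID as free again; the counter itself is not rewound. */
void conn_id_release(struct conn_id_pool *p, uint32_t id)
{
	for (size_t i = 0; i < MAX_CONNS; i++)
		if (p->used[i] && p->id[i] == id)
			p->used[i] = false;
}
```

Because the counter only moves forward, an ID that was just released is not handed out again until the counter has wrapped around, which gives the grace period that makes log files easier to read.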
BTW, again looking at current osmo-bsc: since osmo-bsc invents its own new IDs
without doing inst->next_id++, the next_id doesn't cycle so nicely, so possibly
an outgoing conn_id that has just this instant stopped being used might be used
right away again for an incoming N_CONNECT, because osmo-bsc didn't increment
the inst->next_id. Or in other words, every Handover Request from the MSC will
start from id 0 while Complete-L3 from BSC have already moved up past 0 before.
Not illegal as such, but still would be nicer to have some grace period before
re-using existing numbers, if for nothing else than for reading log files.

So while we're having the provider in the same process, it might be nicer to
also use the above patch in osmo-bsc, to ensure the next id is always larger
than the previous ones.

...unless you're completely against this kind of thing.
Though there isn't really an alternative solution right now AFAICT.

> > hnbgw just uses 1:1 the same conn_id from RUA to RANAP and thus doesn't invent new ones
>
> Now I'm confused. RUA isn't running over SCCP, right?

Either way there is some RUA id and hnbgw just uses that 1:1 for outgoing RANAP
connections. There aren't any incoming ones from RANAP, at least yet.
So when I tried to copy hnbgw's way of coughing up an id I was disappointed.

~N

From msuraev at sysmocom.de Tue Mar 12 11:24:15 2019
From: msuraev at sysmocom.de (Max)
Date: Tue, 12 Mar 2019 12:24:15 +0100
Subject: Google Summer of Code 2019
In-Reply-To:
References: <20190115161349.GP11166@nataraja>
Message-ID: <3345c985-79af-92cd-a51d-69e523460614@sysmocom.de>

Thanks for working on this.
Don't give up just yet though - there's another opportunity on the horizon:
https://opensource.googleblog.com/2019/03/introducing-season-of-docs.html

In short, Google's SoD will pay for people writing documentation for FOSS
projects the same way GSoC pays for writing code.

And documentation is something we can always get improved.

What do you think? ;-)

26.02.19 17:03, Vadim Yanitskiy wrote:
> Hi all,
>
> since we have seen more neutral and positive opinions than negative ones,
> we've tried to apply as a mentor organization. Unfortunately, our application
> has been rejected. Sad, but not the end of the world ;)
>
> Thanks for your feedback!
> ... and kudos to Harald!
>
> With best regards,
> Vadim Yanitskiy.

--
- Max Suraev http://www.sysmocom.de/
=======================================================================
* sysmocom - systems for mobile communications GmbH
* Alt-Moabit 93
* 10559 Berlin, Germany
* Sitz / Registered office: Berlin, HRB 134158 B
* Geschaeftsfuehrer / Managing Directors: Harald Welte

From laforge at gnumonks.org Tue Mar 12 20:26:00 2019
From: laforge at gnumonks.org (Harald Welte)
Date: Tue, 12 Mar 2019 21:26:00 +0100
Subject: Google Summer of Code 2019
In-Reply-To: <3345c985-79af-92cd-a51d-69e523460614@sysmocom.de>
References: <20190115161349.GP11166@nataraja>
 <3345c985-79af-92cd-a51d-69e523460614@sysmocom.de>
Message-ID: <20190312202600.GR16161@nataraja>

Hi Max,

On Tue, Mar 12, 2019 at 12:24:15PM +0100, Max wrote:
> Don't give up just yet though - there's another opportunity on the horizon:
> https://opensource.googleblog.com/2019/03/introducing-season-of-docs.html
>
> In short, Google's SoD will pay for people writing documentation for FOSS
> projects the same way GSoC pays for writing code.
>
> And documentation is something we can always get improved.
>
> What do you think? ;-)

I think it's great. I don't have high hopes that we'd qualify, as it's a
bit too niche a topic compared to many other projects out there.
However, also keep in mind that the technical writer (if any) will
arrive with no knowledge about Osmocom in the first place, and will
likely require tons of input from us.

I actually think those manuals that we created about two years ago
are half-way-decent, particularly if you know where we're coming from.

The big problem is that there's

1) a lack of updating manuals as we move the code along. We should pay
   attention in our code reviews that any user-visible code changes,
   particularly VTY changes, always come with changes to the manuals at
   the same time. I think that the manuals were a better fit for the
   implementation 2 years ago than they are now. This is the
   responsibility of the developers introducing changes. No amount of
   external technical writers can change that.

2) still a number of osmocom CNI projects that don't have a user manual
   at all. Contributions always welcome. However, again that's nothing
   a technical writer can do by himself unless he has deep knowledge of
   at least 3GPP architectures and protocols...

From laforge at gnumonks.org Tue Mar 12 20:46:28 2019
From: laforge at gnumonks.org (Harald Welte)
Date: Tue, 12 Mar 2019 21:46:28 +0100
Subject: easier way to avoid msgb leaks in libosmo-sccp callers?
In-Reply-To: <20190310033105.GA3276@my.box>
References: <20190310033105.GA3276@my.box>
Message-ID: <20190312204628.GS16161@nataraja>

Hi Neels,

On Sun, Mar 10, 2019 at 04:31:05AM +0100, Neels Hofmeyr wrote:
> Avoiding msgb leaks is easiest if the caller retains ownership of the msgb.

ACK.

> However, if the caller passes ownership down to the msgb consumer, things
> become nightmarishly complex:

ACK.
> If any link in this call chain fails to be aware of the importance to return a
> failed RC or to free a msgb if the chain is broken, we have a hidden msgb leak.

Indeed.

This is why I have been arguing in favor of something like a "packet dispatch
talloc context" for some time. Unfortunately I didn't have any time to work on
this so far.

The proposal is along those lines:
* we create a talloc context every time we drop out of the select() statement
* any msgb's allocated from routines that read from sockets / etc. are allocated
  from that context
* just before we return into osmo_select_main() [or in the start of that
  function], we free the entire talloc context, which will free any msgb's in it
* if anyone wants to take ownership of a msgb for a longer time (e.g. enqueue
  it somewhere for delayed processing, ...) they need to re-parent the object
  away from the volatile "packet dispatch talloc context" and move it into
  another context. Talloc provides talloc_steal() or better talloc_move() for it.

I would think this is an elegant solution. That particular volatile talloc
context would not be restricted to msgb's but could be used for any kind of
'temporary' allocations. We could lazily allocate whatever we wanted (like
temporary string buffers, etc.) and know that it will be safely released once
the current select dispatch ceases to exist.

Sure, whenever we can we should still use the stack, as stack allocations are
cheaper than heap allocations.

> This is the case with osmo_sccp_user_sap_down(). In new osmo-msc, passing data
> through various FSM instances, there is high potential for leak/double-free
> bugs. A very large brain is required to track down every msgb path.

Let me know what you think of the above-mentioned proposal.

We could have something like msgb_alloc2() which takes an additional talloc
context as argument.
Plus creating that [global variable] talloc context in the beginning of
osmo_fd_disp_fds() and talloc_free()ing it at the end of osmo_fd_disp_fds()
should do the trick.

One might even be able to do some trickery with talloc_set_destructor() to print
a BIG FAT WARNING/ERROR message in case anyone ever tries to free memory
allocated from this context manually. That destructor would simply have to be
removed or disabled when the select loop handling takes care of it.

> Isn't it possible to provide osmo_sccp_user_sap_down() in the caller-owns
> paradigm? Thinking about an osmo_sccp_user_sap_down2() that simply doesn't
> msgb_free().

It's possible. But I think there are use cases for both behaviors.

> Passing ownership to the consumer is imperative if a msg queue is involved that
> might send out asynchronously. (A workaround could be to copy the message
> before passing into the wqueue.)

The proposed approach would allow for "re-parenting" without having to copy the
msgb or any of the metadata. Even the address of the msgb stays the same
if you move/steal it.

> I think I'll just try it, but if anyone knows a definite reason why this will
> not work, please let me know.

Not off the top of my head, but I guess nobody is able to make guarantees on
this without a thorough code audit and/or extensive testing. That's why I like
the idea of this volatile context.

However, I realize only now that you're referring not to the case of received
message buffers, but to locally-originated msgbs. But for them, too, the
rule could be simply something like this:

* allocate memory from the volatile context
* pass it into any library function
* if the library function transmits it right away, no need to free/steal
* if the library function enqueues it somewhere and doesn't immediately send
  it over a socket or the like, it needs to move/steal it.
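
The rule listed above can be sketched in plain C. This is only an illustration under stated assumptions: `struct vctx`, `vctx_alloc()`, `vctx_steal()` and `vctx_free_all()` are hypothetical stand-ins written for this example, not talloc or libosmocore functions. With talloc, the corresponding roles would roughly be played by a talloc context, talloc_steal() and talloc_free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct vctx;

/* Per-allocation header; the union keeps the payload suitably aligned. */
union vhdr {
	struct {
		union vhdr *next;	/* next chunk owned by the same context */
		struct vctx *owner;	/* context that currently owns this chunk */
	} h;
	max_align_t align;
};

/* Hypothetical allocation context: a list of the chunks it owns. */
struct vctx {
	union vhdr *head;
};

/* Allocate 'size' bytes owned by 'ctx'. */
void *vctx_alloc(struct vctx *ctx, size_t size)
{
	union vhdr *hdr = malloc(sizeof(*hdr) + size);

	if (!hdr)
		return NULL;
	hdr->h.owner = ctx;
	hdr->h.next = ctx->head;
	ctx->head = hdr;
	return hdr + 1;
}

/* Re-parent 'ptr' to 'new_ctx'; the payload address does not change. */
void *vctx_steal(struct vctx *new_ctx, void *ptr)
{
	union vhdr *hdr = (union vhdr *)ptr - 1;
	union vhdr **pp = &hdr->h.owner->head;

	while (*pp != hdr) {		/* find the link pointing at this chunk */
		assert(*pp);
		pp = &(*pp)->h.next;
	}
	*pp = hdr->h.next;		/* unlink from the old owner */
	hdr->h.owner = new_ctx;
	hdr->h.next = new_ctx->head;
	new_ctx->head = hdr;
	return ptr;
}

/* Free everything still owned by 'ctx'; returns the number of chunks freed. */
size_t vctx_free_all(struct vctx *ctx)
{
	size_t n = 0;

	while (ctx->head) {
		union vhdr *hdr = ctx->head;

		ctx->head = hdr->h.next;
		free(hdr);
		n++;
	}
	return n;
}
```

In this sketch, a select-loop iteration would allocate its buffers from one volatile `struct vctx`, a function that enqueues a buffer would call `vctx_steal()` to move it into a longer-lived context, and the loop would call `vctx_free_all()` on the volatile context before the next iteration. Anything that was not stolen is released there, and nothing that was stolen is touched.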
Thinking of the above, I think the 'being queued' case is more or less the
normal one, as we can never make the assumption that the underlying transport
layer is able to accept transmission of any data in a non-blocking way. So
there will always be some kind of write queue wherever we transmit.

The only situations where we can "normally" assume that the msgb of any
outbound message can be free'd immediately are:
* datagram sockets like for RTP
* local primitives that are simply sent to another layer in the local stack,
  and which are released immediately after that stack has received them

Regards,
        Harald

From laforge at gnumonks.org Tue Mar 12 21:07:25 2019
From: laforge at gnumonks.org (Harald Welte)
Date: Tue, 12 Mar 2019 22:07:25 +0100
Subject: ttcn3: more than one BSC
In-Reply-To: <20190312054743.GB15745@my.box>
References: <20190312040409.GA15745@my.box> <20190312054743.GB15745@my.box>
Message-ID: <20190312210725.GU16161@nataraja>

Hi Neels,

On Tue, Mar 12, 2019 at 06:47:43AM +0100, Neels Hofmeyr wrote:
> The really interesting part then is the 24.007 11.2.3.2 N(SD) sequence number
> patching. Took me a moment to realize that when the one BSC wants to send CC
> DTAP while the other has advanced in N(SD) sequence numbering, the DTAP sent
> from TTCN3 is rejected as duplicate because TTCN3 has lost the MS state. So
> some way or other .. we need to manually tweak the TTCN3 N(SD) state to match
> the actual DTAP flow, or keep a common MS state in TTCN3 that sorts this out
> across BSCs.

That's another indication that somehow the component architecture of
the test isn't right.
Either one goes for the "single component with two BSSAP ports [one for each emulated BSC]" approach I suggested in my previous e-mail, or one has to separate the "simulated MS component" from the "simulated BSC component". This way you could "disconnect" a test port between the MS and BSC0 and re-connect that port to BSC1, while the MS component retains all of its state. Sounds more complex to implement than the "single component with two BSSAP ports" approach. Regards, Harald -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6) From laforge at gnumonks.org Tue Mar 12 21:04:34 2019 From: laforge at gnumonks.org (Harald Welte) Date: Tue, 12 Mar 2019 22:04:34 +0100 Subject: ttcn3: more than one BSC In-Reply-To: <20190312040409.GA15745@my.box> References: <20190312040409.GA15745@my.box> Message-ID: <20190312210434.GT16161@nataraja> Hi Neels, On Tue, Mar 12, 2019 at 05:04:09AM +0100, Neels Hofmeyr wrote:
> At first I had problems grasping the concepts, but in the end it worked pretty
> nicely to start two distinct BSC handlers like this:
>
>     testcase TC_ho_inter_bsc() runs on MTC_CT {
>             var BSC_ConnHdlr vc_conn0;
>             var BSC_ConnHdlr vc_conn1;
>             f_init(2);
>
>             vc_conn0 := f_start_handler(refers(f_tc_ho_inter_bsc0), 53, 0);
>             vc_conn1 := f_start_handler(refers(f_tc_ho_inter_bsc1), 53, 1);
>             vc_conn0.done;
>             vc_conn1.done;
>     }
>
> It's walking all the way through inter-BSC Handover now (!) up until the point
> where I want to discard the call.
>
> Now I'm facing the simple problem that I want to call f_call_hangup() in the
> second f_tc_ho_inter_bsc1() -- but I have no cpars (CallParameters) with a
> valid MNCC callref nor the CC transaction ID, those are in the first function.
>
> How can I share cpars between those functions?

You cannot.
Those are not different "functions", but you have to think of those as separate processes running somewhere, possibly even on different machines. Every TTCN3 component (at least in Titan) runs self-contained, as a separate process. The only way of interacting with it is via message-based / sequential transports, such as the test ports (message or procedure based). Depending on the size/complexity of your setup, each component could run on a different machine, and they use socket based transports in between. Given that, you cannot have shared state between them, as you cannot assume they'd have shared memory of some sort. If they need to talk, then you have to implement some kind of "IPC", e.g. by adding message or procedure ports between them and it quickly becomes very complex. For testing the MSC side of inter-BSC-handover, I would assume one creates a single test component that has two BSSAP test ports, so basically you can behave like BSC0 and BSC1 from within a single testcase or any of its functions - as well as handle the MNCC side of it. I also think that's the only way in which you can enforce a particular order of events from your test. If you're running two parallel components, each of which is testing only half of your IUT, then they run completely concurrently without any synchronization. Regards, Harald -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch.
A6) From nhofmeyr at sysmocom.de Thu Mar 14 21:08:34 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Thu, 14 Mar 2019 22:08:34 +0100 Subject: merge conflicts in osmo-msc Message-ID: <20190314210834.GA2374@my.box> Hi everyone, and in this particular case Sylvain, I know I'm hogging osmo-msc.git and I don't intend to hinder everyone else's work, but I doubt that it was really necessary to merge the silent call channel types patch to master right now. I'm really overloaded with doing the inter-MSC handover work, I'm again postponing my well earned leave that I would have liked to have started two months ago, and I don't want to be burdened with also resolving everyone else's merge conflicts. You all know that pretty much everything changes in osmo-msc, even if only slightly. As a rule of thumb, if you see a 'ran_conn', 'vlr_subscr_get()', a Paging callback function or a 'gsm_trans' in a bit of code, likely there will be merge conflicts with my current branch. Most should be trivial, but they will be stones put in my way. So please, before you merge onto master, consider doing the same work on the tip of my branch 'neels/ho' in osmo-msc, rather than on current master. That would be the ideal situation for me, because then you also test my patches. There still is ongoing work at the tip of neels/ho, if you want a more stable point to apply modifications to, look at the commit 'add LOG_TRANS, proper context for all transactions' and rebase your changes onto that. If you push those changes onto a private branch, I will even see them in tig and can simply incorporate them in my branch, carrying them along until we're ready to merge. If you prefer working on master, still do me a favor: just try to rebase the patch onto neels/ho. All the conflicts that you see in such a rebase will end up on my table and stop me from going forward until I have resolved them. Consider that before you hit that "Submit" button on gerrit in the osmo-msc.git repository.
Plus, if I resolve conflicts with *your* code, likely I won't grok some minor detail and introduce bugs. So let's work together on this. Thanks! Working on osmo-msc.git neels/ho branch also needs various patches in other repositories; all these branches are kept up-to-date every day and all the time: libosmocore neels/misc libosmo-sccp neels/conn_id osmo-mgw neels/endpoint_fsm osmo-msc neels/ho Thanks again! ~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Thu Mar 14 21:33:22 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Thu, 14 Mar 2019 22:33:22 +0100 Subject: merge conflicts in osmo-msc In-Reply-To: <20190314210834.GA2374@my.box> References: <20190314210834.GA2374@my.box> Message-ID: <20190314213322.GB2374@my.box> Hi Sylvain, in this particular patch I ended up not having the patience to understand enough details right now to carry out the rebase. I think likewise you might need a moment to understand the new osmo-msc code. So I guess we should discuss what should happen in prose and I hopefully can help you translate into code. But right now I'm reluctant to get distracted at all, so now I actually start my neels/ho branch with a revert of the osmo-msc 'Allow different channel types to be requested as silent calls' patch -- the intention is not to undo your work, but instead to not lose my focus on inter-MSC handover until we can figure it out. I realize I should have raised these issues on gerrit before merging, but for obvious reasons I'm also currently avoiding hours of gerrit review that I would normally do... Sorry about that; let's see how we can resolve. 
I suspect there is a more trivial solution than deferring; after all, getting into the COMMUNICATING state should happen by event dispatch and should be possible any number of times, in particular fired whenever a valid transaction gets started (on neels/ho by sending the event MSC_A_EV_TRANSACTION_ACCEPTED to msc_a). There is no onenter action, so I'm quite sure the COMMUNICATING state is not a reason to defer anything. If entering COMMUNICATING more than once triggers bugs, then we should guard that events get triggered only on first entering the COMMUNICATING state -- but I don't see anything being triggered at all anyway. The sole purpose of the COMMUNICATING state is to remove the FSM timeout. On the neels/ho branch, you will now find MSC_A_ST_COMMUNICATING in msc_a.c, not in ran_conn.c. ~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Fri Mar 15 01:25:12 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Fri, 15 Mar 2019 02:25:12 +0100 Subject: osmo-bsc voice assignment broken by 'assignment_fsm: Properly support assigning signalling mode TCH/x' Message-ID: <20190315012512.GA17232@my.box> Hi Sylvain, completely unrelated to the osmo-msc patch, your other patch that was merged to osmo-bsc breaks voice channel assignment, and I had to revert it. Mentioned so on IRC, but since you didn't reply I thought it best to also post this here: tnt, I think you broke voice Assignment in osmo-bsc. Can't get a voice call in current master. It works again if I revert 4d3a21269b25e7164a94fa8ce3ad67ff80904aee tnt, I'm choosing to revert this from current master now; let's fix it, test and then re-apply the patch when it is ready https://gerrit.osmocom.org/c/osmo-bsc/+/13256 I hope I'm not annoying you by annulling all of your patches :P It's definitely not on purpose!
~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Sun Mar 17 23:00:04 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Mon, 18 Mar 2019 00:00:04 +0100 Subject: GSUP routing for different kinds of entities Message-ID: <20190317230004.GA25662@my.box> Looking at sending GSUP messages between MSCs via an HLR acting as forwarding agent, I see that the current decision for GSUP message consumption is suboptimal: Depending on the message type sent and received, libvlr of osmo-msc forwards GSUP messages to the MSC code, and there, again, depending on the message type, specific callbacks get invoked. See vlr_gsupc_read_cb() and msc_vlr_route_gsup_msg(). In current osmo-msc it might seem to make sense to first resolve the IMSI to a vlr_subscr in vlr.c. But if osmo-msc acts as a Handover target for an inter-MSC Handover, it should be able to handle unknown IMSIs. Also, should we ever go for a separate SMSC process, the VLR as first stage makes no sense. Finding a vlr_subscr is a one-liner with vlr_subscr_find_by_imsi(). I would much rather have an explicit destination entity advertised in the GSUP messages, and an explicit common GSUP MUX stage. In other words, the VLR of osmo-msc shouldn't act as a GSUP forwarder, it should merely be one of the GSUP consumers, and shouldn't even be involved when the messages are intended for inter-MSC, for USSD or for SMS use. And finally, for GSUP error responses, for example a report that a specific target could not be reached, it may not be possible to trivially derive the right GSUP message consumer from the GSUP message (like "Routing Error"). 
Going towards that idea, I have put in place the following in my temporary dev source tree:

    enum osmo_gsup_entity {
            OSMO_GSUP_ENTITY_NONE = 0,
            OSMO_GSUP_ENTITY_HLR,
            OSMO_GSUP_ENTITY_VLR,
            OSMO_GSUP_ENTITY_ESME,
            OSMO_GSUP_ENTITY_SMSC,
            OSMO_GSUP_ENTITY_USSD, // FIXME: what's an "ESME"/"SMSC" for USSD?
            OSMO_GSUP_ENTITY_MSC_A,
            OSMO_GSUP_ENTITY_MSC_B,
            OSMO_GSUP_ENTITY_COUNT,
    };

    struct osmo_gsup_message {
            [...]
            enum osmo_gsup_entity source_entity;
            enum osmo_gsup_entity destination_entity;
            [...]
    };

For calling the right rx_cb, we would need only an explicit target kind, but for returning errors it is better to also include the source entity kind explicitly.

A gsup_client_mux API:

    struct gsup_client_mux_rx_cb {
            int (* func )(struct gsup_client_mux *gcm, void *data, const struct osmo_gsup_message *msg);
            void *data;
    };

    struct gsup_client_mux {
            struct osmo_gsup_client *gsup_client;

            /* Target clients by enum osmo_gsup_entity */
            struct gsup_client_mux_rx_cb rx_cb[OSMO_GSUP_ENTITY_COUNT];
    };

    int gsup_client_mux_init(struct gsup_client_mux *gcm, struct osmo_gsup_client *gsup_client);

    int gsup_client_mux_tx(struct gsup_client_mux *gcm, const struct osmo_gsup_message *gsup_msg);
    void gsup_client_mux_tx_error_reply(struct gsup_client_mux *gcm, const struct osmo_gsup_message *gsup_orig,
                                        enum gsm48_gmm_cause cause);

For backwards compat, we would still need to do target classification by message type, but only if no explicit destination_entity is set:

    static enum osmo_gsup_entity gsup_client_mux_classify(struct gsup_client_mux *gcm,
                                                          const struct osmo_gsup_message *gsup)
    {
            if (gsup->destination_entity)
                    return gsup->destination_entity;

            /* Legacy message that lacks an explicit target entity. Guess by message type for backwards compat: */
            switch (gsup->message_type) {
            case OSMO_GSUP_MSGT_PROC_SS_REQUEST:
            case OSMO_GSUP_MSGT_PROC_SS_RESULT:
            case OSMO_GSUP_MSGT_PROC_SS_ERROR:
                    return OSMO_GSUP_ENTITY_USSD;

            case OSMO_GSUP_MSGT_MO_FORWARD_SM_ERROR:
            case OSMO_GSUP_MSGT_MO_FORWARD_SM_RESULT:
            case OSMO_GSUP_MSGT_READY_FOR_SM_ERROR:
            case OSMO_GSUP_MSGT_READY_FOR_SM_RESULT:
            case OSMO_GSUP_MSGT_MT_FORWARD_SM_REQUEST:
                    return OSMO_GSUP_ENTITY_SMSC;

            default:
                    /* osmo-hlr capable of forwarding inter-MSC messages always includes the target entity, so any
                     * other legacy message is for the VLR. */
                    return OSMO_GSUP_ENTITY_VLR;
            }
    }

We'd have:

    HLR <-> VLR
    ESME <-> SMSC
    USSD <-> USSD (names??)
    MSC_A <-> MSC_B

Thanks for your thoughts.

~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Tue Mar 19 16:57:25 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Tue, 19 Mar 2019 17:57:25 +0100 Subject: easier way to avoid msgb leaks in libosmo-sccp callers? In-Reply-To: <20190312204628.GS16161@nataraja> References: <20190310033105.GA3276@my.box> <20190312204628.GS16161@nataraja> Message-ID: <20190319165725.GA27744@my.box> On Tue, Mar 12, 2019 at 09:46:28PM +0100, Harald Welte wrote: > This is why I have been arguing in favor of something like a "packet dispatch > talloc context" for some time. Unfortunately I didn't have any time to work > on this so far. > > The proposal is along those lines: > * we create a talloc context every time we drop out of the select() statement > * any msgb's allocated from routines that read from sockets / etc. are allocated > from that context > * just before we return into osmo_select_main() [or in the start of that function], > we free the entire talloc context, which will free any msgb's in it > * if anyone wants to take ownership of a msgb for a longer time (e.g.
enqueue it > somewhere for delayed processing, ...) they need to re-parent the object away > from the volatile "packet dispatch talloc context" and move it into another > context. Talloc provides talloc_steal() or better talloc_move() for it. > > I would think this is an elegant solution. I fully agree, it's a great solution for two problems we have: msgb ownership as well as re-using foo_name() functions in the same print format. One limitation to the talloc_steal(): assume some code path already stole from the volatile context, then it would be a bit rude to have another code path steal it from the first code path again. So maybe it would be good to be able to determine whether a msgb is still volatile or already owned, so that implementations can decide to only steal from the volatile context. > We could lazily allocate whatever we wanted (like temporary string buffers, > etc.) and know that it will be safely released once the current select dispatch > ceases to exist. That is very desirable IMO. I've looked at some of your WIP patches in gerrit, and there placed a comment towards this. I would like to change the current string buffer returning functions under the hood without having to change the function signature -- after all, the callers don't need to care whether a buffer is static or from a volatile context. No caller anywhere is allowed to assume the returned string remains valid for a long time, so they do already strcpy() if they have to keep the returned string safe. The current patches look like we need to produce a lot of patches to use the volatile API... see https://gerrit.osmocom.org/#/c/libosmocore/+/13311/ > use the stack whenever we can One advantage of changing all string buffer functions to a volatile buffer would be that any static-buf-reuse bugs we might already have in our LOGP or printf() will be fixed implicitly, and will also be avoided for the future.
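The static-buf-reuse problem mentioned above is easy to reproduce in isolation. The following is a minimal sketch with made-up helper names (id_name_static(), id_name_buf(), log_pair_static(), log_pair_buf()); none of them are libosmocore functions, they only mimic the foo_name() pattern of returning a library-internal static buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical foo_name()-style helper returning an internal static buffer. */
const char *id_name_static(int id)
{
	static char buf[16];
	snprintf(buf, sizeof(buf), "id-%d", id);
	return buf;
}

/* Variant that writes into caller-provided storage instead. */
const char *id_name_buf(char *buf, size_t len, int id)
{
	assert(buf && len > 0);
	snprintf(buf, len, "id-%d", id);
	return buf;
}

/* Buggy: both "%s" arguments are the same pointer, so whichever helper
 * call happened to run last determines BOTH printed names. */
void log_pair_static(char *out, size_t len, int a, int b)
{
	snprintf(out, len, "%s %s", id_name_static(a), id_name_static(b));
}

/* Correct: each name lives in its own buffer for the whole statement. */
void log_pair_buf(char *out, size_t len, int a, int b)
{
	char abuf[16], bbuf[16];
	snprintf(out, len, "%s %s",
		 id_name_buf(abuf, sizeof(abuf), a),
		 id_name_buf(bbuf, sizeof(bbuf), b));
}
```

With the static variant, logging the pair (1, 2) prints the same name twice; which of the two names it is depends on the compiler's argument evaluation order. Handing out a fresh buffer per call, whether from the stack as here or from a volatile context, avoids that class of bug.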
I would very much like to not have to think about writing log statements so much, but make foo_name() invocations safe, period. After all, we don't log that much if LOGL_DEBUG is disabled, and for DEBUG switched on we are typically in a non-production environment. If dynamic allocations are a problem, I could even imagine some array of 10 or so string buffers that are always kept around to avoid re-allocating too much for logging strings. You catch my drift, I'm for changing the current implementation without needing explicit changes to calling code. I'm certain that you have good reasons to not want to do that, but are they worth the human time to use/maintain new API? > > This is the case with osmo_sccp_user_sap_down(). > > Let me know what you think of the above-mentioned proposal. It would make my other patch obsolete, in a good way. https://gerrit.osmocom.org/13277 osmo_sccp_user_sap_down2() > One might even be able to do some trickery with talloc_set_destructor() > to print a BIG FAT WARNING/ERROR message in case anyone ever tries to free > memory allocated from this context manually. It's not harmful to free volatile bits ahead of time though. It is even desirable. If I want to use the current osmo_sccp_user_sap_down() as-is, I could just allocate a volatile msgb and feed that down the code path that ends up in osmo_sccp_user_sap_down(). If the msgb reaches all the way there, it gets freed within osmo_sccp_user_sap_down(), otherwise the volatile context will clean up the leftovers from error handling. If we instead enforce that volatile contexts are not freed manually, then we would again need an osmo_sccp_user_sap_down2() API change; I assume that's not the only one. > However, I realize only now that you're referring not to the case of received message > buffers, but about locally-originated msgbs.
But for them, too, the rule could > be simply something like this: > * allocate memory from the volatile context > * pass it into any library function > * if the library function transmits it right away, no need to free/steal > * if the library function enqueues it somewhere and doesn't immediately send > it over a socket or the like, it needs to move/steal it. exactly. For changing existing enqueueing functions without needing a new API version, above idea would come in handy: determine whether the msgb is in the / a volatile context or not. If it is, steal, otherwise act as before. > Thinking of the above, I think the 'being queued' case is more or less the normal > one, as we can never make the assumption that the underlying transport layer > is able to accept transmission of any data in a non-blocking way. So there will > always be some kind of write queue wherever we transmit. In some cases we actually memcpy(), as I think osmo_sccp_user_sap_down() does. Otherwise it wouldn't be able to msgb_free() before returning, which it does, right? > * local primitives that are simply sent to another layer in the local stack, > and which are released immediately after that stack has received them ^ yes, that's what osmo_sccp_user_sap_down() does, of course ~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From laforge at gnumonks.org Wed Mar 20 07:22:35 2019 From: laforge at gnumonks.org (Harald Welte) Date: Wed, 20 Mar 2019 08:22:35 +0100 Subject: SCCP connection identifiers vs. local references Message-ID: <20190320072234.GQ20379@nataraja> Hi Neels, > 21:14 < neeels> LaF0rge, I need a solution for picking a conn_id as an SCCP SAP user. The way osmo-bsc > does it isn't much good either. 
There has to be some way to obtain an unused local > reference without running into potential collisions > 21:16 < neeels> (I submitted the patch expecting your -1, but just to make sure that I understood > correctly I wanted to submit as code that has no room for misunderstandings) > 21:18 < neeels> if I can't use the internal "next_id" thing, then what I would have to do is mimic the > exact same in osmo-msc by looking up what conn_ids are already in use and pick an unused > one. That's quite brain damaged IMHO, it's the same thing as that layer violation patch, > only with much more effort and more chances for id collisions. Nobody is suggesting that. > 21:21 < neeels> ideas: add some primitive to hand out a local reference -- but that would be non-standard > IIUC I had mentioned and discarded that option in Message-ID: <20190311204817.GA725 at nataraja> here on this list. The reason is not that it's non-standard, but that it again wouldn't work over any type of asynchronous / message based SAP [and that's how a SAP is conceptualized]. > 21:21 < neeels> give each application some unique number space for local references, like the highest > byte is configured in .cfg file, and then each SCCP user cycles through its own number > space of local references > 21:23 < neeels> ...and libosmo-sccp would have to be constrain-able to its own number space for incoming > connections I was quite sure we've had that discussion before. Unfortunately it doesn't seem to be on this mailing list or in gerrit. It may have been on IRC or I am starting to be delusional :/ The correct solution from the spec / SCCP architecture point of view is rather clear, I think: * don't recycle the SCCP local reference (protocol) from the SCCP connection ID (SAP), but use distinct number spaces for that. It was an oversimplification to do that in the original implementation. I simply missed the different scoping of those two identifiers. The scoping is the root of our problems here.
* only once we have distinct identifiers, we can fix the scope: One of the two identifiers has a per-SCCP-user scope, while the other has a per-SCCP-instance scope * this way, every SCCP user knows exactly which SCCP connection identifiers are currently in use between this specific user and the SCCP provider, and it hence can choose any unused ID for a new connection * right now, the primitive exchange across the SCCP user SAP is synchronous, so there can be no races where the SCCP provider and the SCCP user could choose the same connection ID at the same time. If you want to prepare for some kind of possible future asynchronous, message-queue based SAP, then you could use the highest bit as "allocation flag", e.g. highest bit = 0, allocated by the SCCP provider and highest bit = 1, allocated by SCCP user. Let me know if you see any problem with above proposal. Regards, Harald -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6) From laforge at gnumonks.org Wed Mar 20 09:52:46 2019 From: laforge at gnumonks.org (Harald Welte) Date: Wed, 20 Mar 2019 10:52:46 +0100 Subject: RFC: talloc contexts / automatic free at select loop Message-ID: <20190320095246.GT20379@nataraja> Hi all, this idea has been floating around for quite some time, and I finally took some time for a proposed implementation: The introduction of a "volatile" talloc context which one can use to allocate whatever temporary things from, and which would automatically be free'd once we leave the select dispatch. The current proposal is in https://gerrit.osmocom.org/#/c/libosmocore/+/13312 and I'm happy to receive any review.
How this works: * within a new osmo_select_main_ctx() function, we create a temporary talloc context before calling the filedescriptor call-back functions, and we free that after the call-backs have returned * any of the code running from within the select loop dispatch (which for "normal" osmocom CNI projects is virtually everything) can use that temporary context for allocations. There's an OTC_SELECT #define for convenience. So you could do something like talloc_zero(OTC_SELECT, ...) which would be automatically free'd after the current select loop iteration. Where is this useful? There's at least two common use cases: * allocation of message buffers without having to worry about msgb ownership * various temporary buffers e.g. for something-to-string conversions where currently we use library-internal static buffers with all their known problems (not being able to use them twice within a single printf() statement, not being thread-safe, ...) To Neels' disappointment, this is not all automatic. You a) have to call the _c-suffixed variant of the respective function, e.g. osmo_hexdump_c() instead of osmo_hexdump() with one extra initial argument: OTC_SELECT. There's also msgb_alloc_c(), msgb_copy_c() and the like, allowing msgb allocation from a context of your choice, as opposed to the library-internal msgb_tall_ctx that we had so far. b) have to use osmo_select_main_ctx() instead of osmo_select_main(). This is an arbitrary limitation for optimization that Pau requested, to make sure applications that don't want this can avoid all the extra talloc+free at every select iteration. This is debatable, and we can make it automatic in osmo_select_main(). It's probably worth a benchmark how expensive that 'empty' allocation + free is. However, I think that's a rather "OK" price to pay. Converting the existing callers can be more or less done with "sed" (yes, spatch is of course better).
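To make the mechanism described above more concrete, here is a minimal sketch in plain C. It deliberately uses neither talloc nor libosmocore: struct arena, arena_alloc(), hexdump_c(), dispatch_once() and example_cb() are hypothetical stand-ins, and a fixed-size bump allocator is assumed in place of a real talloc context. It only shows the shape of the idea: context-taking helpers allocate temporaries, and the dispatcher drops all of them after the call-back returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy per-iteration context (NOT talloc): a fixed-size bump allocator.
 * No alignment handling, which is fine for the char strings used here. */
struct arena {
	unsigned char mem[1024];
	size_t used;
};

void *arena_alloc(struct arena *a, size_t size)
{
	void *p;
	if (size > sizeof(a->mem) - a->used)
		return NULL;
	p = a->mem + a->used;
	a->used += size;
	return p;
}

/* "_c"-style helper: the caller names the context the result lives in. */
const char *hexdump_c(struct arena *ctx, const unsigned char *data, size_t len)
{
	char *out = arena_alloc(ctx, len * 2 + 1);
	size_t i;

	if (!out)
		return "";	/* out of memory: fall back to a static empty string */
	for (i = 0; i < len; i++)
		snprintf(out + i * 2, 3, "%02x", (unsigned int)data[i]);
	out[len * 2] = '\0';
	return out;
}

typedef void (*fd_cb)(struct arena *ctx, void *priv);

/* One loop iteration: run the call-back, then release all temporaries. */
void dispatch_once(struct arena *ctx, fd_cb cb, void *priv)
{
	assert(ctx->used == 0);
	cb(ctx, priv);
	memset(ctx->mem, 0xAA, ctx->used);	/* poison, so stale pointers show up */
	ctx->used = 0;
}

/* Example call-back: two dumps within a single statement stay distinct.
 * 'priv' must point to at least 32 bytes that outlive the dispatch. */
void example_cb(struct arena *ctx, void *priv)
{
	const unsigned char a[] = { 0xde, 0xad }, b[] = { 0xbe, 0xef };
	snprintf(priv, 32, "%s %s",
		 hexdump_c(ctx, a, sizeof(a)), hexdump_c(ctx, b, sizeof(b)));
}
```

Anything that has to survive the iteration, such as the formatted line in example_cb(), must live outside the per-iteration context or be moved out of it before the call-back returns.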
While introducing this feature, I also tried to address two other topics, which are not strictly related. However, ignoring those two now would mean we'd have API/ABI breaks if we'd care about them in the future. Hence I think we should resolve them right away: 1) introduce another context: OTC_GLOBAL. This is basically a library-internal replacement for the "g_tall_ctx" that we typically generate as one of the first things in every application. You can use it anywhere where you'd want to allocate something from the global talloc context 2) make those contexts thread-local. As you may know, talloc is not thread safe. So you cannot allocate/free from a single context on multiple threads. However, it is safe to have separate talloc contexts on each thread. Making the new OTC_* contexts per-thread, we are enabling the [limited] use of this functionality in multi-threaded programs. This is of course very far from making libosmocore thread-safe. There are many library-internal data structures like timers, file descriptors, the VTY, ... which are absolutely not thread-safe. However, I think it's a step in the right direction and I don't expect any performance impact for single-threaded programs by marking the contexts as __thread. -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6) From laforge at gnumonks.org Wed Mar 20 10:13:53 2019 From: laforge at gnumonks.org (Harald Welte) Date: Wed, 20 Mar 2019 11:13:53 +0100 Subject: libosmocore wishlist Message-ID: <20190320101353.GU20379@nataraja> While working on the talloc context patches, I was wondering if we should spend a bit of time to further improve libosmocore and collect something like a wishlist.
I would currently identify the following areas: 1) initialization of the various sub-systems is too complex, there are too many functions an application has to call. I would like to move more to a global "application initialization", where an application registers some large struct [of structs, ...] at start-up and tells the library the log configuration, the copyright statement, the VTY IP/port, the config file name, ... (some of those can of course be NULL and hence not used) 2) have some kind of extensible command line options/arguments parser It would be useful to have common/library parts register some common command line arguments (like config file, logging, daemonization, ..) while the actual application extends that with only its application-specific options. I don't think this is possible with how getopt() works, so it would require some new/different infrastructure how applications would register their arguments 3) move global select() state into some kind of structure. This would mean that there could be multiple lists of file descriptors rather than the one implicit global one. Alternatively, turn the state into thread-local storage, so each thread would have its own set of registered file descriptors, which probably makes most sense. Not sure if one would have different 'sets' of registered file descriptors in a single thread. The same would apply for timers: Have a list of timers for each thread; timeouts would then also always execute on the same thread. This would put talloc context, select and timers all in the same concept: Have one set of each on each thread, used automatically. Any other wishlist items? -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch.
A6) From laforge at gnumonks.org Wed Mar 20 11:06:16 2019 From: laforge at gnumonks.org (Harald Welte) Date: Wed, 20 Mar 2019 12:06:16 +0100 Subject: RFC: talloc contexts / automatic free at select loop In-Reply-To: <20190320095246.GT20379@nataraja> References: <20190320095246.GT20379@nataraja> Message-ID: <20190320110616.GV20379@nataraja> Some additions below. On Wed, Mar 20, 2019 at 10:52:46AM +0100, Harald Welte wrote: > Where is this useful? There's at least two common use cases: > > * allocation of message buffers without having to worry about msgb ownership I forgot here that this obviously only works if the msgb is created + consumed within the same select dispatch. If e.g. the msgb is enqueued on a transmit queue, you would have to talloc_steal() / talloc_reparent() it to a different context before returning from the select dispatch. In terms of our usual use cases, I see the following scenarios: a) incoming, locally handled message The msgb can normally be allocated from OTC_SELECT as we are processing the entire msgb within the select dispatch. If we end up queueing it somewhere, we need to steal it. b) incoming, forwarded message if we already know we'll forward it (e.g. in gb-proxy or bsc_nat), we might directly allocate it from OTC_GLOBAL or any of its children. c) outgoing message As we're operating in non-blocking mode, we can never make the assumption that any of the outbound sockets will be write-able. Hence, we always have a transmit queue, and as a result we have to allocate from OTC_GLOBAL or one of its children. d) local primitives (osmo_prim) As osmo_prim are msgb-wrapped, and primitives [at least for now] are always only between our local SAP user and provider (or vice versa) we might use OTC_SELECT here, and make sure that all users that want to recycle a osmo_prim msgb will steal it, if needed (I don't think that's a very valid use case). This applies both ways: for primitives from the provider to the user, as well as the other way around. 
Regards, Harald -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6) From osmith at sysmocom.de Wed Mar 20 11:21:49 2019 From: osmith at sysmocom.de (Oliver Smith) Date: Wed, 20 Mar 2019 12:21:49 +0100 Subject: jenkins: master-libosmocore running for 20 hours Message-ID: Hi all, master-libosmocore was running for 20 hours on jenkins, hanging here: > make[7]: Entering directory '/home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/builddir/tests' > osmo_verify_transcript_vty.py -v \ > -p 42042 \ > -r "../tests/tdef/tdef_vty_test_config_root" \ > /home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/tests/tdef/tdef_vty_test_config_root.vty > <0000> /home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/src/socket.c:367 unable to bind socket:127.0.0.1:42042: Address already in use > <0000> /home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/src/socket.c:378 no suitable addr found for: 127.0.0.1:42042 > <0000> /home/osmocom-build/jenkins/workspace/master-libosmocore/a2/default/a3/default/a4/default/arch/amd64/label/osmocom-master-debian9/src/vty/telnet_interface.c:100 Cannot bind telnet at 127.0.0.1 42042 https://jenkins.osmocom.org/jenkins/job/master-libosmocore/a2=default,a3=default,a4=default,arch=amd64,label=osmocom-master-debian9/826/ I've stopped the job. Right after that, a new job spawned, and it also failed to bind telnet at 42042. 
It did not hang this time, but stopped there instead: https://jenkins.osmocom.org/jenkins/job/master-libosmocore/a2=default,a3=default,a4=default,arch=amd64,label=osmocom-master-debian9/827/ After triggering the job once more manually, it went through. From a quick analysis, I cannot see why this has happened in the first place. The master-libosmocore job is set to non-concurrent: https://git.osmocom.org/osmo-ci/tree/jobs/master-builds.yml And the gerrit verification jobs are running on another machine. Other than that, of the (almost all) Osmocom repositories that I have checked out, only libosmocore.git mentions port 42042, so nothing else should bind that in theory. Regards, Oliver -- - Oliver Smith https://www.sysmocom.de/ ======================================================================= * sysmocom - systems for mobile communications GmbH * Alt-Moabit 93 * 10559 Berlin, Germany * Sitz / Registered office: Berlin, HRB 134158 B * Geschaeftsfuehrer / Managing Director: Harald Welte From osmith at sysmocom.de Wed Mar 20 11:28:14 2019 From: osmith at sysmocom.de (Oliver Smith) Date: Wed, 20 Mar 2019 12:28:14 +0100 Subject: libosmocore wishlist In-Reply-To: <20190320101353.GU20379@nataraja> References: <20190320101353.GU20379@nataraja> Message-ID: <4cace9f8-7f15-f12b-f8cf-425ebecc373e@sysmocom.de> Is this for big refactoring wishes only? If not, I would like to have ^L working in telnet VTY connections like in a typical terminal: have it clear the buffer. On 3/20/19 11:13 AM, Harald Welte wrote: > While working on the talloc context patches, I was wondering if we should > spend a bit of time to further improve libosmocore and collect something > like a wishlist. > > I would currently identify the following areas: > > 1) initialization of the various sub-systems is too complex, there are too > many functions an application has to call.
I would like to move more > to a global "application initialization", where an application registers > some large struct [of structs, ...] at start-up and tells the library > the log configuration, the copyright statement, the VTY IP/port, the config > file name, ... (some of those can of course be NULL and hence not used) > > 2) have some kind of extensible command line options/arguments parser > It would be useful to have common/library parts register some common > command line arguments (like config file, logging, daemonization, ..) > while the actual appliacation extending that with only its application-specific > options. I don't think this is possible with how getopt() works, so > it would require some new/different infrastructure how applications would > register their arguments > > 3) move global select() state into some kind of structure. This would mean > that there could be multiple lists of file descriptors rather than the > one implicit global one. Alternatively, turn the state into thread-local > storage, so each thread would have its own set of registered file descriptors, > which probably makes most sense. Not sure if one would have diffeent 'sets' > of registered file descriptors in a single thread. The same would apply > for timers: Have a list of timers for each thread; timeouts would then > also always execute on the same thread. This would put talloc context, select > and timers all in the same concept: Have one set of each on each thread, > used automatically. > > Any other wishlist items? 
> -- - Oliver Smith https://www.sysmocom.de/ ======================================================================= * sysmocom - systems for mobile communications GmbH * Alt-Moabit 93 * 10559 Berlin, Germany * Sitz / Registered office: Berlin, HRB 134158 B * Geschaeftsfuehrer / Managing Director: Harald Welte From pespin at sysmocom.de Wed Mar 20 11:54:11 2019 From: pespin at sysmocom.de (Pau Espin Pedrol) Date: Wed, 20 Mar 2019 12:54:11 +0100 Subject: RFC: talloc contexts / automatic free at select loop In-Reply-To: <20190320110616.GV20379@nataraja> References: <20190320095246.GT20379@nataraja> <20190320110616.GV20379@nataraja> Message-ID: <422a9156-6675-bf82-53b2-194f05be7ca2@sysmocom.de> Hi, First of all, I'm not against addition of this "select scoped talloc context", and it's fine for me to merge if others find it's a really handy feature. But I have the feeling it really adds unneeded extra complexity and scenarios to take care in the code. New ways to get shot on your knee, having to use talloc_steal() and talloc_reparent(). Not sure if the benefits are worth the effort and increase of complexity. IMHO we should be fine using regular global context (and freeing stuff around) together with static/stack buffers when possible. 
Regards, Pau -- - Pau Espin Pedrol http://www.sysmocom.de/ ======================================================================= * sysmocom - systems for mobile communications GmbH * Alt-Moabit 93 * 10559 Berlin, Germany * Sitz / Registered office: Berlin, HRB 134158 B * Geschaeftsfuehrer / Managing Director: Harald Welte From pespin at sysmocom.de Wed Mar 20 12:04:04 2019 From: pespin at sysmocom.de (Pau Espin Pedrol) Date: Wed, 20 Mar 2019 13:04:04 +0100 Subject: libosmocore wishlist In-Reply-To: <20190320101353.GU20379@nataraja> References: <20190320101353.GU20379@nataraja> Message-ID: <334c024b-1416-4554-14ca-fbfc452a6a84@sysmocom.de> Hi, On 3/20/19 11:13 AM, Harald Welte wrote: > While working on the talloc context patches, I was wondering if we should > spend a bit of time to further improve libosmocore and collect something > like a wishlist. > > I would currently identify the following areas: > > 1) initialization of the various sub-systems is too complex, there are too > many functions an application has to call. I would like to move more > to a global "application initialization", where an application registers > some large struct [of structs, ...] at start-up and tells the library > the log configuration, the copyright statement, the VTY IP/port, the config > file name, ... (some of those can of course be NULL and hence not used) > > 2) have some kind of extensible command line options/arguments parser > It would be useful to have common/library parts register some common > command line arguments (like config file, logging, daemonization, ..) > while the actual appliacation extending that with only its application-specific > options. I don't think this is possible with how getopt() works, so > it would require some new/different infrastructure how applications would > register their arguments 1 and 2 not really worth for me. I'm not against it, but I'm happy enough with current state. > > 3) move global select() state into some kind of structure. 
This would mean > that there could be multiple lists of file descriptors rather than the > one implicit global one. Alternatively, turn the state into thread-local > storage, so each thread would have its own set of registered file descriptors, > which probably makes most sense. Not sure if one would have diffeent 'sets' > of registered file descriptors in a single thread. The same would apply > for timers: Have a list of timers for each thread; timeouts would then > also always execute on the same thread. This would put talloc context, select > and timers all in the same concept: Have one set of each on each thread, > used automatically. Turn sate into thread-local storage makes sense. No need for different sets per thread imho. Loosely related: It may be good to refactor code to allow for other polling systems (like epoll()), which may be more efficient if we have a long set of file descriptors but only a few are triggered every loop step. Regarding some other thread you wrote recently, I don't see much issue in having VTY library code not supporting multithread,since you can always force polling it from same thread and then if your process is multithread you take care of message passing between threads yourself in the VTY app-specific code. For logging code we may want to add some callback which provides the application with some way to lock/unlock a mutex or something similar. If cb is NULL, then there's no performance penalty during logging. Regards, Pau -- - Pau Espin Pedrol http://www.sysmocom.de/ ======================================================================= * sysmocom - systems for mobile communications GmbH * Alt-Moabit 93 * 10559 Berlin, Germany * Sitz / Registered office: Berlin, HRB 134158 B * Geschaeftsfuehrer / Managing Director: Harald Welte From nhofmeyr at sysmocom.de Wed Mar 20 15:26:18 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Wed, 20 Mar 2019 16:26:18 +0100 Subject: SCCP connection identifiers vs. 
local references In-Reply-To: <20190320072234.GQ20379@nataraja> References: <20190320072234.GQ20379@nataraja> Message-ID: <20190320152618.GB27744@my.box>

On Wed, Mar 20, 2019 at 08:22:35AM +0100, Harald Welte wrote:
> * don't recycle the SCCP local reference (protocol) from the SCCP connection ID (SAP),
>   but use distinct number spaces for that.

Ah. So the proper solution is to be fine with "collisions" because local reference and SAP conn id are distinct number spaces. Ok, but...

I know you will probably roll your eyes at least once when reading this mail, and I sympathise, but please bear with me so I can figure out what exactly osmo-msc should do about outgoing SAP conn_id...

For me there is still a disconnect in the reasoning.

In short, whoever invents SAP conn ids should invent them for both directions. (Nevermind local-reference, only talking SAP conn ID.)

In long: Let's focus on that last stage alone, as close as it gets to the message consumer:

a) remote peer sends a N-CONNECT. "Things happen" in SCCP, then in the end: libosmo-sccp dispatches me a struct osmo_scu_prim. prim.connect.conn_id contains a number, the SAP connection ID.

b) I send an N-CONNECT. I compose an osmo_scu_prim. I put in prim.connect.conn_id a number, the SAP connection ID. "Things happen" to send SCCP.

If these SAP conn_ids are of the same number space for incoming and outgoing osmo_scu_prim, which I assume, then why not provide the caller of the SAP API, in that first stage that actually dispatches/receives the struct osmo_scu_prim locally, nevermind the SCCP wiring happening further away, with one common API function to determine an unused local SAP conn_id?

                          [----------osmo-program-----------]
                          SCCP provider  SAP API  message handler
---------local-ref----->  black          --conn_id->
<--------local-ref------  box            <-conn_id--
                                          ^
                                    get_unused_sap_conn_id()

Am I getting this picture correctly? (ignoring that currently the SCCP provider is also in the same program, incidentally.)
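As a rough illustration of what such a common function could look like, here is a minimal sketch. It is not libosmo-sccp code: struct sap_conn_table, sap_conn_id_register(), sap_conn_id_release() and sap_get_unused_conn_id() are names made up for this example, and the fixed-size table is an assumption chosen only to keep it short. The point is that the same table records conn_ids for both directions, so the caller never has to iterate its own object lists to find a free one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SAP_MAX_CONNS 64 /* hypothetical limit, only to keep the sketch small */

/* Hypothetical bookkeeping owned by the SAP layer: every conn_id currently
 * in use, regardless of whether the peer or the local user opened it. */
struct sap_conn_table {
	uint32_t ids[SAP_MAX_CONNS];
	size_t count;
	uint32_t next_id; /* where to start searching for the next free id */
};

bool sap_conn_id_in_use(const struct sap_conn_table *t, uint32_t id)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->ids[i] == id)
			return true;
	return false;
}

/* Record a conn_id as used; fails on duplicates or a full table. */
bool sap_conn_id_register(struct sap_conn_table *t, uint32_t id)
{
	if (t->count >= SAP_MAX_CONNS || sap_conn_id_in_use(t, id))
		return false;
	t->ids[t->count++] = id;
	return true;
}

/* Forget a conn_id once the connection is released. */
void sap_conn_id_release(struct sap_conn_table *t, uint32_t id)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->ids[i] == id) {
			t->ids[i] = t->ids[--t->count];
			return;
		}
	}
}

/* Pick an unused conn_id for an outgoing N-CONNECT and mark it as used. */
bool sap_get_unused_conn_id(struct sap_conn_table *t, uint32_t *out)
{
	if (t->count >= SAP_MAX_CONNS)
		return false;
	/* Among count + 1 distinct candidates at least one must be free. */
	for (size_t tries = 0; tries <= t->count; tries++) {
		uint32_t candidate = t->next_id++;
		if (!sap_conn_id_in_use(t, candidate)) {
			assert(sap_conn_id_register(t, candidate));
			*out = candidate;
			return true;
		}
	}
	return false;
}
```

With something along these lines, case a) would call the register function when an incoming N-CONNECT is dispatched and case b) would call sap_get_unused_conn_id() before composing the outgoing primitive, independent of how local references are chosen further down.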
That layer violating patch of mine was made from an angle of: "this osmo_user_sap API has to know which struct osmo_scu_prim.*.conn_id already exist, nevermind how they are made. Even if we separate local ref and conn_id later, the knowledge about used conn_ids will still be in this first stage, and the caller should still ask the osmo_user_sap API for an unused SAP conn_id instead of iterating its own object lists." Shouldn't be too hard to separate the SAP conn ID scope from the local-reference scope in libosmo-sccp, if that is needed to allow this in a correct way. Otherwise we only push the mistake into osmo-{bsc,msc} code. Thanks ~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From laforge at gnumonks.org Wed Mar 20 16:30:30 2019 From: laforge at gnumonks.org (Harald Welte) Date: Wed, 20 Mar 2019 17:30:30 +0100 Subject: RFC: talloc contexts / automatic free at select loop In-Reply-To: <422a9156-6675-bf82-53b2-194f05be7ca2@sysmocom.de> References: <20190320095246.GT20379@nataraja> <20190320110616.GV20379@nataraja> <422a9156-6675-bf82-53b2-194f05be7ca2@sysmocom.de> Message-ID: <20190320163030.GX20379@nataraja> Hi Pau, On Wed, Mar 20, 2019 at 12:54:11PM +0100, Pau Espin Pedrol wrote: > First of all, I'm not against addition of this "select scoped talloc > context", and it's fine for me to merge if others find it's a really handy > feature. But I have the feeling it really adds unneeded extra complexity and > scenarios to take care in the code. New ways to get shot on your knee, > having to use talloc_steal() and talloc_reparent(). Not sure if the benefits > are worth the effort and increase of complexity. What is your solution to neels' recently described problem with msgb ownership in successful/error cases when passing msgb's between different subsystems (in his example the libosmo-sccp sigtran stack) ? 
I think this is where the beauty of this system really shows. But maybe it's possible even without the 'select scoped context' with keeping allocations all global (like now) and using talloc_reference()? This way the default behavior would be an explicit free() by the caller, i.e. msg = msgb_alloc() /* put some data in the msgb, e.g. received from socket */ call_some_other_subsystem(msg); talloc_unlink(msg); /* cannot use {msgb,talloc}_free() anymore! */ And if that other_subsystem would want to keep msgb around, they would have to talloc_reference() the msgb, thereby having two parents. The msgb would only be free'd at the last talloc_unlink(). But that turns the talloc tree into a graph and to me it really messes up things. Also the fact that you can no longer use talloc_free() makes things very complicated. So not really an option. -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6) From laforge at gnumonks.org Wed Mar 20 16:31:08 2019 From: laforge at gnumonks.org (Harald Welte) Date: Wed, 20 Mar 2019 17:31:08 +0100 Subject: libosmocore wishlist In-Reply-To: <4cace9f8-7f15-f12b-f8cf-425ebecc373e@sysmocom.de> References: <20190320101353.GU20379@nataraja> <4cace9f8-7f15-f12b-f8cf-425ebecc373e@sysmocom.de> Message-ID: <20190320163108.GY20379@nataraja> On Wed, Mar 20, 2019 at 12:28:14PM +0100, Oliver Smith wrote: > If not, I would like to have ^L working in telnet VTY connections like > in a typical terminal: make it clear the buffer. that should be rather trivial to add, I guess. I was more thinking of more fundamental changes :) -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. 
A6) From laforge at gnumonks.org Wed Mar 20 16:43:51 2019 From: laforge at gnumonks.org (Harald Welte) Date: Wed, 20 Mar 2019 17:43:51 +0100 Subject: libosmocore wishlist In-Reply-To: <334c024b-1416-4554-14ca-fbfc452a6a84@sysmocom.de> References: <20190320101353.GU20379@nataraja> <334c024b-1416-4554-14ca-fbfc452a6a84@sysmocom.de> Message-ID: <20190320164351.GZ20379@nataraja> Hi Pau, On Wed, Mar 20, 2019 at 01:04:04PM +0100, Pau Espin Pedrol wrote: > > 1) initialization of the various sub-systems is too complex, there are too > > many functions an application has to call. I would like to move more > > to a global "application initialization", where an application registers > > some large struct [of structs, ...] at start-up and tells the library > > the log configuration, the copyright statement, the VTY IP/port, the config > > file name, ... (some of those can of course be NULL and hence not used) One addition here is also vty initialization. It sucks that everyone has to manually install the VTY commands for talloc, etc. - that should just happen automagically in some way. > > 2) have some kind of extensible command line options/arguments parser > > It would be useful to have common/library parts register some common > > command line arguments (like config file, logging, daemonization, ..) > > while the actual appliacation extending that with only its application-specific > > options. I don't think this is possible with how getopt() works, so > > it would require some new/different infrastructure how applications would > > register their arguments Another topic that I forgot to list is signal handling. I would think it makes sense to move the SIGUSR1/SIGUSR2 handling into libosmocore now, particularly as we have the root talloc context[s] in libosmocore with my related patches. It just sucks to have to have a SIGUSR1->talloc_dump code snippet in each and every of our programs. > Turn state into thread-local storage makes sense. 
No need for different sets > per thread imho. Loosely related: It may be good to refactor code to allow > for other polling systems (like epoll()), which may be more efficient if we > have a long set of file descriptors but only a few are triggered every loop > step. That's a good point, indeed. It's been on the wishlist for a long time, but nobody appeared to have been reporting any performance issues as of yet. Ideally we wouldn't have to change our 'struct osmo_fd' at all but simply call a different "osmo_select_main()" function which would then call epoll instead of select. > Regarding some other thread you wrote recently, I don't see much issue in > having VTY library code not supporting multithread,since you can always > force polling it from same thread and then if your process is multithread > you take care of message passing between threads yourself in the VTY > app-specific code. I agree we shouldn't tackle this problem now. However, we have many sub-systems that e.g. add a "show ..." VTY command. That command iterates over some dynamically created structures, like e.g. subscriber connections. So if we were in a multi-threaded environment where external events/messages were processed in other threads than the main thread that runs the VTY, then we can no longer simply iterate over any of those lists, but we'd have to have mutexes/rwlocks/RCU/... to synchronize. That's out of scope for now, as we don't have any big plans for multi-threading in a big way just yet. We can look at it if we ever get to that point. > For logging code we may want to add some callback which provides the > application with some way to lock/unlock a mutex or something similar. If cb > is NULL, then there's no performance penalty during logging. Not sure I understand you here. The problem starts if we use multiple printf() calls to write a signle line, especially with LOGPC/DEBUGPC. 
This introduces problems when multiple threads writing to the same log output garble / intersperse each other's output at points that are not the end of a log line. Especially with the new _buf() and even the _c() variants of our various stringify functions it should be possible to convert all callers of log functions to hand in a full line in one call and remove the LOGPC/DEBUGPC for continuation. Once we reach that point, at least log stdout/stderr/file will be ok, as glibc guarantees that a single write/printf is "atomic". Not sure about our VTY log backend, though. Regards, Harald -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6) From msuraev at sysmocom.de Wed Mar 20 17:04:44 2019 From: msuraev at sysmocom.de (Max) Date: Wed, 20 Mar 2019 18:04:44 +0100 Subject: TTCN-3 L1CTL-related test issue Message-ID: <0f8a1730-3739-6c7c-4f58-b0fa99e01199@sysmocom.de> Hi. While experimenting with BTS TTCN-3 tests I've got some of the tests mysteriously failing with: FBSB Failed with non-zero return code 255 Anyone hit this before? Any ideas what could be causing this?
-- - Max Suraev http://www.sysmocom.de/ ======================================================================= * sysmocom - systems for mobile communications GmbH * Alt-Moabit 93 * 10559 Berlin, Germany * Sitz / Registered office: Berlin, HRB 134158 B * Geschaeftsfuehrer / Managing Directors: Harald Welte From pespin at sysmocom.de Wed Mar 20 17:12:30 2019 From: pespin at sysmocom.de (Pau Espin Pedrol) Date: Wed, 20 Mar 2019 18:12:30 +0100 Subject: libosmocore wishlist In-Reply-To: <20190320164351.GZ20379@nataraja> References: <20190320101353.GU20379@nataraja> <334c024b-1416-4554-14ca-fbfc452a6a84@sysmocom.de> <20190320164351.GZ20379@nataraja> Message-ID: <38457e99-df6e-e67d-c929-531f818b31fd@sysmocom.de> Hi, On 3/20/19 5:43 PM, Harald Welte wrote: > Hi Pau, > > On Wed, Mar 20, 2019 at 01:04:04PM +0100, Pau Espin Pedrol wrote: >>> 1) initialization of the various sub-systems is too complex, there are too >>> many functions an application has to call. I would like to move more >>> to a global "application initialization", where an application registers >>> some large struct [of structs, ...] at start-up and tells the library >>> the log configuration, the copyright statement, the VTY IP/port, the config >>> file name, ... (some of those can of course be NULL and hence not used) > > One addition here is also vty initialization. It sucks that everyone has to manually > install the VTY commands for talloc, etc. - that should just happen automagically > in some way. Fine with that. Anyway that's done at initialization, before multiple threads/workers are started, so no issue there with multithread. > Another topic that I forgot to list is signal handling. I would think it makes > sense to move the SIGUSR1/SIGUSR2 handling into libosmocore now, particularly > as we have the root talloc context[s] in libosmocore with my related patches. > It just sucks to have to have a SIGUSR1->talloc_dump code snippet in each and > every of our programs. Please don't. 
Let's allow apps decide how they handle user-defined signals and not try doing everything in a generic library. Some app using libosmocore may want to do something else under SIGUSR1/SIGUSR2. I really prefer having this kind of stuff being done in the app and not in the library. If still you want to do it, at least document clearly what libosmocore does and how to let apps overwrite the signals (so it becomes a "public" feature and not some hidden-implementation requirement). > >> Turn state into thread-local storage makes sense. No need for different sets >> per thread imho. Loosely related: It may be good to refactor code to allow >> for other polling systems (like epoll()), which may be more efficient if we >> have a long set of file descriptors but only a few are triggered every loop >> step. > > That's a good point, indeed. It's been on the wishlist for a long time, but nobody > appeared to have been reporting any performance issues as of yet. Ideally we > wouldn't have to change our 'struct osmo_fd' at all but simply call a > different "osmo_select_main()" function which would then call epoll instead of select. Maybe is as easy as having a compile flag which builds osmo_select_main against epoll() if existent, and use select() as fallback. > >> Regarding some other thread you wrote recently, I don't see much issue in >> having VTY library code not supporting multithread,since you can always >> force polling it from same thread and then if your process is multithread >> you take care of message passing between threads yourself in the VTY >> app-specific code. > > I agree we shouldn't tackle this problem now. However, we have many sub-systems > that e.g. add a "show ..." VTY command. That command iterates over some dynamically > created structures, like e.g. subscriber connections. 
So if we were in a multi-threaded > environment where external events/messages were processed in other threads than > the main thread that runs the VTY, then we can no longer simply iterate over any > of those lists, but we'd have to have mutexes/rwlocks/RCU/... to synchronize. > > That's out of scope for now, as we don't have any big plans for multi-threading > in a big way just yet. We can look at it if we ever get to that point. Sure, I'm not expecting multi-threaded for current apps so far (perhaps osmo-mgw in the future?). IF we add multi-thread to those we'll need this kind of changes. I was howerver thinking about new apps, which will already have this kind of tools/architecture in place since time 0. >> For logging code we may want to add some callback which provides the >> application with some way to lock/unlock a mutex or something similar. If cb >> is NULL, then there's no performance penalty during logging. > > Not sure I understand you here. The problem starts if we use multiple printf() > calls to write a signle line, especially with LOGPC/DEBUGPC > > This introduces problems if multiple threads writing to the same log output > then garble / intersperse each others output at points that are not the end of > a lot line. > > Especially with the new _buf() and even the _c() variants of our various stringify > functions it should be possible to convert all callers of log functions to hand > in a full line in one call and remove the LOGPC/DEBUGPC for continuation. > > Once we reach that point, at least log stdout/stderr/file will be ok, as glibc > guarantees that a single write/printf is "atomic". Not sure about our VTY > log backend, though. > Indeed, I was thinking about LOGPC/DEBUGPC and other backends like VTY here. 
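For illustration, the "hand in a full line in one call" idea quoted above could be sketched like this: collect all fragments that would otherwise be separate LOGPC/DEBUGPC continuations in a buffer, then emit the finished line with a single stdio call. struct log_line and the log_line_*() helpers are hypothetical names invented for this example, not libosmocore API, and the 256 byte buffer is an arbitrary assumption. This covers only the stdout/stderr/file case; a VTY backend would still need its own answer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: collects one complete log line before it is written. */
struct log_line {
	char buf[256];
	size_t len;
};

void log_line_init(struct log_line *l)
{
	l->len = 0;
	l->buf[0] = '\0';
}

/* Append formatted text; silently truncates once the buffer is full. */
void log_line_append(struct log_line *l, const char *fmt, ...)
{
	va_list ap;
	size_t room;
	int n;

	assert(l->len < sizeof(l->buf));
	room = sizeof(l->buf) - l->len; /* always >= 1, space for the NUL */

	va_start(ap, fmt);
	n = vsnprintf(l->buf + l->len, room, fmt, ap);
	va_end(ap);

	if (n < 0)
		return;
	l->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

/* Hand the finished line to stdio in one call instead of one per fragment.
 * Returns 0 on success, -1 if not everything could be written. */
int log_line_emit(const struct log_line *l, FILE *out)
{
	return fwrite(l->buf, 1, l->len, out) == l->len ? 0 : -1;
}
```

A caller would do log_line_init(), any number of log_line_append() calls, and then one log_line_emit(&line, stderr), so that two threads logging at the same time can only interleave whole lines, relying on the single-call property mentioned above.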
Regards, Pau -- - Pau Espin Pedrol http://www.sysmocom.de/ ======================================================================= * sysmocom - systems for mobile communications GmbH * Alt-Moabit 93 * 10559 Berlin, Germany * Sitz / Registered office: Berlin, HRB 134158 B * Geschaeftsfuehrer / Managing Director: Harald Welte From nhofmeyr at sysmocom.de Wed Mar 20 18:01:48 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Wed, 20 Mar 2019 19:01:48 +0100 Subject: RFC: talloc contexts / automatic free at select loop In-Reply-To: <20190320095246.GT20379@nataraja> References: <20190320095246.GT20379@nataraja> Message-ID: <20190320180148.GC27744@my.box> On Wed, Mar 20, 2019 at 10:52:46AM +0100, Harald Welte wrote: > To Neels' disappointment, this is not all automatic. Flabbergasted is the word. I thought things would get easier instead of more convoluted and bloated. To me clearly the drawbacks of making it explicit are humungous and cumbersome while avoiding fixing hidden bugs plus keeping all static buffers around. Making it implicit has only positive effects AFAICT. My bottom line is that I absolutely full on don't understand why we would volunarily burden ourselves with this huge patch writing overhead and code bloat, now and into the future. We could so trivially collapse all of it. My single question is, why. What is the reason for accepting this huge tail? (? ??_??)? -- why? If you're interested, here are my opinions in detail... > a) have to call the _c suffix of the respective function, e.g. osmo_hexdump_c() > instead of osmo_hexdump() with one extra initial argument: > OTC_SELECT. There's also msgb_alloc_c(), msgb_copy_c() and the like, > allowing msgb allocation from a context of your choice, as opposed > to the library-internal msgb_tall_ctx that we had so far. 
These are the problems I have with this: - complexity: constantly have to take care to choose a talloc context, mostly burdens writing log statements: - single-threaded program: Is there a global pointer to a main() volatile talloc context we can always use? Ok, OTC_SELECT is passed everywhere. - multi-threaded program: Is OTC_SELECT implicitly distinct per thread? Then multi-threaded programs pass OTC_SELECT everywhere. - So everyone always passes OTC_SELECT everywhere, then what's the use of the ctx argument. In the super rare cases where something like osmo_plmn_name() wants to allocate from a different context, I would just do a talloc_strcpy() to the other context. - coding style: - bloat: we need to pass the same ctx argument to all those name() functions. That makes it longer, harder to read, more to type. With the implicit solution: LOGP(DMSC, LOGL_ERROR, "Peer's PLMN has changed: was %s, now is %s. MSC's PLMN is %s\n", osmo_plmn_name(&parsed_plmn), osmo_plmn_name(&vsub->cgi.lai.plmn), osmo_plmn_name(&net->plmn)); With the explicit solution: LOGP(DMSC, LOGL_ERROR, "Peer's PLMN has changed: was %s, now is %s. MSC's PLMN is %s\n", osmo_plmn_name_c(OTC_SELECT, &parsed_plmn), osmo_plmn_name_c(OTC_SELECT, &vsub->cgi.lai.plmn), osmo_plmn_name_c(OTC_SELECT, &net->plmn)); Would irritate me a lot. So I would usually just use the old ones instead. - complexity: then we are again in the situation where, when writing new code, I have to consciously decide where to allocate the strings from. That's something I wanted to get rid of. It would be so relieving to just know: "osmo_plmn_name() is safe" and just spam on using it without extra args, without the first call being osmo_plmn_name() and the second call osmo_plmn_name_c(OTC_SELECT, plmn), and then oh damn, I forgot that the third one also has to be _c, or some indirectly called function also fails to use the _c variant... 
If there is just one osmo_plmn_name() I cannot possibly do anything wrong, no hidden bugs anywhere, by design. - less arguments, less signatures: The reason why we have value_string[] shortcuts like osmo_gsup_message_type_name() is so that we can save an extra argument (the names[] array) when writing log statements. We didn't want to add so many bin symbols to the API just for that, so we agreed on using static inline functions in the .h files. Fair enough. So each value_string[] definition currently has a tail of such a static inline function attached. With this proposal we're re-adding a second static inline _c variant to each and every value_string[] tail, with another function arg again added to the _c function signature. We will also need a get_value_string_c() function because for unknown values it uses a static buffer to return a number string. We can so trivially avoid all of this mess. - less bugs: get_value_string() is a prime example for wanting to fix things implicitly. If a log statement uses two get_value_string() based implementations, even for completely different names, then it would print nonsense if both hit an undefined value. Pseudo code example: LOGP(DRSL, LOGL_ERROR, "Unknown message: proto %s message %s\n", proto_name(h->proto), msg_type_name(h->msg_type)); If there is both an unkown proto 42 and msg_type 23, then only one of the "unknown 42" result strings from get_value_string() survives. It would say "Unknown message: proto unknown (42) message unknown (42)" These are hidden bugs which we would completely eradicate by making distinct buffer use implicit. It is not always obvious which functions use get_value_string() in their implementation. So we should use _c functions always. So then why even have them as distinct signatures. - static buffers: We have quite a number of static char[] around the code base, my idea was that we could get rid of those. With this proposal we have to keep them. 
- API bloat: - This adds a lot of new API signatures to the existing ones. Both static inline for value_string[], and bin symbols in the library files for things like osmo_plmn_name_c(). - Each future function needs a _c variant added to it. We may forget them, then more discussions on gerrit. More patches to fix that. I would rather avoid that by design. I don't understand why you want to do it this way. What is the justification for all this bloat, required patches now and future complexity in writing code? > b) have to use osmo_select_main_ctx() instead of osmo_select_main(). This is > an arbitrary limitation for optimization that Pau requested, to make sure > applications that don't want this can avoid all the extra talloc+free at > every select iteration. IMO that is the wrong place to hinge the argument. - If nothing used the volatile context, nothing needs to be freed. - Applications wanting to avoid dynamic allocations can avoid calling functions that allocate dynamically. Admitted, with my under-the-hood proposal even functions like osmo_plmn_name() would do dynamic allocations, but: - In the real world, I don't know why you would want to avoid dynamic allocations. AFAICT the performance impact is basically non-existent. How else do we get away with dynamically allocating *all* of our msgb and even copying them around quite often? We also dynamically allocate user contexts, transaction structs, basically all llist entries, strings that objects reference,... - And finally, if dynamic allocations are identified as a problem, then there are ways to avoid them: keep N "volatile" buffers around, re-use them across osmo_select() iterations without freeing in-between. A dedicated function to return volatile buffers could take care of that. I mean instead of passing a specific ctx, rather call a separate function osmo_volatile_buffer(size), so we can slip in an optimisation like that at any point in the future. 
- Finally finally, optimisations can be done by talloc, the kernel and whatnot, and I'm fairly sure there is a lot of thought going into memory management and optimising dynamic allocations so that applications can completely skip this argument, which I propose we do. - And triple finally, switch off debug logging to completely cut out almost all allocations for logging. If the performance argument holds, the improvement should be dramatic. So far a dramatic improvement is seen when switching off debug logging for low-level logging categories on a sysmobts, for osmo-bts and osmo-pcu, but that isn't related to dynamic allocations, much rather to plain fprintf() I/O -- which is much more perf critical than de/allocating. > This is debatable, and we can make it automatic > in osmo_select_main(). +1 > It's probably worth a benchmark how expensive that 'empty' allocation + free is. If some (inline?) osmo_volatile_buffer() function creates such a talloc context only on demand, then I don't see how this can possibly have any impact at all. > However, I think that's a rather "OK" price to pay. Converting the existing > callers can be more or less done with "sed" (yes, spatch is of course better). ugh really; I still argue that the API is then bloated, and more importantly writing new code becomes more complex and more bloated as well. I don't really want to live in that API. > While introducing this feature, I also tried to address two other topics, which > are not strictly related. However, ignoring those two now would mean > we'd have API/ABI breaks if we'd care about them in the future. Hence I > think we should resolve them right away: > > 1) introduce another context: OTC_GLOBAL. This is basically a library-internal > replacement for the "g_tall_ctx" that we typically generate as one of the > first things in every application. You can use it anywhere where you'd want > to allocate something from the global talloc context Nice.
Shouldn't the name start with OSMO_ though? > 2) make those contexts thread-local. That's pretty cool. Ok, so we already have distinct volatile talloc contexts per thread, implicitly. Then why would you still want to pass this context to each function call explicitly? [[ Trying to read up on __thread, I noticed there are portability / C version considerations. This macro, if we remove the windows bit, looks interesting: https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably i.e.

#ifndef thread_local
# if __STDC_VERSION__ >= 201112 && !defined __STDC_NO_THREADS__
#  define thread_local _Thread_local
# elif defined __GNUC__ || \
       defined __SUNPRO_C || \
       defined __xlC__
#  define thread_local __thread
# else
#  error "Cannot define thread_local"
# endif
#endif

? ]] ~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Thu Mar 21 01:24:30 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Thu, 21 Mar 2019 02:24:30 +0100 Subject: RFC: talloc contexts / automatic free at select loop In-Reply-To: <422a9156-6675-bf82-53b2-194f05be7ca2@sysmocom.de> References: <20190320095246.GT20379@nataraja> <20190320110616.GV20379@nataraja> <422a9156-6675-bf82-53b2-194f05be7ca2@sysmocom.de> Message-ID: <20190321012430.GD27744@my.box> Talking of features, another idea comes to mind: freeing of FSM instances. I more often than not have the problem that a child FSM is freeing, telling the parent about it, then the parent decides to free because the child's result signalled some end, and then we have a double free because the child was allocated with the parent as talloc context, the parent has just deallocated, but after the child's cleanup is through, the fsm code wants to also deallocate the child.
In these situations I so far need to introduce checks that prevent the parent from freeing until the children are completely finished with their actions -- for example, avoid signalling end from the cleanup function or event handling yet, instead completely rely on the parent_term_event, which is dispatched only after the child was freed. With a volatile context, when an FSM instance should go away I could reparent it to the volatile context, it will be cleaned up once all event handling is through, and I avoid running into double free by accident. It would probably have to be integrated in the fsm.c deallocation code. On Wed, Mar 20, 2019 at 12:54:11PM +0100, Pau Espin Pedrol wrote: > But I have the feeling it really adds unneeded extra complexity and > scenarios to take care in the code. On the contrary, it might require some thinking about contexts once at a few pivotal points -- AFAICT it only becomes complex with wqueues, and I have the impression we could fairly easily handle this only once in the wqueue API (reparent when adding to or removing from a wqueue) so that callers are actually going to be relieved of all thinking about msgb scoping. The benefit on the other hand: I am quite sure that we still have a whole score of mem leaks / use after free bugs in lots of error handling code paths, which no-one has noticed and no-one is fixing. With a volatile talloc context approach we can swat *all* of those bugs by design. We trade a limited amount of brain work for, theoretically, an infinite amount of hidden bugs getting fixed. > IMHO we should be fine using regular global context (and freeing stuff > around) together with static/stack buffers when possible. These static buffers have so many various ways of shooting yourself in the foot, it is very desirable to get rid of them. The idea for me is to achieve less complexity in using the API. A volatile context is the means to that. ...are we talking about a libosmocore 2.0 now?
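To make the static-buffer hazard and the volatile-buffer idea above concrete, here is a minimal, self-contained C sketch. It is not libosmocore code: name_static(), name_volatile(), volatile_buffer() and volatile_free_all() are hypothetical names invented for illustration, and plain malloc() stands in for a talloc context. name_static() imitates a helper that formats unknown values into one shared static buffer; volatile_buffer() hands out a fresh buffer on every call, and volatile_free_all() models what a select loop could do once per iteration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a get_value_string()-style helper: every
 * unknown value is formatted into ONE shared static buffer, so a later
 * call overwrites the result of an earlier one. */
static const char *name_static(int val)
{
	static char buf[32];
	snprintf(buf, sizeof(buf), "unknown (%d)", val);
	return buf;
}

/* Hypothetical "volatile" pool: a singly linked list of chunks that all
 * get released together. malloc() is an assumption made to keep the
 * sketch self-contained; the thread discusses talloc for this. */
struct vol_chunk {
	struct vol_chunk *next;
	char data[];
};

static struct vol_chunk *vol_head;
static unsigned int vol_count;

/* Return a fresh buffer that stays valid until volatile_free_all(). */
static char *volatile_buffer(size_t size)
{
	struct vol_chunk *c = malloc(sizeof(*c) + size);
	assert(c != NULL);
	c->next = vol_head;
	vol_head = c;
	vol_count++;
	return c->data;
}

/* Release every buffer handed out so far, e.g. once per loop iteration. */
static void volatile_free_all(void)
{
	while (vol_head) {
		struct vol_chunk *next = vol_head->next;
		free(vol_head);
		vol_head = next;
	}
	vol_count = 0;
}

/* Same formatting as name_static(), but each call gets its own buffer. */
static const char *name_volatile(int val)
{
	char *buf = volatile_buffer(32);
	snprintf(buf, 32, "unknown (%d)", val);
	return buf;
}
```

Calling name_static(42) and then name_static(23) for one log statement yields the same pointer twice, so both %s arguments print the same text. The name_volatile() results stay distinct until volatile_free_all() runs, which this sketch assumes happens at the end of each select iteration.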
~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Thu Mar 21 01:36:08 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Thu, 21 Mar 2019 02:36:08 +0100 Subject: GSUP routing for different kinds of entities In-Reply-To: <20190317230004.GA25662@my.box> References: <20190317230004.GA25662@my.box> Message-ID: <20190321013608.GE27744@my.box> With the lack of comments, I am implementing and using this, explicitly naming the entities that are sending and receiving GSUP messages in GSUP IEs. Still would like to get input on naming: For SMS, we have the SMSC and ESME entities. Are there similar terms for USSD? Which is the entity managing USSD dialogs? Which is the entity sending the USSD messages? Is it EUSE <-> MSC? Thinking, in fact if our osmo-msc wasn't siamese twins with the SMSC we would have ESME <-> SMSC <-> MSC. And it might make sense to explicitly state which MSC subsystem is being addressed, so rather: ESME <-> SMSC <-> MSC-SMS (currently only ESME <-> SMSC, the final SMSC <-> MSC-SMS step is implemented inside current osmo-msc) EUSE <-> MSC-USSD Does that sound about right? Thanks! ~N -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Thu Mar 21 01:46:50 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Thu, 21 Mar 2019 02:46:50 +0100 Subject: libosmocore wishlist In-Reply-To: <20190320164351.GZ20379@nataraja> References: <20190320101353.GU20379@nataraja> <334c024b-1416-4554-14ca-fbfc452a6a84@sysmocom.de> <20190320164351.GZ20379@nataraja> Message-ID: <20190321014650.GF27744@my.box> On Wed, Mar 20, 2019 at 05:43:51PM +0100, Harald Welte wrote: > it should be possible to convert all callers of log functions to hand > in a full line in one call and remove the LOGPC/DEBUGPC for continuation. +1 for removing LOGPC. I already avoid using them, and write patches to remove them, because 1) they end up in messy multiple GSMTAP-log packets, 2) they check each time anew whether a log category is enabled and invoke the log system N times for the same line, 3) they often cause buggy logging, by forgetting an \n or by other logging happening in between in some cases. ~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Thu Mar 21 01:51:21 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Thu, 21 Mar 2019 02:51:21 +0100 Subject: libosmocore wishlist In-Reply-To: <38457e99-df6e-e67d-c929-531f818b31fd@sysmocom.de> References: <20190320101353.GU20379@nataraja> <334c024b-1416-4554-14ca-fbfc452a6a84@sysmocom.de> <20190320164351.GZ20379@nataraja> <38457e99-df6e-e67d-c929-531f818b31fd@sysmocom.de> Message-ID: <20190321015121.GG27744@my.box> On Wed, Mar 20, 2019 at 06:12:30PM +0100, Pau Espin Pedrol wrote: > > Another topic that I forgot to list is signal handling.
I would think it makes > > sense to move the SIGUSR1/SIGUSR2 handling into libosmocore now, particularly > > as we have the root talloc context[s] in libosmocore with my related patches. > > It just sucks to have to have a SIGUSR1->talloc_dump code snippet in each and > > every of our programs. > > Please don't. Let's allow apps decide how they handle user-defined signals > and not try doing everything in a generic library. I think it's better to avoid the code dup in each application, have well-defined standard handlers once. Thinking in terms of one global libosmocore subsystem config struct, this could be switch-off-able. If it were implicit any application can easily put other signal handlers in place after the fact. ~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From laforge at gnumonks.org Thu Mar 21 08:30:28 2019 From: laforge at gnumonks.org (Harald Welte) Date: Thu, 21 Mar 2019 09:30:28 +0100 Subject: RFC: talloc contexts / automatic free at select loop In-Reply-To: <20190321012430.GD27744@my.box> References: <20190320095246.GT20379@nataraja> <20190320110616.GV20379@nataraja> <422a9156-6675-bf82-53b2-194f05be7ca2@sysmocom.de> <20190321012430.GD27744@my.box> Message-ID: <20190321083028.GI20379@nataraja> Hi Neels, On Thu, Mar 21, 2019 at 02:24:30AM +0100, Neels Hofmeyr wrote: > Talking of features, another idea comes to mind: freeing of FSM instances. I > more often than not have the problem that a child FSM is freeing, telling the > parent about it, then the parent decides to free because the child's result > signalled some end, and then we have a double free because the child was > allocated with the parent as talloc context, the parent has just deallocated, > but after the child's cleanup is through, the fsm code wants to also deallocate > the child. That just sounds like a bug and it should be fixed? 
The question is whether we want to mandate all child contexts to be allocated from within a parent. If so, then whatever mechanism to prevent this could / should become part of the fsm core. > In these situations I so far need to introduce checks that prevent > the parent from freeing until the children are completely finished with their > actions -- for example, avoid signalling end from the cleanup function or event > handling yet, instead completely rely on the parent_term_event, which is > dispatched only after the child was freed. I wonder if those checks could be made part of the core. Do you have a concrete code example to look at? > With a volatile context, when an FSM instance should go away I could > reparent it to the volatile context, it will be cleaned up once all > event handling is through, and I avoid running into double free by > accident. It would probably have to be integrated in the fsm.c > deallocation code. That's of course also an option. It actually reminds me a bit of RCU (read-copy-update) in the Linux kernel, where a similar 'delay any actual free() until all possible users have moved beyond a given scheduling point' approach is used as part of a rather ingenious and complex lock-less synchronization mechanism. > ...are we talking about a libosmocore 2.0 now? I think the volatile context alone is not sufficient for such a large jump. But let's see where we end up. -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch.
A6) From axilirator at gmail.com Thu Mar 21 12:45:04 2019 From: axilirator at gmail.com (Vadim Yanitskiy) Date: Thu, 21 Mar 2019 19:45:04 +0700 Subject: libosmocore wishlist In-Reply-To: <20190321015121.GG27744@my.box> References: <20190320101353.GU20379@nataraja> <334c024b-1416-4554-14ca-fbfc452a6a84@sysmocom.de> <20190320164351.GZ20379@nataraja> <38457e99-df6e-e67d-c929-531f818b31fd@sysmocom.de> <20190321015121.GG27744@my.box> Message-ID: Hi all, > Thinking in terms of one global libosmocore subsystem config struct, > this could be switch-off-able. FYI, we already have something similar in layer23 applications of OsmocomBB: https://git.osmocom.org/osmocom-bb/tree/src/host/layer23/src/misc/app_cell_log.c#n235 https://git.osmocom.org/osmocom-bb/tree/src/host/layer23/src/misc/app_cell_log.c#n89 With best regards, Vadim Yanitskiy. From osmith at sysmocom.de Fri Mar 22 09:20:30 2019 From: osmith at sysmocom.de (Oliver Smith) Date: Fri, 22 Mar 2019 10:20:30 +0100 Subject: "signal.c: Make non-exported tall_sigh_ctx static" broke openbsc Message-ID: Hello, openbsc.git doesn't build anymore against libosmocore master: > ../../src/libcommon/libcommon.a(talloc_ctx.o): In function `talloc_ctx_init': > /build/openbsc/src/libcommon/talloc_ctx.c:50: undefined reference to `tall_sigh_ctx' Full build log: https://jenkins.osmocom.org/jenkins/job/master-openbsc/IU=--disable-iu,MGCP=--disable-mgcp-transcoding,SMPP=--enable-smpp,a4=default,label=osmocom-master-debian9/4131/ Looks like this patch broke it: https://gerrit.osmocom.org/#/c/libosmocore/+/13337/ Regards, Oliver -- - Oliver Smith https://www.sysmocom.de/ ======================================================================= * sysmocom - systems for mobile communications GmbH * Alt-Moabit 93 * 10559 Berlin, Germany * Sitz / Registered office: Berlin, HRB 134158 B * Geschaeftsfuehrer / Managing Director: Harald Welte From osmith at sysmocom.de Fri Mar 22 12:29:00 2019 From: osmith at sysmocom.de (Oliver Smith) Date: Fri, 22 
Mar 2019 13:29:00 +0100 Subject: "signal.c: Make non-exported tall_sigh_ctx static" broke openbsc In-Reply-To: References: Message-ID: <5bf0e155-05cd-1a97-54d3-b6542cbfd62d@sysmocom.de> Revert submitted: https://gerrit.osmocom.org/#/c/libosmocore/+/13385/ On 3/22/19 10:20 AM, Oliver Smith wrote: > Hello, > > openbsc.git doesn't build anymore against libosmocore master: > >> ../../src/libcommon/libcommon.a(talloc_ctx.o): In function `talloc_ctx_init': >> /build/openbsc/src/libcommon/talloc_ctx.c:50: undefined reference to `tall_sigh_ctx' > > Full build log: > https://jenkins.osmocom.org/jenkins/job/master-openbsc/IU=--disable-iu,MGCP=--disable-mgcp-transcoding,SMPP=--enable-smpp,a4=default,label=osmocom-master-debian9/4131/ > > Looks like this patch broke it: > https://gerrit.osmocom.org/#/c/libosmocore/+/13337/ > > Regards, > Oliver > -- - Oliver Smith https://www.sysmocom.de/ ======================================================================= * sysmocom - systems for mobile communications GmbH * Alt-Moabit 93 * 10559 Berlin, Germany * Sitz / Registered office: Berlin, HRB 134158 B * Geschaeftsfuehrer / Managing Director: Harald Welte From aathomas at fb.com Fri Mar 22 18:31:09 2019 From: aathomas at fb.com (Alex Alwin Thomas) Date: Fri, 22 Mar 2019 18:31:09 +0000 Subject: OSMO SMSc functionality Message-ID: Hi, We had a question with regards to the OSMO MSc - as per the documentation , it does support a scaled down version of SMSC. Can we bring up just the SMSc service so that it can talk to another vendor's MSC/HLR . We are trying to perform CSFB SMS tests with an external MSC / OSMO SMSC and Amarisoft EPC Core . So wanted to check if the OSMO SMSC could be used for this test. Thanks Alex -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From laforge at gnumonks.org Fri Mar 22 21:03:17 2019 From: laforge at gnumonks.org (Harald Welte) Date: Fri, 22 Mar 2019 22:03:17 +0100 Subject: OSMO SMSc functionality In-Reply-To: References: Message-ID: <20190322210317.GC20379@nataraja> Hi Alex, On Fri, Mar 22, 2019 at 06:31:09PM +0000, Alex Alwin Thomas wrote: > We had a question with regards to the OSMO MSc - as per the > documentation , it does support a scaled down version of SMSC. Yes. > Can we bring up just the SMSc service so that it can talk to another > vendor's MSC/HLR . No, this is not possible. The OsmoMSC internal SMSC functionality doesn't implement any external/standard interfaces. > We are trying to perform CSFB SMS tests with an external MSC / OSMO > SMSC and Amarisoft EPC Core . So wanted to check if the OSMO SMSC > could be used for this test. Unfortunately not. However, you can of course use the Amarisoft SGs to talk to OsmoMSC SGs and support SMS services this way. You'd need OsmoMSC and OsmoHLR on the Osmocom side. You would not have any interworking with external HLRs. As there is no DIAMETER support in OsmoHLR, you'd have to manually ensure that the same subscriber information (IMSI/K/OPc/...) is present in the Amarisoft HSS and the OsmoHLR. Having a GSUP-to-DIAMETER gateway is on our wishlist, exactly to support scenarios like this: Use some 3rd party vendor EPC for 4G in parallel with the Osmocom stack for 2G/3G, both accessing one shared subscriber database. However, unfortunately nobody has yet been able to dedicate any resources to this task. Regards, Harald -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch.
A6) From aathomas at fb.com Fri Mar 22 21:31:14 2019 From: aathomas at fb.com (Alex Alwin Thomas) Date: Fri, 22 Mar 2019 21:31:14 +0000 Subject: OSMO SMSc functionality In-Reply-To: <20190322210317.GC20379@nataraja> References: <20190322210317.GC20379@nataraja> Message-ID: Thanks Harald for the detailed response. Unfortunately the testing scenario we want to try is to have a separate MSC instance from another vendor, hence we cannot use the Osmo MSC. We have already tested the SG interworking between Amarisoft EPC and Osmo MSC, both MO/MT CSFB SMS was working fine. -----Original Message----- From: Harald Welte Sent: Friday, March 22, 2019 2:03 PM To: Alex Alwin Thomas Cc: openbsc at lists.osmocom.org; Saumya Raval Subject: Re: OSMO SMSc functionality Hi Alex, On Fri, Mar 22, 2019 at 06:31:09PM +0000, Alex Alwin Thomas wrote: > We had a question with regards to the OSMO MSc - as per the > documentation , it does support a scaled down version of SMSC. Yes. > Can we bring up just the SMSc service so that it can talk to another > vendor's MSC/HLR . No, this is not possible. The OsmoMSC internal SMSC functionality doesn't implement any external/standard interfaces. > We are trying to perform CSFB SMS tests with an external MSC / OSMO > SMSC and Amarisoft EPC Core . So wanted to check if the OSMO SMSC > could be used for this test. Unfortunately not. However, you can of course use the Amarisoft SGs to talk to OsmoMSC SGs and support SMS services this way. You'd need OsmoMSC and OsmoHLR on the Osmocom side. You would not have any interworking with external HLRs. As there is no DIAMETER support in OsmoHLR, you'd have to manually ensure that the same subscriber information (IMSI/K/OPc/...) is present in the Amarisoft HSS and the OsmoHLR. Having a GSUP-to-DIAMETER gateway is on our wishlist, exactly to support scenarios like this: Use some 3rd party vendor EPC for 4G in parallel with the Osmocom stack for 2G/3G, both accessing one shared subscriber database.
However, unfortunately nobody has yet been able to dedicate any resources to this task. Regards, Harald -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6) From laforge at gnumonks.org Sat Mar 23 06:14:03 2019 From: laforge at gnumonks.org (Harald Welte) Date: Sat, 23 Mar 2019 07:14:03 +0100 Subject: OSMO SMSc functionality In-Reply-To: References: <20190322210317.GC20379@nataraja> Message-ID: <20190323061403.GE20379@nataraja> Hi Alex, On Fri, Mar 22, 2019 at 09:31:14PM +0000, Alex Alwin Thomas wrote: > Unfortunately the testing scenario we want to try is to have a separate MSC instance from another vendor , hence we cannot use the Osmo MSC . I see. In terms of open source options, I think it would boil down to writing some SMSC emulation using either: a) the TCAP/MAP stack Holger wrote in Smalltalk, b) the TCAP/MAP stack of Mobicents (Java, I have no experience with it), c) the signerl Erlang TCAP/MAP stack of Vance Shipley and myself (not so production ready but should be sufficient for some tests), d) TCAP + MAP from within TTCN-3 / Eclipse TITAN. I don't know much about option 'b'. I would argue that 'a' or 'c' are the more likely candidates. 'd' seems cumbersome as the TCAP state machines are not yet implemented. However, in any of the above, I would expect quite a bit of development effort is required to get to the goal of emulating the SMSC (I'm not sure if one man-week would be sufficient). > We have already tested the SG interworking between Amarisoft EPC and Osmo MSC,both MO/MT CSFB SMS was working fine. This is great.
We're just about to set this up at the sysmocom office for interop testing ourselves. Regards, Harald -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6) From nhofmeyr at sysmocom.de Sat Mar 23 23:43:21 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Sun, 24 Mar 2019 00:43:21 +0100 Subject: RFC: talloc contexts / automatic free at select loop In-Reply-To: <20190321083028.GI20379@nataraja> References: <20190320095246.GT20379@nataraja> <20190320110616.GV20379@nataraja> <422a9156-6675-bf82-53b2-194f05be7ca2@sysmocom.de> <20190321012430.GD27744@my.box> <20190321083028.GI20379@nataraja> Message-ID: <20190323234321.GA9976@my.box> On Thu, Mar 21, 2019 at 09:30:28AM +0100, Harald Welte wrote: > Hi Neels, > > On Thu, Mar 21, 2019 at 02:24:30AM +0100, Neels Hofmeyr wrote: > > Talking of features, another idea comes to mind: freeing of FSM instances. I > > more often than not have the problem that a child FSM is freeing, telling the > > parent about it, then the parent decides to free because the child's result > > signalled some end, and then we have a double free because the child was > > allocated with the parent as talloc context, the parent has just deallocated, > > but after the child's cleanup is through, the fsm code wants to also deallocate > > the child. > > That just sounds like a bug and it should be fixed? The question is whether > we want to mandate all child contexts to be allocated from within a parent. If > so, then whatever mechanism to prevent this could / should become part > of the fsm core. 
> > > In these situations I so far need to introduce checks that prevent > > the parent from freeing until the children are completely finished with their > > actions -- for example, avoid signalling end from the cleanup function or event > > handling yet, instead completely rely on the parent_term_event, which is > > dispatched only after the child was freed. > > I wonder if those checks could be made part of the core. Do you have a > concrete code example to look at? Just after writing the above, I have introduced another deallocation failure mode in the cleanup among RTP-stream/call leg/MGW endpoint/MSC subscriber FSMs, and am fed up with solving these ad hoc as they show up. So now I am writing an explicit FSM deallocation test, to figure out how to properly solve various cleanup scenarios -- so that I can hopefully apply one and the same mechanism to all FSM implementations. (This might get a bit complex to follow, a proposed solution is in the bottom.) Here is the test: branch neels/fsm_dealloc_test of libosmocore http://git.osmocom.org/libosmocore/commit/?h=neels/fsm_dealloc_test&id=31ed1036cee1ed4e0ccf12e83e2bc0918af8db47 The "scene" struct shows a number of objects, but have reduced to only three being in use. branch0 <-parent/child-> twig0a \ / ----- other ------- So both a parent and child FSM reference some "other" object which they depend upon. (A real world equivalent is that both a call_leg and rtp_stream FSM depend on the MGW endpoint -- in which situation also all of those are children of an MSC subscriber FSM, just saying complexity is easily added in real life) The underlying problem here is: - when the child has an error and goes away, the parent should go away as well. So there is an event from the child to the parent saying "I am gone, you too go now": EV_CHILD_GONE. - if now I tell branch0 to deallocate regularly, it first tells the child to deallocate. The child then also says EV_CHILD_GONE. 
- The parent receives EV_CHILD_GONE and again triggers its own deallocation. Now osmo_fsm_inst_term() has been called twice on the same FSM instance. A way to fix this is to introduce a ST_DESTROYING state. It acts as a flag to prevent re-triggering an osmo_fsm_inst_term(). So as soon as the instance is in ST_DESTROYING, it will no longer issue osmo_fsm_inst_term() on itself. My test models this structure. In the test code I try in turn to deallocate with an EV_DESTROY event, and then with a direct osmo_fsm_inst_term(). Both of them should ideally be safe to do, because on weird errors anything should be able to just term an FSM instance. Dispatching an EV_DESTROY actually works out fine. But calling osmo_fsm_inst_term() directly fails. The difference is that in the EV_DESTROY case, the branch0 FSM has changed to state ST_DESTROYING, and if it receives events that peer objects are gone, it ignores them. In the term case, the FSM instance stays in the "ALIVE" state, in which case a term triggered at the child twig0a in turn dispatches an EV_CHILD_GONE, which the parent branch0 acts upon by transitioning to the ST_DESTROYING state, which launches a *second* osmo_fsm_inst_term() on itself. The output looks like this -- first the successful run, then the failing one:
./fsm_dealloc_test DLGLOBAL DEBUG scene_alloc() DLGLOBAL DEBUG test(branch0){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: is child of test(branch0) DLGLOBAL DEBUG test(other){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: branch0.other[0] = other DLGLOBAL DEBUG test(other){alive}: other.other[0] = branch0 DLGLOBAL DEBUG test(twig0a){alive}: twig0a.other[0] = other DLGLOBAL DEBUG test(other){alive}: other.other[1] = twig0a DLGLOBAL DEBUG scene_alloc() DLGLOBAL DEBUG test(branch0){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: is child of test(branch0) DLGLOBAL DEBUG test(other){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: branch0.other[0] = other DLGLOBAL DEBUG test(other){alive}: other.other[0] = branch0 DLGLOBAL DEBUG test(twig0a){alive}: twig0a.other[0] = other DLGLOBAL DEBUG test(other){alive}: other.other[1] = twig0a DLGLOBAL DEBUG scene_alloc() DLGLOBAL DEBUG test(branch0){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: is child of test(branch0) DLGLOBAL DEBUG test(other){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: branch0.other[0] = other DLGLOBAL DEBUG test(other){alive}: other.other[0] = branch0 DLGLOBAL DEBUG test(twig0a){alive}: twig0a.other[0] = other DLGLOBAL DEBUG test(other){alive}: other.other[1] = twig0a DLGLOBAL DEBUG ---------------------- test_destroy(branch0) DLGLOBAL DEBUG --- before destroy cascade, got: branch0 twig0a other DLGLOBAL DEBUG --- DLGLOBAL DEBUG test(branch0){alive}: Received Event EV_DESTROY DLGLOBAL DEBUG 1 (branch0.alive()) DLGLOBAL DEBUG test(branch0){alive}: alive(EV_DESTROY) DLGLOBAL DEBUG test(branch0){alive}: state_chg to destroying DLGLOBAL DEBUG 2 (branch0.alive(),branch0.destroying_onenter()) DLGLOBAL DEBUG test(branch0){destroying}: destroying_onenter() from alive DLGLOBAL DEBUG test(branch0){destroying}: Terminating (cause = 
OSMO_FSM_TERM_REGULAR) DLGLOBAL DEBUG test(branch0){destroying}: pre_term() DLGLOBAL DEBUG test(twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT) DLGLOBAL DEBUG test(twig0a){alive}: pre_term() DLGLOBAL DEBUG test(twig0a){alive}: Removing from parent test(branch0) DLGLOBAL DEBUG 3 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup()) DLGLOBAL DEBUG test(twig0a){alive}: cleanup() DLGLOBAL DEBUG test(twig0a){alive}: scene forgets twig0a DLGLOBAL DEBUG test(twig0a){alive}: removing reference twig0a.other[0] -> other DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE DLGLOBAL DEBUG 4 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive()) DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE) DLGLOBAL DEBUG 5 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive(),other.other_gone()) DLGLOBAL DEBUG test(other){alive}: Dropped reference other.other[1] = twig0a DLGLOBAL DEBUG 4 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive()) DLGLOBAL DEBUG test(other){alive}: state_chg to destroying DLGLOBAL DEBUG 5 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter()) DLGLOBAL DEBUG test(other){destroying}: destroying_onenter() from alive DLGLOBAL DEBUG test(other){destroying}: Terminating (cause = OSMO_FSM_TERM_REGULAR) DLGLOBAL DEBUG test(other){destroying}: pre_term() DLGLOBAL DEBUG 6 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup()) DLGLOBAL DEBUG test(other){destroying}: cleanup() DLGLOBAL DEBUG test(other){destroying}: scene forgets other DLGLOBAL DEBUG test(other){destroying}: removing reference other.other[0] -> branch0 DLGLOBAL DEBUG test(branch0){destroying}: Received Event EV_OTHER_GONE DLGLOBAL DEBUG 7 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branc DLGLOBAL DEBUG 
test(branch0){destroying}: destroying(EV_OTHER_GONE) DLGLOBAL DEBUG 8 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branc DLGLOBAL DEBUG test(branch0){destroying}: Dropped reference branch0.other[0] = other DLGLOBAL DEBUG 7 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branc DLGLOBAL DEBUG 6 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup()) DLGLOBAL DEBUG test(other){destroying}: cleanup() done DLGLOBAL DEBUG 5 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter()) DLGLOBAL DEBUG test(other){destroying}: Freeing instance DLGLOBAL DEBUG test(other){destroying}: Deallocated DLGLOBAL DEBUG 4 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup(),other.alive()) DLGLOBAL DEBUG 3 (branch0.alive(),branch0.destroying_onenter(),twig0a.cleanup()) DLGLOBAL DEBUG test(twig0a){alive}: cleanup() done DLGLOBAL DEBUG 2 (branch0.alive(),branch0.destroying_onenter()) DLGLOBAL DEBUG test(twig0a){alive}: Freeing instance DLGLOBAL DEBUG test(twig0a){alive}: Deallocated DLGLOBAL DEBUG 3 (branch0.alive(),branch0.destroying_onenter(),branch0.cleanup()) DLGLOBAL DEBUG test(branch0){destroying}: cleanup() DLGLOBAL DEBUG test(branch0){destroying}: scene forgets branch0 DLGLOBAL DEBUG test(branch0){destroying}: cleanup() done DLGLOBAL DEBUG 2 (branch0.alive(),branch0.destroying_onenter()) DLGLOBAL DEBUG test(branch0){destroying}: Freeing instance DLGLOBAL DEBUG test(branch0){destroying}: Deallocated DLGLOBAL DEBUG 1 (branch0.alive()) DLGLOBAL DEBUG 0 (-) DLGLOBAL DEBUG --- after destroy cascade, still got: <------ SUCCESS! 
DLGLOBAL DEBUG --- cleaning up DLGLOBAL DEBUG scene_alloc() DLGLOBAL DEBUG test(branch0){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: is child of test(branch0) DLGLOBAL DEBUG test(other){alive}: Allocated DLGLOBAL DEBUG test(branch0){alive}: branch0.other[0] = other DLGLOBAL DEBUG test(other){alive}: other.other[0] = branch0 DLGLOBAL DEBUG test(twig0a){alive}: twig0a.other[0] = other DLGLOBAL DEBUG test(other){alive}: other.other[1] = twig0a DLGLOBAL DEBUG ---------------------- test_term(branch0) DLGLOBAL DEBUG --- before term cascade, got: branch0 twig0a other DLGLOBAL DEBUG --- DLGLOBAL DEBUG test(branch0){alive}: Terminating (cause = OSMO_FSM_TERM_REGULAR) <------------ branch0 terminating DLGLOBAL DEBUG test(branch0){alive}: pre_term() DLGLOBAL DEBUG test(twig0a){alive}: Terminating (cause = OSMO_FSM_TERM_PARENT) DLGLOBAL DEBUG test(twig0a){alive}: pre_term() DLGLOBAL DEBUG test(twig0a){alive}: Removing from parent test(branch0) DLGLOBAL DEBUG 1 (twig0a.cleanup()) DLGLOBAL DEBUG test(twig0a){alive}: cleanup() DLGLOBAL DEBUG test(twig0a){alive}: scene forgets twig0a DLGLOBAL DEBUG test(twig0a){alive}: removing reference twig0a.other[0] -> other DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE DLGLOBAL DEBUG 2 (twig0a.cleanup(),other.alive()) DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE) DLGLOBAL DEBUG 3 (twig0a.cleanup(),other.alive(),other.other_gone()) DLGLOBAL DEBUG test(other){alive}: Dropped reference other.other[1] = twig0a DLGLOBAL DEBUG 2 (twig0a.cleanup(),other.alive()) DLGLOBAL DEBUG test(other){alive}: state_chg to destroying DLGLOBAL DEBUG 3 (twig0a.cleanup(),other.alive(),other.destroying_onenter()) DLGLOBAL DEBUG test(other){destroying}: destroying_onenter() from alive DLGLOBAL DEBUG test(other){destroying}: Terminating (cause = OSMO_FSM_TERM_REGULAR) DLGLOBAL DEBUG test(other){destroying}: pre_term() DLGLOBAL DEBUG 4 
(twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup()) DLGLOBAL DEBUG test(other){destroying}: cleanup() DLGLOBAL DEBUG test(other){destroying}: scene forgets other DLGLOBAL DEBUG test(other){destroying}: removing reference other.other[0] -> branch0 DLGLOBAL DEBUG test(branch0){alive}: Received Event EV_OTHER_GONE DLGLOBAL DEBUG 5 (twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0.alive()) DLGLOBAL DEBUG test(branch0){alive}: alive(EV_OTHER_GONE) DLGLOBAL DEBUG 6 (twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0.alive(),branch0.other_gone()) DLGLOBAL DEBUG test(branch0){alive}: Dropped reference branch0.other[0] = other DLGLOBAL DEBUG 5 (twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0.alive()) DLGLOBAL DEBUG test(branch0){alive}: state_chg to destroying DLGLOBAL DEBUG 6 (twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0.alive(),branch0.destroying_onenter()) DLGLOBAL DEBUG test(branch0){destroying}: destroying_onenter() from alive DLGLOBAL DEBUG test(branch0){destroying}: Terminating (cause = OSMO_FSM_TERM_REGULAR) <-------- branch0 terminating AGAIN!?!? 
DLGLOBAL DEBUG test(branch0){destroying}: pre_term() DLGLOBAL DEBUG 7 (twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0.alive(),branch0.destroying_onenter(),branc DLGLOBAL DEBUG test(branch0){destroying}: cleanup() DLGLOBAL DEBUG test(branch0){destroying}: scene forgets branch0 DLGLOBAL DEBUG test(branch0){destroying}: cleanup() done DLGLOBAL DEBUG 6 (twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0.alive(),branch0.destroying_onenter()) DLGLOBAL DEBUG test(branch0){destroying}: Freeing instance DLGLOBAL DEBUG test(branch0){destroying}: Deallocated DLGLOBAL DEBUG 5 (twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0.alive()) DLGLOBAL DEBUG 4 (twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup()) DLGLOBAL DEBUG test(other){destroying}: cleanup() done DLGLOBAL DEBUG 3 (twig0a.cleanup(),other.alive(),other.destroying_onenter()) DLGLOBAL DEBUG test(other){destroying}: Freeing instance DLGLOBAL DEBUG test(other){destroying}: Deallocated ================================================================= ==9774==ERROR: AddressSanitizer: heap-use-after-free on address 0x612000001128 at pc 0x7f74d8b8f249 bp 0x7ffebeffe490 sp 0x7ffebeffe488 WRITE of size 8 at 0x612000001128 thread T0 #0 0x7f74d8b8f248 in __llist_del ../../../src/libosmocore/include/osmocom/core/linuxlist.h:114 #1 0x7f74d8b8f380 in llist_del ../../../src/libosmocore/include/osmocom/core/linuxlist.h:126 #2 0x7f74d8b93eaa in osmo_fsm_inst_free ../../../src/libosmocore/src/fsm.c:404 #3 0x7f74d8b9ba9c in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:738 #4 0x563b17db94b5 in destroying_onenter ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:173 #5 0x7f74d8b987be in state_chg ../../../src/libosmocore/src/fsm.c:521 #6 0x7f74d8b98870 in _osmo_fsm_inst_state_chg ../../../src/libosmocore/src/fsm.c:577 #7 0x563b17db8db8 in alive ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:147 #8 
0x7f74d8b99e2f in _osmo_fsm_inst_dispatch ../../../src/libosmocore/src/fsm.c:685 #9 0x563b17dbabc0 in cleanup ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:251 #10 0x7f74d8b9b292 in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:733 #11 0x7f74d8b9c1b1 in _osmo_fsm_inst_term_children ../../../src/libosmocore/src/fsm.c:784 #12 0x7f74d8b9a85e in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:720 #13 0x563b17dbd09d in obj_term ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:392 #14 0x563b17dbd8fb in test_term ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:425 #15 0x563b17dbdb62 in main ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:453 #16 0x7f74d7db909a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) #17 0x563b17db5319 in _start (/n/s/dev/make/libosmocore/tests/fsm/fsm_dealloc_test+0x10319) Fixing this particular run can be done by manually setting this fi->state = ST_DESTROYING (without osmo_fsm_inst_state_chg(), since that would run the onenter() function and trigger an osmo_fsm_inst_term()). So I did: void pre_term(struct osmo_fsm_inst *fi, enum osmo_fsm_term_cause cause) { + /* sneak into "destroying" state without triggering events */ + fi->state = ST_DESTROYING; LOGPFSML(fi, LOGL_DEBUG, "%s()\n", __func__); } That's not so nice, changing states in a cheating way. A different solution could be some fsm.c internal "already terminating" flag, so I can call osmo_fsm_inst_term() any number of times, and only the first call will have an effect. One might think then the problem is solved? But then I still face this problem: - If the child is sent an EV_DESTROY, it goes into osmo_fsm_inst_term(). - As part of its cleanup, it terminates its reference to the "other" object. - Out of its own free will, the other object then decides that this means some problem appeared and decides to deallocate as well. - But the parent object also has a reference to "other". 
Since "other" is deallocating, it notifies the parent branch0 that it is now gone. - branch0 decides that if "other" has a problem, it is no longer useful and deallocates. - twig0a is *not* signalled to terminate from the parent branch0, because it has already removed itself from the parent at the start of its termination. - But branch0 calls talloc_free() on itself. Since twig0a was a talloc child of branch0, its memory storage is now gone. - All of the above happened while telling "other" that the twig0a is about to go. Now this code path wants to continue after the twig0a.cleanup(). Alas, its memory is no longer valid and the fsm.c termination code causes a use-after-free. Looks like this: DLGLOBAL DEBUG ---------------------- test_destroy(twig0a) DLGLOBAL DEBUG --- before destroy cascade, got: branch0 twig0a other DLGLOBAL DEBUG --- DLGLOBAL DEBUG test(twig0a){alive}: Received Event EV_DESTROY DLGLOBAL DEBUG 1 (twig0a.alive()) DLGLOBAL DEBUG test(twig0a){alive}: alive(EV_DESTROY) DLGLOBAL DEBUG test(twig0a){alive}: state_chg to destroying DLGLOBAL DEBUG 2 (twig0a.alive(),twig0a.destroying_onenter()) DLGLOBAL DEBUG test(twig0a){destroying}: destroying_onenter() from alive DLGLOBAL DEBUG test(twig0a){destroying}: Terminating (cause = OSMO_FSM_TERM_REGULAR) DLGLOBAL DEBUG test(twig0a){destroying}: pre_term() DLGLOBAL DEBUG test(twig0a){destroying}: Removing from parent test(branch0) DLGLOBAL DEBUG 3 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup()) DLGLOBAL DEBUG test(twig0a){destroying}: cleanup() DLGLOBAL DEBUG test(twig0a){destroying}: scene forgets twig0a DLGLOBAL DEBUG test(twig0a){destroying}: removing reference twig0a.other[0] -> other <----- twig0a tells "other" that it is going DLGLOBAL DEBUG test(other){alive}: Received Event EV_OTHER_GONE DLGLOBAL DEBUG 4 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive()) DLGLOBAL DEBUG test(other){alive}: alive(EV_OTHER_GONE) DLGLOBAL DEBUG 5 
(twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.other_gone()) DLGLOBAL DEBUG test(other){alive}: Dropped reference other.other[1] = twig0a DLGLOBAL DEBUG 4 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive()) DLGLOBAL DEBUG test(other){alive}: state_chg to destroying DLGLOBAL DEBUG 5 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter()) DLGLOBAL DEBUG test(other){destroying}: destroying_onenter() from alive DLGLOBAL DEBUG test(other){destroying}: Terminating (cause = OSMO_FSM_TERM_REGULAR) DLGLOBAL DEBUG test(other){destroying}: pre_term() DLGLOBAL DEBUG 6 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup()) DLGLOBAL DEBUG test(other){destroying}: cleanup() DLGLOBAL DEBUG test(other){destroying}: scene forgets other DLGLOBAL DEBUG test(other){destroying}: removing reference other.other[0] -> branch0 DLGLOBAL DEBUG test(branch0){alive}: Received Event EV_OTHER_GONE <----- branch0 gets told "other" is gone DLGLOBAL DEBUG 7 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0 DLGLOBAL DEBUG test(branch0){alive}: alive(EV_OTHER_GONE) DLGLOBAL DEBUG 8 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0 DLGLOBAL DEBUG test(branch0){alive}: Dropped reference branch0.other[0] = other DLGLOBAL DEBUG 7 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0 DLGLOBAL DEBUG test(branch0){alive}: state_chg to destroying DLGLOBAL DEBUG 8 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0 DLGLOBAL DEBUG test(branch0){destroying}: destroying_onenter() from alive DLGLOBAL DEBUG test(branch0){destroying}: Terminating (cause = 
OSMO_FSM_TERM_REGULAR) DLGLOBAL DEBUG test(branch0){destroying}: pre_term() DLGLOBAL DEBUG 9 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0 DLGLOBAL DEBUG test(branch0){destroying}: cleanup() DLGLOBAL DEBUG test(branch0){destroying}: scene forgets branch0 DLGLOBAL DEBUG test(branch0){destroying}: cleanup() done DLGLOBAL DEBUG 8 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0 DLGLOBAL DEBUG test(branch0){destroying}: Freeing instance DLGLOBAL DEBUG test(branch0){destroying}: Deallocated <----- branch0 calls talloc_free() DLGLOBAL DEBUG 7 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup(),branch0 DLGLOBAL DEBUG 6 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter(),other.cleanup()) DLGLOBAL DEBUG test(other){destroying}: cleanup() done DLGLOBAL DEBUG 5 (twig0a.alive(),twig0a.destroying_onenter(),twig0a.cleanup(),other.alive(),other.destroying_onenter()) DLGLOBAL DEBUG test(other){destroying}: Freeing instance DLGLOBAL DEBUG test(other){destroying}: Deallocated <---- twig0a.cleanup() is done, fsm.c wants to continue termination... 
================================================================= ==16426==ERROR: AddressSanitizer: heap-use-after-free on address 0x6120000015a8 at pc 0x7faea5cb4149 bp 0x7fff936ffed0 sp 0x7fff936ffec8 WRITE of size 8 at 0x6120000015a8 thread T0 #0 0x7faea5cb4148 in __llist_del ../../../src/libosmocore/include/osmocom/core/linuxlist.h:114 #1 0x7faea5cb4280 in llist_del ../../../src/libosmocore/include/osmocom/core/linuxlist.h:126 #2 0x7faea5cb8daa in osmo_fsm_inst_free ../../../src/libosmocore/src/fsm.c:404 #3 0x7faea5cc099c in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:738 #4 0x56367132a4c5 in destroying_onenter ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:173 #5 0x7faea5cbd6be in state_chg ../../../src/libosmocore/src/fsm.c:521 #6 0x7faea5cbd770 in _osmo_fsm_inst_state_chg ../../../src/libosmocore/src/fsm.c:577 #7 0x563671329dc8 in alive ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:147 #8 0x7faea5cbed2f in _osmo_fsm_inst_dispatch ../../../src/libosmocore/src/fsm.c:685 #9 0x56367132bbd0 in cleanup ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:251 #10 0x7faea5cc0192 in _osmo_fsm_inst_term ../../../src/libosmocore/src/fsm.c:733 #11 0x56367132a4c5 in destroying_onenter ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:173 #12 0x7faea5cbd6be in state_chg ../../../src/libosmocore/src/fsm.c:521 #13 0x7faea5cbd770 in _osmo_fsm_inst_state_chg ../../../src/libosmocore/src/fsm.c:577 #14 0x563671329e3d in alive ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:159 #15 0x7faea5cbed2f in _osmo_fsm_inst_dispatch ../../../src/libosmocore/src/fsm.c:685 #16 0x56367132e0be in obj_destroy ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:389 #17 0x56367132e519 in test_destroy ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:408 #18 0x56367132ebf4 in main ../../../src/libosmocore/tests/fsm/fsm_dealloc_test.c:454 #19 0x7faea4edf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) #20 0x563671326329 in _start 
(/n/s/dev/make/libosmocore/tests/fsm/fsm_dealloc_test+0x10329) So, even though twig0a sends the EV_CHILD_GONE to its parent FSM only *after* all terminating is done, some events crossing over via "other" objects can still trigger the parent to also deallocate at the same time. Here the problem merely is that the child is allocated from the parent's talloc context. Had it been allocated from an independent talloc context, then there would not be any use-after-free. So let's try that: fi = osmo_fsm_inst_alloc_child(&test_fsm, parent->fi, EV_CHILD_GONE); OSMO_ASSERT(fi); + talloc_steal(s, fi); Now the next problem arises: when the child is done deallocating, fsm.c still has the parent FSM pointer to which it must still dispatch the parent_term event. When there is only a parent and a child involved, using the parent_term event actually solves this problem: if we were to dispatch EV_CHILD_GONE from the child's cleanup() function, then the parent would deallocate and free the child along with its own talloc context. By using the parent_term event, we wait until after cleanup() is done and until after talloc_free() on the child is done, and only then dispatch EV_CHILD_GONE to the parent. But because a third "other" object is involved, the deallocation trigger ricochets across that other object. It is out of scope to tell that other object to care about which two callers are using it and why they must not be told to deallocate in certain situations. It is certainly possible to do that, but it requires a lot of careful thinking through all possible variations, which is very easy to get wrong. The vision for fixing these kinds of situations is to avoid a use-after-free completely by keeping the talloc contexts around; also, all event dispatch and osmo_fsm_inst_term() calls must be stopped when a fi is marked to be released. - add flag osmo_fsm_inst.deallocating. 
If true, fsm.c denies all action triggers on the FSM instance (no osmo_fsm_inst_term(), no osmo_fsm_inst_state_chg(), no osmo_fsm_inst_dispatch()). - add a talloc bucket into which to reparent FSM instances instead of deallocating directly. Then each parent/child/"other" object that wants to trigger things on it will be able to see the "deallocating" flag, and still even use memory like the fi->id for logging. This talloc bucket could be a volatile select context, or it could even be a global context that exists once in fsm.c, the first osmo_fsm_inst_term() creates it, other callers add and remove use counts, and as soon as the last use count is removed, the entire context gets freed. I think this would solve all of the above problems. ~N -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Sun Mar 24 05:54:15 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Sun, 24 Mar 2019 06:54:15 +0100 Subject: RFC: talloc contexts / automatic free at select loop In-Reply-To: <20190323234321.GA9976@my.box> References: <20190320095246.GT20379@nataraja> <20190320110616.GV20379@nataraja> <422a9156-6675-bf82-53b2-194f05be7ca2@sysmocom.de> <20190321012430.GD27744@my.box> <20190321083028.GI20379@nataraja> <20190323234321.GA9976@my.box> Message-ID: <20190324055415.GA497@my.box> > http://git.osmocom.org/libosmocore/commit/?h=neels/fsm_dealloc_test&id=31ed1036cee1ed4e0ccf12e83e2bc0918af8db47 This patch (of course) still was buggy, and now I can actually also add a working general solution and a bit more insight. At first I was really annoyed by this deallocation spaghetti distracting me yet another time from breaking through to inter-MSC HO; I was already questioning all the nice and logical FSM references I had designed in osmo-msc, even contemplated just running off and letting someone else solve it... 
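For illustration, the flag-plus-bucket idea quoted above could be sketched roughly as follows, in plain C without talloc or libosmocore. This is only a sketch under assumptions: struct obj, obj_term(), obj_cleanup() and dead_list are hypothetical names invented here, not fsm.c API, and a simple linked list stands in for the talloc bucket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an FSM instance; NOT the libosmocore struct. */
struct obj {
	const char *name;
	bool terminating;      /* set by the first obj_term(), blocks re-entry */
	struct obj *peer;      /* object that gets told when this one goes away */
	struct obj *next_dead; /* link in the deferred-free list ("bucket") */
};

static struct obj *dead_list; /* objects waiting for the outermost term to end */
static int term_depth;        /* current nesting level of obj_term() */
static int cleanup_count;     /* how many cleanups actually ran */
static int freed_count;       /* how many objects were actually freed */

static void obj_term(struct obj *o);

/* Cleanup drops the reference to the peer, which makes the peer terminate
 * too. The peer may in turn call back into obj_term() on us. */
static void obj_cleanup(struct obj *o)
{
	struct obj *p = o->peer;
	cleanup_count++;
	printf("%s: cleanup()\n", o->name);
	o->peer = NULL;
	if (p)
		obj_term(p);
}

static void obj_term(struct obj *o)
{
	assert(o);
	if (o->terminating)
		return; /* second and later calls have no effect */
	o->terminating = true;

	term_depth++;
	obj_cleanup(o);
	/* Do not free yet: callers further up the stack may still touch o. */
	o->next_dead = dead_list;
	dead_list = o;
	term_depth--;

	if (term_depth > 0)
		return;
	/* Outermost call: nobody is using the dead objects any more. */
	while (dead_list) {
		struct obj *d = dead_list;
		dead_list = d->next_dead;
		free(d);
		freed_count++;
	}
}
```

With two heap-allocated objects that reference each other as peers, a single obj_term() on one of them runs each cleanup exactly once, and both are freed only after the outermost call has unwound, so no code path continues on freed memory.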
But now I am quite glad that I took a closer look, because with this patch I can even remove some events and states (maybe drop some FSM instances entirely, which were only introduced to receive a parent_term event), while actually becoming safer than before and having to do almost no thinking to achieve that. The new fsm_dealloc_test.c and an improvement to fsm.c are pushed to branch neels/fsm_dealloc_test. http://git.osmocom.org/libosmocore/log/?h=neels/fsm_dealloc_test The first patch of three shows the current situation totally not working out. The second patch applies fsm.c "fixes" and shows all situations magically working well. The third patch simplifies fsm_dealloc_test.c, because it no longer needs the ST_DESTROYING state after the new safeguards are in place. Nice. So far I am talloc_steal()ing FSM instances "freed" in osmo_fsm_inst_term() cascades to the first/outermost osmo_fsm_inst_term() fi as talloc parent, so that all get freed once in the end. Instead, "freed" instances could be reparented to a future thread volatile select context once it shows up. For now I'm very glad that I can easily fix my osmo-msc without having to depend on the select volatile ctx. > - add flag osmo_fsm_inst.deallocating. If true, fsm.c denies all action > triggers on the FSM instance (no osmo_fsm_inst_term(), no > osmo_fsm_inst_state_chg(), no osmo_fsm_inst_dispatch()). Actually it is only necessary to avoid re-entering osmo_fsm_inst_term() for the same FSM instance. - Dispatching events is fine: if FSM implementations require thwarting events when terminating, the event handlers can simply test for the new fi->proc.terminating flag; also, some FSM implementations may actually rely on receiving events while already terminating, e.g. to dereference other deallocating objects and not attempt to clean those twice. - Changing state during osmo_fsm_inst_term() is also fine, along the same reasoning. ~N -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Mon Mar 25 00:58:21 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Mon, 25 Mar 2019 01:58:21 +0100 Subject: GSUP TTCN3 tests Message-ID: <20190325005821.GA20807@my.box> The templates for GSUP in GSUP_Types.ttcn seem to expect the GSUP IEs in a specific order, which shouldn't be required. Also I cannot easily define certain GSUP IEs as not mattering. Particularly, I added Source Entity and Destination Entity IEs, and now I would have liked to just add some source_entity := *, destination_entity := *, to trivially make all tr_GSUP() pass, whether these IEs are present or not. Instead I have to now add them to a listing of IEs everywhere. That means the test suite will only work with the new osmo-msc and even if "nightly" works out, we will see "latest" failing, etc. I'd like to use the semantics I am used to in e.g. BSSMAP messages: - order doesn't matter - easy ':= *' for items in the root tr_GSUP() template What would it take to make GSUP messages match in this way? ~N -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nhofmeyr at sysmocom.de Mon Mar 25 13:29:27 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Mon, 25 Mar 2019 14:29:27 +0100 Subject: RFC: talloc contexts / automatic free at select loop In-Reply-To: <20190324055415.GA497@my.box> References: <20190320095246.GT20379@nataraja> <20190320110616.GV20379@nataraja> <422a9156-6675-bf82-53b2-194f05be7ca2@sysmocom.de> <20190321012430.GD27744@my.box> <20190321083028.GI20379@nataraja> <20190323234321.GA9976@my.box> <20190324055415.GA497@my.box> Message-ID: <20190325132927.GA4048@my.box> On Sun, Mar 24, 2019 at 06:54:15AM +0100, Neels Hofmeyr wrote: > So far I am talloc_steal()ing FSM instances "freed" in osmo_fsm_inst_term() > cascades to the first/outermost osmo_fsm_inst_term() fi as talloc parent, so > that all get freed once in the end. Interesting to note here is that I can apparently steal a talloc parent to become a talloc child (this is the result of the child osmo_fsm_inst_term() also causing the parent to term) parent_fi | +- child_fi talloc_steal(new_ctx=child_fi, parent_fi) I guess this should result in: child_fi | +- parent_fi I'm not entirely clear how this works out. Are those then still attached to whatever parent_fi had as a parent context? Are they floating alone? The test shows that it does work, no leaks, no loops. Since child_fi is guaranteed to be deallocated, either way would be fine. If this is bugging us we can use a different talloc_ctx to steal into -- I just wanted to avoid allocating another short-lived ctx. ~N -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From laforge at gnumonks.org Mon Mar 25 15:27:09 2019 From: laforge at gnumonks.org (Harald Welte) Date: Mon, 25 Mar 2019 16:27:09 +0100 Subject: GSUP TTCN3 tests In-Reply-To: <20190325005821.GA20807@my.box> References: <20190325005821.GA20807@my.box> Message-ID: <20190325152709.GD31330@nataraja> Hi Neels, On Mon, Mar 25, 2019 at 01:58:21AM +0100, Neels Hofmeyr wrote: > The templates for GSUP in GSUP_Types.ttcn seem to expect the GSUP IEs in a > specific order, which shouldn't be required. It shouldn't be required, as in any of the protocols we deal with. The reality is different, at least for the 3GPP defined specs: Order is strictly required. You may not change the order of IEs, even though there are IEIs in the TLV structure that would allow for it. In Osmocom C language (and likely other implementations) we don't care on the receiver side, but to be a compliant transmitter, we must use the correct order. For GSUP now we can of course do something completely different, if we want to. I'm not sure what the GSUP specification document which we publish says about it. I would personally argue in favor of not permitting any random order of IEs. If you look at the various protocol Modules that Ericsson wrote for TTCN3, they all require the spec-mandated fixed order, contrary to those "possibly more elegant/generic" protocol implementations that we added but which don't enforce any order. > Also I cannot easily define certain GSUP IEs as not mattering. > > Particularly, I added Source Entity and Destination Entity IEs, and now I would > have liked to just add some source_entity := *, destination_entity := *, to > trivially make all tr_GSUP() pass, whether these IEs are present or not. > > Instead I have to now add them to a listing of IEs everywhere. 
> That means the test suite will only work with the new osmo-msc and even if > "nightly" works out, we will see "latest" failing, etc. why is that? You can easily have a "*" inside the list of IE templates to allow for zero or more additional IEs of any kind. The GSUP IEs are defined as "type record of GSUP_IE GSUP_IEs;" which enforces strict ordering. One could theoretically use "type set of GSUP_IE GSUP_IEs" which is an unordered set allowing any order. However, a "set of" also permits duplicate elements. Not sure if that's what you want. An alternative is the "permutation" qualifier for a "record of" template, which basically says "all of those elements must be present exactly once, in any order possible". See slide 176 of TTCN3_P.pdf So something like the combination of the change "record of" -> "set of" and the addition of a single '*' to the list of IEs should turn a receive template into what you'd like. There's also the "superset" qualifier. So if you define a set of IE templates and state you want to permit any superset on that set, then also any additional attributes beyond those minimally required ones in the set are accepted. See slide 175 of the "TTCN3_P.pdf" presentation. > I'd like to use the semantics I am used to in e.g. BSSMAP messages: > - order doesn't matter This is wrong. The BSSMAP dissector does not permit any order except the one specified in the specs! Regards, Harald -- - Harald Welte http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. 
A6) From nhofmeyr at sysmocom.de Mon Mar 25 21:51:01 2019 From: nhofmeyr at sysmocom.de (Neels Hofmeyr) Date: Mon, 25 Mar 2019 22:51:01 +0100 Subject: GSUP TTCN3 tests In-Reply-To: <20190325152709.GD31330@nataraja> References: <20190325005821.GA20807@my.box> <20190325152709.GD31330@nataraja> Message-ID: <20190325215101.GB4048@my.box> Ok, wasn't actually aware that ordering is strict. But emphasizing ordering that much was actually the wrong angle in my prev mail. I see now that what I need is template overlaying, and easier item access. It's like one layer of abstraction is missing. If I want to get some value from a GSUP message, I have to use f_gsup_find_ie and further process the returned GSUP_IeValue. For example, to get a source name from a received GSUP: var GSUP_PDU prep_ho_req; GSUP.receive(tr_GSUP_E_AN_APDU(OSMO_GSUP_MSGT_E_PREPARE_HANDOVER_REQUEST, pars.imsi, OSMO_GSUP_ENTITY_MSC_A, OSMO_GSUP_ENTITY_MSC_T, destination_name := remote_msc_name)) -> value prep_ho_req; var GSUP_IeValue source_name_ie; f_gsup_find_ie(prep_ho_req, OSMO_GSUP_SOURCE_NAME_IE, source_name_ie); use(source_name_ie.source_name); The last 3 lines would, when dealing with BSSAP, look more like: use(prep_ho_req.source_name) Composing GSUP templates is less flexible, because I can't easily build up on previous templates. In other PDU types I could do overlay like template FOO_PDU tr_foo := { item := default_item, other := ?, other2 := omit } template FOO_PDU tr_foo_xyz := tr_foo { other := special_variant } But with GSUP it would have to be done like: traverse the list, find the position and work on that. This weird contraption illustrates the quirkiness: private function f_gen_tr_ss_ies( template hexstring imsi, template OCT4 sid := ?, template GSUP_SessionState state := ?, template octetstring ss := ? 
) return template GSUP_IEs { /* Mandatory IEs */ var template GSUP_IEs ies := { tr_GSUP_IE_IMSI(imsi), tr_GSUP_IE_SessionId(sid), tr_GSUP_IE_SessionState(state) }; var integer last_idx := 3; /* Optional SS payload */ if (istemplatekind(ss, "*")) { ies[3] := *; last_idx := last_idx + 1; } else if (not istemplatekind(ss, "omit")) { ies[3] := tr_GSUP_IE_SSInfo(ss); last_idx := last_idx + 1; } ies[last_idx] := tr_GSUP_IE_Source_Entity(OSMO_GSUP_ENTITY_MSC_USSD); last_idx := last_idx + 1; ies[last_idx] := tr_GSUP_IE_Destination_Entity(OSMO_GSUP_ENTITY_EUSE); last_idx := last_idx + 1; return ies; } Better would be template tr_ss(imsi, sid, state, ss) := tr_GSUP { imsi := imsi, ... } where things like '*' and 'omit' trivially just work TM, and some other layer should take care of composing a list of IEs from this record. In this way I would be able to add new IEs in one single place to make all code work again. Compare this patch, where each and every GSUP composition has to be touched: http://git.osmocom.org/osmo-ttcn3-hacks/commit/?h=neels/ho&id=e8363aff0cb70889ed96b4f3f11c27b938af5222 Furthermore, what would be really nice for inter-MSC testing: if I could use templates for the inter-MSC BSSAP PDU included in the GSUP PDU directly. So far I have the PDU included in the GSUP as an octetstring, which means it can't intelligently match templates. I guess that could be simple enough to achieve, but one limitation is that the included PDU could actually be RANAP. We could model our GSUP AN-APDU in TTCN3 to always expect BSSAP messages, in practice would work fine, but it would be a bit of a cheat. Either way, I am not sure how to implicitly de-/encode the enclosed PDU. 

An example of where this would help -- I am currently trying to see whether I
can do some SMS negotiation across the E link:

On a BSSAP link it is simple:

    var PDU_DTAP_MT dtap_mt;
    l3_mt := tr_ML3_MT_SMS(?, c_TIF_ORIG, tr_CP_DATA_MT(rp_mt));
    BSSAP.receive(tr_PDU_DTAP_MT(l3_mt, spars.dlci)) -> value dtap_mt;

In inter-MSC GSUP the same thing becomes something like:
(Not actually sure if this compiles etc.)

    l3_mt := tr_ML3_MT_SMS(?, c_TIF_ORIG, tr_CP_DATA_MT(rp_mt));
    var GSUP_PDU gsup_l3_mt;
    GSUP.receive(tr_GSUP_E_AN_APDU) -> value gsup_l3_mt;          <--- receive *ANY* PDU
    var GSUP_IeValue an_apdu;
    f_gsup_find_ie(gsup_l3_mt, OSMO_GSUP_AN_APDU_IE, an_apdu);    <--- look up in list
    var PDU_ML3_NW_MS dtap_mt := dec_PDU_ML3_NW_MS(an_apdu.an_apdu.pdu);
    if (not match(dtap_mt, l3_mt)) {                              <--- match later
        setverdict(fail, "Unexpected DTAP from remote MSC:", dtap_mt);
        mtc.stop;
    }

I think I'll rather not grind through an entire SMS negotiation like that...
(I guess I'll rather do a simpler *#100# for now)

On a meta level, I feel that getting the TTCN3 side to work out for what I want
to test is a whole other chapter, adding a complete new business item to the
road towards reaching inter-MSC HO. Testing in TTCN3 saves time compared to
testing with real hardware once the test exists, and produces nice tests for
all of the future. But I am currently feeling sidetracked by tweaking these
things to work out nicely and would rather spend my time implementing the MNCC
call forwarding.

Is there a chance that anyone other than me can work out inter-MSC testing in
TTCN, or rather, spend time to work out the above two issues in general, to
make enhancing the tests easier for me?

A dream goal would be to be able to use other functions like f_mt_sms() 1:1,
with some trivial switch that does all the BSSAP.send and BSSAP.receive via
GSUP instead of direct BSSAP. I guess we can't do that though, can we?

> You can easily have a "*" inside the list of IE templates to allow
> for zero or more additional IEs of any kind.

Ok, that could make it simpler, but that is semantically a different thing than
*-ing single items in a record. Now suddenly I am allowing any and all garbage
IEs.

> The GSUP IEs are defined as "type record of GSUP_IE GSUP_IEs;" which enforces
> strict ordering. One could theoretically use "type set of GSUP_IE GSUP_IEs"

Ok, I see. Sorry to have emphasized ordering too much, it was actually a wrong
perception of mine that the ordering is the pivotal point; I wrote the mail
after enhancing the above f_gen_tr_ss_ies().

Permutation and superset also sound interesting, didn't know about them.
Still I have a feeling that they are actually too permissive.
And it's more about template overlaying and accessing members than ignoring new
items.

I would rather avoid a wildcard accepting any IEs, related or not. With a
template record, we would explicitly add just the new items, and allow just
them to be *. AFAICT that's not possible with an IE list?

~N

From nhofmeyr at sysmocom.de Mon Mar 25 21:58:26 2019
From: nhofmeyr at sysmocom.de (Neels Hofmeyr)
Date: Mon, 25 Mar 2019 22:58:26 +0100
Subject: GSUP TTCN3 tests
In-Reply-To: <20190325215101.GB4048@my.box>
References: <20190325005821.GA20807@my.box> <20190325152709.GD31330@nataraja> <20190325215101.GB4048@my.box>
Message-ID: <20190325215826.GC4048@my.box>

On Mon, Mar 25, 2019 at 10:51:01PM +0100, Neels Hofmeyr wrote:
> where things like '*' and 'omit' trivially just work TM, and some other layer
> should take care of composing a list of IEs from this record.

To clarify, the idea is to not match the IE lists of incoming messages, but do
all message matching on the template record level instead.
Like it happens in BSSAP AFAICT.

~N

From holger at freyther.de Tue Mar 26 05:23:31 2019
From: holger at freyther.de (Holger Freyther)
Date: Tue, 26 Mar 2019 05:23:31 +0000
Subject: libosmocore wishlist
In-Reply-To: <20190320101353.GU20379@nataraja>
References: <20190320101353.GU20379@nataraja>
Message-ID: <8E35532D-636C-409F-A299-9F028E06BA59@freyther.de>

> On 20. Mar 2019, at 10:13, Harald Welte wrote:
>
> While working on the talloc context patches, I was wondering if we should
> spend a bit of time to further improve libosmocore and collect something
> like a wishlist.

> I would currently identify the following areas:
>
> 1) initialization of the various sub-systems is too complex, there are too
> many functions an application has to call. I would like to move more
> to a global "application initialization", where an application registers
> some large struct [of structs, ...] at start-up and tells the library
> the log configuration, the copyright statement, the VTY IP/port, the config
> file name, ... (some of those can of course be NULL and hence not used)

ack. One big struct for options? But how would this work across libosmocore
and libosmonetif/libosmo-abis? In Go there is a "pattern" to pass an options
struct into the method.

> 2) have some kind of extensible command line options/arguments parser
> It would be useful to have common/library parts register some common
> command line arguments (like config file, logging, daemonization, ..)
> while the actual application extending that with only its application-specific
> options.
> I don't think this is possible with how getopt() works, so
> it would require some new/different infrastructure how applications would
> register their arguments

I started to like the absl flags infrastructure (but we need to make sure to
not have an excessive amount of them):

https://abseil.io/docs/python/guides/flags

    flags.DEFINE_integer('age', None, 'Your age in years.', lower_bound=0)

The same concept exists for C++, Java, Go, Python and Bash.

> 3) move global select() state into some kind of structure. This would mean
> that there could be multiple lists of file descriptors rather than the
> one implicit global one. Alternatively, turn the state into thread-local
> storage, so each thread would have its own set of registered file descriptors,
> which probably makes most sense. Not sure if one would have different 'sets'
> of registered file descriptors in a single thread. The same would apply
> for timers: Have a list of timers for each thread; timeouts would then
> also always execute on the same thread. This would put talloc context, select
> and timers all in the same concept: Have one set of each on each thread,
> used automatically.

Do we plan to have threads?

On the low-end we could have an EventServer that runs one epoll_wait per
thread. But then we are in the game of scheduling across the threads, work
stealing, etc. Maybe something already exists we can use?

On the high-end I wondered if we could have something like "fibers" and FSMs
and CSP as first class citizens?

* When creating a new fsm it gets scheduled on the least busy worker thread
* When creating a child it stays on the same thread.
* Components communicate strictly using a CSP like primitive.
* We can scale up/scale down worker threads based on load.

4) Adopt/build an RPC mechanism (maybe evolve GSUP to it). I underestimated
the "network" effect of every binary offering the same RPC interface. Suddenly
sending a SMS, placing a call.. becomes..

    the_rpc_cli endpoint service.method < arguments

And to inspect a service

    the_rpc_cli endpoint ls [service.method]

5) Plan for seamless/cooperative upgrades. E.g. by passing fds somewhere else.
E.g. leave existing TCP connections in the old process and accept new ones in
the new version. The difficulty is how to deal with the VTY and other
services. We probably need a meta server.. and meta server upgrades.

Or this might be the time to break from VTY. Give up on runtime
reconfiguration (we never had a solid model for it) and see how plain rpc can
save our day?

From axilirator at gmail.com Thu Mar 28 12:02:40 2019
From: axilirator at gmail.com (Vadim Yanitskiy)
Date: Thu, 28 Mar 2019 19:02:40 +0700
Subject: GSUP routing for different kinds of entities
In-Reply-To: <20190321013608.GE27744@my.box>
References: <20190317230004.GA25662@my.box> <20190321013608.GE27744@my.box>
Message-ID:

Hi Neels,

Sorry for the late reply, I am experiencing a big lack of free time :/

> I would much rather have an explicit destination entity advertised in the GSUP
> messages, and an explicit common GSUP MUX stage. In other words, the VLR of
> osmo-msc shouldn't act as a GSUP forwarder, it should merely be one of the GSUP
> consumers, and shouldn't even be involved when the messages are intended for
> inter-MSC, for USSD or for SMS use.

ACK. I like the idea of having source / destination IEs for routing
purposes. I was also thinking about having separate GSUP-connections
for both MSC and VLR, but since VLR is a part of MSC, I don't see any
benefits from this approach.

> Still would like to get input on naming:
> For SMS, we have the SMSC and ESME entities.

Correct, but AFAIR, ESME is something specific to SMPP only. In case
of GSUP (and MAP) we always deal with SMSC. Therefore, I think we
don't need OSMO_GSUP_ENTITY_ESME entity at all.

> Are there similar terms for USSD? Which is the entity managing
> USSD dialogs? which is the entity sending the USSD messages?

Here we have the following picture:

                                  |- <-GSUP-> EUSE (foo)
MS / UE <-> RAN <-> MSC <-GSUP-> HLR <-GSUP-> EUSE (bar)
                                  |- <-GSUP-> EUSE (zoo)

EUSE stands for External Unstructured Supplementary Services Entity.
One can configure prefix-based routing in OsmoHLR, so USSD requests
coming from MS / UE can be routed to one of the connected EUSEs.

The following entity types from the proposed enum are involved here:

  - OSMO_GSUP_ENTITY_MSC_B,
  - OSMO_GSUP_ENTITY_HLR,
  - OSMO_GSUP_ENTITY_EUSE.

> Thinking, in fact if our osmo-msc wasn't siamese twins with
> the SMSC we would have ESME <-> SMSC <-> MSC.

As Harald noted, at some point we would need to rip out the
internal SMSC functionality from OsmoMSC. At the moment, it
can be disabled from the VTY using the 'sms-over-gsup' parameter.

From what I remember, a simplified SMS delivery in our MAP-less
stack (since we use GSUP and HLR as the router) should look
this way:

MS / UE (Alice) -> RAN -> MSC |-GSUP-> HLR |-GSUP-> OsmoSMSC
                                                        |
MS / UE (Sarah) <- RAN <- MSC <-GSUP-| HLR <-GSUP-| OsmoSMSC

So we have the following entity types involved here:

  - OSMO_GSUP_ENTITY_MSC_B,
  - OSMO_GSUP_ENTITY_HLR,
  - OSMO_GSUP_ENTITY_SMSC.

For more details, please see:

https://git.osmocom.org/osmo-gsm-manuals/tree/common/chapters/gsup_mo_sms.msc
https://git.osmocom.org/osmo-gsm-manuals/tree/common/chapters/gsup_mt_sms.msc

> MSC-SMS / MSC-USSD

I don't think we need to overload the entity types with service types
they provide. In other words, instead of MSC-SMS / MSC-USSD we can use
a single OSMO_GSUP_ENTITY_MSC_B.

With best regards,
Vadim Yanitskiy.

From nhofmeyr at sysmocom.de Thu Mar 28 16:14:47 2019
From: nhofmeyr at sysmocom.de (Neels Hofmeyr)
Date: Thu, 28 Mar 2019 17:14:47 +0100
Subject: GSUP routing for different kinds of entities
In-Reply-To:
References: <20190317230004.GA25662@my.box> <20190321013608.GE27744@my.box>
Message-ID: <20190328161447.GB7022@my.box>

Thanks for your input!

Currently, my patch looks like this:

    enum osmo_gsup_entity {
        OSMO_GSUP_ENTITY_NONE = 0,
        OSMO_GSUP_ENTITY_HLR = 1,
        OSMO_GSUP_ENTITY_VLR = 2,
        OSMO_GSUP_ENTITY_ESME = 3,      /* (see below) */
        OSMO_GSUP_ENTITY_SMSC = 4,
        OSMO_GSUP_ENTITY_MSC_SMS = 5,
        OSMO_GSUP_ENTITY_EUSE = 6,
        OSMO_GSUP_ENTITY_MSC_USSD = 7,
        OSMO_GSUP_ENTITY_MSC_A = 8,
        OSMO_GSUP_ENTITY_MSC_I = 9,
        OSMO_GSUP_ENTITY_MSC_T = 10,
        OSMO_GSUP_ENTITY_MAXVAL = OSMO_GSUP_ENTITY_MSC_T
    };

> I don't think we need to overload the entity types with service types

I think we do. We have various message types that are distinct to individual
handlers -- SMS go to gsm_04_11_gsup.c, USSD to gsm_09_11.c. This is not
osmo-msc specific, if any other MSC would implement these GSUP messages, the
individual SMS, USSD, ... parts would always be handled in distinct code paths.

So, we need to demux when receiving GSUP from the HLR.

So far we demux by message type alone. But that means we *must* have distinct
message types for each subsystem. Even for identical semantics (like routing
error responses) we would need distinct msg_type, or other heuristics.
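
To make that limitation concrete, here is a minimal C sketch. All names are
hypothetical, trimmed-down stand-ins (not the actual libosmocore or osmo-msc
definitions), and MSGT_ROUTING_ERROR is an invented message type assumed to be
shared by SMS and USSD. Demuxing by message type alone cannot attribute the
shared type to one subsystem, while an explicit destination entity can.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real libosmocore enums. */
enum entity { ENTITY_NONE = 0, ENTITY_VLR, ENTITY_MSC_SMS, ENTITY_MSC_USSD };
enum msg_type {
	MSGT_PROC_SS_RESULT,       /* only ever used by USSD */
	MSGT_MO_FORWARD_SM_RESULT, /* only ever used by SMS */
	MSGT_ROUTING_ERROR         /* invented: shared by both subsystems */
};

struct msg {
	enum msg_type type;
	enum entity destination_entity; /* ENTITY_NONE if the IE is absent */
};

/* Demux by message type alone: a shared type stays ambiguous. */
static enum entity classify_by_type(const struct msg *m)
{
	assert(m != NULL);
	switch (m->type) {
	case MSGT_PROC_SS_RESULT:
		return ENTITY_MSC_USSD;
	case MSGT_MO_FORWARD_SM_RESULT:
		return ENTITY_MSC_SMS;
	default:
		return ENTITY_NONE; /* MSGT_ROUTING_ERROR: SMS or USSD? */
	}
}

/* Demux by explicit destination entity, falling back to the message type
 * only when the sender did not include the IE. */
static enum entity classify(const struct msg *m)
{
	assert(m != NULL);
	if (m->destination_entity != ENTITY_NONE)
		return m->destination_entity;
	return classify_by_type(m);
}
```

With a shared error type addressed to the USSD code, classify_by_type() has
nothing to go on, whereas classify() simply returns the advertised entity.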

When I am adding this "entity" IE, I want to rather go all the way and make
demuxing in osmo-msc trivial:

    static enum osmo_gsup_entity gsup_client_mux_classify(struct gsup_client_mux *gcm,
                                                          const struct osmo_gsup_message *gsup_msg)
    {
        if (gsup_msg->destination_entity)
            return gsup_msg->destination_entity; /* <-- this is the "trivial" bit */

        LOGP(DLGSUP, LOGL_ERROR, "No destination entity, trying to guess from message type %s\n",
             osmo_gsup_message_type_name(gsup_msg->message_type));

        switch (gsup_msg->message_type) {
        case OSMO_GSUP_MSGT_PROC_SS_REQUEST:
        case OSMO_GSUP_MSGT_PROC_SS_RESULT:
        case OSMO_GSUP_MSGT_PROC_SS_ERROR:
            return OSMO_GSUP_ENTITY_MSC_USSD;

        /* GSM 04.11 code implementing MO SMS */
        case OSMO_GSUP_MSGT_MO_FORWARD_SM_ERROR:
        case OSMO_GSUP_MSGT_MO_FORWARD_SM_RESULT:
        case OSMO_GSUP_MSGT_READY_FOR_SM_ERROR:
        case OSMO_GSUP_MSGT_READY_FOR_SM_RESULT:
        case OSMO_GSUP_MSGT_MT_FORWARD_SM_REQUEST:
            return OSMO_GSUP_ENTITY_SMSC;

        default:
            return OSMO_GSUP_ENTITY_VLR;
        }
    }

    int gsup_client_mux_rx(struct osmo_gsup_client *gsup_client, struct msgb *msg)
    {
        [...]
        entity = gsup_client_mux_classify(gcm, &gsup);
        [...]
        rc = gcm->rx_cb[entity].func(gcm, gcm->rx_cb[entity].data, &gsup);
        /* which feeds directly into gsm411_gsup_rx(), gsm0911_gsup_rx(), ...
         * no further interpretation needed */
    }

If you instead keep the USSD and SMS subsystems in the same "entity", then we
still need to demux between those based on message type.

> AFAIR, ESME is something specific to SMPP only.

Ok, seems I got this wrong then.

  ENTITY_MSC_SMS <---[osmo-hlr]---> ENTITY_SMSC

yes?

> > Are there similar terms for USSD? Which is the entity managing
> > USSD dialogs? which is the entity sending the USSD messages?
>
> Here we have the following picture:
>
>                                   |- <-GSUP-> EUSE (foo)
> MS / UE <-> RAN <-> MSC <-GSUP-> HLR <-GSUP-> EUSE (bar)
>                                   |- <-GSUP-> EUSE (zoo)
>
> EUSE stands for External Unstructured Supplementary Services Entity.
> One can configure prefix-based routing in OsmoHLR, so USSD requests
> coming from MS / UE can be routed to one of the connected EUSEs.
>
> The following entity types from proposed enum are involved here:
>
> - OSMO_GSUP_ENTITY_MSC_B,

MSC_B didn't make it. Instead I require more distinct MSC-A, MSC-I and MSC-T.
But these will only be used for actual inter-MSC messages.

I.e., a USSD response doesn't go to the MSC-A entity, since that would be the
inter-MSC forwarding subsystem. Rather, it would go to MSC_USSD. That again
would figure out which subscriber is involved and then possibly forward to a
remote ENTITY_MSC_I.

> - OSMO_GSUP_ENTITY_HLR,

ENTITY_HLR should only ever appear if we actually interact with the HLR
database. When osmo-hlr merely does the GSUP routing, no ENTITY_HLR appears in
the messages. Only ultimate source and destination entities are named.

This also means we could at some point trivially separate GSUP routing from the
osmo-hlr process if we wanted.

> - OSMO_GSUP_ENTITY_EUSE.

ack. so:

  ENTITY_MSC_USSD <---[osmo-hlr]---> ENTITY_EUSE

> > Thinking, in fact if our osmo-msc wasn't siamese twins with
> > the SMSC we would have ESME <-> SMSC <-> MSC.
>
> As Harald noted, at some point we would need to rip out the
> internal SMSC functionality from OsmoMSC. At the moment, it
> can be disabled from the VTY using 'sms-over-gsup' parameter.
>
> From what I remember, a simplified SMS delivery in our MAP-less
> stack (since we use GSUP and HLR as the router) should look
> this way:
>
> MS / UE (Alice) -> RAN -> MSC |-GSUP-> HLR |-GSUP-> OsmoSMSC
>                                                         |
> MS / UE (Sarah) <- RAN <- MSC <-GSUP-| HLR <-GSUP-| OsmoSMSC

  ENTITY_MSC_SMS <---[osmo-hlr]---> ENTITY_SMSC

I will adjust to drop ESME and fix the ENTITY_SMSC being on the wrong side.
Otherwise IMHO it should stay as I have it. Would you agree, or am I still
getting something wrong?

~N

From axilirator at gmail.com Sat Mar 30 13:20:49 2019
From: axilirator at gmail.com (Vadim Yanitskiy)
Date: Sat, 30 Mar 2019 20:20:49 +0700
Subject: GSUP routing for different kinds of entities
In-Reply-To: <20190328161447.GB7022@my.box>
References: <20190317230004.GA25662@my.box> <20190321013608.GE27744@my.box> <20190328161447.GB7022@my.box>
Message-ID:

Hi Neels,

>> AFAIR, ESME is something specific to SMPP only.
>
> Ok, seems I got this wrong then.
> ENTITY_MSC_SMS <---[osmo-hlr]---> ENTITY_SMSC

Yes, correct. Let's drop ESME.

> ENTITY_HLR should only ever appear if we actually interact with the HLR
> database. When osmo-hlr merely does the GSUP routing, no ENTITY_HLR appears in
> the messages. Only ultimate source and destination entities are named.

USSD is not that simple in terms of routing. According to GSM 03.90,
a MO USSD request goes through the following chain:

  MS / UE -> RAN -> MSC -> VLR -> HLR -> EUSE

and any of MSC, VLR, HLR, or EUSE can terminate this request.
In other words, both MSC and VLR can also handle (and initiate)
some USSD requests, and the routing configuration is up to the
service provider. This is how it works in commercial networks.

In our case, we still route a USSD request from MS / UE through
the VLR towards OsmoHLR, which can:

  - either handle it internally (e.g. *#100#, *#101#),
    we call this IUSE - Internal USSD Entity;
  - or route it further to some EUSE (external one).

Therefore, when a USSD request is received at OsmoMSC, we don't yet
know what would be the destination entity: HLR or some EUSE. As soon
as that USSD request is handled somewhere, the handling entity would
indicate itself in the response using the 'src_entity' IE.

The question is which 'dst_entity' IE value should we use at the beginning?
;)

I see two potential approaches here:

  a) the simplest one: just use OSMO_GSUP_ENTITY_EUSE,
     the "farthest" entity in the whole chain;

  b) a bit more elegant: use entity type of the next
     component until the USSD request is terminated;

To clarify b), here is an example of 'dst_entity' permutations:

  1. Some MS / UE initiates USSD request *#100#, and we receive
     it at MSC. Since our OsmoMSC doesn't handle USSD requests
     itself (commercial ones do), it forwards to the VLR:

       OSMO_GSUP_ENTITY_MSC_USSD -> OSMO_GSUP_ENTITY_VLR,

  2. Since our VLR doesn't handle USSD requests itself
     (commercial ones do), it forwards received GSUP
     message to the HLR:

       OSMO_GSUP_ENTITY_MSC_USSD -> OSMO_GSUP_ENTITY_HLR

  3. OsmoHLR receives that USSD request, and checks the routing
     configuration. By default, *#100# is handled by IUSE called
     'own_msisdn', so the HLR terminates request and responds
     back to OsmoMSC. In TCAP/MAP, the response would go directly
     to OsmoMSC via a new connection. In our case, we just (re)use
     the existing TCP/GSUP connection:

       OSMO_GSUP_ENTITY_MSC_USSD <- OSMO_GSUP_ENTITY_HLR

I prefer this approach. What do you think?

What if inter-MSC handover happens during an active USSD session before
the HLR (or any other 'dst_entity') responds to USSD?

With best regards,
Vadim Yanitskiy.
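
Approach b) above can be sketched in C roughly as follows. This is only an
illustration under stated assumptions: the enum, struct hop_policy and
route_mo_ussd() are hypothetical names, not existing libosmocore, OsmoMSC or
OsmoHLR API, and the per-hop "handles it locally" flag stands in for whatever
routing configuration each component really has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down entity list for illustration only. */
enum entity { ENTITY_MSC_USSD, ENTITY_VLR, ENTITY_HLR, ENTITY_EUSE, ENTITY_COUNT };

/* The MO USSD chain from the mail: MSC -> VLR -> HLR -> EUSE. */
static const enum entity ussd_chain[] = {
	ENTITY_MSC_USSD, ENTITY_VLR, ENTITY_HLR, ENTITY_EUSE
};
#define CHAIN_LEN (sizeof(ussd_chain) / sizeof(ussd_chain[0]))

/* Assumption: every hop knows whether it terminates the request itself. */
struct hop_policy {
	bool handles_locally[ENTITY_COUNT];
};

/* Approach b): the originating MSC sets dst_entity to the next component,
 * and each component that does not terminate the request re-addresses it
 * to the one after. Returns the entity that finally terminates the request,
 * i.e. the one that would put itself into 'src_entity' of the response. */
static enum entity route_mo_ussd(const struct hop_policy *p)
{
	size_t i;

	assert(p != NULL);
	for (i = 1; i < CHAIN_LEN; i++) {
		enum entity dst = ussd_chain[i];
		if (p->handles_locally[dst])
			return dst;
	}
	/* Nobody earlier took it: the farthest entity terminates it. */
	return ussd_chain[CHAIN_LEN - 1];
}
```

For the *#100# example, a policy in which only the HLR handles the request
locally (the IUSE case) ends at ENTITY_HLR; if no component claims it, the
request travels on to ENTITY_EUSE, which is where approach a) would have
addressed it from the start.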