Attention is currently required from: laforge, pespin.
neels has posted comments on this change. (
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539?usp=email )
Change subject: per-HNB GTP-U traffic counters via nft
......................................................................
Patch Set 6:
(26 comments)
Patchset:
PS6:
Hi @nhofmeyr@sysmocom. […]
Actually it is the
same thread running twice.
There is one "instruction set" and the caller decides how to split the thread
responsibilities.
It would not be a problem to run only one thread, or three, which is the point of this
implementation.
For review of the blocking/nonblocking nature, you may find interesting to know that all
requests are handled by the maintenance thread, except that the second thread handles ONLY
and ALL the GET_COUNTERS requests. This is also described in a comment in-code IIRC.
For review of functionality it should not make any difference how often the thread runs.
File include/osmocom/hnbgw/hnbgw.h:
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/c52e8836_5a81c567
PS6, Line 384: /* When rules to count traffic to and from this hNodeB are present, this
reflects the state in nftables
I'm not understanding this comment. You first
mention "rules", and after "nftables rules". […]
"rules" and "nftables rules" is the same thing.
is this a better comment? :
State that the main thread needs in order to know what to request from the worker
threads.
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/0d7e16e5_ff928953
PS6, Line 402: struct {
maybe you want to declare the struct type once and use
it twice here. […]
(in an earlier patch set i actually had the two combined in one
struct.)
The two nft_kpi_handle and nft_kpi_val are also used in function signatures and static
vars, but this combined one (ul, dl) exists only here. I can make an extra struct for
that, but that makes more lines of code, not less.
should i still do that? i wouldn't do it, but my opinion is not strong
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/63da49ff_ea065842
PS6, Line 446: bool enable;
I actually think they are self-explanatory.
nft_kpi.enable means nft_kpi is enabled,
the nft_kpi.table_name is used as name for the nft table... =)
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/d018d3ed_0a5347a2
PS6, Line 483: struct {
I actually think they are self-explanatory.
I'm not a friend of comments like
/* timer to get nftables counters */
nft_kpi.get_counters_timer;
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/c94caac6_48d70682
PS6, Line 514: void hnb_persistent_connected(struct hnb_persistent *hnbp);
difficult to understand what are these exactly only by
looking at the name. […]
(not sure i understand what you mean by stack level)
i'll add comments to describe. they are a late addition, it seems i forgot.
File include/osmocom/hnbgw/nft_kpi.h:
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/88db9e0a_5132b9cb
PS6, Line 7: struct nft_kpi_handle {
I actually think they are simple, clarly named and
self-explanatory.
i can explain in a comment that nftables has "handles"
required to remove unnamed rules...?
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/b46445d9_0114dca7
PS6, Line 19: void nft_kpi_hnb_persistent_init(struct hnb_persistent *hnbp);
These APIs seems to be counterparts, but sounds weird
having init vs remove.
ah yes, i had names based on "init the ruleset" at
some point, this is a leftover. i'll make it "add" instead.
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/2c1be44b_7aa4dd63
PS6, Line 21: int nft_kpi_hnb_start(struct hnb_persistent *hnbp, const struct
osmo_sockaddr_str *gtpu_remote);
This addr is probably coming from RANAP in binary
form, so feels weird having it converted to a osmo […]
nftables uses strings (also
all logging), so it makes sense to use strings here.
(This is the same as in MGCP, also a string based protocol, where it is also often more
useful to store the strings than the sockaddr)
File src/osmo-hnbgw/hnbgw.c:
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/992ef815_ac9bbe62
PS6, Line 346: hnb_persistent_disconnected(ctx->persistent);
so API is not really protocol related. […]
(or
"active" and "inactive", s.a.)
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/ae15427e_d9a3ee85
PS6, Line 590: if (osmo_sockaddr_str_from_osa(&remote_str, &osa)) {
there we go, a sockaddr transformed into a
sockaddr_str, not sure if there's a real reason to do so, […]
yes
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/a32a6f58_262c4afe
PS6, Line 607: {
Sounds like a self-made FSM right? :P […]
exactly what i mean above. But this patch is not about fixing the HNBAP / HNB
activity tracking. It only shows up because I add the two functions that should already
have existed before this patch.
So my idea here is to add these "stubs" for a proper FSM to be done in a future
patch.
That future FSM should then invoke these properly, instead of how osmo-hnbgw does it now.
File src/osmo-hnbgw/hnbgw_hnbap.c:
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/4a88b01d_5dbf52f7
PS6, Line 571: hnb_persistent_connected(ctx->persistent);
Since this happens at registration time, maybe rename
it to "hnb_persistent_registered"? I still fin […]
yes, but how osmo-hnbgw
handles HNBAP is weird. We start up a HNB when HNBAP HNB Register is received, but we
practically never see a HNBAP HNB De-Register happen. We mostly just see the HNB's Iuh
link drop without prior notice; often even after the same cell has registered again from a
new connection.
My opinion here is not very strong. I can make it hnb_persistent_registered() and
hnb_persistent_deregistered() -- but then again we have the ambiguity that the
"deregistered()" is in practice just a disconnected link...
I think we need to actually further clarify the connected/disconnected and the HNBAP
Register and Deregister behavior of osmo-hnbgw, in general, in separate patches. (e.g.
what if HNBAP Deregisters but the Iuh link is never dropped... I think osmo-hnbgw
currently does not catch that)
Maybe "hnb_active" and "hnb_inactive" are more general terms, where
"active" means *both* HNBAP Registered *and* Iuh still connected? (and we can
later make osmo-hnbgw actually do this properly in all cases)
what do you think
File src/osmo-hnbgw/hnbgw_vty.c:
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/3fa54b1c_c4bb85aa
PS6, Line 884: DEFUN(cfg_hnbgw_nft_kpi, cfg_hnbgw_nft_kpi_cmd,
I think we have VTY CMD flags to state that changes
apply after a process restart, you could use the […]
I wanted to, but it does not
exist AFAICT:
```
/*! Attributes (flags) for \ref cmd_element */
enum {
CMD_ATTR_DEPRECATED = (1 << 0),
CMD_ATTR_HIDDEN = (1 << 1),
CMD_ATTR_IMMEDIATE = (1 << 2),
CMD_ATTR_NODE_EXIT = (1 << 3),
CMD_ATTR_LIB_COMMAND = (1 << 4),
};
```
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/6cae760a_478f759a
PS6, Line 899: g_hnbgw->config.nft_kpi.table_name = talloc_strdup(g_hnbgw,
set_table_name);
These 3 lines is basically a
osmo_talloc_replace_string right?
yup, it incrementally became this so i didn't
notice, thx
File src/osmo-hnbgw/nft_kpi.c:
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/0490a540_ff8cff4a
PS6, Line 18: #include <inttypes.h>
Some overall architecture comment description here
would be great, to quickly figure out the interes […]
hm i thought i had that
somewhere...
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/3e9f7fdd_5972acb9
PS6, Line 86: static __thread struct nft_thread *g_nft_thread = NULL;
Not exactly imho. […]
the nft_thread could be
launched any number of times.
IMHO this is what the __thread is for, exactly.
This is a per-thread global state.
For any thread not running an nft command queue, g_nft_thread == NULL (here main() has
g_nft_thread == NULL, and the two nft threads have this pointing at the static global
state for that thread.)
I would like this to stay this way.
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/0e65f920_bd46f367
PS6, Line 120: LOGP(DNFT, LOGL_ERROR, "error running nft cmd %s#%u: rc=%d
cmd=%s\n",
if nft fails, we have no way to say so to the
callback?
yes we have the rc that is carried back to the main thread in the queued
response.
we could argue whether all those failures are handled properly, in nft_thread_t2m_cb().
But the hard part is: what do you want to do when nftables didn't work.
All we can do really is output an error for the user to know and otherwise carry on as we
did, to not completely disrupt service...? Or should we crash the program? Should we
reject the hNodeB because our counters don't work? none of it makes much sense
really... "nft just has to work."
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/795a7c3f_17857e37
PS6, Line 182: enum nft_thread_req_type {
These can probably be moved up before the static
functions.
can, but this is the part of the file that defines the thread request
queue, so i'd like this to be here.
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/f8daf865_8ced844c
PS6, Line 193: static const char * const nft_thread_req_type_name[] = {
osmo_value_string?
no. i can give reasons, but i
hope "no" is sufficient?
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/d21ae530_c830874b
PS6, Line 207: struct {
IIUC these are actually the params for the different
primitives right?
hence the naming yes
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/320add70_b368e60e
PS6, Line 251: OSMO_ASSERT(g_nft_thread);
why do you need the TLS g_nft_hread allocated
statically? Why not allocate it using talloc here? […]
Why allocate it using
talloc?
If you tried to implement that you'd see that it makes no sense to request that.
If at all the main thread would talloc them, but why.
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/a565f984_6fbc3077
PS6, Line 297: static void nft_t2m_enqueue(struct nft_thread *t, struct nft_thread_req
*req)
you may want to use the concept of "worker
thread" vs "main thread", because well, main is also a th […]
The
terms I use so far are "main()" and "(nft-)thread", because main()
isn't started by pthread_create(). Do you want this changed to nft_w2m_ and _m2w_?
This is either just a fringe naming opinion bikeshed (as in, why bother me with a single
letter in a static function's name?) -- or it is an important naming convention i am
not aware of. which is it?
"ctx": i don't like meaningless words like "foo_context" or
"foo_data" or "foo_storage", because every struct foo is all of these,
implicitly. It is not the thread's "context", it is the entire thread and
all of its state. It simply is the thread, period.
I really dislike the name "hnb_context" in osmo-hnbgw. It is just the hnb state,
not some context around the hnb state.
For talloc it is really an actual ctx, and the name fits well there, but anywhere else we
use the "ctx" name wrongly IMHO.
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/2ba58d8a_a76cb26d
PS6, Line 415: static int update_ctr(struct rate_ctr_group *cg, int cgidx, uint64_t
*last_val, uint64_t new_val)
I first though about cgi when reading this. […]
but the second thought made you realize that there is a cg counter group involved,
right =)
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/22ecfa02_078ef35e
PS6, Line 419: rate_ctr_add2(cg, cgidx, new_val - *last_val);
Oh I'm surprised to find this function here. […]
update_ctr does what it does: updating counters. The important part is where is it
called from:
update_ctr() is a code-dedup helper for hnb_update_counters(), that in turn is a
make-my-loop-more-readable helper for main_thread_handle_get_counters_resp(). is that not
clear enough? =)
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539/comment/64391f7e_3ff09bf0
PS6, Line 452: size_t want = g_nft_thread->counters_len + 64;
I'd say usually one does *2 here. […]
I used
\*2 in an array list for a diff algorithm implementation in the
https://gameoftrees.org/
project, and learnt that *2 is bad because it runs away too fast. My code blew up memory
usage, quite unreasonably. 64 is a reasonable amount of hNodeBs that we can tolerate a
reallocation happening for.
since a new hnb showing up is such a rare event, it would even be ok to reallocate for
each single new hnb. So now we reduce by factor 64, way way more than good enough.
(be aware that once we have enough room for the nr of hnb that are in use, there are no
reallocations anymore at all.)
--
To view, visit
https://gerrit.osmocom.org/c/osmo-hnbgw/+/36539?usp=email
To unsubscribe, or for help writing mail filters, visit
https://gerrit.osmocom.org/settings
Gerrit-Project: osmo-hnbgw
Gerrit-Branch: master
Gerrit-Change-Id: Ib2f0a9252715ea4b2fe9c367aa65f771357768ca
Gerrit-Change-Number: 36539
Gerrit-PatchSet: 6
Gerrit-Owner: neels <nhofmeyr(a)sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: laforge <laforge(a)osmocom.org>
Gerrit-Reviewer: pespin <pespin(a)sysmocom.de>
Gerrit-Attention: laforge <laforge(a)osmocom.org>
Gerrit-Attention: pespin <pespin(a)sysmocom.de>
Gerrit-Comment-Date: Wed, 22 May 2024 00:16:09 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: laforge <laforge(a)osmocom.org>
Comment-In-Reply-To: pespin <pespin(a)sysmocom.de>
Gerrit-MessageType: comment