Attention is currently required from: dexter.
neels has posted comments on this change. ( https://gerrit.osmocom.org/c/osmo-hnbgw/+/36141?usp=email )
Change subject: improve HNBAP error logging
......................................................................
Patch Set 3:
(1 comment)
Patchset:
PS3:
i'm giving this the benefit of the doubt, because it is so trivial
--
To view, visit https://gerrit.osmocom.org/c/osmo-hnbgw/+/36141?usp=email
To unsubscribe, or for help writing mail filters, visit https://gerrit.osmocom.org/settings
Gerrit-Project: osmo-hnbgw
Gerrit-Branch: master
Gerrit-Change-Id: I17d2809f59087d32e7c11a3ada1d3fadf6f0b660
Gerrit-Change-Number: 36141
Gerrit-PatchSet: 3
Gerrit-Owner: neels <nhofmeyr(a)sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: neels <nhofmeyr(a)sysmocom.de>
Gerrit-Reviewer: pespin <pespin(a)sysmocom.de>
Gerrit-CC: dexter <pmaier(a)sysmocom.de>
Gerrit-Attention: dexter <pmaier(a)sysmocom.de>
Gerrit-Comment-Date: Tue, 11 Jun 2024 00:40:38 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment
Jenkins Builder has posted comments on this change. ( https://gerrit.osmocom.org/c/osmo-hnbgw/+/37188?usp=email )
Change subject: nft-kpi: remove X34 drifting: adjust delay by elapsed time
......................................................................
Patch Set 1:
(1 comment)
File src/osmo-hnbgw/nft_kpi.c:
Robot Comment from checkpatch (run ID jenkins-gerrit-lint-16677):
https://gerrit.osmocom.org/c/osmo-hnbgw/+/37188/comment/86626ffa_5787dfd3
PS1, Line 729: LOGP(DNFT, LOGL_DEBUG, "nft-kpi: scheduling timer: period is %ld.%06ld s, next occurence in %ld.%06ld s\n",
'occurence' may be misspelled - perhaps 'occurrence'?
--
To view, visit https://gerrit.osmocom.org/c/osmo-hnbgw/+/37188?usp=email
To unsubscribe, or for help writing mail filters, visit https://gerrit.osmocom.org/settings
Gerrit-Project: osmo-hnbgw
Gerrit-Branch: master
Gerrit-Change-Id: I04f572890c04a48bb19c59f613a492ef96624baa
Gerrit-Change-Number: 37188
Gerrit-PatchSet: 1
Gerrit-Owner: neels <nhofmeyr(a)sysmocom.de>
Gerrit-CC: Jenkins Builder
Gerrit-Comment-Date: Tue, 11 Jun 2024 00:38:29 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment
neels has uploaded this change for review. ( https://gerrit.osmocom.org/c/osmo-hnbgw/+/37187?usp=email )
Change subject: nft-kpi: log errors of counter retrieval
......................................................................
nft-kpi: log errors of counter retrieval
Make sure errors of getting counters from nft are logged.
Some context: we'll try again each X34 period, hence this is only a
problem when the error persists.
For example, when the get-counters thread is faster than the maintenance
thread can even create the nft table initially, this error will show.
We could add a definite check whether the maintenance thread has created
the tables yet, and wait for that event. But this complexity is not
really needed: it is fine to just fail getting counters once or twice.
Related: SYS#6773
Change-Id: I84340482e4a5bfcac158a21c9378a9511fa5ea10
---
M src/osmo-hnbgw/nft_kpi.c
1 file changed, 30 insertions(+), 1 deletion(-)
git pull ssh://gerrit.osmocom.org:29418/osmo-hnbgw refs/changes/87/37187/1
diff --git a/src/osmo-hnbgw/nft_kpi.c b/src/osmo-hnbgw/nft_kpi.c
index 5f2359b..4fa932f 100644
--- a/src/osmo-hnbgw/nft_kpi.c
+++ b/src/osmo-hnbgw/nft_kpi.c
@@ -990,7 +990,14 @@
break;
case NFT_THREAD_GET_COUNTERS:
- main_thread_handle_get_counters_resp(req);
+ if (req->rc) {
+ /* Maybe we requested counters before the table was created by the maintenance thread. If so,
+ * it's not harmful and we can just try again next time. */
+ LOGP(DNFT, LOGL_ERROR, "Retrieving counters from nftables failed. Trying again (timer X34).\n");
+ } else {
+ main_thread_handle_get_counters_resp(req);
+ }
+ /* Schedule the next counter retrieval. */
nft_kpi_get_counters_schedule();
break;
--
To view, visit https://gerrit.osmocom.org/c/osmo-hnbgw/+/37187?usp=email
To unsubscribe, or for help writing mail filters, visit https://gerrit.osmocom.org/settings
Gerrit-Project: osmo-hnbgw
Gerrit-Branch: master
Gerrit-Change-Id: I84340482e4a5bfcac158a21c9378a9511fa5ea10
Gerrit-Change-Number: 37187
Gerrit-PatchSet: 1
Gerrit-Owner: neels <nhofmeyr(a)sysmocom.de>
Gerrit-MessageType: newchange
neels has uploaded this change for review. ( https://gerrit.osmocom.org/c/osmo-hnbgw/+/37188?usp=email )
Change subject: nft-kpi: remove X34 drifting: adjust delay by elapsed time
......................................................................
nft-kpi: remove X34 drifting: adjust delay by elapsed time
Change-Id: I04f572890c04a48bb19c59f613a492ef96624baa
---
M include/osmocom/hnbgw/hnbgw.h
M src/osmo-hnbgw/nft_kpi.c
2 files changed, 48 insertions(+), 6 deletions(-)
git pull ssh://gerrit.osmocom.org:29418/osmo-hnbgw refs/changes/88/37188/1
diff --git a/include/osmocom/hnbgw/hnbgw.h b/include/osmocom/hnbgw/hnbgw.h
index 36358b1..a647abd 100644
--- a/include/osmocom/hnbgw/hnbgw.h
+++ b/include/osmocom/hnbgw/hnbgw.h
@@ -499,6 +499,7 @@
struct {
bool active;
struct osmo_timer_list get_counters_timer;
+ struct timespec next_timer;
} nft_kpi;
};
diff --git a/src/osmo-hnbgw/nft_kpi.c b/src/osmo-hnbgw/nft_kpi.c
index 4fa932f..aff981f 100644
--- a/src/osmo-hnbgw/nft_kpi.c
+++ b/src/osmo-hnbgw/nft_kpi.c
@@ -690,15 +690,47 @@
/* main thread */
static void nft_kpi_get_counters_schedule(void)
{
- unsigned long period_s;
+ struct timespec now;
+ struct timespec period;
+ struct timespec diff;
+ struct timespec *next = &g_hnbgw->nft_kpi.next_timer;
unsigned long period_us = osmo_tdef_get(hnbgw_T_defs, -34, OSMO_TDEF_US, 1000000);
- if (period_us < 1)
- period_us = 1;
- period_s = period_us / 1000000;
- period_us %= 1000000;
+ period.tv_sec = period_us / 1000000;
+ period.tv_nsec = (period_us % 1000000) * 1000;
+
+ /* Try to keep the period of getting counters close to the configured period, i.e. don't drift by the time it
+ * takes to read counters. */
+ osmo_clock_gettime(CLOCK_MONOTONIC, &now);
+
+ if (!next->tv_sec && !next->tv_nsec) {
+ /* Not yet initialized. Schedule to get counters one 'period' from 'now':
+ * Set 'next' to 'now', and the period is added by timespecadd() below.
+ * (We could retrieve counters immediately -- but at startup counters are then queried even before the
+ * nft table was created by the maintenance thread. That is not harmful, but it causes an ugly error
+ * message in the logs. So rather wait one period.)
+ */
+ *next = now;
+ }
+ timespecadd(next, &period, next);
+ if (timespeccmp(next, &now, <)) {
+ /* The time that has elapsed since last scheduling counter retrieval is already more than the configured
+ * period. Continue counting the time period from 'now', and ask for counters right now. */
+ timespecsub(&now, next, &diff);
+ LOGP(DNFT, LOGL_NOTICE, "nft-kpi: retrieving counters took %ld.%06ld s longer"
+ " than the timeout configured in timer hnbgw X34.\n", diff.tv_sec, diff.tv_nsec / 1000);
+ *next = now;
+ nft_kpi_get_counters_cb(NULL);
+ return;
+ }
+
+ /* next > now, wait for the remaining time. */
+ timespecsub(next, &now, &diff);
+ LOGP(DNFT, LOGL_DEBUG, "nft-kpi: scheduling timer: period is %ld.%06ld s, next occurence in %ld.%06ld s\n",
+ period.tv_sec, period.tv_nsec / 1000,
+ diff.tv_sec, diff.tv_nsec / 1000);
osmo_timer_setup(&g_hnbgw->nft_kpi.get_counters_timer, nft_kpi_get_counters_cb, NULL);
- osmo_timer_schedule(&g_hnbgw->nft_kpi.get_counters_timer, period_s, period_us);
+ osmo_timer_schedule(&g_hnbgw->nft_kpi.get_counters_timer, diff.tv_sec, diff.tv_nsec / 1000);
}
/* from main(), initialize all worker threads and other nft state. */
--
To view, visit https://gerrit.osmocom.org/c/osmo-hnbgw/+/37188?usp=email
To unsubscribe, or for help writing mail filters, visit https://gerrit.osmocom.org/settings
Gerrit-Project: osmo-hnbgw
Gerrit-Branch: master
Gerrit-Change-Id: I04f572890c04a48bb19c59f613a492ef96624baa
Gerrit-Change-Number: 37188
Gerrit-PatchSet: 1
Gerrit-Owner: neels <nhofmeyr(a)sysmocom.de>
Gerrit-MessageType: newchange
neels has submitted this change. ( https://gerrit.osmocom.org/c/osmo-hnbgw/+/37147?usp=email )
Change subject: prevent use-after-free after FSM instance termination
......................................................................
prevent use-after-free after FSM instance termination
- Set osmo_fsm_set_dealloc_ctx(OTC_SELECT) in osmo-hnbgw's main().
- Only dispatch RANAP when FSM instances aren't terminated.
This way we possibly pre-empt use-after-free crashes for deallocating
FSM "nests" for obscure corner cases.
Use-after-free is a general problem for FSM design. For this, we created
osmo_fsm_set_dealloc_ctx(): When an FSM is terminated, move it to a
separate talloc context, instead of being deallocated.
An actual use-after-free was observed as described in OS#6484, but that
needs a separate, orthogonal fix:
When the Iuh link is lost while the CN link is waiting for SCCP CC or
CREF -- the better solution is described in OS#6085: don't wait for CC
at all, just dispatch DISCONN to SCCP-SCOC.
So even though the code where a crash was observed will be removed, this
patch is a general safeguard against corner case crashes, improving
general stability.
Related: OS#6484
Change-Id: Ib41e1a996aaa03221e73643636140947ac8f99e2
---
M src/osmo-hnbgw/context_map_rua.c
M src/osmo-hnbgw/context_map_sccp.c
M src/osmo-hnbgw/osmo_hnbgw_main.c
3 files changed, 42 insertions(+), 0 deletions(-)
Approvals:
fixeria: Looks good to me, but someone else must approve
Jenkins Builder: Verified
laforge: Looks good to me, approved
pespin: Looks good to me, but someone else must approve
diff --git a/src/osmo-hnbgw/context_map_rua.c b/src/osmo-hnbgw/context_map_rua.c
index 3c6ea1e..7286906 100644
--- a/src/osmo-hnbgw/context_map_rua.c
+++ b/src/osmo-hnbgw/context_map_rua.c
@@ -158,6 +158,11 @@
static int handle_rx_rua(struct osmo_fsm_inst *fi, struct msgb *ranap_msg)
{
struct hnbgw_context_map *map = fi->priv;
+
+ /* If the FSM instance has already terminated, don't dispatch anything. */
+ if (fi->proc.terminating)
+ return 0;
+
if (!msg_has_l2_data(ranap_msg))
return 0;
diff --git a/src/osmo-hnbgw/context_map_sccp.c b/src/osmo-hnbgw/context_map_sccp.c
index 10ddaf2..511577c 100644
--- a/src/osmo-hnbgw/context_map_sccp.c
+++ b/src/osmo-hnbgw/context_map_sccp.c
@@ -198,6 +198,10 @@
struct hnbgw_context_map *map = fi->priv;
int rc;
+ /* If the FSM instance has already terminated, don't dispatch anything. */
+ if (fi->proc.terminating)
+ return 0;
+
/* When there was no message received along with the received event, then there is nothing to forward to RUA. */
if (!msg_has_l2_data(ranap_msg))
return 0;
diff --git a/src/osmo-hnbgw/osmo_hnbgw_main.c b/src/osmo-hnbgw/osmo_hnbgw_main.c
index 2f71251..3e715b0 100644
--- a/src/osmo-hnbgw/osmo_hnbgw_main.c
+++ b/src/osmo-hnbgw/osmo_hnbgw_main.c
@@ -360,6 +360,8 @@
signal(SIGUSR2, &signal_handler);
osmo_init_ignore_signals();
+ osmo_fsm_set_dealloc_ctx(OTC_SELECT);
+
while (1) {
rc = osmo_select_main_ctx(0);
if (rc < 0)
--
To view, visit https://gerrit.osmocom.org/c/osmo-hnbgw/+/37147?usp=email
To unsubscribe, or for help writing mail filters, visit https://gerrit.osmocom.org/settings
Gerrit-Project: osmo-hnbgw
Gerrit-Branch: master
Gerrit-Change-Id: Ib41e1a996aaa03221e73643636140947ac8f99e2
Gerrit-Change-Number: 37147
Gerrit-PatchSet: 2
Gerrit-Owner: neels <nhofmeyr(a)sysmocom.de>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: fixeria <vyanitskiy(a)sysmocom.de>
Gerrit-Reviewer: laforge <laforge(a)osmocom.org>
Gerrit-Reviewer: neels <nhofmeyr(a)sysmocom.de>
Gerrit-Reviewer: pespin <pespin(a)sysmocom.de>
Gerrit-MessageType: merged