fixeria has uploaded this change for review.

osmo-bts-trx: fix spurious shutdown on first CLCK.ind from osmo-trx

osmo-trx starts its frame counter from a random value rather than 0.
When the first CLCK.ind arrives, last_fn_timer and last_clk_ind are
still zero-initialised (set by trx_sched_clock_started()), so:

* compute_elapsed_fn(0, fn) wraps to a large negative value for any fn
  greater than hyperframe/2 (1357824), satisfying elapsed_fn < 0;
* compute_elapsed_us({0,0}, &tv_now) returns the full CLOCK_MONOTONIC
  uptime (potentially days), satisfying the error_us threshold.
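
The arithmetic behind both bullets can be reproduced with a small standalone
sketch. The helpers model_elapsed_fn() and model_elapsed_us() are hypothetical
stand-ins written for illustration. They assume the half-hyperframe wrap
described above and are not the osmo-bts implementations of
compute_elapsed_fn() / compute_elapsed_us().

```c
#include <stdint.h>
#include <time.h>

/* GSM hyperframe length in TDMA frames: 2048 * 26 * 51 */
#define HYPERFRAME 2715648

/* Hypothetical model (assumption, not the osmo-bts code): signed FN distance,
 * where a forward distance above half a hyperframe is read as negative. */
static int32_t model_elapsed_fn(uint32_t last_fn, uint32_t cur_fn)
{
	int32_t d = (int32_t)((cur_fn + HYPERFRAME - last_fn) % HYPERFRAME);

	if (d > HYPERFRAME / 2)
		d -= HYPERFRAME;
	return d;
}

/* Hypothetical model (assumption): microseconds elapsed between two timestamps. */
static int64_t model_elapsed_us(const struct timespec *last, const struct timespec *now)
{
	return (int64_t)(now->tv_sec - last->tv_sec) * 1000000
	     + (int64_t)(now->tv_nsec - last->tv_nsec) / 1000;
}
```

Under these assumptions, model_elapsed_fn(0, 1456348) evaluates to -1259300,
and a zeroed baseline against a monotonic clock reading of 250957.770198 s
yields 250957770198 us (roughly 2.9 days), the figure that appears in the
FATAL log line quoted in this message.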

Together these trip the stale-clock shutdown introduced in the previous
commit (0199c108), even though the transceiver is perfectly healthy:

DL1C NOTICE scheduler_trx.c:490 GSM clock started, waiting for clock indications
DL1C FATAL scheduler_trx.c:589 Stale CLCK.ind: fn=1456348 is 250957770198 us behind
DOML NOTICE bts_shutdown_fsm.c:268 BTS_SHUTDOWN(bts0){NONE}: Shutting down BTS, exit 1, reason: TRX clock skew too high

Fix by adding clk_ind_received to osmo_trx_clock_state. On the first
CLCK.ind after a (re)start, skip all elapsed-time checks and directly
bootstrap the scheduler from the reported FN. The stale-clock
detection remains fully active for every subsequent indication,
where last_clk_ind holds a real baseline.
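
The intended behaviour can be modelled in isolation as follows. This is a
simplified sketch, not the patched function: struct model_clock_state,
model_clock_ind() and model_clock_restart() are hypothetical names, the stale
check is reduced to "elapsed FN is negative" (the elapsed-time threshold is
left out), and the restart is assumed to zero the whole state.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* GSM hyperframe length in TDMA frames: 2048 * 26 * 51 */
#define HYPERFRAME 2715648

enum clk_result { CLK_BOOTSTRAPPED, CLK_OK, CLK_STALE };

/* Hypothetical, reduced stand-in for the clock state (assumption) */
struct model_clock_state {
	bool clk_ind_received;	/* false until the first indication arrives */
	uint32_t last_fn;	/* baseline FN, only meaningful once the flag is set */
};

/* Assumed restart behaviour: the whole state is zeroed, clearing the flag */
static void model_clock_restart(struct model_clock_state *s)
{
	memset(s, 0, sizeof(*s));
}

static enum clk_result model_clock_ind(struct model_clock_state *s, uint32_t fn)
{
	int32_t elapsed_fn;

	/* First indication: no baseline exists yet, so adopt the reported FN
	 * without comparing it against anything. */
	if (!s->clk_ind_received) {
		s->last_fn = fn;
		s->clk_ind_received = true;
		return CLK_BOOTSTRAPPED;
	}

	/* Later indications: signed FN distance against the real baseline */
	elapsed_fn = (int32_t)((fn + HYPERFRAME - s->last_fn) % HYPERFRAME);
	if (elapsed_fn > HYPERFRAME / 2)
		elapsed_fn -= HYPERFRAME;
	if (elapsed_fn < 0)
		return CLK_STALE;

	s->last_fn = fn;
	return CLK_OK;
}
```

In this model an arbitrary starting FN such as 1456348 is accepted on the
first call, while a later indication that moves backwards is still reported
as stale.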

Change-Id: I25e76e02d29fd8f88130d15d0adfe8d90a017924
Fixes: 0199c108 ("osmo-bts-trx: shut down on stale clock indication from transceiver")
Related: OS#7021
---
M src/osmo-bts-trx/l1_if.h
M src/osmo-bts-trx/scheduler_trx.c
2 files changed, 15 insertions(+), 0 deletions(-)

git pull ssh://gerrit.osmocom.org:29418/osmo-bts refs/changes/71/42871/1
diff --git a/src/osmo-bts-trx/l1_if.h b/src/osmo-bts-trx/l1_if.h
index 09bf7ac..d0a1347 100644
--- a/src/osmo-bts-trx/l1_if.h
+++ b/src/osmo-bts-trx/l1_if.h
@@ -39,6 +39,8 @@
 struct osmo_trx_clock_state {
 	/*! number of FN periods without TRX clock indication */
 	uint32_t fn_without_clock_ind;
+	/*! set to true once the first clock indication has been received */
+	bool clk_ind_received;
 	struct {
 		/*! last FN we processed based on FN period timer */
 		uint32_t fn;
diff --git a/src/osmo-bts-trx/scheduler_trx.c b/src/osmo-bts-trx/scheduler_trx.c
index 105ad38..a4585ae 100644
--- a/src/osmo-bts-trx/scheduler_trx.c
+++ b/src/osmo-bts-trx/scheduler_trx.c
@@ -545,6 +545,19 @@

 	clock_gettime(CLOCK_MONOTONIC, &tv_now);

+	/* First clock indication after (re)start: bootstrap the clock from whatever
+	 * FN the transceiver reports. osmo-trx starts its frame counter from a
+	 * random value, so elapsed_fn and elapsed_us comparisons against the
+	 * zero-initialised last_fn_timer / last_clk_ind would be nonsensical and
+	 * would incorrectly trigger the stale-clock shutdown path. */
+	if (!tcs->clk_ind_received) {
+		LOGP(DL1C, LOGL_NOTICE, "GSM clock started: first CLCK.ind fn=%u\n", fn);
+		tcs->last_clk_ind.tv = tv_now;
+		tcs->last_clk_ind.fn = fn;
+		tcs->clk_ind_received = true;
+		return trx_setup_clock(bts, tcs, &tv_now, &interval, fn);
+	}
+
 	/* calculate elapsed time +fn since last timer */
 	elapsed_us = compute_elapsed_us(&tcs->last_fn_timer.tv, &tv_now);
 	elapsed_fn = compute_elapsed_fn(tcs->last_fn_timer.fn, fn);

Gerrit-MessageType: newchange
Gerrit-Project: osmo-bts
Gerrit-Branch: master
Gerrit-Change-Id: I25e76e02d29fd8f88130d15d0adfe8d90a017924
Gerrit-Change-Number: 42871
Gerrit-PatchSet: 1
Gerrit-Owner: fixeria <vyanitskiy@sysmocom.de>