fixeria has uploaded this change for review. ( https://gerrit.osmocom.org/c/osmo-bts/+/42871?usp=email )
Change subject: osmo-bts-trx: fix spurious shutdown on first CLCK.ind from osmo-trx
......................................................................
osmo-bts-trx: fix spurious shutdown on first CLCK.ind from osmo-trx
osmo-trx starts its frame counter from a random value rather than 0. When the first CLCK.ind arrives, last_fn_timer and last_clk_ind are still zero-initialised (set by trx_sched_clock_started()), so:
* compute_elapsed_fn(0, fn) wraps to a large negative for any fn greater than hyperframe/2 (1357824), satisfying elapsed_fn < 0;
* compute_elapsed_us({0,0}, &tv_now) returns the full CLOCK_MONOTONIC uptime (potentially days), satisfying the error_us threshold.
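The two effects above can be reproduced with a minimal standalone sketch. The helpers below are hypothetical stand-ins written from the description in this message, not the actual compute_elapsed_fn() / compute_elapsed_us() from osmo-bts; in particular, the fold point at hyperframe/2 is an assumption taken from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define HYPERFRAME 2715648 /* GSM TDMA hyperframe length, in frames */

/* Hypothetical stand-in for compute_elapsed_fn(): signed FN distance
 * from 'last' to 'now', folded so that distances greater than half a
 * hyperframe are reported as negative (i.e. "now is behind last"). */
static int elapsed_fn_sketch(uint32_t last, uint32_t now)
{
	assert(last < HYPERFRAME && now < HYPERFRAME);

	int elapsed = (int)((now + HYPERFRAME - last) % HYPERFRAME);
	if (elapsed > HYPERFRAME / 2)
		elapsed -= HYPERFRAME;
	return elapsed;
}

/* Hypothetical stand-in for compute_elapsed_us(): microseconds
 * elapsed between two timespec values. */
static int64_t elapsed_us_sketch(const struct timespec *last,
				 const struct timespec *now)
{
	return (int64_t)(now->tv_sec - last->tv_sec) * 1000000
	     + (now->tv_nsec - last->tv_nsec) / 1000;
}
```

With a zeroed baseline, elapsed_fn_sketch(0, 1456348) yields -1259300, and a CLOCK_MONOTONIC reading of 250957.770198 s compared against {0,0} yields 250957770198 us, which matches the figure in the log excerpt below.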
Together these trip the stale-clock shutdown introduced in the previous commit (0199c108), even though the transceiver is perfectly healthy:
DL1C NOTICE scheduler_trx.c:490 GSM clock started, waiting for clock indications
DL1C FATAL scheduler_trx.c:589 Stale CLCK.ind: fn=1456348 is 250957770198 us behind
DOML NOTICE bts_shutdown_fsm.c:268 BTS_SHUTDOWN(bts0){NONE}: Shutting down BTS, exit 1, reason: TRX clock skew too high
Fix by adding clk_ind_received to osmo_trx_clock_state. On the first CLCK.ind after a (re)start, skip all elapsed-time checks and directly bootstrap the scheduler from the reported FN. The stale-clock detection remains fully active for every subsequent indication, where last_clk_ind holds a real baseline.
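The intended behaviour can be modelled in isolation roughly as follows. This is a simplified sketch, not the code in scheduler_trx.c: the struct, the enum, the function names and the stale condition (FN going backwards while more than max_error_us has elapsed) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define HYPERFRAME 2715648 /* GSM TDMA hyperframe length, in frames */

enum clck_result { CLCK_BOOTSTRAP, CLCK_OK, CLCK_STALE };

/* Hypothetical, reduced model of the clock state. Zero-initialising it
 * (as is assumed to happen on every (re)start) re-arms the guard. */
struct clock_state_sketch {
	bool clk_ind_received; /* set once the first CLCK.ind was seen */
	uint32_t last_fn;      /* FN of the previous indication */
	int64_t last_us;       /* monotonic time of the previous indication */
};

/* Signed FN distance, negative if 'now' is behind 'last'. */
static int elapsed_fn_sketch(uint32_t last, uint32_t now)
{
	int elapsed = (int)((now + HYPERFRAME - last) % HYPERFRAME);
	if (elapsed > HYPERFRAME / 2)
		elapsed -= HYPERFRAME;
	return elapsed;
}

static enum clck_result on_clck_ind_sketch(struct clock_state_sketch *cs,
					   uint32_t fn, int64_t now_us,
					   int64_t max_error_us)
{
	/* First indication: no valid baseline yet, so skip every
	 * elapsed-time check and adopt the reported FN as-is. */
	if (!cs->clk_ind_received) {
		cs->clk_ind_received = true;
		cs->last_fn = fn;
		cs->last_us = now_us;
		return CLCK_BOOTSTRAP;
	}

	/* Subsequent indications: the baseline is real, checks apply. */
	int elapsed_fn = elapsed_fn_sketch(cs->last_fn, fn);
	int64_t elapsed_us = now_us - cs->last_us;
	if (elapsed_fn < 0 && elapsed_us > max_error_us)
		return CLCK_STALE;

	cs->last_fn = fn;
	cs->last_us = now_us;
	return CLCK_OK;
}
```

In this model, the values from the log above (fn=1456348 at an uptime of 250957770198 us) produce CLCK_BOOTSTRAP on a freshly zeroed state instead of a stale-clock verdict, while a later indication whose FN moves backwards is still reported as CLCK_STALE.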
Change-Id: I25e76e02d29fd8f88130d15d0adfe8d90a017924
Fixes: 0199c108 ("osmo-bts-trx: shut down on stale clock indication from transceiver")
Related: OS#7021
---
M src/osmo-bts-trx/l1_if.h
M src/osmo-bts-trx/scheduler_trx.c
2 files changed, 15 insertions(+), 0 deletions(-)
git pull ssh://gerrit.osmocom.org:29418/osmo-bts refs/changes/71/42871/1
diff --git a/src/osmo-bts-trx/l1_if.h b/src/osmo-bts-trx/l1_if.h
index 09bf7ac..d0a1347 100644
--- a/src/osmo-bts-trx/l1_if.h
+++ b/src/osmo-bts-trx/l1_if.h
@@ -39,6 +39,8 @@
 struct osmo_trx_clock_state {
 	/*! number of FN periods without TRX clock indication */
 	uint32_t fn_without_clock_ind;
+	/*! set to true once the first clock indication has been received */
+	bool clk_ind_received;
 	struct {
 		/*! last FN we processed based on FN period timer */
 		uint32_t fn;
diff --git a/src/osmo-bts-trx/scheduler_trx.c b/src/osmo-bts-trx/scheduler_trx.c
index 105ad38..a4585ae 100644
--- a/src/osmo-bts-trx/scheduler_trx.c
+++ b/src/osmo-bts-trx/scheduler_trx.c
@@ -545,6 +545,19 @@
 	clock_gettime(CLOCK_MONOTONIC, &tv_now);

+	/* First clock indication after (re)start: bootstrap the clock from whatever
+	 * FN the transceiver reports. osmo-trx starts its frame counter from a
+	 * random value, so elapsed_fn and elapsed_us comparisons against the
+	 * zero-initialised last_fn_timer / last_clk_ind would be nonsensical and
+	 * would incorrectly trigger the stale-clock shutdown path. */
+	if (!tcs->clk_ind_received) {
+		LOGP(DL1C, LOGL_NOTICE, "GSM clock started: first CLCK.ind fn=%u\n", fn);
+		tcs->last_clk_ind.tv = tv_now;
+		tcs->last_clk_ind.fn = fn;
+		tcs->clk_ind_received = true;
+		return trx_setup_clock(bts, tcs, &tv_now, &interval, fn);
+	}
+
 	/* calculate elapsed time +fn since last timer */
 	elapsed_us = compute_elapsed_us(&tcs->last_fn_timer.tv, &tv_now);
 	elapsed_fn = compute_elapsed_fn(tcs->last_fn_timer.fn, fn);