fixeria submitted this change.
trx_toolkit/clck_gen.py: Fix clock generator not to accumulate timing error
CLCKGen currently works as follows:
sleep(ctr_interval)
some work
sleep(ctr_interval)
some work
sleep(ctr_interval)
some work
...
The intent here is to do some work at timestamps that are multiple of ctr_interval,
however the implementation does not match the intent, because
1) sleep(ctr_interval) is not guaranteed by the OS to be ideal, so there
will always be some jitter in actually slept time without any
guarantee that the error will fluctuate over zero without accumulating.
2) "some work" takes some time to run and that time adds again and again
to the current time when next sleep(ctr_interval) starts. As the
result even if sleep implementation would be ideal, then n'th sleep
would start not at
t₀ + n·ctr_interval
but instead at
t₀ + n·ctr_interval + Σ1..n t(work_i)
where trailing Σ term adds over and over as the timing error which can
be seen as e.g. increasing trend of received GSM clock jitter in
https://osmocom.org/issues/4658#note-10 .
The thinko in the clock generator logic is not so much visible if "some
work" takes only a bit of time or is done infrequently. That was
actually the case before fake_trx added tx queueing in 6e1c82d2
(trx_toolkit/transceiver.py: implement the transmit burst queue) because
before that commit some work was only "send IND CLOCK data every ~ 100th
tick". However after 6e1c82d2 the work was adjusted to do linear scan of
tx queue over and over at every tick which amplified error accumulation
and highlighted the problem.
With that tx queuing in fake_trx was disabled in d4ed09df (Revert
"trx_toolkit/transceiver.py: implement the transmit burst queue") with
the rationale being most likely, as https://osmocom.org/issues/4658#note-10 says,
Unfortunately, Python is not fast enough to handle the queues in time.
Despite the relatively low CPU usage, fake_trx.py fails to scheduler
everything during one TDMA frame period. This causes some of our TTCN-3
test cases to fail.
...
Most likely, the problem is that Python's threading.Event is not
accurate enough. Running with SCHED_RR does not change anything.
However with the above analysis we can see that it is the logic in
CLCKgen that needs fixing, not threading.Event . For the reference
threading.Event indeed used dumb timeout implementation on Python2:
https://github.com/python/cpython/blob/2.7-0-g8d21aa21f2c/Lib/threading.py#L597-L615
https://github.com/python/cpython/blob/2.7-0-g8d21aa21f2c/Lib/threading.py#L343-L369
but on Python3 it essentially uses plain Lock.acquire(timeout) which,
under the hood, uses PyThread_acquire_lock_timed - a plain wrapper over
sem_timedwait:
https://github.com/python/cpython/blob/v3.11.9-9-g1b0e63c81b5/Lib/threading.py#L330-L331
https://github.com/python/cpython/blob/v3.11.9-9-g1b0e63c81b5/Modules/_threadmodule.c#L75-L100
https://github.com/python/cpython/blob/v3.11.9-9-g1b0e63c81b5/Python/thread_pthread.h#L480-L491
so at least with py3 there should be no question about threading.Event .
-> Fix timing error accumulation by reworking the clock generator loop
to compensate observed jitter, caused by OS noise and the work
taking time, by adjusting to-sleep δt each tick accordingly.
This is generally good for correctness and will allow us to reinstate
tx queueing in fake_trx.
Without the fix added test fails as
FAIL: test_no_timing_error_accumulated (test_clck_gen.CLCKGen_Test.test_no_timing_error_accumulated)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kirr/src/osmocom/bb/src/target/trx_toolkit/test_clck_gen.py", line 60, in test_no_timing_error_accumulated
self.assertTrue((ntick+1)*clck.ctr_interval > δT, "tick #%d: time overrun by %dµs total" %
AssertionError: False is not true : tick #200: time overrun by 572478µs total
Change-Id: I928801422c9af80c368261f617b91d7ecfedbabf
Related: OS#4658, OS#6672
---
M src/target/trx_toolkit/clck_gen.py
A src/target/trx_toolkit/test_clck_gen.py
2 files changed, 87 insertions(+), 1 deletion(-)
diff --git a/src/target/trx_toolkit/clck_gen.py b/src/target/trx_toolkit/clck_gen.py
index 427eb88..15b653f 100755
--- a/src/target/trx_toolkit/clck_gen.py
+++ b/src/target/trx_toolkit/clck_gen.py
@@ -22,6 +22,7 @@
import logging as log
import threading
+import time
import signal
from app_common import ApplicationBase
@@ -85,7 +86,24 @@
self._breaker.clear()
def _worker(self):
- while not self._breaker.wait(self.ctr_interval):
+ # run .send_clck_ind() every .ctr_interval
+ # be careful not to accumulate timing error when organizing the clock loop
+ ns = 1e-9
+ us = 1e-6
+ t_tick = int(self.ctr_interval // ns)
+ t_next = time.monotonic_ns()
+ while 1:
+ t_next += t_tick
+ t = time.monotonic_ns()
+ dt = (t_next - t)
+ if dt < 0:
+ log.warning("CLCKGen: time overrun by %dus; resetting the clock" % (dt * ns // us))
+ t_next = time.monotonic_ns()
+ dt = 0
+
+ if self._breaker.wait(dt * ns):
+ break
+
self.send_clck_ind()
def send_clck_ind(self):
diff --git a/src/target/trx_toolkit/test_clck_gen.py b/src/target/trx_toolkit/test_clck_gen.py
new file mode 100644
index 0000000..a920ce1
--- /dev/null
+++ b/src/target/trx_toolkit/test_clck_gen.py
@@ -0,0 +1,68 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+# TRX Toolkit
+# Unit test for CLCKGen
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+import threading
+import time
+import unittest
+
+import clck_gen
+
+
+class CLCKGen_Test(unittest.TestCase):
+
+ # verify that timing error is not accumulated in the clock gen loop
+ def test_no_timing_error_accumulated(self):
+ # observe many ticks with spending some noticeable time in clock handler each tick
+ # assert that t(last) - t(first) ≈ ntick·dt(tick)
+ # this will break if the clock generator is not careful to prevent timing error from accumulating
+ clck = clck_gen.CLCKGen([])
+ ntick = 200 # ~ 1 s
+ tick = 0
+ tstart = tend = None
+ done = threading.Event()
+
+ def _(fn):
+ nonlocal tick, tstart, tend
+ if tick == 0:
+ tstart = time.monotonic()
+ if tick == ntick:
+ tend = time.monotonic()
+ done.set()
+ tick += 1
+ time.sleep(clck.ctr_interval / 2)
+ clck.clck_handler = _
+
+ clck.start()
+ try:
+ ok = done.wait(10)
+ self.assertTrue(ok, "clck_gen stuck")
+
+ self.assertIsNotNone(tstart)
+ self.assertIsNotNone(tend)
+ dT = tend - tstart
+ dTok = ntick * clck.ctr_interval
+ dTerr = dT - dTok
+ self.assertTrue((ntick-1)*clck.ctr_interval < dT, "tick #%d: time underrun by %dus total" %
+ (ntick, dTerr // 1e-6))
+ self.assertTrue((ntick+1)*clck.ctr_interval > dT, "tick #%d: time overrun by %dus total" %
+ (ntick, dTerr // 1e-6))
+
+ finally:
+ clck.stop()
+
+
+if __name__ == '__main__':
+ unittest.main()
To view, visit change 39327. To unsubscribe, or for help writing mail filters, visit settings.