kirr has uploaded this change for review.


trx_toolkit/clck_gen: Don't use threads because Python GIL is latency killer

Currently, running fake_trx.py with just 1 BTS and 1 ccch_scan shows fairly
high CPU usage on 2 cores and regular time overruns in CLCKGen:

1 BTS + 1 ccch_scan:

2 threads: first ~ 85-90% CPU, second ~ 50% CPU; regular time overruns in CLCKGen

Trying to run fake_trx.py with 1 BTS and 2 ccch_scan shows even more CPU
usage and non-stop CLCKGen time overruns:

1 BTS + 2 ccch_scan:

2 threads: first ~ 100% CPU, second ~ 70% CPU; time overrun all the time

with the BTS also emitting messages like the following all the time:

20250416102139774 <0006> scheduler_trx.c:591 We were 42 FN faster than TRX, compensating
20250416102140358 <0006> scheduler_trx.c:591 We were 24 FN faster than TRX, compensating

Profiling fake_trx.py for the 1 BTS + 1 ccch_scan case via perf (see
https://docs.python.org/3/howto/perf_profiling.html) as

$ perf record -g python -X perf fake_trx.py --trx M2@127.0.0.1:7700 --trx M3@127.0.0.1:8700

shows that, besides in-kernel work related to send/recv/select syscalls,
the system is also frequently releasing/reacquiring the GIL:

http://navytux.spb.ru/~kirr/osmo/fake_trx/gil-pingpong.html (search for "gil" and "Thread" there)

This is understandable because in the current architecture there are 2
threads:

- T1 is running CLCKGen loop + TX work
- T2 is running FakeTRX loop + RX work

Unfortunately, even though the functions that take/release the GIL are
not themselves large in the profile, the GIL still creates a huge latency
problem, and the fake_trx CLCKGen time overruns are directly related to
latency: an overrun means that CLCKGen could not be woken up in time and
missed 2 GSM frames during which it should have forwarded queued bursts.

The GIL creates latency problems for the following reason: in py2 the
main interpreter evaluator checked for a GIL-acquisition request every
100 bytecode instructions, and if another python thread had posted such
a request, it made the switch. In py3, however, the way the GIL works
was changed: instead of switching every 100 instructions, the GIL was
made to preemptively switch every 5 _milliseconds_ to avoid unnecessary
switches. This helped the throughput of computation-heavy workloads, but
harmed workloads that are latency sensitive.
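As a small standalone illustration (not part of this change), the py3
switch interval is visible and tunable via sys.getswitchinterval() /
sys.setswitchinterval(); lowering it is only a partial mitigation, since
it shortens the stall but does not fix the wakeup-ordering problem:

```python
import sys

# py3 exposes the GIL preemption interval directly;
# the default is 5 ms (0.005 s).
print(sys.getswitchinterval())

# It can be lowered to trade throughput for latency, but this only
# shortens the worst-case stall, it does not remove it.
sys.setswitchinterval(0.0005)
print(sys.getswitchinterval())
```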

Please see the following presentation by David Beazley with an overview
of how the GIL works on py2 and py3:

https://speakerdeck.com/dabeaz/understanding-the-python-gil
https://dabeaz.com/GIL/

That presentation actually mentions the GIL latency problem on py3,
which is further explained at

https://www.dabeaz.com/blog/2010/02/revisiting-thread-priorities-and-new.html

That problem was filed in the Python issue tracker in the same year, 2010:

https://bugs.python.org/issue7946

but was never addressed despite several proposed patches. In the end the
issue was closed by the original reporter ten years later:

https://bugs.python.org/issue7946#msg377865

So, despite this problem being known for many years and several patches
having been proposed, there is practically no hope that it will ever be
addressed. What happens in the fake_trx case is that the CLCKGen thread
can be prevented from being woken up in time if the main-loop thread is
e.g. busy decoding received bursts. That's why we see lots of time
overruns: the Python GIL alone can practically waste 5 milliseconds. Not
to mention that there are more problems related to threads and frequent
switching between them, e.g. the 2 cores needing to ping-pong data
between their caches and similar things.
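Why a 5 ms stall translates into missed frames can be seen from simple
arithmetic with the constants from clck_gen.py; this is just an
illustration, not code from the patch:

```python
# GSM TDMA frame duration used by CLCKGen (clck_gen.py: GSM_FRAME_US)
GSM_FRAME_US = 4615.0

# worst-case single GIL preemption stall on py3
GIL_STALL_US = 5 * 1000

# one stall already spans more than one whole frame period, so a clock
# thread that slept through it straddles at least two frame boundaries
print(GIL_STALL_US / GSM_FRAME_US)
```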

-> cancel all that threading to avoid latency hits and organize the
CLCKGen timer to be hooked into the sole IO loop by way of the timerfd
system call.
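The resulting scheme can be sketched as follows; this is an illustrative
standalone sketch (function and variable names are mine, not from the
patch) of a periodic timerfd driven from a select-based IO loop. It
requires Linux and Python >= 3.13 and degrades to returning None
elsewhere:

```python
import os
import select
import sys
import time

def run_ticks(n, interval_ns=4_615_000):
    """Drive n periodic ticks from a select loop via timerfd.

    Sketch only; needs Linux + Python >= 3.13 (os.timerfd_create & friends).
    """
    if not hasattr(os, "timerfd_create"):
        return None                     # timerfd API not available
    tfd = os.timerfd_create(time.CLOCK_MONOTONIC)
    try:
        # arm the timer: first expiry and period = one GSM frame
        os.timerfd_settime_ns(tfd, initial=interval_ns, interval=interval_ns)
        ticks = 0
        while ticks < n:
            # the timer fd is just one more fd in the IO loop, next to
            # the TRX data/ctrl sockets in the fake_trx case
            r_event, _, _ = select.select([tfd], [], [])
            if tfd in r_event:
                # the 8-byte read yields the expiration count since the
                # last read; a value > 1 means some periods were overrun,
                # and the kernel resets the clock by itself
                buf = os.read(tfd, 8)
                ticks += int.from_bytes(buf, sys.byteorder)
    finally:
        os.close(tfd)
    return ticks

if __name__ == "__main__":
    print(run_ticks(5))
```

Because the kernel keeps counting expirations on an absolute schedule,
the loop does not accumulate timing error and overruns are detected for
free from the read count.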

This helps latency and CPU usage significantly. After this patch:

1 BTS + 1 ccch_scan:

1 thread ~ 55% CPU; no CLCKGen time overruns

1 BTS + 2 ccch_scan:

1 thread ~ 65% CPU; seldom CLCKGen time overruns

Unfortunately, os.timerfd_create() & friends are only available starting
from Python 3.13. If this poses a problem, it can be easily solved by
doing those timerfd-related system calls in Cython, after we switch most
of the codebase to Cython later.

Change-Id: Iaa675c95059ec8ccfad667f69984d5a7f608c249
---
M src/target/trx_toolkit/.gitignore
M src/target/trx_toolkit/clck_gen.py
M src/target/trx_toolkit/fake_trx.py
M src/target/trx_toolkit/test_clck_gen.py
4 files changed, 63 insertions(+), 65 deletions(-)

git pull ssh://gerrit.osmocom.org:29418/osmocom-bb refs/changes/47/40047/1
diff --git a/src/target/trx_toolkit/.gitignore b/src/target/trx_toolkit/.gitignore
index 749ccda..dc634e2 100644
--- a/src/target/trx_toolkit/.gitignore
+++ b/src/target/trx_toolkit/.gitignore
@@ -2,3 +2,5 @@
__pycache__/
*.py[cod]
*$py.class
+/perf.data*
+/flamegraph.html
diff --git a/src/target/trx_toolkit/clck_gen.py b/src/target/trx_toolkit/clck_gen.py
index f769c3e..a223572 100755
--- a/src/target/trx_toolkit/clck_gen.py
+++ b/src/target/trx_toolkit/clck_gen.py
@@ -21,25 +21,26 @@
APP_CR_HOLDERS = [("2017-2019", "Vadim Yanitskiy <axilirator@gmail.com>")]

import logging as log
-import threading
import time
import signal
import os
+import sys

from app_common import ApplicationBase
from udp_link import UDPLink
from gsm_shared import *

+
+ns = 1e-9
+us = 1e-6
+
+
class CLCKGen:
# GSM TDMA definitions
SEC_DELAY_US = 1000 * 1000
GSM_FRAME_US = 4615.0

- def __init__(self, clck_links, clck_start = 0, ind_period = 102, sched_rr_prio = None):
- # This event is needed to control the thread
- self._breaker = threading.Event()
- self._thread = None
-
+ def __init__(self, clck_links, clck_start = 0, ind_period = 102):
self.clck_links = clck_links
self.ind_period = ind_period
self.clck_start = clck_start
@@ -47,71 +48,56 @@
# Calculate counter time
self.ctr_interval = self.GSM_FRAME_US
self.ctr_interval /= self.SEC_DELAY_US
+ self._t_tick = int(self.ctr_interval // ns)
+
+ # Initialize timer fd
+ self._timerfd = os.timerfd_create(time.CLOCK_MONOTONIC)

# (Optional) clock consumer
self.clck_handler = None

- # RR Scheduler priority of thread. None = don't set it.
- self.sched_rr_prio = sched_rr_prio
+ def __del__(self):
+ os.close(self._timerfd)

@property
def running(self):
- if self._thread is None:
- return False
- return self._thread.is_alive()
+ t_next, _ = os.timerfd_gettime_ns(self._timerfd)
+ return (t_next != 0)

def start(self):
- # Make sure we won't start two threads
- assert(self._thread is None)
-
# (Re)set the clock counter
self.clck_src = self.clck_start

- # Initialize and start a new thread
- self._thread = threading.Thread(target = self._worker)
- self._thread.daemon = True
- self._thread.start()
+ # start timer fd
+ os.timerfd_settime_ns(self._timerfd, initial=self._t_tick, interval=self._t_tick)

def stop(self):
- # No thread, no problem ;)
- if self._thread is None:
- return
+ # stop timer fd
+ os.timerfd_settime_ns(self._timerfd, initial=0, interval=0)

- # Stop the thread first
- self._breaker.set()
- self._thread.join()
-
- # Free memory, reset breaker
- self._thread = None
- self._breaker.clear()
-
- def _worker(self):
- if self.sched_rr_prio is not None:
- sched_param = os.sched_param(self.sched_rr_prio)
- try:
- log.info("CLCKGen: Setting real time process scheduler to SCHED_RR, priority %u" % (self.sched_rr_prio))
- os.sched_setscheduler(0, os.SCHED_RR, sched_param)
- except OSError:
- log.error("CLCKGen: Failed to set real time process scheduler to SCHED_RR, priority %u" % (self.sched_rr_prio))
+ # tick must be called periodically by CLCKGen user.
+ #
+ # It waits for the next GSM frame to happen and emits corresponding clock indication at that time.
+ # It also runs attached .clck_handler if there is one.
+ #
+ # It is possible to use .tick in both blocking and non-blocking ways:
+ #
+ # - without extra care .tick will block waiting for the next GSM frame as explained above,
+ # - client code can also poll/select on ._timerfd to wait for GSM frame.
+ # After ._timerfd becomes ready it is guaranteed that the next .tick call will not block.
+ def tick(self):
# run .send_clck_ind() every .ctr_interval
- # be careful not to accumulate timing error when organizing the clock loop
- ns = 1e-9
- us = 1e-6
- t_tick = int(self.ctr_interval // ns)
- t_next = time.monotonic_ns()
- while 1:
- t_next += t_tick
- t = time.monotonic_ns()
- dt = (t_next - t)
- if dt < 0:
- log.warning("CLCKGen: time overrun by %dus; resetting the clock" % (dt * ns // us))
- t_next = time.monotonic_ns()
- dt = 0
+ # NOTE timerfd is careful not to accumulate timing error when organizing the clock loop

- if self._breaker.wait(dt * ns):
- break
+ _ = os.read(self._timerfd, 8)
+ assert len(_) == 8, len(_)
+ ticks = int.from_bytes(_, byteorder=sys.byteorder)
+ assert ticks > 0, ticks
+ if ticks > 1:
+ log.warning("CLCKGen: time overrun by %dus; resetting the clock" % ((ticks-1)*self._t_tick * ns // us))
+ # (the kernel does clock reset by itself)

- self.send_clck_ind()
+ self.send_clck_ind()

def send_clck_ind(self):
# We don't need to send so often
@@ -135,6 +121,8 @@
# Just a wrapper for independent usage
class Application(ApplicationBase):
def __init__(self):
+ self.stop = False
+
# Print copyright
self.app_print_copyright(APP_CR_HOLDERS)

@@ -150,13 +138,16 @@
self.clck = CLCKGen([self.link], ind_period = 51)
self.clck.start()

- # Block unless we receive a signal
- self.clck._thread.join()
+ while not self.stop:
+ self.clck.tick()
+
+ self.clck.stop()
+

def sig_handler(self, signum, frame):
log.info("Signal %d received" % signum)
if signum == signal.SIGINT:
- self.clck.stop()
+ self.stop = True

if __name__ == '__main__':
app = Application()
diff --git a/src/target/trx_toolkit/fake_trx.py b/src/target/trx_toolkit/fake_trx.py
index 711ad21..12d5b77 100755
--- a/src/target/trx_toolkit/fake_trx.py
+++ b/src/target/trx_toolkit/fake_trx.py
@@ -397,7 +397,7 @@
self.trx_list = TRXList()

# Init shared clock generator
- self.clck_gen = CLCKGen([], sched_rr_prio = None if self.argv.sched_rr_prio is None else self.argv.sched_rr_prio + 1)
+ self.clck_gen = CLCKGen([])
# This method will be called on each TDMA frame
self.clck_gen.clck_handler = self.clck_handler

@@ -459,7 +459,7 @@
log.error("Failed to set real time process scheduler to SCHED_RR, priority %u" % (self.argv.sched_rr_prio))

# Compose list of to be monitored sockets
- sock_list = []
+ sock_list = [self.clck_gen._timerfd]
for trx in self.trx_list.trx_list:
sock_list.append(trx.ctrl_if.sock)
sock_list.append(trx.data_if.sock)
@@ -469,6 +469,10 @@
# Wait until we get any data on any socket
r_event, _, _ = select.select(sock_list, [], [])

+ # clock is priority
+ if self.clck_gen._timerfd in r_event:
+ self.clck_gen.tick()
+
# Iterate over all transceivers
for trx in self.trx_list.trx_list:
# DATA interface
@@ -479,7 +483,7 @@
if trx.ctrl_if.sock in r_event:
trx.ctrl_if.handle_rx()

- # This method will be called by the clock thread
+ # This method will be called by the clock generator
def clck_handler(self, fn):
# We assume that this list is immutable at run-time
for trx in self.trx_list.trx_list:
diff --git a/src/target/trx_toolkit/test_clck_gen.py b/src/target/trx_toolkit/test_clck_gen.py
index 92e5e55..3eaa4de 100644
--- a/src/target/trx_toolkit/test_clck_gen.py
+++ b/src/target/trx_toolkit/test_clck_gen.py
@@ -14,7 +14,6 @@
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

-import threading
import time
import unittest

@@ -32,23 +31,25 @@
ntick = 200 # ~ 1 s
tick = 0
tstart = tend = None
- done = threading.Event()
+ done = False

def _(fn):
- nonlocal tick, tstart, tend
+ nonlocal tick, tstart, tend, done
if tick == 0:
tstart = time.monotonic()
if tick == ntick:
tend = time.monotonic()
- done.set()
+ done = True
tick += 1
time.sleep(clck.ctr_interval / 2)
clck.clck_handler = _

clck.start()
try:
- ok = done.wait(10)
- self.assertTrue(ok, "clck_gen stuck")
+ t0 = time.monotonic()
+ while not done:
+ self.assertTrue(time.monotonic() - t0 < 10, "clck_gen stuck")
+ clck.tick()

self.assertIsNotNone(tstart)
self.assertIsNotNone(tend)


Gerrit-MessageType: newchange
Gerrit-Project: osmocom-bb
Gerrit-Branch: master
Gerrit-Change-Id: Iaa675c95059ec8ccfad667f69984d5a7f608c249
Gerrit-Change-Number: 40047
Gerrit-PatchSet: 1
Gerrit-Owner: kirr <kirr@nexedi.com>
Gerrit-CC: fixeria <vyanitskiy@sysmocom.de>
Gerrit-CC: osmith <osmith@sysmocom.de>
Gerrit-CC: pespin <pespin@sysmocom.de>