fixeria has submitted this change. ( https://gerrit.osmocom.org/c/osmocom-bb/+/39533?usp=email )
Change subject: trx_toolkit/transceiver: Do not forward nor log from under tx_queue_lock ......................................................................
trx_toolkit/transceiver: Do not forward nor log from under tx_queue_lock
Even though for 1 BTS + 1 MS fake_trx works ok with tx-queuing, when I try to run two ccch_scan's with 1 BTS fake_trx starts occupy ~ 100% of CPU and emits lots of "Stale ..." messages:
[WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2791 tn=7 pwr=0 [WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2792 tn=0 pwr=0 [WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2792 tn=1 pwr=0 [WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2792 tn=2 pwr=0 [WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2792 tn=3 pwr=0 ...
Inspecting a bit with a profiler showed that fake_trx simply cannot keep up with the load.
Let's try to fix this with optimizing things a bit where it is easy to notice and easy to pick up low-hanging fruits.
This is the first patch in that optimization series. It moves blocking calls from out of under tx_queue_lock on transmit path. The reason for this move is not to block receive path while the transmit path is busy more than necessary. I originally noticed tx_queue_lock.acquire being visible in profile of the rx thread which indicates that tx/rx contention on this lock can really happen if we do non-negligible tasks from under this lock. Here, in particular, it was forward_msg that was preparing and actually sending RxMsg to destination. tx_queue_lock is needed only to protect tx_queue itself and synchronize rx and tx threads access to it. Once necessary items are appended or popped, we can do everything else out of this lock.
-> Move everything on the tx codepath, not actually needing access to tx_queue out of this lock:
- only collect messages to be sent under the lock; actually forward them after releasing the log; - same for logging.
Change-Id: I7d10c972c45b2b5765e7c3a28f8646508b3c8a82 --- M src/target/trx_toolkit/transceiver.py 1 file changed, 14 insertions(+), 5 deletions(-)
Approvals: laforge: Looks good to me, but someone else must approve fixeria: Looks good to me, approved Jenkins Builder: Verified lynxis lazus: Looks good to me, but someone else must approve pespin: Looks good to me, but someone else must approve
diff --git a/src/target/trx_toolkit/transceiver.py b/src/target/trx_toolkit/transceiver.py index 4e1cb87..7297abf 100644 --- a/src/target/trx_toolkit/transceiver.py +++ b/src/target/trx_toolkit/transceiver.py @@ -308,14 +308,23 @@ if not self.running: return
+ drop = [] + emit = [] + self._tx_queue_lock.acquire()
for msg in self._tx_queue: - if msg.fn == fn: - fwd.forward_msg(self, msg) - elif msg.fn < fn: - log.warning("(%s) Stale TRXD message (fn=%u): %s" - % (self, fn, msg.desc_hdr())) + if msg.fn < fn: + drop.append(msg) + elif msg.fn == fn: + emit.append(msg)
self._tx_queue = [msg for msg in self._tx_queue if msg.fn > fn] self._tx_queue_lock.release() + + for msg in emit: + fwd.forward_msg(self, msg) + + for msg in drop: + log.warning("(%s) Stale TRXD message (fn=%u): %s" + % (self, fn, msg.desc_hdr()))