fixeria submitted this change.

View Change

Approvals: laforge: Looks good to me, but someone else must approve fixeria: Looks good to me, approved Jenkins Builder: Verified lynxis lazus: Looks good to me, but someone else must approve pespin: Looks good to me, but someone else must approve
trx_toolkit/transceiver: Do not forward nor log from under tx_queue_lock

Even though for 1 BTS + 1 MS fake_trx works ok with tx-queuing, when I
try to run two ccch_scan's with 1 BTS fake_trx starts occupy ~ 100% of
CPU and emits lots of "Stale ..." messages:

[WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2791 tn=7 pwr=0
[WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2792 tn=0 pwr=0
[WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2792 tn=1 pwr=0
[WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2792 tn=2 pwr=0
[WARNING] transceiver.py:317 (M2@127.0.0.1:7700) Stale TRXD message (fn=2793): fn=2792 tn=3 pwr=0
...

Inspecting a bit with a profiler showed that fake_trx simply cannot keep
up with the load.

Let's try to fix this with optimizing things a bit where it is easy to
notice and easy to pick up low-hanging fruits.

This is the first patch in that optimization series. It moves blocking
calls from out of under tx_queue_lock on transmit path. The reason for
this move is not to block receive path while the transmit path is busy
more than necessary. I originally noticed tx_queue_lock.acquire being
visible in profile of the rx thread which indicates that tx/rx
contention on this lock can really happen if we do non-negligible tasks
from under this lock. Here, in particular, it was forward_msg that was
preparing and actually sending RxMsg to destination. tx_queue_lock is
needed only to protect tx_queue itself and synchronize rx and tx threads
access to it. Once necessary items are appended or popped, we can do
everything else out of this lock.

-> Move everything on the tx codepath, not actually needing access to
tx_queue out of this lock:

- only collect messages to be sent under the lock; actually forward them
after releasing the log;
- same for logging.

Change-Id: I7d10c972c45b2b5765e7c3a28f8646508b3c8a82
---
M src/target/trx_toolkit/transceiver.py
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/target/trx_toolkit/transceiver.py b/src/target/trx_toolkit/transceiver.py
index 4e1cb87..7297abf 100644
--- a/src/target/trx_toolkit/transceiver.py
+++ b/src/target/trx_toolkit/transceiver.py
@@ -308,14 +308,23 @@
if not self.running:
return

+ drop = []
+ emit = []
+
self._tx_queue_lock.acquire()

for msg in self._tx_queue:
- if msg.fn == fn:
- fwd.forward_msg(self, msg)
- elif msg.fn < fn:
- log.warning("(%s) Stale TRXD message (fn=%u): %s"
- % (self, fn, msg.desc_hdr()))
+ if msg.fn < fn:
+ drop.append(msg)
+ elif msg.fn == fn:
+ emit.append(msg)

self._tx_queue = [msg for msg in self._tx_queue if msg.fn > fn]
self._tx_queue_lock.release()
+
+ for msg in emit:
+ fwd.forward_msg(self, msg)
+
+ for msg in drop:
+ log.warning("(%s) Stale TRXD message (fn=%u): %s"
+ % (self, fn, msg.desc_hdr()))

To view, visit change 39533. To unsubscribe, or for help writing mail filters, visit settings.

Gerrit-MessageType: merged
Gerrit-Project: osmocom-bb
Gerrit-Branch: master
Gerrit-Change-Id: I7d10c972c45b2b5765e7c3a28f8646508b3c8a82
Gerrit-Change-Number: 39533
Gerrit-PatchSet: 1
Gerrit-Owner: kirr <kirr@nexedi.com>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: fixeria <vyanitskiy@sysmocom.de>
Gerrit-Reviewer: laforge <laforge@osmocom.org>
Gerrit-Reviewer: lynxis lazus <lynxis@fe80.eu>
Gerrit-Reviewer: pespin <pespin@sysmocom.de>
Gerrit-CC: osmith <osmith@sysmocom.de>