Attention is currently required from: lynxis lazus.
1 comment:
File src/target/trx_toolkit/transceiver.py:
Patch Set #1, Line 315: self._tx_queue_lock.acquire()
Thanks for feedback.
The strange thing is, with %timedif what you've used with ipython, it is much slower.
When you are measuring with cProfile, what actually happens is that python execution is instrumented and the profiler gets to run every Nth instruction. Measuring overall time with profiler thus measures significantly adjusted code and the profiler usage can affect the result significantly if it is used for microbenchmark link this.
When you are measuring with %timeit of IPython, the code only gets executed for many times without any instrumentation. It thus reflects that actual execution time that we get during regular usage more closely.
Here is indirect proof that profiling is instrumenting the code when running it:
```diff
diff --git a/bench.py.orig b/bench.py
index 4cce94c..92f44ee 100644
--- a/bench.py.orig
+++ b/bench.py
@@ -46,6 +46,7 @@ def testrun(size):
lcomp = cProfile.Profile(timeunit=0.000_000_001)
lcomp.run('do_diff(testvec)')
lcomp_stat = lcomp.getstats()
+ lcomp.print_stats()
forloop = cProfile.Profile(timeunit=0.000_000_001)
forloop.run('do_forloop(testvec)')
forloop_stat = forloop.getstats()
@@ -54,6 +55,6 @@ def testrun(size):
print(f"{size}: forloop {forloop_stat[0].totaltime}")
testrun(1024)
-testrun(100)
-testrun(10)
-testrun(1)
+#testrun(100)
+#testrun(10)
+#testrun(1)
```
```console
(osmo.venv) kirr@deca:~/src/osmocom/t$ python bench.py
7 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 bench.py:20(do_diff)
1 0.000 0.000 0.000 0.000 bench.py:21(<listcomp>)
1 0.000 0.000 0.000 0.000 bench.py:22(<listcomp>)
1 0.000 0.000 0.000 0.000 bench.py:23(<listcomp>)
1 0.000 0.000 0.000 0.000 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1024: lcomp 0.000209867
1024: forloop 0.00011744300000000001
```
the `.getstats()` low-level method, that you used in measurements, is exactly the same method that is used by the profiler to build statistics out of lsprof samples:
https://github.com/python/cpython/blob/v3.11.9-9-g1b0e63c81b5/Lib/cProfile.py#L55-L56
https://github.com/python/cpython/blob/v3.11.9-9-g1b0e63c81b5/Lib/cProfile.py#L55-L88
So in my view the benchmarks speak for the "forloop" variant to be better.
--------
Besides that I would not go to microoptimize the code on python level, because in my view here the actual difference in between both versions is not that high and it does not contribute so much compared to other changes, like not doing tx work from under lock, and not doing useless copies. From that point of view the "forloop" variant also looks better because it is more logical to scan the queue only once extracting all the things at once.
By the way it actually turned out that this part of the code does not occupy lot of time, at least not with 4 trx. And if it will start to occupy lots of time it is better to change tx_queue to priority queue instead of still linearly scanning it all the time.
To view, visit change 39534. To unsubscribe, or for help writing mail filters, visit settings.