This is merely a historical archive of years 2008-2021, before the migration to mailman3.
A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/OpenBSC@lists.osmocom.org/.
Harald Welte laforge at gnumonks.org

On Wed, Jan 23, 2019 at 01:05:58PM +0100, Gullik Webjorn wrote:
> platform, within the last 8 calls time can be as low as 170 uS, i.e. a value
> of roughly 170000. But I also get times up to 44625499, i.e. 4.46 mS,

the question is what is the target/expected value here, i.e. how many
samples at which sample rate do we expect to read in every call, and
what's the resulting interval?

> Right now I am wondering about fault recovery, i.e. what should the
> trx do once it has detected missing data? Whatever happens has a low
> chance of fixing the situation, once triggered, the condition
> persists. This is also indicated by the fact that the logged "diff"
> value is the *same* value in subsequent loggings, i.e. the
> trx does not recover / rewind / adjust timing to get back to normal.

This is a very "dangerous" area. In a system like GSM, where there are
performance figures specified as part of the spec conformance, we should
be very careful about plastering over bugs like this.

Any system (hardware + software) must be able to handle processing of
all samples at any given point in time. If it can't handle this, it
introduces bit errors which, if they happen frequently/reproducibly,
will for sure degrade performance of the base station.

So the "right" solution is to find the issue and solve it, not to
"recover" by simply continuing with increased BER and degraded
performance. If the system just magically recovers, I'm afraid people
will put this into production operation without understanding the
gravity of the problem, or that there is one at all.

> Are you running the osmo-trx process with real-time priority (SCHED_RR)?
> I tried that with no obvious effect.....

I think ftrace with irqsoff, preemptoff, wakeup_rt tracers could be one
option to debug this further. If there's a correlation between time
with irqs/preemption disabled around the time of your "high latency
bursts", that would be a very clear message.
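
Regarding the SCHED_RR question quoted above, one way to rule out a
silently failed priority change is to have the process report its own
scheduling policy. The following is only a minimal sketch, assuming a
Linux/POSIX system; the helper names policy_name() and
try_enable_sched_rr() are hypothetical and are not taken from osmo-trx.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a scheduling policy to a printable name. */
static const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "other/unknown";
    }
}

/* Hypothetical helper: try to switch the calling process to SCHED_RR.
 * Returns 0 on success, or the errno value on failure (typically EPERM
 * when the process lacks the privilege to use real-time policies). */
static int try_enable_sched_rr(int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_RR, &param) != 0)
        return errno;
    return 0;
}

#ifdef SKETCH_DEMO_MAIN
int main(void)
{
    /* Lowest valid SCHED_RR priority; an arbitrary choice for the demo. */
    int rc = try_enable_sched_rr(sched_get_priority_min(SCHED_RR));

    if (rc != 0)
        fprintf(stderr, "SCHED_RR not enabled: %s\n", strerror(rc));
    printf("current policy: %s\n", policy_name(sched_getscheduler(0)));
    return 0;
}
#endif
```

Building with -DSKETCH_DEMO_MAIN gives a small standalone program. If
it prints SCHED_OTHER after an attempt to enable SCHED_RR, the
real-time setting never took effect, which would also explain seeing
"no obvious effect".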

> > * maybe the polling interval (bInterval) in the endpoint descriptors is
> >   set too low?
> Hmm, my crude measurements indicate trx retrieving is cause, not lack of
> data.

I'm not sure I understand yet how you reach that conclusion?

It would be interesting to get some kind of watermarks of the amount of
"used" libusb USB transfers inside LimeSuite. Maybe it's also worth
increasing them or their size?

> > * maybe there's some other process using too much cpu / pre-empting
> >   osmo-trx?
> Yes it looks like that

What about modifying osmo-trx to simply read and discard the samples,
rather than processing them? Do you still get the overruns then?

> > Your test seem to be looking at the second part. You can use a
> > CLOCK_MONOTONIC time source to take timestamps, as you indicated.
> I used
>
> clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_time);
> Maybe I should refine my test....

This tells you how much CPU time a given process has consumed. It is
not an absolute/reference clock. At least my understanding was that
you wanted to take "absolute" timestamps. CLOCK_MONOTONIC_RAW is
probably the best candidate for that.

Regards,
	Harald
-- 
- Harald Welte <laforge at gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)
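
To illustrate the clock suggestion above: a minimal sketch, assuming
Linux, of taking wall-clock-style timestamps with CLOCK_MONOTONIC_RAW
and tracking the shortest and longest interval between successive
calls. The names now_ns(), struct interval_stats, stats_init() and
stats_update() are hypothetical and do not exist in osmo-trx or
LimeSuite.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Assumption: fall back to CLOCK_MONOTONIC where the Linux-specific
 * CLOCK_MONOTONIC_RAW is not available. */
#ifndef CLOCK_MONOTONIC_RAW
#define CLOCK_MONOTONIC_RAW CLOCK_MONOTONIC
#endif

/* Hypothetical helper: current monotonic time in nanoseconds, or -1 on
 * error. Unlike CLOCK_PROCESS_CPUTIME_ID, this keeps advancing while
 * the process is blocked or pre-empted. */
static int64_t now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical bookkeeping for intervals between successive calls. */
struct interval_stats {
    int64_t last_ns;  /* previous timestamp, -1 before the first call */
    int64_t min_ns;   /* shortest interval seen so far */
    int64_t max_ns;   /* longest interval seen so far */
    uint64_t count;   /* number of intervals recorded */
};

static void stats_init(struct interval_stats *s)
{
    s->last_ns = -1;
    s->min_ns = INT64_MAX;
    s->max_ns = 0;
    s->count = 0;
}

/* Record one timestamp; an interval only exists from the second call on. */
static void stats_update(struct interval_stats *s, int64_t t_ns)
{
    if (s->last_ns >= 0) {
        int64_t d = t_ns - s->last_ns;

        if (d < s->min_ns)
            s->min_ns = d;
        if (d > s->max_ns)
            s->max_ns = d;
        s->count++;
    }
    s->last_ns = t_ns;
}
```

Calling stats_update(&stats, now_ns()) once per sample read, and
logging min_ns and max_ns periodically, would give the interval
between reads as seen from outside the process, including any time
spent pre-empted, which a CPU-time clock cannot show.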