This is a historical archive covering the years 2008-2021, before the migration to Mailman 3.
A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/OpenBSC@lists.osmocom.org/.
Gullik Webjorn gullik.webjorn at corevalue.se

Ok, some more experiments. I have made a small table that logs Linux time
diffs in nanoseconds each time LMSDevice is called (a sketch of this
instrumentation follows at the end of this mail). From my first test this
indicates that on this particular platform, within the last 8 calls the
time can be as low as 170 us, i.e. a value of roughly 170000. But I also
get times up to 44625499 ns, i.e. 44.6 ms, and the values in the table
can either look like:

    $2 = {32345376, 28481917, 16771791, 15794875, 16805792, 17252958,
          44625499, 33037584}

indicating that several calls after one another had long times, or like:

    $4 = {198750, 179625, 33702624, 16127416, 27990666, 16007875,
          13552168, 16100124}

where the latest and second-latest are low, but follow a sequence of long
times. Thus, we are not dealing with a single interruption of short
latency, but an extended period of long latency / interference. Once the
condition occurs, I get hundreds of "time mismatch" logs, so it does not
recover.

Right now I am wondering about fault recovery, i.e. what should the trx
do once it has detected missing data? Whatever it does has a low chance
of fixing the situation; once triggered, the condition persists. This is
also indicated by the fact that the logged "diff" value is the *same*
value in subsequent loggings, i.e. the trx does not recover / rewind /
adjust its timing to get back to normal.

> Are you running the osmo-trx process with real-time priority (SCHED_RR)?

I tried that with no obvious effect (a sketch of how to set it follows at
the end of this mail).

> What is the CPU load? Please note on a multi-core system the
> interesting bit is not the average load over all CPU cores, but the
> maximum load of any one of the cores.

"Normal" load is the trx process taking 80-100% of one CPU spread over 4
CPUs, i.e. htop shows 4 CPUs each with 20-25% load. trx seems to spread
its threads over all CPUs.

> Correct. This is a problem we've been observing on a variety of
> platforms for quite some time. Some samples are lost.
>
> * maybe the polling interval (bInterval) in the endpoint descriptors is
>   set too low?

Hmm, my crude measurements indicate that the trx's retrieval is the
cause, not a lack of data.

> * maybe the number / size of bulk-in USB transfers (URBs) is
>   insufficient and/or they are not re-submitted fast enough.
> * maybe there's some other process using too much CPU / pre-empting
>   osmo-trx?

Yes, it looks like that.

> Your test seems to be looking at the second part. You can use a
> CLOCK_MONOTONIC time source to take timestamps, as you indicated.

I used

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_time);

Maybe I should refine my test (the last sketch below contrasts the two
clocks).

Thanx for your comments,

Gullik
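
A minimal sketch of the diff-logging instrumentation described above. The
hook point and the function name log_call_diff() are hypothetical; the
idea is simply to record the CLOCK_MONOTONIC delta between successive
calls in a small ring buffer that can then be dumped from gdb, which is
where $-style value dumps like $2 and $4 above come from:

    /* Record the time between successive calls into an 8-entry ring
     * buffer; dump it from gdb with "print diff_ns". */
    #include <stdint.h>
    #include <time.h>

    #define NDIFFS 8

    static int64_t diff_ns[NDIFFS]; /* last 8 inter-call times, in ns */
    static unsigned diff_idx;
    static struct timespec prev;

    void log_call_diff(void)
    {
            struct timespec now;

            clock_gettime(CLOCK_MONOTONIC, &now);
            if (prev.tv_sec || prev.tv_nsec) {
                    diff_ns[diff_idx++ % NDIFFS] =
                            (int64_t)(now.tv_sec - prev.tv_sec) * 1000000000LL
                            + (now.tv_nsec - prev.tv_nsec);
            }
            prev = now;
    }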
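For the SCHED_RR question, a sketch of how a process can give itself
real-time round-robin priority, assuming CAP_SYS_NICE (or root, or a
suitable rtprio rlimit) is available; the priority value 18 is an
arbitrary example:

    #include <sched.h>
    #include <stdio.h>

    /* Switch the calling process (pid 0) to SCHED_RR; returns -1 on
     * failure, typically EPERM when lacking CAP_SYS_NICE. */
    static int set_rt_priority(void)
    {
            struct sched_param sp = { .sched_priority = 18 };

            if (sched_setscheduler(0, SCHED_RR, &sp) < 0) {
                    perror("sched_setscheduler");
                    return -1;
            }
            return 0;
    }

The same effect can be had externally, without touching the code, via
util-linux: chrt -r 18 osmo-trx <args>.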
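Finally, a small self-contained sketch of why the clock choice matters
for this test: CLOCK_PROCESS_CPUTIME_ID only advances while the process
itself is running on a CPU, so time spent pre-empted by other processes
is invisible to it, whereas CLOCK_MONOTONIC measures elapsed wall time
and therefore captures exactly such gaps (usleep() stands in for a
pre-emption here):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static int64_t ts_diff_ns(struct timespec a, struct timespec b)
    {
            return (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
                   + (b.tv_nsec - a.tv_nsec);
    }

    int main(void)
    {
            struct timespec m0, m1, c0, c1;

            clock_gettime(CLOCK_MONOTONIC, &m0);
            clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
            usleep(10000); /* stand-in for being pre-empted for ~10 ms */
            clock_gettime(CLOCK_MONOTONIC, &m1);
            clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);

            /* monotonic shows ~10 ms, cputime shows nearly nothing */
            printf("monotonic: %lld ns, cputime: %lld ns\n",
                   (long long)ts_diff_ns(m0, m1),
                   (long long)ts_diff_ns(c0, c1));
            return 0;
    }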