Limesdr mini on Orange Pi Zero

This is merely a historical archive of years 2008-2021, before the migration to mailman3.

A maintained and still updated list archive can be found at https://lists.osmocom.org/hyperkitty/list/OpenBSC@lists.osmocom.org/.

Gullik Webjorn gullik.webjorn at corevalue.se
Wed Jan 23 20:39:50 UTC 2019


>> Right now I am wondering about fault recovery, i.e. what should the
>> trx do once it has detected missing data? Whatever happens has a low
>> chance of fixing the situation, once triggered, the condition
>> persists. This is also indicated by the fact that the logged "diff"
>> value is the *same* value in subsequent loggings, i.e. the
>> trx does not recover / rewind / adjust timing to get back to normal.
> This is a very "dangerous" area.  In a system like GSM, where there are
> performance figures specified as part of the spec conformance, we should
> be very careful about plastering over bugs like this.
>
> Any system (hardware + software) must be able to handle processing of
> all samples at any given point in time.  If it can't handle this, it
> introduces bit errors which, if they happen frequently/reproducibly,
> will for sure degrade performance of the base station.
>
> So the "right" solution is to find the issue and solve it, not to
> "recover" by simply continuing with increased BER and degraded
> performance.
>
> If the system just magically recovers, I'm afraid people will put this
> into production operation without understanding the gravity of the
> problem, or that there is one at all.
>
I am in violent agreement, but the process did NOT exit, and sometimes
it DID recover; apart from the log messages, the only signs of the problem
were sporadic outages and other problems.
I was just wondering what the thinking had been on handling this
particular condition, apart from emitting a LOG message...

i.e. what is the best thing to do when the error is detected?

Mind you, I am just starting to learn how the trx does its job, and
thinking about what should happen next once this condition has occurred.

Is this something that *could* happen (without broken hw), and is it
meaningful to keep repeating the error?

Perhaps the "jump" in the timestamp has that effect on the "rest" of the trx.
What if the timestamp is screwed up on its way from the Lime to the trx?
>
> I think ftrace with irqsoff, preemptoff, wakeup_rt tracers could be one
> option to debug this further.  If there's a correlation between time
> with irqs/preemption disabled around the time of your "high latency
> bursts", that would be a very clear message.
Debugging, and my confusion will rise to a higher level...
>>> * maybe the polling interval (bInterval) in the endpoint descriptors is
>>>     set too low?
>> Hmm, my crude measurements indicate that the trx retrieving data is the
>> cause, not a lack of data.
> I'm not sure I understand yet how you reach that conclusion?  It would
> be interesting to get some kind of watermarks of the amount of "used"
> libusb USB transfers inside LimeSuite.  Maybe it's also worth increasing
> them or their size?
Well, possible explanations:

1. The LimeSDR sometimes fails to deliver a significant number of packets,
since the time "jumps" by a large amount.

2. The trx / hw / linux fails to read packets, so the Lime is unable
to deliver the data until trx / hw / linux becomes responsive again.

3. ???

It looks like my crude tests show that the trx can loop and get data every
170 uSec, but sometimes does not come back within 100 times that (17 ms). Why?
To me, 2. seems most probable... but I will see if I can check this in LimeSuite.

Also, tests with the LimeSDR and *other* applications could give clues...

>>> * maybe there's some other process using too much cpu / pre-empting
>>>     osmo-trx?
>> Yes it looks like that
> What about modifying osmo-trx to simply read and discard the samples,
> rather than processing them.  Do you still get the overruns then?
I'll check....
>>> Your tests seem to be looking at the second part. You can use a
>>> CLOCK_MONOTONIC time source to take timestamps, as you indicated.
I switched to CLOCK_MONOTONIC, no obvious change...
> This tells you how much CPU time a given process has consumed.  It is
> not an absolute/reference clock.  At least my understanding was that you
> wanted to take "absolute" timestamps.  CLOCK_MONOTONIC_RAW is probably
> the best candidate for that.
The fight goes on....
> Regards,
> 	Harald
Regards,
Gullik


