LimeSDR Mini on Orange Pi Zero


On Wed, Jan 23, 2019 at 01:05:58PM +0100, Gullik Webjorn wrote:
> platform, within the last 8 calls time can be as low as 170 uS, i.e. a value
> of roughly 170000. But I also get times up to 44625499, i.e. 4.46 mS,

The question is what the target/expected value is here, i.e. how many
samples at which sample rate we expect to read in every call, and what
the resulting interval is.
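
For illustration, here is a minimal sketch of that arithmetic in C.  The
GSM symbol rate of 13 MHz / 48 is fixed by the spec; the helper name and
its parameters are hypothetical, so plug in whatever samples-per-symbol
and read size your setup actually uses.

```c
#include <stdint.h>

/* GSM symbol rate: 13 MHz / 48 = 1625000 / 6 symbols per second. */
#define GSM_SYMBOL_RATE_NUM 1625000ULL
#define GSM_SYMBOL_RATE_DEN 6ULL

/*
 * Expected wall-clock time, in nanoseconds, that `samples` samples
 * represent when the receiver runs at `sps` samples per symbol.
 * Hypothetical helper for illustration, not part of osmo-trx.
 */
static uint64_t expected_interval_ns(uint64_t samples, unsigned int sps)
{
	/* interval = samples / (symbol_rate * sps), scaled to ns */
	return samples * 1000000000ULL * GSM_SYMBOL_RATE_DEN
		/ (GSM_SYMBOL_RATE_NUM * sps);
}
```

As an example under assumed values: with 4 samples per symbol, a read of
2500 samples corresponds to about 2.3 ms, and 5000 samples to one TDMA
frame of about 4.6 ms.  Any measured interval between reads should be
judged against a figure like that.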

> Right now I am wondering about fault recovery, i.e. what should the
> trx do once it has detected missing data? Whatever happens has a low
> chance of fixing the situation, once triggered, the condition
> persists. This is also indicated by the fact that the logged "diff"
> value is the *same* value in subsequent loggings, i.e. the
> trx does not recover / rewind / adjust timing to get back to normal.

This is a very "dangerous" area.  In a system like GSM, where there are
performance figures specified as part of the spec conformance, we should
be very careful about plastering over bugs like this.

Any system (hardware + software) must be able to handle processing of
all samples at any given point in time.  If it can't handle this, it
introduces bit errors which, if they happen frequently/reproducibly,
will for sure degrade performance of the base station.

So the "right" solution is to find the issue and solve it, not to
"recover" by simply continuing with increased BER and degraded
performance.

If the system just magically recovers, I'm afraid people will put this
into production operation without understanding the gravity of the
problem, or that there is one at all.

> > Are you running the osmo-trx process with real-time priority (SCHED_RR)?
> I tried that with no obvious effect.....

I think ftrace with irqsoff, preemptoff, wakeup_rt tracers could be one
option to debug this further.  If there's a correlation between time
with irqs/preemption disabled around the time of your "high latency
bursts", that would be a very clear message.
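
Independently of the tracing, it may be worth double-checking that the
SCHED_RR setting really applies to the thread that reads the samples.
On Linux the scheduling policy is a per-thread attribute, so setting it
on one thread does not cover the others.  A minimal sketch, assuming you
call it from the thread in question; the function names and the choice
of priority are hypothetical and not taken from osmo-trx:

```c
#include <sched.h>
#include <string.h>

/* Clamp a requested priority into the valid range for SCHED_RR. */
static int clamp_rr_priority(int prio)
{
	int min = sched_get_priority_min(SCHED_RR);
	int max = sched_get_priority_max(SCHED_RR);

	if (prio < min)
		return min;
	if (prio > max)
		return max;
	return prio;
}

/*
 * Switch the calling thread to SCHED_RR at (clamped) priority `prio`.
 * Returns 0 on success, -1 with errno set on failure (typically EPERM
 * without root or CAP_SYS_NICE).
 */
static int set_realtime_rr(int prio)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = clamp_rr_priority(prio);
	/* pid 0 means "the calling thread" on Linux */
	return sched_setscheduler(0, SCHED_RR, &param);
}
```

If set_realtime_rr() fails, checking errno tells you whether the
real-time priority was ever in effect at all.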

> > * maybe the polling interval (bInterval) in the endpoint descriptors is
> >    set too low?
> Hmm, my crude measurements indicate trx retrieving is cause, not lack of
> data.

I'm not sure I understand yet how you reach that conclusion.  It would
be interesting to get some kind of watermarks of the number of "used"
libusb USB transfers inside LimeSuite.  Maybe it's also worth increasing
their number or their size?

> > * maybe there's some other process using too much cpu / pre-empting
> >    osmo-trx?
> Yes it looks like that

What about modifying osmo-trx to simply read and discard the samples,
rather than processing them?  Do you still get the overruns then?

> > Your test seem to be looking at the second part. You can use a
> > CLOCK_MONOTONIC time source to take timestamps, as you indicated.
> I used
> >         clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_time);
> Maybe I should refine my test....

CLOCK_PROCESS_CPUTIME_ID tells you how much CPU time a given process has
consumed.  It is not an absolute/reference clock.  At least my
understanding was that you wanted to take "absolute" timestamps.
CLOCK_MONOTONIC_RAW is probably the best candidate for that.
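
A minimal sketch of what such a measurement could look like, assuming
you take one timestamp per read call and track the spread of the
intervals between them.  The struct and function names are hypothetical:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* CLOCK_MONOTONIC_RAW is Linux-specific */
#endif
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC_RAW time in nanoseconds, or 0 on failure. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Running min/max of the time between successive calls. */
struct interval_stats {
	int have_last;		/* set once the first timestamp is stored */
	uint64_t last_ns;	/* timestamp of the previous call */
	uint64_t min_ns;	/* shortest interval seen so far */
	uint64_t max_ns;	/* longest interval seen so far */
	unsigned long count;	/* number of intervals recorded */
};

/* Feed one timestamp; updates min/max from the second call onwards. */
static void interval_update(struct interval_stats *st, uint64_t t_ns)
{
	if (st->have_last) {
		uint64_t diff = t_ns - st->last_ns;

		if (st->count == 0 || diff < st->min_ns)
			st->min_ns = diff;
		if (st->count == 0 || diff > st->max_ns)
			st->max_ns = diff;
		st->count++;
	}
	st->last_ns = t_ns;
	st->have_last = 1;
}
```

Calling interval_update(&st, now_ns()) right before each read then gives
wall-clock intervals, including the time the process spent sleeping or
being pre-empted, which a CPU-time clock does not count.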

Regards,
	Harald
-- 
- Harald Welte <laforge at gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)