Limesdr mini on Orange Pi Zero

historical

Hi Gullik,

On Wed, Jan 23, 2019 at 12:21:49AM +0100, Gullik Webjorn wrote:
> I have followed your suggestions, and rebuilt --with-neon-vfpv4 , and I have
> enabled debugging. 

Are you running the osmo-trx process with real-time priority (SCHED_RR)?
What is the CPU load?  Please note on a multi-core system the
interesting bit is not the average load over all CPU cores, but the
maximum load of any one of the cores.
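
To look at the busiest core rather than the average, something like the
following rough sketch could be used.  It assumes a Linux /proc/stat with
the usual per-core "cpuN" lines; the function names are made up for this
example and are not part of osmo-trx or LimeSuite.

```python
import time

def parse_proc_stat(text):
    """Return {core: (busy_jiffies, total_jiffies)} for each "cpuN" line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and everything that is not "cpuN".
        if not fields or not fields[0].startswith("cpu") or not fields[0][3:].isdigit():
            continue
        values = [int(v) for v in fields[1:]]
        # user nice system idle iowait irq softirq steal
        total = sum(values[:8])
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        cores[fields[0]] = (total - idle, total)
    return cores

def per_core_load(before, after):
    """Busy fraction (0.0 .. 1.0) of each core between two snapshots."""
    loads = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        dt = total1 - total0
        loads[name] = (busy1 - busy0) / dt if dt > 0 else 0.0
    return loads

def busiest_core(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice and return (core_name, load) of the busiest core."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    loads = per_core_load(before, after)
    return max(loads.items(), key=lambda kv: kv[1])
```

Calling busiest_core() while osmo-trx is running should show whether a
single core is close to saturation even when the average looks harmless.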

> The length seems always correct, since I do not get that log entry,
> but rather that the time has "slipped", i.e.  the LMS api has not
> delivered anything for "diff" time, or the timestamp received has
> "jumped" in the Lime.
> [...]
> This indicates to me that this specific arm cpu in combination with limesdr
> mini and the software "drops data". I will gladly try to debug or narrow
> this down, but ask for some suggestions on how to proceed.

Correct. This is a problem we've been observing on a variety of
platforms for quite some time.  Some samples are lost.

* maybe the polling interval (bInterval) in the endpoint descriptors is
  set too low?
* maybe the number / size of bulk-in USB transfers (URBs) is
  insufficient and/or they are not re-submitted fast enough.
* maybe there's some other process using too much cpu / pre-empting
  osmo-trx?
* maybe there's some [buggy?] driver used on this system that
  disables/masks interrupts or otherwise causes high scheduler
  latencies, by disabling pre-emption or the like?
* maybe there's some bios/firmware/management-mode code that can
  interrupt normal OS processing without the OS even knowing about it.
* maybe there's some power management (cpu speed throttling, thermal throttling, ...)
  interfering?

I'm not familiar with the inner workings of LimeSuite, but any program
that would expect to achieve high performance on libusb should (IMHO) be
using the asynchronous API of libusb, and it should make sure there are
always multiple URBs submitted at any given point in time, so that the
kernel can handle data from the USB device without interruption.

If I read LimeSuite correctly, they are submitting 16 URBs for read
(USB_MAX_CONTEXTS, returned by GetBuffersCount() and used by
ReceivePacketsLoop(), which in turn calls the somewhat interestingly
named method dataPort->BeginDataReading() for each of the buffers).
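
As a back-of-the-envelope check, one can estimate how much time the
in-flight URBs cover, i.e. how long userspace may be held off before the
kernel runs out of submitted buffers.  The sketch below is only
arithmetic.  The transfer size and bytes per sample are placeholders I
have not verified against LimeSuite, and packet headers are ignored.

```python
def inflight_buffer_ms(n_transfers, bytes_per_transfer, sample_rate_hz, bytes_per_sample):
    """Milliseconds of sample data covered by all currently submitted transfers."""
    samples = n_transfers * bytes_per_transfer / bytes_per_sample
    return samples / sample_rate_hz * 1e3

# Placeholder numbers, NOT taken from LimeSuite:
# 16 transfers of 4096 bytes, 4 bytes per I/Q sample, 1 Msample/s.
print(inflight_buffer_ms(16, 4096, 1_000_000, 4))
```

If the result is in the range of the scheduling latencies seen on the
board, lost samples would not be surprising.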

https://elinux.org/images/c/c8/Debugging_Methodologies_for_Realtime_Issues_in_Linux_Systems.pdf
is a good introduction to debugging such real-time issues, though it may
be a bit dated.

> One thought is to save a timestamp each time through readSamples and
> compare to some constant to determine that the problem is NOT that we
> are unable to read as fast as required. reading 2500 samples would
> take 2.3 mS if I understand this, so we need to cal readsamples at about this rate....

Data can be lost in (at least) two places:

* between the USB device and the USB host, if the bus is overloaded or
  somehow the kernel / hardware cannot handle/schedule transfers fast enough, or
* between kernel and userspace

Your test seems to be looking at the second part.  You can use a
CLOCK_MONOTONIC time source to take timestamps, as you indicated.
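
A minimal sketch of such a timestamp check, written in Python only to
illustrate the idea (in osmo-trx itself this would be C++ around
readSamples).  The class and its names are hypothetical.  The sample rate
assumes 4 samples per GSM symbol, which is what makes 2500 samples come
out at roughly the 2.3 ms you mention.  On Linux, time.monotonic_ns() is
backed by CLOCK_MONOTONIC.

```python
import time

GSM_SYMBOL_RATE = 1625000 / 6      # 270833.33 symbols/s
SAMPLES_PER_SYMBOL = 4             # assumption, see text above
SAMPLE_RATE = GSM_SYMBOL_RATE * SAMPLES_PER_SYMBOL

def expected_interval_ns(n_samples, sample_rate=SAMPLE_RATE):
    """Time that n_samples take on air, in nanoseconds."""
    return int(n_samples / sample_rate * 1e9)

class ReadTimer:
    """Records gaps between successive reads that exceed a time budget."""

    def __init__(self, n_samples, slack=2.0, clock=time.monotonic_ns):
        # Tolerate up to `slack` times the nominal interval before complaining.
        self.budget_ns = int(expected_interval_ns(n_samples) * slack)
        self.clock = clock
        self.last = None
        self.late = []             # list of gaps (ns) that exceeded the budget

    def mark(self):
        """Call once per read; remembers gaps larger than the budget."""
        now = self.clock()
        if self.last is not None:
            gap = now - self.last
            if gap > self.budget_ns:
                self.late.append(gap)
        self.last = now
```

Calling mark() once per read and inspecting the late list afterwards
shows whether userspace was slow to come back.  If the list stays empty
while timestamps still jump, the loss is more likely on the USB side.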

> Possible causes could be something *else* locking out the program.

Yes.  That's why in the normal mode of operation you usually start
with running osmo-trx with SCHED_RR / realtime priority.  This way
normal tasks that run with regular priority are not going to interfere
anymore.  But then, that leaves tons of kernel/driver code, and
hardware/bios/firmware...
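
For a test program of your own, requesting SCHED_RR could look roughly
like the sketch below.  It uses Python's os.sched_setscheduler(), which
is Linux-only and normally needs root or CAP_SYS_NICE.  The helper name
and the priority value are just examples, not what osmo-trx uses.

```python
import os

def try_set_realtime(priority=1):
    """Try to switch the calling process to SCHED_RR; return True on success."""
    lo = os.sched_get_priority_min(os.SCHED_RR)
    hi = os.sched_get_priority_max(os.SCHED_RR)
    # Clamp the requested priority into the range the kernel accepts.
    priority = max(lo, min(hi, priority))
    try:
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
    except OSError:
        # Typically EPERM when running without sufficient privileges.
        return False
    return True

if try_set_realtime():
    print("running with SCHED_RR")
else:
    print("could not get SCHED_RR, still on the default scheduler")
```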

It would be great to shed more light on this, but it likely needs very thorough
analysis across all layers of a system.

-- 
- Harald Welte <laforge at gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)