I've been trying to get some RTL-SDRs working properly on my Tinkerboard, running Armbian Bionic with kernel 4.14.y. With the Osmocom drivers, rtl_test reports huge sample drops (millions of samples dropped each time). However, Keenerd's branch did not drop any samples, nor did the presumably older Osmocom drivers from apt-get.
I did a diff against Keenerd's branch and found the new zerocopy buffer code in the Osmocom drivers. After commenting that out, the Osmocom drivers work fine on the Tinkerboard with Armbian and there is no sample loss.
Strangely, on TinkerOS and Ubuntu OS for Tinkerboard, both of which run kernel 4.4, I do not get sample loss with the Osmocom drivers and the zerocopy code enabled.
I'm also running the Osmocom drivers on an Odroid XU4 with Ubuntu, which also runs kernel 4.14, and on Raspbian, and on those there is no sample loss. So it seems to be something specific to the Tinkerboard and Armbian with kernel 4.14.y.
I'm not sure whether the zerocopy code is simply not getting activated on those other OSes. How do I check if the installed libusb API version is >= 0x01000105?
Any ideas what could cause the zerocopy code to lose samples?
Regards, Carl Laufer
Hey,
You can find the libusb version with e.g. `dpkg-query -W 'libusb-*'` (the quotes keep the shell from expanding the pattern).
The API version you're asking about is LIBUSB_API_VERSION in libusb.h, which is usually in /usr/include/libusb-1.0/libusb.h, or you can find the include directory with `pkg-config --cflags libusb-1.0`.
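If grepping the header by hand is inconvenient, the same lookup can be sketched in C. This is only an illustration: the function names `parse_libusb_api_version` and `api_supports_zerocopy` are made up here, and the code assumes the header text has already been read into a string. The threshold 0x01000105 is the one from the question above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Threshold quoted in this thread for the zerocopy code path. */
#define ZEROCOPY_MIN_API 0x01000105UL

/* Hypothetical helper: scan header text for
 * "#define LIBUSB_API_VERSION <value>" and return the value,
 * or 0 if no such define is found. */
unsigned long parse_libusb_api_version(const char *header_text)
{
    const char *key = "LIBUSB_API_VERSION";
    const size_t key_len = strlen(key);
    const char *p = header_text;

    assert(header_text != NULL);

    while ((p = strstr(p, "#define")) != NULL) {
        p += strlen("#define");
        while (*p == ' ' || *p == '\t')
            p++;
        /* Require whitespace after the name so that longer
         * identifiers with the same prefix do not match. */
        if (strncmp(p, key, key_len) == 0 &&
            (p[key_len] == ' ' || p[key_len] == '\t'))
            return strtoul(p + key_len, NULL, 0); /* base 0 accepts 0x... */
    }
    return 0;
}

/* Hypothetical helper: 1 if the API version meets the threshold. */
int api_supports_zerocopy(unsigned long api_version)
{
    return api_version >= ZEROCOPY_MIN_API;
}
```

When building against libusb directly, comparing the LIBUSB_API_VERSION macro in an `#if` would be the simpler route; the parser above is only useful for inspecting a header without compiling against it.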
Karl Semich
Hi Carl,
On 01.10.2018 05:42, Carl Laufer wrote:
> I did a diff against Keenerd's branch and discovered the new zerocopy buffer code in the Osmocom drivers. After commenting that out, the Osmocom drivers work fine on the Tinkerboard with Armbian and there is no sample loss.
That's very strange, because from an rtl-sdr point of view it is just regular memory that was allocated by the Kernel (and happens to be DMA-able) and mapped into userspace. So my first guess is that something must be going wrong either inside the Kernel or libusb.
> Strangely, on TinkerOS and Ubuntu OS for Tinkerboard, both of which run Kernel 4.4, I do not get sample loss with the Osmocom drivers and the zerocopy code enabled.
Yes, as 4.4 doesn't support usbfs zerocopy.
> I'm also running the Osmocom drivers on an Odroid XU4 with Ubuntu which is also running Kernel 4.14, and on Raspbian, and on those there is no sample loss. So it seems to be something specific to the Tinkerboard and Armbian with Kernel 4.14.y.
> Not sure if the zerocopy code is just not getting activated on those other OSes or what. How do I check if the installed libusb API version is >= 0x01000105?
I've put that information into the commit message:
Requires Linux >= 4.6 and libusb >= 1.0.21.
(https://cgit.osmocom.org/rtl-sdr/commit/?id=a854ae8b48d42e8dad514c75d3a4c6cf...)
So it's enabled if both versions are there.
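A minimal sketch of the kernel half of that requirement, assuming the release string comes from uname(2) (the `release` field of `struct utsname`). The function name is made up for illustration, and nothing in this thread says the driver itself performs such a check; it is only meant as a quick way to see which side of the 4.6 boundary a system is on.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if a release string such as
 * "4.14.70-rockchip" is >= 4.6, 0 if it is older, and -1 if the
 * string cannot be parsed. */
int kernel_supports_usbfs_zerocopy(const char *release)
{
    int major = 0, minor = 0;

    assert(release != NULL);

    /* Only the leading "major.minor" matters; any suffix is ignored. */
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return -1;
    return (major > 4 || (major == 4 && minor >= 6)) ? 1 : 0;
}
```

With the kernels mentioned so far, "4.4.x" would give 0 and "4.14.x" would give 1, which matches the observation that only the 4.14 systems are affected.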
> Any ideas what could cause the zero copy code to result in the lost samples?
Do you get any error messages when the library is loaded? What is the value of /sys/module/usbcore/parameters/usbfs_memory_mb, and can you try to increase it or set it to 0 to disable the limit?
Was the Kernel built with CONFIG_CMA=y, and what is the value of CONFIG_CMA_SIZE_MBYTES?
If any of this isn't right, a warning and "falling back to buffers in userspace" should normally appear when starting the application, but maybe there is something going wrong...
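For reference, reading that sysfs parameter from a program could look roughly like the sketch below. The helper name `read_int_param` is an assumption for illustration; the only detail taken from this mail is the path /sys/module/usbcore/parameters/usbfs_memory_mb and that a value of 0 disables the limit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one decimal integer from a sysfs-style
 * file. Returns 0 on success and stores the number in *value,
 * returns -1 if the file cannot be opened or parsed. */
int read_int_param(const char *path, long *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling it with "/sys/module/usbcore/parameters/usbfs_memory_mb" should yield the current limit in megabytes. Changing the value normally needs root, for example by writing to the same file from a root shell.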
Regards, Steve
I did some more tests. I upgraded the OS on my Odroid XU4 to Ubuntu 18.04, which has libusb 1.0.21 in the repos (the previous one I was using had libusb 1.0.20). Now I get the same continuous lost sample bytes problem on the XU4 as on the Tinkerboard: a huge number of dropped samples with zerocopy enabled.
However, on the latest Raspbian on the Rpi3, I don't see the problem. On the Pi3 rtl_test returns only one line of dropped samples, then no more.
Allocating 15 zero-copy buffers
lost at least 156 bytes
On the Odroid and Tinkerboard it's more like:
Allocating 15 zero-copy buffers
lost at least 13617588 bytes
lost at least 12326518 bytes
lost at least 13208366 bytes
lost at least 13301044 bytes
...and so on forever
No other error messages are seen, and disabling the usbfs_memory_mb limit by setting it to zero doesn't get rid of the problem.
I'm not too sure about the CMA stuff; I'm just using the default OS images provided for the Tinkerboard and XU4, and standard Ubuntu 18.04 on my laptop.
Regards, Carl Laufer
Hey,
I just wanted to offer: if time and energy are short, I'd be willing to spend some time narrowing down this issue, given shell access to one of the failing systems.
Sounds like a bug in either the kernel or libusb.
Karl
Hi,
so it turned out that this is indeed a Kernel issue. I have implemented a crude workaround for both rtl-sdr and osmo-fl2k to detect the bug, and fall back to buffers in userspace if it is present.
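The mail does not spell out how the workaround detects the bug. One plausible check, assumed here rather than taken from this thread, relies on the idea that a freshly allocated and correctly mapped kernel buffer reads as all zeros, while a wrong mapping exposes unrelated, non-zero memory. The helper name `buffer_is_zeroed` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every byte of buf is zero,
 * 0 otherwise. An empty buffer counts as zeroed. */
int buffer_is_zeroed(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len == 0)
        return 1;
    /* If the first byte is zero and each byte equals its successor,
     * then all bytes are zero. memcmp only reads, so comparing the
     * buffer with itself shifted by one byte is fine. */
    return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}
```

A caller would run this on each buffer right after allocation and, if any buffer is not zeroed, free the zerocopy buffers and allocate ordinary userspace buffers instead. A correct mapping of memory that happens to be non-zero would be misjudged, so this is a heuristic at best.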
I've sent the mail below to the linux-usb list, but didn't get any reply so far. Let's see.
-- When I debugged the issue, I found out that the Kernel maps seemingly random memory to my transfer buffers, containing memory of other processes or even the Kernel itself.
The code that does the mapping in drivers/usb/core/devio.c: (line 243 in v4.19-rc7)
if (remap_pfn_range(vma, vma->vm_start,
                    virt_to_phys(usbm->mem) >> PAGE_SHIFT,
                    size, vma->vm_page_prot) < 0) {
With the following change the issue is fixed for ARM systems, but it breaks x86 systems:
-                   virt_to_phys(usbm->mem) >> PAGE_SHIFT,
+                   page_to_pfn(virt_to_page(dma_addr)),
Both usbm->mem and dma_addr are returned by the previous call to usb_alloc_coherent(). Here's an example of the pointers generated by the two macros on an ARM64 system for the same buffer:
virt_to_phys(usbm->mem) >> PAGE_SHIFT: 00000000808693ce
page_to_pfn(virt_to_page(dma_addr)):   000000009775a856
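As a reminder of what the `>> PAGE_SHIFT` step does, here is a userspace model of the arithmetic only. It assumes 4 KiB pages (a shift of 12), uses made-up names, and says nothing about which of the two kernel expressions is the correct one.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define EXAMPLE_PAGE_SHIFT 12

/* Page frame number: the physical address with the in-page offset
 * bits dropped. */
uint64_t phys_to_pfn(uint64_t phys_addr)
{
    return phys_addr >> EXAMPLE_PAGE_SHIFT;
}

/* Start address of the page with the given frame number. */
uint64_t pfn_to_phys(uint64_t pfn)
{
    /* Guard against shifting significant bits out of the 64-bit range. */
    assert(pfn <= (UINT64_MAX >> EXAMPLE_PAGE_SHIFT));
    return pfn << EXAMPLE_PAGE_SHIFT;
}
```

remap_pfn_range() takes such a frame number, so if the address fed into the shift does not belong to the buffer, the mapping ends up pointing at some other page.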
From what I read so far I got the impression that the 'proper' way would be to use dma_mmap_coherent() with dma_addr instead of remap_pfn_range(); however, I did not get it to work. Can anyone help out?
Best Regards, Steve Markgraf
Hey, to follow this up ...
I saw there was a reply on the linux-usb list at https://www.spinics.net/lists/linux-usb/msg173393.html: "You should ask on the linux-arch@xxxxxxxxxxxxxxx, linux-arm-kernel@xxxxxxxxxxxxxxxxxxx, and linux-kernel@xxxxxxxxxxxxxxx mailing lists."
I was using the zerocopy buffer feature, and I'm just wondering if anybody ended up following this up on those kernel lists. I didn't find it mentioned in a brief Google search.
Dumping kernel RAM to userspace is obviously a serious security issue, so maybe it's not being discussed via public channels. But it would be tragic if it were not being discussed at all.