# Notes on the discussions about UDP flow hashing (rx-flow-hash udp4) using ethtool
https://home.regit.org/tag/performance/
https://www.joyent.com/blog/virtualizing-nics
https://www.serializing.me/2015/04/25/rxtx-buffers-rss-others-on-boot/
You can check which offload parameters your card supports adjusting by using "ethtool -k DEV | egrep -v fixed" (features marked "fixed" cannot be changed). As Fırat alludes to (below), UDP flow hashing should be supported.
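A minimal sketch of checking and enabling 4-tuple hashing for UDP/IPv4. The interface name ens192 is just an example, and "sdfn" (hash on src/dst address and src/dst port) is one common field choice; support varies by NIC and driver:

```shell
DEV=${DEV:-ens192}  # example interface name; substitute your own

# List offload features; anything not marked "fixed" can be toggled:
#   ethtool -k "$DEV" | egrep -v fixed
# Query the current UDP/IPv4 hash fields:
#   ethtool -n "$DEV" rx-flow-hash udp4

# Hash UDP/IPv4 flows on src/dst address AND src/dst port (sdfn), so
# distinct flows spread across RX queues even when the port is constant.
SET_CMD="ethtool -N $DEV rx-flow-hash udp4 sdfn"
echo "$SET_CMD"  # echoed rather than executed: needs root and real hardware
```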
- VMDQ (array of int): Number of Virtual Machine Device Queues: 0/1 = disable, 2-16 = enable (default = 8)
- RSS (array of int): Number of Receive-Side Scaling Descriptor Queues, default 1 = number of CPUs
- MQ (array of int): Disable or enable Multiple Queues, default 1
- Node (array of int): Set the starting node to allocate memory on, default -1
- IntMode (array of int): Change Interrupt Mode (0 = Legacy, 1 = MSI, 2 = MSI-X), default 2
- InterruptType (array of int): Change Interrupt Mode (0 = Legacy, 1 = MSI, 2 = MSI-X), default IntMode (deprecated)
- TCP Segmentation Offload (TSO): Uses the TCP protocol to send large packets. Uses the NIC to handle segmentation, and then adds the TCP, IP and data link layer protocol headers to each segment.
- UDP Fragmentation Offload (UFO): Uses the UDP protocol to send large packets. Uses the NIC to handle IP fragmentation into MTU-sized packets for large UDP datagrams.
- Generic Segmentation Offload (GSO): Uses the TCP or UDP protocol to send large packets. If the NIC cannot handle segmentation/fragmentation, GSO performs the same operations in software, bypassing the NIC hardware. This is achieved by delaying segmentation until as late as possible, for example when the packet is processed by the device driver.
- Large Receive Offload (LRO): Uses the TCP protocol. All incoming packets are re-segmented as they are received, reducing the number of segments the system has to process. They can be merged either in the driver or using the NIC. A problem with LRO is that it tends to re-segment all incoming packets, often ignoring differences in headers and other information, which can cause errors. It is generally not possible to use LRO when IP forwarding is enabled, as LRO in combination with IP forwarding can lead to checksum errors. Forwarding is enabled if /proc/sys/net/ipv4/ip_forward is set to 1.
- Generic Receive Offload (GRO): Uses either the TCP or UDP protocols. GRO is more rigorous than LRO when re-segmenting packets: for example, it checks the MAC headers of each packet, which must match; only a limited number of TCP or IP headers can be different; and the TCP timestamps must match. Re-segmenting can be handled by either the NIC or the GSO code.
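All of the offloads above can be inspected and toggled per-interface with ethtool. A sketch (ens192 is an example name; which features exist, and the sensible on/off mix, depends on your driver and whether you forward traffic):

```shell
DEV=${DEV:-ens192}  # example interface name

# Show the current offload state:
#   ethtool -k "$DEV" | egrep 'segmentation|large-receive|generic-receive'

# Given the LRO/forwarding caveat above, LRO is commonly disabled while
# GRO/GSO/TSO stay on. Echoed rather than run: needs root + hardware.
TOGGLE="ethtool -K $DEV lro off gro on gso on tso on"
echo "$TOGGLE"
```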
Traffic steering was on by default with the version of Linux I was using, but it is worth checking if you're using older versions.
https://www.kernel.org/doc/Documentation/networking/scaling.txt
(from the txt link) note: Some advanced NICs allow steering packets to queues based on programmable filters. For example, webserver bound TCP port 80 packets can be directed to their own receive queue. Such "n-tuple" filters can be configured from ethtool (--config-ntuple).
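As a concrete (hypothetical) example of such an n-tuple filter: steering UDP port 3386 traffic (the GTP port mentioned in the thread below) to a dedicated RX queue. Queue number and interface name are illustrative:

```shell
DEV=${DEV:-ens192}  # example interface name
# Requires ntuple support; enable it first with: ethtool -K "$DEV" ntuple on
# flow-type udp4 + dst-port matches IPv4 UDP to port 3386; action 2 = RX queue 2.
RULE="ethtool -U $DEV flow-type udp4 dst-port 3386 action 2"
echo "$RULE"  # echoed rather than executed: needs root and NIC ntuple support
# List installed rules afterwards with: ethtool -u "$DEV"
```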
ethtool -x ens192
RX flow hash indirection table for ens192 with 8 RX ring(s):
0: 0 1 2 3 4 5 6 7
8: 0 1 2 3 4 5 6 7
16: 0 1 2 3 4 5 6 7
24: 0 1 2 3 4 5 6 7
RSS hash key:
Operation not supported
RSS hash function:
toeplitz: on
xor: off
crc32: off
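Reading the output above: the low bits of the Toeplitz hash index into the 128-entry indirection table, and each entry names an RX queue; with the default spread, the 8 queues take every 8th slot. A quick sketch of the lookup (the hash value is made up; the 128/8 sizes match the table above):

```shell
# Hypothetical 32-bit Toeplitz hash of one flow
HASH=$(( 0x5f2c91aa ))

# Index into the 128-entry indirection table (low 7 bits of the hash)
IDX=$(( HASH & 127 ))

# With the equal spread shown above, entry i maps to queue i % 8
QUEUE=$(( IDX % 8 ))
echo "table index $IDX -> RX queue $QUEUE"

# The spread itself can be rewritten, e.g. evenly over all 8 queues:
#   ethtool -X ens192 equal 8
```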
ethtool -g ens192
Ring parameters for ens192:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 4096
TX: 4096
Current hardware settings:
RX: 1024
RX Mini: 0
RX Jumbo: 256
TX: 512
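The current RX ring (1024) and TX ring (512) are well below the pre-set maxima (4096); if packets are being dropped at the ring under load, growing the rings toward the maximum is a cheap experiment. A sketch (needs root; larger rings trade memory and latency for fewer drops):

```shell
DEV=${DEV:-ens192}  # example interface name
# Raise RX/TX rings to the maxima reported by "ethtool -g" above.
GROW="ethtool -G $DEV rx 4096 tx 4096"
echo "$GROW"  # echoed rather than executed: needs root and real hardware
# Check for ring overruns before/after with: ethtool -S "$DEV" | grep -i drop
```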
Additionally, Cisco describe a VM-FEX optimisation you should have; note:
Table 4. Scaling of Dynamic vNIC with VMDirectPath, Virtual Machines Running on Linux Guest with VMXNET3 Emulated Driver and Multi-Queue Enabled
Another thing to consider/investigate: Open vSwitch/bridging. If you're using veth pairs to send your traffic into network namespaces, you can get quite varied performance results by trying Open vSwitch vs. brctl.
I really enjoyed the investigation path; again, thanks to Fırat for the pointer, otherwise it would have taken longer to get the answer...
Tony
Hi,

It has been over 2 years since I worked with GTP, and I kind of had the same problem at that time: we had a 10 Gbit cable and tried to see how much UDP flow we could get. I think we used iperf to test it, and when we listed all the processes, ksoftirqd was using all the resources. Then I found this page: https://blog.cloudflare.com/how-to-receive-a-million-packets/. I do not remember the exact solution, but I guess when you configure your outgoing Ethernet interface with the command below, it should work. To my understanding, all the packets are processed on the same core in your situation, because the port number is always the same. So, for example, if you add another network with a GTP-U tunnel on another port (different from 3386), then your packets will be processed on another core too. But with the command below, the interface will be configured in a way that it won't check only the port to decide on which core a packet should be processed, but will use the hash from the packet to distribute over the cores.

ethtool -n (your_out_eth_interface) rx-flow-hash udp4

Hope it will work for you.

Fırat

Tony Clark <chiefy.padua@gmail.com> wrote on Wed, 19 Jun 2019, 15:07:

Dear All,
I've been using the GTP-U kernel module to communicate with a P-GW.
Running Fedora 29, kernel 4.18.16-300.fc29.x86_64.
At high traffic levels through the GTP-U tunnel I see the performance degrade as 100% CPU is consumed by a single ksoftirqd process.
It is running on a multi-CPU machine and, as far as I can tell, the load is evenly spread across the CPUs (i.e. either manually via smp_affinity, or via irqbalance, checking /proc/interrupts and so forth).
Has anyone else experienced this?
Is there any particular area you could recommend I investigate to find the root cause of this bottleneck, as I'm starting to scratch my head about where to look next...
Thanks in advance
Tony
---- FYI
modinfo gtp
filename: /lib/modules/4.18.16-300.fc29.x86_64/kernel/drivers/net/gtp.ko.xz
alias: net-pf-16-proto-16-family-gtp
alias: rtnl-link-gtp
description: Interface driver for GTP encapsulated traffic
author: Harald Welte <hwelte@sysmocom.de>
license: GPL
depends: udp_tunnel
retpoline: Y
intree: Y
name: gtp
vermagic: 4.18.16-300.fc29.x86_64 SMP mod_unload
modinfo udp_tunnel
filename: /lib/modules/4.18.16-300.fc29.x86_64/kernel/net/ipv4/udp_tunnel.ko.xz
license: GPL
depends:
retpoline: Y
intree: Y
name: udp_tunnel
vermagic: 4.18.16-300.fc29.x86_64 SMP mod_unload