From ghiturbe at tutanota.com Wed Nov 6 03:02:24 2019
From: ghiturbe at tutanota.com (ghiturbe at tutanota.com)
Date: Wed, 6 Nov 2019 04:02:24 +0100 (CET)
Subject: EDGE MCS not stable
Message-ID: 

Dear all,

Please disregard my previous message.

First, let me introduce myself: I'm Gael, and a few months ago I bought a
Lime-Mini to use as a BTS with EDGE. I have spent some time on this and it
is working, but I have noticed that the uplink MCS is not stable; it
changes a lot even with the mobile right next to the board. I have read
that this issue was solved in ticket 1833, but that fix is not working for
me; I hope this is not a problem with my board. Do you have any suggestion
about this behaviour?

To work around it, as I read in another post, I force the MCS by setting
the minimum and maximum values to zero up to the MCS I want to use. It
works if I set the pcu config file like:

  ...
  mcs7 0 35
  mcs8 35 35

It works, but it does not look right, and if I switch to MCS-9 via the VTY
that works too, but it is not possible to start the system with that
configuration, so I have to start it using MCS-7.

With this configuration I can reach 130 kbps using a class 12 mobile
(4 TS RX + TX), and I have 6 timeslots configured for PDCH. I would expect
this mobile to reach almost 200 kbps (4 TS * 54 kbps). What is the maximum
speed you have reached?

Thanks in advance for your support.

Best regards.

-- 
Securely sent with Tutanota. Get your own encrypted, ad-free mailbox:
https://tutanota.com

From chiefy.padua at gmail.com Fri Nov 22 10:26:51 2019
From: chiefy.padua at gmail.com (chiefy)
Date: Fri, 22 Nov 2019 10:26:51 +0000
Subject: high ksoftirqd while using module gtp?
In-Reply-To: 
References: 
Message-ID: <1574418411.3948.13.camel@gmail.com>

Dear All,

To update you on the investigations: if you want to push throughput even
further, and you are running hypervisors (or similar), I recommend
enabling SR-IOV on the network cards. Naturally your network cards need to
support SR-IOV (check the tech specs), and in a virtualised setup SR-IOV
might require licensing from the vendor.

This needs some changes to the BIOS settings on your hardware (to enable
SR-IOV and VT-x/VT-d, or the IOMMU if you are on AMD). You will also have
to configure your hypervisors to support SR-IOV, and configure your VM
guests to use the newly presented virtual functions (VFs). Don't
over-allocate VFs on your physical NICs via SR-IOV, or you might run out
of interrupts :-D

You will see even better throughput, lower latency, lower power
consumption and lower resource utilisation on your hypervisors.
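
As a rough illustration only (interface name and VF count are just
examples, and some older drivers want a max_vfs module parameter instead
of sysfs), creating VFs on a bare-metal Linux host looks something like:

  # how many VFs does the card support?
  cat /sys/class/net/enp3s0f0/device/sriov_totalvfs
  # create 4 VFs on that port
  echo 4 > /sys/class/net/enp3s0f0/device/sriov_numvfs
  # the VFs then show up as extra PCI devices / network interfaces
  lspci | grep -i "virtual function"

Under a hypervisor you would instead do the equivalent in the
hypervisor's own tooling and pass the VFs through to the guests.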
Hope this helps.

On Fri, 2019-08-16 at 15:27 +0100, Tony Clark wrote:
> Firstly I would like to say a big thanks to Firat for the reply; it
> certainly put me on a different investigation path. And apologies for
> not replying sooner. I wanted to make sure it was the correct path
> before I replied back to the group with the findings and the
> associated solution.
> 
> If the GTP-U connection talks to the P-GW with a single IP on each
> side (src/dst), and UDP flow hashing hasn't been enabled on the
> network card of the host running gtp.ko in the kernel, all the
> associated network traffic will be received on a single queue on the
> network card, which is then serviced by a single ksoftirqd thread. At
> some point the system will be receiving more traffic than that one
> thread can service, and your ksoftirqd will burn at 100%. That means
> all your traffic is bound to a single network queue and a single IRQ
> thread, which limits your overall throughput no matter how big your
> network pipe is.
> 
> This is because the network card hashes the packet via
> SRC_IP:SRC_PORT:DEST_IP:DEST_PORT:PROTO to a single queue.
> 
> # take note of the discussions about rx-flow-hash udp4 using ethtool
> https://home.regit.org/tag/performance/
> https://www.joyent.com/blog/virtualizing-nics
> https://www.serializing.me/2015/04/25/rxtx-buffers-rss-others-on-boot/
> 
> You can check whether your card supports adjustable parameters with
> "ethtool -k DEV | egrep -v fixed". As Firat alludes to (below), UDP
> flow hashing should be supported.
> 
> If you enable UDP flow hashing it will spread the hash over multiple
> queues. The default number of queues on the network card can vary,
> depending on your hardware, firmware, driver and any additional
> associated kernel parameters.
> 
> I would recommend having the latest firmware for your network card,
> and the latest kernel driver for it, if possible.
> 
> Alas the network cards used by my hardware didn't support flow
> hashing; they had Intel Flow Director, which wasn't granular enough
> and only worked with TCP. To work around this limitation, having
> multiple SRC_IPs in different namespaces with the same GTP UDP port
> numbers resolved the problem. Of course, if you are sending GTP-U to
> a single destination from multiple sources (say 6 IPs), via 6
> different kernel namespaces, you spread the load over 6 queues, which
> is better than nothing on a limited-feature network card. Time to
> upgrade the 10G network card....
> 
> This took the system from 100% ksoftirqd on a single CPU at 1 Gbit/s
> of throughput, to around 7 to 8 Gbit/s of throughput at 90% ksoftirqd
> spread over multiple CPUs... There is still massive room for
> improvement.
> 
> For performance, some things to investigate/consider, with which I
> had different levels of success... Here are my ramblings.....
> 
> On the Linux host, assuming your traffic is now spread across
> multiple queues (above) - or at least spread as well as it can be...
> 
> Kernel sysctl tweaking is always of benefit if you are using an
> out-of-the-box kernel config: for example UDP buffers, queue sizes,
> paging and virtual memory settings. There is an application called
> "tuned" which lets you apply sysctl profiles; the profile which
> suited my testing best was "throughput-performance".
> 
> If you are looking for straight performance, disable audit processing
> such as "auditd".
> 
> Question the use of SELinux (enforcing/permissive or disabled); this
> can bring performance results if you are doing load testing, though
> of course it is a security consideration.
> 
> If you don't need ipfilters/a firewall - in my case this increased
> throughput by a third - disable it (clean the filter tables and
> unload the modules), and blacklist the modules so they don't get
> loaded at boot. Note you can also stop modules getting loaded with
> kernel.modules_disabled=1, but be careful if you are also messing
> with initramfs rebuilds, because you don't get any modules once you
> set that parameter - I learnt that the hard way :)
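> 
> Something along these lines - the module names and the blacklist file
> name are examples and will differ per setup, so treat it as a sketch
> rather than a recipe:
> 
>   # flush the filter/nat/mangle tables and delete custom chains
>   iptables -F; iptables -t nat -F; iptables -t mangle -F; iptables -X
>   # unload the netfilter modules that are no longer in use
>   modprobe -r iptable_nat iptable_mangle iptable_filter nf_conntrack
>   # keep them from coming back at boot
>   printf 'blacklist nf_conntrack\nblacklist iptable_filter\n' \
>     >> /etc/modprobe.d/gtp-perf-blacklist.conf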
> 
> Investigate smp_affinity and affinity_hint, along with irqbalance
> using --hintpolicy=exact. Understand which IRQs service the network
> cards and how many queues you have; /proc/interrupts will guide you
> (grep -E 'CPU|rxtx' /proc/interrupts). Understand the smp_affinity
> numbers:
>   for ((irq=START_IRQ; irq<=END_IRQ; irq++)); do
>     cat /proc/irq/$irq/smp_affinity
>   done | sort -u
> as you can adjust which queue goes to which ksoftirqd to manually
> balance the queues if you so desire. Brilliant document on IRQ
> debugging:
> https://events.static.linuxfound.org/sites/events/files/slides/LinuxConJapan2016_makita_160714.pdf
> 
> You can monitor what calls are being executed on the CPUs by using
> https://github.com/brendangregg/FlameGraph. I found this most useful
> for understanding that ipfilter was eating a significant amount of
> CPU cycles, and also what other calls were eating cycles inside
> ksoftirqd.
> 
> Investigate additional memory management using numactl / the numad
> daemon. Remember, if you are using virtualisation you might want to
> pin guests to specific sockets, along with NUMA pinning on the VM
> host. Also look at reserved memory allocation in the VM host for the
> guest; this will make your guest perform better.
> 
> Enable sysstat (sar) if you haven't already, as it will aid your
> investigation (sar -u ALL -P ALL 1). This will show which softirqs
> are eating most CPU and which CPU they are bound to, which also
> translates directly to the network queue the traffic is coming in on,
> i.e. network card queue 6 talks to CPU 6 talks to IRQ 6 and so on.
> Using FlameGraph will help you understand which syscalls are chewing
> the CPU.
> 
> If you are using virtualisation, the number of default queues that
> vmxnet (VMware in this example) presents to the guest might be less
> than the number of network card queues the VM host sees (so watch out
> for that). You can adjust the number of queues presented to the guest
> via parameters in the VMware network driver; investigate VMDQ /
> NetQueue to increase the number of hardware queues available from the
> VM host to the guest. Depending on which guest driver you are using
> (vmxnet3 or others), some drivers don't support NAPI (see further
> down).
>   VMDQ: array of int
>     Number of Virtual Machine Device Queues: 0/1 = disable,
>     2-16 enable (default=8)
>   RSS: array of int
>     Number of Receive-Side Scaling Descriptor Queues,
>     default 1=number of cpus
>   MQ: array of int
>     Disable or enable Multiple Queues, default 1
>   Node: array of int
>     set the starting node to allocate memory on, default -1
>   IntMode: array of int
>     Change Interrupt Mode (0=Legacy, 1=MSI, 2=MSI-X), default 2
>   InterruptType: array of int
>     Change Interrupt Mode (0=Legacy, 1=MSI, 2=MSI-X), default IntMode
>     (deprecated)
> 
> Make sure your virtual switch (VMware), if used, has pass-through
> (DirectPath I/O) enabled. The NIC teaming policy should be validated
> depending on your requirements; for example the policy "route based
> on IP hash" can be of benefit.
> 
> Check that the network card is MSI-X and that the Linux driver
> supports NAPI (most should these days, but you never know). Also
> check that your VM host driver supports NAPI; if not, get a
> NAPI-capable KVM driver or VMware driver (VIB update).
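> 
> A quick way to sanity-check that (the device name and PCI address are
> just examples, and the exact output depends on the driver):
> 
>   # is the NIC actually running in MSI-X mode?
>   lspci -vv -s 0b:00.0 | grep -i msi-x
>   # how many RX/combined channels does the driver expose?
>   ethtool -l ens192
>   # which CPUs are the per-queue interrupts landing on?
>   grep -E 'CPU|ens192' /proc/interrupts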
> 
> Upgrade your kernel to a later 4.x release, and even consider using a
> later Linux distro; I tried Fedora 29. I also compiled the latest
> Osmocom code from source, with compile options such as "-O3" for
> optimisation.
> 
> "bmon -b" was a good tool to understand throughput loads, along with
> the load going through the qdisc/fq_codel mq's. Understand the qdisc
> via ip link or ifconfig
> (http://tldp.org/HOWTO/Traffic-Control-HOWTO/components.html);
> adjusting the queues has some traction, but if unsure leave the
> defaults.
> 
> TSO/UFO/GSO/LRO/GRO - understand your network card with respect to
> these; this can improve performance if you haven't already enabled
> them (or adversely disabled options, since sometimes they don't
> actually help). You can get your card's options using ethtool.
> TCP Segmentation Offload (TSO)
>     Uses the TCP protocol to send large packets. Uses the NIC to
>     handle segmentation, and then adds the TCP, IP and data link
>     layer protocol headers to each segment.
> UDP Fragmentation Offload (UFO)
>     Uses the UDP protocol to send large packets. Uses the NIC to
>     handle IP fragmentation into MTU-sized packets for large UDP
>     datagrams.
> Generic Segmentation Offload (GSO)
>     Uses the TCP or UDP protocol to send large packets. If the NIC
>     cannot handle segmentation/fragmentation, GSO performs the same
>     operations, bypassing the NIC hardware. This is achieved by
>     delaying segmentation until as late as possible, for example when
>     the packet is processed by the device driver.
> Large Receive Offload (LRO)
>     Uses the TCP protocol. All incoming packets are re-segmented as
>     they are received, reducing the number of segments the system has
>     to process. They can be merged either in the driver or using the
>     NIC. A problem with LRO is that it tends to resegment all
>     incoming packets, often ignoring differences in headers and other
>     information, which can cause errors. It is generally not possible
>     to use LRO when IP forwarding is enabled; LRO in combination with
>     IP forwarding can lead to checksum errors. Forwarding is enabled
>     if /proc/sys/net/ipv4/ip_forward is set to 1.
> Generic Receive Offload (GRO)
>     Uses either the TCP or UDP protocols. GRO is more rigorous than
>     LRO when resegmenting packets. For example, it checks the MAC
>     headers of each packet, which must match; only a limited number
>     of TCP or IP headers can be different; and the TCP timestamps
>     must match. Resegmenting can be handled by either the NIC or the
>     GSO code.
> 
> Traffic steering was on by default with the version of Linux I was
> using, but it is worth checking if you are using older versions.
> https://www.kernel.org/doc/Documentation/networking/scaling.txt
> (from that txt) note: Some advanced NICs allow steering packets to
> queues based on programmable filters. For example, webserver-bound
> TCP port 80 packets can be directed to their own receive queue. Such
> "n-tuple" filters can be configured from ethtool (--config-ntuple).
> 
> Interestingly, investigate your network card for its hashing
> algorithms and how it distributes the traffic over its ring buffers;
> on some cards you can adjust the RSS hash function. Alas the card I
> was using was stuck with "toeplitz" for its hashing, while the others
> (xor and crc32) were disabled and unavailable. The indirection table
> can be adjusted based on the tuples ("ethtool -X"), but that didn't
> really assist much here.
> ethtool -x ens192
> RX flow hash indirection table for ens192 with 8 RX ring(s):
>     0:      0     1     2     3     4     5     6     7
>     8:      0     1     2     3     4     5     6     7
>    16:      0     1     2     3     4     5     6     7
>    24:      0     1     2     3     4     5     6     7
> RSS hash key:
> Operation not supported
> RSS hash function:
>     toeplitz: on
>     xor: off
>     crc32: off
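> 
> For reference, setting the UDP hash fields and spreading the
> indirection table looks roughly like this (device name and ring count
> are examples; whether the driver accepts these depends on the card):
> 
>   # show which fields currently feed the UDP4 hash
>   ethtool -n ens192 rx-flow-hash udp4
>   # hash on src/dst IP plus src/dst port (sdfn)
>   ethtool -N ens192 rx-flow-hash udp4 sdfn
>   # spread the indirection table evenly over all 8 rings
>   ethtool -X ens192 equal 8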
> 
> Check the default sizes of the RX/TX ring buffers, they may be
> suboptimal.
> ethtool -g ens192
> Ring parameters for ens192:
> Pre-set maximums:
> RX:             4096
> RX Mini:        0
> RX Jumbo:       4096
> TX:             4096
> Current hardware settings:
> RX:             1024
> RX Mini:        0
> RX Jumbo:       256
> TX:             512
> 
> If you are using port channels, make sure you have the correct
> hashing policy enabled at the switch end...
> 
> I haven't investigated this option yet, but some switches also do
> scaling to assist (certainly with virtualisation)... Maybe one day I
> will get around to this...
> Additionally, Cisco describe that you should have VM-FEX optimisation:
> https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/unified-computing/vm_fex_best_practices_deployment_guide.html
> note:
> Table 4. Scaling of Dynamic vNIC with VMDirectPath, Virtual Machines
> Running on Linux Guest with VMXNET3 Emulated Driver and Multi-Queue
> Enabled
> Table 5. Scaling of Dynamic vNIC with VMDirectPath, Virtual Machines
> Running on Linux Guest with VMXNET3 Emulated Driver and Multi-Queue
> Disabled
> 
> Another thing to consider/investigate: Open vSwitch / bridging. If
> you are using veth pairs to send your traffic into namespaces, you
> can get some varied performance results by trying Open vSwitch versus
> brctl.
> 
> I really enjoyed the investigation path; again thanks to Firat for
> the pointer, otherwise it would have taken longer to get to the
> answer...
> 
> Tony
> 
> On Fri, Jun 21, 2019 at 6:50 AM Fırat Sönmez wrote:
> > Hi,
> > 
> > It has been over 2 years since I worked with gtp, and I had kind of
> > the same problem at the time: we had a 10 Gbit link and tried to
> > see how much UDP flow we could get. I think we used iperf to test
> > it, and when we listed all the processes, ksoftirqd was using all
> > the resources. Then I found this page:
> > https://blog.cloudflare.com/how-to-receive-a-million-packets/. I do
> > not remember the exact solution, but I guess if you configure your
> > outgoing ethernet interface with the command below, it should work.
> > To my understanding, all the packets are processed on the same core
> > in your situation, because the port number is always the same. So,
> > for example, if you add another network with a GTP-U tunnel on
> > another port (different from 3386), then your packets will be
> > processed on another core, too. But with the command below, the
> > interface will be configured in a way that it won't check the port
> > to decide on which core the packet should be processed, but will
> > use the hash from the packet to distribute the load over the cores.
> > ethtool -N (your_out_eth_interface) rx-flow-hash udp4 sdfn
> > 
> > Hope it will work for you.
> > 
> > Fırat
> > 
> > Tony Clark wrote on Wed, 19 Jun 2019 at 15:07:
> > > Dear All,
> > > 
> > > I've been using the GTP-U kernel module to communicate with a
> > > P-GW.
> > > 
> > > Running Fedora 29, kernel 4.18.16-300.fc29.x86_64.
> > > 
> > > At high traffic levels through the GTP-U tunnel I see the
> > > performance degrade as 100% CPU is consumed by a single ksoftirqd
> > > process.
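> > > 
> > > Roughly how I am observing it (output will obviously differ on
> > > other systems):
> > > 
> > >   top -H -p "$(pgrep -d, ksoftirqd)"  # per-thread CPU of ksoftirqd
> > >   watch -d cat /proc/softirqs         # NET_RX shows which CPU does the work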
> > > 
> > > It is running on a multi-CPU machine and, as far as I can tell,
> > > the load is evenly spread across the CPUs (i.e. either manually
> > > via smp_affinity, or via irqbalance, checking /proc/interrupts
> > > and so forth).
> > > 
> > > Has anyone else experienced this?
> > > 
> > > Is there any particular area you could recommend I investigate to
> > > find the root cause of this bottleneck, as I'm starting to
> > > scratch my head over where to look next...
> > > 
> > > Thanks in advance
> > > Tony
> > > 
> > > ---- FYI
> > > 
> > > modinfo gtp
> > > filename:       /lib/modules/4.18.16-300.fc29.x86_64/kernel/drivers/net/gtp.ko.xz
> > > alias:          net-pf-16-proto-16-family-gtp
> > > alias:          rtnl-link-gtp
> > > description:    Interface driver for GTP encapsulated traffic
> > > author:         Harald Welte
> > > license:        GPL
> > > depends:        udp_tunnel
> > > retpoline:      Y
> > > intree:         Y
> > > name:           gtp
> > > vermagic:       4.18.16-300.fc29.x86_64 SMP mod_unload
> > > 
> > > modinfo udp_tunnel
> > > filename:       /lib/modules/4.18.16-300.fc29.x86_64/kernel/net/ipv4/udp_tunnel.ko.xz
> > > license:        GPL
> > > depends:
> > > retpoline:      Y
> > > intree:         Y
> > > name:           udp_tunnel
> > > vermagic:       4.18.16-300.fc29.x86_64 SMP mod_unload
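> > > 
> > > In case it is useful, this is roughly how I check whether all the
> > > GTP-U traffic is landing on a single RX queue (the per-queue stat
> > > names are driver specific, so treat it as an example):
> > > 
> > >   ethtool -S DEV | grep -i rx_queue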