From ghiturbe at tutanota.com Wed Nov 6 03:02:24 2019
From: ghiturbe at tutanota.com (ghiturbe at tutanota.com)
Date: Wed, 6 Nov 2019 04:02:24 +0100 (CET)
Subject: EDGE MCS not stable
Message-ID: 

Dear all,

Please disregard my previous message.

First, let me introduce myself: I'm Gael, and a few months ago I bought a
Lime-Mini to use as a BTS with EDGE. I have spent some time on this and it
is working, but I have noticed that the uplink MCS is not stable; it
changes a lot even with the mobile right next to the board. I have read
that this issue was solved in ticket 1833, but that fix is not working for
me; I hope this is not a problem with my board. Do you have any suggestion
about this behaviour?

To work around it, as I read in another post, I force the MCS by setting
the minimum and maximum values to zero up to the MCS I want to use. It
works if I set the pcu config file like:

  ...
  mcs7 0 35
  mcs8 35 35

It works, but it does not look right, and if I switch to MCS-9 via the VTY
that works too, but it is not possible to start the system with that
configuration, so I have to start it using MCS-7.

With this configuration I can reach 130 kbps using a class 12 mobile
(4 TS RX + TX), and I have 6 timeslots configured for PDCH. I would expect
this mobile to reach almost 200 kbps (4 TS * 54 kbps). What is the maximum
speed you have reached?

Thanks in advance for your support.

Best regards.

-- 
Securely sent with Tutanota. Get your own encrypted, ad-free mailbox:
https://tutanota.com

From chiefy.padua at gmail.com Fri Nov 22 10:26:51 2019
From: chiefy.padua at gmail.com (chiefy)
Date: Fri, 22 Nov 2019 10:26:51 +0000
Subject: high ksoftirqd while using module gtp?
In-Reply-To: 
References: 
Message-ID: <1574418411.3948.13.camel@gmail.com>

Dear All,

To update you on the investigations: if you want to push throughput even
further, and you are running hypervisors (or similar), I recommend
enabling SR-IOV on the network cards. Naturally your network cards need to
support SR-IOV (check the tech specs), and in a virtualised setup SR-IOV
might require licensing from the vendor.

This needs some changes to the BIOS settings on your hardware (to enable
SR-IOV and VT-x/VT-d, or the IOMMU if you are on AMD). You will also have
to configure your hypervisors to support SR-IOV, and configure your VM
guests to use the newly presented virtual functions (VFs). Don't
over-allocate VFs on your physical NICs via SR-IOV, or you might run out
of interrupts :-D

You will see even better throughput, lower latency, lower power
consumption and lower resource utilisation on your hypervisors.
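
As a rough illustration only (interface name and VF count are just
examples, and some older drivers want a max_vfs module parameter instead
of sysfs), creating VFs on a bare-metal Linux host looks something like:

  # how many VFs does the card support?
  cat /sys/class/net/enp3s0f0/device/sriov_totalvfs
  # create 4 VFs on that port
  echo 4 > /sys/class/net/enp3s0f0/device/sriov_numvfs
  # the VFs then show up as extra PCI devices / network interfaces
  lspci | grep -i "virtual function"

Under a hypervisor you would instead do the equivalent in the
hypervisor's own tooling and pass the VFs through to the guests.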
Hope this helps.

On Fri, 2019-08-16 at 15:27 +0100, Tony Clark wrote:
> Firstly I would like to say a big thanks to Firat for the reply; it
> certainly put me on a different investigation path. And apologies for
> not replying sooner. I wanted to make sure it was the correct path
> before I replied back to the group with the findings and the
> associated solution.
> 
> If the GTP-U connection talks to the P-GW with a single IP on each
> side (src/dst), and UDP flow hashing hasn't been enabled on the
> network card of the host running gtp.ko in the kernel, all the
> associated network traffic will be received on a single queue on the
> network card, which is then serviced by a single ksoftirqd thread. At
> some point the system will be receiving more traffic than that one
> thread can service, and your ksoftirqd will burn at 100%. That means
> all your traffic is bound to a single network queue and a single IRQ
> thread, which limits your overall throughput no matter how big your
> network pipe is.
> 
> This is because the network card hashes the packet via
> SRC_IP:SRC_PORT:DEST_IP:DEST_PORT:PROTO to a single queue.
> 
> # take note of the discussions about rx-flow-hash udp4 using ethtool
> https://home.regit.org/tag/performance/
> https://www.joyent.com/blog/virtualizing-nics
> https://www.serializing.me/2015/04/25/rxtx-buffers-rss-others-on-boot/
> 
> You can check whether your card supports adjustable parameters with
> "ethtool -k DEV | egrep -v fixed". As Firat alludes to (below), UDP
> flow hashing should be supported.
> 
> If you enable UDP flow hashing it will spread the hash over multiple
> queues. The default number of queues on the network card can vary,
> depending on your hardware, firmware, driver and any additional
> associated kernel parameters.
> 
> I would recommend having the latest firmware for your network card,
> and the latest kernel driver for it, if possible.
> 
> Alas the network cards used by my hardware didn't support flow
> hashing; they had Intel Flow Director, which wasn't granular enough
> and only worked with TCP. To work around this limitation, having
> multiple SRC_IPs in different namespaces with the same GTP UDP port
> numbers resolved the problem. Of course, if you are sending GTP-U to
> a single destination from multiple sources (say 6 IPs), via 6
> different kernel namespaces, you spread the load over 6 queues, which
> is better than nothing on a limited-feature network card. Time to
> upgrade the 10G network card....
> 
> This took the system from 100% ksoftirqd on a single CPU at 1 Gbit/s
> of throughput, to around 7 to 8 Gbit/s of throughput at 90% ksoftirqd
> spread over multiple CPUs... There is still massive room for
> improvement.
> 
> For performance, some things to investigate/consider, with which I
> had different levels of success... Here are my ramblings.....
> 
> On the Linux host, assuming your traffic is now spread across
> multiple queues (above) - or at least spread as well as it can be...
> 
> Kernel sysctl tweaking is always of benefit if you are using an
> out-of-the-box kernel config: for example UDP buffers, queue sizes,
> paging and virtual memory settings. There is an application called
> "tuned" which lets you apply sysctl profiles; the profile which
> suited my testing best was "throughput-performance".
> 
> If you are looking for straight performance, disable audit processing
> such as "auditd".
> 
> Question the use of SELinux (enforcing/permissive or disabled); this
> can bring performance results if you are doing load testing, though
> of course it is a security consideration.
> 
> If you don't need ipfilters/a firewall - in my case this increased
> throughput by a third - disable it (clean the filter tables and
> unload the modules), and blacklist the modules so they don't get
> loaded at boot. Note you can also stop modules getting loaded with
> kernel.modules_disabled=1, but be careful if you are also messing
> with initramfs rebuilds, because you don't get any modules once you
> set that parameter - I learnt that the hard way :)
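> 
> Something along these lines - the module names and the blacklist file
> name are examples and will differ per setup, so treat it as a sketch
> rather than a recipe:
> 
>   # flush the filter/nat/mangle tables and delete custom chains
>   iptables -F; iptables -t nat -F; iptables -t mangle -F; iptables -X
>   # unload the netfilter modules that are no longer in use
>   modprobe -r iptable_nat iptable_mangle iptable_filter nf_conntrack
>   # keep them from coming back at boot
>   printf 'blacklist nf_conntrack\nblacklist iptable_filter\n' \
>     >> /etc/modprobe.d/gtp-perf-blacklist.conf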
> 
> Investigate smp_affinity and affinity_hint, along with irqbalance
> using --hintpolicy=exact. Understand which IRQs service the network
> cards and how many queues you have; /proc/interrupts will guide you
> (grep -E 'CPU|rxtx' /proc/interrupts). Understand the smp_affinity
> numbers:
>   for ((irq=START_IRQ; irq<=END_IRQ; irq++)); do
>     cat /proc/irq/$irq/smp_affinity
>   done | sort -u
> as you can adjust which queue goes to which ksoftirqd to manually
> balance the queues if you so desire. Brilliant document on IRQ
> debugging:
> https://events.static.linuxfound.org/sites/events/files/slides/LinuxConJapan2016_makita_160714.pdf
> 
> You can monitor what calls are being executed on the CPUs by using
> https://github.com/brendangregg/FlameGraph. I found this most useful
> for understanding that ipfilter was eating a significant amount of
> CPU cycles, and also what other calls were eating cycles inside
> ksoftirqd.
> 
> Investigate additional memory management using numactl / the numad
> daemon. Remember, if you are using virtualisation you might want to
> pin guests to specific sockets, along with NUMA pinning on the VM
> host. Also look at reserved memory allocation in the VM host for the
> guest; this will make your guest perform better.
> 
> Enable sysstat (sar) if you haven't already, as it will aid your
> investigation (sar -u ALL -P ALL 1). This will show which softirqs
> are eating most CPU and which CPU they are bound to, which also
> translates directly to the network queue the traffic is coming in on,
> i.e. network card queue 6 talks to CPU 6 talks to IRQ 6 and so on.
> Using FlameGraph will help you understand which syscalls are chewing
> the CPU.
> 
> If you are using virtualisation, the number of default queues that
> vmxnet (VMware in this example) presents to the guest might be less
> than the number of network card queues the VM host sees (so watch out
> for that). You can adjust the number of queues presented to the guest
> via parameters in the VMware network driver; investigate VMDQ /
> NetQueue to increase the number of hardware queues available from the
> VM host to the guest. Depending on which guest driver you are using
> (vmxnet3 or others), some drivers don't support NAPI (see further
> down).
>   VMDQ: array of int
>     Number of Virtual Machine Device Queues: 0/1 = disable,
>     2-16 enable (default=8)
>   RSS: array of int
>     Number of Receive-Side Scaling Descriptor Queues,
>     default 1=number of cpus
>   MQ: array of int
>     Disable or enable Multiple Queues, default 1
>   Node: array of int
>     set the starting node to allocate memory on, default -1
>   IntMode: array of int
>     Change Interrupt Mode (0=Legacy, 1=MSI, 2=MSI-X), default 2
>   InterruptType: array of int
>     Change Interrupt Mode (0=Legacy, 1=MSI, 2=MSI-X), default IntMode
>     (deprecated)
> 
> Make sure your virtual switch (VMware), if used, has pass-through
> (DirectPath I/O) enabled. The NIC teaming policy should be validated
> depending on your requirements; for example the policy "route based
> on IP hash" can be of benefit.
> 
> Check that the network card is MSI-X and that the Linux driver
> supports NAPI (most should these days, but you never know). Also
> check that your VM host driver supports NAPI; if not, get a
> NAPI-capable KVM driver or VMware driver (VIB update).
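> 
> A quick way to sanity-check that (the device name and PCI address are
> just examples, and the exact output depends on the driver):
> 
>   # is the NIC actually running in MSI-X mode?
>   lspci -vv -s 0b:00.0 | grep -i msi-x
>   # how many RX/combined channels does the driver expose?
>   ethtool -l ens192
>   # which CPUs are the per-queue interrupts landing on?
>   grep -E 'CPU|ens192' /proc/interrupts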
> 
> Upgrade your kernel to a later 4.x release, and even consider using a
> later Linux distro; I tried Fedora 29. I also compiled the latest
> Osmocom code from source, with compile options such as "-O3" for
> optimisation.
> 
> "bmon -b" was a good tool to understand throughput loads, along with
> the load going through the qdisc/fq_codel mq's. Understand the qdisc
> via ip link or ifconfig
> (http://tldp.org/HOWTO/Traffic-Control-HOWTO/components.html);
> adjusting the queues has some traction, but if unsure leave the
> defaults.
> 
> TSO/UFO/GSO/LRO/GRO - understand your network card with respect to
> these; this can improve performance if you haven't already enabled
> them (or adversely disabled options, since sometimes they don't
> actually help). You can get your card's options using ethtool.
> TCP Segmentation Offload (TSO)
>     Uses the TCP protocol to send large packets. Uses the NIC to
>     handle segmentation, and then adds the TCP, IP and data link
>     layer protocol headers to each segment.
> UDP Fragmentation Offload (UFO)
>     Uses the UDP protocol to send large packets. Uses the NIC to
>     handle IP fragmentation into MTU-sized packets for large UDP
>     datagrams.
> Generic Segmentation Offload (GSO)
>     Uses the TCP or UDP protocol to send large packets. If the NIC
>     cannot handle segmentation/fragmentation, GSO performs the same
>     operations, bypassing the NIC hardware. This is achieved by
>     delaying segmentation until as late as possible, for example when
>     the packet is processed by the device driver.
> Large Receive Offload (LRO)
>     Uses the TCP protocol. All incoming packets are re-segmented as
>     they are received, reducing the number of segments the system has
>     to process. They can be merged either in the driver or using the
>     NIC. A problem with LRO is that it tends to resegment all
>     incoming packets, often ignoring differences in headers and other
>     information, which can cause errors. It is generally not possible
>     to use LRO when IP forwarding is enabled; LRO in combination with
>     IP forwarding can lead to checksum errors. Forwarding is enabled
>     if /proc/sys/net/ipv4/ip_forward is set to 1.
> Generic Receive Offload (GRO)
>     Uses either the TCP or UDP protocols. GRO is more rigorous than
>     LRO when resegmenting packets. For example, it checks the MAC
>     headers of each packet, which must match; only a limited number
>     of TCP or IP headers can be different; and the TCP timestamps
>     must match. Resegmenting can be handled by either the NIC or the
>     GSO code.
> 
> Traffic steering was on by default with the version of Linux I was
> using, but it is worth checking if you are using older versions.
> https://www.kernel.org/doc/Documentation/networking/scaling.txt
> (from that txt) note: Some advanced NICs allow steering packets to
> queues based on programmable filters. For example, webserver-bound
> TCP port 80 packets can be directed to their own receive queue. Such
> "n-tuple" filters can be configured from ethtool (--config-ntuple).
> 
> Interestingly, investigate your network card for its hashing
> algorithms and how it distributes the traffic over its ring buffers;
> on some cards you can adjust the RSS hash function. Alas the card I
> was using was stuck with "toeplitz" for its hashing, while the others
> (xor and crc32) were disabled and unavailable. The indirection table
> can be adjusted based on the tuples ("ethtool -X"), but that didn't
> really assist much here.
> ethtool -x ens192
> RX flow hash indirection table for ens192 with 8 RX ring(s):
>     0:      0     1     2     3     4     5     6     7
>     8:      0     1     2     3     4     5     6     7
>    16:      0     1     2     3     4     5     6     7
>    24:      0     1     2     3     4     5     6     7
> RSS hash key:
> Operation not supported
> RSS hash function:
>     toeplitz: on
>     xor: off
>     crc32: off
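> 
> For reference, setting the UDP hash fields and spreading the
> indirection table looks roughly like this (device name and ring count
> are examples; whether the driver accepts these depends on the card):
> 
>   # show which fields currently feed the UDP4 hash
>   ethtool -n ens192 rx-flow-hash udp4
>   # hash on src/dst IP plus src/dst port (sdfn)
>   ethtool -N ens192 rx-flow-hash udp4 sdfn
>   # spread the indirection table evenly over all 8 rings
>   ethtool -X ens192 equal 8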
> 
> Check the default sizes of the RX/TX ring buffers, they may be
> suboptimal.
> ethtool -g ens192
> Ring parameters for ens192:
> Pre-set maximums:
> RX:             4096
> RX Mini:        0
> RX Jumbo:       4096
> TX:             4096
> Current hardware settings:
> RX:             1024
> RX Mini:        0
> RX Jumbo:       256
> TX:             512
> 
> If you are using port channels, make sure you have the correct
> hashing policy enabled at the switch end...
> 
> I haven't investigated this option yet, but some switches also do
> scaling to assist (certainly with virtualisation)... Maybe one day I
> will get around to this...
> Additionally, Cisco describe that you should have VM-FEX optimisation:
> https://www.cisco.com/c/en/us/solutions/collateral/data-center-virtualization/unified-computing/vm_fex_best_practices_deployment_guide.html
> note:
> Table 4. Scaling of Dynamic vNIC with VMDirectPath, Virtual Machines
> Running on Linux Guest with VMXNET3 Emulated Driver and Multi-Queue
> Enabled
> Table 5. Scaling of Dynamic vNIC with VMDirectPath, Virtual Machines
> Running on Linux Guest with VMXNET3 Emulated Driver and Multi-Queue
> Disabled
> 
> Another thing to consider/investigate: Open vSwitch / bridging. If
> you are using veth pairs to send your traffic into namespaces, you
> can get some varied performance results by trying Open vSwitch versus
> brctl.
> 
> I really enjoyed the investigation path; again thanks to Firat for
> the pointer, otherwise it would have taken longer to get to the
> answer...
> 
> Tony
> 
> On Fri, Jun 21, 2019 at 6:50 AM Fırat Sönmez wrote:
> > Hi,
> > 
> > It has been over 2 years since I worked with gtp, and I had kind of
> > the same problem at the time: we had a 10 Gbit link and tried to
> > see how much UDP flow we could get. I think we used iperf to test
> > it, and when we listed all the processes, ksoftirqd was using all
> > the resources. Then I found this page:
> > https://blog.cloudflare.com/how-to-receive-a-million-packets/. I do
> > not remember the exact solution, but I guess if you configure your
> > outgoing ethernet interface with the command below, it should work.
> > To my understanding, all the packets are processed on the same core
> > in your situation, because the port number is always the same. So,
> > for example, if you add another network with a GTP-U tunnel on
> > another port (different from 3386), then your packets will be
> > processed on another core, too. But with the command below, the
> > interface will be configured in a way that it won't check the port
> > to decide on which core the packet should be processed, but will
> > use the hash from the packet to distribute the load over the cores.
> > ethtool -N (your_out_eth_interface) rx-flow-hash udp4 sdfn
> > 
> > Hope it will work for you.
> > 
> > Fırat
> > 
> > Tony Clark wrote on Wed, 19 Jun 2019 at 15:07:
> > > Dear All,
> > > 
> > > I've been using the GTP-U kernel module to communicate with a
> > > P-GW.
> > > 
> > > Running Fedora 29, kernel 4.18.16-300.fc29.x86_64.
> > > 
> > > At high traffic levels through the GTP-U tunnel I see the
> > > performance degrade as 100% CPU is consumed by a single ksoftirqd
> > > process.
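> > > 
> > > Roughly how I am observing it (output will obviously differ on
> > > other systems):
> > > 
> > >   top -H -p "$(pgrep -d, ksoftirqd)"  # per-thread CPU of ksoftirqd
> > >   watch -d cat /proc/softirqs         # NET_RX shows which CPU does the work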
> > > 
> > > It is running on a multi-CPU machine and, as far as I can tell,
> > > the load is evenly spread across the CPUs (i.e. either manually
> > > via smp_affinity, or via irqbalance, checking /proc/interrupts
> > > and so forth).
> > > 
> > > Has anyone else experienced this?
> > > 
> > > Is there any particular area you could recommend I investigate to
> > > find the root cause of this bottleneck, as I'm starting to
> > > scratch my head over where to look next...
> > > 
> > > Thanks in advance
> > > Tony
> > > 
> > > ---- FYI
> > > 
> > > modinfo gtp
> > > filename:       /lib/modules/4.18.16-300.fc29.x86_64/kernel/drivers/net/gtp.ko.xz
> > > alias:          net-pf-16-proto-16-family-gtp
> > > alias:          rtnl-link-gtp
> > > description:    Interface driver for GTP encapsulated traffic
> > > author:         Harald Welte
> > > license:        GPL
> > > depends:        udp_tunnel
> > > retpoline:      Y
> > > intree:         Y
> > > name:           gtp
> > > vermagic:       4.18.16-300.fc29.x86_64 SMP mod_unload
> > > 
> > > modinfo udp_tunnel
> > > filename:       /lib/modules/4.18.16-300.fc29.x86_64/kernel/net/ipv4/udp_tunnel.ko.xz
> > > license:        GPL
> > > depends:
> > > retpoline:      Y
> > > intree:         Y
> > > name:           udp_tunnel
> > > vermagic:       4.18.16-300.fc29.x86_64 SMP mod_unload
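> > > 
> > > In case it is useful, this is roughly how I check whether all the
> > > GTP-U traffic is landing on a single RX queue (the per-queue stat
> > > names are driver specific, so treat it as an example):
> > > 
> > >   ethtool -S DEV | grep -i rx_queue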