gtp_genl_send_echo_req() runs as a generic netlink doit handler in process context with BH not disabled. It calls udp_tunnel_xmit_skb(), which eventually invokes iptunnel_xmit(). That function uses __this_cpu_inc/dec on softnet_data.xmit.recursion to track the tunnel xmit recursion level.
Without local_bh_disable(), the task may migrate between dev_xmit_recursion_inc() and dev_xmit_recursion_dec(), breaking the per-CPU counter pairing. The result is stale or negative recursion levels that can later produce false-positive SKB_DROP_REASON_RECURSION_LIMIT drops on either CPU.
The other udp_tunnel_xmit_skb() call sites in gtp.c are unaffected: the data path runs under ndo_start_xmit and the echo response handlers run from the UDP encap rx softirq, both with BH already disabled.
Fix it by disabling BH around the udp_tunnel_xmit_skb() call, mirroring commit 2cd7e6971fc2 ("sctp: disable BH before calling udp_tunnel_xmit_skb()").
Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 drivers/net/gtp.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 70b9e58b9b78..5150f2e4f66b 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -2400,6 +2400,7 @@ static int gtp_genl_send_echo_req(struct sk_buff *skb, struct genl_info *info)
 		return -ENODEV;
 	}
 
+	local_bh_disable();
 	udp_tunnel_xmit_skb(rt, sk, skb_to_send,
 			    fl4.saddr, fl4.daddr,
 			    inet_dscp_to_dsfield(fl4.flowi4_dscp),
@@ -2409,6 +2410,7 @@ static int gtp_genl_send_echo_req(struct sk_buff *skb, struct genl_info *info)
 			    !net_eq(sock_net(sk),
 				    dev_net(gtp->dev)),
 			    false, 0);
+	local_bh_enable();
 	return 0;
 }
 
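To illustrate the failure mode described in the commit message, here is a minimal userspace sketch. It is not kernel code. It assumes a toy model with two CPUs; xmit_recursion[], struct task and all model_* helpers are hypothetical names invented for this example. It also assumes the non-PREEMPT_RT behavior in which disabling BH makes the task non-preemptible and therefore keeps it on its current CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical stand-in for the per-CPU softnet_data.xmit.recursion counter. */
static int xmit_recursion[NR_CPUS];

struct task {
	int cpu;          /* CPU the task is currently running on */
	int bh_disabled;  /* nesting depth, models local_bh_disable() */
};

static void model_reset(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		xmit_recursion[i] = 0;
}

static void model_bh_disable(struct task *t)
{
	t->bh_disabled++;
}

static void model_bh_enable(struct task *t)
{
	assert(t->bh_disabled > 0);
	t->bh_disabled--;
}

/* The task can only be moved while BH is enabled. Returns true if it moved. */
static bool model_try_migrate(struct task *t, int new_cpu)
{
	assert(new_cpu >= 0 && new_cpu < NR_CPUS);
	if (t->bh_disabled)
		return false;
	t->cpu = new_cpu;
	return true;
}

/*
 * Models the inc/dec pair in the tunnel xmit path, with a migration
 * attempt landing between the two per-CPU operations.
 */
static void model_tunnel_xmit(struct task *t, int migrate_to)
{
	xmit_recursion[t->cpu]++;          /* models dev_xmit_recursion_inc() */
	model_try_migrate(t, migrate_to);
	xmit_recursion[t->cpu]--;          /* models dev_xmit_recursion_dec() */
}
```

In this model, calling model_tunnel_xmit(&t, 1) from CPU 0 without the BH pair leaves xmit_recursion[0] at 1 and xmit_recursion[1] at -1. Wrapping the same call in model_bh_disable()/model_bh_enable() keeps both counters at 0.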
On 4/17/26 07:54, David Carlier wrote:
> gtp_genl_send_echo_req() runs as a generic netlink doit handler in process context with BH not disabled. It calls udp_tunnel_xmit_skb(), which eventually invokes iptunnel_xmit(). That function uses __this_cpu_inc/dec on softnet_data.xmit.recursion to track the tunnel xmit recursion level.
> Without local_bh_disable(), the task may migrate between dev_xmit_recursion_inc() and dev_xmit_recursion_dec(), breaking the per-CPU counter pairing. The result is stale or negative recursion levels that can later produce false-positive SKB_DROP_REASON_RECURSION_LIMIT drops on either CPU.
> The other udp_tunnel_xmit_skb() call sites in gtp.c are unaffected: the data path runs under ndo_start_xmit and the echo response handlers run from the UDP encap rx softirq, both with BH already disabled.
> Fix it by disabling BH around the udp_tunnel_xmit_skb() call, mirroring commit 2cd7e6971fc2 ("sctp: disable BH before calling udp_tunnel_xmit_skb()").
Why not fix iptunnel_xmit() directly, rather than fixing all possible callers? Basically, just like we did for lwtunnel_{output|xmit}(). The advantage would be that we no longer have to worry about BHs in the callers, and BHs would only be disabled when necessary.
Hi Julian,
On Mon, 20 Apr 2026 at 20:02, Justin Iurman <justin.iurman@gmail.com> wrote:
> On 4/17/26 07:54, David Carlier wrote:
> > gtp_genl_send_echo_req() runs as a generic netlink doit handler in process context with BH not disabled. It calls udp_tunnel_xmit_skb(), which eventually invokes iptunnel_xmit(). That function uses __this_cpu_inc/dec on softnet_data.xmit.recursion to track the tunnel xmit recursion level.
> > Without local_bh_disable(), the task may migrate between dev_xmit_recursion_inc() and dev_xmit_recursion_dec(), breaking the per-CPU counter pairing. The result is stale or negative recursion levels that can later produce false-positive SKB_DROP_REASON_RECURSION_LIMIT drops on either CPU.
> > The other udp_tunnel_xmit_skb() call sites in gtp.c are unaffected: the data path runs under ndo_start_xmit and the echo response handlers run from the UDP encap rx softirq, both with BH already disabled.
> > Fix it by disabling BH around the udp_tunnel_xmit_skb() call, mirroring commit 2cd7e6971fc2 ("sctp: disable BH before calling udp_tunnel_xmit_skb()").
> Why not fix iptunnel_xmit() directly, rather than fixing all possible callers? Basically, just like we did for lwtunnel_{output|xmit}(). The advantage would be that we no longer have to worry about BHs in the callers, and BHs would only be disabled when necessary.
Good point. Your lwtunnel fix (c03a49f3093a) is a close parallel, and a central fix would avoid chasing callers one by one (sctp was patched last week, gtp is this one, and tipc/wireguard/ovpn genl paths look similar).
Happy to respin as v2 with local_bh_disable/enable moved into iptunnel_xmit() (and ip6tunnel_xmit() for symmetry), and drop the gtp-local hunk. That would also supersede Xin Long's recent sctp commit (2cd7e6971fc2), so I'll make sure to Cc him.
One thing I'd like your take on before I send: iptunnel_xmit() feels like the natural home since it owns the recursion counter, but would you rather see it in udp_tunnel_xmit_skb()? I don't want to pick the wrong spot if you already have a preference.
Cheers !
On 4/20/26 21:44, David CARLIER wrote:
> Hi Julian,
> On Mon, 20 Apr 2026 at 20:02, Justin Iurman <justin.iurman@gmail.com> wrote:
> > On 4/17/26 07:54, David Carlier wrote:
> > > gtp_genl_send_echo_req() runs as a generic netlink doit handler in process context with BH not disabled. It calls udp_tunnel_xmit_skb(), which eventually invokes iptunnel_xmit(). That function uses __this_cpu_inc/dec on softnet_data.xmit.recursion to track the tunnel xmit recursion level.
> > > Without local_bh_disable(), the task may migrate between dev_xmit_recursion_inc() and dev_xmit_recursion_dec(), breaking the per-CPU counter pairing. The result is stale or negative recursion levels that can later produce false-positive SKB_DROP_REASON_RECURSION_LIMIT drops on either CPU.
> > > The other udp_tunnel_xmit_skb() call sites in gtp.c are unaffected: the data path runs under ndo_start_xmit and the echo response handlers run from the UDP encap rx softirq, both with BH already disabled.
> > > Fix it by disabling BH around the udp_tunnel_xmit_skb() call, mirroring commit 2cd7e6971fc2 ("sctp: disable BH before calling udp_tunnel_xmit_skb()").
> > Why not fix iptunnel_xmit() directly, rather than fixing all possible callers? Basically, just like we did for lwtunnel_{output|xmit}(). The advantage would be that we no longer have to worry about BHs in the callers, and BHs would only be disabled when necessary.
> Good point. Your lwtunnel fix (c03a49f3093a) is a close parallel, and a central fix would avoid chasing callers one by one (sctp was patched last week, gtp is this one, and tipc/wireguard/ovpn genl paths look similar).
> Happy to respin as v2 with local_bh_disable/enable moved into iptunnel_xmit() (and ip6tunnel_xmit() for symmetry), and drop the gtp-local hunk. That would also supersede Xin Long's recent sctp commit (2cd7e6971fc2), so I'll make sure to Cc him.
Jakub merged it already, so no need to respin. I guess we could revisit later if required.
> One thing I'd like your take on before I send: iptunnel_xmit() feels like the natural home since it owns the recursion counter, but would you rather see it in udp_tunnel_xmit_skb()? I don't want to pick the wrong spot if you already have a preference.
Since udp_tunnel_xmit_skb() is just another caller, I'd definitely do it in iptunnel_xmit() to centralize things (same for v6).
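The centralized alternative discussed above could be modeled as follows. This is a hedged userspace sketch only and does not reflect what was merged; xmit_recursion[], struct task and the model_* helpers are hypothetical names. The sketch relies on BH disabling being a nesting count, so a caller that already runs with BH disabled keeps its own nesting level after the call.

```c
#include <assert.h>

#define NR_CPUS 2

/* Hypothetical stand-in for the per-CPU xmit recursion counter. */
static int xmit_recursion[NR_CPUS];

struct task {
	int cpu;          /* CPU the task currently runs on */
	int bh_disabled;  /* nesting depth, models local_bh_disable() */
};

/* Migration is only allowed while the task has BH enabled. */
static void model_try_migrate(struct task *t, int new_cpu)
{
	assert(new_cpu >= 0 && new_cpu < NR_CPUS);
	if (t->bh_disabled == 0)
		t->cpu = new_cpu;
}

/*
 * Callee-side protection: the function that owns the counter pins the
 * task for the whole inc/dec window, so callers need no BH handling.
 */
static void model_xmit_central(struct task *t, int migrate_to)
{
	t->bh_disabled++;                   /* models local_bh_disable() */
	xmit_recursion[t->cpu]++;
	model_try_migrate(t, migrate_to);   /* denied: BH is disabled here */
	xmit_recursion[t->cpu]--;
	t->bh_disabled--;                   /* models local_bh_enable() */
}
```

In this model, a process-context caller (bh_disabled == 0) and a softirq-like caller (bh_disabled == 1) both leave the counters balanced, and each returns with the nesting depth it started with.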
On Mon, 20 Apr 2026 21:02:55 +0200 Justin Iurman wrote:
> On 4/17/26 07:54, David Carlier wrote:
> > gtp_genl_send_echo_req() runs as a generic netlink doit handler in process context with BH not disabled. It calls udp_tunnel_xmit_skb(), which eventually invokes iptunnel_xmit(). That function uses __this_cpu_inc/dec on softnet_data.xmit.recursion to track the tunnel xmit recursion level.
> > Without local_bh_disable(), the task may migrate between dev_xmit_recursion_inc() and dev_xmit_recursion_dec(), breaking the per-CPU counter pairing. The result is stale or negative recursion levels that can later produce false-positive SKB_DROP_REASON_RECURSION_LIMIT drops on either CPU.
> > The other udp_tunnel_xmit_skb() call sites in gtp.c are unaffected: the data path runs under ndo_start_xmit and the echo response handlers run from the UDP encap rx softirq, both with BH already disabled.
> > Fix it by disabling BH around the udp_tunnel_xmit_skb() call, mirroring commit 2cd7e6971fc2 ("sctp: disable BH before calling udp_tunnel_xmit_skb()").
> Why not fix iptunnel_xmit() directly, rather than fixing all possible callers? Basically, just like we did for lwtunnel_{output|xmit}(). The advantage would be that we no longer have to worry about BHs in the callers, and BHs would only be disabled when necessary.
Oops, I pushed this already. The bot hasn't caught up yet. Let's revisit this if we find another caller in process context?
On 4/20/26 21:58, Jakub Kicinski wrote:
> On Mon, 20 Apr 2026 21:02:55 +0200 Justin Iurman wrote:
> > On 4/17/26 07:54, David Carlier wrote:
> > > gtp_genl_send_echo_req() runs as a generic netlink doit handler in process context with BH not disabled. It calls udp_tunnel_xmit_skb(), which eventually invokes iptunnel_xmit(). That function uses __this_cpu_inc/dec on softnet_data.xmit.recursion to track the tunnel xmit recursion level.
> > > Without local_bh_disable(), the task may migrate between dev_xmit_recursion_inc() and dev_xmit_recursion_dec(), breaking the per-CPU counter pairing. The result is stale or negative recursion levels that can later produce false-positive SKB_DROP_REASON_RECURSION_LIMIT drops on either CPU.
> > > The other udp_tunnel_xmit_skb() call sites in gtp.c are unaffected: the data path runs under ndo_start_xmit and the echo response handlers run from the UDP encap rx softirq, both with BH already disabled.
> > > Fix it by disabling BH around the udp_tunnel_xmit_skb() call, mirroring commit 2cd7e6971fc2 ("sctp: disable BH before calling udp_tunnel_xmit_skb()").
> > Why not fix iptunnel_xmit() directly, rather than fixing all possible callers? Basically, just like we did for lwtunnel_{output|xmit}(). The advantage would be that we no longer have to worry about BHs in the callers, and BHs would only be disabled when necessary.
> Oops, I pushed this already. The bot hasn't caught up yet. Let's revisit this if we find another caller in process context?
No worries, works for me!
Hello:
This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>:
On Fri, 17 Apr 2026 06:54:08 +0100 you wrote:
> gtp_genl_send_echo_req() runs as a generic netlink doit handler in process context with BH not disabled. It calls udp_tunnel_xmit_skb(), which eventually invokes iptunnel_xmit(). That function uses __this_cpu_inc/dec on softnet_data.xmit.recursion to track the tunnel xmit recursion level.
> Without local_bh_disable(), the task may migrate between dev_xmit_recursion_inc() and dev_xmit_recursion_dec(), breaking the per-CPU counter pairing. The result is stale or negative recursion levels that can later produce false-positive SKB_DROP_REASON_RECURSION_LIMIT drops on either CPU.
> [...]
Here is the summary with links:
  - gtp: disable BH before calling udp_tunnel_xmit_skb()
    https://git.kernel.org/netdev/net/c/5638504a2aa9
You are awesome, thank you!