Hi,
24.01.2024 14:52, Eric Dumazet wrote:
> On Wed, Jan 24, 2024 at 12:20 PM kovalev@altlinux.org wrote:
>> 24.01.2024 13:57, Eric Dumazet wrote:
>>> Oh wait, this is a 5.10 kernel ?
>> Yes, but the bug is reproduced on the latest stable kernels.
>>> Please generate a stack trace using a recent tree, it is possible the bug has been fixed already.
>> See [PATCH 0/1] above, there's a stack for the 6.6.13 kernel at the bottom of the message.
> Ah, ok. Not sure why you sent a cover letter for a single patch...
> Setting a boolean, in a module that can disappear will not prevent the module from disappearing.
> This work around might work, or might not work, depending on timing, preemptions, ....
Thanks.
I ran the reproducer [1] on the 6.8-rc3 kernel. The crash occurs in less than 10 seconds and the QEMU VM restarts:
dmesg -w:
[ 106.941736] gtp: GTP module unloaded
[ 106.962548] gtp: GTP module loaded (pdp ctx size 104 bytes)
[ 107.014691] gtp: GTP module unloaded
[ 107.041554] gtp: GTP module loaded (pdp ctx size 104 bytes)
[ 107.082283] gtp: GTP module unloaded
[ 107.123268] general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 107.124050] KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
[ 107.124339] CPU: 1 PID: 5826 Comm: gtp Not tainted 6.8.0-rc3-std-def-alt1 #1
[ 107.124604] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014
[ 107.124916] RIP: 0010:gtp_genl_dump_pdp+0x1be/0x800 [gtp]
[ 107.125141] Code: c6 89 c6 e8 64 e9 86 df 58 45 85 f6 0f 85 4e 04 00 00 e8 c5 ee 86 df 48 8b 54 24 18 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 de 05 00 00 48 8b 44 24 18 4c 8b 30 4c 39 f0 74
[ 107.125960] RSP: 0018:ffff888014107220 EFLAGS: 00010202
[ 107.126164] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 107.126434] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
[ 107.126707] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 107.126976] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 107.127245] R13: ffff88800fcda588 R14: 0000000000000001 R15: 0000000000000000
[ 107.127515] FS: 00007f1be4eb05c0(0000) GS:ffff88806ce80000(0000) knlGS:0000000000000000
[ 107.127955] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 107.128177] CR2: 00007f1be4e766cf CR3: 000000000c33e000 CR4: 0000000000750ef0
[ 107.128450] PKRU: 55555554
[ 107.128577] Call Trace:
[ 107.128699] <TASK>
[ 107.128790] ? show_regs+0x90/0xa0
[ 107.128935] ? die_addr+0x50/0xd0
[ 107.129075] ? exc_general_protection+0x148/0x220
[ 107.129267] ? asm_exc_general_protection+0x22/0x30
[ 107.129469] ? gtp_genl_dump_pdp+0x1be/0x800 [gtp]
[ 107.129677] ? __alloc_skb+0x1dd/0x350
[ 107.129831] ? __pfx___alloc_skb+0x10/0x10
[ 107.129999] genl_dumpit+0x11d/0x230
[ 107.130150] netlink_dump+0x5b9/0xce0
[ 107.130301] ? lockdep_hardirqs_on_prepare+0x253/0x430
[ 107.130503] ? __pfx_netlink_dump+0x10/0x10
[ 107.130686] ? kasan_save_track+0x10/0x40
[ 107.130849] ? __kasan_kmalloc+0x9b/0xa0
[ 107.131009] ? genl_start+0x675/0x970
[ 107.131162] __netlink_dump_start+0x6fc/0x9f0
[ 107.131341] genl_family_rcv_msg_dumpit+0x1bb/0x2d0
[ 107.131538] ? __pfx_genl_family_rcv_msg_dumpit+0x10/0x10
[ 107.131754] ? genl_op_from_small+0x2a/0x440
[ 107.131972] ? cap_capable+0x1d0/0x240
[ 107.132127] ? __pfx_genl_start+0x10/0x10
[ 107.132292] ? __pfx_genl_dumpit+0x10/0x10
[ 107.132461] ? __pfx_genl_done+0x10/0x10
[ 107.132645] ? security_capable+0x9d/0xe0
With the proposed patch applied, the crash was not observed during long-term testing.
[1] https://lore.kernel.org/lkml/20240124101404.161655-1-kovalev@altlinux.org/T/...