@dependabot dependabot bot commented on behalf of github Jul 6, 2024

Bumps certifi from 2023.7.22 to 2024.7.4.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    You can disable automated security fix PRs for this repo from the Security Alerts page.

chenhuacai and others added 30 commits July 5, 2024 17:48
Commit 343f4c4 ("kthread: Don't allocate kthread_struct for init
and umh") introduces a new function user_mode_thread() for init and umh.

init and umh are different from typical kernel threads since they don't
need a "kthread" struct and they will finally become user processes by
calling kernel_execve(), but on the other hand, they are also different
from typical user mode threads (they have no "mm" structs at creation
time, which is traditionally used to distinguish a user thread and a
kernel thread).

So I think it is reasonable to treat init and umh as "special kernel
threads". Then let's unify the kernel_thread() and user_mode_thread()
to kernel_thread() again, and add a new 'user' parameter for init and
umh.

This also makes code simpler.

Signed-off-by: Huacai Chen <[email protected]>
…rqs()

Adjust the return value semantics of msi_domain_prepare_irqs(), which
allows us to modify the input nvec by overriding
msi_domain_ops::msi_prepare(). This is necessary for a later patch.

Before:
0 on success, others on error.

After:
= 0: Success;
> 0: The modified nvec;
< 0: Error code.

Callers are also updated.
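
The three-way return convention above can be sketched in user-space C as follows. This is only an illustration under stated assumptions: example_prepare() and example_effective_nvec() are hypothetical names, and the capping policy inside the stub is invented for the sketch rather than taken from the patch.

```c
/* Hypothetical stand-in for an overridden msi_prepare() hook.
 * Follows the "After" convention: 0 = success with nvec unchanged,
 * > 0 = the modified nvec, < 0 = error code. */
static int example_prepare(int nvec, int limit)
{
    if (nvec <= 0 || limit <= 0)
        return -22;        /* mirrors -EINVAL */
    if (nvec > limit)
        return limit;      /* the hook reduced nvec */
    return 0;              /* nvec accepted as requested */
}

/* Caller side: turn the three-way result into the vector count to
 * allocate, or pass the error code through. */
static int example_effective_nvec(int requested, int limit)
{
    int ret = example_prepare(requested, limit);

    if (ret < 0)
        return ret;        /* error code */
    if (ret > 0)
        return ret;        /* the modified nvec */
    return requested;      /* success, nvec unchanged */
}
```

A caller that previously treated any non-zero value as an error has to be updated, since a positive value now carries the modified vector count.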

Signed-off-by: Huacai Chen <[email protected]>
Loongson machines can have as many as 256 logical CPUs, but the maximum
number of MSI vectors in one irqchip is also 256 (in practice fewer than
256, because pch-pic consumes some of them). Even on a 64-core machine,
256 irqs can be easily exhausted if there are several NICs (NICs usually
allocate msi irqs depending on the number of online cpus). So we want to
limit the msi allocation.

In this patch we add a mechanism to limit msi allocation:
1, Modify input "nvec" by overriding the msi_domain_ops::msi_prepare();
2, The default limit is 256, which is compatible with the old behavior;
3, Add a cmdline parameter "loongson_msi_limit=xxx" to control the limit.
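
A minimal user-space sketch of points 2 and 3, assuming the parameter is read from a plain command-line string. The helper names are hypothetical, and falling back to the default for missing or out-of-range values is an assumption of this sketch, not something the commit message states.

```c
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_DEFAULT_MSI_LIMIT 256   /* default limit named in the text */

/* Return the value of "loongson_msi_limit=<n>" found in cmdline.
 * Assumption: a missing or out-of-range value yields the default. */
static int example_msi_limit(const char *cmdline)
{
    static const char key[] = "loongson_msi_limit=";
    const char *p = strstr(cmdline, key);
    long val;

    if (!p)
        return EXAMPLE_DEFAULT_MSI_LIMIT;

    val = strtol(p + sizeof(key) - 1, NULL, 10);
    if (val <= 0 || val > EXAMPLE_DEFAULT_MSI_LIMIT)
        return EXAMPLE_DEFAULT_MSI_LIMIT;
    return (int)val;
}

/* Cap a requested vector count at the limit. */
static int example_clamp_nvec(int nvec, int limit)
{
    return nvec > limit ? limit : nvec;
}
```

With no parameter given, a request for 64 vectors passes through unchanged, which matches the "compatible with the old behavior" statement.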

Signed-off-by: Juxin Gao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
On multi-bridge platforms, an MSI-X vector is often affine to only one
CPU, and the number of MSI-X vectors is set to the number of CPUs in the
system. Unfortunately, the CPU group related to a bridge is only allowed
to handle interrupts from devices behind that bridge, which breaks the
normal affinity setting for multiple MSI-X vectors and results in the
following affinity setting:

IRQ: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
CPU: 0, 1, 2, 3, 4, 0,  0,  0,  0,  0

To balance the affinity, we assign CPUs to IRQs as follows:

IRQ: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
CPU: 0, 1, 2, 3, 4, 0,  1,  2,  3,  4
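
The two tables can be reproduced with a small sketch, assuming a bridge whose CPU group holds 5 CPUs and a vector index counted from the first IRQ (IRQ 4 is index 0). The function names are hypothetical, and the real patch may compute the mapping differently.

```c
#include <assert.h>

/* Old behaviour from the first table: vectors beyond the size of the
 * bridge's CPU group all fall back to CPU 0. */
static int example_cpu_before(int vec_index, int group_cpus)
{
    assert(vec_index >= 0 && group_cpus > 0);
    return vec_index < group_cpus ? vec_index : 0;
}

/* Balanced behaviour from the second table: wrap around the CPU group
 * in round-robin order. */
static int example_cpu_after(int vec_index, int group_cpus)
{
    assert(vec_index >= 0 && group_cpus > 0);
    return vec_index % group_cpus;
}
```

For IRQ 10 (index 6) the first function gives CPU 0 and the second gives CPU 1, as in the tables.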

Signed-off-by: Jianmin Lv <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
When the best selected CPU is offline, work_on_cpu() will be stuck forever.
This can happen if a node is online while all its CPUs are offline
(we can use "maxcpus=1" without "nr_cpus=1" to reproduce it). Therefore,
in this case, we should call local_pci_probe() instead of work_on_cpu().
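
The decision described above can be sketched as follows. This is a user-space illustration only: the enum, the function name and the bitmask representation of online CPUs are assumptions of the sketch, not the kernel's interfaces.

```c
enum example_probe_path {
    EXAMPLE_PROBE_LOCAL,   /* stands in for calling local_pci_probe() directly */
    EXAMPLE_PROBE_ON_CPU   /* stands in for work_on_cpu() on the selected CPU */
};

/* Pick the probe path for the selected CPU. online_mask has bit N set
 * when CPU N is online. An offline or invalid CPU must not be handed
 * to the work_on_cpu() path, because that work would never run. */
static enum example_probe_path
example_pick_probe_path(int cpu, unsigned long online_mask)
{
    int nbits = (int)(sizeof(online_mask) * 8);

    if (cpu < 0 || cpu >= nbits)
        return EXAMPLE_PROBE_LOCAL;
    if (!(online_mask & (1UL << cpu)))
        return EXAMPLE_PROBE_LOCAL;
    return EXAMPLE_PROBE_ON_CPU;
}
```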

Cc: <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Hongchen Zhang <[email protected]>
The LS7A chipset can be used as a downstream bridge which is connected to
a high-level host bridge. In this case DEV_LS7A_PCIE_PORT5 is used as the
upward port. We should always enable MSI caps of this port, otherwise
downstream devices cannot use MSI.

Cc: <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Don't limit MRRS during resume, so that the saved value can be restored;
otherwise the MRRS will become the minimum value after resume.

Cc: <[email protected]>
Fixes: 8b3517f ("PCI: loongson: Prevent LS7A MRRS increases")
Signed-off-by: Jianmin Lv <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
The Chromium sandbox apparently wants to deny statx [1] so it can properly
inspect arguments after the sandboxed process later falls back to fstat.
Because there is currently no "fd-only" version of statx, the sandbox has
no way to ensure the path argument is empty without being able to peek
into the sandboxed process's memory. For architectures able
to do newfstatat though, glibc falls back to newfstatat after getting
-ENOSYS for statx, then the respective SIGSYS handler [2] takes care of
inspecting the path argument, transforming allowed newfstatat's into
fstat instead which is allowed and has the same type of return value.

But, as LoongArch is the first architecture to have neither fstat nor
newfstatat, the LoongArch glibc does not attempt falling back at all
when it gets -ENOSYS for statx -- and you see the problem there!
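
The fallback behaviour described in the two paragraphs above can be modelled roughly like this. It is a sketch, not glibc code: the function-pointer type and all names are hypothetical, and syscall results are modelled as plain negative-errno return values.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one stat-like syscall attempt: returns 0 on
 * success or a negative errno value. */
typedef int (*example_stat_fn)(void);

/* Try statx first. On -ENOSYS, fall back to newfstatat when the
 * architecture provides it (non-NULL); otherwise the error is final. */
static int example_stat(example_stat_fn statx_fn, example_stat_fn newfstatat_fn)
{
    int ret = statx_fn();

    if (ret != -ENOSYS)
        return ret;
    if (newfstatat_fn)
        return newfstatat_fn();
    return ret;            /* no fallback available */
}

/* A sandbox that denies statx by answering -ENOSYS. */
static int example_statx_denied(void) { return -ENOSYS; }

/* A newfstatat call that the sandbox allows. */
static int example_newfstatat_ok(void) { return 0; }
```

Passing NULL as the second argument models an architecture without newfstatat, where the sandboxed call simply fails with -ENOSYS.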

Actually, back when the LoongArch port was under review, people were
aware of the same problem with sandboxing clone3 [3], so clone was
eventually kept. Unfortunately it seemed at that time no one had noticed
statx, so besides restoring fstat/newfstatat to LoongArch uapi (and
postponing the problem further), it seems inevitable that we would need
to tackle seccomp deep argument inspection.

However, this is obviously a decision that shouldn't be taken lightly,
so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT
in unistd.h. This is the simplest solution for now, and so we hope the
community will tackle the long-standing problem of seccomp deep argument
inspection in the future [4][5].

For more information, please read this thread [6].

[1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux/seccomp-bpf-helpers/sigsys_handlers.cc#355
[3] https://lore.kernel.org/linux-arch/[email protected]/
[4] https://lwn.net/Articles/799557/
[5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-inspection.pdf
[6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@brauner/T/#t

Cc: [email protected]
Signed-off-by: Huacai Chen <[email protected]>
Some drivers want to use cpu_logical_map(), early_cpu_to_node() and some
other CPU mapping APIs, even if we use "nr_cpus=1" to hard limit the CPU
number. This is strongly required for multi-bridge machines.

Currently, we stop parsing the MADT if the nr_cpus limit is reached, but
to achieve the above goal we should always enumerate the MADT table and
set up the logical-physical CPU mapping whether or not there is an nr_cpus
limit.

Rework the MADT enumeration:

1. Define a flag "cpu_enumerated" to distinguish the first enumeration
   (cpu_enumerated=0) and the physical hotplug case (cpu_enumerated=1)
   for set_processor_mask().

2. If cpu_enumerated=0, stop parsing only when the NR_CPUS limit is reached,
   so we can set up the logical-physical CPU mapping; if cpu_enumerated=1,
   stop parsing when the nr_cpu_ids limit is reached, so we can avoid some
   runtime bugs. Once the logical-physical CPU mapping is set up, we set
   cpu_enumerated=1.

3. Use find_first_zero_bit() instead of cpumask_next_zero() to find the
   next zero bit (free logical CPU id) in the cpu_present_mask, because
   cpumask_next_zero() will stop at nr_cpu_ids.

4. Only touch cpu_possible_mask if cpu_enumerated=0, this is in order to
   avoid some potential crashes, because cpu_possible_mask is marked as
   __ro_after_init.

5. In prefill_possible_map(), clear cpu_present_mask bits greater than
   nr_cpu_ids, in order to avoid a CPU being "present" but not "possible".
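
Points 2 and 3 can be sketched in user-space C as below. The constant EXAMPLE_NR_CPUS, the function name and the use of a plain integer as the present mask are assumptions made for illustration; the kernel code operates on cpumasks.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_NR_CPUS 8   /* stands in for the compile-time NR_CPUS */

/* Find the first free logical CPU id in present_mask. During the first
 * enumeration (cpu_enumerated == false) the search runs up to
 * EXAMPLE_NR_CPUS; in the hotplug case it stops at nr_cpu_ids.
 * Returns -1 when no id is free below the chosen limit. */
static int example_alloc_logical_cpu(unsigned int present_mask,
                                     bool cpu_enumerated, int nr_cpu_ids)
{
    int limit;

    assert(nr_cpu_ids >= 1 && nr_cpu_ids <= EXAMPLE_NR_CPUS);
    limit = cpu_enumerated ? nr_cpu_ids : EXAMPLE_NR_CPUS;

    for (int id = 0; id < limit; id++) {
        if (!(present_mask & (1u << id)))
            return id;
    }
    return -1;
}
```

With "nr_cpus=1" (nr_cpu_ids of 1) and CPU 0 already present, the first enumeration still hands out id 1 so the mapping can be recorded, while the hotplug case refuses.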

Signed-off-by: Huacai Chen <[email protected]>
Add irq_work support for LoongArch via self IPIs. This makes it possible
to run work in hardware interrupt context, which is a prerequisite for
NOHZ_FULL.

Implement:
 - arch_irq_work_raise()
 - arch_irq_work_has_interrupt()

Reviewed-by: Guo Ren <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
In order for things like get_user_pages() to work on ZONE_DEVICE memory,
we need a software PTE bit to identify device-backed PFNs.  Hook this up
along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP.

Signed-off-by: Huacai Chen <[email protected]>
Add ARCH_HAS_DEBUG_VM_PGTABLE selection in Kconfig, in order to make
corresponding vm debug features usable on LoongArch. Also update the
corresponding arch-support.txt document.

Signed-off-by: Huacai Chen <[email protected]>
Currently, only TLB-based ioremap() supports writecombine, so add the
counterpart for DMW-based ioremap() with help of DMW2. The base address
(WRITECOMBINE_BASE) is configured as 0xa000000000000000.

DMW3 is unused by the kernel now; however, firmware may leave garbage in
it and interfere with the kernel's address mapping. So clear it as necessary.

BTW, centralize the DMW configuration to macro SETUP_DMWINS.

Signed-off-by: Jiaxun Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Most LoongArch 64 machines use a custom "SADR" ACPI extension to
perform ACPI S3 sleep. However, the standard ACPI way to perform sleep
is to write a value to the ACPI PM1/SLEEP_CTL register, and this has never
been supported properly in the kernel.

Add standard S3 sleep by providing a default DoSuspend function which
calls ACPI's acpi_enter_sleep_state() routine when SADR is not provided
by the firmware.

Also fix the suspend assembly code so that ra is set properly before going
into the sleep routine. (Previously the link address of jirl was set to a0;
some firmware does require the return address in a0, but it is already set
with la.pcrel before.)

Signed-off-by: Jiaxun Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Hibernation assumes the memory layout after resume to be the same as that
before sleep, so it expects the kernel to be loaded at the same position.
To achieve this goal we automatically disable KASLR if the user explicitly
requests hibernation via the "resume=" command line. Since "nohibernate"
and "noresume" have higher priorities than "resume=", we only disable
KASLR if there is neither "nohibernate" nor "noresume".
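
The priority rule above can be sketched as a small command-line check. The helper names are hypothetical and the whitespace-separated token matching is an assumption of this sketch; the kernel's own parameter parsing is more involved.

```c
#include <stdbool.h>
#include <string.h>

/* Scan space-separated tokens in cmdline. With takes_value set, a token
 * matches when it starts with name (e.g. "resume="); otherwise the token
 * must equal name exactly (e.g. "noresume"). */
static bool example_has_param(const char *cmdline, const char *name,
                              bool takes_value)
{
    size_t len = strlen(name);
    const char *p = cmdline;

    while (*p) {
        size_t tok;

        while (*p == ' ')
            p++;
        tok = strcspn(p, " ");
        if (tok >= len && strncmp(p, name, len) == 0 &&
            (takes_value || tok == len))
            return true;
        p += tok;
    }
    return false;
}

/* KASLR is disabled only when hibernation is requested via "resume="
 * and neither "nohibernate" nor "noresume" overrides it. */
static bool example_kaslr_disabled_for_hibernation(const char *cmdline)
{
    if (example_has_param(cmdline, "nohibernate", false) ||
        example_has_param(cmdline, "noresume", false))
        return false;
    return example_has_param(cmdline, "resume=", true);
}
```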

Signed-off-by: Huacai Chen <[email protected]>
fw_arg1 is in memory space rather than I/O space, so we should use
early_memremap_ro() instead of early_ioremap() to map the cmdline.
Moreover, we should unmap it after use.

Suggested-by: Jiaxun Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
-Zdirect-access-external-data is a new Rust compiler option added in
Rust 1.78, which we use to optimize the access of external data in the
Linux kernel's Rust code. This patch modifies the Rust code in vmlinux
to directly access external data, using PC-REL instead of GOT. However,
Rust code within modules is constrained by the PC-REL addressing range
and is explicitly set to use an indirect method.

Signed-off-by: WANG Rui <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
LoongArch defines UPROBE_SWBP_INSN as a function call and this breaks
arch_uprobe_trampoline() which uses it to initialize a static variable.

Add the new "__builtin_constant_p" helper, __emit_break(), and redefine
the current users of larch_insn_gen_break() to use it.
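
Why a function call breaks a static initializer, and how a constant-expression macro avoids it, can be shown with a short sketch. Everything here is hypothetical: the macro names, the opcode value and the immediate are chosen for illustration and are not taken from the patch, and the sketch uses a plain macro rather than the helper the commit introduces.

```c
#include <stdint.h>

/* Illustrative opcode value for this sketch only. */
#define EXAMPLE_BREAK_OP 0x54u

/* Encodes a break instruction as an integer constant expression, so it
 * is usable where the compiler requires a compile-time constant. */
#define EXAMPLE_EMIT_BREAK(imm) \
    ((uint32_t)((imm) | (EXAMPLE_BREAK_OP << 15)))

/* Static initializers must be constant expressions in C. A call such as
 * example_gen_break(12) would be rejected here; the macro is accepted. */
static const uint32_t example_swbp_insn = EXAMPLE_EMIT_BREAK(12);

/* Runtime users can keep a function form built on the same macro. */
static uint32_t example_gen_break(uint32_t imm)
{
    return EXAMPLE_EMIT_BREAK(imm);
}
```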

Fixes: ff474a7 ("uprobe: Add uretprobe syscall to speed up return probe")
Reported-by: Nathan Chancellor <[email protected]>
Closes: https://lore.kernel.org/all/20240614174822.GA1185149@thelio-3990X/
Suggested-by: Andrii Nakryiko <[email protected]>
Tested-by: Tiezhu Yang <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Binbin Zhou <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
This adds a CPU HWMon (temperature sensor) platform driver for Loongson-3.

Tested-by: Xi Ruoyao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS are selected,
cpu_max_bits_warn() generates a runtime warning similar to the one below
when we show /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit)
instead of NR_CPUS to iterate CPUs.

[    3.052463] ------------[ cut here ]------------
[    3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0
[    3.070072] Modules linked in: efivarfs autofs4
[    3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052
[    3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000
[    3.109127]         9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430
[    3.118774]         90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff
[    3.128412]         0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890
[    3.138056]         0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa
[    3.147711]         ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000
[    3.157364]         900000000101c998 0000000000000004 9000000000ef7430 0000000000000000
[    3.167012]         0000000000000009 000000000000006c 0000000000000000 0000000000000000
[    3.176641]         9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286
[    3.186260]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c
[    3.195868]         ...
[    3.199917] Call Trace:
[    3.203941] [<90000000002086d8>] show_stack+0x38/0x14c
[    3.210666] [<9000000000cf846c>] dump_stack_lvl+0x60/0x88
[    3.217625] [<900000000023d268>] __warn+0xd0/0x100
[    3.223958] [<9000000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc
[    3.231150] [<9000000000210220>] show_cpuinfo+0x5e8/0x5f0
[    3.238080] [<90000000004f578c>] seq_read_iter+0x354/0x4b4
[    3.245098] [<90000000004c2e90>] new_sync_read+0x17c/0x1c4
[    3.252114] [<90000000004c5174>] vfs_read+0x138/0x1d0
[    3.258694] [<90000000004c55f8>] ksys_read+0x70/0x100
[    3.265265] [<9000000000cfde9c>] do_syscall+0x7c/0x94
[    3.271820] [<9000000000202fe4>] handle_syscall+0xc4/0x160
[    3.281824] ---[ end trace 8b484262b4b8c24c ]---

Cc: [email protected]
Signed-off-by: Huacai Chen <[email protected]>
If an NTFS file system is mounted on another system with a different
PAGE_SIZE from the original system, log->page_size will change in
log_replay(), but log->page_{mask,bits} don't change correspondingly.
This will cause a panic because "u32 bytes = log->page_size - page_off"
will get a negative value in the later read_log_page().
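
A small sketch of the arithmetic, assuming power-of-two page sizes and a mask equal to page_size - 1. The helper names are hypothetical; the point is only that a mask left over from a larger page size can yield an in-page offset larger than the new page size, so the unsigned subtraction wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Mask that selects the in-page offset for a power-of-two page size. */
static uint32_t example_page_mask(uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return page_size - 1;
}

/* Bytes remaining in the page that contains offset off. If page_mask
 * does not match page_size, page_off can exceed page_size and the
 * result wraps to a huge value (the "negative" value in the text). */
static uint32_t example_bytes_left(uint32_t page_size, uint32_t page_mask,
                                   uint64_t off)
{
    uint32_t page_off = (uint32_t)(off & page_mask);

    return page_size - page_off;
}
```

With page_size updated to 4096 but the mask still derived from 16384, an offset of 12288 gives a remaining length far above 4096, whereas a matching mask gives exactly 4096.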

Cc: [email protected]
Fixes: b46acd6 ("fs/ntfs3: Add NTFS journal")
Signed-off-by: Huacai Chen <[email protected]>
The label end_reply is obviously a typo. It should be "replay" in this
context. So rename end_reply to end_replay.

Cc: [email protected]
Fixes: b46acd6 ("fs/ntfs3: Add NTFS journal")
Signed-off-by: Huacai Chen <[email protected]>
Debug mechanisms include: gdb, ftrace, kprobe, uprobe and jump_label.

Signed-off-by: Huacai Chen <[email protected]>
Add driver support for the IOMMU in LS7A. If you need to enable it,
please add the loongson_iommu=on kernel parameter.

Signed-off-by: Huacai Chen <[email protected]>
Consider a configuration like this:
1, efifb (or simpledrm) is built-in;
2, a native display driver (such as radeon) is also built-in.

As Javier said, this is not a common configuration (the native display
driver is usually built as a module), but it can happen and cause some
trouble.

In this case, since efifb, radeon and sysfb are all at the device_initcall()
level, the order in practice is like this:

efifb is registered first, but there is no "efi-framebuffer" device yet.
radeon is registered later, and /dev/fb0 is created. sysfb_init() comes
last; it registers "efi-framebuffer" and then causes the error message
"efifb: a framebuffer is already registered". Making sysfb_init() a
subsys_initcall_sync() can avoid this. Javier Martinez Canillas is also
working on a more general solution in commit 873eb3b ("fbdev: Disable
sysfb device registration when removing conflicting FBs").

However, this patch still makes sense because it can make the screen
display as early as possible (We cannot move to subsys_initcall, since
sysfb_init() should be executed after PCI enumeration).

This is a better version of commit 60aebc9 ("drivers/firmware:
Move sysfb_init() from device_initcall to subsys_initcall_sync") since
the previous commit leads to blank displays on some systems. The reason
is that vgaarb initialization is also a subsys_initcall_sync function so
sysfb_disable() is sometimes missed. So here we move sysfb_init() to an
fs_initcall function which is ensured after vgaarb initialization.
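
The ordering argument can be sketched with an enum that lists the relevant initcall levels in the order the kernel runs them. The enum and function names are hypothetical. Only a strictly earlier level gives an ordering guarantee; within one level, the order depends on link order.

```c
#include <stdbool.h>

/* Relevant initcall levels, listed in execution order. */
enum example_initcall_level {
    EXAMPLE_SUBSYS_INITCALL,       /* subsys_initcall() */
    EXAMPLE_SUBSYS_INITCALL_SYNC,  /* subsys_initcall_sync(): vgaarb, per the text */
    EXAMPLE_FS_INITCALL,           /* fs_initcall(): where sysfb_init() moves to */
    EXAMPLE_DEVICE_INITCALL        /* device_initcall(): efifb, built-in radeon */
};

/* True only when level a is guaranteed to finish before level b starts. */
static bool example_guaranteed_before(enum example_initcall_level a,
                                      enum example_initcall_level b)
{
    return a < b;
}
```

Two functions at the same level get no guarantee, which is the situation the earlier commit ran into with vgaarb.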

Signed-off-by: Huacai Chen <[email protected]>
After commit 60aebc9 ("drivers/firmware: Move sysfb_init() from
device_initcall to subsys_initcall_sync") some Lenovo laptops get a blank
screen until the display manager starts.

This regression occurs with such a Kconfig combination:
CONFIG_SYSFB=y
CONFIG_SYSFB_SIMPLEFB=y
CONFIG_DRM_SIMPLEDRM=y
CONFIG_DRM_I915=y      # Or other native drivers such as radeon, amdgpu

If we replace CONFIG_DRM_SIMPLEDRM with CONFIG_FB_SIMPLE (they use the same
device), there is no blank screen. The root cause is the initialization
order, and this order depends on the Makefile.

FB_SIMPLE is before native DRM drivers (e.g. i915, radeon, amdgpu, and
so on), but DRM_SIMPLEDRM is after them. Thus, if we use FB_SIMPLE, I915
will take over FB_SIMPLE and there is no problem; if we use DRM_SIMPLEDRM,
DRM_SIMPLEDRM will try to take over I915, but fails to work.

So we can move the "tiny" directory before native DRM drivers to solve
this problem.

Fixes: 60aebc9 ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync")
Closes: https://lore.kernel.org/dri-devel/[email protected]/T/#t
Reported-by: Jaak Ristioja <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
The Radeon driver cannot handle the case where the interrupt arrives before
the DMA data, so the irq handler must write the old ih.rptr value to the
IH_RB_RPTR register to enable interrupts again when that happens.

Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Zhijie Zhang <[email protected]>
@chenhuacai chenhuacai force-pushed the master branch 2 times, most recently from 84c620d to af6528b Compare September 4, 2025 08:19
chenhuacai pushed a commit that referenced this pull request Sep 8, 2025
BUG: kernel NULL pointer dereference, address: 00000000000002ec
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 28 UID: 0 PID: 343 Comm: kworker/28:1 Kdump: loaded Tainted: G        OE       6.17.0-rc2+ #9 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_ib_is_sg_need_sync+0x9e/0xd0 [smc]
...
Call Trace:
 <TASK>
 smcr_buf_map_link+0x211/0x2a0 [smc]
 __smc_buf_create+0x522/0x970 [smc]
 smc_buf_create+0x3a/0x110 [smc]
 smc_find_rdma_v2_device_serv+0x18f/0x240 [smc]
 ? smc_vlan_by_tcpsk+0x7e/0xe0 [smc]
 smc_listen_find_device+0x1dd/0x2b0 [smc]
 smc_listen_work+0x30f/0x580 [smc]
 process_one_work+0x18c/0x340
 worker_thread+0x242/0x360
 kthread+0xe7/0x220
 ret_from_fork+0x13a/0x160
 ret_from_fork_asm+0x1a/0x30
 </TASK>

If the software RoCE device is used, ibdev->dma_device is a null pointer,
which triggers the oops above. Add a null pointer check to prevent the
crash.
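
A minimal sketch of such a guard, using hypothetical structures in place of the real InfiniBand types. Treating a missing dma_device as "no sync needed" is an assumption of this sketch, not a statement about what the actual fix returns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device structures. */
struct example_dma_device {
    bool need_sync;
};

struct example_ib_device {
    struct example_dma_device *dma_device;  /* NULL for a software RoCE device */
};

/* Check the pointer before dereferencing it. Assumption: a device
 * without a dma_device never needs a sync. */
static bool example_sg_need_sync(const struct example_ib_device *ibdev)
{
    if (!ibdev || !ibdev->dma_device)
        return false;
    return ibdev->dma_device->need_sync;
}

/* Example instances: a software device and a hardware-backed one. */
static struct example_ib_device example_soft_roce = { NULL };
static struct example_dma_device example_hw_dma = { true };
static struct example_ib_device example_hw = { &example_hw_dma };
```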

Fixes: 0ef69e7 ("net/smc: optimize for smc_sndbuf_sync_sg_for_device and smc_rmb_sync_sg_for_cpu")
Signed-off-by: Liu Jian <[email protected]>
Reviewed-by: Guangguan Wang <[email protected]>
Reviewed-by: Zhu Yanjun <[email protected]>
Reviewed-by: D. Wythe <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
chenhuacai pushed a commit that referenced this pull request Sep 18, 2025
Steven Rostedt reported a crash with "ftrace=function" kernel command
line:

[    0.159269] BUG: kernel NULL pointer dereference, address: 000000000000001c
[    0.160254] #PF: supervisor read access in kernel mode
[    0.160975] #PF: error_code(0x0000) - not-present page
[    0.161697] PGD 0 P4D 0
[    0.162055] Oops: Oops: 0000 [#1] SMP PTI
[    0.162619] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc2-test-00006-g48d06e78b7cb-dirty #9 PREEMPT(undef)
[    0.164141] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    0.165439] RIP: 0010:kmem_cache_alloc_noprof (mm/slub.c:4237)
[ 0.166186] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 83 ec 20 8b 05 c9 b6 7e 01 <44> 8b 77 1c 65 4c 8b 2d b5 ea 20 02 4c 89 6c 24 18 41 89 f5 21 f0
[    0.168811] RSP: 0000:ffffffffb2e03b30 EFLAGS: 00010086
[    0.169545] RAX: 0000000001fff33f RBX: 0000000000000000 RCX: 0000000000000000
[    0.170544] RDX: 0000000000002800 RSI: 0000000000002800 RDI: 0000000000000000
[    0.171554] RBP: ffffffffb2e03b80 R08: 0000000000000004 R09: ffffffffb2e03c90
[    0.172549] R10: ffffffffb2e03c90 R11: 0000000000000000 R12: 0000000000000000
[    0.173544] R13: ffffffffb2e03c90 R14: ffffffffb2e03c90 R15: 0000000000000001
[    0.174542] FS:  0000000000000000(0000) GS:ffff9d2808114000(0000) knlGS:0000000000000000
[    0.175684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.176486] CR2: 000000000000001c CR3: 000000007264c001 CR4: 00000000000200b0
[    0.177483] Call Trace:
[    0.177828]  <TASK>
[    0.178123] mas_alloc_nodes (lib/maple_tree.c:176 (discriminator 2) lib/maple_tree.c:1255 (discriminator 2))
[    0.178692] mas_store_gfp (lib/maple_tree.c:5468)
[    0.179223] execmem_cache_add_locked (mm/execmem.c:207)
[    0.179870] execmem_alloc (mm/execmem.c:213 mm/execmem.c:313 mm/execmem.c:335 mm/execmem.c:475)
[    0.180397] ? ftrace_caller (arch/x86/kernel/ftrace_64.S:169)
[    0.180922] ? __pfx_ftrace_caller (arch/x86/kernel/ftrace_64.S:158)
[    0.181517] execmem_alloc_rw (mm/execmem.c:487)
[    0.182052] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:266 arch/x86/kernel/ftrace.c:344 arch/x86/kernel/ftrace.c:474)
[    0.182778] ? ftrace_caller_op_ptr (arch/x86/kernel/ftrace_64.S:182)
[    0.183388] ftrace_update_trampoline (kernel/trace/ftrace.c:7947)
[    0.184024] __register_ftrace_function (kernel/trace/ftrace.c:368)
[    0.184682] ftrace_startup (kernel/trace/ftrace.c:3048)
[    0.185205] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.185877] register_ftrace_function_nolock (kernel/trace/ftrace.c:8717)
[    0.186595] register_ftrace_function (kernel/trace/ftrace.c:8745)
[    0.187254] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.187924] function_trace_init (kernel/trace/trace_functions.c:170)
[    0.188499] tracing_set_tracer (kernel/trace/trace.c:5916 kernel/trace/trace.c:6349)
[    0.189088] register_tracer (kernel/trace/trace.c:2391)
[    0.189642] early_trace_init (kernel/trace/trace.c:11075 kernel/trace/trace.c:11149)
[    0.190204] start_kernel (init/main.c:970)
[    0.190732] x86_64_start_reservations (arch/x86/kernel/head64.c:307)
[    0.191381] x86_64_start_kernel (??:?)
[    0.191955] common_startup_64 (arch/x86/kernel/head_64.S:419)
[    0.192534]  </TASK>
[    0.192839] Modules linked in:
[    0.193267] CR2: 000000000000001c
[    0.193730] ---[ end trace 0000000000000000 ]---

The crash happens because on x86 ftrace allocations from execmem require
maple tree to be initialized.

Move maple tree initialization that depends only on slab availability
earlier in boot so that it will happen right after mm_core_init().

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5d79c2b ("x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations")
Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
Reported-by: Steven Rostedt (Google) <[email protected]>
Tested-by: Steven Rostedt (Google) <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
@chenhuacai chenhuacai force-pushed the master branch 3 times, most recently from d890056 to 3eb6775 Compare September 25, 2025 01:48
@chenhuacai chenhuacai force-pushed the master branch 3 times, most recently from 4fe06dd to f6ea04b Compare October 7, 2025 08:33
chenhuacai pushed a commit that referenced this pull request Oct 7, 2025
Before disabling SR-IOV via config space accesses to the parent PF,
sriov_disable() first removes the PCI devices representing the VFs.

Since commit 9d16947 ("PCI: Add global pci_lock_rescan_remove()")
such removal operations are serialized against concurrent remove and
rescan using the pci_rescan_remove_lock. No such locking was ever added
in sriov_disable() however. In particular when commit 18f9e9d
("PCI/IOV: Factor out sriov_add_vfs()") factored out the PCI device
removal into sriov_del_vfs() there was still no locking around the
pci_iov_remove_virtfn() calls.

On s390 the lack of serialization in sriov_disable() may cause double
remove and list corruption with the below (amended) trace being observed:

  PSW:  0704c00180000000 0000000c914e4b38 (klist_put+56)
  GPRS: 000003800313fb48 0000000000000000 0000000100000001 0000000000000001
	00000000f9b520a8 0000000000000000 0000000000002fbd 00000000f4cc9480
	0000000000000001 0000000000000000 0000000000000000 0000000180692828
	00000000818e8000 000003800313fe2c 000003800313fb20 000003800313fad8
  #0 [3800313fb20] device_del at c9158ad5c
  #1 [3800313fb88] pci_remove_bus_device at c915105ba
  #2 [3800313fbd0] pci_iov_remove_virtfn at c9152f198
  #3 [3800313fc28] zpci_iov_remove_virtfn at c90fb67c0
  #4 [3800313fc60] zpci_bus_remove_device at c90fb6104
  #5 [3800313fca0] __zpci_event_availability at c90fb3dca
  #6 [3800313fd08] chsc_process_sei_nt0 at c918fe4a2
  #7 [3800313fd60] crw_collect_info at c91905822
  #8 [3800313fe10] kthread at c90feb390
  #9 [3800313fe68] __ret_from_fork at c90f6aa64
  #10 [3800313fe98] ret_from_fork at c9194f3f2.

This is because in addition to sriov_disable() removing the VFs, the
platform also generates hot-unplug events for the VFs. This being the
reverse operation to the hotplug events generated by sriov_enable() and
handled via pdev->no_vf_scan. And while the event processing takes
pci_rescan_remove_lock and checks whether the struct pci_dev still exists,
the lack of synchronization makes this checking racy.

Other races may also be possible, of course, though given that this lack of
locking persisted for so long, observable races seem very rare. Even on s390 the
list corruption was only observed with certain devices since the platform
events are only triggered by config accesses after the removal, so as long
as the removal finished synchronously they would not race. Either way the
locking is missing so fix this by adding it to the sriov_del_vfs() helper.

Just like PCI rescan-remove, locking is also missing in sriov_add_vfs()
including for the error case where pci_stop_and_remove_bus_device() is
called without the PCI rescan-remove lock being held. Even in the non-error
case, adding new PCI devices and buses should be serialized via the PCI
rescan-remove lock. Add the necessary locking.

Fixes: 18f9e9d ("PCI/IOV: Factor out sriov_add_vfs()")
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Benjamin Block <[email protected]>
Reviewed-by: Farhan Ali <[email protected]>
Reviewed-by: Julian Ruess <[email protected]>
Cc: [email protected]
Link: https://patch.msgid.link/[email protected]
chenhuacai pushed a commit that referenced this pull request Oct 8, 2025
Revert commit 1afa706 ("serial: qcom-geni: Enable PM runtime for
serial driver") and its dependent commit 86fa39d ("serial:
qcom-geni: Enable Serial on SA8255p Qualcomm platforms") because the
first one causes a regression: a hung task on the Qualcomm RB1 board
(QRB2210) and no usable serial at all during normal boot:

  INFO: task kworker/u16:0:12 blocked for more than 42 seconds.
        Not tainted 6.17.0-rc1-00004-g53e760d89498 #9
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:kworker/u16:0   state:D stack:0     pid:12    tgid:12    ppid:2      task_flags:0x4208060 flags:0x00000010
  Workqueue: async async_run_entry_fn
  Call trace:
   __switch_to+0xe8/0x1a0 (T)
   __schedule+0x290/0x7c0
   schedule+0x34/0x118
   rpm_resume+0x14c/0x66c
   rpm_resume+0x2a4/0x66c
   rpm_resume+0x2a4/0x66c
   rpm_resume+0x2a4/0x66c
   __pm_runtime_resume+0x50/0x9c
   __driver_probe_device+0x58/0x120
   driver_probe_device+0x3c/0x154
   __driver_attach_async_helper+0x4c/0xc0
   async_run_entry_fn+0x34/0xe0
   process_one_work+0x148/0x290
   worker_thread+0x2c4/0x3e0
   kthread+0x118/0x1c0
   ret_from_fork+0x10/0x20

The issue was reported on 12th of August and was ignored by the author of
the commits introducing the issue for two weeks.  Only after complaining
did the author produce a fix, which did not work, so if the original commits
cannot be reliably fixed for 5 weeks, they obviously are buggy and need to
be dropped.

Fixes: 1afa706 ("serial: qcom-geni: Enable PM runtime for serial driver")
Reported-by: Alexey Klimov <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Tested-by: Alexey Klimov <[email protected]>
Reviewed-by: Alexey Klimov <[email protected]>
Reviewed-by: Bryan O'Donoghue <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
chenhuacai pushed a commit that referenced this pull request Oct 11, 2025
The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this, cancel the workload when opening the events
fails.

Before:
```
$ perf test -vv 7
  7: PERF_RECORD_* events & perf_sample fields:
 --- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
 ------------------------------------------------------------
perf_event_attr:
  type                    	   0 (PERF_TYPE_HARDWARE)
  config                  	   0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                	   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
 ------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  config                           0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
  disabled                         1
  exclude_kernel                   1
 ------------------------------------------------------------
sys_perf_event_open: pid 0  cpu -1  group_fd -1  flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
 ------------------------------------------------------------
perf_event_attr:
  type                             1 (PERF_TYPE_SOFTWARE)
  size                             136
  config                           0x9 (PERF_COUNT_SW_DUMMY)
  sample_type                      IP|TID|TIME|CPU
  read_format                      ID|LOST
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
  ksymbol                          1
  bpf_event                        1
  { wakeup_events, wakeup_watermark } 1
 ------------------------------------------------------------
sys_perf_event_open: pid 1189569  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
 ---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
 ---- unexpected signal (6) ----
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
    #0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
    #1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
    #2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
    #3 0x7f29ce849cc2 in raise raise.c:27
    #4 0x7f29ce8324ac in abort abort.c:81
    #5 0x565358f662d4 in check_leaks builtin-test.c:226
    #6 0x565358f6682e in run_test_child builtin-test.c:344
    #7 0x565358ef7121 in start_command run-command.c:128
    #8 0x565358f67273 in start_test builtin-test.c:545
    #9 0x565358f6771d in __cmd_test builtin-test.c:647
    #10 0x565358f682bd in cmd_test builtin-test.c:849
    #11 0x565358ee5ded in run_builtin perf.c:349
    #12 0x565358ee6085 in handle_internal_command perf.c:401
    #13 0x565358ee61de in run_argv perf.c:448
    #14 0x565358ee6527 in main perf.c:555
    #15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
    #16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #17 0x565358e391c1 in _start perf[851c1]
  7: PERF_RECORD_* events & perf_sample fields                       : FAILED!
```

After:
```
$ perf test 7
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```
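
The cancel-on-failure pattern described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the code from the perf tree: `struct workload`, `workload_prepare()`, `workload_cancel()` and `run_test_sketch()` are hypothetical names, and the `open_result` parameter merely simulates the return value of opening the events (for example -EACCES when perf_event_paranoid forbids it).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the workload state; not perf's real struct. */
struct workload {
	pid_t pid;	/* forked child waiting for the "go" byte, -1 if none */
	int go_fd;	/* write end of the go pipe, -1 when closed */
};

/* Fork a child that blocks on the go pipe until it is told to start. */
static int workload_prepare(struct workload *w)
{
	int go_pipe[2];

	if (pipe(go_pipe) < 0)
		return -errno;

	w->pid = fork();
	if (w->pid < 0) {
		int err = -errno;

		close(go_pipe[0]);
		close(go_pipe[1]);
		return err;
	}
	if (w->pid == 0) {
		char c;

		close(go_pipe[1]);
		/* read() returns 1 on "go" and 0 once the parent closes its end. */
		_exit(read(go_pipe[0], &c, 1) == 1 ? 0 : 1);
	}
	close(go_pipe[0]);
	w->go_fd = go_pipe[1];
	return 0;
}

/* Close the go pipe and reap the child so no descriptor is left open. */
static void workload_cancel(struct workload *w)
{
	if (w->go_fd >= 0) {
		close(w->go_fd);
		w->go_fd = -1;
	}
	if (w->pid > 0) {
		waitpid(w->pid, NULL, 0);
		w->pid = -1;
	}
}

/* open_result simulates opening the events: 0 or a negative errno. */
static int run_test_sketch(struct workload *w, int open_result)
{
	int err = workload_prepare(w);

	if (err)
		return err;

	if (open_result) {
		/* Without this call the go pipe would still be open at exit. */
		workload_cancel(w);
		return open_result;
	}

	/* Success path: release the child, then clean up the same way. */
	if (write(w->go_fd, "g", 1) != 1)
		err = -errno;
	workload_cancel(w);
	return err;
}
```

Calling `run_test_sketch(&w, -EACCES)` returns -EACCES with `w.go_fd` reset to -1, which is the state a file descriptor leak check at test exit expects to find.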

Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Chun-Tse Shao <[email protected]>
Cc: Howard Chu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
@chenhuacai chenhuacai force-pushed the master branch 4 times, most recently from cbf82cb to f955b42 Compare November 22, 2025 02:35
@chenhuacai chenhuacai force-pushed the master branch 3 times, most recently from 71efa6e to fb4d5b4 Compare November 29, 2025 05:43
@chenhuacai chenhuacai force-pushed the master branch 2 times, most recently from 1bae970 to cbbe8fd Compare December 11, 2025 09:14

Labels

dependencies Pull requests that update a dependency file

6 participants