sst-linux/kernel
Waiman Long 8630709ebd locking/semaphore: Use wake_q to wake up processes outside lock critical section
[ Upstream commit 85b2b9c16d053364e2004883140538e73b333cdb ]

A circular lock dependency splat has been seen involving down_trylock():

  ======================================================
  WARNING: possible circular locking dependency detected
  6.12.0-41.el10.s390x+debug
  ------------------------------------------------------
  dd/32479 is trying to acquire lock:
  0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90

  but task is already holding lock:
  000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0

  the existing dependency chain (in reverse order) is:
  -> #4 (&zone->lock){-.-.}-{2:2}:
  -> #3 (hrtimer_bases.lock){-.-.}-{2:2}:
  -> #2 (&rq->__lock){-.-.}-{2:2}:
  -> #1 (&p->pi_lock){-.-.}-{2:2}:
  -> #0 ((console_sem).lock){-.-.}-{2:2}:

The console_sem -> pi_lock dependency is due to calling try_to_wake_up()
while holding the console_sem raw_spinlock. This dependency can be broken
by using wake_q to do the wakeup instead of calling try_to_wake_up()
under the console_sem lock. This will also make the semaphore's
raw_spinlock become a terminal lock without taking any further locks
underneath it.

The hrtimer_bases.lock is a raw_spinlock while zone->lock is a
spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via
the debug_objects_fill_pool() helper function in the debugobjects code.

  -> #4 (&zone->lock){-.-.}-{2:2}:
         __lock_acquire+0xe86/0x1cc0
         lock_acquire.part.0+0x258/0x630
         lock_acquire+0xb8/0xe0
         _raw_spin_lock_irqsave+0xb4/0x120
         rmqueue_bulk+0xac/0x8f0
         __rmqueue_pcplist+0x580/0x830
         rmqueue_pcplist+0xfc/0x470
         rmqueue.isra.0+0xdec/0x11b0
         get_page_from_freelist+0x2ee/0xeb0
         __alloc_pages_noprof+0x2c2/0x520
         alloc_pages_mpol_noprof+0x1fc/0x4d0
         alloc_pages_noprof+0x8c/0xe0
         allocate_slab+0x320/0x460
         ___slab_alloc+0xa58/0x12b0
         __slab_alloc.isra.0+0x42/0x60
         kmem_cache_alloc_noprof+0x304/0x350
         fill_pool+0xf6/0x450
         debug_object_activate+0xfe/0x360
         enqueue_hrtimer+0x34/0x190
         __run_hrtimer+0x3c8/0x4c0
         __hrtimer_run_queues+0x1b2/0x260
         hrtimer_interrupt+0x316/0x760
         do_IRQ+0x9a/0xe0
         do_irq_async+0xf6/0x160

Normally a raw_spinlock to spinlock dependency is not legitimate
and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled,
but debug_objects_fill_pool() is an exception as it explicitly
allows this dependency for non-PREEMPT_RT kernel without causing
PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is
legitimate and not a bug.

Anyway, semaphore is the only locking primitive left that is still
using try_to_wake_up() to do wakeup inside critical section, all the
other locking primitives had been migrated to use wake_q to do wakeup
outside of the critical section. It is also possible that there are
other circular locking dependencies involving printk/console_sem or
other existing/new semaphores lurking somewhere which may show up in
the future. Let just do the migration now to wake_q to avoid headache
like this.

Reported-by: yzbot+ed801a886dfdbfe7136d@syzkaller.appspotmail.com
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10 14:33:39 +02:00
..
bpf bpf: skip non exist keys in generic_map_lookup_batch 2025-03-07 16:56:38 +01:00
cgroup cgroup: fix race between fork and cgroup.kill 2025-02-21 13:50:04 +01:00
configs
debug kdb: Do not assume write() callback available 2025-02-21 13:50:09 +01:00
dma dma-debug: fix a possible deadlock on radix_lock 2024-12-14 19:54:43 +01:00
entry entry: Respect changes to system call number by trace_sys_enter() 2024-04-03 15:19:44 +02:00
events perf/ring_buffer: Allow the EPOLLRDNORM flag for poll 2025-04-10 14:33:31 +02:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:51:50 +01:00
gcov gcov: add support for GCC 14 2024-06-27 13:46:22 +02:00
irq genirq: Make handle_enforce_irqctx() unconditionally available 2025-02-21 13:48:57 +01:00
kcsan kcsan: Turn report_filterlist_lock into a raw_spinlock 2024-12-14 19:54:38 +01:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-11-20 11:52:10 +01:00
locking locking/semaphore: Use wake_q to wake up processes outside lock critical section 2025-04-10 14:33:39 +02:00
module module: Fix KCOV-ignored file name 2024-10-17 15:21:27 +02:00
power PM: hibernate: Add error handling for syscore_suspend() 2025-02-21 13:49:22 +01:00
printk printk: Fix signed integer overflow when defining LOG_BUF_LEN_MAX 2025-02-21 13:49:30 +01:00
rcu Revert "rcu-tasks: Fix access non-existent percpu rtpcp variable in rcu_tasks_need_gpcb()" 2025-01-02 10:30:55 +01:00
sched sched/deadline: Use online cpus for validating runtime 2025-04-10 14:33:39 +02:00
time hrtimers: Mark is_migration_base() with __always_inline 2025-03-28 21:58:50 +01:00
trace ring-buffer: Fix bytes_dropped calculation issue 2025-04-10 14:33:37 +02:00
.gitignore
acct.c acct: block access to kernel internal filesystems 2025-03-07 16:56:39 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-01-31 16:17:00 -08:00
audit_fsnotify.c
audit_tree.c
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-28 17:07:08 +00:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-05 20:12:47 +00:00
audit.h
auditfilter.c ima: Avoid blocking in RCU read-side critical section 2024-07-11 12:47:16 +02:00
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-25 12:03:04 +02:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:29:32 +02:00
capability.c
cfi.c
compat.c
configs.c
context_tracking.c context_tracking: Fix noinstr vs KASAN 2023-03-10 09:33:45 +01:00
cpu_pm.c
cpu.c hrtimers: Handle CPU state correctly on hotplug 2025-01-23 17:17:15 +01:00
crash_core.c
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 17:00:20 +01:00
delayacct.c
dma.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
exec_domain.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
exit.c mm: optimize the redundant loop of mm_update_owner_next() 2024-07-11 12:47:13 +02:00
extable.c
fail_function.c
fork.c posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone 2024-11-14 13:15:16 +01:00
freezer.c
gen_kheaders.sh kheaders: Ignore silly-rename files 2025-01-23 17:17:11 +01:00
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c jump_label: Fix static_key_slow_dec() yet again 2024-10-17 15:21:29 +02:00
kallsyms_internal.h kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] 2023-10-25 12:03:16 +02:00
kallsyms.c kallsyms: Add helper kallsyms_on_each_match_symbol() 2023-10-25 12:03:16 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: mark in_softirq_really() as __always_inline 2025-01-09 13:30:05 +01:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-19 16:21:08 +02:00
kexec_elf.c kexec: initialize ELF lowest address to ULONG_MAX 2025-04-10 14:33:36 +02:00
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 16:00:55 +02:00
kexec_internal.h
kexec.c kernel: kexec: copy user-array safely 2023-11-28 17:06:57 +00:00
kheaders.c
kmod.c
kprobes.c kprobes: Fix to check symbol prefixes correctly 2024-08-14 13:52:54 +02:00
ksysfs.c
kthread.c kthread: unpark only parked kthread 2024-10-17 15:22:28 +02:00
latencytop.c
Makefile kernel/numa.c: Move logging out of numa.h 2024-06-12 11:03:16 +02:00
module_signature.c
notifier.c notifier: Add blocking/atomic_notifier_chain_register_unique_prio() 2022-05-19 19:30:30 +02:00
nsproxy.c
numa.c kernel/numa.c: Move logging out of numa.h 2024-06-12 11:03:16 +02:00
padata.c padata: avoid UAF for reorder_work 2025-02-21 13:49:09 +01:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 13:04:54 +02:00
params.c
pid_namespace.c pid: Replace struct pid 1-element array with flex-array 2024-08-29 17:30:18 +02:00
pid.c pid: Replace struct pid 1-element array with flex-array 2024-08-29 17:30:18 +02:00
profile.c profiling: remove profile=sleep support 2024-08-14 13:52:50 +02:00
ptrace.c
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 17:07:13 +00:00
regset.c
relay.c
resource_kunit.c resource: provide meaningful MODULE_LICENSE() in test suite 2020-11-25 18:52:35 +01:00
resource.c resource: fix region_intersects() vs add_memory_driver_managed() 2024-10-17 15:21:55 +02:00
rseq.c
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:11:00 +02:00
scs.c
seccomp.c seccomp: Add wait_killable semantic to seccomp user notifier 2022-05-03 14:11:58 -07:00
signal.c signal: restore the override_rlimit logic 2024-11-14 13:15:18 +01:00
smp.c smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu() 2024-09-12 11:10:24 +02:00
smpboot.c
smpboot.h
softirq.c softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel 2025-02-01 18:30:06 +01:00
stackleak.c stackleak: add on/off stack variants 2022-05-08 01:33:09 -07:00
stacktrace.c
static_call_inline.c x86/static-call: provide a way to do very early static-call updates 2024-12-19 18:08:58 +01:00
static_call.c static_call: Don't make __static_call_return0 static 2022-04-05 09:59:38 +02:00
stop_machine.c
sys_ni.c syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:31:59 +02:00
sys.c hrtimer: Use and report correct timerslack values for realtime tasks 2025-03-28 21:58:48 +01:00
sysctl-test.c
sysctl.c
task_work.c task_work: Introduce task_work_cancel() again 2024-08-03 08:49:34 +02:00
taskstats.c
torture.c torture: Fix hang during kthread shutdown phase 2023-03-10 09:34:07 +01:00
tracepoint.c
tsacct.c
ucount.c ucounts: fix counter leak in inc_rlimit_get_ucounts() 2024-11-14 13:15:19 +01:00
uid16.c
uid16.h
umh.c
up.c
user_namespace.c
user-return-notifier.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
user.c
usermode_driver.c
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: fix pipe accounting mismatch 2025-04-10 14:33:30 +02:00
watchdog_hld.c watchdog/perf: properly initialize the turbo mode timestamp and rearm counter 2024-08-03 08:49:42 +02:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 17:07:09 +00:00
workqueue_internal.h
workqueue.c workqueue: Improve scalability of workqueue watchdog touch 2024-09-12 11:10:27 +02:00