sst-linux/mm
Miaohe Lin 00b0752c7f mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
commit 8cf360b9d6a840700e06864236a01a883b34bbad upstream.

While running memory failure tests recently, I hit the following panic:

page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:1009!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:__del_page_from_free_list+0x151/0x180
RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
FS:  00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __rmqueue_pcplist+0x23b/0x520
 get_page_from_freelist+0x26b/0xe40
 __alloc_pages_noprof+0x113/0x1120
 __folio_alloc_noprof+0x11/0xb0
 alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
 __alloc_fresh_hugetlb_folio+0xe7/0x140
 alloc_pool_huge_folio+0x68/0x100
 set_max_huge_pages+0x13d/0x340
 hugetlb_sysctl_handler_common+0xe8/0x110
 proc_sys_call_handler+0x194/0x280
 vfs_write+0x387/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff916114887
RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
 </TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---

Before the panic, there was a warning about a bad page state:

BUG: Bad page state in process page-types  pfn:8cee00
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
page_type: 0xffffff7f(buddy)
raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
Call Trace:
 <TASK>
 dump_stack_lvl+0x83/0xa0
 bad_page+0x63/0xf0
 free_unref_page+0x36e/0x5c0
 unpoison_memory+0x50b/0x630
 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
 debugfs_attr_write+0x42/0x60
 full_proxy_write+0x5b/0x80
 vfs_write+0xcd/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f189a514887
RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
 </TASK>

The root cause appears to be the following race:

 memory_failure
  try_memory_failure_hugetlb
   me_huge_page
    __page_handle_poison
     dissolve_free_hugetlb_folio
     drain_all_pages -- Buddy page can be isolated e.g. for compaction.
     take_page_off_buddy -- Failed as page is not in the buddy list.
                         -- Page can be put back into buddy after compaction.
    page_ref_inc -- Leads to buddy page with refcnt = 1.

Then unpoison_memory() can unpoison the page and send the buddy page back
into the buddy list again, leading to the above bad page state warning.
bad_page() then calls page_mapcount_reset(), which removes PageBuddy from
the buddy page and leads to the later VM_BUG_ON_PAGE(!PageBuddy(page)) when
trying to allocate this page.

Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.
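
The effect of the stricter check can be sketched in a small userspace model.
This is not the kernel code: struct page_model, handle_poison_model() and
me_huge_page_model() are hypothetical stand-ins, and the sketch assumes
__page_handle_poison() returns 1 when the page was dissolved and taken off
buddy, 0 when it was dissolved but not taken off buddy, and a negative value
when dissolving failed. The "fixed" flag switches between a ">= 0" success
check (assumed old behaviour) and a "> 0" check (the behaviour described
above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes, loosely modeled on MF_FAILED / MF_RECOVERED. */
enum mf_model_result { MODEL_FAILED, MODEL_RECOVERED };

/* Toy stand-in for struct page: only the state the race touches. */
struct page_model {
	int refcount;
	bool in_buddy;	/* still linked in the buddy free list */
};

/*
 * Stand-in for __page_handle_poison(), under the assumed convention:
 *   1   dissolved and taken off buddy
 *   0   dissolved but NOT taken off buddy (the race window)
 *  <0   failed to dissolve
 */
int handle_poison_model(struct page_model *p, bool dissolve_ok, bool raced)
{
	if (!dissolve_ok)
		return -1;
	if (raced)
		return 0;	/* page stays in (or returns to) the buddy list */
	p->in_buddy = false;
	return 1;
}

/* Stand-in for the success check made by the caller (me_huge_page in the trace). */
enum mf_model_result me_huge_page_model(struct page_model *p, bool dissolve_ok,
					bool raced, bool fixed)
{
	int ret = handle_poison_model(p, dissolve_ok, raced);
	bool success = fixed ? (ret > 0) : (ret >= 0);

	if (success) {
		p->refcount++;	/* models page_ref_inc() */
		return MODEL_RECOVERED;
	}
	return MODEL_FAILED;
}
```

With raced set, the ">= 0" check reports recovery and leaves a page that is
still in the buddy list with refcount 1, which is the state the race diagram
ends in. The "> 0" check reports failure and leaves the refcount untouched.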

Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
Fixes: ceaf8fbea7 ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-21 14:35:59 +02:00
damon mm/damon/reclaim: fix quota stauts loss due to online tunings 2024-03-01 13:26:39 +01:00
kasan kasan/test: avoid gcc warning for intentional overflow 2024-04-03 15:19:27 +02:00
kfence mm,kfence: decouple kfence from page granularity mapping judgement 2023-12-03 07:32:08 +01:00
kmsan kmsan: do not wipe out origin when doing partial unpoisoning 2024-06-16 13:41:38 +02:00
backing-dev.c writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs 2023-04-26 14:28:39 +02:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c
cma_debug.c
cma_sysfs.c
cma.c mm/cma: drop incorrect alignment check in cma_init_reserved_mem 2024-06-16 13:41:39 +02:00
cma.h
compaction.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-03 15:19:42 +02:00
debug_page_ref.c
debug_vm_pgtable.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
folio-compat.c
frontswap.c
gup_test.c
gup_test.h selftests/vm: gup_test: fix test flag 2021-05-05 11:27:26 -07:00
gup.c mm: always expand the stack with the mmap write lock held 2023-07-01 13:16:25 +02:00
highmem.c
hmm.c
huge_memory.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-06-16 13:41:38 +02:00
hugetlb_cgroup.c mm/hugetlb_cgroup: convert hugetlb_cgroup_uncharge_page() to folios 2024-05-17 11:55:52 +02:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix a race between vmemmap pmd split 2023-09-19 12:27:56 +02:00
hugetlb_vmemmap.h
hugetlb.c mm/hugetlb: pass correct order_per_bit to cma_declare_contiguous_nid 2024-06-16 13:41:39 +02:00
hwpoison-inject.c
init-mm.c
internal.h mm, netfs, fscache: stop read optimisation when folio removed from pagecache 2024-01-10 17:10:31 +01:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: Add ioremap/iounmap_allowed() 2022-06-27 12:22:31 +01:00
Kconfig mm: introduce new 'lock_mm_and_find_vma()' page fault helper 2023-07-01 13:16:24 +02:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-06-14 11:15:29 +02:00
khugepaged.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
kmemleak.c
ksm.c mm/ksm: fix race with VMA iteration and mm_struct teardown 2023-03-30 12:49:29 +02:00
list_lru.c
maccess.c mm: Fix copy_from_user_nofault(). 2023-06-28 11:12:17 +02:00
madvise.c madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check 2023-08-30 16:11:11 +02:00
Makefile
mapping_dirty_helpers.c
memblock.c x86/numa: Fix the address overlap check in numa_fill_memblks() 2024-03-01 13:26:36 +01:00
memcontrol.c mm: memcontrol: clarify swapaccount=0 deprecation warning 2024-03-01 13:26:32 +01:00
memfd.c memfd: check for non-NULL file_seals in memfd_create() syscall 2023-06-28 11:12:27 +02:00
memory_hotplug.c mm/memory_hotplug: fix error handling in add_memory_resource() 2024-01-10 17:10:33 +01:00
memory-failure.c mm/memory-failure: fix handling of dissolved but not taken off from buddy pages 2024-06-21 14:35:59 +02:00
memory-tiers.c memory tier: release the new_memtier in find_create_memory_tier() 2023-03-10 09:34:27 +01:00
memory.c x86/mm/pat: fix VM_PAT handling in COW mappings 2024-04-10 16:28:33 +02:00
mempolicy.c mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer 2023-11-08 14:11:02 +01:00
mempool.c
memremap.c
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-04-03 15:19:36 +02:00
migrate_device.c
migrate.c mm/hugetlb: add folio_hstate() 2024-05-17 11:55:52 +02:00
mincore.c mm: teach mincore_hugetlb about pte markers 2023-03-22 13:34:03 +01:00
mlock.c
mm_init.c
mm_slot.h
mmap_lock.c
mmap.c mmap: fix error paths with dup_anon_vma() 2023-11-08 14:11:03 +01:00
mmu_gather.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c mm, mremap: fix mremap() expanding for vma's with vm_ops->close() 2023-02-09 11:28:22 +01:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c xtensa: fix lock_mm_and_find_vma in case VMA not found 2023-07-05 18:27:37 +01:00
oom_kill.c
page_alloc.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-03 15:19:42 +02:00
page_counter.c
page_ext.c
page_idle.c
page_io.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm: page_table_check: Ensure user pages are not slab pages 2023-06-14 11:15:29 +02:00
page_vma_mapped.c
page-writeback.c mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again 2024-02-23 09:12:32 +01:00
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-06-16 13:41:38 +02:00
process_vm_access.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
ptdump.c
readahead.c mm: use memalloc_nofs_save() in page_cache_ra_order() 2024-05-17 11:56:21 +02:00
rmap.c mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON 2023-03-10 09:34:25 +01:00
rodata_test.c
secretmem.c mm/secretmem: remove reduntant return value 2022-10-03 14:03:36 -07:00
shmem.c mm/shmem: fix race in shmem_undo_range w/THP 2023-12-20 17:00:26 +01:00
shrinker_debug.c mm: shrinkers: fix deadlock in shrinker debugfs 2023-02-22 12:59:46 +01:00
shuffle.c
shuffle.h
slab_common.c mm/slab_common: fix slab_caches list corruption after kmem_cache_destroy() 2023-10-06 14:57:03 +02:00
slab.c mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP 2023-03-30 12:49:23 +02:00
slab.h
slob.c
slub.c
sparse-vmemmap.c
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2024-01-31 16:17:02 -08:00
swap_cgroup.c
swap_slots.c
swap_state.c
swap.c
swap.h mm/swap: fix race when skipping swapcache 2024-03-01 13:26:32 +01:00
swapfile.c mm: swap: fix race between free_swap_and_cache() and swapoff() 2024-04-03 15:19:32 +02:00
truncate.c mm: merge folio_has_private()/filemap_release_folio() call pairs 2024-01-10 17:10:31 +01:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-06-28 11:12:17 +02:00
userfaultfd.c userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb 2024-02-23 09:12:51 +01:00
util.c rcu: dump vmalloc memory info safely 2023-09-13 09:42:59 +02:00
vmalloc.c mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL 2024-06-21 14:35:41 +02:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-09-13 09:42:33 +02:00
vmscan.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-03 15:19:42 +02:00
vmstat.c
workingset.c mm/mglru: fix underprotected page cache 2023-12-20 17:00:26 +01:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c zsmalloc: allow only one active pool compaction context 2023-08-23 17:52:40 +02:00
zswap.c mm: zswap: fix missing folio cleanup in writeback race path 2024-03-01 13:26:39 +01:00