sst-linux/mm
Ricardo Cañuelo Navarro 0a0c255c39 mm,madvise,hugetlb: check for 0-length range after end address adjustment
commit 2ede647a6fde3e54a6bfda7cf01c716649655900 upstream.

Add a sanity check to madvise_dontneed_free() to address a corner case in
madvise where a race condition causes the current vma being processed to
be backed by a different page size.

During a madvise(MADV_DONTNEED) call on a memory region registered with a
userfaultfd, there's a period of time where the process mm lock is
temporarily released in order to send a UFFD_EVENT_REMOVE and let
userspace handle the event.  During this time, the vma covering the
current address range may change due to an explicit mmap done concurrently
by another thread.

If, after that change, the memory region, which was originally backed by
4KB pages, is now backed by hugepages, the end address is rounded down to
a hugepage boundary to avoid data loss (see "Fixes" below).  This rounding
may cause the end address to be truncated to the same address as the
start.

Make this corner case follow the same semantics as in other similar cases
where the requested region has zero length (i.e., return 0).

This will make madvise_walk_vmas() continue to the next vma in the range
(this time holding the process mm lock) which, due to the prev pointer
becoming stale because of the vma change, will be the same hugepage-backed
vma that was just checked before.  The next time madvise_dontneed_free()
runs for this vma, if the start address isn't aligned to a hugepage
boundary, it'll return -EINVAL, which is also in line with the madvise
API.
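
The two outcomes described above (an empty range after rounding, and
-EINVAL for a misaligned start) can be modeled in a few lines of userspace
C.  This is only a sketch, not the kernel code: revalidate_range(),
align_down() and the fixed 4 KiB / 2 MiB page sizes are assumptions made for
illustration, and the real check in madvise_dontneed_free() works on a vma
and its hstate rather than on constants.

```c
#include <assert.h>
#include <errno.h>

/* Assumed sizes for illustration: 4 KiB base pages, 2 MiB hugepages. */
#define BASE_PAGE_SIZE 0x1000UL
#define HUGE_PAGE_SIZE 0x200000UL

/* Round x down to a multiple of a; a must be a power of two. */
static unsigned long align_down(unsigned long x, unsigned long a)
{
	assert(a && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/*
 * Hypothetical model of re-validating a MADV_DONTNEED range against a vma
 * that turned out to be hugepage-backed after the mm lock was retaken.
 *
 * Returns -EINVAL if start is not hugepage-aligned, 0 if rounding the end
 * address down leaves an empty range, and 1 if a non-empty range remains.
 */
static int revalidate_range(unsigned long start, unsigned long *end)
{
	if (start & (HUGE_PAGE_SIZE - 1))
		return -EINVAL;

	/* Round down so that no partially covered hugepage is dropped. */
	*end = align_down(*end, HUGE_PAGE_SIZE);

	/* Zero-length range: nothing to do, report success. */
	if (start == *end)
		return 0;

	return 1;
}
```

With start = 0x40000000 and end = start + 4 * BASE_PAGE_SIZE, the end address
is rounded down to 0x40000000 and the function returns 0.  With
start = 0x40001000 it returns -EINVAL because start is not on a 2 MiB
boundary.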

From a userspace perspective, madvise() will return EINVAL because the start
address isn't aligned according to the new vma alignment requirements
(hugepage), even though it was correctly page-aligned when the call was
issued.

Link: https://lkml.kernel.org/r/20250203075206.1452208-1-rcn@igalia.com
Fixes: 8ebe0a5eaa ("mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs")
Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Florent Revest <revest@google.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07 16:56:39 +01:00
damon mm/damon/vaddr: fix issue in damon_va_evenly_split_region() 2024-12-14 19:54:52 +01:00
kasan kasan: make report_lock a raw spinlock 2024-12-14 19:54:50 +01:00
kfence kfence: skip __GFP_THISNODE allocations on NUMA systems 2025-02-21 13:49:48 +01:00
kmsan kmsan: do not wipe out origin when doing partial unpoisoning 2024-06-16 13:41:38 +02:00
backing-dev.c writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs 2023-04-26 14:28:39 +02:00
balloon_compaction.c
bootmem_info.c
cma_debug.c
cma_sysfs.c
cma.c mm/cma: drop incorrect alignment check in cma_init_reserved_mem 2024-06-16 13:41:39 +02:00
cma.h
compaction.c mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations 2024-04-03 15:19:42 +02:00
debug_page_ref.c
debug_vm_pgtable.c
debug.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c filemap: avoid truncating 64-bit offset to 32 bits 2025-01-23 17:17:14 +01:00
folio-compat.c mm: remove try_to_free_swap() 2022-10-03 14:02:53 -07:00
frontswap.c
gup_test.c
gup_test.h
gup.c mm: gup: fix infinite loop within __get_longterm_locked 2025-02-21 13:50:10 +01:00
highmem.c
hmm.c
huge_memory.c mm: migrate: try again if THP split is failed due to page refcnt 2024-11-08 16:26:47 +01:00
hugetlb_cgroup.c mm/hugetlb_cgroup: convert hugetlb_cgroup_uncharge_page() to folios 2024-05-17 11:55:52 +02:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: fix a race between vmemmap pmd split 2023-09-19 12:27:56 +02:00
hugetlb_vmemmap.h
hugetlb.c mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio() 2024-08-14 13:53:02 +02:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c
internal.h mm: unconditionally close VMAs on error 2024-11-22 15:37:34 +01:00
interval_tree.c mm/interval_tree: add comments to improve code readability 2021-04-30 11:20:38 -07:00
io-mapping.c
ioremap.c
Kconfig mm: z3fold: deprecate CONFIG_Z3FOLD 2024-10-17 15:22:05 +02:00
Kconfig.debug mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-06-14 11:15:29 +02:00
khugepaged.c mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file() 2024-08-29 17:30:17 +02:00
kmemleak.c mm: kmemleak: fix upper boundary check for physical address objects 2025-02-21 13:49:49 +01:00
ksm.c mm/ksm: fix race with VMA iteration and mm_struct teardown 2023-03-30 12:49:29 +02:00
list_lru.c mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe 2022-06-16 19:48:31 -07:00
maccess.c mm: Fix copy_from_user_nofault(). 2023-06-28 11:12:17 +02:00
madvise.c mm,madvise,hugetlb: check for 0-length range after end address adjustment 2025-03-07 16:56:39 +01:00
Makefile
mapping_dirty_helpers.c
memblock.c x86/numa: Fix the address overlap check in numa_fill_memblks() 2024-03-01 13:26:36 +01:00
memcontrol.c memcg: fix soft lockup in the OOM process 2025-03-07 16:56:29 +01:00
memfd.c memfd: check for non-NULL file_seals in memfd_create() syscall 2023-06-28 11:12:27 +02:00
memory_hotplug.c x86/kaslr: Expose and use the end of the physical memory address space 2024-09-12 11:10:17 +02:00
memory-failure.c mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu 2024-08-29 17:30:15 +02:00
memory-tiers.c memory tier: release the new_memtier in find_create_memory_tier() 2023-03-10 09:34:27 +01:00
memory.c mm: avoid leaving partial pfn mappings around in error case 2024-09-18 19:23:04 +02:00
mempolicy.c mm/numa_balancing: teach mpol_to_str about the balancing mode 2024-08-03 08:49:40 +02:00
mempool.c
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-08 15:57:23 -08:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-04-03 15:19:36 +02:00
migrate_device.c
migrate.c migrate_pages_batch: fix statistics for longterm pin retry 2024-11-08 16:26:48 +01:00
mincore.c mm: teach mincore_hugetlb about pte markers 2023-03-22 13:34:03 +01:00
mlock.c
mm_init.c
mm_slot.h
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-08-03 08:49:30 +02:00
mmap.c mm: call the security_mmap_file() LSM hook in remap_file_pages() 2024-12-14 19:54:53 +01:00
mmu_gather.c
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-21 20:01:10 -07:00
mmzone.c
mprotect.c
mremap.c mm, mremap: fix mremap() expanding for vma's with vm_ops->close() 2023-02-09 11:28:22 +01:00
msync.c
nommu.c mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling 2024-11-22 15:37:34 +01:00
oom_kill.c memcg: fix soft lockup in the OOM process 2025-03-07 16:56:29 +01:00
page_alloc.c mm: page_alloc: move mlocked flag clearance into free_pages_prepare() 2024-12-14 19:54:31 +01:00
page_counter.c
page_ext.c mm/page_exit: fix kernel doc warning in page_ext_put() 2022-11-22 18:50:41 -08:00
page_idle.c
page_io.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
page_isolation.c mm/page_isolation: fix clang deadcode warning 2022-10-28 13:37:22 -07:00
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h mm/page_reporting: export reporting order as module parameter 2021-06-29 10:53:47 -07:00
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-27 13:46:22 +02:00
page_vma_mapped.c mm/swap: add swp_offset_pfn() to fetch PFN from swap entry 2022-09-26 19:46:05 -07:00
page-writeback.c Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again" 2024-07-11 12:47:14 +02:00
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free() 2022-07-17 17:14:47 -07:00
pgalloc-track.h mm: fix typos in comments 2021-05-07 00:26:35 -07:00
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-06-16 13:41:38 +02:00
process_vm_access.c use less confusing names for iov_iter direction initializers 2023-02-09 11:28:04 +01:00
ptdump.c
readahead.c mm/readahead: fix large folio support in async readahead 2025-01-09 13:30:06 +01:00
rmap.c mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON 2023-03-10 09:34:25 +01:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c secretmem: disable memfd_secret() if arch cannot set direct map 2024-10-17 15:22:28 +02:00
shmem.c mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling 2024-11-22 15:37:34 +01:00
shrinker_debug.c mm: shrinkers: fix deadlock in shrinker debugfs 2023-02-22 12:59:46 +01:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h mm/shuffle: fix section mismatch warning 2021-05-22 15:09:07 -10:00
slab_common.c mm: krealloc: Fix MTE false alarm in __do_krealloc 2024-11-17 15:07:22 +01:00
slab.c mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP 2023-03-30 12:49:23 +02:00
slab.h - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
slob.c
slub.c
sparse-vmemmap.c
sparse.c x86/kaslr: Expose and use the end of the physical memory address space 2024-09-12 11:10:17 +02:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c
swap_state.c
swap.c mm: page_alloc: move mlocked flag clearance into free_pages_prepare() 2024-12-14 19:54:31 +01:00
swap.h mm/swap: fix race when skipping swapcache 2024-03-01 13:26:32 +01:00
swapfile.c mm/swapfile: skip HugeTLB pages for unuse_vma 2024-10-22 15:56:43 +02:00
truncate.c mm: Fix missing folio invalidation calls during truncation 2024-09-04 13:25:00 +02:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-06-28 11:12:17 +02:00
userfaultfd.c userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb 2024-02-23 09:12:51 +01:00
util.c mm: unconditionally close VMAs on error 2024-11-22 15:37:34 +01:00
vmalloc.c vmalloc: fix accounting with i915 2025-01-02 10:30:53 +01:00
vmpressure.c net-memcg: Fix scope of sockmem pressure indicators 2023-09-13 09:42:33 +02:00
vmscan.c mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim() 2025-01-09 13:30:07 +01:00
vmstat.c vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event 2024-12-14 19:54:13 +01:00
workingset.c mm/mglru: fix underprotected page cache 2023-12-20 17:00:26 +01:00
z3fold.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: allow only one active pool compaction context 2023-08-23 17:52:40 +02:00
zswap.c mm: zswap: fix missing folio cleanup in writeback race path 2024-03-01 13:26:39 +01:00