blatt A1
Go to file
Sean Christopherson 82d811ff56 KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs
Upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
how KVM tracks the sequence counter snapshot.

Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
when checking to see if a page fault is stale, as the sequence count is
stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
where KVM will effectively hang vCPUs due to always thinking page faults
are stale, which results in KVM refusing to "fix" faults.

mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
KVM is handling page faults to detect if userspace mappings relevant to
the guest were invalidated between snapshotting the counter and acquiring
mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
resolve the page fault is fresh.  If KVM sees that the counter has
changed, KVM simply resumes the guest without fixing the fault.

What _should_ happen is that the source of the mmu_notifier invalidations
eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
again fix guest page fault(s).

But for a long-lived VM and/or a VM that the host just doesn't particularly
like, it's possible for a VM to be on the receiving end of 2 billion (with
a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
negative value when implicitly cast to an "int" by is_page_fault_stale(),
and then sign-extended into a 64-bit unsigned when the signed "int" is
implicitly cast back to an "unsigned long" on the call to
mmu_invalidate_retry_hva().

As a result of the casting and sign-extension, given a sequence counter of
e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing

	if (0x8002dc25 != 0xffffffff8002dc25)

and signals that the page fault is stale and needs to be retried even
though the sequence counter is stable, and KVM effectively hangs any vCPU
that takes a page fault (EPT violation or #NPF when TDP is enabled).

Reported-by: Brian Rak <brak@vultr.com>
Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
Fixes: a955cad84c ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30 16:11:04 +02:00
arch KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs 2023-08-30 16:11:04 +02:00
block blk-crypto: dynamically allocate fallback profile 2023-08-23 17:52:39 +02:00
certs certs: Fix build error when PKCS#11 URI contains semicolon 2023-02-09 11:28:11 +01:00
crypto crypto: jitter - correct health test during initialization 2023-07-19 16:21:42 +02:00
Documentation x86/cpu: Rename srso_(.*)_alias to srso_alias_\1 2023-08-26 13:26:59 +02:00
drivers bonding: fix macvlan over alb bond support 2023-08-30 16:11:04 +02:00
fs jbd2: fix a race when checking checkpoint buffer busy 2023-08-30 16:10:58 +02:00
include bonding: fix macvlan over alb bond support 2023-08-30 16:11:04 +02:00
init x86/mm: Initialize text poking earlier 2023-08-08 20:03:49 +02:00
io_uring io_uring: correct check for O_TMPFILE 2023-08-16 18:27:24 +02:00
ipc ipc: fix memory leak in init_mqueue_fs() 2022-12-31 13:32:01 +01:00
kernel tracing: Fix memleak due to race between current_tracer and trace 2023-08-30 16:11:00 +02:00
lib debugobjects: Recheck debug_objects_enabled before reporting 2023-08-11 12:08:23 +02:00
LICENSES
mm hugetlb: do not clear hugetlb dtor until allocating vmemmap 2023-08-23 17:52:41 +02:00
net rtnetlink: Reject negative ifindexes in RTM_NEWLINK 2023-08-30 16:11:04 +02:00
rust rust: allocator: Prevent mis-aligned allocation 2023-08-11 12:08:18 +02:00
samples samples: ftrace: Save required argument registers in sample trampolines 2023-07-23 13:49:44 +02:00
scripts gcc-plugins: Reorganize gimple includes for GCC 13 2023-08-16 18:27:20 +02:00
security security: keys: Modify mismatched function name 2023-07-27 08:50:43 +02:00
sound ASoC: amd: vangogh: select CONFIG_SND_AMD_ACP_CONFIG 2023-08-23 17:52:40 +02:00
tools selftests: bonding: do not set port down before adding to bond 2023-08-30 16:11:02 +02:00
usr usr/gen_init_cpio.c: remove unnecessary -1 values from int file 2022-10-03 14:21:44 -07:00
virt KVM: Grab a reference to KVM for VM and vCPU stats file descriptors 2023-08-03 10:24:08 +02:00
.clang-format inet: ping: use hlist_nulls rcu iterator during lookup 2022-12-01 12:42:46 +01:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
.mailmap 9 hotfixes. 6 for MM, 3 for other areas. Four of these patches address 2022-12-10 17:10:52 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING
CREDITS MAINTAINERS: Remove Michal Marek from Kbuild maintainers 2022-11-16 14:53:00 +09:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig
MAINTAINERS devlink: move code to a dedicated directory 2023-08-30 16:11:00 +02:00
Makefile Linux 6.1.49 2023-08-27 21:01:32 +02:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.