sst-linux/kernel/bpf
Yan Zhai 36c22125e5 bpf: skip non exist keys in generic_map_lookup_batch
[ Upstream commit 5644c6b50ffee0a56c1e01430a8c88e34decb120 ]

The generic_map_lookup_batch currently returns EINTR if it fails with
ENOENT and retries several times on bpf_map_copy_value. The next batch
would start from the same location, presuming it's a transient issue.
This is incorrect if a map can actually have "holes", i.e.
"get_next_key" can return a key that does not point to a valid value. At
least the array of maps type may contain such holes legitly. Right now
these holes show up, generic batch lookup cannot proceed any more. It
will always fail with EINTR errors.

Rather, do not retry in generic_map_lookup_batch. If it finds a non
existing element, skip to the next key. This simple solution comes with
a price that transient errors may not be recovered, and the iteration
might cycle back to the first key under parallel deletion. For example,
Hou Tao <houtao@huaweicloud.com> pointed out a following scenario:

For LPM trie map:
(1) ->map_get_next_key(map, prev_key, key) returns a valid key

(2) bpf_map_copy_value() return -ENOMENT
It means the key must be deleted concurrently.

(3) goto next_key
It swaps the prev_key and key

(4) ->map_get_next_key(map, prev_key, key) again
prev_key points to a non-existing key, for LPM trie it will treat just
like prev_key=NULL case, the returned key will be duplicated.

With the retry logic, the iteration can continue to the key next to the
deleted one. But if we directly skip to the next key, the iteration loop
would restart from the first key for the lpm_trie type.

However, not all races may be recovered. For example, if current key is
deleted after instead of before bpf_map_copy_value, or if the prev_key
also gets deleted, then the loop will still restart from the first key
for lpm_tire anyway. For generic lookup it might be better to stay
simple, i.e. just skip to the next key. To guarantee that the output
keys are not duplicated, it is better to implement map type specific
batch operations, which can properly lock the trie and synchronize with
concurrent mutators.

Fixes: cb4d03ab49 ("bpf: Add generic support for lookup batch op")
Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/85618439eea75930630685c467ccefeac0942e2b.1739171594.git.yan@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07 16:56:38 +01:00
..
preload bpf: iterators: Build and use lightweight bootstrap version of bpftool 2022-07-15 12:01:30 -07:00
arraymap.c bpf: Check percpu map value size first 2024-10-17 15:22:12 +02:00
bloom_filter.c bpf: Check bloom filter map value size 2024-05-17 11:56:05 +02:00
bpf_inode_storage.c bpf: Refactor some inode/task/sk storage functions for reuse 2024-07-18 13:18:34 +02:00
bpf_iter.c bpf: Initialize the bpf_run_ctx in bpf_iter_run_prog() 2022-08-18 17:06:13 -07:00
bpf_local_storage.c bpf: fix order of args in call to bpf_map_kvcalloc 2024-07-18 13:18:34 +02:00
bpf_lru_list.c bpf: Address KCSAN report on bpf_lru_list 2023-07-27 08:50:34 +02:00
bpf_lru_list.h bpf: Address KCSAN report on bpf_lru_list 2023-07-27 08:50:34 +02:00
bpf_lsm.c bpf: Fix the kernel crash caused by bpf_setsockopt(). 2023-02-09 11:28:02 +01:00
bpf_struct_ops_types.h bpf: Add dummy BPF STRUCT_OPS for test purpose 2021-11-01 14:10:00 -07:00
bpf_struct_ops.c bpf: Remove is_valid_bpf_tramp_flags() 2022-07-11 21:04:58 +02:00
bpf_task_storage.c bpf: Refactor some inode/task/sk storage functions for reuse 2024-07-18 13:18:34 +02:00
btf.c bpf: Fix memory leak in bpf_core_apply 2024-11-01 01:55:56 +01:00
cgroup_iter.c cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers 2023-06-21 16:00:51 +02:00
cgroup.c cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction 2024-11-08 16:26:45 +01:00
core.c bpf: fix potential error return 2025-01-09 13:30:04 +01:00
cpumap.c bpf: report RCU QS in cpumap kthread 2024-03-26 18:21:02 -04:00
devmap.c bpf: fix OOB devmap writes when deleting elements 2024-12-14 19:54:35 +01:00
disasm.c bpf: Relicense disassembler as GPL-2.0-only OR BSD-2-Clause 2021-09-02 14:49:23 +02:00
disasm.h bpf: Relicense disassembler as GPL-2.0-only OR BSD-2-Clause 2021-09-02 14:49:23 +02:00
dispatcher.c bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func 2024-08-03 08:49:45 +02:00
hashtab.c bpf: Check percpu map value size first 2024-10-17 15:22:12 +02:00
helpers.c bpf: Add MEM_WRITE attribute 2025-01-17 13:34:43 +01:00
inode.c bpf: Convert bpf_preload.ko to use light skeleton. 2022-02-10 23:31:51 +01:00
Kconfig rcu: Make the TASKS_RCU Kconfig option be selected 2022-04-20 16:52:58 -07:00
link_iter.c bpf: Add bpf_link iterator 2022-05-10 11:20:45 -07:00
local_storage.c cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers 2023-06-21 16:00:51 +02:00
log.c bpf: drop unnecessary user-triggerable WARN_ONCE in verifierl log 2024-08-29 17:30:17 +02:00
lpm_trie.c bpf: Fix exact match conditions in trie_get_next_key() 2024-12-14 19:54:31 +01:00
Makefile bpf: Split off basic BPF verifier log into separate file 2024-08-29 17:30:17 +02:00
map_in_map.c bpf: Defer the free of inner map when necessary 2024-01-25 15:27:26 -08:00
map_in_map.h bpf: Add map and need_defer parameters to .map_fd_put_ptr() 2024-01-25 15:27:26 -08:00
map_iter.c bpf: Introduce MEM_RDONLY flag 2021-12-18 13:27:41 -08:00
memalloc.c bpf: Zeroing allocated object from slab in bpf memory allocator 2023-03-10 09:33:06 +01:00
mmap_unlock_work.h bpf: Introduce helper bpf_find_vma 2021-11-07 11:54:51 -08:00
net_namespace.c net: Add includes masked by netdevice.h including uapi/bpf.h 2021-12-29 20:03:05 -08:00
offload.c bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD 2023-01-24 07:24:37 +01:00
percpu_freelist.c bpf: Initialize same number of free nodes for each pcpu_freelist 2022-11-11 12:05:14 -08:00
percpu_freelist.h bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI 2020-10-06 00:04:11 +02:00
prog_iter.c bpf: Refactor bpf_iter_reg to have separate seq_info member 2020-07-25 20:16:32 -07:00
queue_stack_maps.c bpf: Avoid deadlock when using queue and stack maps from NMI 2023-10-06 14:56:35 +02:00
reuseport_array.c net: Fix suspicious RCU usage in bpf_sk_reuseport_detach() 2022-08-17 16:42:59 -07:00
ringbuf.c bpf: Add MEM_WRITE attribute 2025-01-17 13:34:43 +01:00
stackmap.c bpf: Fix stackmap overflow check on 32-bit arches 2024-03-26 18:20:41 -04:00
syscall.c bpf: skip non exist keys in generic_map_lookup_batch 2025-03-07 16:56:38 +01:00
sysfs_btf.c bpf: Load and verify kernel module BTFs 2020-11-10 15:25:53 -08:00
task_iter.c bpf: Fix iter/task tid filtering 2024-11-01 01:56:01 +01:00
tnum.c bpf, tnums: Provably sound, faster, and more precise algorithm for tnum_mul 2021-06-01 13:34:15 +02:00
trampoline.c bpf, x64: Fix tailcall infinite loop 2024-01-10 17:10:26 +01:00
verifier.c bpf: Fix overloading of MEM_UNINIT's meaning 2025-01-17 13:34:43 +01:00