sst-linux/kernel/bpf
Yan Zhai 36c22125e5 bpf: skip non exist keys in generic_map_lookup_batch
[ Upstream commit 5644c6b50ffee0a56c1e01430a8c88e34decb120 ]

The generic_map_lookup_batch currently returns EINTR if it fails with
ENOENT and retries several times on bpf_map_copy_value. The next batch
would start from the same location, presuming it's a transient issue.
This is incorrect if a map can actually have "holes", i.e.
"get_next_key" can return a key that does not point to a valid value. At
least the array of maps type may legitimately contain such holes. Right
now, when these holes show up, generic batch lookup cannot proceed any
more. It will always fail with EINTR errors.

Instead, do not retry in generic_map_lookup_batch. If it finds a
non-existent element, skip to the next key. This simple solution comes with
a price that transient errors may not be recovered, and the iteration
might cycle back to the first key under parallel deletion. For example,
Hou Tao <houtao@huaweicloud.com> pointed out the following scenario:

For LPM trie map:
(1) ->map_get_next_key(map, prev_key, key) returns a valid key

(2) bpf_map_copy_value() returns -ENOENT
It means the key must have been deleted concurrently.

(3) goto next_key
It swaps the prev_key and key

(4) ->map_get_next_key(map, prev_key, key) again
prev_key points to a non-existing key. The LPM trie treats this just
like the prev_key=NULL case, so the returned key will be a duplicate.

With the retry logic, the iteration can continue to the key next to the
deleted one. But if we directly skip to the next key, the iteration loop
would restart from the first key for the lpm_trie type.
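
The restart described in steps (1)-(4) can be reproduced with a small
user-space model. This is only a sketch under stated assumptions, not the
kernel code: struct toy_trie and the toy_* functions are hypothetical
names, the "trie" is a sorted array, and the concurrent deletion is
simulated inline. The one property taken from the description above is
that a prev_key which no longer exists is handled like prev_key=NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_KEYS 8

/* Hypothetical model of an ordered map; keys are kept sorted ascending. */
struct toy_trie {
	int keys[TOY_MAX_KEYS];
	int nr;
};

/* Returns the index of key, or -1 if it is not present. */
static int toy_find(const struct toy_trie *t, int key)
{
	for (int i = 0; i < t->nr; i++)
		if (t->keys[i] == key)
			return i;
	return -1;
}

static void toy_delete(struct toy_trie *t, int key)
{
	int i = toy_find(t, key);

	if (i < 0)
		return;
	for (; i + 1 < t->nr; i++)
		t->keys[i] = t->keys[i + 1];
	t->nr--;
}

/* A missing prev is treated like prev == NULL: start from the first key. */
static int toy_next_key(const struct toy_trie *t, const int *prev, int *next)
{
	int i = prev ? toy_find(t, *prev) : -1;

	if (i + 1 >= t->nr)
		return -ENOENT;
	*next = t->keys[i + 1];
	return 0;
}

/*
 * Walk the map, skipping keys whose lookup fails. The key "victim" is
 * deleted once, right after toy_next_key() returned it, to imitate a
 * concurrent delete between steps (1) and (2).
 */
static int toy_walk(struct toy_trie *t, int victim, int *out, int max)
{
	int prev, key, n = 0;
	const int *prev_ptr = NULL;
	bool deleted = false;

	assert(out != NULL);
	while (n < max && toy_next_key(t, prev_ptr, &key) == 0) {
		if (key == victim && !deleted) {
			toy_delete(t, victim);
			deleted = true;
		}
		if (toy_find(t, key) >= 0)	/* lookup succeeded */
			out[n++] = key;
		prev = key;			/* skip to the next key */
		prev_ptr = &prev;
	}
	return n;
}
```

Walking a map holding keys 1, 2 and 3 with victim 2 emits key 1 twice:
after the failed lookup, prev points at the deleted key 2, so the next
toy_next_key() call starts over from the first key.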

However, not all races can be recovered from. For example, if the current
key is deleted after rather than before bpf_map_copy_value, or if the
prev_key also gets deleted, then the loop will still restart from the
first key for lpm_trie anyway. For generic lookup it might be better to stay
simple, i.e. just skip to the next key. To guarantee that the output
keys are not duplicated, it is better to implement map type specific
batch operations, which can properly lock the trie and synchronize with
concurrent mutators.
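
The skip-on-ENOENT behavior can be illustrated with a toy user-space
model of a map with holes. This is a sketch, not the code in syscall.c:
struct toy_map and the toy_* functions are hypothetical, and the model
assumes an array-like map in which every index is a valid key but only
some slots hold a value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 6

/* Hypothetical array-like map: a slot with present == false is a hole. */
struct toy_map {
	bool present[TOY_SLOTS];
	int value[TOY_SLOTS];
};

/* Stand-in for ->map_get_next_key(): returns every index, holes included. */
static int toy_get_next_key(const struct toy_map *map, const int *prev, int *next)
{
	int n = prev ? *prev + 1 : 0;

	(void)map;
	if (n >= TOY_SLOTS)
		return -ENOENT;
	*next = n;
	return 0;
}

/* Stand-in for bpf_map_copy_value(): fails with -ENOENT on a hole. */
static int toy_lookup(const struct toy_map *map, int key, int *value)
{
	if (key < 0 || key >= TOY_SLOTS || !map->present[key])
		return -ENOENT;
	*value = map->value[key];
	return 0;
}

/*
 * Copy up to max entries. A key without a value is skipped instead of
 * being retried; any other error aborts the batch. Returns the number
 * of entries copied or a negative errno.
 */
static int toy_lookup_batch(const struct toy_map *map, int *keys, int *values, int max)
{
	int prev, key, copied = 0;
	const int *prev_ptr = NULL;

	assert(keys != NULL && values != NULL);
	while (copied < max && toy_get_next_key(map, prev_ptr, &key) == 0) {
		int err = toy_lookup(map, key, &values[copied]);

		if (err == 0)
			keys[copied++] = key;
		else if (err != -ENOENT)
			return err;
		/* on -ENOENT fall through: move on to the next key */
		prev = key;
		prev_ptr = &prev;
	}
	return copied;
}
```

With values present only at indexes 0, 2 and 5, a single call returns
those three entries. A loop that retried on the hole at index 1 and then
gave up would never get past it.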

Fixes: cb4d03ab49 ("bpf: Add generic support for lookup batch op")
Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/85618439eea75930630685c467ccefeac0942e2b.1739171594.git.yan@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07 16:56:38 +01:00
..
preload
arraymap.c bpf: Check percpu map value size first 2024-10-17 15:22:12 +02:00
bloom_filter.c bpf: Check bloom filter map value size 2024-05-17 11:56:05 +02:00
bpf_inode_storage.c bpf: Refactor some inode/task/sk storage functions for reuse 2024-07-18 13:18:34 +02:00
bpf_iter.c
bpf_local_storage.c bpf: fix order of args in call to bpf_map_kvcalloc 2024-07-18 13:18:34 +02:00
bpf_lru_list.c
bpf_lru_list.h
bpf_lsm.c
bpf_struct_ops_types.h
bpf_struct_ops.c
bpf_task_storage.c bpf: Refactor some inode/task/sk storage functions for reuse 2024-07-18 13:18:34 +02:00
btf.c bpf: Fix memory leak in bpf_core_apply 2024-11-01 01:55:56 +01:00
cgroup_iter.c
cgroup.c cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction 2024-11-08 16:26:45 +01:00
core.c bpf: fix potential error return 2025-01-09 13:30:04 +01:00
cpumap.c bpf: report RCU QS in cpumap kthread 2024-03-26 18:21:02 -04:00
devmap.c bpf: fix OOB devmap writes when deleting elements 2024-12-14 19:54:35 +01:00
disasm.c
disasm.h
dispatcher.c bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func 2024-08-03 08:49:45 +02:00
hashtab.c bpf: Check percpu map value size first 2024-10-17 15:22:12 +02:00
helpers.c bpf: Add MEM_WRITE attribute 2025-01-17 13:34:43 +01:00
inode.c
Kconfig
link_iter.c
local_storage.c
log.c bpf: drop unnecessary user-triggerable WARN_ONCE in verifierl log 2024-08-29 17:30:17 +02:00
lpm_trie.c bpf: Fix exact match conditions in trie_get_next_key() 2024-12-14 19:54:31 +01:00
Makefile bpf: Split off basic BPF verifier log into separate file 2024-08-29 17:30:17 +02:00
map_in_map.c bpf: Defer the free of inner map when necessary 2024-01-25 15:27:26 -08:00
map_in_map.h bpf: Add map and need_defer parameters to .map_fd_put_ptr() 2024-01-25 15:27:26 -08:00
map_iter.c
memalloc.c
mmap_unlock_work.h
net_namespace.c
offload.c
percpu_freelist.c
percpu_freelist.h
prog_iter.c
queue_stack_maps.c
reuseport_array.c
ringbuf.c bpf: Add MEM_WRITE attribute 2025-01-17 13:34:43 +01:00
stackmap.c bpf: Fix stackmap overflow check on 32-bit arches 2024-03-26 18:20:41 -04:00
syscall.c bpf: skip non exist keys in generic_map_lookup_batch 2025-03-07 16:56:38 +01:00
sysfs_btf.c
task_iter.c bpf: Fix iter/task tid filtering 2024-11-01 01:56:01 +01:00
tnum.c
trampoline.c
verifier.c bpf: Fix overloading of MEM_UNINIT's meaning 2025-01-17 13:34:43 +01:00