sst-linux/include/uapi/linux
Aleksa Sarai fddb5d430a open: introduce openat2(2) syscall
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].

This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).

Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.

In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.

Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).

/* Syscall Prototype. */
  /*
   * open_how is an extensible structure (similar in interface to
   * clone3(2) or sched_setattr(2)). The size parameter must be set to
   * sizeof(struct open_how), to allow for future extensions. All future
   * extensions will be appended to open_how, with their zero value
   * acting as a no-op default.
   */
  struct open_how { /* ... */ };

  int openat2(int dfd, const char *pathname,
              struct open_how *how, size_t size);
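
The size-versioned convention described in the comment above can be modelled in plain userspace C. This is only a sketch: `model_copy_struct` is a hypothetical name, and the behaviour shown (zero-fill fields the caller did not supply, reject non-zero bytes the receiver does not know about with -E2BIG) is an assumption based on how clone3(2)-style extensible structs are usually handled, not code from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model of copying an extensible struct.
 *   dst/ksize: the receiver's view of the struct and its size
 *   src/usize: the caller's struct and the size it passed in
 * Returns 0 on success or -E2BIG if the caller set a field that the
 * receiver does not know about.
 */
int model_copy_struct(void *dst, size_t ksize, const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *tail;
	size_t i;

	assert(dst != NULL && src != NULL);
	tail = (const unsigned char *)src + size;

	/* Newer caller, older receiver: unknown trailing bytes must be zero. */
	for (i = 0; i < usize - size; i++)
		if (tail[i] != 0)
			return -E2BIG;

	/* Older caller, newer receiver: missing fields default to zero. */
	memset(dst, 0, ksize);
	memcpy(dst, src, size);
	return 0;
}
```

Because a zero value is a no-op for every future field, an old binary passing a short struct and a new binary passing a longer, zero-extended struct both end up with the same behaviour.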

/* Description. */
The initial version of 'struct open_how' contains the following fields:

  flags
    Used to specify openat(2)-style flags. However, any unknown flag
    bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
    will result in -EINVAL. In addition, this field is 64 bits wide to
    allow for more O_ flags than currently permitted with openat(2).

  mode
    The file mode for O_CREAT or O_TMPFILE.

    Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.

  resolve
    Restrict path resolution. In contrast to O_* flags, RESOLVE_ flags
    affect all path components. The current set of flags is as follows
    (at the moment, each RESOLVE_ flag is implemented by simply passing
    the corresponding LOOKUP_ flag):

    RESOLVE_NO_XDEV       => LOOKUP_NO_XDEV
    RESOLVE_NO_SYMLINKS   => LOOKUP_NO_SYMLINKS
    RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
    RESOLVE_BENEATH       => LOOKUP_BENEATH
    RESOLVE_IN_ROOT       => LOOKUP_IN_ROOT
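
A minimal sketch of calling the new syscall through syscall(2), filling in the three fields described above. Several names here are assumptions made for illustration: `sketch_openat2`, `struct sketch_open_how` and `SKETCH_RESOLVE_BENEATH` are hypothetical stand-ins for what <linux/openat2.h> provides, and the fallback syscall number 437 is assumed to apply only to architectures using the unified syscall table.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: prefer SYS_openat2 from the libc headers when present. */
#ifndef SYS_openat2
#define SYS_openat2 437
#endif

/*
 * Local mirror of the three u64 fields. Real code would include
 * <linux/openat2.h> and use struct open_how instead.
 */
struct sketch_open_how {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

/* Assumption: mirrors the value of RESOLVE_BENEATH in <linux/openat2.h>. */
#define SKETCH_RESOLVE_BENEATH 0x08

/* Hypothetical wrapper: returns a new fd, or -1 with errno set. */
int sketch_openat2(int dfd, const char *path, uint64_t flags,
		   uint64_t mode, uint64_t resolve)
{
	struct sketch_open_how how;

	assert(path != NULL);
	memset(&how, 0, sizeof(how));
	how.flags = flags;
	how.mode = mode;	/* must stay 0 without O_CREAT or O_TMPFILE */
	how.resolve = resolve;

	return (int)syscall(SYS_openat2, dfd, path, &how, sizeof(how));
}
```

With a directory fd in `dfd`, a call such as `sketch_openat2(dfd, "..", O_RDONLY, 0, SKETCH_RESOLVE_BENEATH)` should fail rather than escape the directory, and `O_PATH | O_RDWR` in `flags` should be rejected. On a kernel without openat2(2) the wrapper fails with ENOSYS, so callers need a fallback path.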

open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.
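
The padding argument can be checked with a small layout sketch. Both structs below are hypothetical mirrors written for this illustration (they are not the definitions from the new header): one uses only u64 fields, the other narrows `mode` to a u16 as the text says it arguably should be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All-u64 layout, as described for the initial struct open_how. */
struct layout_how_u64 {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

/* Hypothetical alternative with a narrow mode field. */
struct layout_how_u16 {
	uint64_t flags;
	uint16_t mode;
	uint64_t resolve;
};

/* Returns 1 if the all-u64 struct has no padding anywhere. */
int layout_u64_is_packed(void)
{
	return sizeof(struct layout_how_u64) == 3 * sizeof(uint64_t) &&
	       offsetof(struct layout_how_u64, mode) == 8 &&
	       offsetof(struct layout_how_u64, resolve) == 16;
}

/* Number of bytes the compiler adds to the narrow-mode variant. */
size_t layout_u16_padding_bytes(void)
{
	return sizeof(struct layout_how_u16) -
	       (sizeof(uint64_t) + sizeof(uint16_t) + sizeof(uint64_t));
}
```

The narrow variant carries implicit padding bytes whose contents would have to be defined and checked to stay extensible, whereas the all-u64 layout leaves no such gaps.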

Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).

After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.

/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.

In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).

/* Future Work. */
Additional RESOLVE_ flags (such as blocking auto-mount during
resolution) have been suggested during the review period. These can be
easily implemented separately.

Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.

Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).

[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb8 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs

Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-18 09:19:18 -05:00
..
android
byteorder
caif
can can: don't use deprecated license identifiers 2019-11-05 12:44:34 +01:00
cifs
dvb
genwqe
hdlc
hsi
iio
isdn
mmc
netfilter Merge branch 'master' of git://blackhole.kfki.hu/nf-next 2019-11-13 10:42:07 +01:00
netfilter_arp
netfilter_bridge
netfilter_ipv4
netfilter_ipv6
nfsd
raid
sched
spi
sunrpc
tc_act net: sched: add erspan option support to act_tunnel_key 2019-11-21 11:44:06 -08:00
tc_ematch
usb
wimax
a.out.h
acct.h
adb.h
adfs_fs.h
affs_hardblocks.h
agpgart.h
aio_abi.h
am437x-vpfe.h
apm_bios.h
arcfb.h
arm_sdei.h
aspeed-lpc-ctrl.h
aspeed-p2a-ctrl.h
atalk.h
atm_eni.h
atm_he.h
atm_idt77105.h
atm_nicstar.h
atm_tcp.h
atm_zatm.h
atm.h
atmapi.h
atmarp.h
atmbr2684.h
atmclip.h
atmdev.h
atmioc.h
atmlec.h
atmmpc.h
atmppp.h
atmsap.h
atmsvc.h
audit.h Revert "bpf: Emit audit messages upon successful prog load and unload" 2019-11-23 09:56:02 -08:00
auto_dev-ioctl.h
auto_fs4.h
auto_fs.h
auxvec.h
ax25.h
b1lli.h
batadv_packet.h
batman_adv.h
baycom.h
bcache.h
bcm933xx_hcs.h
bfs_fs.h
binfmts.h
blkpg.h
blktrace_api.h
blkzoned.h block: add zone open, close and finish ioctl support 2019-11-07 06:31:50 -07:00
bpf_common.h
bpf_perf_event.h
bpf.h bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY 2019-11-18 11:41:59 +01:00
bpfilter.h
bpqether.h
bsg.h
bt-bmc.h
btf.h
btrfs_tree.h btrfs: add support for 4-copy replication (raid1c4) 2019-11-18 17:51:49 +01:00
btrfs.h btrfs: add incompat for raid1 with 3, 4 copies 2019-11-18 17:51:49 +01:00
can.h can: don't use deprecated license identifiers 2019-11-05 12:44:34 +01:00
capability.h
capi.h
cciss_defs.h
cciss_ioctl.h
cdrom.h
cec-funcs.h
cec.h
cgroupstats.h
chio.h scsi: ch: add include guard to chio.h 2019-10-09 22:31:14 -04:00
cm4000_cs.h
cn_proc.h
coda.h
coff.h
connector.h
const.h
coresight-stm.h
cramfs_fs.h
cryptouser.h
cuda.h
cyclades.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
cycx_cfm.h
dcbnl.h
dccp.h
devlink.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-16 21:51:42 -08:00
dlm_device.h
dlm_netlink.h
dlm_plock.h
dlm.h
dlmconstants.h
dm-ioctl.h
dm-log-userspace.h
dma-buf.h
dn.h
dns_resolver.h
dqblk_xfs.h
edd.h
efs_fs_sb.h
elf-em.h
elf-fdpic.h
elf.h
elfcore.h y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
errno.h
errqueue.h y2038: socket: remove timespec reference in timestamping 2019-11-15 14:38:29 +01:00
erspan.h
ethtool.h
eventpoll.h
fadvise.h
falloc.h
fanotify.h
fb.h
fcntl.h open: introduce openat2(2) syscall 2020-01-18 09:19:18 -05:00
fd.h
fdreg.h
fib_rules.h
fiemap.h
filter.h
firewire-cdev.h
firewire-constants.h
fou.h
fpga-dfl.h
fs.h
fscrypt.h fscrypt: add support for IV_INO_LBLK_64 policies 2019-11-06 12:34:36 -08:00
fsi.h
fsl_hypervisor.h
fsmap.h
fsverity.h
fuse.h
futex.h
gameport.h
gen_stats.h net_sched: add TCA_STATS_PKT64 attribute 2019-11-05 18:20:55 -08:00
genetlink.h
gfs2_ondisk.h
gigaset_dev.h
gpio.h gpio: add new SET_CONFIG ioctl() to gpio chardev 2019-11-12 16:30:31 +01:00
gsmmux.h
gtp.h
hash_info.h
hdlc.h
hdlcdrv.h
hdreg.h
hid.h
hiddev.h
hidraw.h
hpet.h
hsr_netlink.h
hw_breakpoint.h
hyperv.h
hysdn_if.h
i2c-dev.h
i2c.h
i2o-dev.h
i8k.h
icmp.h
icmpv6.h
if_addr.h
if_addrlabel.h
if_alg.h
if_arcnet.h
if_arp.h
if_bonding.h
if_bridge.h
if_cablemodem.h
if_eql.h
if_ether.h
if_fc.h
if_fddi.h
if_frad.h
if_hippi.h
if_infiniband.h
if_link.h
if_ltalk.h
if_macsec.h
if_packet.h
if_phonet.h
if_plip.h
if_ppp.h
if_pppol2tp.h
if_pppox.h
if_slip.h
if_team.h
if_tun.h
if_tunnel.h
if_vlan.h
if_x25.h
if_xdp.h
if.h
ife.h
igmp.h
ila.h
in6.h
in_route.h
in.h
inet_diag.h
inotify.h
input-event-codes.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2019-12-07 18:33:01 -08:00
input.h
io_uring.h io_uring: mark us with IORING_FEAT_SUBMIT_STABLE 2019-12-03 07:04:32 -07:00
ioctl.h
iommu.h
ip6_tunnel.h
ip_vs.h
ip.h
ipc.h
ipmi_bmc.h
ipmi_msgdefs.h
ipmi.h
ipsec.h
ipv6_route.h
ipv6.h
ipx.h
irqnr.h
iso_fs.h
isst_if.h
ivtv.h
ivtvfb.h
jffs2.h
joystick.h
kcm.h
kcmp.h
kcov.h kcov: remote coverage support 2019-12-04 19:44:14 -08:00
kd.h
kdev_t.h
kernel-page-flags.h
kernel.h
kernelcapi.h
kexec.h
keyboard.h
keyctl.h
kfd_ioctl.h
kvm_para.h
kvm.h KVM: PPC: Book3S HV: Support reset of secure guest 2019-11-28 17:02:31 +11:00
l2tp.h
libc-compat.h
lightnvm.h
limits.h
lirc.h
llc.h
loop.h
lp.h
lwtunnel.h lwtunnel: add options setting and dumping for erspan 2019-11-06 21:14:22 -08:00
magic.h powerpc/pseries/cmm: Implement balloon compaction 2019-11-13 16:58:01 +11:00
major.h
map_to_7segment.h
matroxfb.h
max2175.h
mdio.h net: phy: add EEE-related constants 2019-08-19 13:04:45 -07:00
media-bus-format.h
media.h
mei.h
membarrier.h
memfd.h
mempolicy.h
meye.h
mic_common.h
mic_ioctl.h
mii.h
minix_fs.h
mman.h
mmtimer.h
module.h
mount.h
mpls_iptunnel.h
mpls.h
mqueue.h
mroute6.h
mroute.h
msdos_fs.h
msg.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
mtio.h
n_r3964.h
nbd-netlink.h
nbd.h
ncsi.h
ndctl.h
neighbour.h
net_dropmon.h
net_namespace.h
net_tstamp.h
net.h
netconf.h
netdevice.h
netfilter_arp.h
netfilter_bridge.h
netfilter_decnet.h
netfilter_ipv4.h
netfilter_ipv6.h
netfilter.h
netlink_diag.h
netlink.h
netrom.h
nexthop.h
nfc.h
nfs2.h
nfs3.h
nfs4_mount.h
nfs4.h
nfs_fs.h
nfs_idmap.h
nfs_mount.h
nfs.h
nfsacl.h
nilfs2_api.h
nilfs2_ondisk.h
nl80211.h cfg80211: VLAN offload support for set_key and set_sta_vlan 2019-11-08 11:19:19 +01:00
nsfs.h
nubus.h
nvme_ioctl.h nvme: change nvme_passthru_cmd64 to explicitly mark rsvd 2019-11-06 06:17:38 +09:00
nvram.h
omap3isp.h
omapfb.h
oom.h
openat2.h open: introduce openat2(2) syscall 2020-01-18 09:19:18 -05:00
openvswitch.h net: openvswitch: add hash info to upcall 2019-11-14 17:29:46 -08:00
packet_diag.h
param.h
parport.h
patchkey.h
pci_regs.h Merge branch 'pci/resource' 2019-11-28 08:54:36 -06:00
pci.h
pcitest.h
perf_event.h perf/aux: Allow using AUX data in perf samples 2019-11-13 11:06:14 +01:00
personality.h
pfkeyv2.h License cleanup: add SPDX license identifier to uapi header files with no license 2017-11-02 11:19:54 +01:00
pg.h
phantom.h
phonet.h
pkt_cls.h net: sched: allow flower to match erspan options 2019-11-21 11:44:06 -08:00
pkt_sched.h net: sched: pie: enable timestamp based delay calculation 2019-11-20 12:31:45 -08:00
pktcdvd.h
pmu.h
poll.h
posix_acl_xattr.h
posix_acl.h
posix_types.h
ppdev.h
ppp_defs.h y2038: syscall implementation cleanups 2019-12-01 14:00:59 -08:00
ppp-comp.h
ppp-ioctl.h
pps.h
pr.h
prctl.h
psample.h
psci.h
psp-sev.h
ptp_clock.h ptp: Introduce strict checking of external time stamp options. 2019-11-15 12:48:32 -08:00
ptrace.h
qemu_fw_cfg.h
qnx4_fs.h
qnxtypes.h
qrtr.h
quota.h
radeonfb.h
random.h
raw.h
rds.h
reboot.h
reiserfs_fs.h
reiserfs_xattr.h
resource.h y2038: rusage: use __kernel_old_timeval 2019-11-15 14:38:29 +01:00
rfkill.h
rio_cm_cdev.h
rio_mport_cdev.h
romfs_fs.h
rose.h
route.h
rpmsg.h
rseq.h rseq: uapi: Declare rseq_cs field as union, update includes 2018-07-10 22:18:52 +02:00
rtc.h
rtnetlink.h
rxrpc.h
scc.h linux/scc.h: make uapi linux/scc.h self-contained 2019-12-04 19:44:12 -08:00
sched.h threads-v5.5 2019-11-25 18:36:49 -08:00
scif_ioctl.h
screen_info.h
sctp.h sctp: add SCTP_PEER_ADDR_THLDS_V2 sockopt 2019-11-08 14:18:32 -08:00
sdla.h
seccomp.h
securebits.h
sed-opal.h block: sed-opal: Add support to read/write opal tables generically 2019-11-04 07:11:31 -07:00
seg6_genl.h
seg6_hmac.h
seg6_iptunnel.h
seg6_local.h
seg6.h
selinux_netlink.h
sem.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
serial_core.h
serial_reg.h
serial.h
serio.h
shm.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
signal.h
signalfd.h
smc_diag.h
smc.h
smiapp.h
snmp.h
sock_diag.h
socket.h
sockios.h
sonet.h
sonypi.h
sound.h
soundcard.h
stat.h statx: define STATX_ATTR_VERITY 2019-11-13 12:15:34 -08:00
stddef.h
stm.h
string.h
suspend_ioctls.h
swab.h
switchtec_ioctl.h
sync_file.h
synclink.h
sysctl.h
sysinfo.h
target_core_user.h
taskstats.h
tcp_metrics.h
tcp.h
tee.h
termios.h
thermal.h
time_types.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
time.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
timerfd.h
times.h
timex.h
tiocl.h
tipc_config.h
tipc_netlink.h tipc: add support for AEAD key setting via netlink 2019-11-08 14:01:59 -08:00
tipc_sockets_diag.h
tipc.h tipc: add new AEAD key structure for user API 2019-11-08 14:01:59 -08:00
tls.h
toshiba.h
tty_flags.h
tty.h
types.h
udf_fs_i.h
udmabuf.h
udp.h
uhid.h
uinput.h
uio.h
uleds.h
ultrasound.h
un.h
unistd.h
unix_diag.h
usbdevice_fs.h
usbip.h
userfaultfd.h
userio.h
utime.h y2038: uapi: change __kernel_time_t to __kernel_old_time_t 2019-11-15 14:38:29 +01:00
utsname.h
uuid.h
uvcvideo.h
v4l2-common.h
v4l2-controls.h
v4l2-dv-timings.h
v4l2-mediabus.h
v4l2-subdev.h
vbox_err.h
vbox_vmmdev_types.h
vboxguest.h
veth.h
vfio_ccw.h
vfio.h
vhost_types.h
vhost.h
videodev2.h media: v4l2_core: Add p_area to struct v4l2_ext_control 2019-11-08 07:42:25 +01:00
virtio_9p.h
virtio_balloon.h
virtio_blk.h
virtio_config.h
virtio_console.h
virtio_crypto.h
virtio_fs.h
virtio_gpu.h
virtio_ids.h
virtio_input.h
virtio_iommu.h
virtio_mmio.h
virtio_net.h
virtio_pci.h
virtio_pmem.h
virtio_ring.h
virtio_rng.h
virtio_scsi.h
virtio_types.h
virtio_vsock.h
vm_sockets_diag.h
vm_sockets.h
vmcore.h
vsockmon.h
vt.h
vtpm_proxy.h
wait.h
watchdog.h
wimax.h
wireless.h
wmi.h
x25.h
xattr.h
xdp_diag.h
xfrm.h
xilinx-v4l2-controls.h
zorro_ids.h
zorro.h