docs: scheduler: Convert schedutil.txt to ReST

All other scheduler documents have been converted to *.rst. Let's do
the same for schedutil.txt.

Also fixed some typos.

Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Link: https://lore.kernel.org/r/20220312070751.16844-1-tangyizhou@huawei.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent ff1368763b
commit b57b849688
@@ -14,6 +14,7 @@ Linux Scheduler
 sched-domains
 sched-capacity
 sched-energy
+schedutil
 sched-nice-design
 sched-rt-group
 sched-stats
@@ -1,11 +1,15 @@
+=========
+Schedutil
+=========
 
+.. note::
 
-NOTE; all this assumes a linear relation between frequency and work capacity,
-we know this is flawed, but it is the best workable approximation.
+   All this assumes a linear relation between frequency and work capacity,
+   we know this is flawed, but it is the best workable approximation.
 
 
 PELT (Per Entity Load Tracking)
--------------------------------
+===============================
 
 With PELT we track some metrics across the various scheduler entities, from
 individual tasks to task-group slices to CPU runqueues. As the basis for this
@@ -38,8 +42,8 @@ while 'runnable' will increase to reflect the amount of contention.
 For more detail see: kernel/sched/pelt.c
 
 
-Frequency- / CPU Invariance
----------------------------
+Frequency / CPU Invariance
+==========================
 
 Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU
 for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% on
@@ -47,7 +51,7 @@ a big CPU, we allow architectures to scale the time delta with two ratios, one
 Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.
 
 For simple DVFS architectures (where software is in full control) we trivially
-compute the ratio as:
+compute the ratio as::
 
 f_cur
 r_dvfs := -----
@@ -55,7 +59,7 @@ compute the ratio as:
 
 For more dynamic systems where the hardware is in control of DVFS we use
 hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this ratio.
-For Intel specifically, we use:
+For Intel specifically, we use::
 
 APERF
 f_cur := ----- * P0
@@ -87,7 +91,7 @@ For more detail see:
 
 
 UTIL_EST / UTIL_EST_FASTUP
---------------------------
+==========================
 
 Because periodic tasks have their averages decayed while they sleep, even
 though when running their expected utilization will be the same, they suffer a
@@ -106,7 +110,7 @@ For more detail see: kernel/sched/fair.c:util_est_dequeue()
 
 
 UCLAMP
-------
+======
 
 It is possible to set effective u_min and u_max clamps on each CFS or RT task;
 the runqueue keeps an max aggregate of these clamps for all running tasks.
@@ -115,7 +119,7 @@ For more detail see: include/uapi/linux/sched/types.h
 
 
 Schedutil / DVFS
------------------
+================
 
 Every time the scheduler load tracking is updated (task wakeup, task
 migration, time progression) we call out to schedutil to update the hardware
@@ -123,7 +127,7 @@ DVFS state.
 
 The basis is the CPU runqueue's 'running' metric, which per the above it is
 the frequency invariant utilization estimate of the CPU. From this we compute
-a desired frequency like:
+a desired frequency like::
 
 max( running, util_est ); if UTIL_EST
 u_cfs := { running; otherwise
@@ -135,7 +139,7 @@ a desired frequency like:
 
 f_des := min( f_max, 1.25 u * f_max )
 
-XXX IO-wait; when the update is due to a task wakeup from IO-completion we
+XXX IO-wait: when the update is due to a task wakeup from IO-completion we
 boost 'u' above.
 
 This frequency is then used to select a P-state/OPP or directly munged into a
@@ -153,7 +157,7 @@ For more information see: kernel/sched/cpufreq_schedutil.c
 
 
 NOTES
------
+=====
 
 - On low-load scenarios, where DVFS is most relevant, the 'running' numbers
 will closely reflect utilization.