.. _transformation-metadata:

============================
Code Transformation Metadata
============================

.. contents::
   :local:

Overview
========

LLVM transformation passes can be controlled by attaching metadata to
the code to transform. By default, transformation passes use heuristics
to determine whether or not to perform transformations and, when doing
so, how the transformations are applied (e.g., which vectorization
factor to select).
Unless the optimizer is otherwise directed, transformations are applied
conservatively. This conservatism generally allows the optimizer to
avoid unprofitable transformations, but in practice it also causes the
optimizer to skip transformations that would be highly profitable.

Frontends can give additional hints to LLVM passes on which
transformations they should apply. This can be additional knowledge that
cannot be derived from the emitted IR, or directives passed from the
user/programmer. OpenMP pragmas are an example of the latter.

If any such metadata is dropped from the program, the code's semantics
must not change.


Metadata on Loops
=================

Attributes can be attached to loops as described in :ref:`llvm.loop`.
Attributes can describe properties of the loop, disable transformations,
force specific transformations and set transformation options.

Because metadata nodes are immutable (with the exception of
``MDNode::replaceOperandWith``, which is dangerous to use on uniqued
metadata), adding or removing a loop attribute requires creating a new
``MDNode`` and assigning it as the new ``llvm.loop`` metadata. Any
connection between the old ``MDNode`` and the loop is lost. The
``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e.
the loop effectively gets a new identifier. For instance,
``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if
the parallel access property is to be preserved after adding/removing
loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be
updated to the new LoopID.


Transformation Metadata Structure
=================================

Some attributes describe code transformations (unrolling, vectorizing,
loop distribution, etc.). They can be a hint to the optimizer that a
transformation might be beneficial, an instruction to use a specific
option, or a specific request from the user (such as
``#pragma clang loop`` or ``#pragma omp simd``).

If a transformation is forced but cannot be carried out for any reason,
an optimization-missed warning must be emitted. Semantic information
such as a transformation being safe (e.g.
``llvm.mem.parallel_loop_access``) can be unused by the optimizer
without generating a warning.

Unless explicitly disabled, any optimization pass may heuristically
determine whether a transformation is beneficial and apply it. If
metadata for another transformation was specified, applying a different
transformation before it might have unintended effects: the metadata may
end up being applied to a different loop, or the loop may no longer
exist. To avoid having to explicitly disable an unknown number of
passes, the attribute ``llvm.loop.disable_nonforced`` disables all
optional, high-level, restructuring transformations.

The following example prevents the loop from being altered (for
instance, unrolled) before it is vectorized.

.. code-block:: llvm

    br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
    ...
    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    !2 = !{!"llvm.loop.disable_nonforced"}


After a transformation is applied, follow-up attributes are set on the
transformed and/or new loop(s). This allows additional attributes,
including followup-transformations, to be specified. Specifying multiple
transformations in the same metadata node is possible for compatibility
reasons, but their execution order is undefined. For instance, when
``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are
specified at the same time, unrolling may occur either before or after
vectorization.

As an example, the following instructs a loop to be vectorized and only
then unrolled.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2, !3}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    !2 = !{!"llvm.loop.disable_nonforced"}
    !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{!"llvm.loop.unroll.enable"}}


If, and only if, no followup is specified, the pass may add attributes itself.
For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and
all attributes from the original loop excluding its loop vectorizer
attributes. To avoid this, an empty followup attribute can be used, e.g.

.. code-block:: llvm

    !3 = !{!"llvm.loop.vectorize.followup_vectorized"}

The followup attributes of a transformation that cannot be applied will
never be added to a loop and are therefore effectively ignored. This means
that any followup-transformation in such attributes requires that its
prior transformations are applied before the followup-transformation.
The user should receive a warning about the first transformation in the
transformation chain that could not be applied if it is a forced
transformation. All following transformations are skipped.


Pass-Specific Transformation Metadata
=====================================

Transformation options are specific to each transformation. In the
following, we present the model for each LLVM loop optimization pass and
the metadata to influence them.


Loop Vectorization and Interleaving
-----------------------------------

Loop vectorization and interleaving are interpreted as a single
transformation. It is interpreted as forced if
``!{!"llvm.loop.vectorize.enable", i1 true}`` is set.

Assuming the pre-vectorization loop is

.. code-block:: c

    for (int i = 0; i < n; i+=1) // original loop
      Stmt(i);

then the code after vectorization will be approximately (assuming an
SIMD width of 4):

.. code-block:: c

    int i = 0;
    if (rtc) {
      for (; i + 3 < n; i+=4) // vectorized/interleaved loop
        Stmt(i:i+3);
    }
    for (; i < n; i+=1) // epilogue loop
      Stmt(i);

where ``rtc`` is a generated runtime check.

``llvm.loop.vectorize.followup_vectorized`` will set the attributes for
the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is
combined with the original loop's attributes to avoid it being
vectorized multiple times.

``llvm.loop.vectorize.followup_epilogue`` will set the attributes for
the remainder loop. If not specified, it will have the original loop's
attributes combined with ``llvm.loop.isvectorized`` and
``llvm.loop.unroll.runtime.disable`` (unless the original loop already
has unroll metadata).

The attributes specified by ``llvm.loop.vectorize.followup_all`` are
added to both loops.

When using a follow-up attribute, it replaces any automatically deduced
attributes for the generated loop in question. Therefore it is
recommended to add ``llvm.loop.isvectorized`` to
``llvm.loop.vectorize.followup_all``, which prevents the loop
vectorizer from trying to optimize the loops again.

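As a minimal sketch of this recommendation, a forced vectorization that
marks both generated loops as already vectorized could look as follows.
The node numbering is arbitrary, and the ``i32 1`` operand on
``llvm.loop.isvectorized`` is an assumption based on the form the
vectorizer itself typically emits.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
    ; Attributes listed here are added to the vectorized loop and the epilogue loop.
    !2 = !{!"llvm.loop.vectorize.followup_all", !{!"llvm.loop.isvectorized", i32 1}}

Because a follow-up attribute replaces the automatically deduced
attributes, any other attribute that should survive on the generated
loops has to be listed in the follow-up as well.
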

Loop Unrolling
--------------

Unrolling is interpreted as forced if any ``!{!"llvm.loop.unroll.enable"}``
metadata or option (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``)
is present. Unrolling can be full unrolling, partial unrolling of a loop
with constant trip count or runtime unrolling of a loop with a trip
count unknown at compile-time.

If the loop has been unrolled fully, there is no followup-loop. For
partial/runtime unrolling, the original loop of

.. code-block:: c

    for (int i = 0; i < n; i+=1) // original loop
      Stmt(i);

is transformed into (using an unroll factor of 4):

.. code-block:: c

    int i = 0;
    for (; i + 3 < n; i+=4) { // unrolled loop
      Stmt(i);
      Stmt(i+1);
      Stmt(i+2);
      Stmt(i+3);
    }
    for (; i < n; i+=1) // remainder loop
      Stmt(i);


``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of
the unrolled loop. If not specified, the attributes of the original loop
without the ``llvm.loop.unroll.*`` attributes are copied and
``llvm.loop.unroll.disable`` is added to it.

``llvm.loop.unroll.followup_remainder`` defines the attributes of the
remainder loop. If not specified, the remainder loop will have no
attributes. The remainder loop might not be present due to being fully
unrolled, in which case this attribute has no effect.

Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the
unrolled and remainder loops.

To prevent the partially unrolled loop from being unrolled again, it is
recommended to add ``llvm.loop.unroll.disable`` to
``llvm.loop.unroll.followup_all``. If no follow-up attribute is specified
for a generated loop, it is added automatically.

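The recommendation above can be sketched as follows. This is an
illustrative fragment only: the unroll factor of 4 and the node
numbering are arbitrary choices, not values prescribed by LLVM.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2}
    ; Request unrolling by a factor of 4 (an option, so unrolling counts as forced).
    !1 = !{!"llvm.loop.unroll.count", i32 4}
    ; Added to the unrolled loop and, if present, to the remainder loop.
    !2 = !{!"llvm.loop.unroll.followup_all", !{!"llvm.loop.unroll.disable"}}
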

Unroll-And-Jam
--------------

Unroll-and-jam uses the following transformation model (here with an
unroll factor of 2). Currently, it does not support a fallback version
when the transformation is unsafe.

.. code-block:: c

    for (int i = 0; i < n; i+=1) { // original outer loop
      Fore(i);
      for (int j = 0; j < m; j+=1) // original inner loop
        SubLoop(i, j);
      Aft(i);
    }

.. code-block:: c

    int i = 0;
    for (; i + 1 < n; i+=2) { // unrolled outer loop
      Fore(i);
      Fore(i+1);
      for (int j = 0; j < m; j+=1) { // unrolled inner loop
        SubLoop(i, j);
        SubLoop(i+1, j);
      }
      Aft(i);
      Aft(i+1);
    }
    for (; i < n; i+=1) { // remainder outer loop
      Fore(i);
      for (int j = 0; j < m; j+=1) // remainder inner loop
        SubLoop(i, j);
      Aft(i);
    }


``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes
of the unrolled outer loop. If not specified, the attributes of the
original outer loop without the ``llvm.loop.unroll.*`` attributes are
copied and ``llvm.loop.unroll.disable`` is added to it.

``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes
of the unrolled inner loop. If not specified, the attributes of the
original inner loop are used unchanged.

``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop
attributes of the outer remainder loop. If not specified, it will not
have any attributes. The remainder loop might not be present due to
being fully unrolled.

``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop
attributes of the inner remainder loop. If not specified, it will have
the attributes of the original inner loop. If the outer remainder loop
is unrolled, the inner remainder loop might be present multiple times.

Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are
added to all of the aforementioned output loops.

To prevent the unrolled loop from being unrolled again, it is
recommended to add ``llvm.loop.unroll.disable`` to
``llvm.loop.unroll_and_jam.followup_all``. It suppresses unroll-and-jam
as well as an additional inner loop unrolling. If no follow-up
attribute is specified for a generated loop, it is added automatically.

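A hedged sketch of the corresponding loop metadata for the outer loop
is shown below. The factor of 2 matches the model above, the node
numbering is arbitrary, and ``llvm.loop.unroll_and_jam.count`` is used
here on the assumption that a count operand is how the factor is
requested.

.. code-block:: llvm

    ; Attached to the outer loop through !llvm.loop.
    !0 = distinct !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.count", i32 2}
    ; Added to every output loop: unrolled outer/inner and both remainder loops.
    !2 = !{!"llvm.loop.unroll_and_jam.followup_all", !{!"llvm.loop.unroll.disable"}}
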

Loop Distribution
-----------------

The LoopDistribution pass tries to separate vectorizable parts of a loop
from the non-vectorizable part (which otherwise would make the entire
loop non-vectorizable). Conceptually, it transforms a loop such as

.. code-block:: c

    for (int i = 1; i < n; i+=1) { // original loop
      A[i] = i;
      B[i] = 2 + B[i];
      C[i] = 3 + C[i - 1];
    }

into the following code:

.. code-block:: c

    if (rtc) {
      for (int i = 1; i < n; i+=1) // coincident loop
        A[i] = i;
      for (int i = 1; i < n; i+=1) // coincident loop
        B[i] = 2 + B[i];
      for (int i = 1; i < n; i+=1) // sequential loop
        C[i] = 3 + C[i - 1];
    } else {
      for (int i = 1; i < n; i+=1) { // fallback loop
        A[i] = i;
        B[i] = 2 + B[i];
        C[i] = 3 + C[i - 1];
      }
    }

where ``rtc`` is a generated runtime check.

``llvm.loop.distribute.followup_coincident`` sets the loop attributes of
all loops without loop-carried dependencies (i.e. vectorizable loops).
There might be more than one such loop. If not defined, the loops will
inherit the original loop's attributes.

``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the
loop with potentially unsafe dependencies. There should be at most one
such loop. If not defined, the loop will inherit the original loop's
attributes.

``llvm.loop.distribute.followup_fallback`` defines the loop attributes
for the fallback loop, which is a copy of the original loop for when
loop versioning is required. If undefined, the fallback loop inherits
all attributes from the original loop.

Attributes defined in ``llvm.loop.distribute.followup_all`` are added to
all of the aforementioned output loops.

It is recommended to add ``llvm.loop.disable_nonforced`` to
``llvm.loop.distribute.followup_fallback``. This prevents the
fallback version (which is likely never executed) from being further
optimized, which would increase the code size.

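As an illustrative sketch (node numbering is arbitrary), the following
metadata forces distribution, requests vectorization of the coincident
loops as a followup-transformation, and applies the recommendation
above to the fallback loop.

.. code-block:: llvm

    !0 = distinct !{!0, !1, !2, !3}
    !1 = !{!"llvm.loop.distribute.enable", i1 true}
    ; Vectorize the loops without loop-carried dependencies after distribution.
    !2 = !{!"llvm.loop.distribute.followup_coincident", !{!"llvm.loop.vectorize.enable", i1 true}}
    ; Keep the rarely executed fallback copy from being optimized further.
    !3 = !{!"llvm.loop.distribute.followup_fallback", !{!"llvm.loop.disable_nonforced"}}

This chain relies on the loop vectorizer running after loop
distribution in the pass pipeline.
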

Versioning LICM
---------------

The pass hoists code out of loops that is only loop-invariant when
dynamic conditions apply. For instance, it transforms the loop

.. code-block:: c

    for (int i = 0; i < n; i+=1) // original loop
      A[i] = B[0];

into:

.. code-block:: c

    if (rtc) {
      auto b = B[0];
      for (int i = 0; i < n; i+=1) // versioned loop
        A[i] = b;
    } else {
      for (int i = 0; i < n; i+=1) // unversioned loop
        A[i] = B[0];
    }

The runtime condition (``rtc``) checks that the array ``A`` and the
element ``B[0]`` do not alias.

Currently, this transformation does not support followup-attributes.

Loop Interchange
----------------

Currently, the ``LoopInterchange`` pass does not use any metadata.


Ambiguous Transformation Order
==============================

If there are multiple transformations defined, the order in which they are
executed depends on the order in LLVM's pass pipeline, which is subject
to change. The default optimization pipeline (anything higher than
``-O0``) has the following order.

When using the legacy pass manager:

- LoopInterchange (if enabled)
- SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
- VersioningLICM (if enabled)
- LoopDistribute
- LoopVectorizer
- LoopUnrollAndJam (if enabled)
- LoopUnroll (partial and runtime unrolling)

When using the legacy pass manager with LTO:

- LoopInterchange (if enabled)
- SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
- LoopVectorizer
- LoopUnroll (partial and runtime unrolling)

When using the new pass manager:

- SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
- LoopDistribute
- LoopVectorizer
- LoopUnrollAndJam (if enabled)
- LoopUnroll (partial and runtime unrolling)


Leftover Transformations
========================

Forced transformations that have not been applied after the last
transformation pass should be reported to the user. The transformation
passes themselves cannot be responsible for this reporting because they
might not be in the pipeline, there might be multiple passes able to
apply a transformation (e.g. ``LoopInterchange`` and Polly) or a
transformation attribute may be 'hidden' inside another pass's followup
attribute.

The pass ``-transform-warning`` (``WarnMissedTransformationsPass``)
emits such warnings. It should be placed after the last transformation
pass.

The current pass pipeline has a fixed order in which transformation
passes are executed. A transformation can be specified in the followup
of a pass that is executed later than the pass implementing it, and is
thus left over. For instance, a loop nest cannot be distributed and then
interchanged with the current pass pipeline. The loop distribution will
execute, but there is no loop interchange pass following it, so any loop
interchange metadata will be ignored. The ``-transform-warning`` pass
should emit a warning in this case.

Future versions of LLVM may fix this by executing transformations using
a dynamic ordering.