This is based on @frasercrmck's D107290. At least some of the clang
portion of D107290 has already been committed.
This uses vscale_range for min/max vector width unless the command
line overrides are used.
As a follow up, I plan to add a max or exact VLEN option to clang
to control the vscale_range. This will eliminate many of the reasons
for users to use the overrides through the -mllvm interface.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D139873
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.
Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.
The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961
Compile-time improvements:
NewPM-O3: -1.04%
NewPM-ReleaseThinLTO: -0.59%
NewPM-ReleaseLTO-g: -0.97%
https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:uFixes#40306.
Reviewed By: lebedev.ri, nikic
Differential Revision: https://reviews.llvm.org/D115261
The validation of vplans could fail if an inloop reduction was created
with a block-in mask that did not dominate the reduction. This makes
sure that the insert point is set when creating the mask, to ensure it
dominates the reduction.
Differential Revision: https://reviews.llvm.org/D141003
This reverts commit aa2414729e.
Previously-valid IR from a tensorflow test case (as shown on the
Diffusion revision for aa2414729e) started
hanging in the loop-vectorize pass. Reverting to keep everyone working.
Check lines for some of these tests were regenerated. The difference
is that with opaque pointers SCEVExpander always emits i8 GEPs,
making the address calculation explicit. This is a known problem
that will be solved long term by making all address calculations
explicit.
Even if the the sink candidate is already in the target block, its
operands can be candidates for sinking. Queue them up as well. Also
moves the queuing logic to a helper.
Merging regions can enable new sinking opportunities (e.g. if users of a
scalar value are moved from different VPBBs into the same VPBB). Sinking
in turn can also enable new merging opportunities (e.g. if a recipe
between to merge-able regions is moved.
To enable more sinking opportunities, repeat sinking & merging if
regions could be merged.
Also fix mergeReplicateRegions to return the correct Changed status.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D139788
Add and run VPlan transform to fold blocks with a single predecessor
into the predecessor. This remove redundant blocks and addresses a TODO
to replace special handling for the vector latch VPBB.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D139927
Just comparing constant trip counts causes LV to miss cases where the
vector loop body only executes once.
The motivation for this is to remove the need for unrolling to remove
vector loop back-edges, if the body only executes once in more cases.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133017
This sets the stage for D133017 by moving out the code that performs
VPlan based simplifications to a separate transform that takes the
chosen VF & UF as arguments.
The main advantage is that this transform runs before any changes to
the CFG are being made. This allows using SCEV without worrying about
making queries while the IR is in an incomplete state.
Note that this patch switches the reasoning to use SCEV, but still only
simplifies loops with constant trip counts. Using SCEV here is needed to
access the backedge taken count, because the trip count IR value has not
been created yet.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D135017
This mirrors a similar shufflevector transformation so the same
effect is obtained for scalable vectors. The transformation is
only performed when it can be proven the number of resulting
reversals is not increased. By bubbling the reversals from operand
to result this should typically be the case and ideally leads to
back-back shuffles that can be elimitated entirely.
Differential Revision: https://reviews.llvm.org/D139342
This mirrors a similar shufflevector transformation so the same
effect is obtained for scalable vectors. The transformation is
only performed when it can be proven the number of resulting
reversals is not increased. By bubbling the reversals from operand
to result this should typically be the case and ideally leads to
back-back shuffles that can be elimitated entirely.
Differential Revision: https://reviews.llvm.org/D139340
Code generation now uses the start VPValue of induction recipes.
This makes it possible to adjust the start value of the epilogue
vector loop to use the 'resume' value of the main vector loop.
Fixes#59459.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D92132
During scalar promotion, if there are additional potentially-aliasing
loads outside the promoted set, we can still perform a load-only
promotion. As the stores are retained, any potentially-aliasing
loads will still read the correct value.
This increases the number of load promotions in llvm-test-suite by
a factor of two:
| Old | New
licm.NumPromotionCandidates | 4448 | 6038
licm.NumLoadPromoted | 479 | 1069
licm.NumLoadStorePromoted | 1459 | 1459
Unfortunately, this does have some impact on compile-time:
http://llvm-compile-time-tracker.com/compare.php?from=57f7f0d6cf0706a88e1ecb74f3d3e8891cceabfa&to=72b811738148aab399966a0435f13b695da1c1c8&stat=instructions
In part this is because we now have less early bailouts from
promotion, but also due to second order effects (e.g. for one case
I looked at we spend more time in SLP now).
Differential Revision: https://reviews.llvm.org/D133192
In scalar plans, replicate recipes will only generate a single value per
UF, independent of whether they are uniform or not. So don't consider
uniformity for plans with scalar VFs only.
This allows us to handle a few additional cases in VPlan sinking instead
of non-VPlan sinkScalarOperands.
Depends on D133762.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D134218
The vectorizer has code to reject scalable vectorization of loops with very short trip counts, and instead use fixed length vectors. The current code doesn't account for the minimum vscale value known, and thus under estimates the number of lanes in the scalable type for RISCV's default configuration. This results in use of predication and a trivially dead loop where a single straight line piece of code would suffice.
Note that the code quality of the original scalable vectorization could (and probably should) be improved other ways as well. This patch is solely about whether the scalable vectorization was the right choice to begin with.
This bit of code - both with and without my change - does make the unchecked assumption that the target knows how to lower fixed length vectors whose length is provably less than the vector length.
Differential Revision: https://reviews.llvm.org/D137285
This reverts commit e71b81cab0.
As discussed in the planned follow-on to this patch (D138874),
this and the subsequent patches in this set can cause trouble for
the backend, and there's probably no quick fix. We may even
want to canonicalize in the opposite direction (towards insertelt).
A few more that I missed in commit 3528e63d89.
There could be more duplicates remaining, since I've only focused
on exactly duplicated "RUN: opt" lines (ignoring multi line RUN
lines ending with '\').
This patch extends VP-based sinking to also sink VPScalarStepsRecipe.
This takes us a step closer towards retiring the IR based sinking.
The main change is extending VPScalarIVStepsRecipe::execute to support
executing in a replicate-region.
Depends on D133758.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133760
This reuses the routine implemented in 0e6f0b7 to implement several existing TODOs. Many of the operations scale linearly with LMUL; this change represents that in the cost model.
Differential Revision: https://reviews.llvm.org/D139039
The first attempt was reverted because a clang test changed
unexpectedly - the file is already marked with a FIXME, so
I just updated it this time to pass.
Original commit message:
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue
The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.
In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask
Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC
The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.
Differential Revision: https://reviews.llvm.org/D138872
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue
The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.
In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask
Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC
The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.
Differential Revision: https://reviews.llvm.org/D138872
This reverts commit bf15f1e489.
The updated version fixes a crash by checking the induction kind instead
of the opcode; for integer inductions, the step is always added, but the
opcode might not be set.
In revision B.q and before of the Armv8-M architecture reference
manual, the vector/scalar forms of the `vmla` and `vmlas` instructions
came in signed and unsigned integer forms, such as `vmla.s8 q0,q1,r2`
or `vmlas.u32 q3,q4,r5`.
Revision B.r has changed this. There are no longer signed and unsigned
versions of these instructions, since they were functionally identical
anyway. Now there is just `vmla.i8` (or `i16` or `i32`, and similarly
for `vmlas`). Bit 28 of the instruction encoding, which was previously
0 for signed or 1 for unsigned, is now expected to be 0 always.
This change updates LLVM to the new version of the architecture. The
obsoleted encodings for unsigned integers are now decoding errors, and
only the still-valid encoding is ever emitted. This shouldn't break
any existing assembly code, because the old signed and unsigned
versions of the mnemonic are still accepted by the assembler (which is
standard practice anyway for all signedness-agnostic MVE integer
instructions).
Reviewed By: dmgreen, lenary
Differential Revision: https://reviews.llvm.org/D138827
This patch implements getArithmeticInstrCost for RISCV, supports cost
model for integer and float vector arithmetic instructions.
Differential Revision: https://reviews.llvm.org/D133552 (Original patch by jacquesguan. Subset by me with todos added.)
This patch splits off the logic to transform the canonical IV to a
a value for an induction with a different start and step. This
transformation only needs to be done once (independent of VF/UF) and
enables sinking of VPScalarIVStepsRecipe as follow-up.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133758