VPReductionRecipe may be executed for scalar VFs. Make sure to access
part 0 of the condition, as it could be an active-lane-mask, which is a
vector <1 x i1>
Fixes https://github.com/llvm/llvm-project/issues/72720.
Reverse mask early on when populating BlockInMask. This will enable
separating mask management and address computation from the memory
recipes in the future and is also needed to enable explicit unrolling in
VPlan.
This an alternative to #69886. The basic problem is that SCEV can look
through trivial LCSSA phis. When the phi node later becomes non-trivial,
we do invalidate it, but this doesn't catch uses that are not covered by
the IR use-def walk, such as those in BECounts.
Fix this by adding a special invalidation method for LCSSA phis, which
will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the
LCSSA phi node and defined in the loop.
We should probably also use this invalidation method in other places
that add predecessors to exit blocks, such as loop unrolling and loop
peeling.
Fixes#69097.
Fixes#66616.
Fixes#63970.
Consistently add `branch_weights` metadata in any condition branch
created by `LoopVectorize.cpp`:
- Will only add metadata if the original loop-latch branch had metadata
assigned.
- Most checks should rarely trigger so I am using a 127:1 ratio.
- For the middle block we assume an equal distribution of modulo
results.
If there are function calls in the candidate loop and we have vectorized
variants available, try some wider VFs in case the conservative initial
maximum based on the widest types in the loop won't actually allow us
to make use of those function variants.
This patch moves creating the middle VPBBs and an initial empty
vector loop region for the top-level loop to createInitialVPlan.
This consolidates code to create the initial VPlan skeleton and enables
adding other bits outside the main region during initial VPlan
construction. In particular, D150398 will add the exit check & branch to
the middle block.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D158333
VPReductionRecipe::execute was not handling predicates for ordered
reduction with scalar VFs, which was causing a crash. Thsi patch adds
dedicated handling for scalar VFs when dealing with the condition.
The other operands are already handled in a similar fashion below.
Fixes#70988.
Use KnownBits to infer the nneg flag on zext instructions.
Currently we only set nneg when converting sext -> zext, but don't set
it when we have a zext in the first place. If we want to use it in
optimizations, we should make sure the flag inference is consistent.
zext nneg was recently added to the IR in #67982. This patch teaches
demanded bits and known bits about the semantics of the instruction, and
adds a couple of test cases to illustrate basic functionality.
With the recent change 98c90a13 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), it is now possible for
SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint
and llvm.llrint, with vector codegen for the RISC-V target. Make a
trivial change to VectorUtils, and update the corresponding tests.
A couple of important fixes have been landed since the original patch
was landed and reverted, and it is now safe to re-land the patch:
5e1d81a (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and
fd887a3 (LegalizeVectorTypes: fix bug in widening of vec result in
xrint). See also #71399, which proves that lrint and llrint will indeed
produce vector codegen on RISC-V.
Fixes#55208.
Support recipes without underlying instruction in
collectPoisonGeneratingRecipes by directly trying to dyn_cast_or_null
the underlying value.
Fixes https://github.com/llvm/llvm-project/issues/70590.
There was a silly mistake in the expandBounds function that was using
the wrong type when calling expandCodeFor and always assuming the stride
is 64 bits. I've added the following test to defend this fix:
Transforms/LoopVectorize/ARM/mve-hoist-runtime-checks.ll
With the recent change 98c90a13 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), it is now possible for
SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint
and llvm.llrint, with vector codegen for the RISC-V target. Make a
trivial change to VectorUtils, and update the corresponding tests.
zext nneg was recently added to the IR in #67982. Teaching SCEVExpander
to emit nneg when possible is valuable since SCEV may have proved
non-trivial facts about loop bounds which would otherwise be lost when
materializing the value.
With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), it is now possible to vectorize
llvm.lrint and llvm.llrint with a trivial change to VectorUtils. In
preparation for this change, and the corresponding test update, add a
negative test for lrint and llrint.
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
* Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known
to be predicate tail-folded, delegating to `areRuntimeChecksProfitable`
to decide on the profitability of vectorizing loops with runtime checks.
* Update the `areRuntimeChecksProfitable` function to consider the
`ScalarEpilogueLowering` setting when assessing vectorization of a loop.
With this patch, we can make more informed decisions for loops with low
trip counts, especially when leveraging Profile-Guided Optimization
(PGO) data.
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.
Differential Revision: https://reviews.llvm.org/D141060
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:
* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
Add simplification for redundant trunc(zext A) pairs. Generally apply a
transform from D149903.
Depends on D159200.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D159202
This patch enables scalable vectors in the VPlan-native path.
If a vectorization factor is specified via loop vectorization hints,
that factor is used. If no vectorization factor is specified, but the
target preferes scalable vectorization, a scalable vectorization factor
is selected.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D157484
This reverts commit e4ea099748.
The recommit fixes a reported crash by adding a missing check to make
sure the cast recipes are only introduced when vectorizing.
Test coverage added in 3cac608fbd.
Original commit message:
Update the code to create Trunc/Ext recipes directly in
adjustRecipesForReductions instead of fixing it up later in
fixReductions.
This explicitly models the required conversions and also makes sure they
are generated at the right place (instead of after the exit condition),
hence the changes in a few tests.
Update the code to create Trunc/Ext recipes directly in
adjustRecipesForReductions instead of fixing it up later in
fixReductions.
This explicitly models the required conversions and also makes sure they
are generated at the right place (instead of after the exit condition),
hence the changes in a few tests.
The tests introduced by https://reviews.llvm.org/D134719 and later
modified in https://reviews.llvm.org/D146839 are not testing LV in
isolation. This patch:
1. Assures that all tests test LV in isolation.
2. Adds LV tests using llvm intrinsics that have libm mappings.
llrint, llround and lrint are not included as currently IR verifier pass
does not allow to use vector types with them.
The test reduction.ll was introduced before utils/update_test_checks.py,
and hence contains hand-written CHECK lines. Revisit the test today, and
modernize it by:
- Removing extranous attributes on functions and their arguments, as
LoopVectorize doesn't even look at these attributes.
- Removing the target datalayout, as it is not essential for
LoopVectorize.
Finally, regenerate the CHECK lines using update_test_checks.py,
eliminating hand-written error-prone CHECK lines.
This patch is based off of
https://github.com/llvm/llvm-project/pull/67543.
We are currently using the exact trip count to make decisions regarding
the maximum VF. We can instead use the upper bound TC, which will be the
same as the constant trip count when that is known.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449