Commit Graph

1566 Commits

Author SHA1 Message Date
Malhar Jajoo
b75bdff4a0 Trivial update for debug location in LIT test.
This just updates debug location of a loop in a LIT test to point
to the correct source line.
2022-01-27 19:07:47 +00:00
Congzhe Cao
f3e1f44340 [IVDescriptor] Get the exact FP instruction that does not allow reordering
This is a bugfix in IVDescriptor.cpp.

The helper function `RecurrenceDescriptor::getExactFPMathInst()`
is supposed to return the 1st FP instruction that does not allow
reordering. However, when constructing the RecurrenceDescriptor,
we trace the use-def chain staring from a PHI node and for each
instruction in the use-def chain, its descriptor overrides the
previous one. Therefore in the final RecurrenceDescriptor we
constructed, we lose previous FP instructions that does not allow
reordering.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D118073
2022-01-27 00:33:46 -05:00
Igor Kirillov
d3932c690d [LoopVectorize] Add tests with reductions that are stored in invariant address
This patch adds tests for functionality that is to be implemented in D110235.

Differential Revision: https://reviews.llvm.org/D117213
2022-01-24 21:26:38 +00:00
Florian Hahn
b2a8eff45c [LV] Make some tests more robust by adding missing users. 2022-01-24 13:04:09 +00:00
Florian Hahn
b7f69b8d46 [LV] Name values and blocks in same induction tests (NFC).
This reduces the churn in the test in future updates due to numbering
changes.
2022-01-24 12:28:43 +00:00
Kerry McLaughlin
8082ab2fc3 [LoopVectorize] Support epilogue vectorisation of loops with reductions
isCandidateForEpilogueVectorization will currently return false for loops
which contain reductions. This patch removes this restriction and makes
the following changes to support epilogue vectorisation with reductions:

- `fixReduction`: If fixReduction is being called during vectorisation of the
    epilogue, the phi node it creates will need to additionally carry incoming
     values from the middle block of the main loop.

- `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi
    created by fixReduction are updated after the vec.epilog.iter.check block
    is added. The phi is also moved to the preheader of the epilogue.

- `processLoop`: The start value of any VPReductionPHIRecipes are updated before
    vectorising the epilogue loop. The getResumeInstr function added to the ILV
    will return the resume instruction associated with the recurrence descriptor.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D116928
2022-01-24 12:03:31 +00:00
eopXD
3cf15af2da [RISCV] Remove experimental prefix from rvv-related extensions.
Extensions affected: +v, +zve*, +zvl*

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117860
2022-01-22 20:18:40 -08:00
Kerry McLaughlin
c740a07863 [LoopVectorize] Test in-loop reductions with tail folding for scalable vectors
Adds `-prefer-inloop-reductions` to the RUN line of sve-tail-folding.ll & adds
a new test where in-loop reductions cannot be used (`@cond_xor_reduction`). NFC.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D117578
2022-01-19 14:36:23 +00:00
David Sherwood
e781620dee [LoopVectorize][AArch64] Use get.active.lane.mask intrinsic when SVE is enabled
When SVE is enabled for AArch64 targets it makes more sense to use the
get.active.lane.mask intrinsic, because SVE has an exact 1-1 mapping
from the intrinsic to the 'whilelo' instruction for legal vector types.
This instruction neatly takes overflow into account as well. This patch
fixes an issue in VPInstruction::generateInstruction that assumed we are
only dealing with fixed-width vectors.

Differential Revision: https://reviews.llvm.org/D117109
2022-01-18 11:59:30 +00:00
Florian Hahn
524150fe07 [LV] Add test coverage for reductions with odd interleave counts.
Add test coverage for loops with reductions and odd (3, 5) interleave
counts.
2022-01-17 14:34:21 +00:00
Florian Hahn
4a6f475446 [LV] Make test more robust by adding users of inductions.
The modified tests didn't have actual users of all inductions, making it
trivial to eliminate them. Add users to make sure the inductions are
actually used in the vectorized version.
2022-01-17 13:28:59 +00:00
Kito Cheng
cc35161dc7 [RISCV] Add initial support for getRegUsageForType and getNumberOfRegisters
Those two TTI hooks are used during vectorization for calculating
register pressure, the default implementation isn't consider for LMUL,
and that's also definitly wrong value for register number (all register class
are 8 registers).

So in this patch we tried to:

1. Calculate right register usage for vector type and scalar type.
2. Return right number of register for general purpose register and
   vector register.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116890
2022-01-17 15:27:54 +08:00
Florian Hahn
070d1034da [LV] Restore metadata to disable runtime unrolling for epilogue loop.
After d4a8fc3a87 LV stopped adding metadata to disable runtime
unrolling to the vectorized epilogue loop. This was missed because
278aa65cc4 removed the relevant test coverage.

This patch fixes that by adding the relevant metadata after
vector loop generation.
2022-01-16 13:14:16 +00:00
Florian Hahn
ba3198cfd1 [IRBuilder] Migrate select-folding to value-based FoldSelect.
Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D117228
2022-01-15 11:26:44 +00:00
Florian Hahn
42b34facfd Recommit "[LV] Inline CreateSplatIV call for scalar VFs."
This reverts the revert commit 073c27b5e5.

A reduced test case has been added in 5e4966cbae and the code has
been updated to handle the case where getInductionOpcode returns
BinaryOpsEnd. In this case, the original code was always using
Instruction::Add. Do the same in the patch.

Note this commit may slightly change the value naming, because it now
also assigns the 'induction' name in the floating point case.
2022-01-14 19:03:49 +00:00
Florian Hahn
5e4966cbae [LV] Add test with an integer induction based on a ptr one.
Reduced test case from the reproducer mentioned in
073c27b5e5.
2022-01-14 15:56:47 +00:00
James Y Knight
073c27b5e5 Revert "[LV] Inline CreateSplatIV call for scalar VFs (NFC)."
Causes a crash with the following (creduce'd) test-case:

clang -O3 '--target=aarch64-grtev4-linux-gnu' -xc - -c -o /dev/null <<EOF
int *e;
int f;
int g() {
  int h;
  int *j = 0;
  while (&f - j > 0) {
    int k;
    k = j;
    if (e == j && *e)
      k = 5;
    h = k;
    j++;
  }
  return h;
}
EOF

This reverts commit 7ce48be0fd.
2022-01-14 00:00:02 +00:00
Florian Hahn
7b9f5cbfa7 [LV] Extend check lines for pr34681.ll to cover foldable select. 2022-01-13 16:42:47 +00:00
Florian Hahn
3f2fb767e3 [VPlan] Make IV operand explicit for VPWidenCanonicalIVRecipe (NFC).
This makes the def-use relationship between VPCanonicalIVPHIRecipe and
VPWidenCanonicalIVRecipe explicit. Needed for D117140.
2022-01-13 11:13:05 +00:00
Florian Hahn
7ce48be0fd [LV] Inline CreateSplatIV call for scalar VFs (NFC).
This is a NFC change split off from D116123, as suggested there.
D116123 will remove the last user of CreateSplatIV.
2022-01-13 09:34:31 +00:00
Florian Hahn
d4a8fc3a87 [VPlan] Introduce and use BranchOnCount VPInstruction.
This patch adds a new BranchOnCount VPInstruction opcode with 2
operands. It first compares its 2 operands (increment of canonical
induction and vector trip count), followed by a branch to either the
exit block or back to the vector header.

It must be the last recipe in the exit block of the topmost vector loop
region.

This extracts parts from D113224 and was discussed in D113223.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116479
2022-01-12 13:42:13 +00:00
Rosie Sumpter
552eb372cb [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Differential Revision: https://reviews.llvm.org/D115329
2022-01-12 13:34:12 +00:00
Florian Hahn
138fcc5f76 [IRBuilder] Migrate icmp-folding to value-based FoldICmp.
Depends on D116935.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D116969
2022-01-12 12:37:46 +00:00
Florian Hahn
7e68061305 [IRBuilder] Migrate add-folding to value-based FoldAdd.
Depends on D116935.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D116968
2022-01-12 09:24:46 +00:00
Florian Hahn
f0ef1ea6dd [IRBuilder] Introduce folder using inst-simplify, use for Or fold.
Alternative to D116817.

This introduces a new value-based folding interface for Or (FoldOr),
which takes 2 values and returns an existing Value or a constant if the
Or can be simplified. Otherwise nullptr is returned. This replaces the
more restrictive CreateOr which takes 2 constants.

This is the used to implement a folder that uses InstructionSimplify.
The logic to simplify `Or` instructions is moved there. Subsequent
patches are going to transition other CreateXXX to the more general
FoldXXX interface.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D116935
2022-01-11 17:30:48 +00:00
David Sherwood
b0922a9dcd [LoopVectorize] Make VPWidenCanonicalIVRecipe::execute work for scalable vectors
The code in VPWidenCanonicalIVRecipe::execute only worked for fixed-width
vectors due to the way we generate the values per lane. This patch changes
the code to use a combination of vector splats and step vectors to get
the same result. This then works for both fixed-width and scalable vectors.

Tests that exercise this code path for scalable vectors have been added here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding.ll

Differential Revision: https://reviews.llvm.org/D113180
2022-01-10 14:12:32 +00:00
Florian Hahn
aecad5828e [SCEVExpander] Only create trunc when needed.
9345ab3a45 updated generateOverflowCheck to skip creating checks that
always evaluate to false. This in turn means that we only need to
create TruncTripCount if it is actually used.

Sink the TruncTripCount creating into ComputeEndCheck, so it is only
created when there's an actual check.
2022-01-10 11:31:27 +00:00
David Sherwood
e3c84fb948 [LoopVectorize] Add support for tail folding using scalable vectors
This patch fixes up an issue with InnerLoopVectorizer::getOrCreateVectorTripCount
whereby we weren't correctly generating the runtime trip count
for scalable vectors when tail-folding.

It also removes some asserts in the tail-folding path for cases when
the VF is not scalable.

In this patch I have only permitted tail-folding to be enabled
explicitly for scalable vectors when the user has specified one
of the following flags:

  -prefer-predicate-over-epilogue=predicate-dont-vectorize
  -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue

For now it's best not to enable tail-folding with scalable vectors for
low trip counts or when optimising for code size, since there has been
no analysis on whether this is worth it.

Various tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
  Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll

The tests cannot be target independent because they require masked
load/store support, i.e. TTI.isLegalMaskedLoad and TTI.isLegalMaskedStore
need to return true.

Differential Revision: https://reviews.llvm.org/D113003
2022-01-10 10:55:40 +00:00
Florian Hahn
7f1bf68d7d [SCEVExpander] Only check overflow if it is needed.
9345ab3a45 updated generateOverflowCheck to skip creating checks that
always evaluate to false. This in turn means that we only need to check
for overflows if the result of the multiplication is actually used.

Sink the Or for the overflow check into ComputeEndCheck, so it is only
created when there's an actual check.
2022-01-09 12:55:41 +00:00
Florian Hahn
3b7b1a75b0 [LV] Improve check lines in existing tests.
Update the check lines in 2 existing tests to use patterns + variables
to match some IR to make them independent of value naming.
2022-01-08 20:46:31 +00:00
Florian Hahn
daa5e26312 [LV] Make tests more robust by removing undef.
Replace some uses of undef in the tests. The undef causes runtime checks
to be trivially fold/removeable, which does defeat the purpose of the tests.
2022-01-08 15:21:57 +00:00
Florian Hahn
9345ab3a45 [SCEVExpander] Skip creating <u 0 check, which is always false.
Unsigned compares of the form <u 0 are always false. Do not create such
a redundant check in generateOverflowCheck.

The patch introduces a new lambda to create the check, so we can
exit early conveniently and skip creating some instructions feeding the
check.

I am planning to sink a few additional instructions as follow-ups, but I
would prefer to do this separately, to keep the changes and diff
smaller.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D116811
2022-01-08 10:31:04 +00:00
Craig Topper
042394b69e [RISCV] Add a command line option to control the LMUL used by TTI's getRegisterBitWidth.
By default we return the width of an LMUL=1 register. We can enable
testing with larger LMUL values by returning a larger bit width.

This patch adds a RISCV specific option to provide a LMUL which will be
multiplied by the LMUL=1 bit width.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D116339
2022-01-07 20:02:10 -08:00
David Green
bc615e436c [AArch64] Update addo and subo costs
Similar to D116732, this adds basic scalar sadd_with_overflow,
uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for
aarch64, which are usually quite efficiently lowered.

Differential Revision: https://reviews.llvm.org/D116734
2022-01-07 16:20:23 +00:00
Florian Hahn
f395a4f8d5 [SCEVExpand] Only create required predicate checks.
Currently generateOverflowCheck always creates code for Step being
negative and positive, followed by a select at the end depending on
Step's sign.

This patch updates the code to only create either the checks for step
being positive or negative, if the sign is known.

Follow-up to D116696.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D116747
2022-01-07 14:49:02 +00:00
Florian Hahn
86d113a8b8 [SCEVExpand] Do not create redundant 'or false' for pred expansion.
This patch updates SCEVExpander::expandUnionPredicate to not create
redundant 'or false, x' instructions. While those are trivially
foldable, they can be easily avoided and hinder code that checks the
size/cost of the generated checks before further folds.

I am planning on look into a few other similar improvements to code
generated by SCEVExpander.

I remember a while ago @lebedev.ri working on doing some trivial folds
like that in IRBuilder itself, but there where concerns that such
changes may subtly break existing code.

Reviewed By: reames, lebedev.ri

Differential Revision: https://reviews.llvm.org/D116696
2022-01-06 11:52:19 +00:00
Sander de Smalen
95a93722db [LV] Remove what seems like stale code in collectElementTypesForWidening.
This was originally added in rG22174f5d5af1eb15b376c6d49e7925cbb7cca6be
although that patch doesn't really mention any reasons for ignoring the
pointer type in this calculation if the memory access isn't consecutive.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D115356
2022-01-05 12:20:59 +00:00
Florian Hahn
65c4d6191f [VPlan] Add VPCanonicalIVPHIRecipe, partly retire createInductionVariable.
At the moment, the primary induction variable for the vector loop is
created as part of the skeleton creation. This is tied to creating the
vector loop latch outside of VPlan. This prevents from modeling the
*whole* vector loop in VPlan, which in turn is required to model
preheader and exit blocks in VPlan as well.

This patch introduces a new recipe VPCanonicalIVPHIRecipe to represent the
primary IV in VPlan and CanonicalIVIncrement{NUW} opcodes for
VPInstruction to model the increment.

This allows us to partly retire createInductionVariable. At the moment,
a bit of patching up is done after executing all blocks in the plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D113223
2022-01-05 10:46:06 +00:00
Rosie Sumpter
961f51fdf0 [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions without loads/stores
For loops that contain in-loop reductions but no loads or stores, large
VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes
has no element types to check through and so returns the default widths
(-1U for the smallest and 8 for the widest). This results in the widest
VF being chosen for the following example,

float s = 0;
for (int i = 0; i < N; ++i)
  s += (float) i*i;

which, for more computationally intensive loops, leads to large loop
sizes when the operations end up being scalarized.

In this patch, for the case where ElementTypesInLoop is empty, the widest
type is determined by finding the smallest type used by recurrences in
the loop instead of falling back to a default value of 8 bits. This
results in the cost model choosing a more sensible VF for loops like
the one above.

Differential Revision: https://reviews.llvm.org/D113973
2022-01-04 10:12:57 +00:00
Florian Hahn
b1a333f0fe [VPlan] Don't consider VPWidenCanonicalIVRecipe phi-like.
VPWidenCanonicalIVRecipe does not create PHI instructions, so it does
not need to be placed in the phi section of a VPBasicBlock.

Also tidies the code so the WidenCanonicalIV recipe and the
compare/lane-masks are created in the header.

Discussed D113223.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116473
2022-01-02 12:48:17 +00:00
Sanjay Patel
0c6979b2d6 [InstCombine] fold opposite shifts around an add
((X << C) + Y) >>u C --> (X + (Y >>u C)) & (-1 >>u C)

https://alive2.llvm.org/ce/z/DY9DPg

This replaces a shift with an 'and', and in the case
where the add has a constant operand, it eliminates
both shifts.

As noted in the TODO comment, we already have this fold when
the shifts are in the opposite order (and that code handles
bitwise logic ops too).

Fixes #52851
2021-12-30 12:01:06 -05:00
Sanjay Patel
fd9cd3408b Revert "[InstCombine] fold opposite shifts around an add"
This reverts commit 2e3e0a5c28.
Some unintended diffs snuck into this patch.
2021-12-30 11:54:55 -05:00
Sanjay Patel
2e3e0a5c28 [InstCombine] fold opposite shifts around an add
((X << C) + Y) >>u C --> (X + (Y >>u C)) & (-1 >>u C)

https://alive2.llvm.org/ce/z/DY9DPg

This replaces a shift with an 'and', and in the case
where the add has a constant operand, it eliminates
both shifts.

As noted in the TODO comment, we already have this fold when
the shifts are in the opposite order (and that code handles
bitwise logic ops too).

Fixes #52851
2021-12-30 11:52:29 -05:00
Craig Topper
a9486a40f7 [RISCV] Disable interleaving scalar loops in the loop vectorizer.
The loop vectorizer can interleave scalar loops even if it doesn't
vectorize them. I don't believe we intended to enable this when
we enabled interleaving for vector instructions.

Disable interleaving for VF=1 like X86 and AMDGPU already do. Test
lifted from AMDGPU.

Differential Revision: https://reviews.llvm.org/D115975
2021-12-23 08:37:24 -06:00
Florian Hahn
ede7c2438f [VPlan] Create header & latch blocks for skeleton up front (NFC).
By creating the header and latch blocks up front and adding blocks and
recipes in between those 2 blocks we ensure that the entry and exits of
the plan remain valid throughout construction.

In order to avoid test changes and keep printing of the plans the same,
we use the new header block instead of creating a new block on the first
iteration of the loop traversing the original loop.

We also fold the latch into its predecessor.

This is a follow up to a post-commit suggestion in D114586.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115793
2021-12-22 12:44:25 +00:00
Sander de Smalen
290ae657a6 Fix buildbot failure caused by D115651
I somehow missed updating the RUN line of this test.
2021-12-20 17:18:59 +00:00
Sander de Smalen
b1ff20fd35 [LV] Enable scalable vectorization by default for SVE cores.
The availability of SVE should be sufficient to enable scalable
auto-vectorization.

This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.

Differential Revision: https://reviews.llvm.org/D115651
2021-12-20 16:23:29 +00:00
Florian Hahn
5b362e4c7f [VPlan] Add Debugloc to VPInstruction.
Upcoming changes require attaching debug locations to VPInstructions,
e.g. adding induction increment recipes in D113223.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115123
2021-12-20 15:10:41 +00:00
Philip Reames
e6ad9ef4e7 [instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.

I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.

We chose the canonical type as i64 arbitrarily.  We might consider changing this to something else in the future if we have cause.

Differential Revision: https://reviews.llvm.org/D115387
2021-12-13 16:56:22 -08:00
Philip Reames
eb052f6b8f Reapply: Autogen more vectorizer tests in advance of D115387.
Drop changes to consecutive-ptr-uniforms.ll since that test checks boths IR output and debug messages.  I'd missed this in the original commit, and Florian pointed it out in post-commit review.

Original commit message:

These are the ones my first round of scripting couldn't handle that required a bit of manual messaging.  This should be the last batch in llvm-check.

This reverts commit bbba86764a.
2021-12-13 15:49:14 -08:00