Commit Graph

1401 Commits

Author SHA1 Message Date
Florian Hahn
ec1f6f7e3f Revert "[LAA] Support pointer phis in loop by analyzing each incoming pointer."
This reverts commit 1ed7f8ede5.

This change can cause loop-distribute to crash in some cases. Revert
until I have more time to wrap up a fix.

See  PR50296, PR5028 and D102266.
2021-05-28 10:33:52 +01:00
Florian Hahn
38641ddf3e [VPlan] Do not sink uniform recipes in sinkScalarOperands.
For uniform ReplicateRecipes, only the first lane should be used, so
sinking them would mean we have to compute the value of the first lane
multiple times. Also, at the moment, sinking them causes a crash because
the value of the first lane is re-used by all users.

Reported post-commit for D100258.
2021-05-27 14:07:48 +01:00
Kerry McLaughlin
9f76a85260 [LoopVectorize] Enable strict reductions when allowReordering() returns false
When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:

  bool allowReordering() const {
    // When enabling loop hints are provided we allow the vectorizer to change
    // the order of operations that is given by the scalar loop. This is not
    // enabled by default because can be unsafe or inefficient.

The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.

This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101836
2021-05-26 13:59:12 +01:00
Kerry McLaughlin
6b0fe3c63b [NFC] Add CHECK lines for unordered FP reductions
An additional RUN line has been added to both strict-fadd.ll &
scalable-strict-fadd.ll to ensure the correct behaviour of these
tests where `-enable-strict-reductions` is false.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D103015
2021-05-26 11:00:20 +01:00
serge-sans-paille
4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee0.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille
bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Florian Hahn
65d3dd7c88 [VPlan] Add first VPlan version of sinkScalarOperands.
This patch adds a first VPlan-based implementation of sinking of scalar
operands.

The current version traverse a VPlan once and processes all operands of
a predicated REPLICATE recipe. If one of those operands can be sunk,
it is moved to the block containing the predicated REPLICATE recipe.
Continue with processing the operands of the sunk recipe.

The initial version does not re-process candidates after other recipes
have been sunk. It also cannot partially sink induction increments at
the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the
induction is used for example in a GEP, only the first lane is used and
in the lowered IR the adds for the other lanes can be sunk into the
predicated blocks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100258
2021-05-24 15:29:58 +01:00
Sander de Smalen
1e6630311c NFC: cleaned up and renamed scalable-vf-analysis.ll -> scalable-vectorization.ll
* Removes unnecessary loop hints.
* Use RUN line with '-scalable-vectorization=preferred' instead of 'on'
  for the maximize-bandwidth behaviour. This prepares the test for enabling
  scalable vectorization; With a forced instruction-cost of 1, 'on' will
  always favour fixed-width VF to be chosen, whereas with 'preferred'
  we can check that the maximize-bandwidth option in combination with
  scalable-vectorization=preferred actually picks a scalable VF.
* Renamed to scalable-vectorization.ll, because a follow-up patch will
  test more than just analysis.
2021-05-23 19:53:51 +01:00
Simon Pilgrim
2fca555866 [CostModel][X86] Improve fneg costs
These are always lowered as xor ops, so are always cheap
2021-05-21 17:23:45 +01:00
Luke
1595994b28 [RISCV] Add legality check for vectorizing reduction
Check if it is legal to vectorize reduction.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99509
2021-05-20 17:45:46 +08:00
Sander de Smalen
4f86aa650c [LV] Add -scalable-vectorization=<option> flag.
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.

Initially, this suggests three levels to control scalable
vectorization, although other more aggressive options can be added in
the future.

The possible options are:
- Disabled:   Disables vectorization with scalable vectors.
- Enabled:    Vectorize loops using scalable vectors or fixed-width
              vectors, but favors fixed-width vectors when the cost
              is a tie.
- Preferred:  Like 'Enabled', but favoring scalable vectors when the
              cost-model is inconclusive.

Reviewed By: paulwalker-arm, vkmr

Differential Revision: https://reviews.llvm.org/D101945
2021-05-19 10:40:56 +01:00
Florian Hahn
fff84d3a2e [LV] Add test which sinks a load a across an aliasing store. 2021-05-18 12:25:57 +01:00
Sander de Smalen
81fdc73e5d [LV] Return both fixed and scalable Max VF from computeMaxVF.
This patch introduces a new class, MaxVFCandidates, that holds the
maximum vectorization factors that have been computed for both scalable
and fixed-width vectors.

This patch is intended to be NFC for fixed-width vectors, although
considering a scalable max VF (which is disabled by default) pessimises
tail-loop elimination, since it can no longer determine if any chosen VF
(less than fixed/scalable MaxVFs) is guaranteed to handle all vector
iterations if the trip-count is known. This issue will be addressed in
a future patch.

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D98721
2021-05-18 08:03:48 +01:00
Philip Reames
ed9d70781b Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)"
This reverts commit 6d3e3ae8a9.

Still seeing PPC build bot failures, and one arm self host bot failing.  I'm officially stumped, and need help from a bot owner to reduce.
2021-05-17 20:53:28 -07:00
Philip Reames
6d3e3ae8a9 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)
Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

Previous commit message...

This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:59:25 -07:00
Philip Reames
d16da7343d Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"
This reverts commit c23ce54b36.  I apparently missed some newly added non-x86 tests.
2021-05-17 16:49:32 -07:00
Philip Reames
c23ce54b36 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute
This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:33:56 -07:00
Florian Hahn
68d52f0dbe [LV] Add another more complex first-order recurrence sinking test. 2021-05-14 21:26:34 +01:00
Florian Hahn
c62f984814 [LV] Add a few more complex first-order recurrence tests. 2021-05-14 17:27:17 +01:00
Florian Hahn
bdada7546e [VPlan] Adjust assert in splitBlock to allow splitting at end.
SplitAt should only be dereferenced in the assert if it does not point
to the end of the block. This fixes a crash in the added test case.
2021-05-13 13:36:35 +01:00
David Sherwood
b7a11274f9 [LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors
In InnerLoopVectorizer::widenPHIInstruction there are cases where we have
to scalarise a pointer induction variable after vectorisation. For scalable
vectors we already deal with the case where the pointer induction variable
is uniform, but we currently crash if not uniform. For fixed width vectors
we calculate every lane of the scalarised pointer induction variable for a
given VF, however this cannot work for scalable vectors. In this case I
have added support for caching the whole vector value for each unrolled
part so that we can always extract an arbitrary element. Additionally, we
still continue to cache the known minimum number of lanes too in order
to improve code quality by avoiding an extractelement operation.

I have adapted an existing test `pointer_iv_mixed` from the file:

  Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

and added it here for scalable vectors instead:

  Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D101294
2021-05-12 11:02:11 +01:00
Florian Hahn
faebc6bf10 [VPlan] Register recipe for instr if the simplified value is recipe.
If the simplified VPValue is a recipe, we need to register it for Instr,
in case it needs to be recorded. The way this is handled in general may
change soon, following some post-commit comments.

This fixes PR50298.
2021-05-11 14:32:34 +01:00
Florian Hahn
93a9a8a8d9 [VecLib] Add support for vector fns from Darwin's libsystem.
This patch adds support for Darwin's libsystem math vector functions to
TLI. Darwin's libsystem provides a range of vector functions for libm
functions.

This initial patch only adds the 2 x double and 4 x float versions,
which are available on both X86 and ARM64. On X86, wider vector versions
are supported as well.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D101856
2021-05-10 21:19:58 +01:00
Florian Hahn
2bf34c0a93 [VPlan] Add test for sink scalars and merging using VPlan.
Add a couple of tests with scalars that can be sunk to their predicated
users.

This pre-commits tests for D100258.
2021-05-08 16:47:48 +01:00
Caroline Concatto
cf06c8eee3 [LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction
The function fixReduction used to assert/crash for scalable vector when
a vector reduce could be done with a smaller vector.
This patch removes this assertion as it is safe to use scalable vector for
vector reduce and truncate.

Differential Revision: https://reviews.llvm.org/D101260
2021-05-07 09:37:37 +01:00
David Green
4979c90458 [LV] Account for tripcount when calculation vectorization profitability
The loop vectorizer will currently assume a large trip count when
calculating which of several vectorization factors are more profitable.
That is often not a terrible assumption to make as small trip count
loops will usually have been fully unrolled. There are cases however
where we will try to vectorize them, and especially when folding the
tail by masking can incorrectly choose to vectorize loops that are not
beneficial, due to the folded tail rounding the iteration count up for
the vectorized loop.

The motivating example here has a trip count of 5, so either performs 5
scalar iterations or 2 vector iterations (with VF=4). At a high enough
trip count the vectorization becomes profitable, but the rounding up to
2 vector iterations vs only 5 scalar makes it unprofitable.

This adds an alternative cost calculation when we know the max trip
count and are folding tail by masking, rounding the iteration count up
to the correct number for the vector width. We still do not account for
anything like setup cost or the mixture of vector and scalar loops, but
this is at least an improvement in a few cases that we have had
reported.

Differential Revision: https://reviews.llvm.org/D101726
2021-05-06 12:36:46 +01:00
Kerry McLaughlin
8c9742bd23 [SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences
Adds support for scalable vectorization of loops containing first-order recurrences, e.g:
```
for(int i = 0; i < n; i++)
  b[i] =  a[i] + a[i - 1]
```
This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into
account when inserting into and extracting from the last lane of a vector.
CreateVectorSplice has been added to construct a vector for the recurrence, which
returns a splice intrinsic for scalable types. For fixed-width the behaviour
remains unchanged as CreateVectorSplice will return a shufflevector instead.

The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D101076
2021-05-06 11:35:39 +01:00
Juneyoung Lee
8a156d1c27 [InstCombine] Fully disable select to and/or i1 folding
This is a patch that disables the poison-unsafe select -> and/or i1 folding.

It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.

Note that a few test functions that has `_logical` suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.

I can see that most of these are poison-unsafe; they can be revived by introducing
freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for freeze fix.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101191
2021-05-06 09:29:52 +09:00
Philip Reames
80e8025083 [LV] Workaround PR49900 (a crash due to analyzing partially mutated IR)
LoopVectorize has a fairly deeply baked in design problem where it will try to query analysis (primarily SCEV, but also ValueTracking) in the midst of mutating IR. In particular, the intermediate IR state does not represent the semantics of the original (or final) program.

Fixing this for real is hard, but all of the cases seen so far share a common symptom. In cases seen to date, the analysis being queried is the computation of the original loop's trip count. We can fix this particular instance of the issue by simply computing the trip count early, and caching it.

I want to be really clear that this is nothing but a workaround. It does nothing to fix the root issue, and at best, delays the time until we have to fix this for real. Florian and I have discussed an eventual solution in the review comments for https://reviews.llvm.org/D100663, but it's a lot of work.

Test taken from https://reviews.llvm.org/D100663.

Differential Revision: https://reviews.llvm.org/D101487
2021-05-05 09:56:28 -07:00
Florian Hahn
ccebf7a109 [VPlan] Properly handle sinking of replicate regions.
This patch updates the code that sinks recipes required for first-order
recurrences to properly handle replicate-regions. At the moment, the
code would just move the replicate recipe out of its replicate-region,
producing an invalid VPlan.

When sinking a recipe in a replicate-region, we have to sink the whole
region. To do that, we first need to split the block at the target
recipe and move the region in between.

This patch also adds a splitAt helper to VPBasicBlock to split a
VPBasicBlock at a given iterator.

Fixes PR50009.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100751
2021-05-04 22:36:01 +01:00
Florian Hahn
4ba8720f88 [VPlan] Representing backedge def-use feeding reduction phis.
This patch updates the code handling reduction recipes to also keep
track of the incoming value from the latch in the recipe. This is needed
to model the def-use chains completely in VPlan, so that it is possible
to replace the incoming value with an arbitrary VPValue.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D99294
2021-05-04 16:33:22 +01:00
Sander de Smalen
9931ae645e Reland "[LV] Calculate max feasible scalable VF."
Relands https://reviews.llvm.org/D98509

This reverts commit 51d648c119.
2021-05-04 15:44:41 +01:00
Juneyoung Lee
e639bccefd run update_test_checks.py for the tests in D101191 (NFC)
This is an NFC that reruns update_test_checks.py on the tests that are
going to be updated in D101191.
2021-05-02 13:11:57 +09:00
Sander de Smalen
51d648c119 Revert "[LV] Calculate max feasible scalable VF."
Temporarily reverting this patch due to some unexpected issue found
by one of the PPC buildbots.

This reverts commit 584e9b6e4b.
2021-04-29 16:04:37 +01:00
Sjoerd Meijer
837fded984 Follow up of rGddb3b26a1269: added 'requires asserts' to test case. 2021-04-29 08:34:24 +01:00
Bardia Mahjour
ddb3b26a12 [LV] Consider Loop Unroll Hints When Making Interleave Decisions
This patch causes the loop vectorizer to not interleave loops that have
nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)).
Note that if a particular interleave count is being requested
(through llvm.loop.interleave_count), it will still be honoured, regardless
of the presence of nounroll hints.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101374
2021-04-28 17:27:52 -04:00
Florian Hahn
1ed7f8ede5 [LAA] Support pointer phis in loop by analyzing each incoming pointer.
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.

This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101286
2021-04-28 20:19:40 +01:00
David Sherwood
00e65f3345 [LoopVectorize][SVE] Fix crash when vectorising FP negation
This patch fixes a crash encountered when vectorising the following loop:

 void foo(float *dst, float *src, long long n) {
   for (long long i = 0; i < n; i++)
     dst[i] = -src[i];
 }

using scalable vectors. I've added a test to

 Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

as well as cleaned up the other tests in the same file.

Differential Revision: https://reviews.llvm.org/D98054
2021-04-28 15:22:35 +01:00
David Sherwood
6998f8ae2d [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-28 13:41:07 +01:00
Sander de Smalen
584e9b6e4b [LV] Calculate max feasible scalable VF.
This patch also refactors the way the feasible max VF is calculated,
although this is NFC for fixed-width vectors.

After this change scalable VF hints are no longer truncated/clamped
to a shorter scalable VF, nor does it drop the 'scalable flag' from
the suggested VF to vectorize with a similar VF that is fixed.

Instead, the hint is ignored which means the vectorizer is free
to find a more suitable VF, using the CostModel to determine the
best possible VF.

Reviewed By: c-rhodes, fhahn

Differential Revision: https://reviews.llvm.org/D98509
2021-04-28 12:30:00 +01:00
Kerry McLaughlin
9cc217ab36 [LoopVectorize] Prevent multiple Phis being generated with in-order reductions
When using the -enable-strict-reductions flag where UF>1 we generate multiple
Phi nodes, though only one of these is used as an input to the vector.reduce.fadd
intrinsics. The unused Phi nodes are removed later by instcombine.

This patch changes widenPHIInstruction/fixReduction to only generate
one Phi, and adds an additional test for unrolling to strict-fadd.ll

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D100570
2021-04-28 11:29:01 +01:00
David Sherwood
6968520c3b Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 4afeda9157.
2021-04-27 15:46:03 +01:00
David Sherwood
4afeda9157 [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-27 15:26:15 +01:00
Florian Hahn
a950f66de2 [LV,LAA] Add test cases with pointer phis in loops.
Pre-commits tests for D101286.
2021-04-27 13:49:32 +01:00
David Sherwood
cf7276820c [NFC] Add scalable vectorisation tests for int/FP <> int/FP conversions
We can already vectorize loops that involve int<>int, fp<>fp, int<>fp
and fp<>int conversions, however we didn't previously have any tests
for them. This patch adds some tests for each conversion type.

Differential Revision: https://reviews.llvm.org/D99951
2021-04-26 11:01:14 +01:00
David Sherwood
a458b7855e [AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function
When vectorising for AArch64 targets if you specify the SVE attribute
we automatically then treat masked loads and stores as legal. Also,
since we have no cost model for masked memory ops we believe it's
cheap to use the masked load/store intrinsics even for fixed width
vectors. This can lead to poor code quality as the intrinsics will
currently be scalarised in the backend. This patch adds a basic
cost model that marks fixed-width masked memory ops as significantly
more expensive than for scalable vectors.

Tests for the cost model are added here:

  Transforms/LoopVectorize/AArch64/masked-op-cost.ll

Differential Revision: https://reviews.llvm.org/D100745
2021-04-26 11:00:03 +01:00
Joe Ellis
2c551aedcf [LoopVectorize] Fix bug where predicated loads/stores were dropped
This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

     1  void foo(int *restrict data1, int *restrict data2)
     2  {
     3    int counter = 1024;
     4    while (counter--)
     5      if (data1[counter] > data2[counter])
     6        data1[counter] = data2[counter];
     7  }

... could previously be transformed in such a way that the predicated
store implied by:

    if (data1[counter] > data2[counter])
       data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Differential Revision: https://reviews.llvm.org/D99569
2021-04-22 15:05:54 +00:00
Alexey Bataev
673e2f1b70 [COST][AARCH64] Improve cost of reverse shuffles for AArch64.
Introduced the cost of thre reverse shuffles for AArch64, currently just
copied the costs for PermuteSingleSrc.

Differential Revision: https://reviews.llvm.org/D100871
2021-04-20 13:47:56 -07:00
Roman Lebedev
a1d283b71e [NFC][LoopVectorize] Autogenerate check lines in pr45259.ll
We might as well test all of the codegen here.
2021-04-20 21:29:21 +03:00
Alexey Bataev
683dc41695 Update tests checks, NFC. 2021-04-20 10:20:15 -07:00