Commit Graph

196 Commits

Author SHA1 Message Date
Kerry McLaughlin
49d73130ca [LV] Avoid scalable vectorization for loops containing alloca
This patch returns an Invalid cost from getInstructionCost() for alloca
instructions if the VF is scalable, as otherwise loops which contain
these instructions will crash when attempting to scalarize the alloca.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105824
2021-07-16 11:47:13 +01:00
Sander de Smalen
239d01fa88 Reland "[LV] Print remark when loop cannot be vectorized due to invalid costs."
The original patch was:
  https://reviews.llvm.org/D105806

There were some issues with undeterministic behaviour of the sorting
function, which led to scalable-call.ll passing and/or failing. This
patch fixes the issue by numbering all instructions in the array first,
and using that number as the order, which should provide a consistent
ordering.

This reverts commit a607f64118.
2021-07-16 10:52:01 +01:00
Sander de Smalen
a607f64118 Revert "[LV] Print remark when loop cannot be vectorized due to invalid costs."
This reverts commit efaf3099c8.
This reverts commit dc7bdc1e71.

Reverting patches due to buildbot failures.
2021-07-15 15:21:57 +01:00
Sander de Smalen
dc7bdc1e71 [LV] Fix determinism for failing scalable-call.ll test.
The sort function for emitting an OptRemark was not deterministic,
which caused scalable-call.ll to fail on some buildbots. This patch
fixes that.

This patch also fixes an issue where `Instruction::comesBefore()`
is called when two Instructions are in different basic blocks,
which would otherwise cause an assertion failure.
2021-07-15 13:16:59 +01:00
Sander de Smalen
efaf3099c8 [LV] Print remark when loop cannot be vectorized due to invalid costs.
This patch emits remarks for instructions that have invalid costs for
a given set of vectorization factors. Some example output:

  t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
      dst[i] = sinf(src[i]);
                    ^
  t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32
      dst[i] = sinf(src[i]);
               ^
  t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
      dst[i] = sinf(src[i]);
             ^

Reviewed By: fhahn, kmclaughlin

Differential Revision: https://reviews.llvm.org/D105806
2021-07-14 17:11:33 +01:00
Sander de Smalen
eac1670739 [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid.
At the moment, <vscale x 1 x eltty> are not yet fully handled by the
code-generator, so to avoid vectorizing loops with that VF, we mark the
cost for these types as invalid.
The reason for not adding a new "TTI::getMinimumScalableVF" is because
the type is supposed to be a type that can be legalized. It partially is,
although the support for these types need some more work.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D103882
2021-07-14 16:44:22 +01:00
Sander de Smalen
d2e4ccc790 [LV] Ignore candidate VFs with invalid costs.
This follows on from discussion on the mailing-list:
  https://lists.llvm.org/pipermail/llvm-dev/2021-June/151047.html

to interpret an Invalid cost as 'infinitely expensive', as this
simplifies some of the legalization issues with scalable vectors.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105473
2021-07-12 09:58:22 +01:00
Kerry McLaughlin
a7512401e5 [LV] Prevent vectorization with unsupported element types.
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:

  int foo(__int128_t* ptr, int N)
    #pragma clang loop vectorize_width(4, scalable)
    for (int i=0; i<N; ++i)
      ptr[i] = ptr[i] + 42;

This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102253
2021-07-06 13:06:21 +01:00
Sjoerd Meijer
ee752134ac [AArch64] Cost-model i8 vector loads/stores
Loads of <4 x i8> vectors were modeled as extremely expensive. And while we
don't have a load instruction that supports this, it isn't that expensive to
create a vector of i8 elements. The codegen for this was fixed/optimised in
D105110. This now tweaks the cost model and enables SLP vectorisation of my
motivating case loadi8.ll.

Differential Revision: https://reviews.llvm.org/D103629
2021-07-05 11:25:10 +01:00
David Sherwood
303b6d5e98 [LoopVectorize] Add support for scalable vectorization of invariant stores
Previously in setCostBasedWideningDecision if we encountered an
invariant store we just assumed that we could scalarize the store
and called getUniformMemOpCost to get the associated cost.
However, for scalable vectors this is not an option because it is
not currently possibly to scalarize the store. At the moment we
crash in VPReplicateRecipe::execute when trying to scalarize the
store.

Therefore, I have changed setCostBasedWideningDecision so that if
we are storing a scalable vector out to a uniform address and the
target supports scatter instructions, then we should use those
instead.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-inv-store.ll

Differential Revision: https://reviews.llvm.org/D104624
2021-06-29 11:56:09 +01:00
Kerry McLaughlin
f99672568f [LoopVectorize] Fix strict reductions where VF = 1
Currently we will allow loops with a fixed width VF of 1 to vectorize
if the -enable-strict-reductions flag is set. However, the loop vectorizer
will not use ordered reductions if `VF.isScalar()` and the resulting
vectorized loop will be out of order.

This patch removes `VF.isVector()` when checking if ordered reductions
should be used. Also, instead of converting the FAdds to reductions if the
VF = 1, operands of the FAdds are changed such that the order is preserved.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D104533
2021-06-28 11:27:10 +01:00
Kerry McLaughlin
5db52751a5 [CostModel] Return an invalid cost for memory ops with unsupported types
Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector
type cannot be legalized by widening/splitting. When this is the method of legalization
found, getTypeLegalizationCost will return an Invalid cost.

The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call
getTypeLegalizationCost and will now also return an Invalid cost for unsupported types.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102515
2021-06-08 12:07:36 +01:00
Kerry McLaughlin
14eeccfe9a [LoopVectorize] Don't use strict reductions when reordering is allowed
If the `-enable-strict-reductions` flag is set to true, then currently we will
always choose to vectorize the loop with strict in-order reductions. This is
not necessary where we allow the reordering of FP operations, such as
when loop hints are passed via metadata.

This patch moves useOrderedReductions so that we can also check whether
loop hints allow reordering, in which case we should use the default
behaviour of vectorizing with unordered reductions.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D103814
2021-06-08 10:39:29 +01:00
Florian Hahn
23c2f2e6b2 [LV] Mark increment of main vector loop induction variable as NUW.
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.

If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.

This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.

At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.

Note that this could probably be further improved by using information
from the original IV.

Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g

Part of a set of fixes required for PR50412.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D103255
2021-06-07 10:47:52 +01:00
Sander de Smalen
d41cb6bb26 [LV] Build and cost VPlans for scalable VFs.
This patch uses the calculated maximum scalable VFs to build VPlans,
cost them and select a suitable scalable VF.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98722
2021-06-02 14:47:47 +01:00
Kerry McLaughlin
9f76a85260 [LoopVectorize] Enable strict reductions when allowReordering() returns false
When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:

  bool allowReordering() const {
    // When enabling loop hints are provided we allow the vectorizer to change
    // the order of operations that is given by the scalar loop. This is not
    // enabled by default because can be unsafe or inefficient.

The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.

This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101836
2021-05-26 13:59:12 +01:00
Kerry McLaughlin
6b0fe3c63b [NFC] Add CHECK lines for unordered FP reductions
An additional RUN line has been added to both strict-fadd.ll &
scalable-strict-fadd.ll to ensure the correct behaviour of these
tests where `-enable-strict-reductions` is false.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D103015
2021-05-26 11:00:20 +01:00
serge-sans-paille
4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee0.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille
bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Sander de Smalen
1e6630311c NFC: cleaned up and renamed scalable-vf-analysis.ll -> scalable-vectorization.ll
* Removes unnecessary loop hints.
* Use RUN line with '-scalable-vectorization=preferred' instead of 'on'
  for the maximize-bandwidth behaviour. This prepares the test for enabling
  scalable vectorization; With a forced instruction-cost of 1, 'on' will
  always favour fixed-width VF to be chosen, whereas with 'preferred'
  we can check that the maximize-bandwidth option in combination with
  scalable-vectorization=preferred actually picks a scalable VF.
* Renamed to scalable-vectorization.ll, because a follow-up patch will
  test more than just analysis.
2021-05-23 19:53:51 +01:00
Sander de Smalen
4f86aa650c [LV] Add -scalable-vectorization=<option> flag.
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.

Initially, this suggests three levels to control scalable
vectorization, although other more aggressive options can be added in
the future.

The possible options are:
- Disabled:   Disables vectorization with scalable vectors.
- Enabled:    Vectorize loops using scalable vectors or fixed-width
              vectors, but favors fixed-width vectors when the cost
              is a tie.
- Preferred:  Like 'Enabled', but favoring scalable vectors when the
              cost-model is inconclusive.

Reviewed By: paulwalker-arm, vkmr

Differential Revision: https://reviews.llvm.org/D101945
2021-05-19 10:40:56 +01:00
Sander de Smalen
81fdc73e5d [LV] Return both fixed and scalable Max VF from computeMaxVF.
This patch introduces a new class, MaxVFCandidates, that holds the
maximum vectorization factors that have been computed for both scalable
and fixed-width vectors.

This patch is intended to be NFC for fixed-width vectors, although
considering a scalable max VF (which is disabled by default) pessimises
tail-loop elimination, since it can no longer determine if any chosen VF
(less than fixed/scalable MaxVFs) is guaranteed to handle all vector
iterations if the trip-count is known. This issue will be addressed in
a future patch.

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D98721
2021-05-18 08:03:48 +01:00
David Sherwood
b7a11274f9 [LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors
In InnerLoopVectorizer::widenPHIInstruction there are cases where we have
to scalarise a pointer induction variable after vectorisation. For scalable
vectors we already deal with the case where the pointer induction variable
is uniform, but we currently crash if not uniform. For fixed width vectors
we calculate every lane of the scalarised pointer induction variable for a
given VF, however this cannot work for scalable vectors. In this case I
have added support for caching the whole vector value for each unrolled
part so that we can always extract an arbitrary element. Additionally, we
still continue to cache the known minimum number of lanes too in order
to improve code quality by avoiding an extractelement operation.

I have adapted an existing test `pointer_iv_mixed` from the file:

  Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

and added it here for scalable vectors instead:

  Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D101294
2021-05-12 11:02:11 +01:00
Florian Hahn
93a9a8a8d9 [VecLib] Add support for vector fns from Darwin's libsystem.
This patch adds support for Darwin's libsystem math vector functions to
TLI. Darwin's libsystem provides a range of vector functions for libm
functions.

This initial patch only adds the 2 x double and 4 x float versions,
which are available on both X86 and ARM64. On X86, wider vector versions
are supported as well.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D101856
2021-05-10 21:19:58 +01:00
Kerry McLaughlin
8c9742bd23 [SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences
Adds support for scalable vectorization of loops containing first-order recurrences, e.g:
```
for(int i = 0; i < n; i++)
  b[i] =  a[i] + a[i - 1]
```
This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into
account when inserting into and extracting from the last lane of a vector.
CreateVectorSplice has been added to construct a vector for the recurrence, which
returns a splice intrinsic for scalable types. For fixed-width the behaviour
remains unchanged as CreateVectorSplice will return a shufflevector instead.

The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D101076
2021-05-06 11:35:39 +01:00
Sander de Smalen
9931ae645e Reland "[LV] Calculate max feasible scalable VF."
Relands https://reviews.llvm.org/D98509

This reverts commit 51d648c119.
2021-05-04 15:44:41 +01:00
Sander de Smalen
51d648c119 Revert "[LV] Calculate max feasible scalable VF."
Temporarily reverting this patch due to some unexpected issue found
by one of the PPC buildbots.

This reverts commit 584e9b6e4b.
2021-04-29 16:04:37 +01:00
David Sherwood
00e65f3345 [LoopVectorize][SVE] Fix crash when vectorising FP negation
This patch fixes a crash encountered when vectorising the following loop:

 void foo(float *dst, float *src, long long n) {
   for (long long i = 0; i < n; i++)
     dst[i] = -src[i];
 }

using scalable vectors. I've added a test to

 Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

as well as cleaned up the other tests in the same file.

Differential Revision: https://reviews.llvm.org/D98054
2021-04-28 15:22:35 +01:00
David Sherwood
6998f8ae2d [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-28 13:41:07 +01:00
Sander de Smalen
584e9b6e4b [LV] Calculate max feasible scalable VF.
This patch also refactors the way the feasible max VF is calculated,
although this is NFC for fixed-width vectors.

After this change scalable VF hints are no longer truncated/clamped
to a shorter scalable VF, nor does it drop the 'scalable flag' from
the suggested VF to vectorize with a similar VF that is fixed.

Instead, the hint is ignored which means the vectorizer is free
to find a more suitable VF, using the CostModel to determine the
best possible VF.

Reviewed By: c-rhodes, fhahn

Differential Revision: https://reviews.llvm.org/D98509
2021-04-28 12:30:00 +01:00
Kerry McLaughlin
9cc217ab36 [LoopVectorize] Prevent multiple Phis being generated with in-order reductions
When using the -enable-strict-reductions flag where UF>1 we generate multiple
Phi nodes, though only one of these is used as an input to the vector.reduce.fadd
intrinsics. The unused Phi nodes are removed later by instcombine.

This patch changes widenPHIInstruction/fixReduction to only generate
one Phi, and adds an additional test for unrolling to strict-fadd.ll

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D100570
2021-04-28 11:29:01 +01:00
David Sherwood
6968520c3b Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 4afeda9157.
2021-04-27 15:46:03 +01:00
David Sherwood
4afeda9157 [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-27 15:26:15 +01:00
David Sherwood
cf7276820c [NFC] Add scalable vectorisation tests for int/FP <> int/FP conversions
We can already vectorize loops that involve int<>int, fp<>fp, int<>fp
and fp<>int conversions, however we didn't previously have any tests
for them. This patch adds some tests for each conversion type.

Differential Revision: https://reviews.llvm.org/D99951
2021-04-26 11:01:14 +01:00
David Sherwood
a458b7855e [AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function
When vectorising for AArch64 targets if you specify the SVE attribute
we automatically then treat masked loads and stores as legal. Also,
since we have no cost model for masked memory ops we believe it's
cheap to use the masked load/store intrinsics even for fixed width
vectors. This can lead to poor code quality as the intrinsics will
currently be scalarised in the backend. This patch adds a basic
cost model that marks fixed-width masked memory ops as significantly
more expensive than for scalable vectors.

Tests for the cost model are added here:

  Transforms/LoopVectorize/AArch64/masked-op-cost.ll

Differential Revision: https://reviews.llvm.org/D100745
2021-04-26 11:00:03 +01:00
Joe Ellis
2c551aedcf [LoopVectorize] Fix bug where predicated loads/stores were dropped
This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

     1  void foo(int *restrict data1, int *restrict data2)
     2  {
     3    int counter = 1024;
     4    while (counter--)
     5      if (data1[counter] > data2[counter])
     6        data1[counter] = data2[counter];
     7  }

... could previously be transformed in such a way that the predicated
store implied by:

    if (data1[counter] > data2[counter])
       data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Differential Revision: https://reviews.llvm.org/D99569
2021-04-22 15:05:54 +00:00
Alexey Bataev
673e2f1b70 [COST][AARCH64] Improve cost of reverse shuffles for AArch64.
Introduced the cost of thre reverse shuffles for AArch64, currently just
copied the costs for PermuteSingleSrc.

Differential Revision: https://reviews.llvm.org/D100871
2021-04-20 13:47:56 -07:00
Alexey Bataev
683dc41695 Update tests checks, NFC. 2021-04-20 10:20:15 -07:00
Kerry McLaughlin
62ee638a87 [NFC] Add tests for scalable vectorization of loops with in-order reductions
D98435 added support for in-order reductions and included tests for fixed-width
vectorization with the -enable-strict-reductions flag.

This patch adds similar tests to verify support for scalable vectorization of loops
with in-order reductions.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D100385
2021-04-19 11:15:55 +01:00
Kerry McLaughlin
93f54fae9d [NFC] Remove the -instcombine flag from strict-fadd.ll
This also fixes a CHECK line in @fadd_strict_unroll which ensures the
changes made to fixReduction() to support in-order reductions with
unrolling are being tested correctly.
2021-04-15 15:10:48 +01:00
David Sherwood
ea14df695e [SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction
There were a few places in widenPHIInstruction where calculations of
offsets were failing to take the runtime calculation of VF into
account for scalable vectors. I've fixed those cases in this patch
as well as adding an assert that we should not be scalarising for
scalable vectors.

Tests are added here:

  Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D99254
2021-04-15 10:51:49 +01:00
Sander de Smalen
672f673004 [SVE] Remove checks for warnings in scalable-vector tests.
After D98856 these tests will by default break (fatal_error) if any of
the wrong interfaces are used, so there's no longer a need to have a
RUN line that checks for a warning message emitted by the compiler.
2021-04-07 15:59:32 +01:00
Kerry McLaughlin
7344f3d39a [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
be an input to the horizontal reduction, e.g:

  %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
  %load = load <8 x float>
  %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)

This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D98435
2021-04-06 14:45:34 +01:00
David Sherwood
e3a13304fc [NFC] Add tests for scalable vectorization of loops with large stride acesses
This patch just adds tests that we can vectorize loop such as these:

  for (i = 0; i < n; i++)
    dst[i * 7] += 1;

and

  for (i = 0; i < n; i++)
    if (cond[i])
      dst[i * 7] += 1;

using scalable vectors, where we expect to use gathers and scatters in the
vectorized loop. The vector of pointers used for the gather is identical
to those used for the scatter so there should be no memory dependences.

Tests are added here:

  Transforms/LoopVectorize/AArch64/sve-large-strides.ll

Differential Revision: https://reviews.llvm.org/D99192
2021-04-01 10:25:06 +01:00
Sander de Smalen
7108b2dec1 [SVE] Fix LoopVectorizer test scalalable-call.ll
This marks FSIN and other operations to EXPAND for scalable
vectors, so that they are not assumed to be legal by the cost-model.

Depends on D97470

Reviewed By: dmgreen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97471
2021-03-31 14:52:49 +01:00
David Sherwood
a08c7736a7 [LoopVectorize] Add support for scalable vectorization of induction variables
This patch adds support for the vectorization of induction variables when
using scalable vectors, which required the following changes:

1. Removed assert from InnerLoopVectorizer::getStepVector.
2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use
   a runtime determined value for VF and removed an assert.
3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable
   vectors. I did this by calculating the full vector value for each Part
   of the unroll factor (UF) and caching this in the VP state. This means
   that we are always able to extract an arbitrary element from the vector
   if necessary. In addition to this, I also permitted the caching of the
   individual lane values themselves for the known minimum number of elements
   in the same way we do for fixed width vectors. This is a further
   optimisation that improves the code quality since it avoids unnecessary
   extractelement operations when extracting the first lane.
4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while
   testing some code paths I noticed this is currently broken for scalable
   vectors.

Various tests to support different cases have been added here:

  Transforms/LoopVectorize/AArch64/sve-inductions.ll

Differential Revision: https://reviews.llvm.org/D98715
2021-03-30 11:13:31 +01:00
David Sherwood
c39460cc4f Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 240aa96cf2.
2021-03-26 11:36:53 +00:00
David Sherwood
240aa96cf2 [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast with scalar uses, or b) this is an update to an induction variable
that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. For all other cases I have added an assert that none of the
users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

Differential Revision: https://reviews.llvm.org/D98512
2021-03-26 11:27:12 +00:00
Kerry McLaughlin
1f46499690 [SVE][LoopVectorize] Verify support for vectorizing loops with invariant loads
D95598 added a cost model for broadcast shuffle, which should enable loops
such as the following to vectorize, where the load of b[42] is invariant
and can be done using a scalar load + splat:

  for (int i=0; i<n; ++i)
    a[i] = b[i] + b[42];

This patch adds tests to verify that we can vectorize such loops.

Reviewed By: joechrisellis

Differential Revision: https://reviews.llvm.org/D98506
2021-03-25 14:10:21 +00:00
Caroline Concatto
3c03635d53 [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse
This patch adds support for reverse loop vectorization.
It is possible to vectorize the following loop:
```
  for (int i = n-1; i >= 0; --i)
    a[i] = b[i] + 1.0;
```
with fixed or scalable vector.
The loop-vectorizer will use 'reverse' on the loads/stores to make
sure the lanes themselves are also handled in the right order.
This patch adds support for scalable vector on IRBuilder interface to
create a reverse vector. The IR function
CreateVectorReverse lowers to experimental.vector.reverse for scalable vector
and keedp the original behavior for fixed vector using shuffle reverse.

Differential Revision: https://reviews.llvm.org/D95363
2021-03-16 07:51:59 +00:00