Commit Graph

91 Commits

Author SHA1 Message Date
Jordan Rupprecht
10c82eecbc Revert "[LV] Enable the LoopVectorizer to create pointer inductions"
This reverts commit a8fe12065e.

It causes a crash when building gzip. Will post the detailed reduced test case to D81267.
2020-07-06 17:50:38 -07:00
David Green
146dad0077 [ARM] MVE FP16 cost adjustments
This adjusts the MVE fp16 cost model, similar to how we already do for
integer casts. It uses the base cost of 1 per cvt for most fp extend /
truncates, but adjusts it for loads and stores where we know that an
extending load has been used to get the load into the correct lane, and
only an MVE VCVTB is then needed.
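
As an illustrative sketch (types and value names assumed here, not taken from
the patch), the pattern being costed is an extending f16 load feeding an fpext,
where the load already places the halves in the correct lanes so only a VCVTB
is needed afterwards:
  %h = load <4 x half>, <4 x half>* %p, align 2   ; lowered as an extending load
  %f = fpext <4 x half> %h to <4 x float>         ; then a single MVE VCVTB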

Differential Revision: https://reviews.llvm.org/D81813
2020-07-06 15:57:51 +01:00
David Green
55227f85d0 [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost
This alters getMemoryOpCost to use the Base TargetTransformInfo version
that includes some additional checks for whether extending loads are
legal. This will generally have the effect of making <2 x ..> and some
<4 x ..> loads/stores more expensive, which in turn should help favour
larger vector factors.

Notably it alters the cost of a <4 x half>, which with the current
codegen will be expensive if it is not extended.

Differential Revision: https://reviews.llvm.org/D82456
2020-07-06 10:58:40 +01:00
Anna Welker
a8fe12065e [LV] Enable the LoopVectorizer to create pointer inductions
This patch enables the LoopVectorizer to build a phi of pointer
type and provide the vector loads and stores with vector type
getelementptrs built from the pointer induction variable, which
produces far fewer instructions than the previous approach of
creating scalar getelementptrs and gluing them together into a
vector.
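
As a rough illustration (function and value names made up; the exact IR the
vectoriser emits will differ), the vector body can now carry a pointer phi and
a single vector-typed GEP feeding a gather, instead of several scalar
getelementptrs glued together with insertelements:
  define void @copy(i32* %src, i32* %dst, i32 %n) {
  entry:
    br label %vector.body
  vector.body:                                      ; VF = 4, tail handling omitted
    %index   = phi i32  [ 0, %entry ],    [ %index.next, %vector.body ]
    %ptr.phi = phi i32* [ %src, %entry ], [ %ptr.next, %vector.body ]
    ; one GEP with a vector of indices yields a <4 x i32*> from the pointer IV
    %vgep = getelementptr i32, i32* %ptr.phi, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    %vals = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> %vgep, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef)
    %out      = getelementptr i32, i32* %dst, i32 %index
    %out.cast = bitcast i32* %out to <4 x i32>*
    store <4 x i32> %vals, <4 x i32>* %out.cast, align 4
    %ptr.next   = getelementptr i32, i32* %ptr.phi, i32 4
    %index.next = add i32 %index, 4
    %done = icmp uge i32 %index.next, %n
    br i1 %done, label %exit, label %vector.body
  exit:
    ret void
  }
  declare <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*>, i32 immarg, <4 x i1>, <4 x i32>)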

Differential Revision: https://reviews.llvm.org/D81267
2020-07-02 11:39:28 +01:00
Sjoerd Meijer
e345d547a0 Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops"
Fixed ARM regression test.

Please see the original commit message rG47650451738c for details.
2020-06-17 13:12:15 +01:00
Sjoerd Meijer
d4e183f686 Revert "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops"
This reverts commit 4765045173
while I investigate the build bot failures.
2020-06-17 10:09:54 +01:00
Sjoerd Meijer
4765045173 [LV] Emit @llvm.get.active.lane.mask for tail-folded loops
This emits the new IR intrinsic @llvm.get.active.lane.mask for tail-folded
vectorised loops if the intrinsic is supported by the backend, which is checked
by querying the TargetTransform hook emitGetActiveLaneMask.

This intrinsic creates a mask representing active and inactive vector lanes,
which is used by the masked load/store instructions that are created for
tail-folded loops. The semantics of @llvm.get.active.lane.mask are described in
the LangRef:

https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics

This intrinsic is also used to provide a hint to the backend. That is, the
second argument of the intrinsic represents the back-edge taken count of the
loop. For MVE, for example, we use that to set up tail-predication, which is a
new form of predication in MVE for vector loops that implicitly predicates the
last vector loop iteration by implicitly setting active/inactive lanes, i.e.
the tail loop is predicated. In order to set up a tail-predicated vector loop,
we need to know the number of data elements processed by the vector loop, which
corresponds to the trip count of the scalar loop, which we can now reconstruct
using @llvm.get.active.lane.mask.
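
A hedged sketch of the shape of the emitted IR (value names assumed), with the
mask feeding the masked loads/stores of the tail-folded vector body:
  ; %index is the vector loop induction variable, %btc the back-edge taken count
  %active = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %btc)
  %wide   = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %addr, i32 4, <4 x i1> %active, <4 x i32> undef)

  declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)
  declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32 immarg, <4 x i1>, <4 x i32>)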

Differential Revision: https://reviews.llvm.org/D79100
2020-06-17 09:53:58 +01:00
David Green
46529978bf [ARM] Always use reductions intrinsics under MVE
Similar to a recent change to the X86 backend, this changes things so
that we always produce reduction intrinsics for all reduction types,
not just the legal ones. This gives a better chance in the backend to
custom lower them to something more suitable for MVE. Especially for
something like fadd the in-order reduction produced during DAG lowering
is already better than the shuffles produced in the midend, and we can
do even better with a bit of custom lowering.
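
For example (using the LLVM 10-era experimental naming, which later releases
renamed; an assumed snippet, not output from this patch), an in-order fadd
reduction is now emitted as a single intrinsic call rather than a log2(VF)
shuffle-and-add sequence:
  %red = call float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float %acc, <4 x float> %vec)

  declare float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float, <4 x float>)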

Differential Revision: https://reviews.llvm.org/D81398
2020-06-12 19:21:17 +01:00
David Green
a4cf68e743 [ARM] MVE vectorizer reduction tests for each reduction type. NFC 2020-06-10 11:18:24 +01:00
Jon Roelofs
5fb979dd06 [llvm][test] Add missing FileCheck colons. NFC 2020-05-21 09:29:27 -06:00
Sjoerd Meijer
9529597cf4 Recommit #2: "[LV] Induction Variable does not remain scalar under tail-folding."
This was reverted because of a miscompilation. On closer inspection, the
problem was actually visible in a changed llvm regression test too. This
one-line follow-up fix/recommit splats the IV, which is what we are trying
to avoid if unnecessary in general, whenever tail-folding is requested, even
if all its users are scalar instructions after vectorisation. This is because
with tail-folding the splat IV is used by the predicate of the masked
load/store instructions. The previous version omitted this, which caused the
miscompilation. The original commit message was:

If tail-folding of the scalar remainder loop is applied, the primary induction
variable is splat to a vector and used by the masked load/store vector
instructions, thus the IV does not remain scalar. Because we now mark
that the IV does not remain scalar for these cases, we don't emit the vector IV
if it is not used. Thus, the vectoriser produces less dead code.

Thanks to Ayal Zaks for the direction on how to fix this.
2020-05-13 13:50:09 +01:00
Sam Parker
b4a8091a11 [ARM][CostModel] Improve getCastInstrCost
- Specifically check for sext/zext users which have 'long' form NEON
  instructions.
- Add more entries to the table for sext/zexts so that we can more accurately
  report the number of vmovls required for NEON (see the sketch after this
  list).
- Pass the instruction to the pass implementation.
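
A minimal illustration (assumed types) of the kind of cast this now costs more
accurately:
  %b = load <8 x i8>, <8 x i8>* %p, align 1
  %w = sext <8 x i8> %b to <8 x i16>   ; 'long' form: a single NEON vmovl.s8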

Differential Revision: https://reviews.llvm.org/D79561
2020-05-12 10:32:20 +01:00
Benjamin Kramer
f936457f80 Revert "Recommit "[LV] Induction Variable does not remain scalar under tail-folding.""
This reverts commit ae45b4dbe7. It
causes miscompilations, test case on the mailing list.
2020-05-08 14:49:10 +02:00
Sjoerd Meijer
ae45b4dbe7 Recommit "[LV] Induction Variable does not remain scalar under tail-folding."
With 3 llvm regr tests fixed/updated that I had missed.
2020-05-07 11:52:20 +01:00
Sjoerd Meijer
20d67ffeae Revert "[LV] Induction Variable does not remain scalar under tail-folding."
This reverts commit 617aa64c84.

while I investigate buildbot failures.
2020-05-07 09:29:56 +01:00
Sjoerd Meijer
617aa64c84 [LV] Induction Variable does not remain scalar under tail-folding.
If tail-folding of the scalar remainder loop is applied, the primary induction
variable is splat to a vector and used by the masked load/store vector
instructions, thus the IV does not remain scalar. Because we now mark
that the IV does not remain scalar for these cases, we don't emit the vector IV
if it is not used. Thus, the vectoriser produces less dead code.
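
A hypothetical sketch of that use (value names assumed), shown only to
illustrate why the splat IV is needed: the broadcast IV plus a per-lane step is
compared against the broadcast back-edge-taken count to form the mask for the
masked loads/stores:
  %iv.ins   = insertelement <4 x i32> undef, i32 %index, i32 0
  %iv.splat = shufflevector <4 x i32> %iv.ins, <4 x i32> undef, <4 x i32> zeroinitializer
  %lanes    = add <4 x i32> %iv.splat, <i32 0, i32 1, i32 2, i32 3>
  %mask     = icmp ule <4 x i32> %lanes, %btc.splat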

Thanks to Ayal Zaks for the direction on how to fix this.

Differential Revision: https://reviews.llvm.org/D78911
2020-05-07 09:15:23 +01:00
David Green
eecba95067 [ARM] Replace arm vendor with none. NFC 2020-04-22 18:19:35 +01:00
Sjoerd Meijer
5be767d489 NFC: remove outdated TODOs from ARM test file. 2020-04-17 17:08:11 +01:00
Sjoerd Meijer
3ef614a007 NFC: update of ARM llvm regr test, follow up of 9633fc14ae. 2020-04-14 21:30:22 +01:00
Sjoerd Meijer
9633fc14ae [LV][ARM] Add tail-folding tests for MVE. NFC.
D77635 added support to recognise primary induction variables for counting-down
loops. This allows us to fold the scalar tail loop into the main vector body,
which we need for MVE tail-predication. This adds some ARM tail-folding test
cases that we want to support.

This test was extracted from D76838, which implemented a different approach to
reversing and thus finding a primary induction variable.
2020-04-14 16:03:29 +01:00
Jonathan Roelofs
7c5d2bec76 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
David Green
587feec07e [ARM] Change all tests from "thumbv8.1-m.main" to "thumbv8.1m.main". NFC 2020-03-04 13:47:35 +00:00
David Green
b535aa405a [ARM] Use reduction intrinsics for larger than legal reductions
The codegen for splitting an llvm.vector.reduction intrinsic into parts
will be better than the codegen for the generic reductions. This will
only have a direct effect when vectorization factors are specified by the
user.

Also added tests to make sure the codegen for larger reductions is OK.

Differential Revision: https://reviews.llvm.org/D72257
2020-01-24 17:07:24 +00:00
Momchil Velikov
173b711e83 [ARM][MVE] MVE-I should not be disabled by -mfpu=none
Architecturally, it's allowed to have MVE-I without an FPU, thus
-mfpu=none should not disable MVE-I, or moves to/from FP-registers.

This patch removes `+/-fpregs` from the features unconditionally added to
the target feature list depending on FPU, and moves the logic to the Clang
driver, where the negative form (`-fpregs`) is conditionally added to
the target features list for the cases of `-mfloat-abi=soft`, or
`-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative
form is added by the driver; the positive one is derived from other
features in the backend.

Differential Revision: https://reviews.llvm.org/D71843
2020-01-09 14:03:25 +00:00
Sjoerd Meijer
8f1887456a [LV] Still vectorise when tail-folding can't find a primary induction variable
This addresses a vectorisation regression for tail-folded loops that are
counting down, e.g. loops as simple as this:

  void foo(char *A, char *B, char *C, uint32_t N) {
    while (N > 0) {
      *C++ = *A++ + *B++;
       N--;
    }
  }

These are loops that can be vectorised, but when tail-folding is requested, it
can't find a primary induction variable which we do need for predicating the
loop. As a result, the loop isn't vectorised at all, even though it can be
when tail-folding is not attempted. So, this adds a check for the primary
induction variable where we decide how to lower the scalar epilogue. I.e., when
there isn't a primary induction variable, a scalar epilogue loop is allowed
(i.e. don't request tail-folding) so that vectorisation can still be
triggered.

Having this check for the primary induction variable makes sense anyway, and
in addition, in a follow-up of this I will look into discovering the primary
induction variable for counting-down loops earlier, so that they can also be
tail-folded.

Differential revision: https://reviews.llvm.org/D72324
2020-01-09 09:14:00 +00:00
Fangrui Song
502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
David Green
d6642ed1c8 [ARM] Add missing REQUIRES: asserts to test. NFC 2019-12-09 11:43:43 +00:00
David Green
b1aba0378e [ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968
2019-12-09 11:37:34 +00:00
David Green
be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %r = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instructions that the
shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
David Green
f008b5b8ce [ARM] Additional tests and minor formatting. NFC
This adds some extra cost model tests for shifts, and makes some minor
adjustments to some Neon code to make it clear what it applies to.
Both NFC.
2019-12-09 10:24:33 +00:00
David Green
3a6eb5f160 [ARM] Disable VLD4 under MVE
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109
2019-12-08 10:37:29 +00:00
Sjoerd Meijer
901cd3b3f6 [LV] PreferPredicateOverEpilog respecting option
Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when
option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not
to predicate the loop.

Differential Revision: https://reviews.llvm.org/D70382
2019-11-21 14:06:10 +00:00
David Green
882f23caea [ARM] MVE interleaving load and stores.
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.
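
The recognised pattern looks roughly like this (illustrative types): a wide
load whose result is deinterleaved by shuffles, which the pre-isel pass turns
into a vld2 intrinsic:
  %wide = load <8 x i32>, <8 x i32>* %ptr, align 4
  %even = shufflevector <8 x i32> %wide, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %odd  = shufflevector <8 x i32> %wide, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>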

The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.

We may need some other future adjustments, such as VLD4 taking up half the
available registers and so perhaps costing more. This patch should get the
basics in though.

Differential Revision: https://reviews.llvm.org/D69392
2019-11-19 18:37:30 +00:00
David Green
411bfe476b [ARM] Add and update a lot of VLDn tests. NFC 2019-11-19 18:37:30 +00:00
Sjoerd Meijer
71327707b0 [ARM][MVE] tail-predication
This is a follow-up of d90804d, to also flag fcmp instructions as instructions
that we do not support in tail-predicated vector loops.

Differential Revision: https://reviews.llvm.org/D70295
2019-11-15 11:01:13 +00:00
Sjoerd Meijer
cb47b87830 [LV] PreferPredicateOverEpilog respecting predicate loop hint
The vectoriser queries TTI->preferPredicateOverEpilogue to determine if
tail-folding is preferred for a loop, but it was not respecting the loop hint
'predicate' that can disable this; that is now respected. This also showed that
we were incorrectly initialising the loop hint 'vectorize.predicate.enable'
with 0 (i.e. FK_Disabled) when it should have been FK_Undefined, which has been
fixed.

Differential Revision: https://reviews.llvm.org/D70125
2019-11-14 13:10:44 +00:00
Sjoerd Meijer
d90804d26b [ARM][MVE] canTailPredicateLoop
This implements TTI hook 'preferPredicateOverEpilogue' for MVE.  This is a
first version and it operates on single block loops only. With this change, the
vectoriser will now determine if tail-folding scalar remainder loops is
possible/desired, which is the first step to generate MVE tail-predicated
vector loops.

This is disabled by default for now. I.e., this depends on the option
-disable-mve-tail-predication, which is off by default.

I will follow up on this soon with a patch for the vectoriser to respect loop
hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled,
we don't want to tail-fold and we shouldn't query this TTI hook, which is
done in D70125.

Differential Revision: https://reviews.llvm.org/D69845
2019-11-13 13:24:33 +00:00
Sjoerd Meijer
6c2a4f5ff9 [TTI][LV] preferPredicateOverEpilogue
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.

While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vectorize_predicate(disable)"
  and its corresponding loop hint.

Differential Revision: https://reviews.llvm.org/D69040
2019-11-06 10:14:20 +00:00
Sam Parker
39af8a3a3b [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.
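
For example (illustrative names), a narrow masked load whose result is
zero-extended can now be selected as a single zero-extending masked load that
accesses v4i8 memory and produces v4i32:
  %narrow = call <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>* %p, i32 1, <4 x i1> %m, <4 x i8> zeroinitializer)
  %wide   = zext <4 x i8> %narrow to <4 x i32>

  declare <4 x i8> @llvm.masked.load.v4i8.p0v4i8(<4 x i8>*, i32 immarg, <4 x i1>, <4 x i8>)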

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085
2019-10-17 07:55:55 +00:00
Sjoerd Meijer
c2bafadd7a [LV] Add ARM MVE tail-folding tests
Now that the vectorizer can do tail-folding (rL367592), and the ARM backend
understands MVE masked loads/stores (rL371932), it's time to add the MVE
tail-folding equivalent of the X86 tests that I added.

llvm-svn: 371996
2019-09-16 14:56:26 +00:00
David Green
b325c05732 [ARM] Masked loads and stores
Masked loads and stores fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc, and so is
currently behind an option.

The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.
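
Sketched with assumed names, a non-zero/non-undef passthru is matched as a
zero-passthru masked load followed by a select on the same mask:
  %ld  = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> %m, <4 x i32> zeroinitializer)
  %res = select <4 x i1> %m, <4 x i32> %ld, <4 x i32> %passthru

  declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32 immarg, <4 x i1>, <4 x i32>)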

Differential Revision: https://reviews.llvm.org/D67186

llvm-svn: 371932
2019-09-15 14:14:47 +00:00
David Green
8c2c5f5045 [ARM] Don't pretend we know how to generate MVE VLDn
We don't yet know how to generate these instructions for MVE. And in the case
of VLD3, we don't even have the instruction. For the moment, don't tell the
vectoriser that we have VLD4 only to end up serialising the results.

Differential Revision: https://reviews.llvm.org/D66009

llvm-svn: 369101
2019-08-16 13:06:49 +00:00
David Green
44f8d635e2 [ARM] Permit auto-vectorization using MVE
With enough codegen complete, we can now correctly report the number and size
of vector registers for MVE, allowing auto vectorisation. This also allows FP
auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE
FP arithmetic and parity between scalar and vector FP behaviour.

Patch by David Sherwood.

Differential Revision: https://reviews.llvm.org/D63728

llvm-svn: 368529
2019-08-11 08:42:57 +00:00
Fangrui Song
ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Daniel Neilson
9e4bbe801a [LV] Preserve inbounds on created GEPs
Summary:
This is a fix for PR23997.

The loop vectorizer is not preserving the inbounds property of GEPs that it creates.
This is inhibiting some optimizations. This patch preserves the inbounds property in
the case where a load/store is being fed by an inbounds GEP.
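
An illustration of the property (names assumed): when the scalar GEP feeding
the memory access is inbounds, the GEP the vectorizer creates for the wide
access keeps the inbounds flag:
  ; scalar GEP in the original loop:
  %gep = getelementptr inbounds float, float* %a, i64 %iv
  ; GEP created by the vectorizer for the wide load, now also inbounds:
  %wide.gep = getelementptr inbounds float, float* %a, i64 %index
  %wide.ptr = bitcast float* %wide.gep to <4 x float>*
  %wide.ld  = load <4 x float>, <4 x float>* %wide.ptr, align 4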

Reviewers: mkuper, javed.absar, hsaito

Reviewed By: hsaito

Subscribers: dcaballe, hsaito, llvm-commits

Differential Revision: https://reviews.llvm.org/D46191

llvm-svn: 331269
2018-05-01 15:35:08 +00:00
Sanjay Patel
8529dd5ee1 [ARM] add loop vectorizer test based on 482.sphinx3 from SPEC2006; NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

llvm-svn: 326221
2018-02-27 18:33:24 +00:00
Teresa Johnson
c12306c0ad Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."
This still breaks PPC tests we have. I'll forward reproduction
instructions to dehao.

llvm-svn: 306936
2017-07-01 03:24:09 +00:00
Teresa Johnson
eb4fba9d61 re-commit r306336: Enable vectorizer-maximize-bandwidth by default.
Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306935
2017-07-01 03:24:08 +00:00