Commit Graph

1929 Commits

Author SHA1 Message Date
Florian Hahn
78914e8c32 [VPlan] Keep entries in worklist in sinkScalarOperands.
Not removing the entries ensures that duplicates are avoided,
reducing the number of iterations.
2023-01-08 15:52:57 +00:00
Craig Topper
e5a71a41d8 [RISCV] Add support for the vscale_range attribute.
This is based on @frasercrmck's D107290. At least some of the clang
portion of D107290 has already been committed.

This uses vscale_range for min/max vector width unless the command
line overrides are used.

As a follow up, I plan to add a max or exact VLEN option to clang
to control the vscale_range. This will eliminate many of the reasons
for users to use the overrides through the -mllvm interface.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139873
2023-01-06 08:20:37 -08:00
Nikita Popov
5867241eac [Transforms] Convert some tests to opaque pointers (NFC) 2023-01-06 12:14:45 +01:00
Florian Hahn
68469a80cb [LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
David Green
586fd86b0a [LoopVectorizer] Fix inloop reductions mask placement
The validation of vplans could fail if an inloop reduction was created
with a block-in mask that did not dominate the reduction. This makes
sure that the insert point is set when creating the mask, to ensure it
dominates the reduction.

Differential Revision: https://reviews.llvm.org/D141003
2023-01-05 11:37:37 +00:00
Augie Fackler
0676156f81 Revert "[VPlan] Also consider operands of sink candidates in same block."
This reverts commit aa2414729e.

Previously-valid IR from a tensorflow test case (as shown on the
Diffusion revision for aa2414729e) started
hanging in the loop-vectorize pass. Reverting to keep everyone working.
2023-01-04 16:17:13 -05:00
Nikita Popov
2fab927546 [LoopVectorize] Convert some tests to opaque pointers (NFC)
Check lines for some of these tests were regenerated. The difference
is that with opaque pointers SCEVExpander always emits i8 GEPs,
making the address calculation explicit. This is a known problem
that will be solved long term by making all address calculations
explicit.
2023-01-04 17:25:42 +01:00
David Green
11f3308ca2 [NFC] Regenerate reduction-inloop.ll check lines. NFC 2023-01-04 16:02:20 +00:00
Florian Hahn
aa2414729e [VPlan] Also consider operands of sink candidates in same block.
Even if the the sink candidate is already in the target block, its
operands can be candidates for sinking. Queue them up as well. Also
moves the queuing logic to a helper.
2022-12-30 18:24:35 +00:00
Florian Hahn
f5c766ba14 [LV] Convert a few tests to use opaque pointers (NFC). 2022-12-27 23:01:41 +00:00
Florian Hahn
e91e62db14 [LV] Sink scalar operands and merge regions repeatedly.
Merging regions can enable new sinking opportunities (e.g. if users of a
scalar value are moved from different VPBBs into the same VPBB). Sinking
in turn can also enable new merging opportunities (e.g. if a recipe
between to merge-able regions is moved.

To enable more sinking opportunities, repeat sinking & merging if
regions could be merged.

Also fix mergeReplicateRegions to return the correct Changed status.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139788
2022-12-27 18:08:32 +00:00
Florian Hahn
36d70a6aea [VPlan] Remove redundant blocks by merging them into predecessors.
Add and run VPlan transform to fold blocks with a single predecessor
into the predecessor. This remove redundant blocks and addresses a TODO
to replace special handling for the vector latch VPBB.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139927
2022-12-26 22:47:09 +00:00
Florian Hahn
9758242046 [LV] Use SCEV to check if the trip count <= VF * UF.
Just comparing constant trip counts causes LV to miss cases where the
vector loop body only executes once.

The motivation for this is to remove the need for unrolling to remove
vector loop back-edges, if the body only executes once in more cases.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133017
2022-12-24 18:34:54 +00:00
Florian Hahn
e1650c8d52 [LV] Move exit cond simplification to separate transform.
This sets the stage for D133017 by moving out the code that performs
VPlan based simplifications to a separate transform that takes the
chosen VF & UF as arguments.

The main advantage is that this transform runs before any changes to
the CFG are being made. This allows using SCEV without worrying about
making queries while the IR is in an incomplete state.

Note that this patch switches the reasoning to use SCEV, but still only
simplifies loops with constant trip counts. Using SCEV here is needed to
access the backedge taken count, because the trip count IR value has not
been created yet.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D135017
2022-12-23 12:51:21 +00:00
Paul Walker
0bca44680a [InstCombine] Bubble vector.reverse of binop operands to their result.
This mirrors a similar shufflevector transformation so the same
effect is obtained for scalable vectors. The transformation is
only performed when it can be proven the number of resulting
reversals is not increased. By bubbling the reversals from operand
to result this should typically be the case and ideally leads to
back-back shuffles that can be elimitated entirely.

Differential Revision: https://reviews.llvm.org/D139342
2022-12-21 15:53:14 +00:00
Paul Walker
362c52ad5a [InstCombine] Bubble vector.reverse of compare operands to their result.
This mirrors a similar shufflevector transformation so the same
effect is obtained for scalable vectors. The transformation is
only performed when it can be proven the number of resulting
reversals is not increased. By bubbling the reversals from operand
to result this should typically be the case and ideally leads to
back-back shuffles that can be elimitated entirely.

Differential Revision: https://reviews.llvm.org/D139340
2022-12-21 15:53:14 +00:00
Florian Hahn
f69ac9a22d [LV] Support widened induction variables in epilogue vectorization.
Code generation now uses the start VPValue of induction recipes.

This makes it possible to adjust the start value of the epilogue
vector loop to use the 'resume' value of the main vector loop.

Fixes #59459.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D92132
2022-12-21 13:58:50 +00:00
Florian Hahn
d58f670788 [LV] Add test for #59459. 2022-12-21 13:23:25 +00:00
Nikita Popov
88419a30a0 [LICM] Allow load-only scalar promotion in the presence of aliasing loads
During scalar promotion, if there are additional potentially-aliasing
loads outside the promoted set, we can still perform a load-only
promotion. As the stores are retained, any potentially-aliasing
loads will still read the correct value.

This increases the number of load promotions in llvm-test-suite by
a factor of two:

                                |  Old |  New
    licm.NumPromotionCandidates | 4448 | 6038
    licm.NumLoadPromoted        |  479 | 1069
    licm.NumLoadStorePromoted   | 1459 | 1459

Unfortunately, this does have some impact on compile-time:
http://llvm-compile-time-tracker.com/compare.php?from=57f7f0d6cf0706a88e1ecb74f3d3e8891cceabfa&to=72b811738148aab399966a0435f13b695da1c1c8&stat=instructions
In part this is because we now have less early bailouts from
promotion, but also due to second order effects (e.g. for one case
I looked at we spend more time in SLP now).

Differential Revision: https://reviews.llvm.org/D133192
2022-12-20 10:02:46 +01:00
Florian Hahn
c6bbf05a02 [LV] Convert some tests to use opaque pointers (NFC). 2022-12-19 20:55:44 +00:00
Florian Hahn
cf8d8a33c6 [LV] Convert some tests to use opaque pointers (NFC). 2022-12-19 20:44:44 +00:00
Roman Lebedev
4def99e642 [InstCombine] Try to fold not into cmp iff other users of cmp are freely invertible
There is still some such patterns that require collaboration
of folds to handle,that we don't currently do.
2022-12-19 00:24:28 +03:00
Florian Hahn
3d3634e8bd [LV] Add extra test for D139927. 2022-12-14 22:47:05 +00:00
Florian Hahn
e898479f2b [VPlan] Sink non-uniform recieps for scalar plans.
In scalar plans, replicate recipes will only generate a single value per
UF, independent of whether they are uniform or not. So don't consider
uniformity for plans with scalar VFs only.

This allows us to handle a few additional cases in VPlan sinking instead
of non-VPlan sinkScalarOperands.

Depends on D133762.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D134218
2022-12-14 17:55:31 +00:00
Nikita Popov
5b40015063 [LoopVectorize] Convert some tests to opaque pointers (NFC)
For these tests update_test_checks.py had to be rerun.
2022-12-14 15:27:31 +01:00
Nikita Popov
7d7577256b [LoopVectorize] Convert some tests to opaque pointers (NFC) 2022-12-14 15:16:59 +01:00
Philip Reames
b0f904b6da [LV] Account for minimum vscale when rejecting scalable vectorization of short loops
The vectorizer has code to reject scalable vectorization of loops with very short trip counts, and instead use fixed length vectors. The current code doesn't account for the minimum vscale value known, and thus under estimates the number of lanes in the scalable type for RISCV's default configuration. This results in use of predication and a trivially dead loop where a single straight line piece of code would suffice.

Note that the code quality of the original scalable vectorization could (and probably should) be improved other ways as well. This patch is solely about whether the scalable vectorization was the right choice to begin with.

This bit of code - both with and without my change - does make the unchecked assumption that the target knows how to lower fixed length vectors whose length is provably less than the vector length.

Differential Revision: https://reviews.llvm.org/D137285
2022-12-09 11:29:41 -08:00
liqinweng
6efb45f5ab [AARCH64][CostModel] Modified the cost of mask vector load/store
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D134413
2022-12-09 14:11:21 +08:00
Sanjay Patel
05dbdb0088 Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try)"
This reverts commit e71b81cab0.

As discussed in the planned follow-on to this patch (D138874),
this and the subsequent patches in this set can cause trouble for
the backend, and there's probably no quick fix. We may even
want to canonicalize in the opposite direction (towards insertelt).
2022-12-08 14:16:46 -05:00
Bjorn Pettersson
51ee10747d [test] Remove duplicate RUN lines
A few more that I missed in commit 3528e63d89.

There could be more duplicates remaining, since I've only focused
on exactly duplicated "RUN: opt" lines (ignoring multi line RUN
lines ending with '\').
2022-12-08 12:47:24 +01:00
Bjorn Pettersson
3528e63d89 [test] Remove duplicate RUN lines in Transform tests 2022-12-08 11:47:16 +01:00
Roman Lebedev
1e08a08a87 [NFC] Port all LoopVectorize tests to -passes= syntax 2022-12-08 02:38:47 +03:00
Roman Lebedev
be51fa4580 [NFC] Port all runlines for LoopVectorize pass tests to -passes syntax 2022-12-05 22:17:30 +03:00
Florian Hahn
37809c867a [VPlan] Support sinking VPScalarIVStepsRecipe.
This patch extends VP-based sinking to also sink VPScalarStepsRecipe.
This takes us a step closer towards retiring the IR based sinking.

The main change is extending VPScalarIVStepsRecipe::execute to support
executing in a replicate-region.

Depends on D133758.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133760
2022-12-04 22:59:17 +00:00
Florian Hahn
fb84dad58b [LV] Update test to use use variables in CHECK lines.
This makes the test more robust with respect to value numbering which
will change with future changes.
2022-12-04 11:59:00 +00:00
Matt Arsenault
a74c5707be Fix some test files with executable permissions 2022-12-02 17:12:03 -05:00
Mel Chen
7b5928e4a7 [NFC] Update Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll. 2022-12-02 01:07:16 -08:00
Philip Reames
73eacf94e0 [RISCV] Incorporate LMUL into costs for arithmetic and shuffles
This reuses the routine implemented in 0e6f0b7 to implement several existing TODOs. Many of the operations scale linearly with LMUL; this change represents that in the cost model.

Differential Revision: https://reviews.llvm.org/D139039
2022-12-01 10:46:27 -08:00
Sanjay Patel
e71b81cab0 [InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try)
The first attempt was reverted because a clang test changed
unexpectedly - the file is already marked with a FIXME, so
I just updated it this time to pass.

Original commit message:
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue

The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.

In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask

Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC

The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.

Differential Revision: https://reviews.llvm.org/D138872
2022-11-30 14:52:20 -05:00
Sanjay Patel
5eacdcff06 Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1"
This reverts commit a4c466766d.
This broke clang tests that are wrongly dependent on the optimizer.
2022-11-30 14:10:50 -05:00
Sanjay Patel
a4c466766d [InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1
This is the main patch for converting a truncated scalar that is
inserted into a vector to bitcast+shuffle. We could go either way
on patterns like this, but this direction will allow collapsing a
pair of these sequences on the motivating example from issue

The patch is split into 3 parts to make it easier to see the
progression of tests diffs. We allow inserting/shuffling into a
different size vector for flexibility, so there are several test
variations. The length-changing is handled by shortening/padding
the shuffle mask with undef elements.

In part 1, handle the basic pattern:
inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask

Proof for the endian-dependency behaving as expected:
https://alive2.llvm.org/ce/z/BsA7yC

The TODO items for handling shifts and insert into an arbitrary base
vector value are implemented as follow-ups.

Differential Revision: https://reviews.llvm.org/D138872
2022-11-30 13:22:04 -05:00
Florian Hahn
0c5df7cd2f Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit bf15f1e489.

The updated version fixes a crash by checking the induction kind instead
of the opcode; for integer inductions, the step is always added, but the
opcode might not be set.
2022-11-30 17:04:20 +00:00
Florian Hahn
c7ca21816a [LV] Add test showing crash with 0fa666eced. 2022-11-30 16:51:24 +00:00
Florian Hahn
06ed6edc87 [LV] Update test to use opaque pointers. 2022-11-30 15:21:52 +00:00
William Huang
be4b1dd35b [InstCombine] Revert D125845
Reverting D125845 `[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back` because multiple users reported performance regression

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D138950
2022-11-29 22:02:40 +00:00
Simon Tatham
e45cbf9923 [ARM,MVE] Update MVE_VMLA_qr for architecture change.
In revision B.q and before of the Armv8-M architecture reference
manual, the vector/scalar forms of the `vmla` and `vmlas` instructions
came in signed and unsigned integer forms, such as `vmla.s8 q0,q1,r2`
or `vmlas.u32 q3,q4,r5`.

Revision B.r has changed this. There are no longer signed and unsigned
versions of these instructions, since they were functionally identical
anyway. Now there is just `vmla.i8` (or `i16` or `i32`, and similarly
for `vmlas`). Bit 28 of the instruction encoding, which was previously
0 for signed or 1 for unsigned, is now expected to be 0 always.

This change updates LLVM to the new version of the architecture. The
obsoleted encodings for unsigned integers are now decoding errors, and
only the still-valid encoding is ever emitted. This shouldn't break
any existing assembly code, because the old signed and unsigned
versions of the mnemonic are still accepted by the assembler (which is
standard practice anyway for all signedness-agnostic MVE integer
instructions).

Reviewed By: dmgreen, lenary

Differential Revision: https://reviews.llvm.org/D138827
2022-11-29 08:47:00 +00:00
Florian Hahn
bf15f1e489 Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit 0fa666eced.

This triggers an assertion during AArch64 stage2 builds. Revert while I
investigate.

See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio
2022-11-28 22:43:11 +00:00
Philip Reames
db07d79ab0 [RISCV] Add cost model for integer and float vector arithmetic instructions.
This patch implements getArithmeticInstrCost for RISCV, supports cost
model for integer and float vector arithmetic instructions.

Differential Revision: https://reviews.llvm.org/D133552 (Original patch by jacquesguan.  Subset by me with todos added.)
2022-11-28 09:04:38 -08:00
Florian Hahn
0fa666eced [VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.
This patch splits off the logic to transform the canonical IV to a
a value for an induction with a different start and step. This
transformation only needs to be done once (independent of VF/UF) and
enables sinking of VPScalarIVStepsRecipe as follow-up.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133758
2022-11-28 16:32:31 +00:00
Florian Hahn
12bb5535d2 [VPlan] Move cast codegen to emitTransformedIndex (NFCI).
This reduces duplication a bit.

Suggested as simplification in D133758.
2022-11-26 22:47:13 +00:00