Commit Graph

653 Commits

Author SHA1 Message Date
Florian Hahn
8f781b96e2 Revert "[VPlan] Mark recurrence recipes as not having side-effects."
This reverts commit 02369b75fd.

At the moment, live-outs used *only* for the resume values in the scalar
loop are not modeled in VPlan yet. This means first-order recurrence
recipes could be removed, when a scalar epilogue is required and the
only use of a FOR is outside the loop.

Keep treating recurrence recipes as having side-effects for now, to
avoid them being removed.

Fixes #62954.
2023-06-06 11:35:26 +02:00
Florian Hahn
f47084ecfb [LV] Use force-vector-width for X86 recurrence test.
This makes sure that all tests that can be vectorized in the file are
vectorized.
2023-06-06 11:27:35 +02:00
Florian Hahn
4c51a45e80 [LV] Add test for #62954. 2023-06-06 11:20:22 +02:00
Florian Hahn
572cfa3fde [LV] Use SCEV for uniformity analysis across VF
This patch uses SCEV to check if a value is uniform across a given VF.

The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the vector lane 0
(offset 0) compare it to the expressions for lanes 1 to the last vector
lane (VF - 1). If they are equal, consider the expression uniform.

While re-writing expressions, we also need to catch expressions we
cannot determine uniformity (e.g. SCEVUnknown).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D148841
2023-05-31 16:01:00 +01:00
Tobias Hieta
f84bac329b [NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4e
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Florian Hahn
3d4eed0133 [LV] Reuse SCEV expansion results for epilogue vectorization.
When generating code for the epilogue vector loop, we need to re-use the
expansion results for induction steps generated for the main vector
loop, as the pre-header of the epilogue vector loop may not dominate the
vector preheader of the epilogue.

This fixes a reported crash. Note that this is a workaround which should
be removed soon once induction resume value creation is handled in VPlan
directly.
2023-05-11 22:00:07 +01:00
Philip Reames
cb3cb417a0 [LV] Refresh some auto-gen tests to reduce diff [nfc] 2023-05-01 13:55:11 -07:00
Noah Goldstein
d840391401 [ValueTracking] Add logic for isKnownNonZero(smin/smax X, Y)
For `smin` if either `X` or `Y` is negative, the result is non-zero.
For `smax` if either `X` or `Y` is strictly positive, the result is
non-zero.

For both if `X != 0` and `Y != 0` the result is non-zero.

Alive2 Link:
    https://alive2.llvm.org/ce/z/7yvbgN
    https://alive2.llvm.org/ce/z/zizbvq

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149417
2023-04-30 10:06:46 -05:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
David Green
1869a9c225 [LV] Use the known trip count when costing non-tail folded VFs
Now that we store the ScalarCost in the VectorizationFactor it is possible to
use it to get a slightly more accurate cost in isMoreProfitable between two
vector factors. This extends the logic added in D101726 to non-tail-folded
cases, using the costs of `VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF)`
to compare VFs where the TripCount is known and we are not folding the tail.

This shouldn't alter very much as small trip counts are usually not vectorized,
but does seem to help in the testcase where 4 * VF4 is chosen as profitable
compared to 2 * VF8 + 4 * scalar.

Differential Revision: https://reviews.llvm.org/D147720
2023-04-24 22:02:30 +01:00
Nikita Popov
2ec1d0f427 [InstCombine] Don't reassociate GEPs for loop invariance
Since D146813, LICM will reassociate GEPs to expose hoisting
opportunities itself. Don't perform this transform in InstCombine,
where it is fragile because it depends on an optional LoopInfo
analysis.
2023-04-18 12:17:07 +02:00
Craig Topper
4b47d875a1 [LV] Optimize trip count SCEV.
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.

The code here is based on similar code I found in
LoopIdiomRecognize.

This is the vectorizer version of this InstCombine patch D142783.
Looking at the IR diffs, this does look like it gets more cases
than the InstCombine patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D147355
2023-04-12 16:17:58 -07:00
Florian Hahn
35af27c30a [VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Nikita Popov
53f7f85703 [LoopVectorize] Convert some tests to opaque pointers (NFC) 2023-04-06 09:38:47 +02:00
Florian Hahn
0d61ffd350 [Loads] Support SCEVAddExpr as start for pointer AddRec.
Extend handling to support `%base + offset` as start for AddRecs in
isDereferenceableAndAlignedInLoop. This is done by adjusting AccessSize
by the offset and effectively checking if the full object starting from
%base to %base + offset + access-size is dereferenceable.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D147260
2023-04-02 12:33:44 +01:00
Anna Thomas
4277d932ef [LV] Use speculatability within entire loop to avoid strided load predication
Use existing functionality for identifying total access size by strided
loads. If we can speculate the load across all vector iterations, we can
avoid predication for these strided loads (or masked gathers in
architectures which support it).

Differential Revision: https://reviews.llvm.org/D145616
2023-03-21 12:08:25 -04:00
Sanjay Patel
ef6f23535d Revert "[InstCombine] use loop info when running the pass after loop vectorization"
This reverts commit 43ae4b62b2.

This was intended to be practically NFC in terms of the overall
opt pipeline, but there is experimental data showing that code
changes occurred here:
https://llvm-compile-time-tracker.com/compare.php?from=772aa05452f8ff90a47168e6801cda2acb5a1873&to=43ae4b62b2671cf73e691c0b53324cd39405cd51&stat=size-text
2023-03-11 17:28:56 -05:00
Sanjay Patel
43ae4b62b2 [InstCombine] use loop info when running the pass after loop vectorization
This is the follow-up to D144199 and suggestion from D144045.
We make use of loop info explicit via InstCombine pass parameter
rather than semi-arbitrary via caching.

The only InstCombine transform that uses LoopInfo currently is a
GEP fold in visitGEPOfGEP(), so that shows up as a failure in the
dedicated test for the fold as well as several LoopVectorizer tests
that run extra passes.

I don't see any pass manager regression tests that actually check
for pass options, but this is intended to be NFC for the pass
pipeline behavior - we only try to use loop info where it would
have been used before via caching .

Differential Revision: https://reviews.llvm.org/D144274
2023-03-11 14:20:30 -05:00
Anna Thomas
ac4c0ea73b [Tests] Precommit tests for D145616 2023-03-08 17:30:53 -05:00
Florian Hahn
79272ec028 [VPlan] Add predicate to VPReplicateRecipe, expand region later.
This patch adds the predicate as additional operand to VPReplicateRecipe
during initial construction. The predicated recipes are later moved into
replicate regions. This simplifies constructions and some VPlan
transformations, like fixed-order recurrence handling.

It also improves codegen in some cases (e.g. for in-loop reductions),
because the recipes remain in the same block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D143865
2023-03-08 20:11:28 +01:00
Florian Hahn
7019624ee1 [SCEV] Strengthen nowrap flags via ranges for ARs on construction.
At the moment, proveNoWrapViaConstantRanges is only used when creating
SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by
strengthening flags after creating the AddRec.

I'll also share a follow-up patch that removes the code to strengthen
flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while
creating those can lead to surprising changes.

Compile-time looks neutral:
https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u

Reviewed By: mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D144050
2023-03-07 17:10:34 +01:00
sgokhale
4f9a5447c6 [LV] Reland "Update logic for calculating register usage due to invariants"
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 17:32:39 +05:30
sgokhale
3c8ddbde37 Revert "[LV] Update logic for calculating register usage due to invariants"
Observing test failure for llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll

This reverts commit d162826694.
2023-02-28 15:46:59 +05:30
sgokhale
d162826694 [LV] Update logic for calculating register usage due to invariants
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 11:05:26 +05:30
Florian Hahn
a9ac22b501 [LV] Add users for loads to make tests more robust.
Update a few tests to add users to loads to avoid them being optimized
out by future changes. In cases the unused loads didn't matter for the
test, remove them.
2023-02-04 20:42:57 +00:00
Craig Topper
df76ff98e8 [InstCombine][LV] Fold (add (zext (add X, -1)), 1) -> (zext X) if X is non-zero.
This artifact can appear from the vectorizer. (add X, -1) is the
backedge taken count. It gets zero extended and then 1 is added to
it to get the trip count.

There is usually a dominating branch that rules out X being zero.

Alive: https://alive2.llvm.org/ce/z/NsRDwX
2023-01-30 17:45:01 -08:00
Nikita Popov
9ed2f14c87 [AsmParser] Remove typed pointer auto-detection
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.

The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.

Differential Revision: https://reviews.llvm.org/D141912
2023-01-18 09:58:32 +01:00
Paul Walker
eae26b6640 [IRBuilder] Use canonical i64 type for insertelement index used by vector splats.
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.

NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.

Differential Revision: https://reviews.llvm.org/D140983
2023-01-11 14:08:06 +00:00
Florian Hahn
78914e8c32 [VPlan] Keep entries in worklist in sinkScalarOperands.
Not removing the entries ensures that duplicates are avoided,
reducing the number of iterations.
2023-01-08 15:52:57 +00:00
Florian Hahn
68469a80cb [LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
Nikita Popov
2fab927546 [LoopVectorize] Convert some tests to opaque pointers (NFC)
Check lines for some of these tests were regenerated. The difference
is that with opaque pointers SCEVExpander always emits i8 GEPs,
making the address calculation explicit. This is a known problem
that will be solved long term by making all address calculations
explicit.
2023-01-04 17:25:42 +01:00
Florian Hahn
f69ac9a22d [LV] Support widened induction variables in epilogue vectorization.
Code generation now uses the start VPValue of induction recipes.

This makes it possible to adjust the start value of the epilogue
vector loop to use the 'resume' value of the main vector loop.

Fixes #59459.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D92132
2022-12-21 13:58:50 +00:00
Florian Hahn
d58f670788 [LV] Add test for #59459. 2022-12-21 13:23:25 +00:00
Nikita Popov
5b40015063 [LoopVectorize] Convert some tests to opaque pointers (NFC)
For these tests update_test_checks.py had to be rerun.
2022-12-14 15:27:31 +01:00
Nikita Popov
7d7577256b [LoopVectorize] Convert some tests to opaque pointers (NFC) 2022-12-14 15:16:59 +01:00
Bjorn Pettersson
51ee10747d [test] Remove duplicate RUN lines
A few more that I missed in commit 3528e63d89.

There could be more duplicates remaining, since I've only focused
on exactly duplicated "RUN: opt" lines (ignoring multi line RUN
lines ending with '\').
2022-12-08 12:47:24 +01:00
Roman Lebedev
1e08a08a87 [NFC] Port all LoopVectorize tests to -passes= syntax 2022-12-08 02:38:47 +03:00
Roman Lebedev
be51fa4580 [NFC] Port all runlines for LoopVectorize pass tests to -passes syntax 2022-12-05 22:17:30 +03:00
Matt Arsenault
a74c5707be Fix some test files with executable permissions 2022-12-02 17:12:03 -05:00
Florian Hahn
ed2fdace89 [LV] Use separate index to access StoredValues in vectorizeInterleave.
StoredValues only has entries for members of the interleave group. If
there are gaps, then using the index i here will either access a wrong
entry or be out-of-bounds.

Instead use a dedicated index that only gets incremented for members of
the interleave group.

Fixes #59090.
2022-11-25 15:28:05 +00:00
Roman Lebedev
8e37b53360 [X86] Rewrite getScalarizationOverhead()
All of our insert/extract ops work on 128-bit lanes.

For `Insert`, we need to extract affected 128-bit lane,
unless it's being fully overwritten (FIXME: do we need to be
careful about legalization-induced padding that we obviously don't demand?),
perform insertions, and then insert the 128-bit lane back.

But hold on. If we are operating on an 256-bit legal vector,
and thus have two 128-bit subvectors, and are fully overwriting them both,
we don't actually need to insert *both* subvectors,
only the second one, into the implicitly-widened first one.

Also, `Insert` wasn't actually querying the costs,
but just assuming them to be `1`.

`getShuffleCost(TTI::SK_ExtractSubvector)` notes:
```
  // Note that in general, the insertion starting at the beginning of a vector
  // isn't free, because we need to preserve the rest of the wide vector.
```
... so as far as i can tell, we didn't account for that.

I was hoping this would allow vectorization at a higher VF at one case i looked at,
but the subvector insertion cost is still dis-advising that.

The change for `Extract` is NFC, and is for consistency only,
i wanted to get rid of of that weird explicit discounting of insertion of 0'th element,
since the general code should already deal with that.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137913
2022-11-15 21:07:12 +03:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Arthur Eubanks
d3d8465446 [opt] Stop treating alias analysis specially when translating legacy opt syntax
I've attempted to keep AA tests as close to their original intent as possible.
2022-10-07 11:50:43 -07:00
Philip Reames
32dc1151e2 [VPlan] Only generate single instr for unpredicated stores of varying value to invariant address
This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.)

This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.)

Differential Revision: https://reviews.llvm.org/D133580
2022-09-22 08:53:46 -07:00
Simon Pilgrim
e030be64d8 [CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates
This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad.

This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-22 11:24:11 +01:00
Simon Pilgrim
b2cd8118d0 [CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions
This was achieved with the 'cost-tables vs llvm-mca' script D103695
2022-09-22 10:19:23 +01:00
Philip Reames
8c46881a53 [TTI] Recognize fp constants in getOperandInfo
We were recognizing vectors of floats, but not scalars.  That's a tad odd.
2022-09-21 14:34:34 -07:00
Simon Pilgrim
71162ad957 [LoopVectorize] Fix test name - the test is for fshl not cttz intrinsic costs 2022-09-21 15:24:43 +01:00
Simon Pilgrim
09cb9fdef9 [InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635)
Alive2: https://alive2.llvm.org/ce/z/sZ6wwS

As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero.

Differential Revision: https://reviews.llvm.org/D134172
2022-09-20 16:44:41 +01:00
Vitaly Buka
bbef90ace4 [IRBuilder] Use PoisonValue in CreateMasked*
Followup to 72b776168c

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D133967
2022-09-19 11:01:41 -07:00