2299 Commits

Author SHA1 Message Date
Graham Hunter
84ebe5b7e8 [LV] Precommit tests for uniform arguments for vector function variants
See https://github.com/llvm/llvm-project/pull/68879
2023-11-20 13:30:25 +00:00
Florian Hahn
7fd021a092 [LV] Don't crash on vector masks during scalar VPReductionRecipe::exec.
VPReductionRecipe may be executed for scalar VFs. Make sure to access
part 0 of the condition, as it could be an active-lane-mask, which is a
vector <1 x i1>

Fixes https://github.com/llvm/llvm-project/issues/72720.
2023-11-18 21:52:22 +00:00
Nilanjana Basu
e2210cefb1 [LV] Pre-committing tests for changing loop interleaving count computation (#70272)
Added tests for evaluating changes to loop interleaving count computation and for removing loop interleaving threshold in subsequent patches.
2023-11-17 17:38:04 -08:00
Florian Hahn
e5e71affb7 [LV] Reverse mask up front, not when creating vector pointer. (#72163)
Reverse mask early on when populating BlockInMask. This will enable
separating mask management and address computation from the memory
recipes in the future and is also needed to enable explicit unrolling in
VPlan.
2023-11-17 13:59:35 +00:00
Nikita Popov
de176d8c54 [SCEV][LV] Invalidate LCSSA exit phis more thoroughly (#69909)
This is an alternative to #69886. The basic problem is that SCEV can look
through trivial LCSSA phis. When the phi node later becomes non-trivial,
we do invalidate it, but this doesn't catch uses that are not covered by
the IR use-def walk, such as those in BECounts.

Fix this by adding a special invalidation method for LCSSA phis, which
will also invalidate all the SCEVUnknowns/SCEVAddRecExprs used by the
LCSSA phi node and defined in the loop.

We should probably also use this invalidation method in other places
that add predecessors to exit blocks, such as loop unrolling and loop
peeling.

Fixes #69097.
Fixes #66616.
Fixes #63970.
2023-11-17 09:34:24 +01:00
Matthias Braun
a9cc6fc280 LoopVectorize: Set branch_weight for conditional branches (#72450)
Consistently add `branch_weights` metadata in any conditional branch
created by `LoopVectorize.cpp`:
- Will only add metadata if the original loop-latch branch had metadata
assigned.
- Most checks should rarely trigger so I am using a 127:1 ratio.
- For the middle block we assume an equal distribution of modulo
results.
2023-11-16 11:33:46 -08:00
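As a rough illustration of the ratios mentioned in the commit above, the sketch below converts a pair of branch weights into a probability and derives middle-block weights under one reading of "equal distribution of modulo results". The function names and the `step` parameter (taken here to mean VF * UF) are assumptions for illustration, not the code in `LoopVectorize.cpp`.

```python
from fractions import Fraction


def branch_probability(taken_weight, not_taken_weight):
    """Probability of the 'taken' edge implied by a pair of branch weights."""
    return Fraction(taken_weight, taken_weight + not_taken_weight)


def middle_block_weights(step):
    """Hypothetical weights for 'no remainder' vs. 'remainder left'.

    Assumes the remainder (trip count modulo step) is uniformly distributed
    over 0 .. step-1, so exactly one outcome in `step` leaves no remainder.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    return (1, step - 1)


# A 127:1 ratio means the rarely-triggering check fails once in 128 runs.
print(branch_probability(1, 127))   # 1/128
print(middle_block_weights(8))      # (1, 7)
```

With a 127:1 ratio the unlikely edge gets a probability of 1/128, which is small enough to keep the check off the hot path without claiming it never fires.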
Florian Hahn
1b82cc1186 [LV] Regenerate check lines for scalable-trunc-min-bitwidth.ll.
Re-generate check lines to reduce diff in follow-up change.
2023-11-16 12:34:40 +00:00
Florian Hahn
95eaaa7d71 [LV] Replace undef with constant and pointer argument in tests.
This makes the tests more defined, prevents uses of the add being folded
and removes UB when loading from undef.
2023-11-16 12:23:17 +00:00
Philip Reames
c05ab7b850 Regenerate a couple of auto-gen tests to reduce diffs in upcoming change [nfc]
2023-11-15 12:33:15 -08:00
Florian Hahn
097ba5366c [VPlan] Use VPTypeInfo in simplifyRecipes.
Replace getTypeForVPValue with the recently added, more general
VPTypeAnalysis.
2023-11-15 15:28:51 +00:00
Yingwei Zheng
dc6d077396 [CVP] Infer nneg on existing zext (#72052)
This patch infers `nneg` flags for existing zext instructions in CVP.
After https://github.com/llvm/llvm-project/pull/71534 and this patch, we
can drop `zext -> zext nneg` transform in `RISCVCodeGenPrepare`:


40671bbdef/llvm/lib/Target/RISCV/RISCVCodeGenPrepare.cpp (L74-L83)

This is an alternative to #72049.
2023-11-13 22:41:37 +08:00
Graham Hunter
b070629c10 [LV] Increase max VF if vectorized function variants exist (#66639)
If there are function calls in the candidate loop and we have vectorized
variants available, try some wider VFs in case the conservative initial
maximum based on the widest types in the loop won't actually allow us
to make use of those function variants.
2023-11-13 10:27:10 +00:00
Florian Hahn
34c2dcd5ac [VPlan] Move initial skeleton construction to createInitialVPlan. (NFC)
This patch moves creating the middle VPBBs and an initial empty
vector loop region for the top-level loop to createInitialVPlan.

This consolidates code to create the initial VPlan skeleton and enables
adding other bits outside the main region during initial VPlan
construction. In particular, D150398 will add the exit check & branch to
the middle block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158333
2023-11-12 13:00:44 +00:00
Florian Hahn
ed6f4994d8 [VPlan] Handle conditional ordered reductions with scalar VFs.
VPReductionRecipe::execute was not handling predicates for ordered
reduction with scalar VFs, which was causing a crash. This patch adds
dedicated handling for scalar VFs when dealing with the condition.
The other operands are already handled in a similar fashion below.

Fixes #70988.
2023-11-11 12:55:40 +00:00
Nikita Popov
5918f62301 [InstCombine] Infer zext nneg flag (#71534)
Use KnownBits to infer the nneg flag on zext instructions.

Currently we only set nneg when converting sext -> zext, but don't set
it when we have a zext in the first place. If we want to use it in
optimizations, we should make sure the flag inference is consistent.
2023-11-08 09:34:40 +01:00
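The idea behind inferring `nneg` from known bits can be sketched with a toy model: if the sign bit of the narrow operand is known to be zero, zero extension and sign extension give the same result, so the flag is safe to set. The helper names and the integer-mask representation of "known zero" bits below are assumptions for illustration and do not mirror the LLVM `KnownBits` API.

```python
def zext(value, from_bits):
    """Zero-extend: interpret the low `from_bits` bits as unsigned."""
    return value & ((1 << from_bits) - 1)


def sext(value, from_bits):
    """Sign-extend: interpret the low `from_bits` bits as two's complement."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return value - (1 << from_bits) if value & sign_bit else value


def can_infer_nneg(known_zero_mask, bit_width):
    """True when the operand's sign bit is known to be zero."""
    return bool(known_zero_mask & (1 << (bit_width - 1)))


# For `x & 0x7F` on i8, bit 7 is known zero, so nneg can be inferred.
print(can_infer_nneg(0x80, 8))          # True
# With a non-negative operand both extensions agree.
print(zext(0x7F, 8) == sext(0x7F, 8))   # True
```

When the sign bit is not known to be zero (for example an operand of `0xFF`), the two extensions differ, which is why the flag cannot be set unconditionally.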
Philip Reames
23099ac239 Add known and demanded bits support for zext nneg (#70858)
zext nneg was recently added to the IR in #67982.   This patch teaches
demanded bits and known bits about the semantics of the instruction, and
adds a couple of test cases to illustrate basic functionality.
2023-11-06 18:47:56 -08:00
Ramkumar Ramachandra
2302e4c327 Reland "VectorUtils: mark xrint as trivially vectorizable" (#71416)
With the recent change 98c90a13 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), it is now possible for
SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint
and llvm.llrint, with vector codegen for the RISC-V target. Make a
trivial change to VectorUtils, and update the corresponding tests.

A couple of important fixes have been landed since the original patch
was landed and reverted, and it is now safe to re-land the patch:
5e1d81a (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and
fd887a3 (LegalizeVectorTypes: fix bug in widening of vec result in
xrint). See also #71399, which proves that lrint and llrint will indeed
produce vector codegen on RISC-V.

Fixes #55208.
2023-11-06 18:49:49 +00:00
Florian Hahn
fd82b5b287 [LV] Support recipes without underlying instr in collectPoisonGenRec.
Support recipes without underlying instruction in
collectPoisonGeneratingRecipes by directly trying to dyn_cast_or_null
the underlying value.

Fixes https://github.com/llvm/llvm-project/issues/70590.
2023-11-03 10:21:14 +00:00
David Sherwood
07f0e75b53 [LoopVectorize] Fix bug with code to hoist runtime checks (#70937)
There was a silly mistake in the expandBounds function that was using
the wrong type when calling expandCodeFor and always assuming the stride
is 64 bits. I've added the following test to defend this fix:

Transforms/LoopVectorize/ARM/mve-hoist-runtime-checks.ll
2023-11-02 10:02:50 +00:00
Ramkumar Ramachandra
ac7c816dc2 Revert "VectorUtils: mark lrint, llrint as trivially vectorizable (#69945)"
This reverts commit 5bfd89bda7.

It was causing build failures on ffmpeg on i686.
2023-11-01 09:57:22 +00:00
Ramkumar Ramachandra
5bfd89bda7 VectorUtils: mark lrint, llrint as trivially vectorizable (#69945)
With the recent change 98c90a13 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), it is now possible for
SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint
and llvm.llrint, with vector codegen for the RISC-V target. Make a
trivial change to VectorUtils, and update the corresponding tests.
2023-10-31 21:29:15 +00:00
Philip Reames
f8742b8d6a [SCEV] Teach SCEVExpander to use zext nneg when possible (#70815)
zext nneg was recently added to the IR in #67982. Teaching SCEVExpander
to emit nneg when possible is valuable since SCEV may have proved
non-trivial facts about loop bounds which would otherwise be lost when
materializing the value.
2023-10-31 09:33:07 -07:00
Philip Reames
6485978120 Refresh a couple of auto-gen tests [nfc]
Reducing spurious diff in an upcoming review.
2023-10-31 07:46:01 -07:00
Ramkumar Ramachandra
562ce8bbd2 LoopVectorize: add negative test for lrint, llrint (#70211)
With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), it is now possible to vectorize
llvm.lrint and llvm.llrint with a trivial change to VectorUtils. In
preparation for this change, and the corresponding test update, add a
negative test for lrint and llrint.
2023-10-31 13:13:26 +00:00
Ramkumar Ramachandra
1d090b8241 LoopVectorize/test: add missing CHECK lines, cleanup intrinsic.ll (#70202)
Clean up intrinsic.ll by removing extraneous attributes and target
datalayout, fix a bug in the copysign_f64 test, and add missing CHECK
lines.
2023-10-31 12:50:46 +00:00
Philip Reames
3f2ed812f0 [InstCombine] Infer nneg on zext when forming from non-negative sext (#70706)
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
2023-10-30 12:09:43 -07:00
Igor Kirillov
70904226e1 [LoopVectorize] Enhance Vectorization decisions for predicate tail-folded loops with low trip counts (#69588)
* Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known
to be predicate tail-folded, delegating to `areRuntimeChecksProfitable`
to decide on the profitability of vectorizing loops with runtime checks.
* Update the `areRuntimeChecksProfitable` function to consider the
`ScalarEpilogueLowering` setting when assessing vectorization of a loop.

With this patch, we can make more informed decisions for loops with low
trip counts, especially when leveraging Profile-Guided Optimization
(PGO) data.
2023-10-30 13:43:26 +00:00
Allen
46cb7e4eea [LoopDist] Update the pragma info of loop distribute, NFC (#69825)
Based on D19403, the exact pragma of distribute is
  `#pragma clang loop distribute`
2023-10-28 17:47:46 +08:00
Florian Hahn
cdc5e00e73 [LV] Add test case to scalarize ptrtoint instructions.
Extra test for https://github.com/llvm/llvm-project/pull/69013
2023-10-27 14:32:54 +01:00
Florian Hahn
cff6652129 [VPlan] Handle VPValues without underlying values in getTypeForVPValue.
Fixes a crash after 0c8e5be6fa.

Full type inference will be added in
https://github.com/llvm/llvm-project/pull/69013
2023-10-27 13:34:54 +01:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is currently not possible is to differentiate a missing
datalayout from `target datalayout = ""` in the IR file, since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Matthias Braun
e3cf80c5c1 BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64-bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:

* Big numbers close to UINT64_MAX are avoided, so users do not overflow/saturate when adding multiple frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* The difference between the hottest and coldest block is spread as much as possible to increase precision.
* If the hot/cold spread cannot be represented, precision is lost at the lower end, but the frequencies at the upper end stay differentiable for hot blocks.
2023-10-24 20:27:39 -07:00
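One way to satisfy the three goals listed in the commit above can be sketched as follows. This is a simplified toy model, not the algorithm in BlockFrequencyInfoImpl: the constant names, the choice to map the hottest block exactly to the largest allowed value, and the clamp to 1 at the cold end are all assumptions for illustration.

```python
from fractions import Fraction

UNUSED_TOP_BITS = 10                                  # headroom named in the commit
MAX_FREQUENCY = (1 << (64 - UNUSED_TOP_BITS)) - 1     # largest value we emit


def to_integer_frequencies(freqs):
    """Scale positive rational frequencies into integers below 2**54.

    The hottest block maps to MAX_FREQUENCY so the hot/cold spread uses as
    much of the range as possible; blocks too cold to represent collapse
    to 1, losing precision only at the lower end.
    """
    if not freqs or any(f <= 0 for f in freqs):
        raise ValueError("expected a non-empty list of positive frequencies")
    scale = Fraction(MAX_FREQUENCY) / max(freqs)
    return [max(1, int(f * scale)) for f in freqs]


scaled = to_integer_frequencies([Fraction(1), Fraction(1, 2), Fraction(1, 2**60)])
print(scaled)
# Summing 1024 copies of the hottest value still fits in 64 bits.
print(1024 * max(scaled) < 2**64)
```

Keeping 10 bits of headroom means a consumer can add up to 1024 maximal frequencies, or multiply one by a factor below 1024, without overflowing an unsigned 64-bit integer.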
Florian Hahn
159614a52f [LV] Use variable instead of value number in vplan-dot-printing.ll test.
2023-10-23 20:25:22 +01:00
Florian Hahn
0c8e5be6fa [VPlan] Simplify redundant trunc (zext A) pairs to A.
Add simplification for redundant trunc(zext A) pairs. Generally apply a
transform from D149903.

Depends on D159200.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D159202
2023-10-22 11:41:38 +01:00
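The `trunc (zext A)` simplification above can be illustrated on a toy expression tree. The `Value` and `Cast` classes and the `simplify` function are hypothetical stand-ins, not VPlan recipes; the sketch only handles the case where the truncated width equals the width of `A`, in which the pair is a no-op.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    name: str
    bits: int


@dataclass(frozen=True)
class Cast:
    op: str          # "zext" or "trunc"
    operand: object  # a Value or another Cast
    bits: int        # result width in bits


def simplify(node):
    """Fold trunc(zext A) to A when the trunc restores A's original width."""
    if isinstance(node, Cast) and node.op == "trunc":
        inner = node.operand
        if (isinstance(inner, Cast) and inner.op == "zext"
                and inner.operand.bits == node.bits):
            return inner.operand
    return node


a = Value("A", 8)
redundant = Cast("trunc", Cast("zext", a, 32), 8)
print(simplify(redundant))   # Value(name='A', bits=8)
```

A truncation to a width different from `A`'s is left untouched here, since it is not a redundant pair in the sense of this sketch.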
Lou Knauer
852bac4439 [VPlan] Support scalable vectors in outer-loop vectorization
This patch enables scalable vectors in the VPlan-native path.
If a vectorization factor is specified via loop vectorization hints,
that factor is used. If no vectorization factor is specified, but the
target prefers scalable vectorization, a scalable vectorization factor
is selected.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D157484
2023-10-20 23:17:35 +01:00
Florian Hahn
2ec7bba77b Recommit "[VPlan] Insert Trunc/Exts for reductions directly in VPlan."
This reverts commit e4ea099748.

The recommit fixes a reported crash by adding a missing check to make
sure the cast recipes are only introduced when vectorizing.

Test coverage added in 3cac608fbd.
Original commit message:
   Update the code to create Trunc/Ext recipes directly in
    adjustRecipesForReductions instead of fixing it up later in
    fixReductions.

    This explicitly models the required conversions and also makes sure they
    are generated at the right place (instead of after the exit condition),
    hence the changes in a few tests.
2023-10-20 14:30:04 +01:00
Graham Hunter
1abc28fea0 [NFC][LV] Add test for vectorizing fmuladd with another call (#68601)
As requested in (#66521)

I confirmed a crash with "return" instead of "continue" in
setVectorizedCallDecision's fmuladd reduction recognition.
2023-10-20 10:23:31 +01:00
Florian Hahn
3cac608fbd [LV] Add interleave only test case with reduction requiring casts.
This adds test coverage for a crash exposed by
d311126349b8fe1684d62154a9fa5a7bbb0b713.
2023-10-19 20:52:21 +01:00
Igor Kirillov
b84977bcc1 Rename test to avoid overlapping with debug output
2023-10-19 12:21:31 +00:00
Fangrui Song
e4ea099748 Revert "[VPlan] Insert Trunc/Exts for reductions directly in VPlan."
This reverts commit fd31112634.

There are two different crash reports on fd31112634
2023-10-18 23:25:31 -07:00
Florian Hahn
fd31112634 [VPlan] Insert Trunc/Exts for reductions directly in VPlan.
Update the code to create Trunc/Ext recipes directly in
adjustRecipesForReductions instead of fixing it up later in
fixReductions.

This explicitly models the required conversions and also makes sure they
are generated at the right place (instead of after the exit condition),
hence the changes in a few tests.
2023-10-17 19:17:40 +01:00
Yingwei Zheng
4718b4011f [LV] Invalidate disposition of SCEV values after loop vectorization (#69230)
This PR fixes the assertion failure of `SE.verify()` after loop vectorization.
2023-10-17 03:49:39 +08:00
Florian Hahn
f7a8a78cb7 [VPlan] Also print operands of canonical IV (NFC).
Also print the operands of VPCanonicalIVPHIRecipe. That was missed
earlier.
2023-10-16 20:28:23 +01:00
Florian Hahn
38f8b7cbe4 [LV] Replace value numbers with patterns in tests (NFC).
Replace some hardcoded value numbers in CHECK lines with patterns, to
make the tests more robust wrt renumbering.
2023-10-16 19:53:44 +01:00
JolantaJensen
afdb18df4d [NFC][AArch64][LV] Reorganise LV tests using symbols from SLEEF (#68207)
The tests introduced by https://reviews.llvm.org/D134719 and later
modified in https://reviews.llvm.org/D146839 are not testing LV in
isolation. This patch:
  1. Ensures that all tests test LV in isolation.
  2. Adds LV tests using llvm intrinsics that have libm mappings.

llrint, llround and lrint are not included, as the IR verifier pass
currently does not allow vector types with them.
2023-10-13 12:10:21 +01:00
Ramkumar Ramachandra
8593c0bc02 LoopVectorize/test: clean up reduction.ll; generate using UTC (NFC) (#68890)
The test reduction.ll was introduced before utils/update_test_checks.py,
and hence contains hand-written CHECK lines. Revisit the test today, and
modernize it by:

- Removing extraneous attributes on functions and their arguments, as
LoopVectorize doesn't even look at these attributes.
- Removing the target datalayout, as it is not essential for
LoopVectorize.

Finally, regenerate the CHECK lines using update_test_checks.py,
eliminating hand-written error-prone CHECK lines.
2023-10-12 15:45:15 +01:00
Nikita Popov
30faaaf626 [LoopVectorize] Regenerate test checks (NFC)
2023-10-12 14:35:23 +02:00
Rin
df8e0d057d [AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF (#67697)
This patch is based off of
https://github.com/llvm/llvm-project/pull/67543.

We are currently using the exact trip count to make decisions regarding
the maximum VF. We can instead use the upper bound TC, which will be the
same as the constant trip count when that is known.
2023-10-09 16:26:19 +01:00
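The effect of using an upper bound on the trip count can be sketched as a clamp on the maximum VF. The function names and the rounding-down-to-a-power-of-two policy below are assumptions for illustration, not the exact logic in the vectorizer.

```python
def bit_floor(n):
    """Largest power of two that is <= n (0 for n == 0)."""
    return 0 if n == 0 else 1 << (n.bit_length() - 1)


def clamp_max_vf(max_vf, upper_bound_trip_count):
    """Limit the maximum VF so one vector iteration cannot exceed the loop.

    `upper_bound_trip_count` is None (or 0) when no bound is known; when the
    trip count is a known constant, the bound equals that constant.
    """
    if not upper_bound_trip_count:
        return max_vf
    return min(max_vf, bit_floor(upper_bound_trip_count))


print(clamp_max_vf(16, 6))      # 4
print(clamp_max_vf(16, None))   # 16
```

Because the upper bound coincides with the constant trip count whenever the latter is known, a clamp of this shape covers the old constant-only case and additionally helps loops where only a bound can be proven.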
Dmitriy Smirnov
e13bed4c5f [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixing uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
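Why the mask length alone is not enough can be shown with a simplified identity-mask check. The function below is a toy single-source model with hypothetical names, not the `ShuffleVectorInst` implementation; `-1` stands for an undefined lane.

```python
def is_identity_mask(mask, num_src_elts):
    """True when the shuffle returns its (single) source vector unchanged.

    A mask shorter or longer than the source changes the vector length,
    so it cannot be an identity even if its lanes are 0, 1, 2, ...
    """
    if len(mask) != num_src_elts:
        return False
    return all(m == -1 or m == i for i, m in enumerate(mask))


# Same mask, different source widths: identity for 4 elements,
# but an extract of the low half for an 8-element source.
print(is_identity_mask([0, 1, 2, 3], 4))   # True
print(is_identity_mask([0, 1, 2, 3], 8))   # False
```

Without the source element count, the second case would be misclassified as an identity, which in turn would skew any cost estimate built on that classification.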