Commit Graph

526 Commits

Author SHA1 Message Date
Nikita Popov
90ba33099c [InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Alexandros Lamprineas
530c72b498 [TLI] Add missing ArmPL mappings (#78474)
Adds TLI mappings for fixed and scalable vector variants of cospi(f),
fmax(f), ilogb(f) and ldexp(f).
2024-01-22 17:15:17 +00:00
Graham Hunter
689da340ed [NFC][LV] Test precommit for interleaved linear args 2024-01-19 12:59:09 +00:00
Alexandros Lamprineas
92289db82f [VFABI] Move the Vector ABI demangling utility to LLVMCore. (#77513)
This fixes #71892 allowing us to check magled names in the IR verifier.
2024-01-17 09:55:30 +00:00
Maciej Gabka
279dfe77da [TLI][AArch64] Add extra SLEEF mappings and tests (#78140)
This patch is adding more scalar to vector mappings to the TLI
for the SLEEF vector library.
The added mappings are for the following functions:

 acosh, asinh, cbrt, copysign, cospi
 erf, erfc, expm1, fdim, fma, fmax, fmin
 hypot, ilogb, ldexp, log1p, nextafter, sinpi.

It also brings back accidentally removed tests for sincospi.
2024-01-16 14:51:38 +00:00
Maciej Gabka
5dbf178154 [TLI][NFC] Fix ordering of ArmPL and SLEEF tests (#77609)
This patch sorts the tests which check if SLEEF and ArmPL mappings are
used, in the order of the math functions base names.
2024-01-12 15:06:25 +00:00
Florian Hahn
51afb10174 [LV] Create block in mask up-front if needed. (#76635)
At the moment, block and edge masks are created on demand, which means
that they are inserted at the point where they are demanded and then
cached. It is possible that the mask for a block is looked up later at a
point that's not dominated by the point where the mask has been
inserted.

To avoid this, create masks up front on entry to the corresponding basic
block and leave it to VPlan simplification to remove unneeded masks.

Note that we need to create masks for all blocks, if any of the blocks
in the loop needs predication, as computing the mask of a block depends
on the masks of its predecessor.

Needed for #76090.

https://github.com/llvm/llvm-project/pull/76635
2024-01-09 10:50:08 +00:00
Alexandros Lamprineas
8c7f10eadb [TLI] Add mappings to SLEEF/ArmPL libcall variants taking linear args. (#76060)
The mappings correspond to vectorized variants (fixed/scalable) for the
math functions: modf, sincos, sincospi.
2024-01-05 11:01:09 +00:00
Florian Hahn
241fe83704 [VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253)
This patch introduces a new ComputeReductionResult opcode to compute the
final reduction result in the middle block. The code from fixReduction
has been moved to ComputeReductionResult, after some earlier cleanup
changes to model parts of fixReduction explicitly elsewhere as needed.

The recipe may be broken down further in the future.

Note that  the phi nodes to merge the reduction result from the trip 
count check and the middle block, to be used as resume value for the
scalar remainder loop are also generated based on 
ComputeReductionResult.

Once we have a VPValue for the reduction result, this can also be
modeled explicitly and moved out of the recipe.
2024-01-04 22:53:18 +00:00
Nilanjana Basu
cd28da390f [LV] Change loops' interleave count computation (#73766)
[LV] Change loops' interleave count computation

A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on a AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose interleaving count (IC) between TC/VF & TC/2*VF (VF = vectorization factor), such that remainder TC for the epilogue loop is minimum while the IC is maximum in case the remainder TC is same for both.

The initial tests for this change were submitted in PRs:
https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.
2024-01-04 12:45:22 +05:30
Alexandros Lamprineas
e512df3ecc [LV] Fix crash when vectorizing function calls with linear args. (#76274)
llvm/lib/IR/Type.cpp:694:
    Assertion `isValidElementType(ElementType) && "Element type of a
    VectorType must be an integer, floating point, or pointer type."'
    failed.
Stack dump:
    llvm::FixedVectorType::get(llvm::Type*, unsigned int)
    llvm::VPWidenCallRecipe::execute(llvm::VPTransformState&)
    llvm::VPBasicBlock::execute(llvm::VPTransformState*)
    llvm::VPRegionBlock::execute(llvm::VPTransformState*)
    llvm::VPlan::execute(llvm::VPTransformState*)
    ...

Happens with function calls of void return type.
2024-01-02 18:14:16 +00:00
Florian Hahn
f18536d642 [VPlan] Model address separately. (#72164)
Move vector pointer generation to a separate VPVectorPointerRecipe.
This untangles address computation from the memory recipes future
and is also needed to enable explicit unrolling in VPlan.

https://github.com/llvm/llvm-project/pull/72164
2024-01-01 19:51:15 +00:00
Alexandros Lamprineas
6c2ad8ac7b [TLI][NFC] Autogenerate vectorized call tests for SLEEF/ArmPL. (#76146)
This patch prepares the ground for #76060.

* Unifies ArmPL and SLEEF tests for better coverage
* Replaces deprecated float* and double* types with ptr
* Adds noalias attribute to pointer arguments
* Adds some cmd-line options to the RUN lines to simplify output
* Removes datalayout since target triple is provided
* Removes checks for return statements
* Refactors the regex filter for autogenerated checks
* Removes redundant test file suffix (already under the AArch64 dir)
2023-12-22 16:29:18 +00:00
Paschalis Mpeis
2349731992 [TLI] Add SLEEFGNUABI mappings for fmod/fmodf fixed-width. (#75803)
Cleanup test sleef-calls-aarch64.ll:
- make the util update script's regex more clear
- eliminate scalar epilogues in tests
2023-12-20 09:08:17 +00:00
Nikita Popov
a5f3415533 [InstCombine] Replace non-demanded undef vector with poison
If an operand (esp to shufflevector or insertelement) is not
demanded, canonicalize it from undef to poison.
2023-12-18 16:12:37 +01:00
Shih-Po Hung
b97c5a9554 [VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
- Replace `undef` pointer operands and add stores to avoid the loads
   being optmized away.
2023-12-14 21:16:11 +08:00
Simon Pilgrim
b7fc78255e Revert rG2047ab00eaf0a17e71ce5e8a5b27a8c90f034c3d "[VPlan] Add a test for testing unused interleave recipes (#75026)"
vplan-unused-interleave-group.ll is causing buildbot failures
2023-12-14 10:25:41 +00:00
Shih-Po Hung
2047ab00ea [VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
   - Replace `undef` pointer operands and add stores to avoid the loads
      being optmized away.
2023-12-14 17:36:58 +08:00
Nilanjana Basu
41a3828838 [LV] Added pre-commit tests for changing loop interleaving count computation (#74689)
Added more pre-commit tests for evaluating changes to loop interleaving count computation in (https://github.com/llvm/llvm-project/pull/73766). The new set of tests address the change in IC computation to minimize the remainder TC of the vectorized loop while maximizing the IC when the
remainder TC is the same.
2023-12-12 11:09:25 +05:30
Florian Hahn
a5891fa4d2 [VPlan] Initial modeling of VF * UF as VPValue. (#74761)
This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPlan::prepareToExecute. Initially, the VF * UF
applies only to the main vector loop region. Once we extend the
scope of VPlan in the future, we may want to associate different VFxUFs
with different vector loop regions (e.g. the epilogue vector loop)

This allows explicitly parameterizing recipes that rely on the
VF * UF, like the canonical induction increment. At the moment, this
mainly helps to avoid generating some duplicated calls to vscale with
scalable vectors. It should also allow using EVL as induction increments
explicitly in D99750. Referring to VF * UF is also needed in other
places that we plan to migrate to VPlan, like the minimum trip count
check during skeleton creation.

The first version creates the value for VF * UF directly in
prepareToExecute to limit the scope of the patch. A follow-on patch will
model VF * UF computation explicitly in VPlan using recipes.

Moved from Phabricator (https://reviews.llvm.org/D157322)
2023-12-08 18:30:30 +00:00
Florian Hahn
5ea6a3fc6d [VPlan] Compute scalable VF in preheader for induction increment. (#74762)
UF * VF is loop invariant and can be computed directly in the preheader.
This prepares the code for #74761 and reduces the test changes.
2023-12-08 12:18:31 +00:00
Graham Hunter
d0d5ef8133 [LV] Add support for linear arguments for vector function variants (#73941)
If we have vectorized variants of a function which take linear
parameters, we should be able to vectorize assuming the strides match.
2023-12-08 10:24:05 +00:00
Nikita Popov
d77067d08a [ValueTracking] Add dominating condition support in computeKnownBits() (#73662)
This adds support for using dominating conditions in computeKnownBits()
when called from InstCombine. The implementation uses a
DomConditionCache, which stores which branches may provide information
that is relevant for a given value.

DomConditionCache is similar to AssumptionCache, but does not try to do
any kind of automatic tracking. Relevant branches have to be explicitly
registered and invalidated values explicitly removed. The necessary
tracking is done inside InstCombine.

The reason why this doesn't just do exactly the same thing as
AssumptionCache is that a lot more transforms touch branches and branch
conditions than assumptions. AssumptionCache is an immutable analysis
and mostly gets away with this because only a handful of places have to
register additional assumptions (mostly as a result of cloning). This is
very much not the case for branches.

This change regresses compile-time by about ~0.2%. It also improves
stage2-O0-g builds by about ~0.2%, which indicates that this change results
in additional optimizations inside clang itself.

Fixes https://github.com/llvm/llvm-project/issues/74242.
2023-12-06 14:17:18 +01:00
Graham Hunter
f0f899932b [LV] Linear argument tests for vectorization of function calls (#73936)
Tests to exercise vectorization of function calls where a vector variant
takes a linear parameter.
2023-12-06 11:55:03 +00:00
Florian Hahn
bbd1941a38 [VPlan] Add disjoint flag to VPRecipeWithIRFlags. (#74364)
A new disjoint flag was added for OR instructions in #72583. 

Update VPRecipeWithIRFlags to also support the new flag. This
allows printing and preserving the disjoint flag in vectorized code.
2023-12-05 15:21:59 +00:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Florian Hahn
cd4348349a [VPlan] Sink cases where no truncate is needed in truncateMinimalBWs.
MinBWs contains entries that specify the minimum required bitwidth. In
some cases, the old and new bitwidths can be equal (see test case) and
in those cases no truncations are needed, so skip those cases.

Fixes #74307.
2023-12-04 15:35:54 +00:00
Florian Hahn
efec4cc501 [LV] Remove unused CHECK lines, remove IR references from test.
Clean up sve-tail-folding-option.ll by removing the unused
CHECK-TF-NEOVERSE-V1 prefix (note the use of non-opaque pointers) and
remove IR value references.
2023-12-04 13:06:30 +00:00
Florian Hahn
c890582912 [VPlan] Account for live-in entries in MinBW used by replicate recipes.
In some cases MinBWs may contain entries for live-ins that are not used
by VPWidenRecipe or VPWidenSelectRecipes. In those cases, the live-ins
won't get processed, so make sure we include them in the count when used
as operands in VPWidenCast and VPWidenSelectRecipe.

Fixes https://github.com/llvm/llvm-project/issues/74231
2023-12-03 11:15:29 +00:00
Craig Topper
7ec4f6094e [InstCombine] Infer disjoint flag on Or instructions. (#72912)
The disjoint flag was recently added to IR in #72583

We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.
2023-12-02 14:11:12 -08:00
Florian Hahn
70535f5e60 [VPlan] Replace IR based truncateToMinimalBitwidths with VPlan version.
This patch replaces the IR based truncateToMinimalBitwidths with a VPlan
version. This has 3 benefits:
1) the VPlan-based version is simpler; we don't need to implement
   special codegen for each supported instruction type like the IR based
   one.
2) Removes a dependency on the cost-model after VPlan execution and
3) Removes a use of getVPValue that uses underlying values after VPlan
   execution (See removed FIXME).

Depends on D149081.

Depends on D149079.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149903
2023-12-02 16:12:38 +00:00
Paschalis Mpeis
1bfb84b477 [NFC][TLI] Improve tests for ArmPL and SLEEF Intrinsics. (#73352)
Auto-generate test `armpl-intrinsics.ll` and simplify tests:
- Eliminate scalar tail with no tail-folding flag.
- Use active lane mask for shorter check lines (no long `shufflevectors`).
- Eliminate scalar loops by providing `noalias` to relevant arguments and
run `simplifycfg` to drop them.
- Update script now use `@llvm.compiler.used` instead of a longer regex.
2023-11-29 11:19:10 +00:00
Graham Hunter
104b7c624e [LV] Add support for uniform parameters on vectorized function variants (#72891)
Parameters marked as uniform take a scalar value, assuming the value is
invariant in the scalar loop.
2023-11-28 15:01:32 +00:00
Nikita Popov
f0faff8b9b [LoopVectorize] Regenerate test checks (NFC) 2023-11-28 15:50:27 +01:00
Craig Topper
03d4a9d94d [InstCombine] Set disjoint flag when turning Add into Or. (#72702)
The disjoint flag was recently added to IR in #72583
2023-11-27 12:54:11 -08:00
pasmpe01
de6c9c84e2 [TLI][AArch64] Add TLI Mappings of @llvm.exp10 for ArmPL and SLEEF.
Update regex to _explicitly_ show which exp versions are added.
The previous regex used `exp[^e]` to avoid matching calls like:
`@llvm.experimental.stepvector`.

Note: ArmPL Mappings for scalable types are not yet utilized
(eg, `llvm.exp10.nxv2f64`, `llvm.exp10.nxv4f32`), as `replace-with-veclib`
pass needs improvements.
2023-11-24 12:24:33 +00:00
Graham Hunter
b1fba568f6 [SVE] Don't require lookup when demangling vector function mappings (#72260)
We can determine the VF from a combination of the mangled name (which
indicates the arguments that take vectors) and the element sizes of
the arguments for the scalar function the mapping has been established
for.

The assert when demangling fails has been removed in favour of just
not adding the mapping, which prevents the crash seen in
https://github.com/llvm/llvm-project/issues/71892

This patch also stops using _LLVM_ as an ISA for scalable vector tests,
since there aren't defined rules for the way vector arguments should be
handled (e.g. packed vs. unpacked representation).
2023-11-23 17:15:48 +00:00
Florian Hahn
32d1197a8f [LV] Use SCEV for subtraction of src/sink for diff runtime checks.
Instead of expanding the src/sink SCEV expressions and emitting an IR
sub to compute the difference, the subtraction can be directly be
performed by ScalarEvolution. This allows the subtraction to be
simplified by SCEV, which in turn can reduced the number of redundant
runtime check instructions generated.

It also allows to generate checks that are invariant w.r.t. an outer
loop, if he inner loop AddRecs have the same outer loop AddRec as start.
2023-11-22 12:48:04 +00:00
Graham Hunter
84ebe5b7e8 [LV] Precommit tests for uniform arguments for vector function variants
See https://github.com/llvm/llvm-project/pull/68879
2023-11-20 13:30:25 +00:00
Nilanjana Basu
e2210cefb1 [LV] Pre-committing tests for changing loop interleaving count computation (#70272)
Added tests for evaluating changes to loop interleaving count computation and for removing loop interleaving threshold in subsequent patches.
2023-11-17 17:38:04 -08:00
Florian Hahn
e5e71affb7 [LV] Reverse mask up front, not when creating vector pointer. (#72163)
Reverse mask early on when populating BlockInMask. This will enable
separating mask management and address computation from the memory
recipes in the future and is also needed to enable explicit unrolling in
VPlan.
2023-11-17 13:59:35 +00:00
Florian Hahn
95eaaa7d71 [LV] Replace undef with constant and pointer argument in tests.
This makes the tests more defined, prevents uses of the add being folded
and remove UB when loading from undef.
2023-11-16 12:23:17 +00:00
Graham Hunter
b070629c10 [LV] Increase max VF if vectorized function variants exist (#66639)
If there are function calls in the candidate loop and we have vectorized
variants available, try some wider VFs in case the conservative initial
maximum based on the widest types in the loop won't actually allow us
to make use of those function variants.
2023-11-13 10:27:10 +00:00
Philip Reames
3f2ed812f0 [InstCombine] Infer nneg on zext when forming from non-negative sext (#70706)
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
2023-10-30 12:09:43 -07:00
Igor Kirillov
70904226e1 [LoopVectorize] Enhance Vectorization decisions for predicate tail-folded loops with low trip counts (#69588)
* Avoid using `CM_ScalarEpilogueNotAllowedLowTripLoop` for loops known
to be predicate tail-folded, delegating to `areRuntimeChecksProfitable`
to decide on the profitability of vectorizing loops with runtime checks.
* Update the `areRuntimeChecksProfitable` function to consider the
`ScalarEpilogueLowering` setting when assessing vectorization of a loop.

With this patch, we can make more informed decisions for loops with low
trip counts, especially when leveraging Profile-Guided Optimization
(PGO) data.
2023-10-30 13:43:26 +00:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Lou Knauer
852bac4439 [VPlan] Support scalable vectors in outer-loop vectorization
This patch enables scalable vectors in the VPlan-native path.
If a vectorization factor is specified via loop vectorization hints,
that factor is used. If no vectorization factor is specified, but the
target preferes scalable vectorization, a scalable vectorization factor
is selected.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D157484
2023-10-20 23:17:35 +01:00
Graham Hunter
1abc28fea0 [NFC][LV] Add test for vectorizing fmuladd with another call (#68601)
As requested in (#66521)

I confirmed a crash with "return" instead of "continue" in
setVectorizedCallDecision's fmuladd reduction recognition.
2023-10-20 10:23:31 +01:00
JolantaJensen
afdb18df4d [NFC][AArch64][LV] Reorganise LV tests using symbols from SLEEF (#68207)
The tests introduced by https://reviews.llvm.org/D134719 and later
modified in https://reviews.llvm.org/D146839 are not testing LV in
isolation. This patch:
  1. Assures that all tests test LV in isolation.
  2. Adds LV tests using llvm intrinsics that have libm mappings.

llrint, llround and lrint are not included as currently IR verifier pass
does not allow to use vector types with them.
2023-10-13 12:10:21 +01:00
Rin
df8e0d057d [AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF (#67697)
This patch is based off of
https://github.com/llvm/llvm-project/pull/67543.

We are currently using the exact trip count to make decisions regarding
the maximum VF. We can instead use the upper bound TC, which will be the
same as the constant trip count when that is known.
2023-10-09 16:26:19 +01:00