Commit Graph

3535 Commits

Author SHA1 Message Date
Alexey Bataev
755282ec1e [SLP][NFC]Move getExtractIndex function for future changes, NFC. 2023-01-09 09:53:01 -08:00
Benjamin Kramer
b6942a2880 [NFC] Hide implementation details in anonymous namespaces 2023-01-08 17:37:02 +01:00
Florian Hahn
78914e8c32 [VPlan] Keep entries in worklist in sinkScalarOperands.
Not removing the entries ensures that duplicates are avoided,
reducing the number of iterations.
2023-01-08 15:52:57 +00:00
Alexey Bataev
996ad44b97 [SLP][NFC]Fix compile build by declaring ArrayRef, NFC.
Fix compiler build reported in https://lab.llvm.org/buildbot#builders/243/builds/218
2023-01-06 17:01:48 -08:00
Alexey Bataev
cc17e93178 [SLP][NFC]Remove unused variables, NFC. 2023-01-06 16:55:54 -08:00
Alexey Bataev
7439e1b2de [SLP]Fix incorrect reordering of clustered scalars.
The new mask represents the order, not the mask itself. It must first be
treated as an order and converted to a mask; only then can the gathered
scalars be reordered to build the correct clustered order.

Differential Revision: https://reviews.llvm.org/D141161
2023-01-06 16:04:09 -08:00
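
The order-versus-mask distinction described in 7439e1b2de can be illustrated with a minimal standalone sketch. This is not the SLP vectorizer's code: the helper names orderToMask and applyMask are hypothetical, and the conventions are assumptions (Order[I] is the destination slot of source element I, a mask selects Result[I] = Source[Mask[I]], and Order is a valid permutation).

```cpp
#include <cstddef>
#include <vector>

// Assumed convention: Order[I] is the destination slot of source element I.
// The returned mask follows the shuffle convention Result[I] = Source[Mask[I]],
// so it is the inverse permutation of Order.
std::vector<int> orderToMask(const std::vector<unsigned> &Order) {
  std::vector<int> Mask(Order.size(), -1); // -1 marks a lane not yet assigned
  for (std::size_t I = 0; I < Order.size(); ++I)
    Mask[Order[I]] = static_cast<int>(I);
  return Mask;
}

// Apply a shuffle-style mask: lane I of the result reads Source[Mask[I]].
template <typename T>
std::vector<T> applyMask(const std::vector<T> &Source,
                         const std::vector<int> &Mask) {
  std::vector<T> Result;
  Result.reserve(Mask.size());
  for (int Index : Mask)
    Result.push_back(Source[static_cast<std::size_t>(Index)]);
  return Result;
}
```

Under these assumptions, the order {2, 0, 1} applied to scalars {a, b, c} converts to the mask {1, 2, 0} and yields {b, c, a}. Using the order directly as if it were a mask would instead yield {c, a, b}, which is the kind of mix-up the commit message describes.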
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement in the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into a poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Florian Hahn
68469a80cb [LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unnecessary
runtime unrolling for vectorized loops, e.g. PR40961.

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
Valery N Dmitriev
6d677c0b3d [SLP] Unify GEP cost modeling for load, store and GEP nodes.
Make a separate routine for GEPs cost calculation and make
the approach uniform across load, store and GEP tree nodes.
Additional issue fixed is GEP cost savings were applied twice
for ScatterVectorize nodes (aka gather load) making them look
unrealistically profitable for vectorization.

Differential Revision: https://reviews.llvm.org/D140789
2023-01-05 10:11:36 -08:00
serge-sans-paille
38818b60c5 Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
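
Items 1 and 2 of 38818b60c5 can be sketched with a small standalone class. This is a hedged illustration, not LLVM code: View is a hypothetical stand-in for llvm::ArrayRef, Symbol stands in for CVSymbol, and the sketch relies on the implicit deduction guides that C++17 generates from constructors rather than the explicit guides added in D140896.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::ArrayRef: a non-owning view of contiguous data.
template <typename T> class View {
  const T *Data = nullptr;
  std::size_t Length = 0;

public:
  View(const T *Data, std::size_t Length) : Data(Data), Length(Length) {}
  View(const T *Begin, const T *End)
      : Data(Begin), Length(static_cast<std::size_t>(End - Begin)) {}
  View(const std::vector<T> &Vec) : Data(Vec.data()), Length(Vec.size()) {}
  std::size_t size() const { return Length; }
};

// Hypothetical stand-in for CVSymbol: constructed from a byte view.
struct Symbol {
  std::size_t Size;
  explicit Symbol(View<std::uint8_t> Bytes) : Size(Bytes.size()) {}
};

std::size_t emptyViewSize(const std::uint8_t *Ptr) {
  // View(Ptr, 0) would be ambiguous here: the literal 0 converts equally well
  // to std::size_t and to a null pointer, so both two-argument constructors
  // match. Giving the length an explicit type selects (pointer, length).
  return View(Ptr, static_cast<std::size_t>(0)).size();
}

std::size_t symbolSize(const std::vector<std::uint8_t> &Storage) {
  // Braces make this unambiguously a variable definition; with parentheses,
  // Symbol Sym(View(Storage)); can be parsed as a function declaration.
  Symbol Sym{View(Storage)};
  return Sym.Size;
}
```

The cast in emptyViewSize plays the role of the (size_t)0 change from item 1, and the braces in symbolSize mirror the CVSymbol sym{ArrayRef(symStorage)} rewrite from item 2.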
David Green
586fd86b0a [LoopVectorizer] Fix inloop reductions mask placement
The validation of vplans could fail if an inloop reduction was created
with a block-in mask that did not dominate the reduction. This makes
sure that the insert point is set when creating the mask, to ensure it
dominates the reduction.

Differential Revision: https://reviews.llvm.org/D141003
2023-01-05 11:37:37 +00:00
Augie Fackler
0676156f81 Revert "[VPlan] Also consider operands of sink candidates in same block."
This reverts commit aa2414729e.

Previously-valid IR from a tensorflow test case (as shown on the
Diffusion revision for aa2414729e) started
hanging in the loop-vectorize pass. Reverting to keep everyone working.
2023-01-04 16:17:13 -05:00
Alexey Bataev
a1b18946f9 [SLP]Fix incorrect shuffle results because of missing shuffle mask
analysis.

Missed the analysis of the shuffle mask when trying to analyze the
operands of the shuffle instruction during peeking through shuffle
instructions.
2023-01-04 13:10:40 -08:00
Dinar Temirbulatov
55c600819f [SLP][AArch64] Incorrectly estimated intrinsic as a function call.
We incorrectly treat an intrinsic as a function call, which prevents
vectorization. On AArch64 Cortex-A53 we consider llvm.fmuladd.f64 to be
a function call, which is wrong.

Differential Revision: https://reviews.llvm.org/D140392
2023-01-03 19:45:24 +00:00
Alexey Bataev
26fec4e845 [SLP]Fix crash on casting non-instruction extractelement.
Need to check if the extractelement operation is an instruction before
trying to move it around the buildblocks to avoid a crash on cast.
2023-01-03 09:45:57 -08:00
Florian Hahn
ce1be13a86 [VPlan] Use VP_CLASSOF_IMPL for VPWidenCanonicalIVRecipe(NFC).
Replace VPWidenCanonicalIVRecipe::classof implementation with general
VP_CLASSOF_IMPL.
2023-01-02 17:52:13 +00:00
Florian Hahn
64f1d845b3 [VPlan] Use VP_CLASSOF_IMPL for VPWidenMemoryInstructionRecipe (NFC).
Replace VPWidenMemoryInstructionRecipe::classof implementation with general
VP_CLASSOF_IMPL.
2023-01-02 17:32:31 +00:00
Florian Hahn
2d6d47f807 [VPlan] Use VP_CLASSOF_IMPL for VPPredInstPHI (NFC).
Replace VPPredInstPHI::classof implementation with general
VP_CLASSOF_IMPL.
2023-01-02 17:22:34 +00:00
Florian Hahn
89718815c6 [VPlan] Adjust mergeReplicateRegions to be in line with mergeBlock (NFC)
Adjust mergeReplicateRegions to be in line with
mergeBlocksIntoPredecessors added in 36d70a6aea by collecting only the
valid candidates first.

Also rename to mergeReplicateRegionsIntoSuccessors and add missing
doc-comment.

This addresses post-commit suggestions by @Ayal.
2023-01-01 19:48:49 +00:00
Florian Hahn
cd16a3f04c [VPlan] Move GraphTraits definitions to separate header (NFC).
This reduces the size of VPlan.h and avoids future growth of the file
when the graph traits are extended in future patches.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140500
2022-12-31 15:14:57 +00:00
Florian Hahn
aa2414729e [VPlan] Also consider operands of sink candidates in same block.
Even if the sink candidate is already in the target block, its
operands can be candidates for sinking. Queue them up as well. Also
moves the queuing logic to a helper.
2022-12-30 18:24:35 +00:00
Alexey Bataev
5dccea5a68 [SLP]Do not emit many extractelements, reuse the single one emitted.
We do not need to emit many extractelements for each particular use; we
can reuse a single one and just need to adjust it so that it dominates
all uses.

Differential Revision: https://reviews.llvm.org/D140580
2022-12-30 06:38:06 -08:00
Valery N Dmitriev
ad956ed568 [SLP] Fix debug print for cost in tryToVectorizeList - NFC.
Actual VF was confused with local variable named "VF".
2022-12-29 11:30:10 -08:00
Valery N Dmitriev
8eb3698b94 [SLP] A couple of minor improvements for slp graph view - NFC.
Show ScatterVectorize nodes in frames of blue color
and print vectorize tree indices.
2022-12-29 11:02:36 -08:00
Alexey Bataev
ac01ae71f0 [SLP]Use ShuffleInstructionBuilder for vector shrinking.
We can use ShuffleInstructionBuilder now for shrinking shuffle emission.
This allows removing an extra shuffle from the emitted code and reusing
the original vector.

Part of D110978

Differential Revision: https://reviews.llvm.org/D140499
2022-12-28 06:09:04 -08:00
Michael Maitland
396b0b2b13 [LV] Remove duplicate name set of vector header basic block. NFC
The preheader was named explicitly in 256c6b0ba1
which makes setting the name in prior commit 95b2aa511e
unnecessary.

Differential Revision: https://reviews.llvm.org/D140246
2022-12-27 17:19:08 -08:00
Florian Hahn
e91e62db14 [LV] Sink scalar operands and merge regions repeatedly.
Merging regions can enable new sinking opportunities (e.g. if users of a
scalar value are moved from different VPBBs into the same VPBB). Sinking
in turn can also enable new merging opportunities (e.g. if a recipe
between two mergeable regions is moved).

To enable more sinking opportunities, repeat sinking & merging if
regions could be merged.

Also fix mergeReplicateRegions to return the correct Changed status.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139788
2022-12-27 18:08:32 +00:00
Alexey Bataev
a9b052e2ef [SLP]Fix PR59693: Do not crash trying to set insert point for buildvector
of extractvalues.

No need to get the last instruction only for vectorized extractvalues;
for gathered (buildvector sequence) nodes we still need to get the
insertion point.
2022-12-27 06:01:38 -08:00
Florian Hahn
36d70a6aea [VPlan] Remove redundant blocks by merging them into predecessors.
Add and run VPlan transform to fold blocks with a single predecessor
into the predecessor. This removes redundant blocks and addresses a TODO
to replace special handling for the vector latch VPBB.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139927
2022-12-26 22:47:09 +00:00
Florian Hahn
435e220ba6 [VPlan] Use VPBB in sinkScalarOperands directly. (NFC)
Suggested by @Ayal in D139790.
2022-12-25 21:34:59 +00:00
Florian Hahn
9758242046 [LV] Use SCEV to check if the trip count <= VF * UF.
Just comparing constant trip counts causes LV to miss cases where the
vector loop body only executes once.

The motivation for this is to remove the need for unrolling to remove
vector loop back-edges, if the body only executes once in more cases.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133017
2022-12-24 18:34:54 +00:00
Florian Hahn
e1650c8d52 [LV] Move exit cond simplification to separate transform.
This sets the stage for D133017 by moving out the code that performs
VPlan based simplifications to a separate transform that takes the
chosen VF & UF as arguments.

The main advantage is that this transform runs before any changes to
the CFG are being made. This allows using SCEV without worrying about
making queries while the IR is in an incomplete state.

Note that this patch switches the reasoning to use SCEV, but still only
simplifies loops with constant trip counts. Using SCEV here is needed to
access the backedge taken count, because the trip count IR value has not
been created yet.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D135017
2022-12-23 12:51:21 +00:00
Florian Hahn
b7b1e5c96f [LV] Assert that the executed plan contains selected VF & UF (NFC).
Add assertion to ensure the executed plan is valid for the selected VF
and UF.
2022-12-23 11:44:42 +00:00
Florian Hahn
5df34e971d [VPlan] Add support for tracking UFs applicable to VPlan (NFC).
Explicitly track the UFs supported in a VPlan. This is needed to
allow transformations to restrict the UFs which are supported.

Discussed as separate improvement in D135017.
2022-12-22 18:58:25 +00:00
Florian Hahn
96296922b6 [VPlan] Move VF and UF string generation to getName() (NFC).
The VFs and UFs may be more constrained as the plans are transformed
(e.g. see D135017 for an example).

To make sure the VFs/UFs included in the VPlan dump are accurate,
generate them when accessing a plan's name, rather than include them in
the name string set after initial construction.
2022-12-22 13:15:01 +00:00
Mircea Trofin
946831ea2d [NFC] Rename Function::isDebugInfoForProfiling to shouldEmit[...]
The function name was misleading - the expectation set both by the name
and by other members of Function (like isDeclaration or isIntrinsic)
would be that the function somehow would "be" "debug info for
profiling". But that's not the case - the property indicates (as the
comment over the declaration also explains) whether debug info should be
emitted (for profiling).
2022-12-21 18:36:59 -08:00
Florian Hahn
a84064bcda [LV] Add createTripCountSCEV helper (NFC).
Split off helper function in preparation for D135017.
2022-12-21 22:02:31 +00:00
Florian Hahn
7d8528dbf2 [LV] Move SCEV caching workaround to executePlan (NFC).
As suggested by @Ayal in D92132.

This avoids having to duplicate the workaround in multiple places.
2022-12-21 14:51:21 +00:00
Alexey Bataev
2e972ea056 [SLP]Integrate looking through shuffles logic into ShuffleInstructionBuilder.
Added BaseShuffleAnalysis as a base class for ShuffleInstructionBuilder
and integrated shuffle logic from shuffles for externally used scalars
into this class. This class is used as the main container that
implements smart shuffle instruction builder logic.
ShuffleInstructionBuilder uses this logic.
ShuffleInstructionBuilder is also used in building of the shuffle for
the externally used scalars instead of lambdas, which are now part of BaseShuffleAnalysis class.

Differential Revision: https://reviews.llvm.org/D140100
2022-12-21 06:12:53 -08:00
Florian Hahn
f69ac9a22d [LV] Support widened induction variables in epilogue vectorization.
Code generation now uses the start VPValue of induction recipes.

This makes it possible to adjust the start value of the epilogue
vector loop to use the 'resume' value of the main vector loop.

Fixes #59459.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D92132
2022-12-21 13:58:50 +00:00
Kazu Hirata
c08fad8193 [llvm] Remove redundant initialization of std::optional (NFC) 2022-12-20 15:53:38 -08:00
Florian Hahn
41b45ce656 [LV] Remove unused AAResults argument (NFC).
AAResults is passed to LoopVectorizationLegality but no longer used.
Remove the dead code.
2022-12-19 20:37:47 +00:00
Fangrui Song
21c4dc7997 std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).

This fixes clang.
2022-12-17 00:42:05 +00:00
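
The behavioral difference behind 21c4dc7997 can be shown in a short standalone sketch. The function names are hypothetical and it assumes exceptions are enabled: value() on an empty std::optional throws std::bad_optional_access, while operator* performs no check, so the caller tests for a value first.

```cpp
#include <optional>

// Unchecked access: operator* never throws, so test for a value first.
int derefOrDefault(const std::optional<int> &Opt, int Default) {
  return Opt ? *Opt : Default;
}

// Checked access: value() throws std::bad_optional_access when empty.
bool valueThrowsOnEmpty() {
  std::optional<int> Empty;
  try {
    (void)Empty.value();
  } catch (const std::bad_optional_access &) {
    return true;
  }
  return false;
}
```

Dereferencing an empty optional with operator* is undefined behavior, which is why derefOrDefault guards the access with an explicit test.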
Florian Hahn
08f16a8217 [VPlan] Use macro to define recipe classof implementation (NFC).
Add a VP_CLASSOF_IMPL macro to define common classof implementations for
recipes. This reduces duplication and also adds missing implementations
to existing recipes.
2022-12-16 17:52:15 +00:00
Kazu Hirata
6eb0b0a045 Don't include Optional.h
These files no longer use llvm::Optional.
2022-12-14 21:16:22 -08:00
Florian Hahn
e898479f2b [VPlan] Sink non-uniform recipes for scalar plans.
In scalar plans, replicate recipes will only generate a single value per
UF, independent of whether they are uniform or not. So don't consider
uniformity for plans with scalar VFs only.

This allows us to handle a few additional cases in VPlan sinking instead
of non-VPlan sinkScalarOperands.

Depends on D133762.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D134218
2022-12-14 17:55:31 +00:00
Fangrui Song
d4b6fcb32e [Analysis] llvm::Optional => std::optional 2022-12-14 07:32:24 +00:00
Alexey Bataev
ecac8192db [SLP][NFC]Initial redesign of ShuffleInstructionBuilder, NFC.
The patch redesigns ShuffleInstructionBuilder so it could later be used
for reshuffling of the buildvector sequences and vectorized parts of
externally used scalars. It will also allow generalizing the cost model
for the gathers/buildvectors.

Part of D110978.

Differential Revision: https://reviews.llvm.org/D139718
2022-12-13 09:37:18 -08:00
Fangrui Song
1ec11d2d48 [Transforms/Vectorize] llvm::Optional => std::optional 2022-12-12 08:56:35 +00:00
Fangrui Song
c178ed33bd Transforms/Utils: llvm::Optional => std::optional 2022-12-12 08:29:05 +00:00