Commit Graph

1294 Commits

Author SHA1 Message Date
Alexey Bataev
7439e1b2de [SLP]Fix incorrect reordering of clustered scalars.
The new mask represents the order, not the mask itself. It first needs to
be treated as an order and converted to a mask; only after that can the
gathered scalars be reordered to build the correct clustered order.

Differential Revision: https://reviews.llvm.org/D141161
2023-01-06 16:04:09 -08:00
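A minimal standalone sketch can illustrate the order-versus-mask distinction in 7439e1b2de. It assumes two conventions. In an order, `Order[I]` names the scalar that should land in lane `I`. In a scatter mask, `Mask[I]` names the lane that scalar `I` moves to. `orderToMask` and `reorderByMask` are hypothetical helpers, not the SLP vectorizer's own functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; they are not the LLVM API.
// Assumed conventions:
//   Order[I] = index of the original scalar that should end up in lane I.
//   Mask[I]  = lane that original scalar I is moved to (a scatter mask).

// Convert an order into a scatter mask by inverting the permutation.
std::vector<int> orderToMask(const std::vector<int>& Order) {
  std::vector<int> Mask(Order.size(), -1);
  for (std::size_t I = 0; I < Order.size(); ++I) {
    assert(Order[I] >= 0 && static_cast<std::size_t>(Order[I]) < Order.size());
    Mask[static_cast<std::size_t>(Order[I])] = static_cast<int>(I);
  }
  return Mask;
}

// Move each scalar I to lane Mask[I]; scalars whose mask entry is -1 are dropped.
template <typename T>
std::vector<T> reorderByMask(const std::vector<T>& Scalars,
                             const std::vector<int>& Mask) {
  assert(Scalars.size() == Mask.size());
  std::vector<T> Result(Scalars.size());
  for (std::size_t I = 0; I < Scalars.size(); ++I)
    if (Mask[I] >= 0)
      Result[static_cast<std::size_t>(Mask[I])] = Scalars[I];
  return Result;
}
```

With scalars `a b c d` and `Order = {1, 2, 0, 3}`, converting to a mask first yields lanes `b c a d`. Applying the order directly as if it were a mask yields `c a b d`. The two results agree only when the permutation is its own inverse.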
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement in the cost of the
broadcast. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into a poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Valery N Dmitriev
6d677c0b3d [SLP] Unify GEP cost modeling for load, store and GEP nodes.
Make a separate routine for GEPs cost calculation and make
the approach uniform across load, store and GEP tree nodes.
An additional issue is fixed: GEP cost savings were applied twice
for ScatterVectorize nodes (aka gather loads), making them look
unrealistically profitable for vectorization.

Differential Revision: https://reviews.llvm.org/D140789
2023-01-05 10:11:36 -08:00
Nikita Popov
b061159e79 [SLPVectorizer] Convert test to opaque pointers (NFC) 2023-01-05 12:32:44 +01:00
Alexey Bataev
a1b18946f9 [SLP]Fix incorrect shuffle results because of missing shuffle mask
analysis.

The shuffle mask was not analyzed when examining the operands of a
shuffle instruction while peeking through shuffle instructions.
2023-01-04 13:10:40 -08:00
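Commit a1b18946f9 fixes a missing analysis of the inner shuffle's mask when peeking through shuffles. The sketch below shows why that mask matters. It assumes single-source shuffles and uses -1 to mark a poison lane. `composeMasks` and `PoisonElem` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int PoisonElem = -1; // hypothetical marker for an undefined lane

// Compose masks when peeking through an inner single-source shuffle.
//   Inner:    T[J] = Src[Inner[J]]
//   Outer:    R[I] = T[Outer[I]]
//   Combined: R[I] = Src[Combined[I]]
std::vector<int> composeMasks(const std::vector<int>& Inner,
                              const std::vector<int>& Outer) {
  std::vector<int> Combined(Outer.size(), PoisonElem);
  for (std::size_t I = 0; I < Outer.size(); ++I) {
    if (Outer[I] == PoisonElem)
      continue; // poison stays poison
    assert(Outer[I] >= 0 && static_cast<std::size_t>(Outer[I]) < Inner.size());
    // Look through the inner shuffle; the result may itself be poison.
    Combined[I] = Inner[static_cast<std::size_t>(Outer[I])];
  }
  return Combined;
}
```

For `Inner = {3, 2, 1, 0}` and `Outer = {0, 0, 2, -1}`, the combined mask is `{3, 3, 1, -1}`. Ignoring the inner mask would treat `Outer` as indexing `Src` directly, so lane 0 would read `Src[0]` instead of `Src[3]`.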
Alexey Bataev
352b660c1b [SLP][NFC]Add a pass. 2023-01-04 10:30:48 -08:00
Alexey Bataev
53a858f7fc [SLP][NFC]Add a test for incorrect skipping of shuffle instruction at
peek-through-shuffles, NFC.
2023-01-04 10:17:03 -08:00
Nikita Popov
51ba34708d [SLPVectorizer] Convert test to opaque pointers (NFC) 2023-01-04 16:39:51 +01:00
Nikita Popov
8383da1583 [SLPVectorizer] Name instructions in test (NFC) 2023-01-04 16:35:45 +01:00
Nikita Popov
a34ae06c20 [SLPVectorizer] Convert some tests to opaque pointers (NFC) 2023-01-04 16:34:39 +01:00
Dinar Temirbulatov
55c600819f [SLP][AArch64] Incorrectly estimated intrinsic as a function call.
We incorrectly treated the intrinsic as a function call, which prevented
vectorization. On AArch64 Cortex-A53, llvm.fmuladd.f64 was modeled as a
function call, which is wrong.

Differential Revision: https://reviews.llvm.org/D140392
2023-01-03 19:45:24 +00:00
Alexey Bataev
26fec4e845 [SLP]Fix crash on casting non-instruction extractelement.
Need to check that the extractelement operation is an actual
extractelement instruction before trying to move it between basic blocks,
to avoid a crash on cast.
2023-01-03 09:45:57 -08:00
Dinar Temirbulatov
3c205efe8b [SLP][AArch64] Add fmuladd test coverage 2023-01-03 11:28:18 +00:00
Valery N Dmitriev
6bb4b2d002 [NFC] Test case intended to cover SLP cost for chain with masked gather loads.
SLP produces two gather loads (one feeds the other).
For the first set of scalar loads GEP indices are all constant.
The result of the second load is then fed into reduction (as a seed).

Differential Revision: https://reviews.llvm.org/D140785
2022-12-30 12:27:34 -08:00
Alexey Bataev
5dccea5a68 [SLP]Do not emit many extractelements, reuse the single one emitted.
We do not need to emit a separate extractelement for each use; we can
reuse a single one and only need to position it so that it dominates
all uses.

Differential Revision: https://reviews.llvm.org/D140580
2022-12-30 06:38:06 -08:00
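Commit 5dccea5a68 reuses one extractelement and positions it to dominate all uses. The sketch models that under a strong simplifying assumption: a single straight-line block where positions are instruction indices, so an instruction dominates everything after it. Real code spanning several blocks would need dominator information. `ExternalUse` and `placeSharedExtracts` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model: an external use needs lane Lane of vector Vec at
// instruction position Pos within one straight-line block.
struct ExternalUse {
  int Vec;
  unsigned Lane;
  unsigned Pos;
};

// For each distinct (Vec, Lane), emit one shared extract placed right before
// its earliest use; in a single block that position dominates all later uses.
std::map<std::pair<int, unsigned>, unsigned>
placeSharedExtracts(const std::vector<ExternalUse>& Uses) {
  std::map<std::pair<int, unsigned>, unsigned> Placement;
  for (const ExternalUse& U : Uses) {
    auto Key = std::make_pair(U.Vec, U.Lane);
    auto It = Placement.find(Key);
    if (It == Placement.end())
      Placement.emplace(Key, U.Pos);          // first use of this lane
    else
      It->second = std::min(It->second, U.Pos); // hoist to the earliest use
  }
  return Placement;
}
```

Three uses, two of which read lane 1 of the same vector, produce two extracts rather than three. The shared extract for lane 1 lands at its earliest use.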
Alexey Bataev
ac01ae71f0 [SLP]Use ShuffleInstructionBuilder for vector shrinking.
We can now use ShuffleInstructionBuilder for shrinking shuffle emission.
It allows removing an extra shuffle from the emitted code and reusing the
original vector.

Part of D110978

Differential Revision: https://reviews.llvm.org/D140499
2022-12-28 06:09:04 -08:00
Alexey Bataev
a9b052e2ef [SLP]Fix PR59693: Do not crash trying to set insert point for buildvector
of extractvalues.

Skipping the last-instruction lookup is valid only for vectorized
extractvalues; for gathered ones (a buildvector sequence) we still need
to get the insertion point.
2022-12-27 06:01:38 -08:00
Nikita Popov
580210a0c9 [SLP] Convert some tests to opaque pointers (NFC) 2022-12-23 10:02:57 +01:00
Alexey Bataev
2e972ea056 [SLP]Integrate looking through shuffles logic into ShuffleInstructionBuilder.
Added BaseShuffleAnalysis as a base class for ShuffleInstructionBuilder
and moved into it the shuffle logic previously used for externally used
scalars. This class is the main container implementing the smart shuffle
instruction builder logic, and ShuffleInstructionBuilder uses it.
ShuffleInstructionBuilder is now also used to build the shuffle for
the externally used scalars, replacing the lambdas, which are now part of
the BaseShuffleAnalysis class.

Differential Revision: https://reviews.llvm.org/D140100
2022-12-21 06:12:53 -08:00
Sjoerd Meijer
5c94faba0b [TTI] [AArch64] getMemoryOpCost for ptr types
Opaque ptr types have a size in bits of 0. The legalised type is an i64 or
vector of i64s, which do have a size. Because of this difference in size, the target
hook getMemoryOpCost modelled stores of ptr types as extending/truncating
loads/stores. Now we just check for opaque ptr types and return the legalised
cost. This makes stores of pointers cheaper, and as a result we now SLP
vectorise the changed test case.

Differential Revision: https://reviews.llvm.org/D140193
2022-12-16 15:38:17 +00:00
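The decision changed by 5c94faba0b can be modelled in isolation. The sketch assumes a simplified `MemType` description whose pointer size is 0, as the commit describes for opaque pointers; `MemType` is hypothetical and is not LLVM's `Type` class. It also assumes a legalised width such as 64 bits for i64. `costedAsExtTrunc` and its `ApplyFix` flag are illustrative names only. The real logic lives in the AArch64 `getMemoryOpCost` hook.

```cpp
#include <cassert>

// Hypothetical simplified type description; not LLVM's Type class.
struct MemType {
  bool IsPointer;
  unsigned SizeInBits; // 0 for opaque pointers, as described in the commit
};

// Decide whether a load/store would be costed as extending/truncating.
// Without the fix, any size mismatch with the legalised type counts.
bool costedAsExtTrunc(const MemType& Ty, unsigned LegalBits, bool ApplyFix) {
  if (ApplyFix && Ty.IsPointer)
    return false; // pointers: use the plain legalised cost
  return Ty.SizeInBits != LegalBits;
}
```

Without the fix, a pointer store mismatches the 64-bit legal type and is treated as truncating. With the fix, it takes the legalised-cost path, while a genuine 32-bit versus 64-bit mismatch is still flagged.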
Sjoerd Meijer
e909c3d31f [CostModel][AArch64] Precommit opaque ptr store tests. NFC. 2022-12-16 15:34:12 +00:00
Simon Pilgrim
90b02f6c63 [SLP][X86] slp-fma-loss.ll - add various targets with different FMA abilities
Add targets with FMA3, FMA4 and no-FMA support

Should help with D132872 testing
2022-12-09 11:46:06 +00:00
Bjorn Pettersson
3528e63d89 [test] Remove duplicate RUN lines in Transform tests 2022-12-08 11:47:16 +01:00
Roman Lebedev
59ffac7dd2 [NFC] Port all SLPVectorizer tests to -passes= syntax 2022-12-08 02:38:50 +03:00
Roman Lebedev
6697140ba1 [NFC] Port all SLPVectorizer tests to -passes= syntax 2022-12-07 21:44:09 +03:00
Alexey Bataev
0cc15050a4 [SLP]Fix PR59230: Use actual vector factor when sorting entries.
When sorting entries while attempting to reorder scalars, we need to use
the actual vectorization factor, not the number of scalars. Otherwise the
compiler crashes if the scalars have to be reordered.

Differential Revision: https://reviews.llvm.org/D138819
2022-11-29 06:46:06 -08:00
Qiongsi Wu
f946c70130 [SLPVectorizer] Do Not Move Loads/Stores Beyond Stacksave/Stackrestore Boundaries
If left unchecked, the SLPVectorizer can move loads/stores below a stackrestore. The move can cause issues if the loads/stores have pointer operands from `alloca`s that are reset by the stackrestores. This patch adds the dependency check.

The check is conservative, in that it does not check if the pointer operands of the loads/stores are actually from `alloca`s that may be reset. We did not observe any SPECCPU2017 performance degradation so this simple fix seems sufficient.

The test could have been added to `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll`, but that test has not been updated to use opaque pointers. I am not inclined to add tests that still use typed pointers, or to refactor `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll` to use opaque pointers in this patch. If desired, I will open a different patch to refactor and consolidate the tests.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D138585
2022-11-28 10:00:29 -05:00
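The conservative dependency described in f946c70130 can be sketched as a check over a straight-line block. A load or store may not move across any stacksave or stackrestore, whether or not its pointer comes from an `alloca`. The `Op` enum and `canMoveMemOp` are hypothetical simplifications, not the vectorizer's scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds in a straight-line block.
enum class Op { Load, Store, StackSave, StackRestore, Other };

// Can the memory op at position From be moved to position To?
// Conservative: any stacksave/stackrestore in between blocks the move,
// without checking whether the pointer operand actually comes from an alloca.
bool canMoveMemOp(const std::vector<Op>& Block, std::size_t From, std::size_t To) {
  assert(From < Block.size() && To < Block.size());
  assert(Block[From] == Op::Load || Block[From] == Op::Store);
  const std::size_t Lo = std::min(From, To), Hi = std::max(From, To);
  for (std::size_t I = Lo; I <= Hi; ++I) {
    if (I == From)
      continue; // skip the instruction being moved
    if (Block[I] == Op::StackSave || Block[I] == Op::StackRestore)
      return false;
  }
  return true;
}
```

For the block `load, other, stackrestore, store`, the load may move past `other` but not past the stackrestore. Likewise, the store may not be hoisted above the stackrestore.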
Alexey Bataev
ac93b61165 [SLP]Fix PR59098: check if the vector type is scalarized for
extractelements.

If the resulting type is going to be scalarized, no need to adjust the
cost of removed extractelement and insert/extract subvector costs.
Otherwise, the compiler can crash because of the wrong type sizes.
2022-11-21 10:26:01 -08:00
Alexey Bataev
07015e12f0 [SLP]Fix PR59053: trying to erase instruction with users.
Need to account for reduced values that are vectorized in the tree but not in the top node. Such scalars must still be extracted from the vector node instead of using the original scalar.
2022-11-17 17:23:48 -08:00
Alexey Bataev
9f9fdab9f1 [SLP]Fix PR58766: deleted value used after vectorization.
If the same instruction is reduced several times, and in one graph it is
part of a buildvector sequence while in another it is vectorized, we may
lose the information that it was part of a buildvector and must be
extracted from the later vectorized value.
2022-11-16 10:57:03 -08:00
Alexey Bataev
2f8f17c157 [SLP]Fix PR58956: fix insertpoint for reduced buildvector graphs.
If the graph consists only of the buildvector node without a main
operation, it needs to inherit the insertion point from the reduction
instruction. Otherwise the compiler crashes trying to insert an
instruction at the entry block.
2022-11-16 07:38:49 -08:00
Alexey Bataev
0a33ceee01 [SLP]Fix a crash on analysis of the vectorized node.
Need to use a stricter check for the same vectorized node to avoid a
possible compiler crash. After graph node rotation we may have 2 similar
nodes (a vector one and a gather one), so extra checks are needed for an
exact match.
2022-11-15 13:40:28 -08:00
Roman Lebedev
8e37b53360 [X86] Rewrite getScalarizationOverhead()
All of our insert/extract ops work on 128-bit lanes.

For `Insert`, we need to extract the affected 128-bit lane,
unless it's being fully overwritten (FIXME: do we need to be
careful about legalization-induced padding that we obviously don't demand?),
perform the insertions, and then insert the 128-bit lane back.

But hold on. If we are operating on a 256-bit legal vector,
and thus have two 128-bit subvectors, and are fully overwriting them both,
we don't actually need to insert *both* subvectors,
only the second one, into the implicitly-widened first one.

Also, `Insert` wasn't actually querying the costs,
but just assuming them to be `1`.

`getShuffleCost(TTI::SK_ExtractSubvector)` notes:
```
  // Note that in general, the insertion starting at the beginning of a vector
  // isn't free, because we need to preserve the rest of the wide vector.
```
... so as far as I can tell, we didn't account for that.

I was hoping this would allow vectorization at a higher VF in one case I looked at,
but the subvector insertion cost still advises against it.

The change for `Extract` is NFC and is for consistency only;
I wanted to get rid of that weird explicit discounting of the insertion of the 0th element,
since the general code should already deal with that.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137913
2022-11-15 21:07:12 +03:00
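The lane reasoning in 8e37b53360 can be made concrete by counting operations for the `Insert` case. The count assumes a legal vector that is an exact multiple of 128 bits, so it ignores the padding FIXME. It also reports counts only: a real model would weight each operation with queried shuffle and insert costs, which is the point of the commit. `LaneOps` and `countInsertOps` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operation counts (not TTI costs) for inserting the demanded
// elements of a legal vector whose insert/extract ops work on 128-bit lanes.
struct LaneOps {
  std::size_t SubvectorExtracts = 0; // lanes extracted to keep untouched elements
  std::size_t ElementInserts = 0;    // scalar insertions within 128-bit lanes
  std::size_t SubvectorInserts = 0;  // lanes inserted back into the wide vector
};

LaneOps countInsertOps(const std::vector<bool>& Demanded, unsigned EltBits) {
  assert(EltBits > 0 && 128 % EltBits == 0);
  const std::size_t EltsPerLane = 128 / EltBits;
  assert(!Demanded.empty() && Demanded.size() % EltsPerLane == 0);
  const std::size_t NumLanes = Demanded.size() / EltsPerLane;

  LaneOps Ops;
  std::size_t Affected = 0, FullyOverwritten = 0;
  for (std::size_t L = 0; L < NumLanes; ++L) {
    std::size_t Count = 0;
    for (std::size_t E = 0; E < EltsPerLane; ++E)
      if (Demanded[L * EltsPerLane + E])
        ++Count;
    if (Count == 0)
      continue; // lane is untouched
    ++Affected;
    Ops.ElementInserts += Count;
    if (Count == EltsPerLane)
      ++FullyOverwritten; // old lane contents are not needed
    else
      ++Ops.SubvectorExtracts; // must preserve the lane's other elements
  }
  if (NumLanes == 1) {
    // A 128-bit vector is a single lane: no subvector traffic at all.
    Ops.SubvectorExtracts = 0;
    return Ops;
  }
  Ops.SubvectorInserts = Affected;
  // If every lane is fully overwritten, the first lane serves as the
  // implicitly widened base and only the remaining lanes are inserted.
  if (FullyOverwritten == NumLanes)
    --Ops.SubvectorInserts;
  return Ops;
}
```

Consider an 8 x 32-bit vector (two 128-bit lanes). Fully overwriting both lanes needs only one subvector insert. Writing a single element in the upper lane needs one extract and one insert of that lane.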
Alexey Bataev
b505fd559d [SLP]Redesign vectorization of the gather nodes.
Gather nodes are vectorized as a simple vector of the scalars instead of
relying on the actual node. As a result, in some cases we may miss an
incorrect transformation (a non-matching set of scalars just ends up as
a gather node instead of a possible vector/gather node). It is better to
rely on the actual nodes: this improves stability and helps detect
missed cases.

Differential Revision: https://reviews.llvm.org/D135174
2022-11-10 10:59:54 -08:00
Alexey Bataev
563d03d65e [SLP][NFC]Add a test for vectorization with scheduling blocks order
different than the instruction order, NFC.
2022-11-10 10:12:51 -08:00
Alexey Bataev
b5d91ab73e [SLP]Fix PR58863: Mask index beyond mask size for non-power-2 insertelement analysis.
Need to check whether the insertelement mask size is reached during cost analysis to avoid a compiler crash.

Differential Revision: https://reviews.llvm.org/D137639
2022-11-08 07:54:57 -08:00
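Commit b5d91ab73e adds a bounds check during non-power-of-2 insertelement analysis. The sketch illustrates that kind of check under an assumed scenario: three scalars are analyzed with a mask of size 3, while an insertelement targets lane 3 of a wider vector. `buildInsertMask` is a hypothetical helper, and skipping out-of-range lanes is this sketch's choice, not necessarily the patch's exact behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: record which insertelement (by position I) writes each
// lane of a mask with MaskSize entries. Lane indices at or beyond MaskSize are
// skipped instead of being written past the end of the mask.
std::vector<int> buildInsertMask(const std::vector<unsigned>& InsertIdx,
                                 std::size_t MaskSize) {
  std::vector<int> Mask(MaskSize, -1);
  for (std::size_t I = 0; I < InsertIdx.size(); ++I) {
    if (InsertIdx[I] >= MaskSize)
      continue; // outside the analyzed mask: would index out of bounds
    Mask[InsertIdx[I]] = static_cast<int>(I);
  }
  return Mask;
}
```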
skc7
42bce72536 Reapply "[SLP] Extend reordering data of tree entry to support PHInodes".
Reapplies 87a2086 (which was reverted in 656f1d8).
Fix for scalable vectors in getInsertIndex merged in 46d53f4.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137537
2022-11-08 21:21:28 +05:30
Alexey Bataev
ecd0b5a532 Revert "[SLP]Redesign vectorization of the gather nodes."
This reverts commit 8ddd1ccdf8 to fix
buildbots failures reported in https://lab.llvm.org/buildbot#builders/74/builds/14839
2022-11-07 08:35:21 -08:00
Alexey Bataev
8ddd1ccdf8 [SLP]Redesign vectorization of the gather nodes.
Gather nodes are vectorized as a simple vector of the scalars instead of
relying on the actual node. As a result, in some cases we may miss an
incorrect transformation (a non-matching set of scalars just ends up as
a gather node instead of a possible vector/gather node). It is better to
rely on the actual nodes: this improves stability and helps detect
missed cases.

Differential Revision: https://reviews.llvm.org/D135174
2022-11-07 07:04:38 -08:00
David Green
0e9dfff37e [SLP][AArch64] Add a test case for SLP phi ordering of scalable vectors. NFC 2022-11-06 12:06:12 +00:00
David Green
656f1d8b74 Revert "[SLP] Extend reordering data of tree entry to support PHI nodes"
This reverts commit 87a20868eb as it has
problems with scalable vectors and use-list orders. Test to follow.
2022-11-06 11:43:51 +00:00
skc7
87a20868eb [SLP] Extend reordering data of tree entry to support PHI nodes
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D136757
2022-11-01 04:50:04 +00:00
Alexey Bataev
99f9bd4807 [SLP]Fix a crash in the analysis of the compatible cmp operands.
In some cases we can skip analyzing the operands' opcodes and compare
the operands directly.
2022-10-31 09:47:25 -07:00
Alexey Bataev
2ec51f1c75 [SLP]Improve analysis of same/alternate code ops and scheduling.
Should improve compile time for analysis and vectorization.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test           2023.00                   2022.00 -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test               148.00                    146.00 -1.4%

Generated more vector instructions.

Differential Revision: https://reviews.llvm.org/D127531
2022-10-27 16:29:16 -07:00
Alexey Bataev
8ce0c7b1c9 Revert "[SLP]Improve analysis of same/alternate code ops and scheduling."
This reverts commit dad64448c6 to fix
a crash in https://lab.llvm.org/buildbot/#/builders/74/builds/14584
2022-10-27 15:21:35 -07:00
Alexey Bataev
dad64448c6 [SLP]Improve analysis of same/alternate code ops and scheduling.
Should improve compile time for analysis and vectorization.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test           2023.00                   2022.00 -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test               148.00                    146.00 -1.4%

Generated more vector instructions.

Differential Revision: https://reviews.llvm.org/D127531
2022-10-27 11:31:18 -07:00
skc7
e98501e27e [SLP][NFC] Added test to check resulting mask in shufflevector as per order of phinodes
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D136553
2022-10-23 17:20:47 +00:00
Alexey Bataev
456951dcd3 [SLP][NFC]Add a test for possible reordering gap in SLP, NFC. 2022-10-19 08:22:07 -07:00
Alexey Bataev
087dadfd37 [SLP]Generalize cost model.
Generalized the cost model estimation. Improved the cost model estimation
for repeated scalars (their cost no longer needs to be counted) and
improved the cost model for extractelement instructions.

cpu2017
   511.povray_r             0.57
   520.omnetpp_r           -0.98
   521.wrf_r               -0.01
   525.x264_r               3.59 <+
   526.blender_r           -0.12
   531.deepsjeng_r         -0.07
   538.imagick_r           -1.42
Geometric mean:  0.21

Differential Revision: https://reviews.llvm.org/D115757
2022-10-18 11:55:59 -07:00
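Commit 087dadfd37 stops counting the cost of repeated scalars. The toy gather-cost function below illustrates that idea. It assumes caller-supplied unit costs rather than TTI queries, and it assumes duplicates are produced by one extra permute of the built vector. `gatherCost` is a hypothetical name and does not reproduce the patch's formula.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical gather-cost sketch with caller-supplied unit costs (not TTI).
// Each distinct scalar is inserted once; repeated scalars are not charged
// again but are produced by a single extra permute of the built vector.
int gatherCost(const std::vector<int>& ScalarIds, int InsertCost, int PermuteCost) {
  std::unordered_set<int> Unique(ScalarIds.begin(), ScalarIds.end());
  int Cost = static_cast<int>(Unique.size()) * InsertCost;
  if (Unique.size() < ScalarIds.size())
    Cost += PermuteCost; // duplicates come from one shuffle
  return Cost;
}
```

With unit costs of 1, gathering `a b a b` costs 3 (two inserts plus one permute) instead of 4. Gathering four distinct scalars still costs 4.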
Alexey Bataev
62267e8de0 Revert "[SLP]Generalize cost model."
This reverts commits f12fb91188 and
f5c747bfbe to fix a detected use of an
uninitialized variable.
2022-10-18 11:25:59 -07:00