Commit Graph

1390 Commits

Author SHA1 Message Date
Alexey Bataev
95b631181a [SLP]Fix getSpillCost functions.
There are several issues in the current implementation. The instructions
are not properly ordered, if they are placed in different basic blocks,
need to reverse the order of blocks. Also, need to exclude
non-vectorizable nodes and check for CallBase, not CallInst, otherwise
invoke calls are not handled correctly.
2023-05-26 12:19:28 -07:00
Alexey Bataev
e892193cc8 [SLP][NFC]Add a test for spill cost, NFC. 2023-05-26 11:04:46 -07:00
Alexey Bataev
ae5ff3ca0c [SLP]Fix PR62665: compiler crash when trying to access non-existing mask
element.

Need to check at first if the SubMask element is PoisonMaskElem to avoid
compiler crash.
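
A minimal sketch of the kind of guard this fix describes, assuming masks are plain integer vectors and the sentinel is -1 (the value of LLVM's `PoisonMaskElem`). The helper `composeMasks` and the constant `kPoisonMaskElem` are hypothetical names for illustration, not the SLP vectorizer's code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in for LLVM's PoisonMaskElem sentinel (assumed to be -1 here).
constexpr int kPoisonMaskElem = -1;

// Hypothetical helper: composes two shuffle masks so that
// Result[I] == Mask[SubMask[I]], keeping poison lanes as poison.
std::vector<int> composeMasks(const std::vector<int> &Mask,
                              const std::vector<int> &SubMask) {
  std::vector<int> Result(SubMask.size(), kPoisonMaskElem);
  for (std::size_t I = 0; I < SubMask.size(); ++I) {
    const int Idx = SubMask[I];
    // Check for the poison sentinel first: indexing Mask with -1 would
    // access a non-existing element.
    if (Idx == kPoisonMaskElem)
      continue;
    assert(static_cast<std::size_t>(Idx) < Mask.size() && "index out of range");
    Result[I] = Mask[Idx];
  }
  return Result;
}
```

Without the early `continue`, a poison lane in `SubMask` would be used as an index, which is the kind of out-of-range access the message refers to.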
2023-05-22 13:43:25 -07:00
Luke Lau
c27a0b21c5 [SLP][RISCV] Account for offset folding in getPointersChainCost
For a GEP in a pointer chain, if:
1) a pointer chain is unit-strided
2) the base pointer wasn't folded and is sitting in a register somewhere
3) the distance between the GEP and the base pointer is small enough and
   can be folded into the addressing mode of the using load/store

Then we can exclude that GEP from the total cost of the pointer chain,
as it will likely be folded away.
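
The three conditions above can be sketched as a simple predicate. This is an illustration under stated assumptions, not the actual getPointersChainCost hook: the `PointerChain` struct, the function `countCostedGeps`, the one-unit-per-GEP charge, and the immediate range passed in by the caller are all hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of a pointer chain; not an LLVM type.
struct PointerChain {
  bool IsUnitStrided;  // condition 1
  bool BaseInRegister; // condition 2 (an assumption, as the message notes)
  std::vector<std::int64_t> ByteOffsets; // distance of each GEP from the base
};

// Returns how many GEPs still contribute to the scalar cost. A GEP is
// skipped when its offset fits the addressing-mode immediate range
// [MinImm, MaxImm] of the using load/store (condition 3).
int countCostedGeps(const PointerChain &Chain, std::int64_t MinImm,
                    std::int64_t MaxImm) {
  int Costed = 0;
  for (std::int64_t Offset : Chain.ByteOffsets) {
    const bool Foldable = Chain.IsUnitStrided && Chain.BaseInRegister &&
                          Offset >= MinImm && Offset <= MaxImm;
    if (!Foldable)
      ++Costed;
  }
  return Costed;
}
```

For example, with a signed 12-bit immediate range of [-2048, 2047], a unit-strided chain at byte offsets 2040, 2044, 2048 and 2052 would still be charged for the last two GEPs.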

In order to check if 3) holds, we need to know the type of memory access
being made by the users of the pointer chain. For that, we need to pass
along a new argument to getPointersChainCost. (Using the source pointer
type of the GEP isn't accurate, see https://reviews.llvm.org/D149889 for
more details).

Also note that 2) is currently an assumption, and could be modelled more
accurately.

This prevents some unprofitable cases from being SLP vectorized on
RISC-V by making the scalar costs cheaper and closer to the actual
codegen.

For now the getPointersChainCost hook is duplicated for RISC-V to prevent
disturbing other targets, but could be merged back in and shared with
other targets in a following patch.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D149654
2023-05-22 13:55:30 +01:00
Luke Lau
53afdb712d [SLP][RISCV] Add test for folding offsets in GEP pointer chains 2023-05-22 10:11:02 +01:00
Luke Lau
8288d39b4c [RISCV] Add test for unprofitable SLP vectorization
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D149653
2023-05-19 14:45:39 +01:00
Tobias Hieta
f84bac329b [NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4e
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Alexey Bataev
9a7248f561 [SLP]Fix crash for scalarized vectors.
Need to remove insertion of the nodes to the InVector in case of
scalarized vectors too to avoid compiler crashes.
2023-05-17 06:32:22 -07:00
ManuelJBrito
e335e8a432 [InstCombine] Update instcombine for vectorOps to use new shufflevector semantics
This patch updates the transformations in InstCombineVectorOps to use the new
shufflevector semantics that say that undefined values in the mask yield poison.

To prevent miscompilations we have to match with m_Poison instead of m_Undef.
Otherwise, we might introduce poison where there was previously undef.

Differential Revision: https://reviews.llvm.org/D150039
2023-05-17 07:56:45 +01:00
Alexey Bataev
b33b000ac8 [SLP][NFC]Add remark output to the test with the perfect diamond match
in vectorbuild nodes, NFC.
2023-05-05 08:19:54 -07:00
Alexey Bataev
c0e5e7db9a [SLP]Fix a crash trying finding insert point for GEP nodes with non-gep
insts.

If the vectorizable GEP node is built, which should not be scheduled,
and at least one node is a non-gep instruction, need to insert the
vectorized instructions before the last instruction in the list, not
before the first one, otherwise the instructions may be emitted in the
wrong order.
2023-05-04 09:43:37 -07:00
Krzysztof Drewniak
f0415f2a45 Re-land "[AMDGPU] Define data layout entries for buffers""
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.

This reverts commit 3f2fbe92d0.

Differential Revision: https://reviews.llvm.org/D149776
2023-05-03 19:43:56 +00:00
Krzysztof Drewniak
3f2fbe92d0 Revert "[AMDGPU] Define data layout entries for buffers"
This reverts commit f9c1ede254.

Differential Revision: https://reviews.llvm.org/D149758
2023-05-03 16:11:00 +00:00
Krzysztof Drewniak
f9c1ede254 [AMDGPU] Define data layout entries for buffers
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.

The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.

The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take them instead of <4 x i32> as resource arguments. ptr
addrspace(8). These pointers are 128-bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, [addr_max]] value from applying to them.
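
As a rough mental model of the two layouts described above, the sizes can be written down as plain C++ structs. The type names `BufferResource` and `BufferFatPointer` and the helper `advance` are hypothetical; this only mirrors the bit widths stated in the message and says nothing about how the backend actually represents these pointers.

```cpp
#include <array>
#include <cstdint>

// Address space 8: a 128-bit buffer resource (descriptor).
struct BufferResource {
  std::array<std::uint32_t, 4> Words;
};

// Address space 7: descriptor plus 32-bit offset, 256-bit aligned.
struct alignas(32) BufferFatPointer {
  BufferResource Resource;
  std::uint32_t Offset;
};

// 128 descriptor bits + 32 offset bits = 160 bits of pointer payload.
constexpr unsigned kFatPointerBits =
    sizeof(BufferResource) * 8 + sizeof(std::uint32_t) * 8;
static_assert(sizeof(BufferResource) * 8 == 128, "128-bit resource");
static_assert(kFatPointerBits == 160, "160-bit fat pointer");
static_assert(alignof(BufferFatPointer) * 8 == 256, "256-bit alignment");

// Assumed model of address arithmetic on an address space 7 pointer:
// only the 32-bit offset changes, the descriptor is left untouched.
BufferFatPointer advance(BufferFatPointer P, std::uint32_t Delta) {
  P.Offset += Delta;
  return P;
}
```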

Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.

This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.

Depends on D143437

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145441
2023-05-03 15:25:58 +00:00
Alexey Bataev
f305cafc58 [SLP][NFC]Add a test with the reshuffled nodes in buildvector nodes,
NFC.
2023-05-02 13:51:45 -07:00
Alexey Bataev
cf792f664a [SLP]Fix a crash for the replaced vectorized value.
If two nodes share the same value, which is replaced in one of the
nodes, need to automatically replace the same value in all nodes. Better to
use WeakTrackingVH for this to fix the compiler crash.
2023-04-27 09:32:00 -07:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Alexey Bataev
b1abc2beaf [SLP]Fix PR58616: assert for gep nodes with different basic blocks.
Need to relax the assertion check in the FindFirstInst lambda for GEP
nodes with non-GEP instruction to avoid compiler crash.
2023-04-24 07:41:06 -07:00
Jay Foad
593e25ffae [Vectorize] Fix vectorization, scalarization and folding of llvm.is.fpclass
llvm.is.fpclass is different from other vectorizable intrinsics in that
it is overloaded on an argument type, not on the return type.

Differential Revision: https://reviews.llvm.org/D148905
2023-04-24 13:42:08 +01:00
Jay Foad
3237497d01 [Vectorize] Pre-commit tests for D148905
Differential Revision: https://reviews.llvm.org/D149050
2023-04-24 13:42:08 +01:00
Simon Pilgrim
aca5f9aeea [CostModel][X86] getMemoryOpCost - increase cost of sub-32-bit vector load/stores
For 8-bit/16-bit vector loads/stores we scalarize and transfer to/from the vector unit, or use the (usually slow) PINSR/PEXTR instructions.

Fixes #59867
2023-04-23 21:48:25 +01:00
Simon Pilgrim
97927c380f [SLP][X86] Add test coverage for Issue #59867 2023-04-23 21:20:44 +01:00
Alexey Bataev
851a12138a [SLP]Fix the cost for the extractelements, used in several nodes.
Currently the compiler calculates the compensation cost for the
extractelements, removed during vectorization. But if the extractelement
instruction is used in several nodes, we can calculate the compensation
for them several times.
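
The double counting described above can be illustrated with a small sketch. It assumes each extractelement is identified by an integer id and that every node lists the extracts it uses; the function name `compensationCost` and the flat per-extract cost are hypothetical, not the vectorizer's real cost code.

```cpp
#include <set>
#include <vector>

// Sums the compensation cost of removed extractelements, charging each
// distinct extract once even if several nodes use it.
int compensationCost(const std::vector<std::vector<int>> &NodeExtracts,
                     int CostPerExtract) {
  std::set<int> Seen;
  int Total = 0;
  for (const std::vector<int> &Node : NodeExtracts)
    for (int ExtractId : Node)
      // insert().second is false when the id was already accounted for.
      if (Seen.insert(ExtractId).second)
        Total += CostPerExtract;
  return Total;
}
```

With two nodes using extracts {1, 2} and {2, 3}, summing per node would charge four extracts, while the deduplicated total charges three.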

Differential Revision: https://reviews.llvm.org/D148806
2023-04-21 09:05:03 -07:00
Alexey Bataev
403bd583a8 [SLP]Fix a crash on scalarized vectors.
Need to register in-vector for scalarized types to avoid crash in
further analysis.
2023-04-21 08:13:48 -07:00
Alexey Bataev
ecc204b64e [SLP][NFC]Add a test with an extra cost of the reused extractelement
instruction, NFC.
2023-04-20 13:27:48 -07:00
Simon Pilgrim
4060042384 [CostModel][X86] Improve i8 and vXi8 MUL costs
We were treating vXi8 multiply as the sum of a trunc(mul(extend(),extend())) which diverged from the costs from llvm-mca once we extended beyond legal types

Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates

Helps address some of the regressions identified in D148806
2023-04-20 19:38:51 +01:00
Alexey Bataev
0e1312fbe0 [SLP][X86]Fix the cost of reused gathers/buildvectors and floats insert.
There are 2 problems in the cost estimation for buildvector/gather.
1. If the buildvector/gather node is the same as another one node, need
   to estimate the cost of this node as 0.
2. The cost of inserting float point register to non-poison vector is
   not 0, it should not be considered free.

Differential Revision: https://reviews.llvm.org/D148801
2023-04-20 09:34:46 -07:00
Vasileios Porpodas
a72bcc1252 [SLP][NFC] Test showing a cost estimation issue caused by f82eb7e066
The buildvector cost for the case shown in the test should be 0 but it is -1, causing the code to get vectorized, when it shouldn't.

Differential Revision: https://reviews.llvm.org/D148732
2023-04-19 14:32:16 -07:00
Alexey Bataev
8cf0290c4a [SLP]Fix cost estimation for buildvectors with extracts and/or constants.
If the partial matching is found and some other scalars must be
inserted, need to account the cost of the extractelements, transformed
to shuffles, and/or reused entries and calculate the cost of inserting
constants properly into the non-poison vectors.
Also, fixed the cost calculation for final gather/buildvector sequence.

Differential Revision: https://reviews.llvm.org/D148362
2023-04-19 05:54:58 -07:00
Alexey Bataev
1ce4b26a21 [SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions.
Implemented the reshuffling in finalize member function + add basic
support for add member functions, used during vector build.

Part of D110978

Differential Revision: https://reviews.llvm.org/D148279
2023-04-18 11:52:04 -07:00
Alexey Bataev
d7a40a447f Revert "[SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions."
This reverts commit cd341f3f48 to fix
a crash revealed by buildbot https://lab.llvm.org/buildbot#builders/124/builds/7108.
2023-04-18 10:41:00 -07:00
Alexey Bataev
cd341f3f48 [SLP]Add final resize to ShuffleCostEstimator::finalize member function and basic add member functions.
Implemented the reshuffling in finalize member function + add basic
support for add member functions, used during vector build.

Part of D110978

Differential Revision: https://reviews.llvm.org/D148279
2023-04-18 05:51:23 -07:00
Alexey Bataev
f82eb7e066 [SLP]Introduce gather cost estimation function.
Introduced BoUpSLP::ShuffleCostEstimator::gather function as an initial
implementation of the gather/buildvector cost estimation for buildvector
nodes. It will allow to use general codegen infrastructure for better
cost estimation + it improves the cost estimation for the
gathers/buildvectors.

Improved part of D110978.

Differential Revision: https://reviews.llvm.org/D148174
2023-04-13 10:16:00 -07:00
Simon Pilgrim
b3480d5ede [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
By default these will expand back to cmp/sel, but some targets (X86) have optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support).

Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
2023-04-13 17:00:39 +01:00
Alexey Bataev
b28f407df9 [SLP]Improve reduction cost model for scalars.
Instead of abstract cost of the scalar reduction ops, try to use the
cost of actual reduction operation instructions, where possible. Also,
remove the estimation of the vectorized GEPs pointers for reduced loads,
since it is already handled in the tree.

Differential Revision: https://reviews.llvm.org/D148036
2023-04-12 11:32:51 -07:00
Simon Pilgrim
162284b2e1 [SLP][X86] Add SSE4 test coverage to minmax reduction tests
Improve coverage for D148036
2023-04-12 17:41:31 +01:00
Simon Pilgrim
63c3895327 [TTI][X86] getMinMaxCost - use existing integer min/max intrinsic cost values instead of maintaining a duplicate cost table
getMinMaxCost has an alternative set of min/max costs to getIntrinsicInstrCost that are only used by getMinMaxReductionCost, but are a lot less thorough and fallback to an expansion in most cases resulting in cost overestimations - we're better off just using getIntrinsicInstrCost.

getIntrinsicInstrCost is still missing complete FMINNUM/FMAXNUM costs, so until then getMinMaxCost will still be used for these, after that we can remove getMinMaxCost and have getMinMaxReductionCost call getIntrinsicInstrCost directly.

Fixes regression noticed in D148036
2023-04-12 15:33:12 +01:00
Sjoerd Meijer
d827865e9f Recommit "[AArch64][TTI] Cost model FADD/FSUB/FNEG""
Fixed two test cases that relied on Asserts, and added a fallthrough
annotation to the switch case.
2023-04-11 12:48:15 +01:00
Sjoerd Meijer
4876f43ea9 Revert "[AArch64][TTI] Cost model FADD/FSUB/FNEG"
This reverts commit d0027e0be9.

Need to look at 2 test failures.
2023-04-11 10:14:40 +01:00
Sjoerd Meijer
d0027e0be9 [AArch64][TTI] Cost model FADD/FSUB/FNEG
This lowers the cost for FADD, FSUB, and FNEG. The motivation is to avoid
over-eager SLP vectorisation, that makes it look like SLP vectorisation is
profitable but results in significant slow downs. Lowering the cost for scalar
FADD/FSUB costs helps the profitability decision to favour the scalar
version where vectorisation isn't beneficial.

Lowering the cost for these floating point operations makes sense because a lot
of other instructions including many shuffles have only a cost of 1; these
FADD/FSUB/FNEG instructions should not be twice the cost.

Performance results show a 7% improvement for Imagick from SPEC FP 2017, a
small improvement in Blender, and unchanged results for the other apps in SPEC.
RAJAPerf is neutral and mostly shows no changes.

Differential Revision: https://reviews.llvm.org/D146033
2023-04-11 09:46:14 +01:00
Alexey Bataev
50af6ab0ab [SLP]Fix emission of the masks in shuffles for undefs.
If the value is used in the expression, need to adjust the mask before
applying the mask. Plus, need to fix the analysis of the phi nodes for
reused scalars.
2023-04-06 10:16:58 -07:00
Alexey Bataev
cf62adbbd8 [SLP]Fix delete of the extractelement with users.
Made the condition for the erasing of the gathered extractelements
stricter, remove it only if it has single vectorized use, otherwise
leave it for instcombiner/instsimplify analysis.
2023-04-06 09:15:30 -07:00
Alexey Bataev
40105a9933 [SLP]Find reused scalars in buildvector sequences, if any.
Patch generalizes analysis of scalars. The main part is outlined into
lambda, which can be used to find reused inserted scalars and emit
shuffle for them instead of multiple insertelement instructions, if the
permutation is found already. I.e. some scalars are transformed by the
permutation of previously vectorized nodes, and some are inserted
directly.

Reworked part of D110978

Differential Revision: https://reviews.llvm.org/D146564
2023-04-05 09:37:05 -07:00
Alexey Bataev
c1660006b2 [SLP]Reorder counters for same values, if the root node is reordered.
The counters for the repeated scalars are ordered in the natural order,
but the original scalars might be reordered during SLP graph reordering
and this order can be dropped. Need to use the scalars after the
reordering, not the original ones, to emit correct code for same value
counters.
2023-04-03 07:52:49 -07:00
Alexey Bataev
367db8bf6a [SLP][NFC]Add a test for reordered scalars with not reordered reuse coefficient. 2023-04-03 07:15:58 -07:00
Alexey Bataev
c1bcf5dd0a [SLP]Fix PR61835: Assertion `I->use_empty() && "trying to erase
instruction with users."' failed.

If the externally used scalar is part of the tree and is replaced by
extractelement instruction, need to add generated extractelement
instruction to the list of the ExternallyUsedValues to avoid deletion
during vectorization.
2023-03-31 14:21:19 -07:00
Guozhi Wei
a72162cc52 [AARCH64] Enable STORE of v4i8 to help more vectorization opportunities
For the attached test case, currently llvm generates instructions to load/or/store the bytes one by one. Although NEON doesn't support v4i8 natively, we can promote it to v4i16 and operate on v4i16 vectors. So this patch override getStoreMinimumVF and specify the minimum VF for i8 vector is v4i8.

Differential Revision: https://reviews.llvm.org/D145614
2023-03-31 17:03:06 +00:00
Alexey Bataev
9255124a07 [SLP]Fix a crash when trying to shuffle multiple nodes.
Need to transform mask after applying shuffle using the mask itself as
a base to correctly mark with identity those indices, actually used in
previous shuffle. Allows to fix a crash, if different sized vectors are
shuffled.
2023-03-30 09:32:11 -07:00
Zain Jaffal
4d7d454334 [SLP][AArch64] Add test to check for the vectorization of fshl
Currently the cost for fshl is an overestimate causing SLP to vectorize when it is not necessary.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D147056
2023-03-28 17:46:33 +01:00
Florian Hahn
417fe52e6f Revert "[SLP] Check with target before vectorizing GEP Indices."
This reverts commit 1387a13e1d.

This introduced performance regressions on AArch64, when the cost of a
vector GEP + extracts is offset by the benefits of vectorizing the rest
of the tree.

The test in llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll
illustrates the issue. It was extracted from code that regressed a SPEC
benchmark by 15%.
2023-03-28 08:06:53 +01:00