There are several issues in the current implementation. The instructions
are not ordered correctly when they are placed in different basic blocks;
in that case the order of the blocks needs to be reversed. Also,
non-vectorizable nodes need to be excluded, and the check must be for
CallBase, not CallInst, otherwise invoke instructions are not handled
correctly.
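For illustration, a hypothetical IR fragment (@f and the labels are
placeholders): both instructions below are a CallBase in the C++ API, but
only the first is a CallInst, so checking for CallInst would miss the invoke.

    %r1 = call i32 @f(i32 %x)
    %r2 = invoke i32 @f(i32 %x)
            to label %cont unwind label %lpad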
For a GEP in a pointer chain, if:
1) the pointer chain is unit-strided,
2) the base pointer wasn't folded and is sitting in a register somewhere, and
3) the distance between the GEP and the base pointer is small enough that
it can be folded into the addressing mode of the using load/store,
then we can exclude that GEP from the total cost of the pointer chain,
as it will likely be folded away.
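For example, in a hypothetical unit-strided chain (%base and the offsets
are illustrative):

    %p1 = getelementptr inbounds i32, ptr %base, i64 1
    %p2 = getelementptr inbounds i32, ptr %base, i64 2
    %a = load i32, ptr %base, align 4
    %b = load i32, ptr %p1, align 4
    %c = load i32, ptr %p2, align 4

If %base stays in a register and the byte offsets 4 and 8 fit into the
addressing mode of the loads, %p1 and %p2 fold away and should not be
counted in the scalar cost.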
In order to check if 3) holds, we need to know the type of memory access
being made by the users of the pointer chain. For that, we need to pass
along a new argument to getPointersChainCost. (Using the source pointer
type of the GEP isn't accurate, see https://reviews.llvm.org/D149889 for
more details).
Also note that 2) is currently an assumption, and could be modelled more
accurately.
This prevents some unprofitable cases from being SLP vectorized on
RISC-V by making the scalar costs cheaper and closer to the actual
codegen.
For now the getPointersChainCost hook is duplicated for RISC-V to avoid
disturbing other targets, but it could be merged back in and shared with
other targets in a follow-up patch.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D149654
This is a follow-up to b71edfaa4e
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a Python file, the best way to handle that
is to run `git checkout --ours <yourfile>` and then reformat it
with `black`.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
This patch updates the transformations in InstCombineVectorOps to use the new
shufflevector semantics, under which undefined values in the mask yield poison.
To prevent miscompilations we have to match with m_Poison instead of m_Undef;
otherwise, we might introduce poison where there was previously undef.
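A small illustrative fragment of the new semantics (values are placeholders):

    %s = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 poison>
    ; lane 0 of %s is element 0 of %a; lane 1 is poison, not undef

Since m_Undef also matches undef (not just poison), rewriting a shuffle based
on an m_Undef match could silently turn an undef lane into a poison one.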
Differential Revision: https://reviews.llvm.org/D150039
If a vectorizable GEP node is built that should not be scheduled,
and at least one node is a non-GEP instruction, the vectorized
instructions need to be inserted before the last instruction in the
list, not before the first one; otherwise the instructions may be
emitted in the wrong order.
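A hypothetical bundle for illustration:

    %g = getelementptr inbounds i32, ptr %p, i64 1
    %a = ptrtoint ptr %q to i64
    ; Per the fix, the vectorized instructions for this bundle are
    ; inserted before %a, the last instruction in the list, not before
    ; %g, the first one, so they are emitted in the right order relative
    ; to both scalars.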
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.
The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.
The second of these is address space 8, the address space for "buffer
resources", i.e. ptr addrspace(8). These will be used to represent the
resource arguments to buffer instructions, and new buffer intrinsics
will be defined that take them instead of <4 x i32> as resource
arguments. These pointers are 128 bits long (with the same alignment).
They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, addr_max] value from applying to them.
Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.
This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.
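A sketch of the intended usage of the two address spaces (hypothetical IR,
names are illustrative):

    ; Address space 7: 160-bit fat pointers; ordinary loads/stores/GEPs
    ; are legal and are rewritten away before code generation.
    define float @use_fat(ptr addrspace(7) %p) {
      %q = getelementptr float, ptr addrspace(7) %p, i32 4
      %v = load float, ptr addrspace(7) %q
      ret float %v
    }
    ; Address space 8: 128-bit buffer resources; only passed to buffer
    ; intrinsics, never used in getelementptr or other address arithmetic.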
Depends on D143437
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D145441
If two nodes share the same value and that value is replaced in one of
the nodes, it needs to be replaced in all nodes automatically. Better to
use WeakTrackingVH for this to fix a compiler crash.
With this patch, undefined elements in a shufflevector mask are printed
as poison. This change is done to support the new shufflevector
semantics for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210
llvm.is.fpclass is different from other vectorizable intrinsics in that
it is overloaded on an argument type, not on the return type.
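For example (hypothetical values; the second operand is a constant test mask):

    %s = call i1 @llvm.is.fpclass.f32(float %x, i32 3)
    %v = call <4 x i1> @llvm.is.fpclass.v4f32(<4 x float> %y, i32 3)

The mangling suffix follows the argument type, while the result is always i1
or a vector of i1 matching the argument's element count.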
Differential Revision: https://reviews.llvm.org/D148905
For 8-bit/16-bit vector loads/stores we scalarize and transfer to/from the vector unit, or use the (usually slow) PINSR/PEXTR instructions.
Fixes #59867
Currently the compiler calculates the compensation cost for the
extractelements removed during vectorization. But if the extractelement
instruction is used in several nodes, we may calculate the compensation
for it several times.
Differential Revision: https://reviews.llvm.org/D148806
We were treating vXi8 multiply as the sum of a trunc(mul(extend(),extend())) pattern, which diverged from the costs from llvm-mca once we extended beyond legal types.
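For reference, the widened pattern the old cost was derived from
(illustrative types):

    %xa = sext <16 x i8> %a to <16 x i16>
    %xb = sext <16 x i8> %b to <16 x i16>
    %mw = mul <16 x i16> %xa, %xb
    %m  = trunc <16 x i16> %mw to <16 x i8>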
Use a modified version of the D103695 script to determine more accurate throughput/latency/codesize/size-latency cost estimates
Helps address some of the regressions identified in D148806
There are 2 problems in the cost estimation for buildvector/gather.
1. If the buildvector/gather node is the same as another node, the cost
of this node needs to be estimated as 0.
2. The cost of inserting a floating-point value into a non-poison vector
is not 0; it should not be considered free (see the sketch below).
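For example:

    ; Inserting into poison can simply start a new register:
    %v0 = insertelement <4 x float> poison, float %a, i64 0
    ; Inserting into a non-poison vector must preserve the other lanes:
    %v1 = insertelement <4 x float> %acc, float %a, i64 0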
Differential Revision: https://reviews.llvm.org/D148801
The buildvector cost for the case shown in the test should be 0 but it is -1, causing the code to get vectorized when it shouldn't.
Differential Revision: https://reviews.llvm.org/D148732
If a partial match is found and some other scalars must be inserted, we
need to account for the cost of the extractelements transformed to
shuffles and/or reused entries, and calculate the cost of inserting
constants into the non-poison vectors properly.
Also, fixed the cost calculation for the final gather/buildvector sequence.
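As an illustration, extracts that can be covered by one shuffle
(hypothetical values):

    %x0 = extractelement <4 x i32> %v, i32 0
    %x1 = extractelement <4 x i32> %v, i32 1
    ; both lanes together can instead be produced by a single shuffle:
    %sub = shufflevector <4 x i32> %v, <4 x i32> poison, <2 x i32> <i32 0, i32 1>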
Differential Revision: https://reviews.llvm.org/D148362
Implemented the reshuffling in the finalize member function + added basic
support for the add member functions, used during vector build.
Part of D110978
Differential Revision: https://reviews.llvm.org/D148279
Introduced the BoUpSLP::ShuffleCostEstimator::gather function as an initial
implementation of the gather/buildvector cost estimation for buildvector
nodes. It allows using the general codegen infrastructure for better
cost estimation and improves the cost estimation for
gathers/buildvectors.
Improved part of D110978.
Differential Revision: https://reviews.llvm.org/D148174
[SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
By default these intrinsics will expand back to cmp/sel, but some targets (X86) have optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support).
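For example, a scalar smax and its default expansion (placeholder values):

    %m = call i32 @llvm.smax.i32(i32 %a, i32 %b)
    ; default expansion:
    %c = icmp sgt i32 %a, %b
    %e = select i1 %c, i32 %a, i32 %b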
Instead of the abstract cost of the scalar reduction ops, try to use the
cost of the actual reduction operation instructions, where possible. Also,
remove the estimation of the vectorized GEP pointers for reduced loads,
since it is already handled in the tree.
Differential Revision: https://reviews.llvm.org/D148036
getMinMaxCost has an alternative set of min/max costs to getIntrinsicInstrCost that is only used by getMinMaxReductionCost, but it is a lot less thorough and falls back to an expansion in most cases, resulting in cost overestimations; we're better off just using getIntrinsicInstrCost.
getIntrinsicInstrCost is still missing complete FMINNUM/FMAXNUM costs, so until then getMinMaxCost will still be used for these; after that we can remove getMinMaxCost and have getMinMaxReductionCost call getIntrinsicInstrCost directly.
Fixes regression noticed in D148036
This lowers the cost for FADD, FSUB, and FNEG. The motivation is to avoid
over-eager SLP vectorisation that makes it look like SLP vectorisation is
profitable but results in significant slowdowns. Lowering the cost for
scalar FADD/FSUB helps the profitability decision favour the scalar
version where vectorisation isn't beneficial.
Lowering the cost for these floating point operations makes sense because a lot
of other instructions, including many shuffles, have only a cost of 1; these
FADD/FSUB/FNEG instructions should not be twice that cost.
Performance results show a 7% improvement for Imagick from SPEC FP 2017, a
small improvement in Blender, and unchanged results for the other apps in SPEC.
RAJAPerf is neutral and mostly shows no changes.
Differential Revision: https://reviews.llvm.org/D146033
If the value is used in the expression, the mask needs to be adjusted
before being applied. Also, fixed the analysis of the phi nodes for
reused scalars.
Made the condition for erasing the gathered extractelements stricter:
remove one only if it has a single vectorized use, otherwise leave it
for instcombine/instsimplify analysis.
The patch generalizes the analysis of scalars. The main part is outlined
into a lambda, which can be used to find reused inserted scalars and emit
a shuffle for them instead of multiple insertelement instructions, if the
permutation is found already. I.e., some scalars are transformed by the
permutation of previously vectorized nodes, and some are inserted
directly.
Reworked part of D110978
Differential Revision: https://reviews.llvm.org/D146564
The counters for the repeated scalars are ordered in the natural order,
but the original scalars might be reordered during SLP graph reordering,
and that order can be lost. Need to use the scalars after the
reordering, not the original ones, to emit correct code for the
same-value counters.
If an externally used scalar is part of the tree and is replaced by an
extractelement instruction, the generated extractelement instruction
needs to be added to the list of ExternallyUsedValues to avoid its
deletion during vectorization.
For the attached test case, llvm currently generates instructions to load/or/store the bytes one by one. Although NEON doesn't support v4i8 natively, we can promote it to v4i16 and operate on v4i16 vectors. So this patch overrides getStoreMinimumVF to specify that the minimum VF for i8 vectors is v4i8.
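An illustrative pattern (hypothetical IR) that can now be emitted as v4i8
vector ops (promoted to v4i16 for the actual NEON instructions) instead of
four scalar byte operations:

    %v = load <4 x i8>, ptr %src, align 1
    %r = or <4 x i8> %v, <i8 1, i8 1, i8 1, i8 1>
    store <4 x i8> %r, ptr %dst, align 1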
Differential Revision: https://reviews.llvm.org/D145614
Need to transform the mask after applying a shuffle, using the mask
itself as a base, to correctly mark as identity those indices actually
used in the previous shuffle. This fixes a crash when different-sized
vectors are shuffled.
Currently the cost for fshl is an overestimate, causing SLP to vectorize when it is not necessary.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D147056
This reverts commit 1387a13e1d.
This introduced performance regressions on AArch64 in cases where the
cost of a vector GEP + extracts is offset by the benefits of vectorizing
the rest of the tree.
The test in llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll
illustrates the issue. It was extracted from code that regressed a SPEC
benchmark by 15%.