clang-p2996

Author	SHA1	Message	Date
Alexey Bataev	48bbd76587	[SLP]Fix PR79229: Check that extractelement is used only in a single node before erasing. Before trying to erase the extractelement instruction, not enough to check for single use, need to check that it is not used in several nodes because of the preliminary nodes reordering.	2024-01-24 11:22:22 -08:00
Alexey Bataev	ca654acc16	[SLP]Fix PR79321: SLPVectorizer's PHICompare doesn't provide a strict weak ordering. Compared NumUses to meet the requirements of the strict weak ordering.	2024-01-24 09:36:25 -08:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Alexey Bataev	bb3e0d7fc3	[SLP]Fix PR79193: skip analysis of gather nodes for minbitwidth. No need in trying to analyze small graphs with gather node only to avoid crash.	2024-01-23 12:44:49 -08:00
Alexey Bataev	f9da4c6ead	[SLP][NFC]Add a test with extending the types for vectorized stores/insertelement instructions, NFC.	2024-01-18 16:42:45 -08:00
Alexey Bataev	093206bb7e	[SLP]Fix PR78298: Assertion `GEP->getNumIndices() == 1 && !isa<Constant>(GEPIdx)' failed. The non-constant index might be folded to constant during earlier stages of vectorization. Need to consider this option and filter out GEPs with constant indices from the candidates list.	2024-01-16 09:17:35 -08:00
Alexey Bataev	d79fdb2749	[SLP]Fix PR78236: correctly track external values, replaced several times during reduction vectorization. If the external value was replaced in the vectorizer several times during reduction vectorization, need to find the original value to correctly handle external uses and emit extractelement instructions properly.	2024-01-16 06:52:43 -08:00
Alexey Bataev	6fdc2ce8c5	[SLP]Fix PR77916: transform the whole mask, not only the elements for the second vector. Need to transform all elements in the long mask, if we decided to produce shorter version, some elements may still have incorrect indices after transformation for the first vector in the permutation.	2024-01-12 07:07:43 -08:00
Alexey Bataev	39b2104b4a	[SLP]Fix a crash for reduced values with minbitwidth, which are reused. If the reduced values are additionally affected by minbitwidth analysis, need to cast them to a proper type before doing any math, if they are reused.	2024-01-12 04:49:48 -08:00
Florian Hahn	3b3da7c7fb	[SLP] Add a set of tests with non-power-of-2 operations.	2024-01-11 16:47:38 +00:00
Alexey Bataev	18473eb108	[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679 ) After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that there is no non-vectorized insertelement instruction, which may require widening. Review: https://github.com/llvm/llvm-project/pull/72679	2024-01-11 06:59:57 -08:00
Alexey Bataev	dc717b1992	[SLP][NFC]Add a test for final vector with minbitwidth, NFC.	2024-01-11 06:53:42 -08:00
Martin Storsjö	1de3f46938	Revert "[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679 )" This reverts commit `408dce8201`. This triggered failed asserts with code like this: char a[]; short b; int c, d, e, f; void g() { char h; for (;;) { for (; f; ++f) { h[f] = b[0] * a[e] + b[c] * a[1] >> 7; ++b; } h += d; } } Compiled like this: $ clang -target x86_64-linux-gnu -c repro.c -O2 clang: ../lib/IR/Instructions.cpp:3335: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value, llvm::Type, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.	2024-01-11 12:15:35 +02:00
Alexey Bataev	408dce8201	[SLP]Do not require external uses for roots and single use for other instructions in computeMinimumValueSizes. (#72679 ) After changes, that does not require support from InstCombine, we can drop some extra requirements for values-to-be-demoted. No need to check for external uses for roots/other instructions, just check that there is no non-vectorized insertelement instruction, which may require widening.	2024-01-10 14:06:29 -05:00
Alexey Bataev	04f77a1320	[SLP][NFC]Replace constant by some meaningful values to make test more relevant, NFC.	2024-01-10 10:34:32 -08:00
Alexey Bataev	73ce13d79b	[SLP][TTI]Improve detection of the insert-subvector pattern for SLP. (#74749 ) SLP vectorizer passes the type of the subvector and the mask, which size determines the size of the resulting vector. TTI should support this pattern to improve cost estimation of the insert_subvector shuffle pattern.	2024-01-10 10:39:34 -05:00
Alexey Bataev	036e48e2f5	[SLP]Fix PR76850: do the analysis of the submask. Need to limit the transformation of the VecMask by the corresponding part of the mask of SliceSize size to avoid compiler crash during further cost analysis.	2024-01-08 07:51:02 -08:00
Alexey Bataev	79e62315be	[SLP]Use revectorized value for extracts from buildvector, being vectorized. When trying to reuse the extractelement instruction, emitted for the insertelement instruction, need to check if this insertelement instruction was vectorized. In this case, need to use vectorized value, not the original insertelement.	2024-01-04 06:45:26 -08:00
Alexey Bataev	7c963fde16	[SLP]Use revectorized value for extracts from buildvector, being vectorized. If the insertelement instruction is vectorized, and the extractelement instruction from such insertelement is also vectorized as part of the same tree, need to extract from the vectorized value corresponding to the insertelement rather than from the original insertelement instruction.	2024-01-03 10:38:09 -08:00
Alexey Bataev	e775ba384e	[SLP][NFC]Add some extra values to avoid constant expressions in the test.	2024-01-02 11:23:10 -08:00
Enna1	a51c2f39f5	[SLP] no need to generate extract for in-tree uses for original scalar instruction. (#76077 ) Before `77a609b556`, we always skipped in-tree uses of the vectorized scalars in `buildExternalUses()`. That commit handles the case where the in-tree use is a scalar operand in a vectorized instruction, for which we need to generate an extract. In-tree uses that remain scalar in vectorized instructions fall into 3 cases: - The pointer operand of vectorized LoadInst uses an in-tree scalar - The pointer operand of vectorized StoreInst uses an in-tree scalar - The scalar argument of vector form intrinsic uses an in-tree scalar Generating extracts for in-tree uses for vectorized instructions is implemented in `BoUpSLP::vectorizeTree()`: - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 However, `77a609b556` not only generates extracts for vectorized instructions, but also generates extracts for the original scalar instructions. There is no need to generate extracts for the original scalar instructions, as these scalar instructions will be replaced by vector instructions and get erased later. This patch marks that there is no exact user for in-tree scalars that remain scalar in vectorized instructions when building external uses. In this case all uses of this scalar will be automatically replaced by extractelement. It also removes the - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11497-L11506 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11542-L11551 - https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L11657-L11667 extracts.	2023-12-30 10:45:26 +08:00
Alexey Bataev	5096501082	[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461 ) SLP/TTI do not know about the cost estimation for addsub pattern, supported by X86. Previously the support for pattern detection was added (see TTI::isLegalAltInstr), but the cost still was not estimated properly.	2023-12-28 05:04:04 -08:00
Douglas Yung	fb981e6b4b	Revert "[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461 )" This reverts commit `bc8c4bbd79`. Change is failing to build on several bots: - https://lab.llvm.org/buildbot/#/builders/127/builds/60184 - https://lab.llvm.org/buildbot/#/builders/123/builds/23709 - https://lab.llvm.org/buildbot/#/builders/216/builds/32302	2023-12-27 23:52:04 -08:00
Alexey Bataev	bc8c4bbd79	[SLP][TTI][X86]Add addsub pattern cost estimation. (#76461 ) SLP/TTI do not know about the cost estimation for addsub pattern, supported by X86. Previously the support for pattern detection was added (see TTI::isLegalAltInstr), but the cost still was not estimated properly.	2023-12-27 15:57:21 -05:00
Alexey Bataev	a13148a880	[SLP]Fix PR75995: drop wrapping flags for resized wrapped binops. If decided to resize the instruction, need to drop wrapping flags from the resulting vector instructions to avoid incorrect optimizations/assumptions later. Fixes PR75995.	2023-12-20 06:51:39 -08:00
Alexey Bataev	8abf8c948c	[SLP][NFC]Add a test with incorrect wrapping flags in the binops with minbitwidth types.	2023-12-20 06:27:01 -08:00
Nikita Popov	9d4557920f	[InstCombine] Don't treat undef as poison in demanded element simplification We can only set PoisonElts if the element is poison, not if it is undef.	2023-12-19 12:26:48 +01:00
Eric Biggers	09058654f6	[RISCV] Remove experimental from Vector Crypto extensions (#74213 ) The RISC-V vector crypto extensions have been ratified. This patch updates the Clang and LLVM support for these extensions to be non-experimental, while leaving the C intrinsics as experimental since the C intrinsics are not yet standardized. Co-authored-by: Brandon Wu <brandon.wu@sifive.com>	2023-12-18 22:04:22 -08:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
Maurice Heumann	f42b930af9	[SLP] Pessimistically handle unknown vector entries in SLP vectorizer (#75438 ) SLP Vectorizer can discard vector entries at unknown positions. This example shows the behaviour: https://godbolt.org/z/or43EM594 The following instruction inserts an element at an unknown position: ``` %2 = insertelement <3 x i64> poison, i64 %value, i64 %position ``` The position depends on an argument that is unknown at compile time. After running SLP, one can see there is no more instruction present referencing `%position`. This happens as SLP parallelizes the two adds in the example. It then needs to merge the original vector with the new vector. Within `isUndefVector`, the SLP vectorizer constructs a bitmap indicating which elements of the original vector are poison values. It does this by walking the insertElement instructions. If it encounters an insert with a non-constant position, it is ignored. This will result in poison values to be used for all entries, where there are no inserts with constant positions. However, as the position is unknown, the element could be anywhere. Therefore, I think it is only safe to assume none of the entries are poison values and to simply take them all over when constructing the shuffleVector instruction. This fixes #75437	2023-12-14 09:48:23 -05:00
Alexey Bataev	dd0e38eb34	[SLP]Add a test for missed insert_subvector pattern detection, NFC.	2023-12-07 10:46:14 -08:00
Alexey Bataev	0e1a9e3084	[SLP]Fix PR74607: Fix dependency between buildvector nodes with user nodes, having same last instruction. If the user nodes have the same last instruction, used as the insert point for the buildvector nodes, finding the proper dependency is crucial. Before, it depended on the indices of the buildvectors themselves, but it looks like it should depend on the indices of the user nodes, because that identifies the vectorization order and, thus, properly aligns buildvector nodes in terms of the def-use chain.	2023-12-06 10:15:01 -08:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Craig Topper	7ec4f6094e	[InstCombine] Infer disjoint flag on Or instructions. (#72912 ) The disjoint flag was recently added to IR in #72583 We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.	2023-12-02 14:11:12 -08:00
Alexey Bataev	279b1ea65f	[SLP]Improve gathering of the scalars used in the graph. Currently we emit gathers for scalars being vectorized in the tree as a pair of extractelement/insertelement instructions. Instead we can try to find all required vectors and emit shuffle vector instructions directly, improving the code and reducing compile time. Part of non-power-of-2 vectorization. Differential Revision: https://reviews.llvm.org/D110978	2023-12-01 11:23:57 -08:00
Alexey Bataev	1f88e62db4	[SLP]Fix/improve minbitwidth mapping to use TreeEntry as a key. Currently, MinBWs map uses Value* as a key and stores mapping for each value to be demoted. It makes it hard to get the actual MinBWs value for the buildvector scalars (constants), since the same constant might be used in different nodes with different MinBWs values/decisions. Also, it consumes extra memory for the vectorized values/instructions from the same nodes. Better to map actual nodes. It fixes the bitwidth data fetching for buildvector scalars and improves memory consumption/analysis time for other instructions.	2023-11-30 06:33:31 -08:00
Craig Topper	03d4a9d94d	[InstCombine] Set disjoint flag when turning Add into Or. (#72702 ) The disjoint flag was recently added to IR in #72583	2023-11-27 12:54:11 -08:00
Alexey Bataev	12bcd6339d	[SLP]Improve detection of gathered loads, if no other deps are detected. If the gather node includes ordered loads only partially (not the whole node consists of loads) and the other gathered scalar are not loads, and no other dependency from other nodes is found, we still can improve the cost of gather, if take into account the fact that these loads still can be vectorized.	2023-11-22 11:35:51 -08:00
Alexey Bataev	f609d4ba1d	[SLP]Fix PR72833: do not crash if only the operand is cast but not the user instruction. Need to check if only the operand is cast, not the user instruction itself, if the types of the operands do not match the actual type.	2023-11-20 08:35:35 -08:00
Alexey Bataev	40e46b6eff	[SLP]Do not emit int bitcast after minbitwidth analysis. No need to emit bitcast op for integer operands if it is detected that after minbitwidth analysis the type is the same.	2023-11-20 06:25:17 -08:00
Alexey Bataev	206799fcf5	[SLP]Fix PR72524: "Out-of-bounds shuffle mask element" failed. Need to check if we ran into subvector extract pattern before checking for identity vector to avoid compiler crash.	2023-11-16 07:39:32 -08:00
Alexey Bataev	95703642e3	[SLP]Fix PR72202: wrong mask emission for the first found vector operand. Need to copy the submask not to the very first part of the common extractelements vector mask, but to the proper one to avoid wrong code emission.	2023-11-16 07:01:05 -08:00
Alexey Bataev	181b2c1b4a	[SLP][NFC]Add a test for PR72202 to show a bug in a mask generation for vectorized extractelements operands.	2023-11-16 06:36:04 -08:00
Alexey Bataev	8ea8dd9a01	[SLP] Fix crash on trying to reshuffle a scalar that was vectorized. If the buildvector node contains extractelement, which vector operand depends on vector node, need to check if the node is ready and use vectorized value instead of the original vector operation.	2023-11-15 11:01:45 -08:00
Alexey Bataev	b6f51787f6	[SLP]Fix signedness analysis for scalars in graph. Cannot use the sign info for the roots for all scalars in the graph, need to perform the analysis for each particular scalar (tree node).	2023-11-15 07:10:59 -08:00
Alexey Bataev	5adfad254e	[SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI. SLP includes analysis for the minimum bitwidth, the actual integer operations can be emitted. It allows to reduce register pressure and improve perf. Currently, it includes only cost model and the next transformation relies on InstructionCombiner. Better to do it directly in SLP, it allows to reduce compile time and fix cost model issues.	2023-11-14 11:12:52 -08:00
Alexey Bataev	506a30d30f	[SLP][NFC]Add a test with cast op, not matching original cast op, NFC.	2023-11-14 10:08:12 -08:00
Alexey Bataev	f2f3050476	Revert "[SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI." This reverts commit `f6ae50f710` to fix a crash revealed in the internal testing.	2023-11-14 09:45:54 -08:00
Alexey Bataev	f6ae50f710	[SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI. SLP includes analysis for the minimum bitwidth, the actual integer operations can be emitted. It allows to reduce register pressure and improve perf. Currently, it includes only cost model and the next transformation relies on InstructionCombiner. Better to do it directly in SLP, it allows to reduce compile time and fix cost model issues.	2023-11-14 07:57:37 -08:00
Alexey Bataev	dbd00c3b5d	[SLP][NFC]Add a test for gather node with mixed load/non-load scalars.	2023-11-10 08:40:58 -08:00
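
The PR79321 entry above (ca654acc16) concerns a sort comparator that did not provide a strict weak ordering. As a rough illustration of what that requirement means, the sketch below contrasts a comparator that violates it with one that satisfies it by comparing a second key as a tie-breaker. Everything here is hypothetical: `Candidate`, `TypeRank`, `brokenLess`, `strictLess`, and the helper functions are invented for the example and are not the actual `PHICompare` code from SLPVectorizer; only the name `NumUses` is borrowed from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the values being sorted; not an LLVM type.
struct Candidate {
  int TypeRank; // illustrative primary key (assumption)
  int NumUses;  // illustrative tie-breaker, named after the commit message
};

// Broken: uses <=, so brokenLess(A, A) is true. A strict weak ordering
// must be irreflexive, which makes this invalid for std::sort.
inline bool brokenLess(const Candidate &A, const Candidate &B) {
  return A.TypeRank <= B.TypeRank;
}

// Valid: lexicographic comparison of (TypeRank, NumUses) with operator<.
inline bool strictLess(const Candidate &A, const Candidate &B) {
  return std::tie(A.TypeRank, A.NumUses) < std::tie(B.TypeRank, B.NumUses);
}

// Checks two of the strict-weak-ordering axioms (irreflexivity and
// asymmetry) over a sample of values. It does not check transitivity.
template <typename Cmp>
bool isIrreflexiveAndAsymmetric(const std::vector<Candidate> &Vals, Cmp Less) {
  for (const Candidate &A : Vals) {
    if (Less(A, A))
      return false;
    for (const Candidate &B : Vals)
      if (Less(A, B) && Less(B, A))
        return false;
  }
  return true;
}

// Sorts a copy with the valid comparator and asserts the postcondition.
inline std::vector<Candidate> sortedCopy(std::vector<Candidate> Vals) {
  std::sort(Vals.begin(), Vals.end(), strictLess);
  assert(std::is_sorted(Vals.begin(), Vals.end(), strictLess));
  return Vals;
}
```

Passing a comparator like `brokenLess` to `std::sort` is undefined behavior, which is why such bugs tend to surface as sporadic crashes or assertion failures rather than as consistently wrong output.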

1578 Commits