clang-p2996

Author	SHA1	Message	Date
Alexey Bataev	a13148a880	[SLP]Fix PR75995: drop wrapping flags for resized wrapped binops. If decided to resize the instruction, need to drop wrapping flags from the resulting vector instructions to avoid incorrect optimizations/assumptions later. Fixes PR75995.	2023-12-20 06:51:39 -08:00
Alexey Bataev	8abf8c948c	[SLP][NFC]Add a test with incorrect wrapping flags in the binops with minbitwidth types.	2023-12-20 06:27:01 -08:00
Nikita Popov	9d4557920f	[InstCombine] Don't treat undef as poison in demanded element simplification We can only set PoisonElts if the element is poison, not if it is undef.	2023-12-19 12:26:48 +01:00
Eric Biggers	09058654f6	[RISCV] Remove experimental from Vector Crypto extensions (#74213) The RISC-V vector crypto extensions have been ratified. This patch updates the Clang and LLVM support for these extensions to be non-experimental, while leaving the C intrinsics as experimental since the C intrinsics are not yet standardized. Co-authored-by: Brandon Wu <brandon.wu@sifive.com>	2023-12-18 22:04:22 -08:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
Maurice Heumann	f42b930af9	[SLP] Pessimistically handle unknown vector entries in SLP vectorizer (#75438) SLP Vectorizer can discard vector entries at unknown positions. This example shows the behaviour: https://godbolt.org/z/or43EM594 The following instruction inserts an element at an unknown position: `%2 = insertelement <3 x i64> poison, i64 %value, i64 %position` The position depends on an argument that is unknown at compile time. After running SLP, one can see there is no more instruction present referencing `%position`. This happens as SLP parallelizes the two adds in the example. It then needs to merge the original vector with the new vector. Within `isUndefVector`, the SLP vectorizer constructs a bitmap indicating which elements of the original vector are poison values. It does this by walking the insertElement instructions. If it encounters an insert with a non-constant position, it is ignored. This results in poison values being used for all entries that have no inserts with constant positions. However, as the position is unknown, the element could be anywhere. Therefore, I think it is only safe to assume none of the entries are poison values and to simply take them all over when constructing the shuffleVector instruction. This fixes #75437	2023-12-14 09:48:23 -05:00
Alexey Bataev	dd0e38eb34	[SLP]Add a test for missed insert_subvector pattern detection, NFC.	2023-12-07 10:46:14 -08:00
Alexey Bataev	0e1a9e3084	[SLP]Fix PR74607: Fix dependency between buildvector nodes with user nodes, having same last instruction. If the user nodes has the same last-instruction, used as insert points for the buildvector nodes, finding the proper dependency is crucial. Before, it depended on the indices of the buildvectors themselves but looks like it should depend on indices of the user nodes, because it identifies the vectorization order and, thus, properly aligns buildvector nodes in terms of def-use chain.	2023-12-06 10:15:01 -08:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Craig Topper	7ec4f6094e	[InstCombine] Infer disjoint flag on Or instructions. (#72912) The disjoint flag was recently added to IR in #72583. We already set it when we turn an add into an or. This patch sets it on Ors that weren't converted from an Add.	2023-12-02 14:11:12 -08:00
Alexey Bataev	279b1ea65f	[SLP]Improve gathering of the scalars used in the graph. Currently we emit gathers for scalars being vectorized in the tree as a pair of extractelement/insertelement instructions. Instead we can try to find all required vectors and emit shuffle vector instructions directly, improving the code and reducing compile time. Part of non-power-of-2 vectorization. Differential Revision: https://reviews.llvm.org/D110978	2023-12-01 11:23:57 -08:00
Alexey Bataev	1f88e62db4	[SLP]Fix/improve minbitwidth mapping to use TreeEntry as a key. Currently, MinBWs map uses Value* as a key and stores mapping for each value to be demoted. This makes it hard to get the actual MinBWs value for the buildvector scalars (constants), since the same constant might be used in different nodes with different MinBWs values/decisions. Also, it consumes extra memory for the vectorized values/instructions from the same nodes. Better to map actual nodes. It fixes the bitwidth data fetching for buildvector scalars and improves memory consumption/analysis time for other instructions.	2023-11-30 06:33:31 -08:00
Craig Topper	03d4a9d94d	[InstCombine] Set disjoint flag when turning Add into Or. (#72702) The disjoint flag was recently added to IR in #72583	2023-11-27 12:54:11 -08:00
Alexey Bataev	12bcd6339d	[SLP]Improve detection of gathered loads, if no other deps are detected. If the gather node includes ordered loads only partially (not the whole node consists of loads) and the other gathered scalar are not loads, and no other dependency from other nodes is found, we still can improve the cost of gather, if take into account the fact that these loads still can be vectorized.	2023-11-22 11:35:51 -08:00
Alexey Bataev	f609d4ba1d	[SLP]Fix PR72833: do not crash if only operand is casted but the use instruction. Need to check if only operand is casted, not the user instruction itself, if the types of the operands does not match the actual type.	2023-11-20 08:35:35 -08:00
Alexey Bataev	40e46b6eff	[SLP]Do not emit int bitcast after minbitwidth analysis. No need to emit bitcast op for integer operands if it is detected that after minbitwidth analysis the type is the same.	2023-11-20 06:25:17 -08:00
Alexey Bataev	206799fcf5	[SLP]Fix PR72524: "Out-of-bounds shuffle mask element" failed. Need to check if we ran into subvector extract pattern before checking for identity vector to avoid compiler crash.	2023-11-16 07:39:32 -08:00
Alexey Bataev	95703642e3	[SLP]Fix PR72202: wrong mask emission for the first found vector operand. Need to copy the submask not to the very first part of the common extractelements vector mask, but to the proper one to avoid wrong code emission.	2023-11-16 07:01:05 -08:00
Alexey Bataev	181b2c1b4a	[SLP][NFC]Add a test for PR72202 to show a bug in a mask generation for vectorized extractelements operands.	2023-11-16 06:36:04 -08:00
Alexey Bataev	8ea8dd9a01	[SLP] Fix crash on trying to reshuffle a scalar that was vectorized. If the buildvector node contains extractelement, which vector operand depends on vector node, need to check if the node is ready and use vectorized value instead of the original vector operation.	2023-11-15 11:01:45 -08:00
Alexey Bataev	b6f51787f6	[SLP]Fix signedness analysis for scalars in graph. Cannot use the sign info for the roots for all scalars in the graph, need to perform the analysis for each particular scalar (tree node).	2023-11-15 07:10:59 -08:00
Alexey Bataev	5adfad254e	[SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI. SLP includes analysis for the minimum bitwidth, the actual integer operations can be emitted. It allows to reduce register pressure and improve perf. Currently, it includes only cost model and the next transformation relies on InstructionCombiner. Better to do it directly in SLP, it allows to reduce compile time and fix cost model issues.	2023-11-14 11:12:52 -08:00
Alexey Bataev	506a30d30f	[SLP][NFC]Add a test with cast op, not matching original cast op, NFC.	2023-11-14 10:08:12 -08:00
Alexey Bataev	f2f3050476	Revert "[SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI." This reverts commit `f6ae50f710` to fix a crash revealed in the internal testing.	2023-11-14 09:45:54 -08:00
Alexey Bataev	f6ae50f710	[SLP]Emit actual bitwidth for analyzed MinBitwidth nodes, NFCI. SLP includes analysis for the minimum bitwidth, the actual integer operations can be emitted. It allows to reduce register pressure and improve perf. Currently, it includes only cost model and the next transformation relies on InstructionCombiner. Better to do it directly in SLP, it allows to reduce compile time and fix cost model issues.	2023-11-14 07:57:37 -08:00
Alexey Bataev	dbd00c3b5d	[SLP][NFC]Add a test for gather node with mixed load/non-load scalars.	2023-11-10 08:40:58 -08:00
Ramkumar Ramachandra	2302e4c327	Reland "VectorUtils: mark xrint as trivially vectorizable" (#71416) With the recent change `98c90a13` (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests. A couple of important fixes have been landed since the original patch was landed and reverted, and it is now safe to re-land the patch: `5e1d81a` (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and `fd887a3` (LegalizeVectorTypes: fix bug in widening of vec result in xrint). See also #71399, which proves that lrint and llrint will indeed produce vector codegen on RISC-V. Fixes #55208.	2023-11-06 18:49:49 +00:00
Alexey Bataev	ac254fc055	[SLP]Improve tryToGatherExtractElements by using per-register analysis. Currently tryToGatherExtractElements function analyzes the whole vector, regardless of the number of actual registers used in this vector. It may prevent some optimizations, because per-register analysis may allow to simplify the final code by reusing more already emitted vectors and better shuffles. Differential Revision: https://reviews.llvm.org/D148855	2023-11-06 07:29:27 -08:00
Hans Wennborg	046c57e705	Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis." This causes asserts: llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:10082: Value llvm::slpvectorizer::BoUpSLP::ShuffleInstructionBuilder::adjustExtracts( const TreeEntry , MutableArrayRef<int>, unsigned int, bool &): Assertion `Part == 0 && "Expected firs part."' failed. See comment on the code review. > Currently tryToGatherExtractElements function analyzes the whole vector, > regardless of the number of actual registers used in this vector. It may > prevent some optimizations, because per-register analysis may allow to > simplify the final code by reusing more already emitted vectors and > better shuffles. > > Differential Revision: https://reviews.llvm.org/D148855 This reverts commit `9dfdbd7887`.	2023-11-06 13:56:42 +01:00
Alexey Bataev	9dfdbd7887	[SLP]Improve tryToGatherExtractElements by using per-register analysis. Currently tryToGatherExtractElements function analyzes the whole vector, regardless of the number of actual registers used in this vector. It may prevent some optimizations, because per-register analysis may allow to simplify the final code by reusing more already emitted vectors and better shuffles. Differential Revision: https://reviews.llvm.org/D148855	2023-11-03 10:43:58 -07:00
Nikita Popov	e4a4122eb6	[IR] Remove zext and sext constant expressions (#71040) Remove support for zext and sext constant expressions. All places creating them have been removed beforehand, so this just removes the APIs and uses of these constant expressions in tests. There is some additional cleanup that can be done on top of this, e.g. we can remove the ZExtInst vs ZExtOperator footgun. This is part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.	2023-11-03 10:46:07 +01:00
Martin Storsjö	66152f4eed	Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis." This reverts commit `3e6d7c6d98`. That commit caused miscompilation of ffmpeg's libavcodec/vp9dsp_8bpp.o on aarch64; the file still compiles correctly, but no longer produces the right result - see https://reviews.llvm.org/D148855#4655968 for details.	2023-11-03 00:08:17 +02:00
Alexey Bataev	495ed8d8c8	[SLP]Fix PR70507: freeze poisonous insts to avoid poison propagation. If the reduction instruction is not bool logical op, but reduced within bool logical op reduction list, need to freeze to avoid poison propagation.	2023-11-02 10:37:38 -07:00
Alexey Bataev	033d2b71d2	[SLP][NFC]Add a test to show poison propagation in mixed (non)bool logical ops reduction, NFC.	2023-11-02 09:58:13 -07:00
Alexey Bataev	3e6d7c6d98	[SLP]Improve tryToGatherExtractElements by using per-register analysis. Currently tryToGatherExtractElements function analyzes the whole vector, regardless of the number of actual registers used in this vector. It may prevent some optimizations, because per-register analysis may allow to simplify the final code by reusing more already emitted vectors and better shuffles. Differential Revision: https://reviews.llvm.org/D148855	2023-11-01 10:42:35 -07:00
Alexey Bataev	6e8d957a22	Revert "[SLP]Improve tryToGatherExtractElements by using per-register analysis." This reverts commit `0a34aaedd8` to fix fails reported in https://lab.llvm.org/buildbot/#/builders/265/builds/40	2023-11-01 08:52:31 -07:00
Alexey Bataev	c28b7eb496	[SLP]Fix handling of -slp-vectorize-hor-store for values with many uses.	2023-11-01 08:41:54 -07:00
Alexey Bataev	c449a64c3e	[SLP][NFC]Add the test showing issue with -slp-vectorize-hor-store option, NFC.	2023-11-01 08:31:18 -07:00
Alexey Bataev	0a34aaedd8	[SLP]Improve tryToGatherExtractElements by using per-register analysis. Currently tryToGatherExtractElements function analyzes the whole vector, regardless of the number of actual registers used in this vector. It may prevent some optimizations, because per-register analysis may allow to simplify the final code by reusing more already emitted vectors and better shuffles. Differential Revision: https://reviews.llvm.org/D148855	2023-11-01 07:44:49 -07:00
Ramkumar Ramachandra	ac7c816dc2	Revert "VectorUtils: mark lrint, llrint as trivially vectorizable (#69945)" This reverts commit `5bfd89bda7`. It was causing build failures on ffmpeg on i686.	2023-11-01 09:57:22 +00:00
Ramkumar Ramachandra	5bfd89bda7	VectorUtils: mark lrint, llrint as trivially vectorizable (#69945) With the recent change `98c90a13` (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests.	2023-10-31 21:29:15 +00:00
Alexey Bataev	4c997e1536	[SLP]Fix PR70507: emit freeze whenever required for bool logical ops in the middle of reduction ops. Need to emit freeze instruction not only in the case, where the root is bool logical op, but also if we reduce several scalars, but unable to say precisely, if the root is bool logical op.	2023-10-31 12:23:12 -07:00
Alexey Bataev	0e8cbb6ac8	[SLP][NFC]Add a test with poisonous reduction, seeding bool logical op. NFC.	2023-10-31 12:10:10 -07:00
Alexey Bataev	9da19e4340	[SLP]Fix PR70507: correctly handle bool logical ops in reductions. If the very first reduction operation is not bool logical op, but some others are, still need to emit the bool logic op for all the extra reduction operations to avoid incorrect poison propagation.	2023-10-30 14:09:08 -07:00
Alexey Bataev	71bf052ec9	[SLP][NFC]Add a test for bool logic ops reduction, NFC.	2023-10-30 13:38:57 -07:00
Philip Reames	3f2ed812f0	[InstCombine] Infer nneg on zext when forming from non-negative sext (#70706) Builds on #67982 which recently introduced the nneg flag on a zext instruction. InstCombine is one of our largest canonicalizers of zext from non-negative sext instructions, so set the flag there.	2023-10-30 12:09:43 -07:00
Philip Reames	89564f0b69	Regenerate a set of auto-update tests [nfc] To reduce the spurious test delta in an upcoming change.	2023-10-30 11:36:43 -07:00
Alexey Bataev	af15c46777	[SLP]Do not crash if number of vector registers does not fit the vector type. Need to check, if the number of vector registers, returned by TTI, is not greater than total number of mask elements and not zero, before trying to perform any operations. TTI still may return non-valid number of registers.	2023-10-30 07:30:52 -07:00
Antonio Frighetto	138e6c1c86	[AArch64][TTI] Improve `LegalVF` when gather loads are scalarized After determining the cost of loads that could not be coalesced into `VectorizedLoads` in SLP, computing the cost of a gather-vectorized load is carried out. Favour a potentially high valid cost when the type of a group of loads, whose type is a vector of size dependent upon `VF`, may be legalized into a scalar value. Fixes: https://github.com/llvm/llvm-project/issues/68953.	2023-10-27 20:22:54 +02:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One case that cannot currently be differentiated from a missing datalayout is an explicit `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
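
Note on the disjoint-flag commits in this list (03d4a9d94d, 7ec4f6094e, eecb99c5f6): they rest on one arithmetic fact, namely that an "or" whose operands share no set bits computes the same value as an "add". The following is only a minimal Python sketch of that fact, assuming 32-bit wrapping integers modeled with a mask. The helper names `no_common_bits` and `or_equals_add` are hypothetical and are not LLVM APIs.

```python
BITS = 32                 # assumed integer width for this illustration
MASK = (1 << BITS) - 1    # keeps results inside the assumed width


def no_common_bits(a: int, b: int) -> bool:
    """True when no bit position is set in both operands."""
    return ((a & b) & MASK) == 0


def or_equals_add(a: int, b: int) -> bool:
    """True when a | b and a + b agree modulo 2**BITS."""
    return ((a | b) & MASK) == ((a + b) & MASK)


if __name__ == "__main__":
    # 0b1100 and 0b0011 share no bits, so "or" and "add" agree.
    print(no_common_bits(0b1100, 0b0011), or_equals_add(0b1100, 0b0011))
    # 0b0110 and 0b0011 share bit 1, so the carry makes them differ.
    print(no_common_bits(0b0110, 0b0011), or_equals_add(0b0110, 0b0011))
```

The two checks agree because a + b equals (a | b) + (a & b), so the sum matches the "or" exactly when the shared-bit term is zero.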


1554 Commits