clang-p2996

Author	SHA1	Message	Date
Sanjay Patel	e71b81cab0	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try) The first attempt was reverted because a clang test changed unexpectedly - the file is already marked with a FIXME, so I just updated it this time to pass. Original commit message: This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 14:52:20 -05:00
Sanjay Patel	5eacdcff06	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1" This reverts commit `a4c466766d`. This broke clang tests that are wrongly dependent on the optimizer.	2022-11-30 14:10:50 -05:00
Sanjay Patel	a4c466766d	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 13:22:04 -05:00
William Huang	be4b1dd35b	[InstCombine] Revert D125845 Reverting D125845 `[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back` because multiple users reported performance regression Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D138950	2022-11-29 22:02:40 +00:00
Sanjay Patel	c7bd82dfd8	[PhaseOrdering] add test for vector load combining; NFC This is another example from issue #17113	2022-11-28 16:00:06 -05:00
Matt Arsenault	1c55cc600e	PhaseOrdering: Convert tests to opaque pointers Required manually running update_test_checks: AArch64/hoisting-sinking-required-for-vectorization.ll AArch64/peel-multiple-unreachable-exits-for-vectorization.ll ARM/arm_mult_q15.ll X86/hoist-load-of-baseptr.ll X86/spurious-peeling.ll	2022-11-27 21:26:41 -05:00
Roman Lebedev	25f01d593c	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2)" TableGen is still getting miscompiled on PPC buildbots. Sent a mail with request for help. This reverts commit `3c4d2a0396`.	2022-11-27 00:00:06 +03:00
Roman Lebedev	3c4d2a0396	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2) This is a recommit of `cf624b23bc`, which was reverted in `5cfc22cafe`, because the cut-off on the number of vector elements was not low enough, and it triggered both SDAG SDNode operand number assertions, and caused compile time explosions in some cases. Let's try with something really REALLY conservative first, just to get somewhere, and try to bump it (to 64/128) later. FIXME: should this respect TTI reg width * num vec regs? Original commit message: Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-11-26 23:19:15 +03:00
Max Kazantsev	211d941188	[SCEV] Rename max backedge-taken count -> constant max backedge taken-count in printout This is a preparatory step for introducing symbolic max backedge-taken count.	2022-11-24 18:43:42 +07:00
Benjamin Kramer	5cfc22cafe	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes" This reverts commit `cf624b23bc`. It triggers crashes in clang, see the comments on github on the original change.	2022-11-23 13:11:16 +01:00
Roman Lebedev	cf624b23bc	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-11-23 02:38:25 +03:00
Roman Lebedev	6a77477d53	[NFC][SROA] Autogenerate check lines in some tests being affected by upcoming change	2022-11-23 02:38:25 +03:00
Sanjay Patel	163bb6d64e	[Passes][VectorCombine] enable early run generally and try load folds An early run of VectorCombine was added with D102496 specifically to deal with unnecessary vector ops produced with the C matrix extension. This patch is proposing to try those folds in general and add a pair of load folds to the menu. The load transform will partly solve (see PhaseOrdering diffs) a longstanding vectorization perf bug by removing redundant loads via GVN: issue #17113 The main reason for not enabling the extra pass generally in the initial patch was compile-time cost. The cost of VectorCombine was significantly (surprisingly) improved with: `87debdadaf` https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u ...so the extra run is going to cost very little now - the total cost of the 2 runs should be less than the 1 run before that micro-optimization: https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u It may be possible to reduce the cost slightly more with a few more earlier-exits like that, but it's probably in the noise based on timing experiments. Differential Revision: https://reviews.llvm.org/D138353	2022-11-21 13:57:55 -05:00
Roman Lebedev	8adfa29706	[Pipelines] Introduce SROA after (final, run-time) loop unrolling Now that we are done with loop unrolling, be it either by LoopVectorizer, or LoopUnroll passes, some variable-offset GEP's into alloca's could have become constant-offset, thus enabling SROA and alloca promotion, yet we don't capitalize on that, which is surprizing. While it would be good to not introduce one more SROA invocation, but instead move the one from `PassBuilder::buildFunctionSimplificationPipeline()`, the existing test coverage says that is a bad idea, though it would be fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=b150d34c47efbd8fa09604bce805c0920360f8d7&to=5a9a5c855158b482552be8c7af3e73d67fa44805&stat=instructions So instead, i add yet another SROA run. I have checked, and it needs to be at least after said final loop unrolling. This is still fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=70324cd88328c0924e605fa81b696572560aa5c9&to=fb489bbef687ad821c3173a931709f9cad9aee8a&stat=instructions I've encountered this in a real code, `SROA-after-final-loop-unrolling.ll` has been reduced from https://godbolt.org/z/fsdMhETh3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D136806	2022-11-17 21:31:30 +03:00
Sanjay Patel	3682180e5f	[PhaseOrdering] add test for load combining; NFC Based on an example in issue #17113	2022-11-17 10:33:35 -05:00
Arthur Eubanks	70dc3b811e	[AggressiveInstCombine] Remove legacy PM pass As part of legacy PM optimization pipeline removal. This shouldn't be used in codegen pipelines so it should be ok to remove. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D137116	2022-11-15 14:35:15 -08:00
Bjorn Pettersson	893e351f2f	[test] Avoid legacy PM default pipelines (O0,O1 etc) when running opt Two lit tests were found running something like this: opt -O<n> -pass-locked-to-legacy-PM ... The expand-atomicrmw-xchg-fp.ll seem to have used -O1 just to ensure that the -atomic-expand pass were thinking that it wasn't running at O0 level. Same thing can be ensured by using the -codegen-opt-level=1 option, making it possible to avoid using O1 in that test case. In the vector-reductions-expanded.ll test case it was possible to split the RUN line into using two opt invocations. First running "opt -O2" using the new PM, and then running "opt -expand-reductions" using the legacy PM. I think that given this patch we get closer to removing code related to 'AddOptimizationPasses' in opt.cpp. Differential Revision: https://reviews.llvm.org/D137626	2022-11-09 09:57:57 +01:00
Sanjay Patel	115d2f69a5	[InstCombine] canonicalize branch with logical-and-not condition https://alive2.llvm.org/ce/z/EfHlWN In the motivating case from issue #58313, this allows forming a duplicate 'not' op which then gets CSE'd and simplifyCFG'd and combined into the expected 'xor'.	2022-10-31 15:51:45 -04:00
Sanjay Patel	a7265f9102	[InstCombine] add tests for branch on logical and/or; NFC	2022-10-31 15:51:45 -04:00
David Green	5dd7d2ce67	[InstCombine] Don't change switch table from desirable to illegal types In InstCombine we treat i8/i16 as desirable, even if they are not legal. The current logic in shouldChangeType will decide to convert from an illegal but desirable type (such as an i8) to an illegal and undesirable type (such as i3). This patch prevents changing the switch conditions to an irregular type on like Arm/AArch64 where i8/i16 are not legal. This is the same issue as https://reviews.llvm.org/D54115. In the case I was looking it is was converting an i32 switch to an i8 switch, which then became a i3 switch. Differential Revision: https://reviews.llvm.org/D136763	2022-10-28 10:15:41 +01:00
Roman Lebedev	e14f30584d	[NFC][PhaseOrdering] Add one more test for SROA after partial unroll https://reviews.llvm.org/D136806	2022-10-27 19:10:27 +03:00
Roman Lebedev	70324cd883	[NFC][PhaseOrdering] Add new test for SROA misplacement	2022-10-27 03:11:23 +03:00
Yaxun (Sam) Liu	9d5adc7e49	Revert "reland `e5581df60a` [SimplifyCFG] accumulate bonus insts cost" This reverts commit `bd7949bcd8`. Revert this patch since reviwers have different opinions regarding the approach in post-commit review. Will open RFC for further discussion. Differential Revision: https://reviews.llvm.org/D132408	2022-10-25 12:15:39 -04:00
Yaxun (Sam) Liu	bd7949bcd8	reland `e5581df60a` [SimplifyCFG] accumulate bonus insts cost Fixed compile time increase due to always constructing LocalCostTracker. Now only construct LocalCostTracker when needed.	2022-10-24 15:43:53 -04:00
William Huang	6c767cef5a	[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back Canonicalize GEP of GEP by swapping GEP with some suffix constant indices to the back (and GEP with all constant indices to the back of that), this allows more constant index GEP merging to happen. Exceptions are: If swapping violates use-def relations, or anti-optimizes LICM For constant indexed GEP of GEP, if they cannot be merged directly, they will be casted to i8* and merged. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D125845	2022-10-20 17:41:26 +00:00
bipmis	38f3e44997	[AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads. This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns. Differential Revision: https://reviews.llvm.org/D135137	2022-10-19 11:22:58 +01:00
bipmis	82e3056255	Add test for combinations of four i8-loads spliced into a 32-bit value	2022-10-18 15:40:56 +01:00
Sanjay Patel	008a89037a	[InstCombine] fold udiv with common shl amount in operands (X << Z) / (Y << Z) --> X / Y https://alive2.llvm.org/ce/z/E5eaxU This fixes the motivating example from issue #58137, but it is not the most general transform. We should probably also convert left-shift in the divisor to right-shift in the dividend for that, but that exposes another missed canonicalization for shifts and adds.	2022-10-12 11:12:26 -04:00
Sanjay Patel	3f36bdd2f4	[PhaseOrdering] add test for sdiv with common factor; NFC issue #58137	2022-10-10 08:11:53 -04:00
Sanjay Patel	8da2fa856f	[InstCombine] fold sdiv with hidden common factor (X * Y) s/ (X << Z) --> Y s/ (1 << Z) https://alive2.llvm.org/ce/z/yRSddG issue #58137	2022-10-06 13:11:50 -04:00
Sanjay Patel	241893f99f	[PhaseOrdering] add test for mul + sdiv; NFC issue #58137	2022-10-06 13:11:50 -04:00
Nikita Popov	627a0c6b40	[PhaseOrdering] Name instructions in test (NFC) Run through opt -instnamer.	2022-10-05 17:04:11 +02:00
Sanjay Patel	3f906f057c	[InstSimplify] look through vector select (shuffle) in min/max fold This is an extension of the existing min/max+select fold (which already has a very large number of variations) to allow a vector shuffle because that's what we have in the motivating example from issue #42100. A couple of Alive2 checks of variants (I don't know how to generalize these in Alive): https://alive2.llvm.org/ce/z/jUFAqT And verify the PR42100 test: https://alive2.llvm.org/ce/z/3EcASf It's possible there is some generalization of the fold or a VectorCombine/SLP answer for the motivating test, but I haven't found a better/smaller solution yet. We can also add even more variants here as follow-up patches. For example, we can have shuffle followed by min/max; we also don't have this canonicalization or the reverse: https://alive2.llvm.org/ce/z/StHD9f Differential Revision: https://reviews.llvm.org/D134879	2022-09-30 08:27:00 -04:00
Simon Pilgrim	5849fcb635	Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions" Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI." Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds" Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.	2022-09-30 11:22:48 +01:00
Sanjay Patel	8bfba17b40	[InstSimplify][PhaseOrdering] add tests for vector select of min/max; NFC The phase ordering test is the almost unoptimized IR for the example in issue #42100; it was passed through -mem2reg to reduce obvious excessive load/store and other noise. D134879	2022-09-29 12:06:55 -04:00
Simon Pilgrim	c29d911fd3	[PhaseOrdering] Add missing x86 registered target requirement to fast-basictest.ll	2022-09-27 20:59:57 +01:00
Simon Pilgrim	1b7089fe67	[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions Instead of accumulating all extraction costs separately and then adjusting for repeated subvector extractions, this patch collects all the extractions and then converts to calls to getScalarizationOverhead to improve the accuracy of the costs. I'm not entirely satisfied with the getExtractWithExtendCost handling yet - this still just adds all the getExtractWithExtendCost costs together - it really needs to be replaced with a "getScalarizationOverheadWithExtend", but that will require further refactoring first. This replaces my initial attempt in D124769. Differential Revision: https://reviews.llvm.org/D134605	2022-09-27 14:49:07 +01:00
Simon Pilgrim	bcb1397bda	[PhaseOrdering][X86] Add missing triple from fast-basictest.ll test	2022-09-25 16:23:35 +01:00
Sanjay Patel	271f3b91bb	[PhaseOrdering] add test for issue #50778 ; NFC Several different passes are involved to get the expected IR, and we don't want that to break again.	2022-09-23 12:12:13 -04:00
Sanjay Patel	34f8112b79	Revert "[PhaseOrdering] add test for issue #50778 ; NFC" This reverts commit `cdc012fa26`. This accidentally deleted a test file (not sure how that became part of the commit).	2022-09-23 12:06:29 -04:00
Sanjay Patel	cdc012fa26	[PhaseOrdering] add test for issue #50778 ; NFC Several different passes are involved to get the expected IR, and we don't want that to break again.	2022-09-23 12:03:03 -04:00
Pavel Samolysov	1c530500ab	[Pipelines] Introduce DAE after ArgumentPromotion The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting down generated `alloca` instructions as well as meaningless `store`s and this behavior can leave unused (dead) arguments. To eliminate the dead arguments and therefore let the DeadCodeElimination remove becoming dead inserted `GEP`s as well as `load`s and `cast`s in the callers, the DeadArgumentElimination pass should be run after the ArgumentPromotion one. Differential Revision: https://reviews.llvm.org/D128830	2022-09-22 15:33:46 -07:00
Djordje Todorovic	f0f8b46863	Recommit "[AggressiveInstCombine] Lower Table Based CTTZ The bug reported on the [0] has been fixed. The issue was we have not checked if the global variables that represent cttz tables was constant. There is a new negative test added in negative-lower-table-based-cttz.ll that represents this. [0] https://reviews.llvm.org/rGdf868edee561eb973edd85ec9df41c67aa0bff6b	2022-09-20 13:12:47 +02:00
Nikita Popov	dd61726d5b	Revert "[SimplifyCFG] accumulate bonus insts cost" This reverts commit `e5581df60a`. This causes major compile-time regressions, about 2-3% end-to-end on CTMark.	2022-09-19 14:46:43 +02:00
Yaxun (Sam) Liu	e5581df60a	[SimplifyCFG] accumulate bonus insts cost SimplifyCFG folds bool foo() { if (cond1) return false; if (cond2) return false; return true; } as bool foo() { if (cond1 \| cond2) return false return true; } 'cond2' is called 'bonus insts' in branch folding since they introduce overhead since the original CFG could do early exit but the folded CFG always executes them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into its predecessor BB which shares the destination. If it is below bonus-inst-threshold, SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed. When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts in the current BB to be considered for folding. This causes issue for unrolled loops which share destinations, e.g. bool foo(int a) { for (int i = 0; i < 32; i++) if (a[i] > 0) return false; return true; } After unrolling, it becomes bool foo(int a) { if(a[0]>0) return false if(a[1]>0) return false; //... if(a[31]>0) return false; return true; } SimplifyCFG will merge each BB with its predecessor BB, and ends up with 32 'bonus insts' which are always executed, which is much slower than the original CFG. The root cause is that SimplifyCFG does not consider the accumulated cost of 'bonus insts' which are folded from different BB's. This patch fixes that by introducing a ValueMap to track costs of 'bonus insts' coming from different BB's into the same BB, and cuts off if the accumulated cost exceeds a threshold. Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault Differential Revision: https://reviews.llvm.org/D132408	2022-09-18 20:21:14 -04:00
Sanjay Patel	d6498abc24	[InstCombine] remove multi-use add demanded constant fold This was originally part of D133788. There are no visible regressions. All of the diffs show a large unsigned constant becoming a small negative constant. This should be better for analysis (and slightly less compile-time) and codegen.	2022-09-18 14:23:43 -04:00
Valery N Dmitriev	18dde772d6	[SLP] Unify main/alternate selection for CmpInst instructions Make main/alternate operation selection logic for CmpInst consistent across SLP vectorizer. Differential Revision: https://reviews.llvm.org/D133430	2022-09-13 09:20:25 -07:00
Djordje Todorovic	b080d0bae8	Revert ""Recommit "[AggressiveInstCombine] Lower Table Based CTTZ""" This reverts commit `df868edee5`, as it introduces a bug found by Alive2 (more on the rGdf868edee561).	2022-09-12 08:23:07 +02:00
Djordje Todorovic	df868edee5	"Recommit "[AggressiveInstCombine] Lower Table Based CTTZ"" This reverts commit `053841c562`. We faced a use-after-free after pushing the D113291, since the foldSqrt() has a call to eraseFromParent(). The function should be at the end of the main loop that folds the patterns. This patch fixes that.	2022-09-09 10:29:39 +02:00
Djordje Todorovic	7aec9ddcfd	Revert "Recommit "[AggressiveInstCombine] Lower Table Based CTTZ"" This reverts commit `f879939157`.	2022-09-08 17:01:16 +02:00

1 2 3 4 5 ...

435 Commits