clang-p2996

Author	SHA1	Message	Date
Philip Reames	37ead201e6	[runtime-unroll] Use incrementing IVs instead of decrementing ones This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing. Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B, dropping any flags invalidated by doing so. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.) Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen. * Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.	2021-11-12 15:44:58 -08:00
Philip Reames	de2fed6152	[unroll] Keep unrolled iterations with initial iteration The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that a) the loop is reasonably hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. It's readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.	2021-11-12 11:40:50 -08:00
Dmitry Makogon	62f86d4f95	Reapply `5ec2386` "Reapply `db28934` "[IndVars] Pass TTI to replaceCongruentIVs"" This reverts commit `7cd273c339`. Several patches with tests fixes have been applied: `0cada82f0a` "[Test] Remove incorrect test in GVN" `97cb13615d` "[Test] Separate IndVars test into AArch64 and X86 parts" `985cc490f1` "[Test] Remove separated test in IndVars", and test failures caused by `5ec2386` should be resolved now.	2021-11-10 17:36:14 +07:00
Douglas Yung	7cd273c339	Revert "Reapply `db28934` "[IndVars] Pass TTI to replaceCongruentIVs"" This reverts commit `5ec2386332`. This change is causing test failures on the PS4 linux build bot: https://lab.llvm.org/buildbot/#/builders/139/builds/12871	2021-11-09 10:28:41 -08:00
Sanjay Patel	c36b7e21bd	[InstCombine] enhance vector bitwise select matching (Cond & C) | (~bitcast(Cond) & D) --> bitcast (select Cond, (bc C), (bc D)) This is part of fixing: https://llvm.org/PR34047 That report shows a case where a bitcast is sitting between the select condition candidate and its 'not' value due to current cast canonicalization rules. There's a bitcast type restriction that might be violated in existing matching, but I still need to investigate if that is possible - Alive2 shows we can only do this transform safely when the bitcast is from narrow to wide vector elements (otherwise poison could leak into elements that were safe in the original code): https://alive2.llvm.org/ce/z/Hf66qh Differential Revision: https://reviews.llvm.org/D113035	2021-11-09 08:54:59 -05:00
Dmitry Makogon	5ec2386332	Reapply `db28934` "[IndVars] Pass TTI to replaceCongruentIVs" This reapplies patch `db289340c8`. The test failures on build with expensive checks caused by the patch happened due to the fact that we sorted loop Phis in replaceCongruentIVs using llvm::sort, which shuffles the given container if the expensive checks are enabled, so equivalent Phis in the sorted vector had different mutual order from run to run. replaceCongruentIVs tries to replace narrow Phis with truncations of wide ones. In some test cases there were several Phis with the same width, so if their order differs from run to run, the narrow Phis would be replaced with a different Phi, depending on the shuffling result. The patch `ae14fae0ff` fixed this issue by replacing llvm::sort with llvm::stable_sort.	2021-11-09 17:42:29 +07:00
Anton Afanasyev	ce4fa93db8	[SCCP] Tune cast instruction handling for overdefined operand An extended value is known to lie inside a range smaller than the full one. Prevent SCCP from marking such a value as overdefined. Fixes PR52253 Differential Revision: https://reviews.llvm.org/D112721	2021-11-08 18:34:30 +03:00
Anton Afanasyev	fba1f36d13	[Test][SCCP] Precommit tests for PR52253	2021-11-08 16:59:38 +03:00
Dmitry Makogon	8d4eba6c0d	Revert "[IndVars] Pass TTI to replaceCongruentIVs" This reverts commit `db289340c8`. The patch caused 2 crashes with expensive checks enabled.	2021-11-08 19:35:14 +07:00
Dmitry Makogon	db289340c8	[IndVars] Pass TTI to replaceCongruentIVs In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'. This function optionally takes a TTI argument to be able to replace narrow IVs uses with truncates of the widest one. For some reason the TTI wasn't passed to the function, so it couldn't perform such transform. This patch fixes it. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D113024	2021-11-08 19:20:53 +07:00
Roman Lebedev	9c2469c1dd	[PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes Test thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/ We manage to deduce that the answer does not require looping, but we do that after the last `LoopDeletion` pass run, so we end up being stuck with a dead loop. Now, as with all things SCEV, this has a very expected ~`+0.12%` compile time performance regression: https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions (for comparison, doing that in function simplification pipeline would have been ~`+0.5` compile time performance regression, D112840) Looking at the transformation stats over vanilla test-suite, i think it's rather expected:
```
| statistic name | baseline | proposed | Δ | % | \|%\| |
|--------------------------------------------------|----------:|----------:|------:|-------:|-------:|
| scalar-evolution.NumBruteForceTripCountsComputed | 789 | 888 | 99 | 12.55% | 12.55% |
| scalar-evolution.NumTripCountsNotComputed | 105592 | 117900 | 12308 | 11.66% | 11.66% |
| loop-delete.NumBackedgesBroken | 542 | 559 | 17 | 3.14% | 3.14% |
| regalloc.numExtends | 81 | 79 | -2 | -2.47% | 2.47% |
| indvars.NumFoldedUser | 408 | 400 | -8 | -1.96% | 1.96% |
| indvars.NumElimCmp | 3831 | 3758 | -73 | -1.91% | 1.91% |
| scalar-evolution.NumTripCountsComputed | 299759 | 304278 | 4519 | 1.51% | 1.51% |
| loop-delete.NumDeleted | 8055 | 8128 | 73 | 0.91% | 0.91% |
| machine-cse.NumCommutes | 111 | 110 | -1 | -0.90% | 0.90% |
| globaldce.NumFunctions | 1187 | 1192 | 5 | 0.42% | 0.42% |
| codegenprepare.NumSelectsExpanded | 277 | 278 | 1 | 0.36% | 0.36% |
| loop-unroll.NumRuntimeUnrolled | 13841 | 13791 | -50 | -0.36% | 0.36% |
| machinelicm.NumPostRAHoisted | 1168 | 1172 | 4 | 0.34% | 0.34% |
| phi-node-elimination.NumCriticalEdgesSplit | 83054 | 82879 | -175 | -0.21% | 0.21% |
| machine-cse.NumPREs | 3085 | 3079 | -6 | -0.19% | 0.19% |
| branch-folder.NumBranchOpts | 108122 | 107942 | -180 | -0.17% | 0.17% |
| loop-unroll.NumUnrolled | 40136 | 40067 | -69 | -0.17% | 0.17% |
| branch-folder.NumDeadBlocks | 130818 | 130607 | -211 | -0.16% | 0.16% |
| codegenprepare.NumBlocksElim | 92856 | 92714 | -142 | -0.15% | 0.15% |
| instsimplify.NumSimplified | 103263 | 103129 | -134 | -0.13% | 0.13% |
| instcombine.NumConstProp | 26070 | 26102 | 32 | 0.12% | 0.12% |
| instsimplify.NumExpand | 1716 | 1718 | 2 | 0.12% | 0.12% |
| loop-unroll.NumCompletelyUnrolled | 9236 | 9225 | -11 | -0.12% | 0.12% |
| branch-folder.NumHoist | 2773 | 2770 | -3 | -0.11% | 0.11% |
| regalloc.NumReloadsRemoved | 10822 | 10834 | 12 | 0.11% | 0.11% |
| regalloc.NumSnippets | 11394 | 11406 | 12 | 0.11% | 0.11% |
| machine-cse.NumCrossBBCSEs | 1052 | 1053 | 1 | 0.10% | 0.10% |
| machinelicm.NumCSEed | 99887 | 99784 | -103 | -0.10% | 0.10% |
| branch-folder.NumTailMerge | 72501 | 72435 | -66 | -0.09% | 0.09% |
| codegenprepare.NumExtUses | 22007 | 21987 | -20 | -0.09% | 0.09% |
| local.NumRemoved | 68232 | 68294 | 62 | 0.09% | 0.09% |
| loop-vectorize.LoopsAnalyzed | 75483 | 75413 | -70 | -0.09% | 0.09% |
```
Note that i'm only changing current PM, and not touching obsolete PM. This is an alternative to the function simplification pipeline variant of the same change, D112840. It has both less compile time impact (since the additional number of SCEV trip count calculations is far less than with D112840), and it is much more powerful/impactful (almost 2x more loops deleted). I have checked, and doing this after loop rotation is favorable (more loops deleted). Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D112851	2021-11-03 19:24:49 +03:00
Sanjay Patel	ff30394de8	[PhaseOrdering] add tests for x86 abs/max using SSE intrinsics (PR34047); NFC D113035	2021-11-03 09:13:23 -04:00
Roman Lebedev	b554e41e2d	[CVP] Canonicalize signed relational comparisons of scalar integers to unsigned comparison predicates Now that the reasoning was added to ConstantRange in D90924, this replicates IndVars variant of this transform (D111836) in a pass that uses value range reasoning for the transform. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112895	2021-11-01 12:16:05 +03:00
Roman Lebedev	e5df0a5a6f	[NFC][PhaseOrdering] Add additional loop deletion tests Test thanks to Michael Kuklinski from #llvm, originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/	2021-10-29 21:10:36 +03:00
Alexey Bataev	ce14d1b690	[SLP]Do not reorder reduction nodes. The final reduction nodes should not be reordered, the order does not matter for reductions. Also, it might be profitable to vectorize smaller reduction trees, reduction cost may compensate small tree cost. Part of D111574 Differential Revision: https://reviews.llvm.org/D112467	2021-10-26 07:41:24 -07:00
Stanislav Mekhanoshin	969b72fb66	Add test to check we can instcombine after reassociate. NFC. The pattern became optimized after `b92412fb28`. Differential Revision: https://reviews.llvm.org/D112258	2021-10-21 12:27:26 -07:00
Bjorn Pettersson	a413663d8f	[NewPM][test] Avoid using -enable-new-pm=1 since -passes implies new PM	2021-10-20 15:16:17 +02:00
Florian Hahn	4a1d63d7d0	[VectorCombine] Add option to only run scalarization transforms. This patch adds a pass option to only run transforms that scalarize vector operations and do not create new vector instructions. When running VectorCombine early in the pipeline introducing new vector operations can have negative effects, like blocking loop or SLP vectorization. To avoid regressions, restrict the early VectorCombine run (when using -enable-matrix) to only perform scalarization and not introduce new vector operations. This is done as option to the pass directly, which is then set when adding the pass to the pipeline. This is done for the new pass manager only. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D111800	2021-10-15 20:35:58 +01:00
Anton Afanasyev	7b07c01351	[InstCombine] Support arbitrary const shift amount for `lshr (sext i1 ...)` Add lshr (sext i1 X to iN), C --> select (X, -1 >> C, 0) case. This expands C == N-1 case to arbitrary C. Fixes PR52078. Reviewed By: spatel, RKSimon, lebedev.ri Differential Revision: https://reviews.llvm.org/D111330	2021-10-15 13:39:13 +03:00
Anton Afanasyev	e23351cdc9	[Test][InstCombine] Precommit tests for PR52078	2021-10-15 13:38:40 +03:00
Florian Hahn	094faa5fca	[VectorCombine] Add test showing issue when running VectorCombine early. Running -vector-combine early can introduce new vector operations, blocking loop/SLP vectorization. The added test case could be better optimized by the SLPVectorizer if no new vector operations are added early.	2021-10-14 14:03:02 +01:00
Florian Hahn	cd0ba9dc58	[LoopPeel] Peel if it turns invariant loads dereferenceable. This patch adds a new cost heuristic that allows peeling a single iteration off read-only loops, if the loop contains a load that 1. is feeding an exit condition, 2. dominates the latch, 3. is not already known to be dereferenceable, 4. and has a loop invariant address. If all non-latch exits are terminated with unreachable, such loads in the loop are guaranteed to be dereferenceable after peeling, enabling hoisting/CSE'ing them. This enables vectorization of loops with certain runtime-checks, like multiple calls to `std::vector::at` if the vector is passed as pointer. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D108114	2021-10-12 11:42:28 +01:00
Hans Wennborg	1e9afab875	Re-apply "[JumpThreading] Ignore free instructions" It seems the crashes we saw weren't caused by this (see comments on the review). > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `4604695d7c`.	2021-09-24 18:52:30 +02:00
Hans Wennborg	4604695d7c	Revert "[JumpThreading] Ignore free instructions" It caused compiler crashes, see comment on the code review for repro. > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `1e3c6fc7cb`.	2021-09-24 16:14:53 +02:00
Nikita Popov	1e3c6fc7cb	[JumpThreading] Ignore free instructions This is basically D108837 but for jump threading. Free instructions should be ignored for the threading decision. JumpThreading already skips some free instructions (like pointer bitcasts), but does not skip various free intrinsics -- in fact, it currently gives them a fairly large cost of 2. Differential Revision: https://reviews.llvm.org/D110290	2021-09-23 18:28:36 +02:00
hyeongyu kim	ec8311444a	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3) This patch is for fixing potential shufflevector-related bugs like D93818. As in D93818, this patch changes shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110227	2021-09-23 00:14:50 +09:00
Florian Hahn	a7c6471a85	[Passes] Run vector-combine early with -fenable-matrix. IR with matrix intrinsics is likely to also contain large vector operations, which can benefit from early simplifications. This is the last step in a series of changes to improve code-gen for code using matrix subscript operators with the C/C++ matrix extension in Clang, like using matrix_t = double __attribute__((matrix_type(15, 15))); void foo(unsigned i, matrix_t &A, matrix_t &B) { for (unsigned j = 0; j < 4; ++j) for (unsigned k = 0; k < i; k++) B[k][j] -= A[k][j] * B[i][j]; } https://clang.godbolt.org/z/6dKxK1Ed7 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D102496	2021-09-22 12:48:32 +01:00
Florian Hahn	300870a95c	[VectorCombine] Switch to using a worklist. This patch updates VectorCombine to use a worklist to allow iterative simplifications where a combine enables other combines. Suggested in D100302. The main use case at the moment is foldSingleElementStore and scalarizeLoadExtract working together to improve scalarization. Note that we now also do not run SimplifyInstructionsInBlock on the whole function if there have been changes. This means we fail to remove/simplify instructions not related to any of the vector combines. IMO this is fine, as simplifying the whole function seems more like a workaround for not tracking the changed instructions. Compile-time impact looks neutral: NewPM-O3: +0.02% NewPM-ReleaseThinLTO: -0.00% NewPM-ReleaseLTO-g: -0.02% http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D110171	2021-09-22 09:54:58 +01:00
Arthur Eubanks	d49cb5b303	[SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest This makes some tests in vector-reductions-logical.ll more stable when applying D108837. The cost of branching is higher when vector ops are involved due to potential SLP transformations. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108935	2021-09-16 10:50:36 -07:00
Sanjay Patel	be1028053e	[PhaseOrdering] add tests for PR47023; NFC	2021-09-15 08:44:04 -04:00
Arthur Eubanks	096d9814aa	[opt] Remove some legacy PM flags Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109664	2021-09-13 15:50:03 -07:00
Roman Lebedev	909cba9699	[SimplifyCFG] performBranchToCommonDestFolding(): require block-closed SSA form for bonus instructions (PR51125) I can't seem to wrap my head around the proper fix here, we should be fine without this requirement, iff we can form this form, but the naive attempt (https://reviews.llvm.org/D106317) has failed. So just to unblock the release, put up a restriction. Fixes https://bugs.llvm.org/show_bug.cgi?id=51125	2021-09-09 12:28:09 +03:00
Arthur Eubanks	37e6a27da7	[test] Fixup tests with -analyze in llvm/test/Transforms	2021-09-04 16:45:51 -07:00
Dávid Bolvanský	9e06c767a4	[NFC] Added testcase for PR39116	2021-09-04 10:52:46 +02:00
Dávid Bolvanský	00f8aecf6e	[NFC] Added testcase for PR40750	2021-09-02 22:44:03 +02:00
David Green	efa340fbd2	[ARM] Workaround tailpredication min/max costmodel The min/max intrinsics are not yet canonical, but when they are the tail predication analysis will change from treating them like icmp to treating them like intrinsics. Unfortunately, they can currently produce better code by not being tail predicated thanks to the vectorizer picking higher VF's and the backend folding to better instructions (especially for saturate patterns). In the long run we will need to improve the vectorizer's cost modelling, recognizing the instruction directly, but in the meantime this treats min/max as before to prevent performance regressions.	2021-08-30 19:19:51 +01:00
Anton Afanasyev	cfb6dfcbd1	[AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG Add `lshr` instruction to the DAG post-dominated by `trunc`, allowing TruncInstCombine to reduce bitwidth of expressions containing these instructions. We should be shifting by less than the target bitwidth. Also it is sufficient to require that all truncated bits of the value-to-be-shifted are zeros: https://alive2.llvm.org/ce/z/_LytbB Alive2 variable-length proof: https://godbolt.org/z/1srE1aqzf => s/32/8/ => https://alive2.llvm.org/ce/z/StwPia Part of https://reviews.llvm.org/D107766 Differential Revision: https://reviews.llvm.org/D108201	2021-08-18 22:20:58 +03:00
Anton Afanasyev	8f8f9260a9	[Test][AggressiveInstCombine] Add test for shifts Precommit test for D107766/D108091. Also move fixed test for PR50555 from SLPVectorizer/X86/ to PhaseOrdering/X86/ subdirectory.	2021-08-17 12:39:53 +03:00
Florian Hahn	39cc0b8c68	[PhaseOrdering] Add test for missed vectorization with vector::at calls. This test illustrates missed vectorization of loops with multiple std::vector::at calls, like int sum(std::vector<int> *A, std::vector<int> *B, int N) { int cost = 0; for (int i = 0; i < N; ++i) cost += A->at(i) + B->at(i); return cost; } https://clang.godbolt.org/z/KbYoaPhvq	2021-08-16 09:43:30 +01:00
David Green	2b42350994	[InstCombine] Extend sadd.sat tests to include min/max patterns. NFC This tests code starting from smin/smax, as opposed to the icmp/select form. Also adds an ARM MVE phase ordering test for vectorizing to sadd.sat from the original IR.	2021-08-14 22:48:10 +01:00
Sanjay Patel	a22c99c3c1	[InstCombine] canonicalize cmp-of-bitcast-of-vector-cmp to use zero constant We can invert a compare constant and preserve the logic as shown in this sampling: https://alive2.llvm.org/ce/z/YAXbfs (In theory, we could deal with non-all-ones/zero as well, but it doesn't seem worthwhile.) I noticed this as a part of the x86 codegen difference in https://llvm.org/PR51259 - it ends up using "test" instead of "not + cmp" in that example. This pattern also shows up in https://llvm.org/PR41312 and https://llvm.org/PR50798 . Differential Revision: https://reviews.llvm.org/D107170	2021-07-31 13:31:12 -04:00
Roman Lebedev	1901c98dd8	[SimplifyCFG] SwitchToLookupTable(): don't increase ret count The very next SimplifyCFG pass invocation will tail-merge these two ret's anyways, there is not much point in creating more work for ourselves.	2021-07-26 23:29:55 +03:00
Eli Friedman	5c486ce04d	[LLVM IR] Allow volatile stores to trap. Proposed alternative to D105338. This is ugly, but short-term I think it's the best way forward: first, let's formalize the hacks into a coherent model. Then we can consider extensions of that model (we could have different flavors of volatile with different rules). Differential Revision: https://reviews.llvm.org/D106309	2021-07-26 10:51:00 -07:00
hyeongyu kim	aca5aeb752	[InstCombine] Add freezeAllUsesOfArgument to visitFreeze In D106041, a freeze was added before the branch condition to solve the miscompilation problem of SimpleLoopUnswitch. However, I found that the added freeze disturbed other optimizations in the following situations. ``` arg.fr = freeze(arg) use(arg.fr) ... use(arg) ``` It is a problem that occurred when arg and arg.fr were recognized as different values. Therefore, changing to use arg.fr instead of arg throughout the function eliminates the above problem. Thus, I add a function that changes all uses of arg to freeze(arg) to visitFreeze of InstCombine. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D106233	2021-07-24 18:08:58 +09:00
Roman Lebedev	d7378259aa	[SimplifyCFG] SimplifyCondBranchToTwoReturns(): really only deal with different ret blocks This function is called when some predecessor of an empty return block ends with a conditional branch, with both successors being empty ret blocks. Now, because of the way SimplifyCFG works, it might happen to simplify one of the blocks in a way that makes a conditional branch into an unconditional one, since its destinations are now identical, but it might not have actually simplified said conditional branch into an unconditional one yet. So, we have to check that ourselves first, especially now that SimplifyCFG aggressively tail-merges all ret and resume blocks. Even if it was an unconditional branch already, `SimplifyCFGOpt::simplifyReturn()` doesn't call `FoldReturnIntoUncondBranch()` by default.	2021-07-23 00:36:59 +03:00
Sanjay Patel	0e15de2d0c	[InstCombine] fold reassociative FP add into start value of fadd reduction This pattern is visible in unrolled and vectorized loops. Although the backend seems to be able to reassociate to ideal form in the examples I looked at, we might as well do that in IR for efficiency.	2021-07-18 06:26:20 -04:00
Sanjay Patel	81ce3aa30c	[SLP] avoid leaking poison in reduction of safe boolean logic ops This bug was introduced with D105730 / `25ee55c0ba` . If we are not converting all of the operations of a reduction into a vector op, we need to preserve the existing select form of the remaining ops. Otherwise, we are potentially leaking poison where it did not in the original code. Alive2 agrees that the version that freezes some inputs and then falls back to scalar is correct: https://alive2.llvm.org/ce/z/erF4K2	2021-07-15 17:33:06 -04:00
Roman Lebedev	3e6c383dc6	[SimplifyCFG] Rerun PHI deduplication after common code sinking (PR51092) `SinkCommonCodeFromPredecessors()` doesn't itself ensure that duplicate PHI nodes aren't created. I suppose, we could teach it to do that on-the-fly (& account for the already-existing PHI nodes, & adjust costmodel), the diff will be bigger than this. The alternative is to schedule a new EarlyCSE pass invocation somewhere later in the pipeline. Clearly, we don't have any EarlyCSE runs in module optimization pipeline, so this pattern isn't cleaned up... That would perhaps be better, but it will again have some compile time impact. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D106010	2021-07-15 16:34:34 +03:00
Roman Lebedev	dfbfc277b2	[NFC] Drop redundant check prefixes in newly added test file	2021-07-14 22:14:36 +03:00
Roman Lebedev	a4856c739c	[NFC][PhaseOrdering] Add test for the lack of CSE after SimplifyCFG (PR51092)	2021-07-14 22:07:38 +03:00
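
The fold described in `7b07c01351`, `lshr (sext i1 X to iN), C --> select (X, -1 >> C, 0)`, can be checked by hand for a fixed width. The following is only a sketch in plain C++, assuming N = 32 and modelling the `i1` value as a `bool`; the function names are hypothetical and are not taken from InstCombine.

```cpp
#include <cstdint>
#include <initializer_list>

// Left-hand side of the fold: lshr (sext i1 X to i32), C.
// Sign-extending a 1-bit value yields all-ones for true and zero for false.
// Assumption: the caller keeps c < 32, as the IR pattern requires.
inline std::uint32_t lshrOfSext(bool x, unsigned c) {
  std::uint32_t extended = x ? 0xFFFFFFFFu : 0u;
  return extended >> c; // unsigned shift, i.e. a logical shift right
}

// Right-hand side of the fold: select X, (-1 >> C), 0.
inline std::uint32_t selectForm(bool x, unsigned c) {
  return x ? (0xFFFFFFFFu >> c) : 0u;
}

// Compares both forms for every valid shift amount and both values of X.
inline bool foldHoldsForAllShifts() {
  for (unsigned c = 0; c < 32; ++c)
    for (bool x : {false, true})
      if (lshrOfSext(x, c) != selectForm(x, c))
        return false;
  return true;
}
```

Calling `foldHoldsForAllShifts()` walks all 64 input combinations, which covers the earlier `C == N-1` special case as the single point `c == 31`.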

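
The ordering problem described in `5ec2386332` (equal-width Phis changing their mutual order between runs) comes down to whether the sort is stable. The sketch below uses `std::stable_sort` from the standard library as a stand-in for `llvm::stable_sort`; the `PhiInfo` struct, the `sortWidestFirst` helper, and the widest-first comparator are illustrative assumptions, not code from `replaceCongruentIVs`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a loop Phi: a name plus its bit width.
struct PhiInfo {
  std::string name;
  unsigned width;
};

// Sorts widest-first and returns the resulting order of names.
// A stable sort keeps equal-width entries in their input order, so the
// entry chosen among same-width candidates does not vary between runs.
inline std::vector<std::string> sortWidestFirst(std::vector<PhiInfo> phis) {
  std::stable_sort(phis.begin(), phis.end(),
                   [](const PhiInfo &a, const PhiInfo &b) {
                     return a.width > b.width;
                   });
  std::vector<std::string> names;
  for (const PhiInfo &p : phis)
    names.push_back(p.name);
  return names;
}
```

With the input `{a:32, b:64, c:32, d:64}` the result is always `b, d, a, c`. An unstable sort would be free to return `d, b, c, a` instead, which is the kind of run-to-run difference the commit message attributes to shuffling under expensive checks.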

281 Commits