clang-p2996

Author	SHA1	Message	Date
Bjorn Pettersson	472462c472	[NewPM] Consistently use 'simplifycfg' rather than 'simplify-cfg' There was an alias between 'simplifycfg' and 'simplify-cfg' in the PassRegistry. That was the original reason for this patch, which effectively removes the alias. This patch also replaces all occurrences of 'simplify-cfg' by 'simplifycfg'. Reason for choosing that form for the name is that it matches the DEBUG_TYPE for the pass, and the legacy PM name and also how it is spelled out in other passes such as 'loop-simplifycfg', and in other options such as 'simplifycfg-merge-cond-stores'. If for some reason the name should be changed to 'simplify-cfg' in the future, then I think such a renaming should be more widely done and not only impacting the PassRegistry. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D105627	2021-07-09 09:47:03 +02:00
Philip Reames	f0693bc0ae	autogen two tests for ease of update	2021-06-30 11:47:36 -07:00
Roman Lebedev	9c4c2f2472	[SimplifyCFG] Tail-merging all blocks with `ret` terminator Based on top of D104598, which is a NFCI-ish refactoring. Here, a restriction, that only empty blocks can be merged, is lifted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104597	2021-06-24 13:15:39 +03:00
David Green	8cfc080132	[ARM] Limit v6m unrolling with multiple live outs v6m cores only have a limited number of registers available. Unrolling can mean we spend more on stack spills and reloads than we save from the unrolling. This patch adds an extra heuristic to put a limit on the unroll count for loops with multiple live out values, as measured from the LCSSA phi nodes. Differential Revision: https://reviews.llvm.org/D104659	2021-06-23 16:36:37 +01:00
Roman Lebedev	ff4b1d379f	[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging This changes the approach taken to tail-merge the blocks to always create a new block instead of trying to reuse some block, and generalizes it to support dealing not with just the `ret` in the future. This effectively lifts the CallBr restriction, although this isn't really intentional. That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it. Other restrictions of the transform remain. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104598	2021-06-23 14:33:18 +03:00
Nikita Popov	1ae266f452	[LoopUnroll] Use smallest exact trip count from any exit This is a more general alternative/extension to D102635. Rather than handling the special case of "header exit with non-exiting latch", this unrolls against the smallest exact trip count from any exit. The latch exit is no longer treated as privileged when it comes to full unrolling. The motivating case is in full-unroll-one-unpredictable-exit.ll. Here the header exit is an IV-based exit, while the latch exit is a data comparison. This kind of loop does not get rotated, because the latch is already exiting, and loop rotation doesn't try to distinguish IV-based/analyzable latches. Differential Revision: https://reviews.llvm.org/D102982	2021-06-20 20:58:26 +02:00
Nikita Popov	3308205ae9	[LoopUnroll] Simplify optimization remarks Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and debug code. For debug code, print information about all exits. For optimization remarks, only include the unroll count and the type of unroll (complete, partial or runtime), but omit detailed information about exit folding, now that more than one exit may be folded. Differential Revision: https://reviews.llvm.org/D104482	2021-06-18 23:47:03 +02:00
Nikita Popov	f7c54c4603	[LoopUnroll] Fold all exits based on known trip count/multiple Fold all exits based on known trip count/multiple information from SCEV. Previously only the latch exit or the single exit were folded. This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple entirely: They're still used to a) decide whether runtime unrolling should be performed and b) for ORE remarks. However, the core unrolling logic is independent of them now. Differential Revision: https://reviews.llvm.org/D104203	2021-06-17 20:58:34 +02:00
Roman Lebedev	e52364532a	[NewPM] Remove SpeculateAroundPHIs pass Addition of this pass has been botched. There is no particular reason why it had to be sold as an inseparable part of new-pm transition. It was added when old-pm was still the default, and very very few users were actually tracking new-pm, so its effects weren't measured. Which means, some of the turmoil of the new-pm transition is actually likely regressions due to this pass. Likewise, there has been a number of post-commit feedback (post new-pm switch), namely * https://reviews.llvm.org/D37467#2787157 (regresses HW-loops) * https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before) * https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata) and in the half year past, the pass authors (google) still haven't found time to respond to any of that. Hereby it is proposed to backout the pass from the pipeline, until someone who cares about it can address the issues reported, and properly start the process of adding a new pass into the pipeline, with proper performance evaluation. Furthermore, neither google nor facebook reports any perf changes from this change, so i'm dropping the pass completely. It can always be re-reverted if anyone wants to pick it up again. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D104099	2021-06-15 20:35:55 +03:00
Nikita Popov	6ecc99210c	[LoopUnroll] Test multi-exit runtime unrolling with predictable exit (NFC) The (prior to prologue insertion) predictable exit shouldn't get folded here. Make sure it isn't...	2021-06-13 18:48:38 +02:00
Nikita Popov	8fdd7c2ff1	[LoopUnroll] Clamp unroll count to MaxTripCount Unrolling with more iterations than MaxTripCount is pointless, as those iterations can never be executed. As such, we clamp ULO.Count to MaxTripCount if it is known. This means we no longer need to consider iterations after MaxTripCount for exit folding, and the CompletelyUnroll flag becomes independent of ULO.TripCount. Differential Revision: https://reviews.llvm.org/D103748	2021-06-07 21:08:42 +02:00
Nikita Popov	92ce29ee45	[LoopUnroll] Regenerate test checks (NFC)	2021-06-05 10:52:02 +02:00
Nikita Popov	db45746821	[LoopUnroll] Separate peeling from unrolling Loop peeling is currently performed as part of UnrollLoop(). Outside test scenarios, it is always performed with an unroll count of 1. This means that unrolling doesn't actually do anything apart from performing post-unroll simplification. When testing, it's currently possible to specify both an explicit peel count and an explicit unroll count. This doesn't perform any sensible operation and may result in miscompiles, see https://bugs.llvm.org/show_bug.cgi?id=45939. This patch moves peeling from UnrollLoop() into tryToUnrollLoop(), so that peeling does not also perform a subsequent unroll. We only run the post-unroll simplifications. Specifying both an explicit peel count and unroll count is forbidden. In the future, we may want to support both (non-PGO) peeling a loop and unrolling it, but this needs to be done by first performing the peel and then recalculating unrolling heuristics on a now possibly analyzable loop. Differential Revision: https://reviews.llvm.org/D103362	2021-06-05 10:32:00 +02:00
Philip Reames	5c0d1b2f90	[LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed. The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile. Differential Revision: https://reviews.llvm.org/D103620	2021-06-03 14:09:16 -07:00
Nikita Popov	33e41eaecd	[LoopUnroll] Add additional test with one unpredictable exit (NFC) One exit is unpredictable, the other has a known trip count. For one function the predictable exit is the latch exit, for the other the non-latch exit. Currently they are treated differently.	2021-06-03 21:58:51 +02:00
Nikita Popov	4af2730ac3	[LoopUnroll] Add store to unreachable latch test (NFC) This is to show that we currently only convert the terminator to unreachable, but don't clean up instructions before it (unless trivial DCE removes them). Also clean up excessive whitespace in this test.	2021-05-28 22:49:23 +02:00
Philip Reames	79c09d5ee1	[tests] Add some basic coverage of multiple exit unrolling	2021-05-26 15:51:26 -07:00
Philip Reames	9cc2181ec3	[unroll] Use value domain for symbolic execution based cost model The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost. By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928. Differential Revision: https://reviews.llvm.org/D102934	2021-05-26 08:41:25 -07:00
Max Kazantsev	794fb5482e	[Test] Add test on unrolling to make sure it won't fail Initially it failed an assertion with "Do actual DCE in LoopUnroll (try 2)" which was later reverted. Make sure that when this patch is returned, the test works fine.	2021-05-26 16:30:41 +07:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit `bda6e5bee0`. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since `d6de1e1a71`, no attribute is equivalent to setting attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
Nikita Popov	a832e83bcb	[LoopUnroll] Add additional trip multiple test (NFC) This uses a trip multiple on a (unique) non-latch exit.	2021-05-24 17:26:07 +02:00
Nikita Popov	971a2ae8b3	[LoopUnroll] Regenerate test checks (NFC)	2021-05-24 17:26:07 +02:00
Nikita Popov	15b108442f	[LoopUnroll] Add test for partial unrolling against non-latch exit (NFC) This test case would get miscompiled by the current version of D102982, because unrolling does not respect the PreserveCondBr flag for partial unrolling.	2021-05-23 23:10:23 +02:00
Nikita Popov	d4abbcfb0d	[LoopUnroll] Add test for unrollable non-latch multi-exit (NFC) This test case requires unrolling against a non-latch exit in a multiple-exit loop with exiting latch. It's not covered by exiting heuristics or the extension in D102635.	2021-05-23 10:51:45 +02:00
Philip Reames	317c105c6a	precommit tests for D102934 and D102928	2021-05-21 10:58:48 -07:00
Philip Reames	449d14ebd2	Do actual DCE in LoopUnroll (try 4) Turns out simplifyLoopIVs sometimes returns a non-dead instruction in its DeadInsts out param. I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed its documentation. I'm simply dropping that part of the change. Commit message from try 3: Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :) Original commit message: The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-19 10:25:31 -07:00
Amy Huang	517857421d	Revert "Do actual DCE in LoopUnroll (try 3)" This reverts commit `b6320eeb86` as it causes clang to assert; see https://reviews.llvm.org/rGb6320eeb8622f05e4a5d4c7f5420523357490fca.	2021-05-19 08:53:38 -07:00
Philip Reames	b6320eeb86	Do actual DCE in LoopUnroll (try 3) Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :) The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-17 14:47:02 -07:00
Florian Hahn	fded6f77c3	[LoopUnroll] Add multi-exit test which does not exit through latch. This patch adds a new test for loop-unrolling with multiple exiting blocks, where the latch does not exit, but the header does. This can happen when the loop has not been rotated, e.g. due to minsize. Inspired by the following end-to-end test, using -Oz https://godbolt.org/z/fP6sna8qK bool foo(int *ptr, int limit) { #pragma clang loop unroll(full) for (unsigned int i = 0; i < 4; i++) { if (ptr[i] > limit) return false; ptr[i]++; } return true; }	2021-05-17 17:08:15 +01:00
Philip Reames	6ae9893ed2	Revert "Do actual DCE in LoopUnroll (try 2)" This reverts commit `653fa0b46a`. Reported to trigger pr50354. Reverting until investigated.	2021-05-16 09:38:36 -07:00
Philip Reames	23c93c2555	Discount invariant instructions in full unrolling This patch updates the cost model for full unrolling to discount the cost of a loop invariant expression on all but one iteration. The reasoning here is that such an expression (as determined by SCEV) will be CSEd or DSEd once the loop is unrolled. Note that SCEVs reasoning will find things which could be invariant, not simply those outside the loop. Differential Revision: https://reviews.llvm.org/D102506	2021-05-14 11:07:19 -07:00
Philip Reames	653fa0b46a	Do actual DCE in LoopUnroll (try 2) Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-14 10:42:36 -07:00
Philip Reames	e488bf815f	Revert "Do actual DCE in LoopUnroll" This reverts commit `9d1a61e695`. I'd missed some review feedback, and had missed updating an aarch64 test. Reverting while I fix both.	2021-05-14 10:15:30 -07:00
Philip Reames	9d1a61e695	Do actual DCE in LoopUnroll LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-14 10:05:25 -07:00
Philip Reames	6594bac06c	Autogen a test for ease of update	2021-05-14 09:33:17 -07:00
Arthur Eubanks	34a8a437bf	[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose Printing pass manager invocations is fairly verbose and not super useful. This allows us to remove DebugLogging from pass managers and PassBuilder since all logging (aside from analysis managers) goes through instrumentation now. This has the downside of never being able to print the top level pass manager via instrumentation, but that seems like a minor downside. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D101797	2021-05-07 21:51:47 -07:00
Arthur Eubanks	6f7131002b	[NewPM] Move analysis invalidation/clearing logging to instrumentation We're trying to move DebugLogging into instrumentation, rather than being part of PassManagers/AnalysisManagers. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D102093	2021-05-07 15:25:31 -07:00
Nicholas Guy	2b6e0c90f9	[AArch64] Enable runtime unrolling for in-order sched models Differential Revision: https://reviews.llvm.org/D97947	2021-04-27 13:22:10 +01:00
Nikita Popov	c456ab78ae	[LoopUnroll] Regenerate test checks (NFC)	2021-04-17 20:59:20 +02:00
Nikita Popov	fe9a5a806e	[LoopUnroll] Make some tests more robust (NFC) Replace branch on undef by branch on unknown condition.	2021-04-17 20:59:20 +02:00
Florian Hahn	acd9cc7495	[AArch64] Use type-legalization cost for code size memop cost. At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory operations on large vectors have a cost of one, even if they will get expanded to a large number of memory operations in the backend. This patch updates getMemoryOpCost to return the cost for the type legalization for both CodeSize and SizeAndLatency. This should more accurately reflect the number of memory operations required. I am not sure how latency should properly be included in SizeAndLatency from the description, but returning the size cost should be clearly more accurate. This does not cause any binary changes when building MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because large vector memops are not really formed by code emitted from Clang. But using the C/C++ matrix extension can easily result in code with very large vector operations directly from Clang, e.g. https://clang.godbolt.org/z/6xzxcTGvb Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D100291	2021-04-15 10:11:05 +01:00
Florian Hahn	816cf41462	[LoopUnroll] Add AArch64 test case with large vector ops. Add test case to illustrate over-eager unrolling on AArch64, due to the cost-model not estimating the size of vector loads/stores accurately.	2021-04-11 21:39:52 +01:00
dfukalov	8f4b7e94a2	[AMDGPU][CostModel] Refine cost model for control-flow instructions. Added cost estimation for switch instruction, updated costs of branches, fixed phi cost. Had to increase `-amdgpu-unroll-threshold-if` default value since conditional branch cost (size) was corrected to higher value. Test renamed to "control-flow.ll". Removed redundant code in `X86TTIImpl::getCFInstrCost()` and `PPCTTIImpl::getCFInstrCost()`. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96805	2021-04-10 09:20:24 +03:00
David Green	da98177cda	[ARM] Allow v6m runtime loop unrolling This removes the restriction that only Thumb2 targets enable runtime loop unrolling, allowing it for Thumb1 only cores as well. The existing T2 heuristics are used (for the time being) to control when and how unrolling is performed. Differential Revision: https://reviews.llvm.org/D99588	2021-04-01 21:21:40 +01:00
David Green	14b2ec934e	[ARM] Enable UpperBound unrolling for all loops This UpperBound unrolling was already enabled so long as a series of conditions in ARMTTIImpl::getUnrollingPreferences pass. This just always enables it as it can help fully unroll loops that would not otherwise pass those tests. Differential Revision: https://reviews.llvm.org/D99174	2021-03-24 16:39:21 +00:00
David Green	003fab9e8d	[ARM] Additional Upper bound unrolling test. NFC	2021-03-23 12:00:40 +00:00
Whitney Tsang	0d8f102809	[NFC][LoopUnroll] Add `-unroll-runtime-other-exit-predictable=false` in `runtime-multiexit-heuristic.ll` Added -unroll-runtime-other-exit-predictable=false in runtime-multiexit-heuristic.ll to make it more robust. runtime-multiexit-heuristic.ll intention is to test -unroll-runtime-multi-exit=false, so the default value of -unroll-runtime-other-exit-predictable should not impact the result. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98098	2021-03-07 23:51:09 +00:00
Whitney Tsang	40391cef61	[LoopUnrollRuntime] Add option to assume the non latch exit block to be predictable. (Add LIT) Reviewed By: Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D97747	2021-03-07 23:48:00 +00:00
Roman Lebedev	b46c085d2b	[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions These intrinsics, not the icmp+select are the canonical form nowadays, so we might as well directly emit them. This should not cause any regressions, but if it does, then they would need to be fixed regardless. Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`, but that is a pessimization, not a correctness issue. Additionally, the non-intrinsic form has issues with undef, see https://reviews.llvm.org/D88287#2587863	2021-03-06 21:52:46 +03:00


439 Commits