clang-p2996

Author	SHA1	Message	Date
Roman Lebedev	3a8e009f97	Revert "Reland "[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block"" One of these two changes is exposing (or causing) some more miscompiles. A reproducer is in progress, so reverting until resolved. This reverts commit `428f36401b`.	2022-12-20 18:36:42 +03:00
Nikita Popov	88419a30a0	[LICM] Allow load-only scalar promotion in the presence of aliasing loads During scalar promotion, if there are additional potentially-aliasing loads outside the promoted set, we can still perform a load-only promotion. As the stores are retained, any potentially-aliasing loads will still read the correct value. This increases the number of load promotions in llvm-test-suite by a factor of two (old vs. new): licm.NumPromotionCandidates 4448 vs. 6038; licm.NumLoadPromoted 479 vs. 1069; licm.NumLoadStorePromoted 1459 vs. 1459. Unfortunately, this does have some impact on compile-time: http://llvm-compile-time-tracker.com/compare.php?from=57f7f0d6cf0706a88e1ecb74f3d3e8891cceabfa&to=72b811738148aab399966a0435f13b695da1c1c8&stat=instructions In part this is because we now have fewer early bailouts from promotion, but also due to second-order effects (e.g. for one case I looked at we spend more time in SLP now). Differential Revision: https://reviews.llvm.org/D133192	2022-12-20 10:02:46 +01:00
Roman Lebedev	428f36401b	Reland "[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block" This reverts commit `37b8f09a4b`, and returns commit `1bd0b82e50`. The miscompile was in InstCombine, and it has been addressed. This tries to approach the problem noted by @arsenm: terrible codegen for `__builtin_fpclassify()`: https://godbolt.org/z/388zqdE37 Just because the PHI in the common successor happens to have different incoming values for these two blocks, doesn't mean we have to give up. It's quite easy to deal with this, we just need to produce a select: https://alive2.llvm.org/ce/z/000srb Now, the cost model for this transform is rather overly strict, so this will basically never fire. We tally all (over all preds) the selects needed to the NumBonusInsts Differential Revision: https://reviews.llvm.org/D139275	2022-12-17 05:18:54 +03:00
Alexander Kornienko	37b8f09a4b	Revert "[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block" This reverts commit `1bd0b82e50`, since it leads to miscompiles. See https://reviews.llvm.org/D139275#3993229 and https://reviews.llvm.org/D139275#4001580.	2022-12-16 17:23:35 +01:00
Roman Lebedev	1bd0b82e50	[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block This tries to approach the problem noted by @arsenm: terrible codegen for `__builtin_fpclassify()`: https://godbolt.org/z/388zqdE37 Just because the PHI in the common successor happens to have different incoming values for these two blocks, doesn't mean we have to give up. It's quite easy to deal with this, we just need to produce a select: https://alive2.llvm.org/ce/z/000srb Now, the cost model for this transform is rather overly strict, so this will basically never fire. We tally all (over all preds) the selects needed to the NumBonusInsts Differential Revision: https://reviews.llvm.org/D139275	2022-12-12 18:20:03 +03:00
Roman Lebedev	1e2c548150	[NFC] Port all LICM tests to `-passes=` syntax	2022-12-08 02:38:45 +03:00
Roman Lebedev	80e8f2beeb	[NFC] Port all (but one) LICM tests to `-passes=` syntax	2022-12-07 20:53:15 +03:00
Nikita Popov	ed76074173	[LICM] Remove custom isInstInList() implementation (PR59324) We already collect all instructions that need to be promoted. The custom isInstInList() implementation could provide incorrect results if a new use of the original pointer was introduced as part of promotion. This probably cannot happen with normal code, because of the pointer capture, but it can happen with a null pointer. Fixes https://github.com/llvm/llvm-project/issues/59324.	2022-12-07 09:51:35 +01:00
Roman Lebedev	683b49b151	[NFC][LICM] Autogenerate checklines for one function to simplify update	2022-12-06 03:47:46 +03:00
Roman Lebedev	7850ab2112	[NFC] Port an assortment of tests that invoke SROA to new pass manager	2022-12-01 21:17:18 +03:00
Nikita Popov	304f1d59ca	[IR] Switch everything to use memory attribute This switches everything to use the memory attribute proposed in https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579. The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly attributes are dropped. The readnone, readonly and writeonly attributes are restricted to parameters only. The old attributes are auto-upgraded both in bitcode and IR. The bitcode upgrade is a policy requirement that has to be retained indefinitely. The IR upgrade is mainly there so it's not necessary to update all tests using memory attributes in this patch, which is already large enough. We could drop that part after migrating tests, or retain it longer term, to make it easier to import IR from older LLVM versions. High-level Function/CallBase APIs like doesNotAccessMemory() or setDoesNotAccessMemory() are mapped transparently to the memory attribute. Code that directly manipulates attributes (e.g. via AttributeList) on the other hand needs to switch to working with the memory attribute instead. Differential Revision: https://reviews.llvm.org/D135780	2022-11-04 10:21:38 +01:00
Nikita Popov	884bb97dca	[MustExec][LICM] Handle latch being part of an inner cycle (PR57780) The algorithm in allLoopPathsLeadToBlock() does not handle the case where the loop latch is part of the predecessor set correctly: In this case, we may take the backedge (escaping to a different loop iteration) and not execute other latch successors. This can happen if the latch is part of an inner cycle. Fixes https://github.com/llvm/llvm-project/issues/57780. Differential Revision: https://reviews.llvm.org/D134279	2022-10-11 09:30:13 +02:00
Shubham Narlawar	b920407cf5	[LICM] Disable thread-safety checks in single-thread model If the single-thread model is used, or the -licm-force-thread-model-single flag is specified, skip checks related to thread-safety. This means that store promotion for conditionally executed stores only requires proof of dereferenceability and writability, but not of thread-safety. For example, this enables promotion of stores to (non-constant) globals, as well as captured allocas. Fixes https://github.com/llvm/llvm-project/issues/50537. Differential Revision: https://reviews.llvm.org/D130466	2022-10-10 16:51:16 +02:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Nikita Popov	b6676f3c12	[LICM] Add test for single thread model promotion (NFC) Tests for D130466.	2022-10-07 17:13:09 +02:00
Nikita Popov	07253bc8c0	[LICM] Convert tests to opaque pointers (NFC) Using https://gist.github.com/nikic/98357b71fd67756b0f064c9517b62a34. The opaque pointer migration resolves the TODO on test_fence3: The transform now works as expected by dint of the bitcast no longer existing.	2022-10-05 16:47:53 +02:00
Nikita Popov	bf6ec0dea0	[LICM] Adjust speculation test to avoid no-op instruction (NFC) Such GEPs don't exist with opaque pointers, give it an actual offset.	2022-10-05 16:41:20 +02:00
Nikita Popov	f5f7e4e6f4	[LICM] Add test for PR57780 (NFC)	2022-09-20 13:07:11 +02:00
Sebastian Peryt	99c9b37d11	[NFC][1/n] Remove -enable-new-pm=0 flags from lit tests This is the first patch in a series that removes the -enable-new-pm=0 flag from lit tests. This is part of a larger effort to completely remove code related to the legacy pass manager in favor of the new pass manager, which is currently the default. In this patch the flag has been removed only from tests where no significant change was required, because the checks were duplicated for both PMs. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D134150	2022-09-19 09:57:37 -07:00
Mingming Liu	8aa800614b	[AArch64][CostModel] Detects that {extract,insert}-element at lane 0 has the same cost as the other lane for vector instructions in the IR. Currently, {extract,insert}-element has zero cost at lane 0 [1]. However, there is a cost (by fmov instruction [2], or ext/ins instruction) to move values from SIMD registers to GPR registers, when the element is used explicitly as integers. See https://godbolt.org/z/faPE1nTn8, when fmov is generated for d* register -> x* register conversion. Implementation-wise, add a private method `AArch64TTIImpl::getVectorInstrCostHelper` as a helper function. This way, instruction-based method could share the core logic (e.g., returning zero cost if type is legalized to scalar). [1] `2cf320d41e/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (L1853)` [2] `2cf320d41e/llvm/lib/Target/AArch64/AArch64InstrInfo.td (L8150-L8157)` Differential Revision: https://reviews.llvm.org/D128302	2022-09-09 09:47:30 -07:00
Nikita Popov	5b1df2e951	[LICM] Regenerate test checks (NFC)	2022-09-09 15:30:17 +02:00
Nikita Popov	4ab77d1677	[LICM] Allow promotion with non-load/store users If there are non-load/store users of the promoted pointer, we currently abort promotion. However, having such users isn't really relevant to the transform. We already separately check that a) there are no instructions that modref the promoted pointer and b) that a pointer capture disables store promotion. In the affected @test_captured_in_loop test case we have a readnone capture of the promoted pointer, which means that load promotion can be performed (while store promotion cannot). Differential Revision: https://reviews.llvm.org/D133485	2022-09-09 13:09:59 +02:00
Nikita Popov	52f7eb3151	[LICM] Add test for sret with conditional store (NFC)	2022-09-08 14:53:06 +02:00
Nikita Popov	10dfcf1f87	[LICM] Add test for missed load promotion opportunity (NFC)	2022-09-02 11:36:07 +02:00
Nikita Popov	639d912282	[LICM] Allow load-only scalar promotion in the presence of unwinding Currently, we bail out of scalar promotion if the loop may unwind and the memory may be visible on unwind. This is because we can't insert stores of the promoted value on unwind edges. However, nowadays scalar promotion also has support for only promoting loads, while leaving stores in place. This kind of promotion is safe even in the presence of unwinding. Differential Revision: https://reviews.llvm.org/D133111	2022-09-02 09:27:13 +02:00
Nikita Popov	26347adf96	[LICM] Regenerate test checks (NFC)	2022-09-01 16:06:38 +02:00
Nikita Popov	315aef667e	[LICM] Fix thread safety checks for promotion of byval args This code was relying on a very subtle contract: The expectation was that for non-allocas, the unwind safety check would already perform a capture check, so we don't need to perform it later. This held true when this unwind safety was only handled for allocas and noalias calls, but became incorrect when byval support was added. To avoid this kind of issue, just remove the dependency between the unwind and thread-safety checks entirely. At worst, this means we perform a redundant capture check. If this should turn out to be problematic for compile-time, we can cache that query in a more explicit way.	2022-09-01 15:33:46 +02:00
Nikita Popov	20524a3c94	[LICM] Add another byval capture test (NFC) Variant with capture after the loop, in which case promotion is safe.	2022-09-01 15:18:10 +02:00
Nikita Popov	e1826326af	[LICM] Add test for byval scalar promotion miscompile (NFC)	2022-09-01 15:03:20 +02:00
Max Kazantsev	c52d447713	[Test] Move test for pr56243 from LICM to LoopSimplifyCFG	2022-07-18 12:37:01 +07:00
Nikita Popov	2a721374ae	[IR] Don't use blockaddresses as callbr arguments Following some recent discussions, this changes the representation of callbrs in IR. The current blockaddress arguments are replaced with `!` label constraints that refer directly to callbr indirect destinations: ; Before: %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo)) to label %asm.fallthrough [label %foo] ; After: %res = callbr i8* asm "", "=r,r,!i"(i8* %x) to label %asm.fallthrough [label %foo] The benefit of this is that we can easily update the successors of a callbr, without having to worry about also updating blockaddress references. This should allow us to remove some limitations: * Allow unrolling/peeling/rotation of callbr, or any other clone-based optimizations (https://github.com/llvm/llvm-project/issues/41834) * Allow duplicate successors (https://github.com/llvm/llvm-project/issues/45248) This is just the IR representation change though; I will follow up with patches to remove limitations in various transformation passes that are no longer needed. Differential Revision: https://reviews.llvm.org/D129288	2022-07-15 10:18:17 +02:00
Mingming Liu	b242e8502c	[AArch64][NFC] Prepare test cases (for D128302) to show more accurate cost estimation of extract-element could generate better assembly code. Pre-commit the test cases (for D128302) to show that more accurate cost estimation of extract-element could generate better code. Differential Revision: https://reviews.llvm.org/D128945	2022-07-07 09:39:29 -07:00
Nikita Popov	40a4078e14	[BasicBlockUtils] Allow splitting predecessors with callbr terminators SplitBlockPredecessors currently asserts if one of the predecessor terminators is a callbr. This limitation was originally necessary, because just like with indirectbr, it was not possible to replace successors of a callbr. However, this is no longer the case since D67252. As the requirement nowadays is that callbr must reference all blockaddrs directly in the call arguments, and these get automatically updated when setSuccessor() is called, we no longer need this limitation. The only thing we need to do here is use replaceSuccessorWith() instead of replaceUsesOfWith(), because only the former does the necessary blockaddr updating magic. I believe there's other similar limitations that can be removed, e.g. related to critical edge splitting. Differential Revision: https://reviews.llvm.org/D129205	2022-07-07 09:13:25 +02:00
Nikita Popov	cf7502a1eb	[LICM] Check opt output in test (NFC) Check what the test actually produces, not just that it doesn't crash.	2022-07-06 16:21:36 +02:00
Nikita Popov	560e694d48	[AST] Don't assert instruction reads/writes memory (PR51333) This function is well-defined for an instruction that doesn't access memory (and thus trivially doesn't alias anything in the AST), so drop the assert. We can end up with a readnone call here if we originally created a MemoryDef for an indirect call, which was later replaced with a direct readnone call. Fixes https://github.com/llvm/llvm-project/issues/51333. Differential Revision: https://reviews.llvm.org/D127947	2022-07-01 17:04:48 +02:00
Max Kazantsev	abb8bf3671	[Test] Add XFAIL test for PR56243 This test demonstrates how sinking down gc.relocate may lead to breach of LCSSA form by tokens and, consequently, end up with SSA breach by LoopSimplifyCFG which creates fake edges and is unable to update missing LCSSA phis for tokens used outside of the loop.	2022-06-29 19:46:17 +07:00
Congzhe Cao	b941857b40	[LoopInterchange] New cost model for loop interchange This is another attempt to land this patch. The patch proposed to use a new cost model for loop interchange, which is obtained from loop cache analysis. Given a loopnest, what loop cache analysis returns is a vector of loops [loop0, loop1, loop2, ...] where loop0 should be placed as the outermost loop, loop1 should be placed one more level inside, and loop2 one more level inside, etc. What loop cache analysis does is not only more comprehensive than the current cost model, it is also a "one-shot" query which means that we only need to query it once during the entire loop interchange pass, which is better than the current cost model where we query it every time we check whether it is profitable to interchange two loops. Thus complexity is reduced, especially after D120386 where we do more interchanges to get the globally optimal loop access pattern. Updates made to test cases are mostly minor changes and some corrections. One change that applies to all tests is that we added an option `-cache-line-size=64` to the RUN lines. This is to ensure that loop cache analysis receives a valid cache line size for correct analysis. Test coverage for loop interchange is not reduced. Currently we did not completely remove the legacy cost model, but keep it as fall-back in case the new cost model did not run successfully. This is because currently we have some limitations in delinearization, which sometimes makes loop cache analysis bail out. The longer term goal is to enhance delinearization and eventually remove the legacy cost model completely. Reviewed By: bmahjour, #loopoptwg Differential Revision: https://reviews.llvm.org/D124926	2022-06-28 00:08:37 -04:00
Nuno Lopes	6ef9a2ad01	[LICM] Use poison to replace unreachable values instead of undef [NFC]	2022-06-26 14:56:35 +01:00
Evgenii Stepanov	878309cc54	Revert "[LoopInterchange] New cost model for loop interchange" llvm/lib/Analysis/LoopCacheAnalysis.cpp:702:30: runtime error: signed integer overflow: 6148914691236517209 * 100 cannot be represented in type 'long' https://lab.llvm.org/buildbot/#/builders/5/builds/25185 This reverts commit `1b24fe34b0`.	2022-06-23 16:10:53 -07:00
Congzhe Cao	1b24fe34b0	[LoopInterchange] New cost model for loop interchange This is the second attempt to land this patch. The patch proposed to use a new cost model for loop interchange, which is obtained from loop cache analysis. Given a loopnest, what loop cache analysis returns is a vector of loops [loop0, loop1, loop2, ...] where loop0 should be placed as the outermost loop, loop1 should be placed one more level inside, and loop2 one more level inside, etc. What loop cache analysis does is not only more comprehensive than the current cost model, it is also a "one-shot" query which means that we only need to query it once during the entire loop interchange pass, which is better than the current cost model where we query it every time we check whether it is profitable to interchange two loops. Thus complexity is reduced, especially after D120386 where we do more interchanges to get the globally optimal loop access pattern. Updates made to test cases are mostly minor changes and some corrections. One change that applies to all tests is that we added an option `-cache-line-size=64` to the RUN lines. This is to ensure that loop cache analysis receives a valid cache line size for correct analysis. Test coverage for loop interchange is not reduced. Currently we did not completely remove the legacy cost model, but keep it as fall-back in case the new cost model did not run successfully. This is because currently we have some limitations in delinearization, which sometimes makes loop cache analysis bail out. The longer term goal is to enhance delinearization and eventually remove the legacy cost model completely. Reviewed By: bmahjour, #loopoptwg Differential Revision: https://reviews.llvm.org/D124926	2022-06-23 16:34:57 -04:00
Serguei Katkov	24e16e4af2	[SSAUpdaterImpl] Do not generate phi node with all the same incoming values If all available vals to basic block are the same - do not build new phi node and just use this value. Reviewed By: sameerds Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D126525	2022-06-03 12:24:33 +07:00
Daniil Suchkov	f1940a5895	Revert "[LoopInterchange] New cost model for loop interchange" Reverting the commit due to numerous buildbot failures. This reverts commit `006334470d`.	2022-06-03 00:52:08 +00:00
Congzhe Cao	006334470d	[LoopInterchange] New cost model for loop interchange This patch proposed to use a new cost model for loop interchange, which is obtained from loop cache analysis. Given a loopnest, what loop cache analysis returns is a vector of loops [loop0, loop1, loop2, ...] where loop0 should be placed as the outermost loop, loop1 should be placed one more level inside, and loop2 one more level inside, etc. What loop cache analysis does is not only more comprehensive than the current cost model, it is also a "one-shot" query which means that we only need to query it once during the entire loop interchange pass, which is better than the current cost model where we query it every time we check whether it is profitable to interchange two loops. Thus complexity is reduced, especially after D120386 where we do more interchanges to get the globally optimal loop access pattern. Updates made to test cases are mostly minor changes and some corrections. Test coverage for loop interchange is not reduced. Currently we did not completely remove the legacy cost model, but keep it as fall-back in case the new cost model did not run successfully. This is because currently we have some limitations in delinearization, which sometimes makes loop cache analysis bail out. The longer term goal is to enhance delinearization and eventually remove the legacy cost model completely. Reviewed By: bmahjour, #loopoptwg Differential Revision: https://reviews.llvm.org/D124926	2022-06-02 19:07:14 -04:00
Florian Hahn	0776c48f9b	Recommit "[LICM] Only create load in ph when promoting load or store doesn't exec." This reverts the revert commit `ad95255b92`. The updated version also creates a load when the store may not execute. In those cases, we still need to introduce a load in a function where there may not have been one before, so this doesn't completely resolve issue #51248. Original message: When only a store is sunk, there is no need to create a load in the pre-header, as the result of the load will never get used. The dead load can introduce UB if the function is marked as writeonly. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123473	2022-05-29 21:57:14 +01:00
Max Kazantsev	143ca15106	Fix comment in test. NFC	2022-05-24 17:22:16 +07:00
Max Kazantsev	1968f765c3	[Test] Add LICM test for PR55672 showing problem with freeze instruction	2022-05-24 17:17:46 +07:00
Florian Hahn	3497a4f396	[LICM] Add test to exercise assertion from D123473. Add a test case that triggers an assertion with earlier versions of D123473.	2022-05-05 10:49:52 +01:00
Florian Hahn	ce3bb82e45	[LICM] Add test for writeonly fn with noalias call. Add an additional test for D123473.	2022-04-22 21:37:08 +01:00
Florian Hahn	5e54a413de	[LICM] Add additional writeonly tests, check attributes. Add additional test coverage for D123473.	2022-04-20 18:49:37 +01:00
Florian Hahn	ad95255b92	Revert "[LICM] Only create load in pre-header when promoting load." This reverts commit `4bf3b7dc92`. This might be causing another buildbot failure.	2022-04-13 20:24:28 +02:00


473 Commits