clang-p2996

Author	SHA1	Message	Date
Eli Friedman	7ac1c7bead	Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Recommitting with fix to MemoryDepChecker::isDependent. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 12:16:05 -07:00
Eli Friedman	a6d081b2cb	Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers." This reverts commit `74d6ce5d5f`. Seeing crashes on buildbots in MemoryDepChecker::isDependent.	2021-07-06 11:17:13 -07:00
Philip Reames	9ffa90d6c2	[LV] Disable epilogue vectorization for non-latch exits When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed. Better late (4 months), than never right? I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not motivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error. Thus, let's play it safe for now.	2021-07-06 10:57:10 -07:00
Philip Reames	600624a103	[LoopVersion] Move an assert [nfc-ish]	2021-07-06 10:57:10 -07:00
Eli Friedman	74d6ce5d5f	[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subtracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll because I couldn't figure out how to repair it to test what it was actually trying to test. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 10:54:41 -07:00
Arnold Schwaighofer	846a530e7d	Fix coro lowering of single predecessor phis Code assumes that uses of single predecessor phis are not live across suspend points. Clean up any single predecessor phis preceding the code making this assumption. rdar://76020301 Differential Revision: https://reviews.llvm.org/D105488	2021-07-06 10:22:25 -07:00
Alexey Bataev	4e1a0684f1	[SLP]Fix non-determinism in PHI sorting. Compare type IDs and DFS numbering for basic block instead of addresses to fix non-determinism. Differential Revision: https://reviews.llvm.org/D105031	2021-07-06 08:45:45 -07:00
Arnold Schwaighofer	130ea3ceb4	Use swift mangling for resume functions The resume partial functions generated for swift suspend points will now use a Swift mangling suffix. Await resume partial functions will use the suffix 'TQ'[0-9]+'_' (e.g "...TQ0_") and suspend resume partial functions will use the suffix 'TY'[0-9]+'_' (e.g "...TY1_"). Reviewed By: nate_chandler Differential Revision: https://reviews.llvm.org/D104144	2021-07-06 08:27:46 -07:00
Florian Hahn	ef0d147cdc	Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups. This reverts commit `706bbfb35b`. The committed version moves the definition of VPReductionPHIRecipe out of an ifdef only intended for ::print helpers. This should resolve the build failures that caused the revert	2021-07-06 14:15:42 +01:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Florian Hahn	706bbfb35b	Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups This reverts commit `3fed6d443f`, `bbcbf21ae6` and `6c3451cd76`. The changes causing build failures with certain configurations, e.g. https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction, llvm::ArrayRef<llvm::VPValue>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]': LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe' collect2: error: ld returned 1 exit status	2021-07-06 12:10:03 +01:00
Florian Hahn	3fed6d443f	[VPlan] Mark overridden function in VPWidenPHIRecipe as virtual. VPReductionRecipe overrides those implementations. Mark them as virtual in the VPWidenPHIRecipe to unbreak build in certain configurations.	2021-07-06 12:00:41 +01:00
Florian Hahn	bbcbf21ae6	[VPlan] Add destructor to VPReductionRecipe to unbreak build. Attempt to unbreak https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio	2021-07-06 11:41:20 +01:00
Florian Hahn	6c3451cd76	[VPlan] Add VPReductionPHIRecipe (NFC). This patch is a first step towards splitting up VPWidenPHIRecipe into separate recipes for the 3 distinct cases they model: 1. reduction phis, 2. first-order recurrence phis, 3. pointer induction phis. This allows untangling the code generation and allows us to reduce the reliance on LoopVectorizationCostModel during VPlan code generation. Discussed/suggested in D100102, D100113, D104197. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104989	2021-07-06 11:25:28 +01:00
Kerry McLaughlin	17b701c43c	[LV] Collect a list of all element types found in the loop (NFC) Splits `getSmallestAndWidestTypes` into two functions, one of which now collects a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not have to iterate over all instructions in the loop again in other places, such as in D102253 which disables scalable vectorization of a loop if any of the instructions use invalid types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105437	2021-07-06 10:37:41 +01:00
Akira Hatanaka	28fe9afdba	[ObjC][ARC] Prevent moving objc_retain calls past objc_release calls that release the retained object This patch fixes what looks like a longstanding bug in ARC optimizer where it reverses the order of objc_retain calls and objc_release calls that retain and release the same object. The code in ARC optimizer that is responsible for code motion takes the following steps: 1. Traverse the CFG bottom-up and determine how far up objc_release calls can be moved. Determine the insertion points for the objc_release calls, but don't actually move them. 2. Traverse the CFG top-down and determine how far down objc_retain calls can be moved. Determine the insertion points for the objc_retain calls, but don't actually move them. 3. Try to move the objc_retain and objc_release calls if they can't be removed. The problem is that the insertion points for the objc_retain calls are determined in step 2 without taking into consideration the insertion points for objc_release calls determined in step 1, so the order of an objc_retain call and an objc_release call can be reversed, which is incorrect, even though each step is correct in isolation. To fix this bug, this patch teaches the top-down traversal step to take into consideration the insertion points for objc_release calls determined in the bottom-up traversal step. Code motion for an objc_retain call is disabled if there is a possibility that it can be moved past an objc_release call that releases the retained object. rdar://79292791 Differential Revision: https://reviews.llvm.org/D104953	2021-07-05 12:16:15 -07:00
Sanjay Patel	40b752d28d	[InstCombine] fold icmp slt/sgt of offset value with constant This follows up patches for the unsigned siblings: `0c400e8953` `c7b658aeb5` We are translating an offset signed compare to its unsigned equivalent when one end of the range is at the limit (zero or unsigned max). (X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1) (X + C2) <s C --> X >u (C ^ SMAX) (if C == C2) This probably does not show up much in IR derived from C/C++ source because that would likely have 'nsw', and we have folds for that already. As with the previous unsigned transforms, the folds could be generalized to handle non-constant patterns: https://alive2.llvm.org/ce/z/Y8Xrrm ; sgt define i1 @src(i8 %a, i8 %c) { %c2 = add i8 %c, 1 %t = add i8 %a, %c2 %ov = icmp sgt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_off = sub i8 127, %c ; SMAX %ov = icmp ult i8 %a, %c_off ret i1 %ov } https://alive2.llvm.org/ce/z/c8uhnk ; slt define i1 @src(i8 %a, i8 %c) { %t = add i8 %a, %c %ov = icmp slt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_offnot = xor i8 %c, 127 ; SMAX %ov = icmp ugt i8 %a, %c_offnot ret i1 %ov }	2021-07-05 10:08:31 -04:00
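The two signed folds described in `40b752d28d` can be checked exhaustively at 8 bits. The following is a minimal Python sketch, not part of the commit: it models the non-constant i8 patterns from the Alive2 links in the message, and the helper names (`to_signed`, `check_sgt_fold`, `check_slt_fold`) are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1          # 0xFF, wraps arithmetic to i8
SMAX = (1 << (BITS - 1)) - 1    # 127

def to_signed(v):
    """Interpret an 8-bit unsigned value as two's-complement signed."""
    return v - (1 << BITS) if v > SMAX else v

def check_sgt_fold():
    """(a + (c + 1)) >s c  is equivalent to  a <u (SMAX - c)."""
    for a in range(1 << BITS):
        for c in range(1 << BITS):
            t = (a + c + 1) & MASK
            src = to_signed(t) > to_signed(c)
            tgt = a < ((SMAX - c) & MASK)
            if src != tgt:
                return False
    return True

def check_slt_fold():
    """(a + c) <s c  is equivalent to  a >u (c ^ SMAX)."""
    for a in range(1 << BITS):
        for c in range(1 << BITS):
            t = (a + c) & MASK
            src = to_signed(t) < to_signed(c)
            tgt = a > (c ^ SMAX)
            if src != tgt:
                return False
    return True

if __name__ == "__main__":
    print(check_sgt_fold(), check_slt_fold())
```

A brute-force check like this only covers one bit width; the Alive2 proofs linked in the commit message remain the authoritative verification.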
Caroline Concatto	b868a2d2c6	[SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector. The function vectorizeChainsInBlock does not support scalable vectors, because functions like canReuseExtract and isCommutative in the code path assert with scalable vectors. This patch avoids vectorizing blocks that have extract instructions with scalable vectors. Differential Revision: https://reviews.llvm.org/D104809	2021-07-05 12:43:41 +01:00
Stephen Tozer	14b62f7e2f	[DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops This patch fixes an issue which occurred in CodeGenPrepare and HWAddressSanitizer, which both at some point create a map of Old->New instructions and update dbg.value uses of these. They did this by iterating over the dbg.value's location operands, and if an instance of the old instruction was found, replaceVariableLocationOp would be called on that dbg.value. This would cause an error if the same operand appeared multiple times as a location operand, as the first call to replaceVariableLocationOp would update all uses of the old instruction, invalidating the old iterator and eventually hitting an assertion. This has been fixed by no longer iterating over the dbg.value's location operands directly, but by first collecting them into a set and then iterating over that, ensuring that we never attempt to replace a duplicated operand multiple times. Differential Revision: https://reviews.llvm.org/D105129	2021-07-05 10:35:19 +01:00
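The failure mode and fix described in `14b62f7e2f` can be modeled outside LLVM. This is a hedged Python sketch of the idea only: `DbgValue`, `replace_location_op`, and `update_dbg_value` are hypothetical stand-ins, not the real C++ API. The replace call rewrites every use of the old operand at once, so asking it to replace the same operand a second time trips an assertion; collecting the unique operands into a set first avoids that.

```python
class DbgValue:
    """Toy model of a dbg.value with a list of location operands."""

    def __init__(self, ops):
        self.ops = list(ops)

    def replace_location_op(self, old, new):
        # Replaces *all* uses of `old`; a second call for the same
        # operand would fail because `old` is no longer present.
        assert old in self.ops, "old operand is not a location operand"
        self.ops = [new if op == old else op for op in self.ops]

def update_dbg_value(dbg, old_to_new):
    """Apply an Old->New map, visiting each distinct operand once."""
    for old in set(dbg.ops):          # snapshot of unique operands
        if old in old_to_new:
            dbg.replace_location_op(old, old_to_new[old])

if __name__ == "__main__":
    d = DbgValue(["%a", "%a", "%b"])  # "%a" appears twice
    update_dbg_value(d, {"%a": "%a.new"})
    print(d.ops)
```

Iterating over `dbg.ops` directly instead of `set(dbg.ops)` would call `replace_location_op("%a", ...)` twice in this example and hit the assertion on the second call.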
Nikita Popov	a213f735d8	[IR] Deprecate GetElementPtrInst::CreateInBounds without element type This API is not compatible with opaque pointers, the method accepting an explicit pointer element type should be used instead. Thankfully there were few in-tree users. The BPF case still ends up using the pointer element type for now and needs something like D105407 to avoid doing so.	2021-07-04 16:49:30 +02:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
Roman Lebedev	fc150cecd7	[SimplifyCFG] simplifyUnreachable(): erase instructions iff they are guaranteed to transfer execution to unreachable This replaces the current ad-hoc implementation, by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`, with one exception that here in SimplifyCFG we are allowed to remove EH instructions. Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return), arithmetic/logic operations, etc. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105374	2021-07-03 10:45:44 +03:00
Fangrui Song	252a1eecc0	[ThinLTO] Respect ClearDSOLocalOnDeclarations for unimported functions D74751 added `ClearDSOLocalOnDeclarations` and dropped dso_local for isDeclarationForLinker `GlobalValue`s. It missed a case for imported declarations (`doImportAsDefinition` is false while `isPerformingImport` is true). This can lead to a linker error for a default visibility symbol in `ld.lld -shared`. When `ClearDSOLocalOnDeclarations` is true, we check `isPerformingImport() && !doImportAsDefinition(&GV)` along with `GV.isDeclarationForLinker()`. The new condition checks an imported declaration. This patch fixes a `LLVMPolly.so` link error using a trunk clang -DLLVM_ENABLE_LTO=Thin. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104986	2021-07-02 17:08:25 -07:00
Roman Lebedev	53fef0b293	[NFCI][SimplifyCFG] simplifyUnreachable(): Use poison constant to represent the result of unreachable instrs Mimics similar change for InstCombine: `ce192ced2b` / D104602 All these uses are in blocks that aren't reachable from function's entry, and said blocks are removed by SimplifyCFG itself, so we can't really test this change.	2021-07-02 22:11:52 +03:00
Heejin Ahn	51fecd17bb	[InstCombine] Don't combine PHI before catchswitch This tries to bail out if the PHI is in a `catchswitch` BB in InstCombine. A PHI cannot be combined into a non-PHI instruction if it is in a `catchswitch` BB, because `catchswitch` BB cannot have any non-PHI instruction other than `catchswitch` itself. The given test case started crashing after D98058. Reviewed By: lebedev.ri, rnk Differential Revision: https://reviews.llvm.org/D105309	2021-07-02 12:10:24 -07:00
Roman Lebedev	da81ec6158	[SimplifyCFG] Volatile memory operations do not trap Somewhat related to D105338. While it is up for discussion whether or not volatile store traps, so far there have been no complaints that volatile load/cmpxchg/atomicrmw also may trap. And even if simplifycfg currently conservatively believes that to be the case, instcombine does not: https://godbolt.org/z/5vhv4K5b8 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105343	2021-07-02 21:47:44 +03:00
Alexey Bataev	7f7e4aed21	[SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by V.Dmitriev. Reduces number of arguments	2021-07-02 10:30:13 -07:00
Jon Roelofs	37b6e03c18	[Intrinsics] Make MemCpyInlineInst a MemCpyInst This opens up more optimization opportunities in passes that already handle MemCpyInst's. Differential revision: https://reviews.llvm.org/D105247	2021-07-02 10:25:24 -07:00
Roman Lebedev	13e35ac124	[NFC][InstCombine] visitUnreachableInst(): enhance comments somewhat	2021-07-02 17:30:01 +03:00
Roman Lebedev	dadedc99e9	[InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it led to infinite combine loops, but I'm not seeing anything like that now, neither in the `check-llvm`, nor on some codebases I tried. This is a recommit of `d9d65527c2`, which I immediately reverted because I have messed up something during branch switch, and `597ccc92ce` accidentally ended up being pushed, which was very much not the intention. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:20:21 +03:00
Roman Lebedev	24d271bb18	Revert "https://godbolt.org/z/5vhv4K5b8 " This reverts commit `597ccc92ce`.	2021-07-02 17:17:55 +03:00
Roman Lebedev	93a1642763	Revert "[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable" This reverts commit `d9d65527c2`.	2021-07-02 17:17:47 +03:00
Roman Lebedev	d9d65527c2	[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it led to infinite combine loops, but I'm not seeing anything like that now, neither in the `check-llvm`, nor on some codebases I tried. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:17:03 +03:00
Roman Lebedev	597ccc92ce	https://godbolt.org/z/5vhv4K5b8	2021-07-02 17:16:19 +03:00
Nico Weber	a92964779c	Revert "[InstrProfiling] Use external weak reference for bias variable" This reverts commit `33a7b4d9d8`. Breaks check-profile on macOS, see comments on https://reviews.llvm.org/D105176	2021-07-02 09:05:12 -04:00
Florian Hahn	a3ca578eb9	[Matrix] Fix crash during fusion if the same load is re-used. This patch fixes a crash when the same load is used for both operands of a fuseable multiply.	2021-07-02 14:00:17 +01:00
Alexey Bataev	28ac873bcb	[SLP]Fix gathering of the scalars by not ignoring UndefValues. The compiler should not ignore UndefValue when gathering the scalars, otherwise the resulting code may be less defined than the original one. Also, grouped scalars to insert them at first to reduce the analysis in further passes. Differential Revision: https://reviews.llvm.org/D105275	2021-07-02 04:46:48 -07:00
Florian Hahn	7655061cc6	[Matrix] Hoist address computation before multiply to enable fusion. If the store address does not dominate the matrix multiply, try to hoist address computation instructions without side-effects and/or memory reads before the multiply, to allow fusion. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D105193	2021-07-02 09:52:11 +01:00
Evgeniy Brevnov	9568811cb8	[NFC][DSE]Change 'do-while' to 'for' loop to simplify code structure With a 'for' loop there is a single place where 'Current' is adjusted. It helps to avoid copy-paste and makes the overall loop control flow a bit easier to understand. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D101044	2021-07-02 10:00:47 +07:00
Craig Topper	066524ea54	[ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0. Previously we used the vector type, but we're loading/storing individual elements so I think only element alignment should matter. Noticed while looking at the code for something else so I don't have a test case. Differential Revision: https://reviews.llvm.org/D105220	2021-07-01 19:08:47 -07:00
Petr Hosek	33a7b4d9d8	[InstrProfiling] Use external weak reference for bias variable We need the compiler generated variable to override the weak symbol of the same name inside the profile runtime, but using LinkOnceODRLinkage results in weak symbol being emitted which leads to an issue where the linker might choose either of the weak symbols potentially disabling the runtime counter relocation. This change replaces the use of weak definition inside the runtime with an external weak reference to address the issue. We also place the compiler generated symbol inside a COMDAT group so dead definition can be garbage collected by the linker. Differential Revision: https://reviews.llvm.org/D105176	2021-07-01 15:25:31 -07:00
Philip Reames	955f125899	[instcombine] Fold overflow check using overflow intrinsic to comparison This follows up to D104665 (which added umulo handling alongside the existing uaddo case), and generalizes for the remaining overflow intrinsics. I went to add analogous handling to LVI, and discovered that LVI already had a more general implementation. Instead, we can port what LVI does to instcombine. (For context, LVI uses makeExactNoWrapRegion to constrain the value 'x' in blocks reached after a branch on the condition `op.with.overflow(x, C).overflow`.) Differential Revision: https://reviews.llvm.org/D104932	2021-07-01 09:41:55 -07:00
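To illustrate what "folding an overflow check to a comparison" means in `955f125899`, here is a small Python sketch for two unsigned cases at 8 bits. It is an assumption-laden illustration, not the commit's implementation: the actual fold derives its range via `makeExactNoWrapRegion`, which this sketch does not model, and the function names below are hypothetical.

```python
BITS = 8
UMAX = (1 << BITS) - 1  # 255

def uadd_overflows(x, c):
    """Overflow bit of an unsigned 8-bit add, computed in wide arithmetic."""
    return x + c > UMAX

def usub_overflows(x, c):
    """Overflow (borrow) bit of an unsigned 8-bit subtract."""
    return x - c < 0

def comparisons_match():
    """Check that each overflow bit equals a single unsigned compare on x."""
    for x in range(UMAX + 1):
        for c in range(UMAX + 1):
            if uadd_overflows(x, c) != (x > UMAX - c):   # x >u (UMAX - C)
                return False
            if usub_overflows(x, c) != (x < c):          # x <u C
                return False
    return True

if __name__ == "__main__":
    print(comparisons_match())
```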
Arnold Schwaighofer	4a361f5209	[coro async] Add support for specifying which parameter is swiftself in async resume functions Differential Revision: https://reviews.llvm.org/D104147	2021-07-01 07:33:15 -07:00
David Sherwood	51b4ab26ca	[NFC] Add new setDebugLocFromInst that uses the class Builder by default In lots of places we were calling setDebugLocFromInst and passing in the same Builder member variable found in InnerLoopVectorizer. I personally found this confusing so I've changed the interface to take an Optional<IRBuilder<> *> and we can now pass in None when we want to use the class member variable. Differential Revision: https://reviews.llvm.org/D105100	2021-07-01 14:23:34 +01:00
Roman Lebedev	333d3a3cdf	[NFC][PassBuilder] addVectorPasses(): clarify that 'IsLTO' is actually 'IsFullLTO' I.e. it will be `false` for thin lto.	2021-07-01 10:09:24 +03:00
Chuanqi Xu	51fbd18706	[Coroutine] Recommit Add statistics for the number of elided coroutine We currently lack a benchmark to measure the performance change for each commit. Since coro elide is the main optimization in the coroutine module, counting the number of elided coroutines in private code bases may serve as an estimate. E.g., if the number of elided coroutines goes down for a certain commit, we could detect it before the commit is checked in. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105095	2021-07-01 11:01:28 +08:00
Sanjay Patel	0c400e8953	[InstCombine] fold icmp ult of offset value with constant This is one sibling of the fold added with `c7b658aeb5` . (X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN) I'm still not sure how to describe it best, but we're translating 2 constants from an unsigned range comparison to signed because that eliminates the offset (add) op. This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/K-fMBf define i1 @src(i8 %a, i8 %c2) { %t = add i8 %a, %c2 %c = add i8 %c2, 128 ; SMIN %ov = icmp ult i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c2) { %not_c2 = xor i8 %c2, -1 %ov = icmp sgt i8 %a, %not_c2 ret i1 %ov }	2021-06-30 19:00:12 -04:00
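The unsigned-to-signed fold in `0c400e8953` can be sanity-checked by brute force. The sketch below is an illustration written for this log, not code from the commit; `ult_fold_counterexamples` is a hypothetical helper that mirrors the non-constant pattern from the Alive2 link and returns any `(a, c2)` pairs where source and target disagree.

```python
def ult_fold_counterexamples(bits=8):
    """Compare (a + c2) <u (c2 + SMIN) against a >s ~c2 for all inputs."""
    mask = (1 << bits) - 1
    smin = 1 << (bits - 1)            # bit pattern of SMIN, e.g. 128 for i8

    def signed(v):
        return v - (1 << bits) if v >= smin else v

    bad = []
    for a in range(1 << bits):
        for c2 in range(1 << bits):
            c = (c2 + smin) & mask
            src = ((a + c2) & mask) < c
            tgt = signed(a) > signed(c2 ^ mask)   # c2 ^ mask is ~c2
            if src != tgt:
                bad.append((a, c2))
    return bad

if __name__ == "__main__":
    print(ult_fold_counterexamples())
```

An empty list means the two forms agree for every input at the chosen width.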
Xun Li	822b92aae4	[Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html In the existing design, an SCC that contains a coroutine will go through the following passes: Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat. As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended. What we really wanted is this: Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline (note that we don't really need to run Inliner again on the ramp function after split). Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, make it once for real, insert the newly generated SCCs (the clones) back to the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function. This approach also conforms to how the new pass manager works instead of relying on an ad hoc post split cleanup, making it ready for full switch to new pass manager eventually. By looking at some of the changes to the tests, we can already observe that this change allows for more optimizations applied to coroutines. Reviewed By: aeubanks, ChuanqiXu Differential Revision: https://reviews.llvm.org/D95807	2021-06-30 11:38:14 -07:00
Sanjay Patel	c7b658aeb5	[InstCombine] fold icmp of offset value with constant There must be a better way to describe this pattern in words? (X + C2) >u C --> X <s -C2 (if C == C2 + SMAX) This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/rdfNFP define i1 @src(i8 %a, i8 %c1) { %t = add i8 %a, %c1 %c2 = add i8 %c1, 127 ; SMAX %ov = icmp ugt i8 %t, %c2 ret i1 %ov } define i1 @tgt(i8 %a, i8 %c1) { %neg_c1 = sub i8 0, %c1 %ov = icmp slt i8 %a, %neg_c1 ret i1 %ov } The pattern was noticed as a by-product of D104932.	2021-06-30 13:37:31 -04:00
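For the `ugt` fold in `c7b658aeb5`, a brute-force check over all i8 values looks like this. It is a hedged sketch rather than anything taken from the commit; `check_ugt_fold` is a hypothetical name and the loop follows the `@src`/`@tgt` pair from the commit message.

```python
def check_ugt_fold(bits=8):
    """(a + c1) >u (c1 + SMAX)  versus  a <s (0 - c1), for every a and c1."""
    mask = (1 << bits) - 1
    smax = (1 << (bits - 1)) - 1      # 127 for i8

    def signed(v):
        return v - (1 << bits) if v > smax else v

    for a in range(1 << bits):
        for c1 in range(1 << bits):
            t = (a + c1) & mask
            c2 = (c1 + smax) & mask
            neg_c1 = (0 - c1) & mask  # wrapping negation, as in `sub i8 0, %c1`
            if (t > c2) != (signed(a) < signed(neg_c1)):
                return False
    return True

if __name__ == "__main__":
    print(check_ugt_fold())
```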

27808 Commits