clang-p2996

Author	SHA1	Message	Date
Craig Topper	18ff7e045a	[RISCV] Make the min and max vector width command line options more consistent and check their relationship to each other.	2021-02-09 10:47:23 -08:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This expose a failure in test-suite build on PowerPC, revert to unblock buildbot first, Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
Hsiangkai Wang	a5b07a221a	[RISCV] Initial support of LoopVectorizer for RISC-V Vector. Define an option -riscv-vector-bits-max to specify the maximum vector bits for vectorizer. Loop vectorizer will use the value to check if it is safe to use the whole vector registers to vectorize the loop. It is not the optimum solution for loop vectorizing for scalable vector. It assumed the whole vector registers will be used to vectorize the code. If it is possible, we should configure vl to do vectorize instead of using whole vector registers. We only consider LMUL = 1 in this patch. This patch just an initial work for loop vectorizer for RISC-V Vector. Differential Revision: https://reviews.llvm.org/D95659	2021-02-09 06:32:18 +08:00
Simon Pilgrim	22302b2be0	[LoopVectorize][ARM] Regenerate mve-gather-scatter-tailpred.ll test Fix codegen after rG7fe41ac3dff2d44c3d2c31b28554fbe4a86eaa6c	2021-02-05 12:32:45 +00:00
Adrian Kuegel	7fe41ac3df	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `3e5ce49e53`. Tests started failing on PPC, for example: http://lab.llvm.org:8011/#/builders/105/builds/5569	2021-02-05 12:51:03 +01:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF used called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Philip Reames	4cb7d03481	Add missing test update from `3e5ce49` Sorry for the build break, apparently forgot to build ARM target.	2021-02-04 18:04:24 -08:00
Philip Reames	3e5ce49e53	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way. Differential Revision: https://reviews.llvm.org/D94892	2021-02-04 17:28:30 -08:00
Sanjay Patel	916c4121c1	[LoopVectorize] add test for fake min/max; NFC This goes with the dyn_cast fix: `0fa61304d2` That was made after noticing that the assert was over-reaching here: `bbed5f2f8a` ( D95690 )	2021-02-03 09:24:57 -05:00
David Sherwood	d4d4ceeb8f	[SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE This patch updates IRBuilder::CreateMaskedGather/Scatter to work with ScalableVectorType and adds isLegalMaskedGather/Scatter functions to AArch64TargetTransformInfo. In addition I've fixed up isLegalMaskedLoad/Store to return true for supported scalar types, since this is what the vectorizer asks for. In LoopVectorize.cpp I've changed LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid cost for scalable vectors, since currently this relies upon using shuffle vector for reversing vectors. In addition, in LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed that the cost of scalarising memory ops is infinitely expensive. I have added some simple masked load/store and gather/scatter tests, including cases where we use gathers and scatters for conditional invariant loads and stores. Differential Revision: https://reviews.llvm.org/D95350	2021-02-02 09:52:39 +00:00
Gil Rapaport	d475030dc2	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Sanjay Patel	bbed5f2f8a	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Cullen Rhodes	8cda227432	[LV] Fix crash when computing max VF too early D90687 introduced a crash: llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed. when compiling the following C code: typedef struct { char a; } b; b *c; int d, e; int f() { int g = 0; for (; d; d++) { e = 0; for (; e < c[d].a; e++) g++; } return g; } with: clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o - This occurred since prior to D90687 computeFeasibleMaxVF would only be called in computeMaxVF when a scalar epilogue was allowed, but now it's always called. This causes the assert above since computeFeasibleMaxVF collects all viable VFs larger than the default MaxVF, and for each VF calculates the register usage which results in analysis being done the assert above guards against. This can occur in computeFeasibleMaxVF if TTI.shouldMaximizeVectorBandwidth and this target hook is implemented in the hexagon backend to always return true. Reported by @iajbar. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94869	2021-02-01 12:14:59 +00:00
Sanjay Patel	ab93c18c12	[LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes) I am trying to untangle the fast-math-flags propagation logic in the vectorizers (see `a6f022127` for SLP). The loop vectorizer has a mix of checking FP function attributes, IR-level FMF, and just wrong assumptions. I am trying to avoid regressions while fixing this, and I think the IR-level logic is good enough for that, but it's hard to say for sure. This would be the 1st step in the clean-up. The existing test that I changed to include 'fast' actually shows a miscompile: the function only had the equivalent of nnan, but we created new instructions that had fast (all FMF set). This is similar to the example in https://llvm.org/PR35538 Differential Revision: https://reviews.llvm.org/D95452	2021-01-27 14:17:11 -05:00
Sanjay Patel	00773ef78a	[LoopVectorize] add test for fmin/fmax FMF propagation; NFC The existing test has less FMF than we might expect if our FMF was fixed (on all FP values), so this additional test is intended to check propagation in a more "normal" example.	2021-01-26 11:22:51 -05:00
David Green	4cc94b7313	[CostModel] Tests for showing the cost of intrinsics from the vectorizer. NFC	2021-01-24 14:47:15 +00:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
Arthur Eubanks	6699029b67	[NewPM][opt] Run the "default" AA pipeline by default We tend to assume that the AA pipeline is by default the default AA pipeline and it's confusing when it's empty instead. PR48779 Initially reverted due to BasicAA running analyses in an unspecified order (multiple function calls as parameters), fixed by fetching analyses before the call to construct BasicAA. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D95117	2021-01-21 21:08:54 -08:00
Arthur Eubanks	ba9b4ea4ee	Revert "[NewPM][opt] Run the "default" AA pipeline by default" This reverts commit `be611431cd`. Other/new-pm-lto-defaults.ll failing	2021-01-21 20:16:34 -08:00
Arthur Eubanks	be611431cd	[NewPM][opt] Run the "default" AA pipeline by default We tend to assume that the AA pipeline is by default the default AA pipeline and it's confusing when it's empty instead. PR48779 Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D95117	2021-01-21 19:46:38 -08:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to works OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that a MLA reduction as a single instruction is fairly common. This patch attempt to improve the costmodelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow inloop reduction larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loop under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
David Green	dfac521da1	[ARM] Fix vector saddsat costs. It turns out the vectorizer calls the getIntrinsicInstrCost functions with a scalar return type and vector VF. This updates the costmodel to handle that, still producing the correct vector costs. A vectorizer test is added to show it vectorizing at the correct factor again.	2021-01-21 15:30:39 +00:00
Florian Hahn	eee2e8813f	[LV] Add test cases with multiple exits which require versioning. This adds some test coverage for `caafdf07bb`, which relaxed an assertion to only require a unique exit block.	2021-01-20 11:48:48 +00:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
Florian Hahn	83aa93e995	[VectorUtils] Do not try to add indices matching tombstone/empty values. Keys matching the tombstone/empty special values cannot be inserted in a DenseMap. Under some circumstances, LV tries to add members to an interleave group that match the special values. Skip adding such members. This is unlikely to have any impact in practice, because interleave groups with such indices are very likely to not be vectorized, due to gaps. This issue has been surfaced by fuzzing, see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11638	2021-01-18 11:18:28 +00:00
Philip Reames	8356610f8d	[test] pre commit a couple more tests for vectorizing multiple exit loops	2021-01-17 20:29:13 -08:00
Philip Reames	7011086dc1	[test] Autogen a loop vectorizer test to make future changes visible	2021-01-17 20:03:22 -08:00
Philip Reames	9f61fbd75a	[LV] Relax assumption that LCSSA implies single entry This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317). The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis. A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates) Differential Revision: https://reviews.llvm.org/D93725	2021-01-12 12:34:52 -08:00
Florian Hahn	eb0371e403	[VPlan] Unify value/recipe printing after VPDef transition. This patch unifies the way recipes and VPValues are printed after the transition to VPDef. VPSlotTracker has been updated to iterate over all recipes and all their defined values to number those. There is no need to number values in Value2VPValue. It also updates a few places that only used slot numbers for VPInstruction. All recipes now can produce numbered VPValues.	2021-01-11 14:42:46 +00:00
Philip Reames	86d6f7e90a	Precommit tests requested for D93725	2021-01-10 12:29:34 -08:00
Philip Reames	377dcfd5c1	[Tests] Auto update a vectorizer test to simplify future diff	2021-01-10 12:23:22 -08:00
David Green	a36a2864c0	[ARM][LV] Additional loop invariant reduction test. NFC	2021-01-08 15:15:08 +00:00
Cullen Rhodes	1e7efd397a	[LV] Legalize scalable VF hints In the following loop: void foo(int a, int b, int N) { for (int i=0; i<N; ++i) a[i + 4] = a[i] + b[i]; } The loop dependence constrains the VF to a maximum of (4, fixed), which would mean using <4 x i32> as the vector type in vectorization. Extending this to scalable vectorization, a VF of (4, scalable) implies a vector type of <vscale x 4 x i32>. To determine if this is legal vscale must be taken into account. For this example, unless max(vscale)=1, it's unsafe to vectorize. For SVE, the number of bits in an SVE register is architecturally defined to be a multiple of 128 bits with a maximum of 2048 bits, thus the maximum vscale is 16. In the loop above it is therefore unfeasible to vectorize with SVE. However, in this loop: void foo(int a, int b, int N) { #pragma clang loop vectorize_width(X, scalable) for (int i=0; i<N; ++i) a[i + 32] = a[i] + b[i]; } As long as max(vscale) multiplied by the number of lanes 'X' doesn't exceed the dependence distance, it is safe to vectorize. For SVE a VF of (2, scalable) is within this constraint, since a vector of <16 x 2 x 32> will have no dependencies between lanes. For any number of lanes larger than this it would be unsafe to vectorize. This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs specified as loop hints, implementing the following behaviour: * If the backend does not support scalable vectors, ignore the hint. * If scalable vectorization is unfeasible given the loop dependence, like in the first example above for SVE, then use a fixed VF. * Accept scalable VFs if it's safe to do so. * Otherwise, clamp scalable VFs that exceed the maximum safe VF. Reviewed By: sdesmalen, fhahn, david-arm Differential Revision: https://reviews.llvm.org/D91718	2021-01-08 10:49:44 +00:00
David Green	72fb5ba079	[LV] Don't sink into replication regions The new test case here contains a first order recurrences and an instruction that is replicated. The first order recurrence forces an instruction to be sunk _into_, as opposed to after the replication region. That causes several things to go wrong including registering vector instructions multiple times and failing to create dominance relations correctly. Instead we should be sinking to after the replication region, which is what this patch makes sure happens. Differential Revision: https://reviews.llvm.org/D93629	2021-01-08 09:50:10 +00:00
Gil Rapaport	7ddbe0cb90	[LV] Merge tests into a single file (NFC) In response to https://reviews.llvm.org/D94088#inline-879268	2021-01-07 09:04:07 +02:00
Peter Waller	3e357ecd44	[llvm][NFC] Disallow all warnings in TypeSize tests This is a follow-up to a request from a reviewer [0]. The text may change in the future and these tests should not produce any warning output. [0] https://reviews.llvm.org/D91806#inline-879243 Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D94161	2021-01-06 17:17:07 +00:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector. The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Florian Hahn	8a47e6252a	[VPlan] Re-add interleave group members to plan. Creating in-loop reductions relies on IR references to map IR values to VPValues after interleave group creation. Make sure we re-add the updated member to the plan, so the look-ups still work as expected This fixes a crash reported after D90562.	2021-01-05 15:06:47 +00:00
Gil Rapaport	d9c0b128e3	[SCEV] Simplify trunc to zero based on known bits Let getTruncateExpr() short-circuit to zero when the value being truncated is known to have at least as many trailing zeros as the target type. Differential Revision: https://reviews.llvm.org/D93973	2021-01-03 13:57:12 +02:00
Gil Rapaport	d8af310063	[LV] Add missed optimization fold-tail test The loop vectorizer avoids folding the tail for loop's whose trip-count is known to SCEV to be divisible by VF. In this case the assumption providing this information is not taken into account, so the tail is needlessly folded.	2021-01-02 14:00:15 +02:00
Florian Hahn	d9f306aa52	[LV] Fix crash when generating remarks with multi-exit loops. If DoExtraAnalysis is true (e.g. because remarks are enabled), we continue with the analysis rather than exiting. Update code to conditionally check if the ExitBB has phis or not a single predecessor. Otherwise a nullptr is dereferenced with DoExtraAnalysis.	2021-01-01 13:54:41 +00:00
Sanjay Patel	5ced712e98	[LoopVectorizer] add test to show wrong FMF propagation; NFC	2020-12-30 15:13:57 -05:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Roman Lebedev	d4c0abb4a3	[SimplifyCFG] Teach FoldCondBranchOnPHI() to preserve DomTree	2020-12-30 00:48:11 +03:00
Juneyoung Lee	278aa65cc4	[IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93793	2020-12-30 04:21:04 +09:00
Philip Reames	4b33b23877	Reapply "[LV] Vectorize (some) early and multiple exit loops"" w/fix for builder This reverts commit `4ffcd4fe9a` thus restoring `e4df6a40da`. The only change from the original patch is to add "llvm::" before the call to empty(iterator_range). This is a speculative fix for the ambiguity reported on some builders.	2020-12-28 10:13:28 -08:00
Arthur Eubanks	4ffcd4fe9a	Revert "[LV] Vectorize (some) early and multiple exit loops" This reverts commit `e4df6a40da`. Breaks Windows bots, e.g. http://45.33.8.238/win/30472/step_4.txt and http://lab.llvm.org:8011/#/builders/83/builds/2078/steps/5/logs/stdio	2020-12-28 10:05:41 -08:00
Philip Reames	e4df6a40da	[LV] Vectorize (some) early and multiple exit loops This patch is a major step towards supporting multiple exit loops in the vectorizer. This patch on it's own extends the loop forms allowed in two ways: single exit loops which are not bottom tested multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later) The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts. The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration. The existing code already had the notion of requiring one iteration in the scalar epilogue, this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical). Differential Revision: https://reviews.llvm.org/D93317	2020-12-28 09:40:42 -08:00
Juneyoung Lee	9d70dbdc2b	[InstCombine] use poison as placeholder for undemanded elems Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement. However, this is problematic because undef isn’t undefined enough. Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison. This makes a few straightforward optimizations incorrect, such as: ``` ; https://bugs.llvm.org/show_bug.cgi?id=44185 define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %xv = insertelement <4 x float> %q, float %x, i32 2 %r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef } ret <4 x float> %r ; %r[3] is undef } => define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) { %r = insertelement <4 x float> %y, float %x, i32 1 ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison } Transformation doesn't verify! ERROR: Target is more poisonous than source ``` I’d like to suggest 1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too) 2. Updating shufflevector’s semantics to return poison element if mask is undef Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay. m_Undef() matches PoisonValue as well, so existing optimizations will still fire. The only concern is hidden miscompilations that will go incorrect when poison constant is given. A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :( Instead, I’ll simply locally maintain the tests and run Alive2. If there is any bug found, I’ll report it. Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93586	2020-12-28 08:58:15 +09:00
Florian Hahn	0ea3749b3c	[LV] Set up branch from middle block earlier. Previously the branch from the middle block to the scalar preheader & exit was being set-up at the end of skeleton creation in completeLoopSkeleton. Inserting SCEV or runtime checks may result in LCSSA phis being created, if they are required. Adjusting branches afterwards may break those PHIs. To avoid this, we can instead create the branch from the middle block to the exit after we created the middle block, so we have the final CFG before potentially adjusting/creating PHIs. This fixes a crash for the included test case. For the non-crashing case, this is almost a NFC with respect to the generated code. The only change is the order of the predecessors of the involved branch targets. Note an assertion was moved from LoopVersioning() to LoopVersioning::versionLoop. Adjusting the branches means loop-simplify form may be broken before constructing LoopVersioning. But LV only uses LoopVersioning to annotate the loop instructions with !noalias metadata, which does not require loop-simplify form. This is a fix for an existing issue uncovered by D93317.	2020-12-27 18:21:12 +00:00

1 2 3 4 5 ...

1171 Commits