clang-p2996

Author	SHA1	Message	Date
Philip Reames	dc7387b587	[LV] Adjust cost model to use uniform store lowering for unpredicated uniform stores Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated. The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use. Differential Revision: https://reviews.llvm.org/D134460	2022-09-27 07:28:40 -07:00
Florian Hahn	2c692d891e	[LV] Update handling of scalable pointer inductions after b73d2c8. The dependent code has been changed quite a lot since `151c144` which b73d2c8 effectively reverts. Now we run into a case where lowering didn't expect/support the behavior pre `151c144` any longer. Update the code dealing with scalable pointer inductions to also check for uniformity in combination with isScalarAfterVectorization. This should ensure scalable pointer inductions are handled properly during epilogue vectorization. Fixes #57912.	2022-09-23 18:23:02 +01:00
Florian Hahn	17167005d5	[LV] Add test for #57912 . Add test showing miscompilation during epilogue vectorization with SVE.	2022-09-23 11:49:55 +01:00
Florian Hahn	05b3493819	[LV] Convert sve-epilog-vect.ll to use opaque pointers.	2022-09-23 10:24:19 +01:00
Philip Reames	32dc1151e2	[VPlan] Only generate single instr for unpredicated stores of varying value to invariant address This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.) This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.) Differential Revision: https://reviews.llvm.org/D133580	2022-09-22 08:53:46 -07:00
Simon Pilgrim	e030be64d8	[CostModel][X86] Add partial CostKinds handling for funnelshifts/rotates This mainly just adds costs for the targets where we have actual funnelshift/rotate instructions (VBMI2/XOP etc.) - the cases where we expand still need addressing, although for many the default shift+or expansion, especially for uniform cases, isn't that bad. This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 11:24:11 +01:00
Simon Pilgrim	b2cd8118d0	[CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 10:19:23 +01:00
Philip Reames	8c46881a53	[TTI] Recognize fp constants in getOperandInfo We were recognizing vectors of floats, but not scalars. That's a tad odd.	2022-09-21 14:34:34 -07:00
Graham Hunter	7b420a4a8b	[NFC][LV] Scalarizing test for masked vector calls	2022-09-21 15:43:25 +01:00
Simon Pilgrim	71162ad957	[LoopVectorize] Fix test name - the test is for fshl not cttz intrinsic costs	2022-09-21 15:24:43 +01:00
Sanjay Patel	0f32a5dea0	[InstCombine] don't canonicalize shl+sub to mul+add This stops Negator from transforming: `C1 - shl X, C2 --> mul X, (1<<C2) + C1` ...in the general case. There does not seem to be any analysis benefit to using mul in IR, and there's definitely downside in codegen (particularly when the multiply has to be expanded). If `C1` is 0, then there's a stronger argument that the single mul is a better canonicalization than negate-of-shl, but we may want to remove that too. This was noted as a potential conflict for D133667. Differential Revision: https://reviews.llvm.org/D134310	2022-09-21 08:39:07 -04:00
Simon Pilgrim	09cb9fdef9	[InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635) Alive2: https://alive2.llvm.org/ce/z/sZ6wwS As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero. Differential Revision: https://reviews.llvm.org/D134172	2022-09-20 16:44:41 +01:00
Vitaly Buka	bbef90ace4	[IRBuilder] Use PoisonValue in CreateMasked* Followup to `72b776168c` Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D133967	2022-09-19 11:01:41 -07:00
Florian Hahn	582f8ef19f	[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd. Epilogue vectorization uses isScalarAfterVectorization to check if widened versions for inductions need to be generated and bails out in those cases. At the moment, there are scenarios where isScalarAfterVectorization returns true but VPWidenPointerInduction::onlyScalarsGenerated would return false, causing widening. This can lead to widened phis with incorrect start values being created in the epilogue vector body. This patch addresses the issue by storing the cost-model decision in VPWidenPointerInductionRecipe and restoring the behavior before `151c144`. This effectively reverts `151c144`, but the long-term fix is to properly support widened inductions during epilogue vectorization. Fixes #57712.	2022-09-19 18:14:35 +01:00
Sebastian Peryt	99c9b37d11	[NFC][1/n] Remove -enable-new-pm=0 flags from lit tests This is the first patch in a series intended for removing the flag -enable-new-pm=0 from lit tests. This is part of a bigger effort of completely removing legacy code related to the legacy pass manager in favor of the currently default new pass manager. In this patch the flag has been removed only from tests where no significant change has been required because checks have been duplicated for both PMs. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D134150	2022-09-19 09:57:37 -07:00
Florian Hahn	f02ff5348f	[LV] Move new epilog-vectorization-widen-inductions.ll to AArch64 dir. The test requires the AArch64 backend, so move it to the right subdir.	2022-09-19 17:13:06 +01:00
Florian Hahn	6087b6386e	[LV] Add tests for epilogue vectorization with widened inductions. Includes a test for the miscompile in #57712.	2022-09-19 17:10:41 +01:00
Simon Pilgrim	393cc6a354	[LoopVectorize] Regenerate runtime-check.ll	2022-09-19 10:25:48 +01:00
Simon Pilgrim	7e626d7a89	[LoopVectorize][X86] Use quotes around the pass list to appease DOS cmd evaluation DOS can't handle -passes='default<O3>' correctly	2022-09-19 10:24:37 +01:00
Sanjay Patel	d6498abc24	[InstCombine] remove multi-use add demanded constant fold This was originally part of D133788. There are no visible regressions. All of the diffs show a large unsigned constant becoming a small negative constant. This should be better for analysis (and slightly less compile-time) and codegen.	2022-09-18 14:23:43 -04:00
Vitaly Buka	ed188b39ab	[test] Regenerate few tests	2022-09-15 12:36:32 -07:00
Simon Pilgrim	0ec028fe10	[CostModel][X86] Add CostKinds handling for vector shift by uniform/constuniform ops Vector shift by const uniform is the cheapest shift instruction we have, non-const uniform have a marginally higher cost - some targets 'splat' the amount internally to use the shift-per-element instruction, others see a higher cost for the explicit zeroing of the upper bits for the (64-bit) shift amount. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 14:05:30 +01:00
jacquesguan	ecf327f154	[RISCV] Add cost model for vector insert/extract element. This patch adds cost model for vector insert/extract element instructions. In RVV, we could use vector scalar move instruction to insert or extract the first element, and use vslide to move it. But for mask vector or i64 vector in i32 target, we need special instructions to make it. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133007	2022-09-14 11:10:18 +08:00
Simon Pilgrim	8ae9cf550b	[LoopVectorize][X86] Add uniform shift costs checks for VF=1/2/4	2022-09-13 13:46:52 +01:00
Philip Reames	4e295cb1ce	[LV] Autogen a test for ease of update	2022-09-09 08:16:22 -07:00
Philip Reames	edb26268ce	[VPlan] Only generate single instr for stores uniform across all parts. Extend the approach taken by D133019 to store instructions. Differential Revision: https://reviews.llvm.org/D133497	2022-09-09 07:15:12 -07:00
Graham Hunter	1f639d1bd2	[NFC][LV] Convert masked call tests to use update script	2022-09-09 10:07:39 +01:00
Craig Topper	5f3a8b585b	[RISCV] Add RecurKind::FMulAdd to isLegalToVectorizeReduction for scalable vectors. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133511	2022-09-08 12:34:59 -07:00
Philip Reames	4c4c0d2c06	[LV] Use safe-divisor lowering for fixed vectors if profitable This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well. Differential Revision: https://reviews.llvm.org/D132591	2022-09-08 09:15:54 -07:00
Florian Hahn	422cf99161	[VPlan] Only generate single instr for loads uniform across all parts. VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a scalar instruction is generated per-part. This is a potential alternative to D132892. For now the current patch only catches cases where the address is trivially invariant (defined outside VPlan), while D132892 catches any address that is considered invariant by SCEV AFAICT. It should be possible to hoist fully invariant recipes feeding loads out of the vector loop region as well, but in practice LICM should do that already. This version of the patch artificially limits this to loads to make it easier to compare, but this restriction should be easily liftable. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D133019	2022-09-08 14:27:58 +01:00
Florian Hahn	ba3d29f871	[LCSSA] Update unreachable uses with poison. Users of LCSSA may not expect non-phi uses when checking the uses outside a loop, which may cause crashes. This is due to the fact that we do not update uses in unreachable blocks. To ensure all reachable uses outside the loop are phis, update uses in unreachable blocks to use poison in dead code. Fixes #57508.	2022-09-04 22:26:18 +01:00
Florian Hahn	a10d42dd45	[LV] Update test use opaque pointers, regenerate checks. Modernize the test to make it easier to extend in a follow-up patch.	2022-09-04 22:26:18 +01:00
Florian Hahn	fc444ddc77	[VPlan] Add field to track if intrinsic should be used for call. (NFC) This patch moves the cost-based decision whether to use an intrinsic or library call to the point where the recipe is created. This untangles code-gen from the cost model and also avoids doing some extra work as the information is already computed at construction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D132585	2022-09-01 13:14:40 +01:00
Florian Hahn	faad567589	[LV] Add test case where SCEV is needed to remove vector backedge. Test case mentioned in the discussion for D115261.	2022-08-31 14:01:42 +01:00
Florian Hahn	1ed555a62b	[LV] Fix test cases where vector loop never executed. It looks like the vector loops in the modified test cases unintentionally never get executed. Update the exit condition to ensure it does to avoid them getting optimized away in upcoming changes.	2022-08-31 13:24:49 +01:00
Philip Reames	4c10646367	[LV] Refresh autogen tests to reflect naming changes [nfc] Purely so that these can be easily autogened without spurious diffs	2022-08-29 14:16:54 -07:00
Florian Hahn	005d1a8ff5	[LV] Add test where either a libfunc or intrinsic is chosen. In the newly added test either a libfunc (VF=2) or an intrinsic (VF=4) can be chosen. Test coverage for D132585.	2022-08-29 10:51:20 +01:00
Philip Reames	b45a262679	[RISCV] Enable fixed length vectors and loop vectorization with same This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size. For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware. The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization. SLP has been disabled for now, even when fixed vectors are enabled. See `a310637` and associated review. There are a few additional code quality issues which need to be worked through before turning SLP on would be reasonable. Differential Revision: https://reviews.llvm.org/D131508	2022-08-26 14:45:23 -07:00
Florian Hahn	9405af1c85	[LAA] Require AddRecs to be in the innermost loop for diff-checks. The simpler diff-checks require pointers with add-recs from the same innermost loop, but this property wasn't checked completely. Add the missing check to ensure both addrecs are in the innermost loop. Fixes #57315.	2022-08-26 20:39:52 +01:00
Florian Hahn	e117137af0	[LV] Add another test for incorrect runtime check generation. Add a variation of @nested_loop_outer_iv_addrec_invariant_in_inner with the dependence sink and source swapped to extend test coverage. Also simplifies the test by removing an unneeded reduction.	2022-08-26 17:28:55 +01:00
Florian Hahn	6e56779e6b	[LV] Add test for incorrect runtime check generation #57315 . Test for PR57315 based on a test provided by @kpdev42.	2022-08-26 16:29:20 +01:00
Florian Hahn	3b135ef446	[LV] Convert runtime diff check test to use opaque pointers. Modernize the test to make it easier to extend with up-to-date IR.	2022-08-26 16:02:38 +01:00
Philip Reames	86b67a310d	[LAA] Prune dependencies with distance larger than access implied by trip count When we have a dependency with a dependence distance which can only be hit on an iteration beyond the actual trip count of the loop, we can ignore that dependency when analyzing said loop. We already had this code, but had restricted it solely to unknown dependence distances. This change applies it to all dependence distances. Without this code, we relied on the vectorizer reducing VF such that our infeasible dependence was respected. This usually worked out to about the same result, but not always. For fixed length vectorization, this could mean a smaller VF than optimal being chosen or additional runtime checks. For scalable vectorization - where the bounds on access implied by VF are broader - we could often not find a feasible VF at all. Differential Revision: https://reviews.llvm.org/D131924	2022-08-25 14:24:13 -07:00
Florian Hahn	637da77e66	[LV] Add additional test coverage for SCEVexp and LCSSA interaction. Also converts the test to use opaque pointers while I am here.	2022-08-25 20:59:47 +01:00
Philip Reames	190cdf51ff	[RISCV][LV] Add predicated div/rem test for fixed length vectorization	2022-08-24 11:24:22 -07:00
Philip Reames	b20104f644	[LV] Update a test which appears to have been edited without regen [nfc]	2022-08-24 11:05:49 -07:00
Philip Reames	f79214d1e1	[LV] Support predicated div/rem operations via safe-divisor select idiom This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors. The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditionally select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others. This one was chosen mostly because it was straightforward, and none of the others seemed obviously better. I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches. Differential Revision: https://reviews.llvm.org/D130164	2022-08-24 10:07:59 -07:00
David Green	8d830f8d68	[LV] Replace fixed-order cost model with a SK_Splice shuffle The existing cost model for fixed-order recurrences models the phi as an extract shuffle of a v1 vector. The shuffle produced should be a splice, as it takes two vector inputs and extracts from a subset of the lanes. On certain architectures the existing cost model can drastically under-estimate the correct cost for the shuffle, so this changes it to a SK_Splice and passes a correct Mask through to the getShuffleCost call. I believe this might be the first use of a SK_Splice shuffle cost model outside of scalable vectors, and some targets may require additions to the cost-model to correctly account for them. In tree targets appear to all have been updated where needed. Differential Revision: https://reviews.llvm.org/D132308	2022-08-24 13:00:32 +01:00
David Green	e29f9f7572	[AArch64][X86] Add some fixed-order-recurrence tests to check the costmodel of fixed order recurrences. NFC	2022-08-24 08:18:01 +01:00
Graham Hunter	14212c968f	[NFC][LoopVectorize] Precommit masked vector function call tests	2022-08-23 09:47:10 +01:00
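
The safe-divisor select idiom described in commit f79214d1e1 can be illustrated with a small lane-by-lane model. This is only a sketch in plain Python, not LLVM code: the function names and the list-of-lanes representation are hypothetical, and it assumes non-negative integers so that Python's floor division agrees with the truncating division used in IR.

```python
def predicated_div_scalarized(dividend, divisor, mask):
    """Reference behaviour: divide only in active lanes.

    Inactive lanes never execute the division and yield None here.
    """
    return [a // b if m else None for a, b, m in zip(dividend, divisor, mask)]


def predicated_div_safe_divisor(dividend, divisor, mask):
    """Safe-divisor form: select the divisor, then divide every lane.

    Inactive lanes divide by the constant 1, which can never trap, so the
    whole division can be performed unconditionally across all lanes.
    """
    # Model of "select mask, divisor, splat(1)".
    safe = [b if m else 1 for b, m in zip(divisor, mask)]
    # Model of the single wide division over every lane.
    return [a // s for a, s in zip(dividend, safe)]
```

In this model the two forms agree on every active lane, and the values computed for inactive lanes are simply never used by the surrounding predicated code.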


1897 Commits