clang-p2996

Author	SHA1	Message	Date
Farzon Lotfi	378fe2fc23	[X86][LoopVectorize] Add support for arc and hyperbolic trig functions (#99383 ) This change is part 2 x86 Loop Vectorization of : https://github.com/llvm/llvm-project/pull/96222 It also has veclib call loop vectorization hence the test cases in `llvm/test/Transforms/LoopVectorize/X86/veclib-calls.ll` finally the last pr missed tests for `llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll` and `llvm/test/CodeGen/X86/vec-libcalls.ll` so added those as well. No evidence was found for arc and hyperbolic trig glibc vector math functions https://github.com/lattera/glibc/blob/master/sysdeps/x86/fpu/bits/math-vector.h so no new `_ZGVbN2v_` and `_ZGVdN4v_` . So no new tests in `llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll` Also no new svml and no new tests to: `llvm/test/Transforms/LoopVectorize/X86/svml-calls.ll` There was not enough evidence that there were svml arc and hyperbolic trig vector implementations. Documentation was scarce so looked at test cases in [numpy](`32bf2a9842/linux/avx512/svml_z0_acos_d_la.s (L8)`). Someone with more experience with svml should investigate. ## Note amd libm doesn't have a vector hyperbolic sine api, hence why you might notice there are no tests for `sinh`. ## History This change is part of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics. Which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds loop vectorization for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. resolves #70079 resolves #70080 resolves #70081 resolves #70083 resolves #70084 resolves #95966	2024-07-28 20:57:43 -04:00
Florian Hahn	66ce4f771e	[VPlan] Port invalid cost remarks to VPlan. (#99322 ) This patch moves the logic to create remarks for instructions with invalid costs to work on recipes and decouples it from selectVectorizationFactor. This is needed to replace the remaining uses of selectVectorizationFactor with getBestPlan using the VPlan-based cost model. The current implementation iterates over all VPlans and their recipes again, to find recipes with invalid costs, which is more work but will only be done when remarks for LV are enabled. Once the remaining uses of selectVectorizationFactor are retired, we can collect VPlans with invalid costs as part of getBestPlan if we want to optimize the remarks case a bit, at the cost of adding additional complexity. PR: https://github.com/llvm/llvm-project/pull/99322	2024-07-27 12:52:12 +01:00
Philip Reames	d3fd28a134	[RISCV][TTI] Properly model odd vector sized LD/ST operations (#100436 ) The motivation for this change is the costing of a LD or ST with nearly power of 2 vectors (e.g. <3 x i32> or <7 x i32>) on V. There's an experimental option in SLP to allow emitting these if the cost model says they're profitable. This really helps with e.g. RGB vectors. Our actual lowering for these depends on whether a wider container type is known available. If so, we use a vle or vse on the wider type with a restricted VL. If not, we split until a legal type is found, and then apply the vle/vse on the sub-pieces. This change is intentionally restricted to only the case where promotion (widening w/VL predication) is involved. We appear to have at least one bug in our splitting lowering (see discussion on review), and to avoid exposing this more widely, I chose to not adjust costs for the splitting case. The current splitting costing assumes scalarization (which is not true of the actual lowering), but that has the effect of biasing vectorization away from such cases strongly. For the widening case, the true cost scales with the next largest legal type. The default implementation assumes that such a type is scalarized. Changing that brings our cost in line with our actual lowering decision. Note that since scalarization is not possible for scalable types, the prior costing falsely returned Invalid for that case.	2024-07-26 12:52:20 -07:00
Florian Hahn	67a55e01e3	[VPlan] Replace getBestPlan by getBestVF use also for epilogue vec. (#98821 ) Replace getBestPlan by getBestVF which simply finds the best VF out of the VFs for the available VPlans. Then use getBestPlan to retrieve the corresponding VPlan. This allows using getBestVF & getBestPlan for epilogue vectorization as well. As the same plan may be used to vectorize both the main and epilogue loop, restricting the VF of the best plan would cause issues. PR: https://github.com/llvm/llvm-project/pull/98821	2024-07-26 14:06:46 +01:00
Alexey Bataev	7432ad6af5	[LV][VP][NFC]Add tests for safe store/load forwarding/dependence distance. Reviewers: fhahn Reviewed By: fhahn Pull Request: https://github.com/llvm/llvm-project/pull/100635	2024-07-25 19:55:37 -04:00
Florian Hahn	a3092152ac	[VPlan] Don't create live-outs for induction increments. Follow up to `fc9cd3272b` to also skip creating live-outs for IV increments, as those are also generated independent of VPlan for now.	2024-07-25 21:34:55 +01:00
Philip Reames	ea202f9f2e	[LV,RISCV] Regenerate a test to reduce spurious deltas in upcoming change	2024-07-25 12:22:59 -07:00
Simon Pilgrim	010dcfd85f	[CostModel][X86] Improve add/sub/mul overflow intrinsic costs Noticed due to x86 changes in #97463	2024-07-25 16:01:05 +01:00
Florian Hahn	72532c9219	[LV] Don't predicate divs with invariant divisor when folding tail (#98904 ) When folding the tail, at least one of the lanes must execute unconditionally. If the divisor is loop-invariant no predication is needed, as predication would not prevent the divide-by-0 on the executed lane. Depends on https://github.com/llvm/llvm-project/pull/98892. PR: https://github.com/llvm/llvm-project/pull/98904	2024-07-25 12:21:09 +01:00
Florian Hahn	b72689a5cb	[LV] Ignore live-out users in cost model if scalar epilogue is required. Follow-up to `ba8126b6fe`. If a scalar epilogue is required, users outside the loop won't use live-outs from the vector loop but from the scalar epilogue. Ignore them if that is the case. This fixes another case where the VPlan-based cost-model more accurately computes cost. Fixes https://github.com/llvm/llvm-project/issues/100464.	2024-07-25 11:16:18 +01:00
Florian Hahn	ba8126b6fe	[LV] Mark dead instructions in loop as free. Update collectValuesToIgnore to also ignore dead instructions in the loop. Such instructions will be removed by VPlan-based DCE and won't be considered by the VPlan-based cost model. This closes a gap between the legacy and VPlan-based cost model. In practice with the default pipelines, there shouldn't be any dead instructions in loops reaching LoopVectorize, but it is easy to generate such cases by hand or automatically via fuzzers. Fixes https://github.com/llvm/llvm-project/issues/99701.	2024-07-24 09:31:32 +01:00
Florian Hahn	bb60dd391f	[VPlan] Only use force-target-instruction-cost for recipes with insts. To match the behavior of the legacy cost model, only apply -force-target-instruction-cost to recipes with underlying instructions for now, as only original IR instructions are considered by the legacy cost model. This fixes a difference between legacy and VPlan based cost model, triggering the verification assertion, reported by @JonPsson1.	2024-07-23 21:05:10 +01:00
Zhaoxin Yang	89d1eb6734	[LoongArch] Remove experimental `auto-vec` feature. (#100070 ) Currently, automatic vectorization will be enabled with `-mlsx/-mlasx` enabled.	2024-07-23 15:19:00 +08:00
David Sherwood	102d16809b	[Analysis] Bail out for negative offsets in isDereferenceableAndAlignedInLoop (#99490 ) This patch now bails out explicitly for negative offsets so that it's more consistent with the unsigned remainder and add calculations, and it fixes a genuine bug as shown with the new test.	2024-07-22 11:31:50 +01:00
Luke Lau	58854facb3	[RISCV] Don't cost vector arithmetic fp ops as cheaper than scalar (#99594 ) I was comparing some SPEC CPU 2017 benchmarks across rva22u64 and rva22u64_v, and noticed that in a few cases rva22u64_v was considerably slower. One of them was 519.lbm_r, which has a large loop that was being unprofitably vectorized. It has an if/else in the loop which requires large amounts of predication when vectorized, but despite the loop vectorizer taking this into account the vector cost came out as cheaper than the scalar. It looks like this is because we cost scalar floating point ops as 2, but their vector equivalents as 1 (for LMUL 1). This comes from how we use BasicTTIImpl for scalars which treats floats as twice as expensive as integers. This patch doubles the cost of vector floating point arithmetic ops so that they're at least as expensive as their scalar counterparts, which gives a 13% speedup on 519.lbm_r at -O3 on the spacemit-x60. Fixes #62576 (the last point there about scalar fsub/fmul)	2024-07-22 13:56:10 +08:00
Florian Hahn	c8c0b18b5d	[LV] Update tests to not have dead interleave groups. Update existing tests with dead interleave groups by adding users. This ensures the tests keep testing what they were intended to test with a planned change to skip unused instructions in cost computations.	2024-07-21 14:03:40 +01:00
Florian Hahn	05f986e143	[LV] Add tests for loops with switches.	2024-07-21 10:11:38 +01:00
Florian Hahn	710dab6e18	[VPlan] Remove VPPredInstPHIRecipes without users after region merging. After merging replicate regions, VPPredInstPHIRecipes may become unused. Remove them directly instead of moving them to the merged region.	2024-07-20 13:21:32 +01:00
Farzon Lotfi	e2f463b5b6	[aarch64] Add hyperbolic and arc trig intrinsic lowering (#98937 ) ## The change(s) - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing `fewerElementsVector` handling for the new trig intrinsics - `AArch64ISelLowering.cpp`: Add aarch64 specializations for lowering like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize the new trig intrinsics. aarch64 has special legalization requirements in `AArch64LegalizerInfo.cpp`. If we redirect the clang builtin without handling this we will break the aarch64 compiler ## History This change is part of an implementation of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics. Which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds wasm lowering cases for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. https://github.com/llvm/llvm-project/issues/70079 https://github.com/llvm/llvm-project/issues/70080 https://github.com/llvm/llvm-project/issues/70081 https://github.com/llvm/llvm-project/issues/70083 https://github.com/llvm/llvm-project/issues/70084 https://github.com/llvm/llvm-project/issues/95966 ## Why is aarch64 needed The last step is to redirect the `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh` to emit the intrinsic. We can't emit the intrinsic without the intrinsics becoming legal for aarch64 in `AArch64LegalizerInfo.cpp`	2024-07-19 10:18:23 -04:00
Florian Hahn	008df3cf85	[LV] Check isPredInst instead of isScalarWithPred in uniform analysis. (#98892 ) Any instruction marked as uniform will result in a uniform VPReplicateRecipe. If it requires predication, it will be placed in a replicate region, even if isScalarWithPredication returns false. Check isPredicatedInst instead of isScalarWithPredication to avoid generating uniform VPReplicateRecipes placed inside a replicate region. This fixes an assertion when using scalable VFs. Fixes https://github.com/llvm/llvm-project/issues/80416. Fixes https://github.com/llvm/llvm-project/issues/94328. Fixes https://github.com/llvm/llvm-project/issues/99625. PR: https://github.com/llvm/llvm-project/pull/98892	2024-07-19 12:02:25 +01:00
Florian Hahn	b8741cc185	[VPlan] Relax assertion retrieving a scalar from VPTransformState::get. The current assertion in VPTransformState::get when retrieving only a single scalar does not account for cases where a def has multiple users, some demanding all scalar lanes, some demanding only a single scalar. For an example, see the modified test case. Relax the assertion by also allowing requesting scalar lanes only when the Def doesn't have only its first lane used. Fixes https://github.com/llvm/llvm-project/issues/88849.	2024-07-19 11:33:57 +01:00
Florian Hahn	17f98baf70	[LV] Add test with users both demanding all lanes and first-lane-only. Add a test case where scalar steps are used by both a VPReplicateRecipe (demands all scalar lanes) and a VPInstruction that only demands the first lane. Test case for https://github.com/llvm/llvm-project/issues/88849.	2024-07-19 10:29:43 +01:00
Florian Hahn	270f5e42b8	[LV] Add tests where uniform recipe gets predicated for scalable VFs. Currently the tests crash, due to a VPReplicateRecipe getting predicated for scalable vectors. Precommits tests for https://github.com/llvm/llvm-project/pull/98892. Test cases for * https://github.com/llvm/llvm-project/issues/80416 and * https://github.com/llvm/llvm-project/issues/94328	2024-07-19 09:21:40 +01:00
Florian Hahn	2bb65660ae	[LV] Allow re-processing of operands of instrs feeding interleave group Follow up to `d216615518` to update dead interleave group pointer detection to allow re-processing of operands of instructions determined to only feed interleave groups. This is needed because instructions feeding interleave group pointers can become dead in any order, as per the newly added test case.	2024-07-17 21:37:28 +01:00
Florian Hahn	d216615518	[LV] Process dead interleave pointer ops in reverse order. Process dead interleave pointer ops in reverse order. This also catches cases where the same base pointer is used by multiple different interleave groups. This fixes another case where the legacy cost model inaccurately estimates cost, surfaced by `b841e2eca3`.	2024-07-17 11:43:42 +01:00
Sjoerd Meijer	c5329c827a	[LV][AArch64] Prefer Fixed over Scalable if cost-model is equal (Neoverse V2) (#95819 ) For the Neoverse V2 we would like to prefer fixed width over scalable vectorisation if the cost-model assigns an equal cost to both for certain loops. This improves 7 kernels from TSVC-2 and several production kernels by about 2x, and does not affect SPEC2017 INT and FP. This also adds a new TTI hook that can steer the loop vectorizer to preferring fixed width vectorization, which can be set per CPU. For now, this is only enabled for the Neoverse V2. There are 3 reasons why preferring NEON might be better in the case the cost-model is a tie and the SVE vector size is the same as NEON (128-bit): architectural reasons, micro-architecture reasons, and SVE codegen reasons. The latter will be improved over time, so the more important reasons are the former two. I.e., the (micro) architecture reason is the use of LDP/STP instructions, which are not available in SVE2, and it avoids predication. For what it is worth: this codegen strategy to generate more NEON is in line with GCC's codegen strategy, which is actually even more aggressive in generating NEON when no predication is required. We could be smarter about the decision making, but this seems to be a good first step in the right direction, and we can always revise this later (for example make the target hook more general).	2024-07-17 10:46:28 +01:00
Florian Hahn	55483379e2	[VPlan] Update test to use CHECK variables. Update test to avoid using hard-coded VPValue IDs.	2024-07-16 13:04:10 +01:00
Florian Hahn	4469a1e587	[LV] Add missing check lines in vector.ph in tests. Match all instructions in vector.ph in sve-inductions-unusual-types.ll. This should help to better show the impact of https://github.com/llvm/llvm-project/pull/95305.	2024-07-16 10:45:53 +01:00
Mel Chen	4eb30cfb34	[LV][EVL] Support in-loop reduction using tail folding with EVL. (#90184 ) Following from #87816, add VPReductionEVLRecipe to describe vector predication reduction. Address one of TODOs from #76172.	2024-07-16 16:15:24 +08:00
Dinar Temirbulatov	31d4c97506	[LoopVectorize] LLVM fails to vectorise loops with multi-bool variables (#89226 ) This change allows compare instructions with multiple uses inside and outside the loop to be considered. It allows the following loop to be vectorised: int foo(float* a, int n) { _Bool any = 0; _Bool all = 1; for (int i = 0; i < n; i++) { if (a[i] < 0.0f) { any = 1; } else { all = 0; } } return all ? 1 : any ? 2 : 3; }	2024-07-15 20:21:50 +01:00
Florian Hahn	967eba0754	[LV] Add test cases for tail-folding sdiv/udiv/urem feeding geps. Based on reduced tests from https://github.com/llvm/llvm-project/issues/94328.	2024-07-15 11:45:07 +01:00
Florian Hahn	8fcb822da6	[LV] Add uses of result to pointer-runtime-checks-unprofitable.ll test. Otherwise %p.2 is not used and will be removed by VPlan transforms, leading to a difference between legacy and VPlan-based cost.	2024-07-15 09:59:46 +01:00
Florian Hahn	fc9cd3272b	[VPlan] Don't add live-outs for IV phis. Resume and exit values for inductions are currently still created outside of VPlan and independent of the induction recipes. Don't add live-outs for now, as the additional unneeded users can pessimize other analysis. Fixes https://github.com/llvm/llvm-project/issues/98660.	2024-07-14 20:49:03 +01:00
Mel Chen	a00754bb2a	[LV] Fix the cost of min/max reductions. (#98453 ) This patch updates the function `getReductionPatternCost` to handle the cost of min/max reductions by `TTI.getMinMaxReductionCost`.	2024-07-12 13:47:33 +08:00
Florian Hahn	7a49d80f58	[VPlan] Skip users outside loop in check for exit pre-compute candidates When collecting candidates to pre-compute cost for operands of exit conditions, skip users outside the loop when checking if they are in ExistInstrs. The users outside the loop should be ignored, as they won't make a value live in the VPlan. This fixes a failure when building for X86 with sanitizers on macOS after `b841e2eca3` (https://green.lab.llvm.org/job/llvm.org/job/clang-stage2-cmake-RgSan/287/)	2024-07-11 22:04:39 +01:00
Graham Hunter	22a7f6dcc4	Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493 ) Reverts llvm/llvm-project#91458 to deal with post-commit reviewer requests.	2024-07-11 16:39:30 +01:00
Florian Hahn	9a5a8731e7	[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. (#94760 ) This patch introduces a new ResumePhi VPInstruction which creates a phi in a leaf block of a VPlan. The first use is to create the phi node for fixed-order recurrence resume values in the scalar preheader. The VPInstruction takes 2 operands: 1) the incoming value from the middle-block and a default value to be used for all other incoming blocks. In follow-up changes, it will also be used to create phis for reduction and induction resume values. Depends on https://github.com/llvm/llvm-project/pull/92651 PR: https://github.com/llvm/llvm-project/pull/94760	2024-07-11 16:08:04 +01:00
Graham Hunter	1860fd049e	[LV] Autovectorization for the all-in-one histogram intrinsic (#91458 ) This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.	2024-07-11 15:33:30 +01:00
Florian Hahn	67f4968a57	[LV] Skip cost for ZExt/SExts that will be removed by truncating ops. If an extend is truncated, it will be removed if the result type is <= the source type, as there is nothing to extend. Return a cost of 0. This was caught by the first step to perform cost-modeling based on VPlan (`b841e2e`), as the legacy cost model would query the cost of an invalid extend, while the extend has been folded away by VPlan transforms. Fixes https://github.com/llvm/llvm-project/issues/98413.	2024-07-11 11:40:14 +01:00
Florian Hahn	88e9c56990	[LV] Don't adjust name of recurrence phi in scalar loop (NFC). Adjusting the name of the recurrence phi in the scalar loop is a bit inconsistent, as we do not adjust any other names in the scalar loops (including other phis). Remove this adjustment in preparation for https://github.com/llvm/llvm-project/pull/94760/ and as discussed there.	2024-07-10 18:37:35 +01:00
Florian Hahn	b841e2eca3	Recommit "[VPlan] First step towards VPlan cost modeling. (#92555 )" This reverts commit `6f538f6a2d`. A number of crashes have been fixed by separate fixes, including https://github.com/llvm/llvm-project/pull/96622. This version of the PR also pre-computes the costs for branches (except the latch) instead of computing their costs as part of costing of replicate regions, as there may not be a direct correspondence between original branches and number of replicate regions. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's https://github.com/llvm/llvm-project/pull/67647 and https://github.com/llvm/llvm-project/pull/67934 which is an earlier version of the current PR. PR: https://github.com/llvm/llvm-project/pull/92555	2024-07-10 14:22:21 +01:00
Florian Hahn	ef89e3efa9	[VPlan] Collect ephemeral values for VPlan. Port collectEphemeralValues to VPlan as collectEphemeralRecipesForVPlan, use it in willGenerateVectors. This fixes a regression caused by `29b8b72117` for loops where the only vector values are ephemeral.	2024-07-09 21:34:49 +01:00
Florian Hahn	7346e7cc47	[VPlan] Update HCFG builder after `72937203dd` to fix leak. Update buildPlainCFG to re-use the vector and latch VPBBs created as part of the initial skeleton in `72937203dd`. This should fix the leak sanitizer failure discovered by https://lab.llvm.org/buildbot/#/builders/52/builds/619.	2024-07-09 15:28:43 +01:00
Florian Hahn	0577cdaa32	[LV] Split checking if tail-folding is possible, collecting masked ops. (#77612 ) Introduce new canFoldTail helper which only checks if tail-folding is possible, but without modifying MaskedOps. Just because tail-folding is possible doesn't mean the tail will be folded; that's up to the cost-model to decide. Separating the check if tail-folding is possible and preparing for tail-folding makes sure that MaskedOps is only populated when tail-folding is actually selected. PR: https://github.com/llvm/llvm-project/pull/77612	2024-07-08 16:34:42 +01:00
Florian Hahn	27ccc8835e	[LV] Add tests with ephemeral values that are widened. Add tests with loops with ephemeral values that are widened. After `29b8b72117`, @ephemeral_load_and_compare_another_load_used_outside is vectorized even though the only vector values that are generated are ephemeral.	2024-07-08 13:15:39 +01:00
Florian Hahn	29b8b72117	[LV] Move check if any vector insts will be generated to VPlan. (#96622 ) This patch moves the check if any vector instructions will be generated from getInstructionCost to be based on VPlan. This simplifies getInstructionCost, is more accurate as we check the final result and also allows us to exit early once we visit a recipe that generates vector instructions. The helper can then be re-used by the VPlan-based cost model to match the legacy selectVectorizationFactor behavior, thus fixing a crash and paving the way to recommit https://github.com/llvm/llvm-project/pull/92555. PR: https://github.com/llvm/llvm-project/pull/96622	2024-07-07 20:08:01 +01:00
Florian Hahn	ac03ae30cf	[LV] Preserve LAA in LoopVectorize (NFCI). LoopVectorize already always preserves DT, LI and SCEV. If any changes get made to the CFG, cached LAA info for loops are cleared. LoopAccessAnalysis also implements ::invalidate to clear the analysis if SE, DT or LI gets invalidated. Hence it should be safe to preserve LAA and save a small amount of compile-time.	2024-07-05 21:41:31 +01:00
Florian Hahn	959ff45bda	[LV] Regenerate test checks for zero_unroll.ll (NFC). Regenerate test checks to better show impact of https://github.com/llvm/llvm-project/pull/96622.	2024-07-05 11:37:13 +01:00
Florian Hahn	99d6c6d936	[VPlan] Model branch cond to enter scalar epilogue in VPlan. (#92651 ) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on https://github.com/llvm/llvm-project/pull/92525 PR: https://github.com/llvm/llvm-project/pull/92651	2024-07-05 10:08:42 +01:00
Florian Hahn	2b3b405b09	[LV] Don't vectorize first-order recurrence with VF <vscale x 1 x ..> The assertion added as part of https://github.com/llvm/llvm-project/pull/93395 surfaced cases where first-order recurrences are vectorized with <vscale x 1 x ..>. If vscale is 1, then we are unable to extract the penultimate value (second to last lane). Previously this case got mis-compiled, trying to extract from an invalid lane (-1) https://llvm.godbolt.org/z/3adzYYcf9. Fixes https://github.com/llvm/llvm-project/issues/97452.	2024-07-04 11:44:51 +01:00

2557 Commits