clang-p2996

Author	SHA1	Message	Date
Florian Hahn	cec24f0d7e	[VPlan] Update stale test after `9536a6286`, fix formatting.	2024-01-31 13:45:38 +00:00
Florian Hahn	9536a6286e	[VPlan] Preserve original induction order when creating scalar steps. Update createScalarIVSteps to take an insert point as parameter. This ensures that the inserted scalar steps are in the same order as the recipes they replace (vs in reverse order as currently). This helps to reduce the diff for follow-up changes.	2024-01-31 13:31:28 +00:00
Nilanjana Basu	c492eb6b28	[LV] Update interleaving count computation when scalar epilogue loop needs to run at least once (#79651 ) Update loop interleaving count computation to address loops that require at least one scalar iteration in the epilogue loop. For this case, the available trip count for interleaving the loop is one less.	2024-01-29 13:41:15 -08:00
Nilanjana Basu	155f24b11e	[Tests][LV][AArch64] Pre-commit tests for changing loop interleaving count computation for loops that need to run scalar iterations (#79640 ) This patch contains a set of pre-commit tests for changing the loop interleaving count computation in a subsequent patch in order to address loops that need to execute at least a single scalar iteration in the epilogue.	2024-01-29 10:21:23 -08:00
David Sherwood	962fbafecf	[LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034 ) When we generate runtime memory checks for an inner loop it's possible that these checks are invariant in the outer loop and so will get hoisted out. In such cases, the effective cost of the checks should reduce to reflect the outer loop trip count. This fixes a 25% performance regression introduced by commit `49b0e6dcc2` when building the SPEC2017 x264 benchmark with PGO, where we decided the inner loop trip count wasn't high enough to warrant the (incorrect) high cost of the runtime checks. Also, when runtime memory checks consist entirely of diff checks these are likely to be outer loop invariant.	2024-01-26 14:43:48 +00:00
Florian Hahn	731c2049a4	[VPlan] Relax IV user assertion after `0ab539f` for epilogue vec. After `0ab539fd67`, the canonical IV in the epilogue vector loop may be used by a trunc. Relax the corresponding assert. This should fix some build-bot failures, including https://lab.llvm.org/buildbot/#/builders/187/builds/14113 https://lab.llvm.org/buildbot/#/builders/98/builds/32350 https://lab.llvm.org/buildbot/#/builders/239/builds/5473	2024-01-26 13:19:25 +00:00
Graham Hunter	d4c0171423	[LV] Fix handling of interleaving linear args (#78725 ) Currently when interleaving vector calls with linear arguments, the Part is ignored and all vector calls use the initial value from the first lane of the current iteration. Fix this to extract from the correct part of the linear vector.	2024-01-26 11:30:35 +00:00
Florian Hahn	0ab539fd67	[VPlan] Add new VPScalarCastRecipe, use for IV & step trunc. (#78113 ) Add a new recipe to model scalar cast instructions, without relying on an underlying instruction. This allows creating scalar casts, without relying on an underlying instruction (like the current VPReplicateRecipe). The new recipe is used to explicitly model both truncating the induction step and the VPDerivedIVRecipe, thus simplifying both the recipe and code needed to introduce it. Truncating VPWidenIntOrFpInductionRecipes should also be modeled using the new recipe, as follow-up. PR: https://github.com/llvm/llvm-project/pull/78113	2024-01-26 11:13:05 +00:00
David Spickett	4a91206359	[llvm][LV] Move new test into X86 subfolder Added in `a04f615291`. Failing on our Arm only bots: https://lab.llvm.org/buildbot/#/builders/245/builds/19684	2024-01-25 17:04:34 +00:00
Florian Hahn	a04f615291	[LV] Check for innermost loop instead of EnableVPlanNativePath in CM. Replace EnableVPlanNativePath checks in the cost-model by assertions that the code is only called for innermost loops. This ensures that the cost model isn't used in the VPlanNativePath, which is only used for outer-loop vectorization. Even with EnableVPlanNativePath, inner loops are processed by the inner loop vectorization path, not the native path, so checking for EnableVPlanNativePath may impact decisions for inner loops and can cause crashes, like in the attached test case.	2024-01-25 12:49:52 +00:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882 ) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
wanglei	fcff4582f0	[LoongArch] Permit auto-vectorization using LSX/LASX with `auto-vec` feature (#78943 ) With enough codegen complete, we can now correctly report the size of vector registers for LSX/LASX, allowing auto-vectorization (the `auto-vec` feature must be enabled at the same time). The `auto-vec` feature is experimental. It ensures that automatic vectorization is not enabled by default, because the information provided by the current `TTI` cannot yield additional benefits for automatic vectorization.	2024-01-23 09:06:35 +08:00
Alexandros Lamprineas	530c72b498	[TLI] Add missing ArmPL mappings (#78474 ) Adds TLI mappings for fixed and scalable vector variants of cospi(f), fmax(f), ilogb(f) and ldexp(f).	2024-01-22 17:15:17 +00:00
Jay Foad	7017efa1a1	Fix typo "widended"	2024-01-19 13:50:26 +00:00
Graham Hunter	689da340ed	[NFC][LV] Test precommit for interleaved linear args	2024-01-19 12:59:09 +00:00
Alexandros Lamprineas	92289db82f	[VFABI] Move the Vector ABI demangling utility to LLVMCore. (#77513 ) This fixes #71892, allowing us to check mangled names in the IR verifier.	2024-01-17 09:55:30 +00:00
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982 ) Similar to `806761a762`. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Maciej Gabka	279dfe77da	[TLI][AArch64] Add extra SLEEF mappings and tests (#78140 ) This patch adds more scalar-to-vector mappings to the TLI for the SLEEF vector library. The added mappings are for the following functions: acosh, asinh, cbrt, copysign, cospi, erf, erfc, expm1, fdim, fma, fmax, fmin, hypot, ilogb, ldexp, log1p, nextafter, sinpi. It also brings back accidentally removed tests for sincospi.	2024-01-16 14:51:38 +00:00
Mel Chen	b6e8f6604c	[LV] Skipping all debug instructions when native vplan is enabled (#77413 ) The following internal error occurred when using native vplan to vectorize a program with debug info generation enabled: Assertion `!isa<DbgInfoIntrinsic>(CI) && "DbgInfoIntrinsic should have been dropped during VPlan construction"' failed. This patch ignores all debug instructions to fix the error when native vplan is enabled.	2024-01-16 11:08:10 +08:00
Jonas Paulsson	62b7e35f10	[SystemZ] Don't assert for i128 vectors in getInterleavedMemoryOpCost() (#78009 ) This assert does not seem justified given that the LoopVectorizer can form interleave groups containing i128 elements where the number of elements per vector is indeed just one.	2024-01-15 17:31:18 +01:00
Florian Hahn	2ae795d3d6	[LV] Add test case where variable induction step needs truncating.	2024-01-14 20:14:12 +00:00
Maciej Gabka	5dbf178154	[TLI][NFC] Fix ordering of ArmPL and SLEEF tests (#77609 ) This patch sorts the tests which check if SLEEF and ArmPL mappings are used, in the order of the math functions base names.	2024-01-12 15:06:25 +00:00
Florian Hahn	59d6f033a2	[VPlan] Support narrowing widened loads in truncateToMinimalBitwidths. MinBWs may also contain widened load instructions; handle them by only narrowing their result. Fixes https://github.com/llvm/llvm-project/issues/77468	2024-01-12 13:14:13 +00:00
Florian Hahn	51afb10174	[LV] Create block in mask up-front if needed. (#76635 ) At the moment, block and edge masks are created on demand, which means that they are inserted at the point where they are demanded and then cached. It is possible that the mask for a block is looked up later at a point that's not dominated by the point where the mask has been inserted. To avoid this, create masks up front on entry to the corresponding basic block and leave it to VPlan simplification to remove unneeded masks. Note that we need to create masks for all blocks, if any of the blocks in the loop needs predication, as computing the mask of a block depends on the masks of its predecessor. Needed for #76090. https://github.com/llvm/llvm-project/pull/76635	2024-01-09 10:50:08 +00:00
Florian Hahn	18ec3304a9	[VPlan] Manage InBounds via VPRecipeWithIRFlags for VectorPtrRecipe. As suggested as follow-up in https://github.com/llvm/llvm-project/pull/72164, manage inbounds via VPRecipeWithIRFlags. Note that we can now preserve inbounds in a few more cases.	2024-01-07 13:58:05 +00:00
Florian Hahn	249d2ccb1d	[LV] Add test showing overly aggressive dropping of inbounds. As %B.gep.0 executes unconditionally in the latch, inbounds could be preserved in the vector version. https://alive2.llvm.org/ce/z/XWbMuD	2024-01-07 13:55:32 +00:00
Florian Hahn	3fb0d8dc80	Recommit "[VPlan] Mark Select VPInstructions as not having sideeffects." With #70253 landed, selects for reduction results are explicitly used by ComputeReductionResult and Selects can be marked as not having side-effects again. This reverts the revert commit `173032902c`.	2024-01-06 12:08:06 +00:00
Alexandros Lamprineas	8c7f10eadb	[TLI] Add mappings to SLEEF/ArmPL libcall variants taking linear args. (#76060 ) The mappings correspond to vectorized variants (fixed/scalable) for the math functions: modf, sincos, sincospi.	2024-01-05 11:01:09 +00:00
Yingwei Zheng	6681650025	[InstCombine] Revert the `signed icmp -> unsigned icmp` canonicalization when folding `icmp Pred min\|max(X, Y), Z` (#76685 ) This patch tries to flip the signedness of predicates when folding an unsigned icmp with a signed min/max. It will enable more optimizations as we canonicalize a signed icmp into an unsigned icmp when both operands are known to have the same sign. Fixes #76672. Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=949ec83eaf6fa6dbffb94c2ea9c0a4d5efdbd239&to=2deca1aea8a4e13609bab72c522a97d424f0fc2d&stat=instructions:u \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|-0.00%\|+0.01%\|+0.05%\|-0.12%\|-0.01%\|-0.03%\|-0.00%\| NOTE: We can flip the signedness of predicate if both operands are negative. But I don't see the benefit of handling these cases.	2024-01-05 14:39:16 +08:00
Florian Hahn	241fe83704	[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253 ) This patch introduces a new ComputeReductionResult opcode to compute the final reduction result in the middle block. The code from fixReduction has been moved to ComputeReductionResult, after some earlier cleanup changes to model parts of fixReduction explicitly elsewhere as needed. The recipe may be broken down further in the future. Note that the phi nodes to merge the reduction result from the trip count check and the middle block, to be used as resume value for the scalar remainder loop are also generated based on ComputeReductionResult. Once we have a VPValue for the reduction result, this can also be modeled explicitly and moved out of the recipe.	2024-01-04 22:53:18 +00:00
Florian Hahn	2ab5c47c87	[VPlan] Don't replace scalarizing recipe with VPWidenCastRecipe. Don't replace a scalarizing recipe with a VPWidenCastRecipe. This would introduce wide (vectorizing) recipes when interleaving only. Fixes https://github.com/llvm/llvm-project/issues/76986	2024-01-04 20:39:44 +00:00
Nilanjana Basu	cd28da390f	[LV] Change loops' interleave count computation (#73766 ) A set of microbenchmarks in llvm-test-suite (https://github.com/llvm/llvm-test-suite/pull/56), when tested on an AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose the interleaving count (IC) between TC/VF and TC/(2*VF) (VF = vectorization factor), such that the remainder TC for the epilogue loop is minimal, while maximizing the IC when the remainder TC is the same for both. The initial tests for this change were submitted in PRs: https://github.com/llvm/llvm-project/pull/70272 and https://github.com/llvm/llvm-project/pull/74689.	2024-01-04 12:45:22 +05:30
Alexandros Lamprineas	e512df3ecc	[LV] Fix crash when vectorizing function calls with linear args. (#76274 ) llvm/lib/IR/Type.cpp:694: Assertion `isValidElementType(ElementType) && "Element type of a VectorType must be an integer, floating point, or pointer type."' failed. Stack dump: llvm::FixedVectorType::get(llvm::Type, unsigned int) llvm::VPWidenCallRecipe::execute(llvm::VPTransformState&) llvm::VPBasicBlock::execute(llvm::VPTransformState) llvm::VPRegionBlock::execute(llvm::VPTransformState) llvm::VPlan::execute(llvm::VPTransformState) ... Happens with function calls of void return type.	2024-01-02 18:14:16 +00:00
Florian Hahn	f18536d642	[VPlan] Model address separately. (#72164 ) Move vector pointer generation to a separate VPVectorPointerRecipe. This further untangles address computation from the memory recipes and is also needed to enable explicit unrolling in VPlan. https://github.com/llvm/llvm-project/pull/72164	2024-01-01 19:51:15 +00:00
Alexandros Lamprineas	6c2ad8ac7b	[TLI][NFC] Autogenerate vectorized call tests for SLEEF/ArmPL. (#76146 ) This patch prepares the ground for #76060. * Unifies ArmPL and SLEEF tests for better coverage * Replaces deprecated float* and double* types with ptr * Adds noalias attribute to pointer arguments * Adds some cmd-line options to the RUN lines to simplify output * Removes datalayout since target triple is provided * Removes checks for return statements * Refactors the regex filter for autogenerated checks * Removes redundant test file suffix (already under the AArch64 dir)	2023-12-22 16:29:18 +00:00
Paschalis Mpeis	2349731992	[TLI] Add SLEEFGNUABI mappings for fmod/fmodf fixed-width. (#75803 ) Cleanup test sleef-calls-aarch64.ll: - make the util update script's regex more clear - eliminate scalar epilogues in tests	2023-12-20 09:08:17 +00:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
Nikita Popov	e93d324adb	[InstCombine] Preserve poison in evaluateInDifferentElementOrder() Don't unnecessarily replace poison with undef.	2023-12-18 15:36:22 +01:00
David Sherwood	49b0e6dcc2	[LoopVectorize] Enable hoisting of runtime checks by default (#71538 ) With commit https://reviews.llvm.org/D152366 I introduced functionality that permitted the hoisting of runtime memory checks from a vectorised inner loop to the preheader of the next outer-most loop. This is useful for benchmarks like SPEC2017's x264 where the inner loop is vectorised and only has a small trip count. In such cases the runtime memory checks become expensive and since the checks never fail in the case of x264 it makes sense to do this. However, this behaviour was controlled by the flag -hoist-runtime-checks which was off by default. This patch enables this flag by default for all targets, since I believe this is a generally beneficial thing to do. I have tested this with SPEC2017 and I see 2.3% and 2.6% improvements with x264 on neoverse-v1 and neoverse-n1, respectively. Similarly, I saw slight improvements in the overall geomean on both machines. The only other notable changes were a 1% drop in the roms benchmark, which was compensated for by a 1% improvement in fotonik3d.	2023-12-18 09:41:54 +00:00
Shih-Po Hung	3d422a9859	[VPlan] Implement mayHaveSideEffects/mayWriteToMemory for VPInterleaveRecipe (#71360 ) This helps VPlanTransforms::removeDeadRecipes to work on VPInterleaveRecipe	2023-12-15 00:23:14 +08:00
Shih-Po Hung	b97c5a9554	[VPlan] Add a test for testing unused interleave recipes (#75026 ) - Precommit of tests from #71360. - Replace `undef` pointer operands and add stores to avoid the loads being optimized away.	2023-12-14 21:16:11 +08:00
Simon Pilgrim	b7fc78255e	Revert rG2047ab00eaf0a17e71ce5e8a5b27a8c90f034c3d "[VPlan] Add a test for testing unused interleave recipes (#75026 )" vplan-unused-interleave-group.ll is causing buildbot failures	2023-12-14 10:25:41 +00:00
Shih-Po Hung	2047ab00ea	[VPlan] Add a test for testing unused interleave recipes (#75026 ) - Precommit of tests from #71360. - Replace `undef` pointer operands and add stores to avoid the loads being optimized away.	2023-12-14 17:36:58 +08:00
Florian Hahn	173032902c	Revert "[VPlan] Mark Select VPInstructions as not having sideeffects." This reverts commit `19918ac34d`. Fixes #75298. There is still a case where we miss the correct users outside the main vector loop for reductions, and that is tail-folded loops with reductions where the final value is stored after the loop. This should be handled explicitly in #70253	2023-12-13 21:05:24 +00:00
Florian Hahn	8d893f28f2	[LV] Add test case for #75298 .	2023-12-13 20:59:28 +00:00
David Sherwood	ceb02379a9	[LoopVectorize] Improve algorithm for hoisting runtime checks (#73515 ) When attempting to hoist runtime checks out of a loop we currently avoid creating pointer diff checks and prefer to do expanded range checks instead. This gives us the opportunity to hoist runtime checks out of a loop, since these checks are loop invariant. However, in some cases the pointer diff checks would also be loop invariant and so will naturally get hoisted. Therefore, since diff checks are cheaper, we should prefer to use those instead.	2023-12-12 09:10:39 +00:00
Nilanjana Basu	41a3828838	[LV] Added pre-commit tests for changing loop interleaving count computation (#74689 ) Added more pre-commit tests for evaluating changes to loop interleaving count computation in https://github.com/llvm/llvm-project/pull/73766. The new set of tests addresses the change in IC computation to minimize the remainder TC of the vectorized loop while maximizing the IC when the remainder TC is the same.	2023-12-12 11:09:25 +05:30
Florian Hahn	19918ac34d	[VPlan] Mark Select VPInstructions as not having sideeffects. Select VPInstructions don't have sideeffects, mark them accordingly.	2023-12-11 12:26:32 +00:00
Florian Hahn	a5891fa4d2	[VPlan] Initial modeling of VF * UF as VPValue. (#74761 ) This patch starts initial modeling of VF * UF in VPlan. Initially, introduce a dedicated VFxUF VPValue, which is then populated during VPlan::prepareToExecute. Initially, the VF * UF applies only to the main vector loop region. Once we extend the scope of VPlan in the future, we may want to associate different VFxUFs with different vector loop regions (e.g. the epilogue vector loop) This allows explicitly parameterizing recipes that rely on the VF * UF, like the canonical induction increment. At the moment, this mainly helps to avoid generating some duplicated calls to vscale with scalable vectors. It should also allow using EVL as induction increments explicitly in D99750. Referring to VF * UF is also needed in other places that we plan to migrate to VPlan, like the minimum trip count check during skeleton creation. The first version creates the value for VF * UF directly in prepareToExecute to limit the scope of the patch. A follow-on patch will model VF * UF computation explicitly in VPlan using recipes. Moved from Phabricator (https://reviews.llvm.org/D157322)	2023-12-08 18:30:30 +00:00
Florian Hahn	5ea6a3fc6d	[VPlan] Compute scalable VF in preheader for induction increment. (#74762 ) UF * VF is loop invariant and can be computed directly in the preheader. This prepares the code for #74761 and reduces the test changes.	2023-12-08 12:18:31 +00:00
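
The interleave count selection described in commits `cd28da390f` and `c492eb6b28` can be illustrated with a small standalone sketch. This is not the code in LoopVectorize; the names `floorPowerOfTwo` and `pickInterleaveCount` are hypothetical, and the sketch assumes a compile-time-known trip count, a fixed VF, and power-of-two interleave counts capped by a target maximum `MaxIC`. It picks between the candidates derived from TC/VF and TC/(2*VF), prefers the smaller one unless both leave the same scalar remainder, and subtracts one iteration from the trip count when the scalar epilogue must run at least once.

```cpp
#include <algorithm>

// Largest power of two that is <= X. Assumes X >= 1.
unsigned floorPowerOfTwo(unsigned X) {
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical helper, for illustration only.
// TC: known trip count, VF: vectorization factor (>= 1),
// MaxIC: assumed target limit on the interleave count (>= 1).
unsigned pickInterleaveCount(unsigned TC, unsigned VF, unsigned MaxIC,
                             bool RequiresScalarEpilogue) {
  // If the epilogue must execute at least one scalar iteration, one fewer
  // iteration is available to the vector loop (the idea in c492eb6b28).
  unsigned AvailableTC = (RequiresScalarEpilogue && TC > 0) ? TC - 1 : TC;

  // Candidate bounds: TC/VF (upper) and TC/(2*VF) (lower), clamped to
  // [1, MaxIC] and rounded down to a power of two.
  unsigned UB = floorPowerOfTwo(std::max(1u, std::min(AvailableTC / VF, MaxIC)));
  unsigned LB =
      floorPowerOfTwo(std::max(1u, std::min(AvailableTC / (2 * VF), MaxIC)));
  if (UB == LB)
    return LB;

  // Iterations left for the scalar remainder under each candidate.
  unsigned TailUB = AvailableTC % (VF * UB);
  unsigned TailLB = AvailableTC % (VF * LB);

  // With equal remainders, take the larger count so the same work is done
  // in fewer vector iterations; otherwise keep the lower bound.
  return TailUB == TailLB ? UB : LB;
}
```

Under these assumptions, a loop with TC = 32, VF = 4 and MaxIC = 8 gets an interleave count of 8 when no scalar epilogue is required, but only 2 when one scalar iteration must be kept for the epilogue, because the larger count would then leave a longer scalar tail.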


2333 Commits