clang-p2996

Author	SHA1	Message	Date
Rin	d3e4702c0f	[AArch64] [LoopVectorize] Use either fixed-width or scalable VF when tail-folding (#67543) Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.	2023-10-05 10:24:30 +01:00
Arthur Eubanks	07389535a7	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit `b186f1f68b`. Causes crashes, see https://reviews.llvm.org/D158449.	2023-10-04 14:37:16 -07:00
Alex Richardson	e86d6a43f0	Regenerate test checks for tests affected by D141060	2023-10-04 10:51:35 -07:00
Alexey Bataev	b186f1f68b	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Add a NumSrcElts param to the is..Mask functions in the ShuffleVectorInstruction class for better mask analysis, since Mask.size() does not always match the size of the permuted vector(s). This allows better cost estimation in SLP and fixes uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-10-04 07:53:30 -07:00
Alexey Bataev	1129dec778	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit `6f43d28f34` to fix a crash reported in https://reviews.llvm.org/D158449.	2023-10-03 13:02:16 -07:00
Alexey Bataev	6f43d28f34	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Add a NumSrcElts param to the is..Mask functions in the ShuffleVectorInstruction class for better mask analysis, since Mask.size() does not always match the size of the permuted vector(s). This allows better cost estimation in SLP and fixes uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-10-03 10:26:11 -07:00
JolantaJensen	01797dad86	Fix mechanism propagating mangled names for TLI function mappings (#66656) Currently the mappings from TLI are used to generate the list of available "scalar to vector" mappings attached to scalar calls as the "vector-function-abi-variant" LLVM IR attribute. Function names from TLI are wrapped in a mangled name following the pattern: _ZGV<isa><mask><vlen><parameters>_<scalar_name>[(<vector_redirection>)] The problem is that the mangled name uses _LLVM_ as the ISA name, which prevents the compiler from computing the vectorization factor for scalable vectors, as it cannot make any decision based on the _LLVM_ ISA. If we use "s" as the ISA name, the compiler can make decisions based on the VFABI specification, where SVE-specific rules are described. This patch is only a refactoring stage where there is no change to the compiler's behaviour.	2023-10-02 18:58:39 +01:00
Alexey Bataev	ebcb5d59fc	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit `9f5960e004` to fix buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.	2023-09-29 15:03:46 -07:00
Alexey Bataev	9f5960e004	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Add a NumSrcElts param to the is..Mask functions in the ShuffleVectorInstruction class for better mask analysis, since Mask.size() does not always match the size of the permuted vector(s). This allows better cost estimation in SLP and fixes uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-09-29 13:16:03 -07:00
Mel Chen	ab9cd27fa4	[LV][NFC] Move and add truncated-related FindLastIV reduction test cases. (#67674)	2023-09-29 22:18:32 +08:00
Alexey Bataev	3204f88a8b	Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst." This reverts commit `c88c281cf1` to fix the crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.	2023-09-28 11:57:32 -07:00
Alexey Bataev	c88c281cf1	[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst. Add a NumSrcElts param to the is..Mask functions in the ShuffleVectorInstruction class for better mask analysis, since Mask.size() does not always match the size of the permuted vector(s). This allows better cost estimation in SLP and fixes uses of the functions in other cases. Differential Revision: https://reviews.llvm.org/D158449	2023-09-28 11:03:21 -07:00
Mel Chen	707686b0fc	[LV][NFC] Remove unnecessary parameter attributes from the test cases. (#67630) The vectorization of the FindLastIV reduction does not depend on the nocapture and readonly attributes.	2023-09-28 21:15:34 +08:00
Ramkumar Ramachandra	ad415e3095	LoopVectorize/iv-select-cmp: comment out-of-bound tests (NFC) To help future contributors understand a couple of mysterious out-of-bound tests, add a brief comment to each.	2023-09-25 14:02:19 +01:00
Florian Hahn	97687b7aea	[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation. This patch updates the mask creation code to always create compares of the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front when tail folding and introduces active-lane-mask as a later transformation. This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count) the canonical form for tail-folding early on. Introducing more specific active-lane-mask recipes is treated as a VPlan-to-VPlan optimization. This has the advantage of keeping the logic (and complexity) of introducing active-lane-mask recipes in a single place, instead of spreading the logic out across multiple functions. It also simplifies initial VPlan construction and enables treating the introduction of EVL as a similar optimization. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158779	2023-09-25 13:34:45 +01:00
Ramkumar Ramachandra	ef48e90489	LoopVectorize/iv-select-cmp: add test for decreasing IV out-of-bound The most straightforward extension to D150851 would involve handling the decreasing IV case, for which tests have been added in `110ec1863a` (LoopVectorize/iv-select-cmp: add test for decreasing IV, const start). However, the commit missed a testcase for the out-of-bound sentinel value LONG_MAX, which should not be vectorized. Fix this by adding a test corresponding to the following program: long test(long *a) { long rdx = 331; for (long i = LONG_MAX; i >= 0; i--) { if (a[i] > 3) rdx = i; } return rdx; } Differential Revision: https://reviews.llvm.org/D157969	2023-09-25 13:20:11 +01:00
Sergey Kachkov	0a5d52a757	[RISCV][CostModel] Add getCFInstrCost RISC-V implementation (#65599) This patch implements the getCFInstrCost TTI hook, which mostly affects LoopVectorizer decisions. It sets zero cost for PHI nodes and zero throughput cost for branches (assuming that branches are likely to be predicted). The implementation is similar to the X86/AArch64/PowerPC targets and reduces loop cost by excluding induction PHIs and loop latch branches, which in turn leads to selecting a smaller vectorization factor.	2023-09-25 12:26:01 +03:00
Florian Hahn	1a9358c090	[LV] Relax over-strict assertion for reduction exit value selects. After `f108c6c`, (mul x, 1) is simplified to x, which can cause the select for the final reduction value when tail-folding to use the reduction value for both options. Relax the assertion to make sure this case is allowed. Note that the reduction is now redundant itself and could be further simplified. Fixes #66895.	2023-09-21 10:12:29 +01:00
Dhruv Chawla	3e992d81af	[InferAlignment] Enable InferAlignment pass by default This gives an improvement of 0.6%: https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u Differential Revision: https://reviews.llvm.org/D158600	2023-09-20 12:08:52 +05:30
Nikita Popov	c41b4b6397	[InstCombine] Make flag drop during select equiv fold more generic Instead of unsetting flags on the instruction, attempting the fold, and then resetting the flags if it failed, add support to simplifyWithOpReplaced() to ignore poison-generating flags/metadata and collect all instructions where they may need to be dropped. This allows us to perform the fold a) with poison-generating metadata, which was previously not handled, and b) with poison-generating flags/metadata that are not on the root instruction. Proof for the ctpop case: https://alive2.llvm.org/ce/z/3H3HFs Fixes https://github.com/llvm/llvm-project/issues/62450.	2023-09-19 14:54:25 +02:00
Florian Hahn	f108c6cdc1	[VPlan] Fold (MUL A, 1) -> A as VPlan2VPlan transform. Add first VPlan-based recipe simplification to fold (MUL A, 1) -> A. Among other things, this enables additional simplifications after applying versioned strides, as follow up to D147783. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D159200	2023-09-18 21:45:34 +01:00
Yingwei Zheng	44e5afdb91	[InstCombine] Generalize foldICmpWithMinMax This patch generalizes the fold of `icmp pred min/max(X, Y), Z` to address the issue https://github.com/llvm/llvm-project/issues/62898. For example, we can fold `smin(X, Y) < Z` into `X < Z` when `Y > Z` is implied by constant folds/invariants/dom conditions. Alive2 (with `--disable-undef-input` due to the limitation of --smt-to=10000): https://alive2.llvm.org/ce/z/rB7qLc You can run the standalone translation validation tool `alive-tv` locally to verify these transformations. ``` alive-tv transforms.ll --smt-to=600000 --exit-on-error ``` Reviewed By: goldstein.w.n Differential Revision: https://reviews.llvm.org/D156238	2023-09-11 02:26:48 +08:00
Florian Hahn	3fa1b254b7	[VPlan] Print blend recipe as operand directly, instead of IR PHI. Update VPBlendRecipe::print() to print the result directly, instead of relying on the stored Phi pointer. This brings the recipe in line with how other recipes are printed.	2023-09-04 12:35:58 +01:00
Florian Hahn	cb54522853	[LV] Add test coverage for adding DebugLoc to vector select. Add missing test coverage for selects with !dbg info.	2023-09-04 12:01:14 +01:00
Nuno Lopes	66a652ab08	recommit test for #65212	2023-09-04 09:17:18 +01:00
Muhammad Omair Javaid	42a46730bb	Revert "fix test for #65212" This reverts commit `a0b0d7493d`. It has broken the following buildbots: https://lab.llvm.org/buildbot/#/builders/188/builds/34873 https://lab.llvm.org/buildbot/#/builders/245/builds/13538 https://lab.llvm.org/buildbot/#/builders/65/builds/11074	2023-09-04 12:53:12 +05:00
Nuno Lopes	a0b0d7493d	fix test for #65212 I committed the wrong test, sorry.	2023-09-03 17:01:36 +01:00
Nuno Lopes	5a3fd5f3f5	[LoopVectorizer] Fix PR #65212: vectorization of reduction loop wasn't respecting original store alignment	2023-09-03 16:35:05 +01:00
Nuno Lopes	335a9bc4d9	precommit test for #65212	2023-09-03 16:33:57 +01:00
Florian Hahn	fd66195777	[VPlan] Manage compare predicates in VPRecipeWithIRFlags. Extend VPRecipeWithIRFlags to also manage predicates for compares. This allows removing the custom ICmpULE opcode from VPInstruction, which was a workaround for missing proper predicate handling. This simplifies the code a bit while also allowing compares with any predicates. It also fixes a case where the compare predicate wasn't printed properly for VPReplicateRecipes. Discussed/split off from D150398. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158992	2023-09-02 21:45:24 +01:00
Igor Kirillov	ac65fb8699	[LoopVectorize] Fix incorrect order of invariant stores when there are multiple reductions. When a loop has multiple reductions, each with an intermediate invariant store, the order in which those reductions are processed is not considered. This can result in the invariant stores outside the loop not preserving the original order. This patch sorts VPReductionPHIRecipes by the order in which they have stores in the original loop before running `InnerLoopVectorizer::fixReduction` function, and it helps to maintain the correct order of stores. Fixes https://github.com/llvm/llvm-project/issues/64047 Differential Revision: https://reviews.llvm.org/D157631	2023-08-31 16:21:44 +00:00
Igor Kirillov	2df9ed11c5	[LoopVectorize] Pre-commit tests for D157631 Differential Revision: https://reviews.llvm.org/D157630	2023-08-31 09:50:53 +00:00
Dhruv Chawla	4ea8212775	[NFC][LoopVectorize] Regenerate test checks	2023-08-30 23:22:57 +05:30
Ramkumar Ramachandra	04b1276ad3	LoopVectorize/iv-select-cmp: add tests for truncated IV The current tests in iv-select-cmp.ll are not representative of clang output of common real-world C programs, which are often written with i32 induction vars, as opposed to i64 induction vars. Hence, add five tests corresponding to the following programs: int test(int *a, int n) { int rdx = 331; for (int i = 0; i < n; i++) { if (a[i] > 3) rdx = i; } return rdx; } int test(int *a) { int rdx = 331; for (int i = 0; i < 20000; i++) { if (a[i] > 3) rdx = i; } return rdx; } int test(int *a, long n) { int rdx = 331; for (int i = 0; i < n; i++) { if (a[i] > 3) rdx = i; } return rdx; } int test(int *a, unsigned n) { int rdx = 331; for (int i = 0; i < n; i++) { if (a[i] > 3) rdx = i; } return rdx; } int test(int *a) { int rdx = 331; for (long i = INT_MIN - 1; i < UINT_MAX; i++) { if (a[i] > 3) rdx = i; } return rdx; } The first two can theoretically be vectorized without a runtime-check, while the third and fourth cannot. The fifth cannot be vectorized, even with a runtime-check. This issue was found while reviewing D150851. Differential Revision: https://reviews.llvm.org/D156124	2023-08-30 13:09:37 +01:00
Florian Hahn	96e83d3705	[LV] Use IRBuilder to create and optimize middle-block compare. Split off from D150398 to avoid builder-related diff changes there. Using IRBuilder to create ICmps simplifies the result if both operands are constants. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D158332	2023-08-29 11:42:18 +01:00
David Sherwood	c02184f286	[LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop Suppose we have a nested loop like this: void foo(int32_t *dst, int32_t *src, int m, int n) { for (int i = 0; i < m; i++) { for (int j = 0; j < n; j++) { dst[(i * n) + j] += src[(i * n) + j]; } } } We currently generate runtime memory checks as a precondition for entering the vectorised version of the inner loop. However, if the runtime-determined trip count for the inner loop is quite small, then these checks become quite expensive. This patch attempts to mitigate these costs by adding a new option to expand the memory ranges being checked to include the outer loop as well. This leads to runtime checks that can then be hoisted above the outer loop. For example, rather than looking for a conflict between the memory ranges: 1. &dst[(i * n)] -> &dst[(i * n) + n] 2. &src[(i * n)] -> &src[(i * n) + n] we can instead look at the expanded ranges: 1. &dst[0] -> &dst[((m - 1) * n) + n] 2. &src[0] -> &src[((m - 1) * n) + n] which are outer-loop-invariant. As with many optimisations there is a trade-off here, because there is a danger that using the expanded ranges we may never enter the vectorised inner loop, whereas with the smaller ranges we might enter at least once. I have added a HoistRuntimeChecks option that is turned off by default, but can be enabled for workloads where we know this is guaranteed to be of real benefit. In future, we can also use PGO to determine if this is worthwhile by using the inner loop trip count information. When enabling this option for SPEC2017 on neoverse-v1 with the flags "-Ofast -mcpu=native -flto" I see an overall geomean improvement of ~0.5%: SPEC2017 results (+ is an improvement, - is a regression): 520.omnetpp: +2% 525.x264: +2% 557.xz: +1.2% ... GEOMEAN: +0.5% I didn't investigate all the differences to see if they are genuine or noise, but I know the x264 improvement is real because it has some hot nested loops with low trip counts where I can see this hoisting is beneficial. Tests have been added here: Transforms/LoopVectorize/runtime-checks-hoist.ll Differential Revision: https://reviews.llvm.org/D152366	2023-08-24 12:14:02 +00:00
David Sherwood	494d28ec07	[LoopVectorize] Add pre-commit tests for D152366 Differential Revision: https://reviews.llvm.org/D154075	2023-08-24 10:52:18 +00:00
Florian Hahn	c071dba1a3	[LV] update hexagon test to use load results. The current version of the test doesn't use any of the loads, so they can be removed together with the mask of the interleave group. Use some loaded values and store them, to prevent the mask from being optimized away.	2023-08-22 20:20:58 +01:00
Florian Hahn	34d25924c4	[VPlan] Mark some VPInstruction opcodes as not having side effects. Mark some VPInstruction opcodes as not having side effects, preparation for D157037.	2023-08-22 20:05:57 +01:00
Kolya Panchenko	acbe886880	[LV] Vectorization remark for outerloop Reviewed By: fhahn, ABataev Differential Revision: https://reviews.llvm.org/D150696	2023-08-21 13:05:06 -04:00
Florian Hahn	686aef8401	[LV] Remove compares and branches on undef from a few tests.	2023-08-18 16:28:42 +01:00
Roland Froese	4d425f8663	[PowerPC] vector cost model add cost to extract i1 Try to avoid some unprofitable predication on PPC. Recognize in the cost model that computing on i1 values will require extra mask or compare operation. Differential Revision: https://reviews.llvm.org/D155876	2023-08-14 17:04:11 -04:00
Kerry McLaughlin	5d814b3848	Revert "[AArch64][SVE2] Change the cost of extends with S/URHADD to 0" This reverts commit `dda2cd2505`.	2023-08-14 10:44:13 +00:00
Kerry McLaughlin	dda2cd2505	[AArch64][SVE2] Change the cost of extends with S/URHADD to 0 When SVE2 is enabled, we can combine an add of 1, add & shift right by 1 to a single s/urhadd instruction. If the operands to the adds are extended, these extends will fold into the s/urhadd and their costs should be 0. Reviewed By: dtemirbulatov Differential Revision: https://reviews.llvm.org/D157628	2023-08-14 10:32:06 +00:00
Anna Thomas	5dfdf34df0	[LV] Move interleaved test to X86 directory Remove the x86-registered-target under REQUIRES.	2023-08-09 16:03:33 -04:00
David Spickett	c09bdfe6f7	[LV] Require x86 target for interleaved access test This is failing on every Linaro bot that only builds the Arm or AArch64 targets, adding X86, it passes.	2023-08-09 09:02:02 +00:00
Anna Thomas	cb7d28ef52	Fix BB failure for check lines Fix clang build bots which complain of missing check lines for Loop access analysis by generating two run lines (original commit: `3cf24dbb`).	2023-08-08 20:28:33 -04:00
Anna Thomas	3cf24dbbdd	[LV] Complete load groups and release store groups. Try 2. This is a complete fix for CompleteLoadGroups introduced in D154309. We need to check for dependency between A and every member of the load Group of B. This patch also fixes another miscompile seen when we incorrectly sink stores below a depending load (see testcase in interleaved-accesses-sink-store-across-load.ll). This is fixed by releasing store groups correctly. This change was previously reverted (`e85fd3cbdd`) due to Asan failure with use-after-free error. A testcase is added and the bug is fixed in this version of the patch. Differential Revision: https://reviews.llvm.org/D155520	2023-08-08 18:10:23 -04:00
Florian Hahn	af635a5547	[VPlan] Model wrap flags directly, remove NUW opcodes (NFC) Model wrap flags directly using VPRecipeWithIRFlags and clean up the duplicated NUW opcodes. D157144 will build on this and also model FMFs for VPInstruction. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D157194	2023-08-08 12:12:30 +01:00
Florian Hahn	93c5bae00e	[VPlan] Use printOperands for VPInstruction. Use the printOperands for printing VPInstruction's operands to be more in line with other recipes and ensure consistent printing after D15719. Also removes some stray spaces in print output.	2023-08-08 11:31:21 +01:00
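
The runtime-check hoisting described in the `c02184f286` entry above can be illustrated at source level. The following is only a hand-written sketch of the idea, not what the vectoriser emits: `ranges_disjoint` and `foo_hoisted` are hypothetical names introduced here, and both branches contain the same scalar loop, standing in for the vectorised and fallback versions. The overlap test uses the expanded ranges `&dst[0] -> &dst[m * n]` and `&src[0] -> &src[m * n]`, so it is evaluated once, before the outer loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [a_lo, a_hi) and [b_lo, b_hi) do not overlap. */
static bool ranges_disjoint(const int32_t *a_lo, const int32_t *a_hi,
                            const int32_t *b_lo, const int32_t *b_hi) {
    return (uintptr_t)a_hi <= (uintptr_t)b_lo ||
           (uintptr_t)b_hi <= (uintptr_t)a_lo;
}

/* Hypothetical source-level picture of the loop nest with the memory
 * check hoisted above the outer loop. */
static void foo_hoisted(int32_t *dst, const int32_t *src, int m, int n) {
    assert(dst != NULL && src != NULL);
    if (m <= 0 || n <= 0)
        return;

    /* Expanded, outer-loop-invariant ranges: m * n elements each. */
    size_t total = (size_t)m * (size_t)n;
    bool no_alias = ranges_disjoint(dst, dst + total, src, src + total);

    for (int i = 0; i < m; i++) {
        if (no_alias) {
            /* Stand-in for the vectorised inner loop: safe because the
             * single hoisted check proved dst and src never overlap. */
            for (int j = 0; j < n; j++)
                dst[(i * n) + j] += src[(i * n) + j];
        } else {
            /* Stand-in for the scalar fallback, preserving the original
             * element-by-element order. */
            for (int j = 0; j < n; j++)
                dst[(i * n) + j] += src[(i * n) + j];
        }
    }
}
```

The trade-off mentioned in the commit message is visible here: if the expanded ranges overlap anywhere, every inner-loop iteration takes the fallback path, even for rows whose own ranges would have passed a per-row check.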

2299 Commits