clang-p2996

Author	SHA1	Message	Date
Simon Pilgrim	4517488eb7	[LoopVectorize] Regenerate reduction-predselect.ll test checks	2022-02-10 12:03:10 +00:00
David Green	b55d4c2ad8	Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`" This reverts commit `77a0da926c` as we've received multiple reports of this significantly impacting performance, in ways that don't seem to just be target specific cost models going wrong. I would offer some reproducers, but the test changes here seem to be full of them! Reverting for now and hopefully we can remove the "hack" more carefully as we go.	2022-02-09 20:02:54 +00:00
David Green	b4c6d1bb37	[LoopVectorizer] Don't perform interleaving of predicated scalar loops The vectorizer will choose at times to "vectorize" loops with a scalar factor (VF=1) with interleaving (IC > 1). This can occasionally produce better code than the unroller (notably for reductions where it can produce independent reduction chains that are combined after the loop). At times this is not very beneficial though, for example when runtime checks are needed or when the scalar code requires predication. This addresses the second point, preventing the vectorizer from interleaving when the scalar loop will require predication. This prevents it from making a bit of a mess that is worse than the original and better left for the unroller to unroll if beneficial. It helps reverse some of the regressions from D118090. Differential Revision: https://reviews.llvm.org/D118566	2022-02-07 19:34:28 +00:00
Florian Hahn	1049735d07	[LV] Adjust accesses in test to ensure full RT checks are generated. Add an additional access so the full runtime checks are still generated, even after D119078.	2022-02-07 18:07:19 +00:00
Roman Lebedev	77a0da926c	[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model. What it essentially does is prevent scalarized vectorization of masked memory operations: ``` // TODO: Cost model for emulated masked load/store is completely // broken. This hack guides the cost model to use an artificially // high enough value to practically disable vectorization with such // operations, except where previously deployed legality hack allowed // using very low cost values. This is to avoid regressions coming simply // from moving "masked load/store" check from legality to cost model. // Masked Load/Gather emulation was previously never allowed. // Limited number of Masked Store/Scatter emulation was allowed. ``` While I don't really understand what specifically `is completely broken` was referring to, I believe that at least on X86 with AVX2-or-later, this is no longer true (or at least, I would like to know what is still broken). So I would like to follow suit after D111460, and likewise disable that hack for AVX2+. But since this was added for X86 specifically, let's just instead completely remove this hack. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114779	2022-02-07 16:08:31 +03:00
Florian Hahn	ef4df27940	[LV] Modernize some runtime check tests a bit. Update tests to check runtime checks a bit more precisely.	2022-02-07 12:08:56 +00:00
Sander de Smalen	eaee477eda	[LV] Use VScaleForTuning to allow wider epilogue VFs. When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot be any smaller, the vectorizer should try to estimate how many lanes are executed at runtime and allow a suitable fixed-width VF to be chosen. It can use VScaleForTuning to figure out what a suitable fixed-width VF could be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8, it could still choose an epilogue VF up to VF=4. This was a bit tricky to test, so this patch also introduces a wrapper function to get 'VScaleForTuning' by also considering vscale_range. If min and max are equal, then that will be the vscale we compile for. It makes little sense to tune for a different width if the code will not be portable for other widths. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118709	2022-02-03 15:40:17 +00:00
Malhar Jajoo	778b455dd6	[LAA] Add Memory dependence remarks. Adds new optimization remarks when vectorization fails. More specifically, new remarks are added for the following 4 cases: - Backward dependency - Backward dependency that prevents Store-to-load forwarding - Forward dependency that prevents Store-to-load forwarding - Unknown dependency It is important to note that only one of the sources of failures (to vectorize) is reported by the remarks. This source of failure may not be first in program order. A regression test has been added to test the following cases: a) Loop can be vectorized: No optimization remark is emitted b) Loop can not be vectorized: In this case an optimization remark will be emitted for one source of failure. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D108371	2022-02-02 12:07:51 +00:00
Sander de Smalen	2a44eaf20f	[LV] Allow a scalable VF for the epilogue. For some reason we limited the epilogue VF to be fixed-width, but there is not necessarily a reason for doing so. If the main VF=vscale x 16, the epilogue VF could be either fixed-width, or a scalable VF up to vscale x 8. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118688	2022-02-01 22:38:55 +00:00
David Green	aaa16eb023	[LV][AArch64] Add test for scalar interleaving with predication. NFC	2022-02-01 09:21:49 +00:00
Florian Hahn	02ee3fbff8	[LV] Add additional complex first order recurrence test. Add a new test case with 2 first-order recurrences, which share a user.	2022-01-31 19:54:14 +00:00
Florian Hahn	8f12175fed	[VPlan] Use VPlan to check if only the first lane is used. This removes the remaining dependence on LoopVectorizationCostModel from buildScalarSteps and is required so it can be moved out of ILV. It also allows us to remove a few unneeded instructions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116554	2022-01-30 13:07:29 +00:00
Florian Hahn	efd4938723	[VPlan] Handle IV vector splat using VPWidenCanonicalIV. This patch tries to use an existing VPWidenCanonicalIVRecipe instead of creating another step-vector for canonical induction recipes in widenIntOrFpInduction. This has the following benefits: 1. First step to avoid setting both vector and scalar values for the same induction def. 2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan 3. Only need to splat the vector IV for block in masks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116123	2022-01-29 16:25:27 +00:00
Malhar Jajoo	b75bdff4a0	Trivial update for debug location in LIT test. This just updates debug location of a loop in a LIT test to point to the correct source line.	2022-01-27 19:07:47 +00:00
Congzhe Cao	f3e1f44340	[IVDescriptor] Get the exact FP instruction that does not allow reordering This is a bugfix in IVDescriptor.cpp. The helper function `RecurrenceDescriptor::getExactFPMathInst()` is supposed to return the 1st FP instruction that does not allow reordering. However, when constructing the RecurrenceDescriptor, we trace the use-def chain starting from a PHI node and for each instruction in the use-def chain, its descriptor overrides the previous one. Therefore in the final RecurrenceDescriptor we constructed, we lose previous FP instructions that do not allow reordering. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D118073	2022-01-27 00:33:46 -05:00
Igor Kirillov	d3932c690d	[LoopVectorize] Add tests with reductions that are stored in invariant address This patch adds tests for functionality that is to be implemented in D110235. Differential Revision: https://reviews.llvm.org/D117213	2022-01-24 21:26:38 +00:00
Florian Hahn	b2a8eff45c	[LV] Make some tests more robust by adding missing users.	2022-01-24 13:04:09 +00:00
Florian Hahn	b7f69b8d46	[LV] Name values and blocks in same induction tests (NFC). This reduces the churn in the test in future updates due to numbering changes.	2022-01-24 12:28:43 +00:00
Kerry McLaughlin	8082ab2fc3	[LoopVectorize] Support epilogue vectorisation of loops with reductions isCandidateForEpilogueVectorization will currently return false for loops which contain reductions. This patch removes this restriction and makes the following changes to support epilogue vectorisation with reductions: - `fixReduction`: If fixReduction is being called during vectorisation of the epilogue, the phi node it creates will need to additionally carry incoming values from the middle block of the main loop. - `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi created by fixReduction are updated after the vec.epilog.iter.check block is added. The phi is also moved to the preheader of the epilogue. - `processLoop`: The start value of any VPReductionPHIRecipe is updated before vectorising the epilogue loop. The getResumeInstr function added to the ILV will return the resume instruction associated with the recurrence descriptor. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D116928	2022-01-24 12:03:31 +00:00
eopXD	3cf15af2da	[RISCV] Remove experimental prefix from rvv-related extensions. Extensions affected: +v, +zve, +zvl Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117860	2022-01-22 20:18:40 -08:00
Kerry McLaughlin	c740a07863	[LoopVectorize] Test in-loop reductions with tail folding for scalable vectors Adds `-prefer-inloop-reductions` to the RUN line of sve-tail-folding.ll & adds a new test where in-loop reductions cannot be used (`@cond_xor_reduction`). NFC. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117578	2022-01-19 14:36:23 +00:00
David Sherwood	e781620dee	[LoopVectorize][AArch64] Use get.active.lane.mask intrinsic when SVE is enabled When SVE is enabled for AArch64 targets it makes more sense to use the get.active.lane.mask intrinsic, because SVE has an exact 1-1 mapping from the intrinsic to the 'whilelo' instruction for legal vector types. This instruction neatly takes overflow into account as well. This patch fixes an issue in VPInstruction::generateInstruction that assumed we are only dealing with fixed-width vectors. Differential Revision: https://reviews.llvm.org/D117109	2022-01-18 11:59:30 +00:00
Florian Hahn	524150fe07	[LV] Add test coverage for reductions with odd interleave counts. Add test coverage for loops with reductions and odd (3, 5) interleave counts.	2022-01-17 14:34:21 +00:00
Florian Hahn	4a6f475446	[LV] Make test more robust by adding users of inductions. The modified tests didn't have actual users of all inductions, making it trivial to eliminate them. Add users to make sure the inductions are actually used in the vectorized version.	2022-01-17 13:28:59 +00:00
Kito Cheng	cc35161dc7	[RISCV] Add initial support for getRegUsageForType and getNumberOfRegisters Those two TTI hooks are used during vectorization for calculating register pressure. The default implementation doesn't account for LMUL, and it also returns a definitely wrong value for the register count (all register classes have 8 registers). So in this patch we tried to: 1. Calculate the right register usage for vector and scalar types. 2. Return the right number of registers for general purpose registers and vector registers. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D116890	2022-01-17 15:27:54 +08:00
Florian Hahn	070d1034da	[LV] Restore metadata to disable runtime unrolling for epilogue loop. After `d4a8fc3a87` LV stopped adding metadata to disable runtime unrolling to the vectorized epilogue loop. This was missed because `278aa65cc4` removed the relevant test coverage. This patch fixes that by adding the relevant metadata after vector loop generation.	2022-01-16 13:14:16 +00:00
Florian Hahn	ba3198cfd1	[IRBuilder] Migrate select-folding to value-based FoldSelect. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D117228	2022-01-15 11:26:44 +00:00
Florian Hahn	42b34facfd	Recommit "[LV] Inline CreateSplatIV call for scalar VFs." This reverts the revert commit `073c27b5e5`. A reduced test case has been added in `5e4966cbae` and the code has been updated to handle the case where getInductionOpcode returns BinaryOpsEnd. In this case, the original code was always using Instruction::Add. Do the same in the patch. Note this commit may slightly change the value naming, because it now also assigns the 'induction' name in the floating point case.	2022-01-14 19:03:49 +00:00
Florian Hahn	5e4966cbae	[LV] Add test with an integer induction based on a ptr one. Reduced test case from the reproducer mentioned in `073c27b5e5`.	2022-01-14 15:56:47 +00:00
James Y Knight	073c27b5e5	Revert "[LV] Inline CreateSplatIV call for scalar VFs (NFC)." Causes a crash with the following (creduce'd) test-case: clang -O3 '--target=aarch64-grtev4-linux-gnu' -xc - -c -o /dev/null <<EOF int e; int f; int g() { int h; int j = 0; while (&f - j > 0) { int k; k = j; if (e == j && *e) k = 5; h = k; j++; } return h; } EOF This reverts commit `7ce48be0fd`.	2022-01-14 00:00:02 +00:00
Florian Hahn	7b9f5cbfa7	[LV] Extend check lines for pr34681.ll to cover foldable select.	2022-01-13 16:42:47 +00:00
Florian Hahn	3f2fb767e3	[VPlan] Make IV operand explicit for VPWidenCanonicalIVRecipe (NFC). This makes the def-use relationship between VPCanonicalIVPHIRecipe and VPWidenCanonicalIVRecipe explicit. Needed for D117140.	2022-01-13 11:13:05 +00:00
Florian Hahn	7ce48be0fd	[LV] Inline CreateSplatIV call for scalar VFs (NFC). This is a NFC change split off from D116123, as suggested there. D116123 will remove the last user of CreateSplatIV.	2022-01-13 09:34:31 +00:00
Florian Hahn	d4a8fc3a87	[VPlan] Introduce and use BranchOnCount VPInstruction. This patch adds a new BranchOnCount VPInstruction opcode with 2 operands. It first compares its 2 operands (increment of canonical induction and vector trip count), followed by a branch to either the exit block or back to the vector header. It must be the last recipe in the exit block of the topmost vector loop region. This extracts parts from D113224 and was discussed in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116479	2022-01-12 13:42:13 +00:00
Rosie Sumpter	552eb372cb	[LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter This is required to query the legality more precisely in the LoopVectorizer. This adds another TTI function named 'forceScalarizeMaskedGather/Scatter' function to work around the hack introduced for MVE, where isLegalMaskedGather/Scatter would return an answer by second-guessing where the function was called from, based on the Type passed in (vector vs scalar). The new interface makes this explicit. It is also used by X86 to check for vector widths where gather/scatters aren't profitable (or don't exist) for certain subtargets. Differential Revision: https://reviews.llvm.org/D115329	2022-01-12 13:34:12 +00:00
Florian Hahn	138fcc5f76	[IRBuilder] Migrate icmp-folding to value-based FoldICmp. Depends on D116935. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D116969	2022-01-12 12:37:46 +00:00
Florian Hahn	7e68061305	[IRBuilder] Migrate add-folding to value-based FoldAdd. Depends on D116935. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D116968	2022-01-12 09:24:46 +00:00
Florian Hahn	f0ef1ea6dd	[IRBuilder] Introduce folder using inst-simplify, use for Or fold. Alternative to D116817. This introduces a new value-based folding interface for Or (FoldOr), which takes 2 values and returns an existing Value or a constant if the Or can be simplified. Otherwise nullptr is returned. This replaces the more restrictive CreateOr which takes 2 constants. This is then used to implement a folder that uses InstructionSimplify. The logic to simplify `Or` instructions is moved there. Subsequent patches are going to transition other CreateXXX to the more general FoldXXX interface. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D116935	2022-01-11 17:30:48 +00:00
David Sherwood	b0922a9dcd	[LoopVectorize] Make VPWidenCanonicalIVRecipe::execute work for scalable vectors The code in VPWidenCanonicalIVRecipe::execute only worked for fixed-width vectors due to the way we generate the values per lane. This patch changes the code to use a combination of vector splats and step vectors to get the same result. This then works for both fixed-width and scalable vectors. Tests that exercise this code path for scalable vectors have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding.ll Differential Revision: https://reviews.llvm.org/D113180	2022-01-10 14:12:32 +00:00
Florian Hahn	aecad5828e	[SCEVExpander] Only create trunc when needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to create TruncTripCount if it is actually used. Sink the TruncTripCount creation into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-10 11:31:27 +00:00
David Sherwood	e3c84fb948	[LoopVectorize] Add support for tail folding using scalable vectors This patch fixes up an issue with InnerLoopVectorizer::getOrCreateVectorTripCount whereby we weren't correctly generating the runtime trip count for scalable vectors when tail-folding. It also removes some asserts in the tail-folding path for cases when the VF is not scalable. In this patch I have only permitted tail-folding to be enabled explicitly for scalable vectors when the user has specified one of the following flags: -prefer-predicate-over-epilogue=predicate-dont-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue For now it's best not to enable tail-folding with scalable vectors for low trip counts or when optimising for code size, since there has been no analysis on whether this is worth it. Various tests have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding.ll Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll The tests cannot be target independent because they require masked load/store support, i.e. TTI.isLegalMaskedLoad and TTI.isLegalMaskedStore need to return true. Differential Revision: https://reviews.llvm.org/D113003	2022-01-10 10:55:40 +00:00
Florian Hahn	7f1bf68d7d	[SCEVExpander] Only check overflow if it is needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to check for overflows if the result of the multiplication is actually used. Sink the Or for the overflow check into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-09 12:55:41 +00:00
Florian Hahn	3b7b1a75b0	[LV] Improve check lines in existing tests. Update the check lines in 2 existing tests to use patterns + variables to match some IR to make them independent of value naming.	2022-01-08 20:46:31 +00:00
Florian Hahn	daa5e26312	[LV] Make tests more robust by removing undef. Replace some uses of undef in the tests. The undef causes runtime checks to be trivially foldable/removable, which defeats the purpose of the tests.	2022-01-08 15:21:57 +00:00
Florian Hahn	9345ab3a45	[SCEVExpander] Skip creating <u 0 check, which is always false. Unsigned compares of the form <u 0 are always false. Do not create such a redundant check in generateOverflowCheck. The patch introduces a new lambda to create the check, so we can exit early conveniently and skip creating some instructions feeding the check. I am planning to sink a few additional instructions as follow-ups, but I would prefer to do this separately, to keep the changes and diff smaller. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D116811	2022-01-08 10:31:04 +00:00
Craig Topper	042394b69e	[RISCV] Add a command line option to control the LMUL used by TTI's getRegisterBitWidth. By default we return the width of an LMUL=1 register. We can enable testing with larger LMUL values by returning a larger bit width. This patch adds a RISCV specific option to provide a LMUL which will be multiplied by the LMUL=1 bit width. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D116339	2022-01-07 20:02:10 -08:00
David Green	bc615e436c	[AArch64] Update addo and subo costs Similar to D116732, this adds basic scalar sadd_with_overflow, uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for aarch64, which are usually quite efficiently lowered. Differential Revision: https://reviews.llvm.org/D116734	2022-01-07 16:20:23 +00:00
Florian Hahn	f395a4f8d5	[SCEVExpand] Only create required predicate checks. Currently generateOverflowCheck always creates code for Step being negative and positive, followed by a select at the end depending on Step's sign. This patch updates the code to only create either the checks for step being positive or negative, if the sign is known. Follow-up to D116696. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D116747	2022-01-07 14:49:02 +00:00
Florian Hahn	86d113a8b8	[SCEVExpand] Do not create redundant 'or false' for pred expansion. This patch updates SCEVExpander::expandUnionPredicate to not create redundant 'or false, x' instructions. While those are trivially foldable, they can be easily avoided and hinder code that checks the size/cost of the generated checks before further folds. I am planning on looking into a few other similar improvements to code generated by SCEVExpander. I remember a while ago @lebedev.ri working on doing some trivial folds like that in IRBuilder itself, but there were concerns that such changes may subtly break existing code. Reviewed By: reames, lebedev.ri Differential Revision: https://reviews.llvm.org/D116696	2022-01-06 11:52:19 +00:00
Sander de Smalen	95a93722db	[LV] Remove what seems like stale code in collectElementTypesForWidening. This was originally added in rG22174f5d5af1eb15b376c6d49e7925cbb7cca6be although that patch doesn't really mention any reasons for ignoring the pointer type in this calculation if the memory access isn't consecutive. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D115356	2022-01-05 12:20:59 +00:00


1929 Commits