clang-p2996

Author	SHA1	Message	Date
Graham Hunter	dee810e117	[NFC][LAA] Precommit tests for forked pointers Precommit for https://reviews.llvm.org/D108699	2021-11-24 16:20:35 +00:00
Florian Hahn	a7648eb2aa	[LV] Use patterns in some induction tests, to make more robust. (NFC)	2021-11-24 13:32:24 +00:00
Rosie Sumpter	df32a39dd0	[LoopVectorize][CostModel] Update cost model for fmuladd intrinsic This patch updates the cost model for ordered reductions so that a call to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction plus the cost of an ordered fadd reduction. Differential Revision: https://reviews.llvm.org/D111630	2021-11-24 08:50:05 +00:00
Rosie Sumpter	2d33327f9d	[LoopVectorize] Print fast-math flags for VPReductionRecipe	2021-11-24 08:50:05 +00:00
Rosie Sumpter	991074012a	[LoopVectorize] Propagate fast-math flags for VPInstruction In-loop vector reductions which use the llvm.fmuladd intrinsic involve the creation of two recipes; a VPReductionRecipe for the fadd and a VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags these should be propagated through to the fmul instruction, so an interface setFastMathFlags has been added to the VPInstruction class to enable this. Differential Revision: https://reviews.llvm.org/D113125	2021-11-24 08:50:04 +00:00
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Huihui Zhang	9cd7c534e2	[InstCombine] Enable fold select into operand for FAdd, FMul, FSub and FDiv. For FAdd, FMul, FSub and FDiv, fold select into one of the operands to enable further optimizations, i.e., floating-point reduction detection. Turn code: %C = fadd %A, %B %D = select %cond, %C, %A into: %C = select %cond, %B, -0.000000e+00 %D = fadd %A, %C Alive2 verification (with --disable-undef-input), timed out otherwise. FAdd - https://alive2.llvm.org/ce/z/eUxN4Y FMul - https://alive2.llvm.org/ce/z/5SWZz4 FSub - https://alive2.llvm.org/ce/z/Dhj8dU FDiv - https://alive2.llvm.org/ce/z/Yj_NA2 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D113442	2021-11-22 15:10:10 -08:00
Diego Caballero	4348cd42c3	[LV] Drop integer poison-generating flags from instructions that need predication This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact` and `inbounds`) in instructions that contribute to the address computation of widen loads/stores that are guarded by a condition. It may happen that when the code is vectorized and the control flow within the loop is linearized, these flags may lead to generating a poison value that is effectively used as the base address of the widen load/store. The fix drops all the integer poison-generating flags from instructions that contribute to the address computation of a widen load/store whose original instruction was in a basic block that needed predication and is not predicated after vectorization. Reviewed By: fhahn, spatel, nlopes Differential Revision: https://reviews.llvm.org/D111846	2021-11-22 10:57:29 +00:00
Diego Caballero	a7027bb799	[LV] Pre-commit test for D111846 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D112054	2021-11-22 10:13:56 +00:00
Florian Hahn	cf8efbd30e	[VPlan] Wrap vector loop blocks in region. A first step towards modeling preheader and exit blocks in VPlan as well. Keeping the vector loop in a region allows for changing the VF as we traverse region boundaries. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113182	2021-11-20 17:59:48 +00:00
Kerry McLaughlin	ff64b2933a	[LoopVectorize] Check the number of uses of an FAdd before classifying as ordered checkOrderedReductions looks for Phi nodes which can be classified as in-order, meaning they can be vectorised without unsafe math. In order to vectorise the reduction it should also be classified as in-loop by getReductionOpChain, which checks that the reduction has two uses. In this patch, a similar check is added to checkOrderedReductions so that we now return false if there are more than two uses of the FAdd instruction. This fixes PR52515. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D114002	2021-11-18 16:41:19 +00:00
Florian Hahn	dead1c11ff	[LV] Add basic check lines to test added in `00200dbda3`.	2021-11-18 14:08:57 +00:00
Florian Hahn	00200dbda3	[LV] Add test case for PR52024. This patch adds a reduced version of the test case from PR52024. Together with `764d9aa979` the test causes a crash, because LV expands a SCEV expression during code generation, when the dominator tree is not up-to-date.	2021-11-18 12:10:44 +00:00
David Sherwood	8d77555b12	[Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector When getTypeConversion returns TypeScalarizeScalableVector we were sometimes returning a non-simple type from getTypeLegalizationCost. However, many callers depend upon this being a simple type and will crash if not. This patch changes getTypeLegalizationCost to ensure that we always return a sensible simple VT. If the vector type contains unusual integer types, e.g. <vscale x 2 x i3>, then we just set the type to MVT::i64 as a reasonable default. A test has been added here that demonstrates the vectoriser can correctly calculate the cost of vectorising a "zext i3 to i64" instruction with a VF=vscale x 1: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113777	2021-11-17 13:11:58 +00:00
David Sherwood	670dd40244	[Analysis] Fix getNumberOfParts to return 0 when the answer is unknown When asking how many parts are required for a scalable vector type there are occasions when it cannot be computed. For example, <vscale x 1 x i3> is one such vector for AArch64+SVE because at the moment no matter how we promote the i3 type we never end up with a legal vector. This means that getTypeConversion returns TypeScalarizeScalableVector as the LegalizeKind, and then getTypeLegalizationCost returns an invalid cost. This then causes BasicTTImpl::getNumberOfParts to dereference an invalid cost, which triggers an assert. This patch changes getNumberOfParts to return 0 for such cases, since the definition of getNumberOfParts in TargetTransformInfo.h states that we can use a return value of 0 to represent an unknown answer. Currently, LoopVectorize.cpp is the only place where we need to check for 0 as a return value, because all other instances will not currently ask for the number of parts for <vscale x 1 x iX> types. In addition, I have changed the target-independent interface for getNumberOfParts to return 1 and assume there is a single register that can fit the type. The loop vectoriser has lots of tests that are target-independent and they relied upon the 0 value to mean the answer is known and that we are not scalarising the vector. I have added tests here that show we correctly return an invalid cost for VF=vscale x 1 when the loop contains unusual types such as i7: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113772	2021-11-17 12:07:09 +00:00
David Green	309f1e4ac8	[ARM] Add datalayout to costmodel tests. NFC This adds a sensible datalayout to the ARM cost model tests, to prevent the costs reported being incorrect for the size of pointers.	2021-11-16 09:49:42 +00:00
Florian Hahn	112c1c346a	[IVDescriptor] Make sure the sign is included for negative extension. At the moment, computeRecurrenceType does not include any sign bits in the maximum bit width. If the value can be negative, this means the sign bit will be missing and the sext won't properly extend the value. If the value can be negative, increment the bitwidth by one to make sure there is at least one sign bit in the result value. Note that the increment is also needed if the value is known to be negative, as a sign bit needs to be preserved for the sext to work. Note that this at the moment prevents vectorization, because the analysis computes i1 as type for the recurrence when looking through the AND in lookThroughAnd. Fixes PR51794, PR52485. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D113056	2021-11-15 13:12:57 +00:00
Simon Pilgrim	fbe72e41b9	[LoopVectorize] Add PR41179 test case	2021-11-14 21:54:23 +00:00
Philip Reames	37ead201e6	[runtime-unroll] Use incrementing IVs instead of decrementing ones This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing. Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B, dropping any flags invalidated by this. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.) Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen. * Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.	2021-11-12 15:44:58 -08:00
Florian Hahn	30ebdf8a6d	[LV] Precommit test case from PR52485.	2021-11-12 16:09:19 +00:00
Kerry McLaughlin	7647822156	[AArch64][SVE] Remove i1 type from isElementTypeLegalForScalableVector `collectElementTypesForWidening` collects the types of load, store and reduction Phis in a loop. These types are later checked using `isElementTypeLegalForScalableVector` to prevent vectorisation of loops with instruction types that are unsupported. This patch removes i1 from the list of types supported for scalable vectors. This fixes an assert ("Cannot yet scalarize uniform stores") in `setCostBasedWideningDecision` when we have a loop containing a uniform i1 store and a scalable VF, which we cannot create a scatter for. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D113680	2021-11-12 14:24:38 +00:00
Florian Hahn	e7f1232cb7	[LV] Move optimized IV recipes to phi section of header after sinking. Unfortunately sinking recipes for first-order recurrences relies on the original position of recipes. So if a recipe needs to be sunk after an optimized induction, it needs to stay in the original position, until sinking is done. This is causing PR52460. To fix the crash, keep the recipes in the original position until sink-after is done. Post-commit follow-up to `c45045bfd0` to address PR52460.	2021-11-10 11:41:08 +00:00
Kerry McLaughlin	6f16ee5e14	Revert "[LoopVectorize] Extract the last lane from a uniform store" This reverts commit `0d748b4d32`. This is causing some failures when building Spec2017 with scalable vectors. Reverting to investigate.	2021-11-10 11:21:19 +00:00
Dmitry Makogon	62f86d4f95	Reapply `5ec2386` "Reapply `db28934` "[IndVars] Pass TTI to replaceCongruentIVs"" This reverts commit `7cd273c339`. Several patches with test fixes have been applied: `0cada82f0a` "[Test] Remove incorrect test in GVN" `97cb13615d` "[Test] Separate IndVars test into AArch64 and X86 parts" `985cc490f1` "[Test] Remove separated test in IndVars", and test failures caused by `5ec2386` should be resolved now.	2021-11-10 17:36:14 +07:00
David Sherwood	2a48b6993a	[IR] In ConstantFoldShuffleVectorInstruction use zeroinitializer for splats of 0 When creating a splat of 0 for scalable vectors we tend to create them using a combination of shufflevector and insertelement, i.e. shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 0, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer) However, for the case of a zero splat we can actually just replace the above with zeroinitializer instead. This makes the IR a lot simpler and easier to read. I have changed ConstantFoldShuffleVectorInstruction to use zeroinitializer when creating a splat of integer 0 or FP +0.0 values. Differential Revision: https://reviews.llvm.org/D113394	2021-11-10 09:42:58 +00:00
Douglas Yung	7cd273c339	Revert "Reapply `db28934` "[IndVars] Pass TTI to replaceCongruentIVs"" This reverts commit `5ec2386332`. This change is causing test failures on the PS4 linux build bot: https://lab.llvm.org/buildbot/#/builders/139/builds/12871	2021-11-09 10:28:41 -08:00
Kerry McLaughlin	0d748b4d32	[LoopVectorize] Extract the last lane from a uniform store Changes VPReplicateRecipe to extract the last lane from an unconditional, uniform store instruction. collectLoopUniforms will also add stores to the list of uniform instructions where Legal->isUniformMemOp is true. setCostBasedWideningDecision now sets the widening decision for all uniform memory ops to Scalarize, where previously GatherScatter may have been chosen for scalable stores. This fixes an assert ("Cannot yet scalarize uniform stores") in setCostBasedWideningDecision when we have a loop containing a uniform i1 store and a scalable VF, which we cannot create a scatter for. Reviewed By: sdesmalen, david-arm, fhahn Differential Revision: https://reviews.llvm.org/D112725	2021-11-09 14:43:16 +00:00
Dmitry Makogon	5ec2386332	Reapply `db28934` "[IndVars] Pass TTI to replaceCongruentIVs" This reapplies patch `db289340c8`. The test failures on build with expensive checks caused by the patch happened due to the fact that we sorted loop Phis in replaceCongruentIVs using llvm::sort, which shuffles the given container if the expensive checks are enabled, so equivalent Phis in the sorted vector had different mutual order from run to run. replaceCongruentIVs tries to replace narrow Phis with truncations of wide ones. In some test cases there were several Phis with the same width, so if their order differs from run to run, the narrow Phis would be replaced with a different Phi, depending on the shuffling result. The patch `ae14fae0ff` fixed this issue by replacing llvm::sort with llvm::stable_sort.	2021-11-09 17:42:29 +07:00
Florian Hahn	e3bfb6a146	[VPlan] Make sure recurrence splice is not inserted between phis. All phi-like recipes should be at the beginning of a VPBasicBlock with no other recipes in between. Ensure that the recurrence-splicing recipe is not added between phi-like recipes, but after them. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111301	2021-11-08 17:42:32 +00:00
Sander de Smalen	2829376bb2	[LV] Use VScaleForTuning to fine-tune the cost per lane. When targeting a specific CPU with scalable vectorization, the knowledge of that particular CPU's vscale value can be used to tune the cost-model and make the cost per lane less pessimistic. If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane is calculated as: Cost / (VScaleForTuning * VF.KnownMinLanes) Otherwise, it assumes a value of 1 meaning that the behavior is unchanged and calculated as: Cost / VF.KnownMinLanes Reviewed By: kmclaughlin, david-arm Differential Revision: https://reviews.llvm.org/D113209	2021-11-08 16:59:46 +00:00
Dmitry Makogon	8d4eba6c0d	Revert "[IndVars] Pass TTI to replaceCongruentIVs" This reverts commit `db289340c8`. The patch caused 2 crashes with expensive checks enabled.	2021-11-08 19:35:14 +07:00
Dmitry Makogon	db289340c8	[IndVars] Pass TTI to replaceCongruentIVs In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'. This function optionally takes a TTI argument to be able to replace narrow IVs uses with truncates of the widest one. For some reason the TTI wasn't passed to the function, so it couldn't perform such transform. This patch fixes it. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D113024	2021-11-08 19:20:53 +07:00
David Sherwood	c42bb30b9e	[LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor we bail out if the main vector loop uses a scalable VF. This patch adds support for generating epilogue vector loops using a fixed-width VF when the main vector loop uses a scalable VF. I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor so that we convert the scalable VF into a fixed-width VF and do profitability checks on that instead. In addition, since the scalable and fixed-width VFs live in different VPlans that means I had to change the calls to LVP.hasPlanWithVFs so that we only pass in the fixed-width VF. New tests added here: Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll Differential Revision: https://reviews.llvm.org/D109432	2021-11-08 09:41:13 +00:00
David Sherwood	9da8dde7fd	[NFC][LoopVectorize] Add test for tail-folding loop with conditional uniform load I've added a test for a loop containing a conditional uniform load for a target that supports masked loads. The test just ensures that we correctly use gather instructions and have the correct mask. Differential Revision: https://reviews.llvm.org/D112619	2021-11-03 09:51:11 +00:00
Florian Hahn	e515d3a433	[LV] Add test case from PR51794 for over-eager truncation. This patch adds a test case for PR51794 where reductions are performed on types that are too small.	2021-11-02 22:15:09 +01:00
Rosie Sumpter	dcb8222d87	[LoopVectorize] Propagate fast-math flags for inloop reductions This patch updates VPReductionRecipe::execute so that the fast-math flags associated with the underlying instruction of the VPRecipe are propagated through to the reductions which are created. Differential Revision: https://reviews.llvm.org/D112548	2021-11-02 08:59:53 +00:00
David Sherwood	87a294d5eb	[LoopVectorize] Change getRuntimeVFAsFloat to use unsigned int->FP conversion We never expect the runtime VF to be negative so we should use the uitofp instruction instead of sitofp. Differential revision: https://reviews.llvm.org/D112610	2021-11-01 09:58:14 +00:00
Florian Hahn	c45045bfd0	[VPlan] Keep induction recipes in header. This patch updates recipe creation to ensure all VPWidenIntOrFpInductionRecipes are in the header block. At the moment, new induction recipes can be created in different blocks when trying to optimize casts and induction variables. Having all induction recipes in the header makes it easier to analyze/transform them in VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111300	2021-10-28 18:22:05 +01:00
Philip Reames	6caff716da	Regen some autogen tests to account for format change	2021-10-28 09:22:20 -07:00
Roman Lebedev	b291597112	Revert rest of `IRBuilderBase`'s short-circuiting folds Upon further investigation and discussion, this is actually the opposite direction from what we should be taking, and this direction wouldn't solve the motivational problem anyway. Additionally, some more (polly) tests have escaped being updated. So, let's just take a step back here. This reverts commit `f3190dedee`. This reverts commit `749581d21f`. This reverts commit `f3df87d57e`. This reverts commit `ab1dbcecd6`.	2021-10-28 02:15:14 +03:00
Roman Lebedev	101aaf62ef	Revert "[NFC] `IRBuilderBase::CreateAdd()`: place constant onto RHS" Clang OpenMP codegen tests are failing, will recommit afterwards. This reverts commit `4723c9b3c6`.	2021-10-27 22:21:37 +03:00
Roman Lebedev	42712698fd	Revert "[IR] `IRBuilderBase::CreateAdd()`: short-circuit `x + 0` --> `x`" Clang OpenMP codegen tests are failing. This reverts commit `288f1f8abe`. This reverts commit `cb90e5356a`.	2021-10-27 22:21:37 +03:00
Roman Lebedev	cb90e5356a	[IR] `IRBuilderBase::CreateAdd()`: short-circuit `x + 0` --> `x` There's precedent for that in `CreateOr()`/`CreateAnd()`. The motivation here is to avoid bloating the run-time check's IR in `SCEVExpander::generateOverflowCheck()`. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 21:34:38 +03:00
Roman Lebedev	4723c9b3c6	[NFC] `IRBuilderBase::CreateAdd()`: place constant onto RHS	2021-10-27 21:34:38 +03:00
Roman Lebedev	156f10c840	[IR] `SCEVExpander::generateOverflowCheck()`: short-circuit `umul_with_overflow`-by-one It's a no-op, no overflow happens ever: https://alive2.llvm.org/ce/z/Zw89rZ While generally i don't like such hacks, we have a very good reason to do this: here we are expanding a run-time correctness check for the vectorization, and said `umul_with_overflow` will not be optimized out before we query the cost of the checks we've generated. Which means, the cost of run-time checks would be artificially inflated, and after https://reviews.llvm.org/D109368 that will affect the minimal trip count for which these checks are even evaluated. And if they aren't even evaluated, then the vectorized code certainly won't be run. We could consider doing this in IRBuilder, but then we'd need to also teach `CreateExtractValue()` to look into chain of `insertvalue`'s, and i'm not sure there's precedent for that. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 19:45:55 +03:00
Roman Lebedev	f3df87d57e	[IR] `IRBuilderBase::CreateOr()`: fix short-circuiting for constant on LHS There is no guarantee that the constant is on RHS here, we have to handle both cases. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 18:01:06 +03:00
Roman Lebedev	ab1dbcecd6	[IR] `IRBuilderBase::CreateSelect()`: if cond is a constant i1, short-circuit While we could emit such a tautological `select`, it will stick around until the next instsimplify invocation, which may happen after we count the cost of this redundant `select`. Which is precisely what happens with loop vectorization legality checks, and that artificially increases the cost of said checks, which is bad. There is prior art for this in `IRBuilderBase::CreateAnd()`/`IRBuilderBase::CreateOr()`. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 18:01:05 +03:00
Roman Lebedev	5a8a7b3bf8	[NFC] Re-autogenerate check lines in some tests to ease future updates	2021-10-27 18:01:05 +03:00
David Sherwood	3d706c20f8	[NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor I have removed LoopVectorizationPlanner::setBestPlan, since this function is quite aggressive because it deletes all other plans except the one containing the <VF,UF> pair required. The code is currently written to assume that all <VF,UF> pairs will live in the same vplan. This is overly restrictive, since scalable VFs live in different plans to fixed-width VFs. When we add support for vectorising epilogue loops when the main loop uses scalable vectors, the vplan for the main loop will be different to the epilogue. Instead I have added a new function called LoopVectorizationPlanner::getBestPlanFor that returns the best vplan for the <VF,UF> pair requested and leaves all the vplans untouched. We then pass this best vplan to LoopVectorizationPlanner::executePlan which now takes an additional VPlanPtr argument. Differential revision: https://reviews.llvm.org/D111125	2021-10-27 09:38:27 +01:00
Roman Lebedev	e1db72703f	[NFC] Re-harden test/Transforms/LoopVectorize/X86/pr48340.ll This test is quite fragile WRT improvements to the interleaved load cost modelling. Let's bump the stride way up so that is no longer a concern.	2021-10-22 15:07:53 +03:00
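
The cost-per-lane rule quoted in commit `2829376bb2` in the log above can be sketched in a few lines. This is only an illustration in Python, not LLVM's code: the function name `cost_per_lane` and its parameter names are hypothetical, and only the two formulas, `Cost / (VScaleForTuning * VF.KnownMinLanes)` and `Cost / VF.KnownMinLanes`, come from the commit message.

```python
from typing import Optional


def cost_per_lane(cost: float, known_min_lanes: int,
                  vscale_for_tuning: Optional[int] = None) -> float:
    """Hypothetical helper mirroring the formulas in the commit message.

    cost:              total estimated cost of the vector loop body
    known_min_lanes:   VF.KnownMinLanes, e.g. 4 for VF = vscale x 4
    vscale_for_tuning: the vscale value a target reports, or None
    """
    # With no tuning value, assume vscale = 1, which reduces the
    # calculation to Cost / VF.KnownMinLanes (behavior unchanged).
    vscale = 1 if vscale_for_tuning is None else vscale_for_tuning
    return cost / (vscale * known_min_lanes)


if __name__ == "__main__":
    print(cost_per_lane(16, 4))     # no tuning value available
    print(cost_per_lane(16, 4, 2))  # target reports vscale = 2
```

With a tuning value of 2, the same total cost is spread over twice as many lanes, so the per-lane cost halves. That is the "less pessimistic" effect the commit message describes.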

1897 Commits