clang-p2996

Author	SHA1	Message	Date
Florian Hahn	156b431c66	[LV] Add test with ptr induction used as scalar and vector. Test case inspired by D105199.	2021-07-19 13:15:17 +02:00
Kerry McLaughlin	49d73130ca	[LV] Avoid scalable vectorization for loops containing alloca This patch returns an Invalid cost from getInstructionCost() for alloca instructions if the VF is scalable, as otherwise loops which contain these instructions will crash when attempting to scalarize the alloca. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105824	2021-07-16 11:47:13 +01:00
Sander de Smalen	239d01fa88	Reland "[LV] Print remark when loop cannot be vectorized due to invalid costs." The original patch was: https://reviews.llvm.org/D105806 There were some issues with undeterministic behaviour of the sorting function, which led to scalable-call.ll passing and/or failing. This patch fixes the issue by numbering all instructions in the array first, and using that number as the order, which should provide a consistent ordering. This reverts commit `a607f64118`.	2021-07-16 10:52:01 +01:00
Philip Reames	95346ba877	[LV] Enable vectorization of multiple exit loops w/computable exit counts This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzeable and to dominate the latch. The majority of work to support this was done in a set of previous patches. In particular,, `72314466` avoids having multiple edges from the middle block to the exits, and `4b33b2387` which added support for non-latch single exit and multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit. Differential Revision: https://reviews.llvm.org/D105817	2021-07-15 08:53:51 -07:00
Sander de Smalen	a607f64118	Revert "[LV] Print remark when loop cannot be vectorized due to invalid costs." This reverts commit `efaf3099c8`. This reverts commit `dc7bdc1e71`. Reverting patches due to buildbot failures.	2021-07-15 15:21:57 +01:00
Sander de Smalen	dc7bdc1e71	[LV] Fix determinism for failing scalable-call.ll test. The sort function for emitting an OptRemark was not deterministic, which caused scalable-call.ll to fail on some buildbots. This patch fixes that. This patch also fixes an issue where `Instruction::comesBefore()` is called when two Instructions are in different basic blocks, which would otherwise cause an assertion failure.	2021-07-15 13:16:59 +01:00
Sander de Smalen	efaf3099c8	[LV] Print remark when loop cannot be vectorized due to invalid costs. This patch emits remarks for instructions that have invalid costs for a given set of vectorization factors. Some example output: t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load dst[i] = sinf(src[i]); ^ t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32 dst[i] = sinf(src[i]); ^ t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store dst[i] = sinf(src[i]); ^ Reviewed By: fhahn, kmclaughlin Differential Revision: https://reviews.llvm.org/D105806	2021-07-14 17:11:33 +01:00
Sander de Smalen	eac1670739	[CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid. At the moment, <vscale x 1 x eltty> are not yet fully handled by the code-generator, so to avoid vectorizing loops with that VF, we mark the cost for these types as invalid. The reason for not adding a new "TTI::getMinimumScalableVF" is because the type is supposed to be a type that can be legalized. It partially is, although the support for these types need some more work. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D103882	2021-07-14 16:44:22 +01:00
Simon Pilgrim	ae0d73ac3b	[CostModel][X86] Adjust fptosi/fptoui SSE/AVX legalized costs based on llvm-mca reports. Update (mainly) vXf32/vXf64 -> vXi8/vXi16 fptosi/fptoui costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-12 20:38:25 +01:00
Sander de Smalen	d2e4ccc790	[LV] Ignore candidate VFs with invalid costs. This follows on from discussion on the mailing-list: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151047.html to interpret an Invalid cost as 'infinitely expensive', as this simplifies some of the legalization issues with scalable vectors. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105473	2021-07-12 09:58:22 +01:00
Alexey Bataev	0d74fd3fdf	[SLP][COST][X86]Improve cost model for masked gather. Revived D101297 in its original form + added some changes in X86 legalization cehcking for masked gathers. This solution is the most stable and the most correct one. We have to check the legality before trying to build the masked gather in SLP. Without this check we have incorrect cost (for SLP) in case if the masked gather is not legal/slower than the gather. And we're missing some vectorization opportunities. This can be fixed in the cost model, but in this case we need to add special checks for the cost of GEPs for ScatterVectorize node, add special check for small trees, etc., i.e. there are a lot of corner cases here and there, which insrease code base and make it harder to maintain the code. > Can't we rely on cost model to deal with this? This can be profitable for futher vectorization, when we can start from such gather loads as seed. The question from D101297. Actually, no, it can't. Actually, simple gather may give us better result, especially after we started vectorization of insertelements. Plus, like I said before, the cost for non-legal masked gathers leads to missed vectorization opportunities. Differential Revision: https://reviews.llvm.org/D105042	2021-07-08 11:53:30 -07:00
Philip Reames	723144665b	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 4) Resubmit after the following changes: * Fix a latent bug related to unrolling with required epilogue (see `e49d65f`). I believe this is the cause of the prior PPC buildbot failure. * Disable non-latch exits for epilogue vectorization to be safe (`9ffa90d`) * Split out assert movement (`600624a`) to reduce churn if this gets reverted again. Previous commit message (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-07-07 07:44:35 -07:00
Dylan Fleming	7215dcfe36	[SVE] Fix ShuffleVector cast<FixedVectorType> in truncateToMinimalBitwidths Depends on D104239 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105341	2021-07-07 15:30:10 +01:00
Dylan Fleming	7586b47fb6	[SVE] Fix cast<FixedVectorType> in truncateToMinimalBitwidths Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104239	2021-07-07 09:58:05 +01:00
Florian Hahn	ef0d147cdc	Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups. This reverts commit `706bbfb35b`. The committed version moves the definition of VPReductionPHIRecipe out of an ifdef only intended for ::print helpers. This should resolve the build failures that caused the revert	2021-07-06 14:15:42 +01:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Florian Hahn	706bbfb35b	Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups This reverts commit `3fed6d443f`, `bbcbf21ae6` and `6c3451cd76`. The changes causing build failures with certain configurations, e.g. https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction, llvm::ArrayRef<llvm::VPValue>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]': LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe' collect2: error: ld returned 1 exit status	2021-07-06 12:10:03 +01:00
Florian Hahn	6c3451cd76	[VPlan] Add VPReductionPHIRecipe (NFC). This patch is a first step towards splitting up VPWidenPHIRecipe into separate recipes for the 3 distinct cases they model: 1. reduction phis, 2. first-order recurrence phis, 3. pointer induction phis. This allows untangling the code generation and allows us to reduce the reliance on LoopVectorizationCostModel during VPlan code generation. Discussed/suggested in D100102, D100113, D104197. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104989	2021-07-06 11:25:28 +01:00
Florian Hahn	0bb9c05b1e	[LV] Extend FIXME in test add in `91ee1e3799`.	2021-07-05 15:56:03 +01:00
Florian Hahn	91ee1e3799	[LV] Add initial test cases with small clamped indices.	2021-07-05 15:51:12 +01:00
Sjoerd Meijer	ee752134ac	[AArch64] Cost-model i8 vector loads/stores Loads of <4 x i8> vectors were modeled as extremely expensive. And while we don't have a load instruction that supports this, it isn't that expensive to create a vector of i8 elements. The codegen for this was fixed/optimised in D105110. This now tweaks the cost model and enables SLP vectorisation of my motivating case loadi8.ll. Differential Revision: https://reviews.llvm.org/D103629	2021-07-05 11:25:10 +01:00
Simon Pilgrim	cdca1785d3	[CostModel][X86] Adjust uitofp(vXi64) SSE/AVX legalized costs based on llvm-mca reports. Update v4i64 -> v4f32/v4f64 uitofp costs based on the worst case costs from the script in D103695. Fixes a few regressions before we start adding AVX costs for legalized types.	2021-07-02 13:09:00 +01:00
Simon Pilgrim	0af9b25aff	[LoopVectorize][X86] Regenerate conversion-cost.ll tests	2021-07-01 15:34:20 +01:00
Philip Reames	e49d65f36d	[LV] Fix bug when unrolling (only) a loop with non-latch exit If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires a epilogue be generated for correctness, the code generation must actually do so. The included test case on an unmodified opt will access memory one past the expected bound. As a result, this patch is fixing a latent miscompile. Differential Revision: https://reviews.llvm.org/D103700	2021-06-29 08:04:26 -07:00
Florian Hahn	47215e1c62	[LV] Fix crash when target instruction for sinking is dead. This patch fixes a crash when the target instruction for sinking is dead. In that case, no recipe is created and trying to get the recipe for it results in a crash. To ensure all sink targets are alive, find & use the first previous alive instruction. Note that the case where the sink source is dead is already handled. Found by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104603	2021-06-29 13:31:22 +01:00
David Sherwood	303b6d5e98	[LoopVectorize] Add support for scalable vectorization of invariant stores Previously in setCostBasedWideningDecision if we encountered an invariant store we just assumed that we could scalarize the store and called getUniformMemOpCost to get the associated cost. However, for scalable vectors this is not an option because it is not currently possibly to scalarize the store. At the moment we crash in VPReplicateRecipe::execute when trying to scalarize the store. Therefore, I have changed setCostBasedWideningDecision so that if we are storing a scalable vector out to a uniform address and the target supports scatter instructions, then we should use those instead. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-inv-store.ll Differential Revision: https://reviews.llvm.org/D104624	2021-06-29 11:56:09 +01:00
Philip Reames	716d2fedbf	Precommit miscompile test from D103700	2021-06-28 16:00:42 -07:00
Kerry McLaughlin	f99672568f	[LoopVectorize] Fix strict reductions where VF = 1 Currently we will allow loops with a fixed width VF of 1 to vectorize if the -enable-strict-reductions flag is set. However, the loop vectorizer will not use ordered reductions if `VF.isScalar()` and the resulting vectorized loop will be out of order. This patch removes `VF.isVector()` when checking if ordered reductions should be used. Also, instead of converting the FAdds to reductions if the VF = 1, operands of the FAdds are changed such that the order is preserved. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D104533	2021-06-28 11:27:10 +01:00
Florian Hahn	80aa7e147e	[VPlan] Merge predicated-triangle regions, after sinking. Sinking scalar operands into predicated-triangle regions may allow merging regions. This patch adds a VPlan-to-VPlan transform that tries to merge predicate-triangle regions after sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100260	2021-06-28 11:10:38 +01:00
Florian Hahn	f1a6430272	[VPlan] Track both incoming values for first-order recurrence phis. This patch updates VPWidenPHI recipes for first-order recurrences to also track the incoming value from the back-edge. Similar to D99294, which did the same for reductions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104197	2021-06-27 14:29:35 +01:00
Florian Hahn	f6ba845da3	[VPlan] Fix indentation of check lines in sinking test (NFC).	2021-06-24 16:39:16 +01:00
Florian Hahn	833bdbe93c	[LV] Support sinking recipe in replicate region after another region. This patch handles sinking a replicate region after another replicate region. In that case, we can connect the sink region after the target region. This properly handles the case for which an assertion has been added in `337d765282`. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D103514	2021-06-24 13:58:42 +01:00
Eli Friedman	8f3d16905d	[ScalarEvolution] Ensure backedge-taken counts are not pointers. A backedge-taken count doesn't refer to memory; returning a pointer type is nonsense. So make sure we always return an integer. The obvious way to do this would be to just convert the operands of the icmp to integers, but that doesn't quite work out at the moment: isLoopEntryGuardedByCond currently gets confused by ptrtoint operations. So we perform the ptrtoint conversion late for lt/gt operations. The test changes are mostly innocuous. The most interesting changes are more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to i64)) + %ptr)". This is expected: we can't fold this to zero because we need to preserve the pointer base. The call to isLoopEntryGuardedByCond in howFarToZero is less precise because of ptrtoint operations; this shows up in the function pr46786_c26_char in ptrtoint.ll. Fixing it here would require more complex refactoring. It should eventually be fixed by future improvements to isImpliedCond. See https://bugs.llvm.org/show_bug.cgi?id=46786 for context. Differential Revision: https://reviews.llvm.org/D103656	2021-06-21 16:24:16 -07:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seem to have been missing in that patch was to make sure that the rest of LLVM also handle that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions, typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Florian Hahn	9e77526d46	[VPlan] Add additional tests for region merging. Add additional tests suggested in D100260. Also drop the unneeded `indvars.` prefix from induction phi name.	2021-06-14 11:25:06 +01:00
Mindong Chen	8449af41e5	[LoopVectorize] precommit pr50686.ll for D104148	2021-06-14 13:58:25 +08:00
Florian Hahn	0d9e8f5f4b	[VPlan] Add more sinking/merging tests with predicated loads/stores.	2021-06-12 15:36:51 +01:00
Joachim Meyer	4f01122c3f	[LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads. This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL. The question remains why this was initially added and what the implications of removing this optimization would be. Do we need an alternative mechanism to propagate the information about legality of if-conversion? Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general? I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous. Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103907	2021-06-10 23:37:57 +02:00
Kerry McLaughlin	5db52751a5	[CostModel] Return an invalid cost for memory ops with unsupported types Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector type cannot be legalized by widening/splitting. When this is the method of legalization found, getTypeLegalizationCost will return an Invalid cost. The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call getTypeLegalizationCost and will now also return an Invalid cost for unsupported types. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102515	2021-06-08 12:07:36 +01:00
Kerry McLaughlin	14eeccfe9a	[LoopVectorize] Don't use strict reductions when reordering is allowed If the `-enable-strict-reductions` flag is set to true, then currently we will always choose to vectorize the loop with strict in-order reductions. This is not necessary where we allow the reordering of FP operations, such as when loop hints are passed via metadata. This patch moves useOrderedReductions so that we can also check whether loop hints allow reordering, in which case we should use the default behaviour of vectorizing with unordered reductions. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103814	2021-06-08 10:39:29 +01:00
Philip Reames	3c6e419198	[SCEV] Properly guard reasoning about infinite loops being UB on mustprogress Noticed via code inspection. We changed the semantics of the IR when we added mustprogress, and we appear to have not updated this location. Differential Revision: https://reviews.llvm.org/D103834	2021-06-07 14:47:36 -07:00
Florian Hahn	1465e7770b	[VPlan] Print successors of VPRegionBlocks. The non-DOT printing does not include the successors of VPregionBlocks. This patch use the same style for printing successors as for VPBasicBlock. I think the printing of successors could be a bit improved further, as at the moment it is hard to ensure a check line matches all successors. But that can be done as follow-up. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D103515	2021-06-07 17:57:21 +01:00
Florian Hahn	8344e215ec	[LV] Update more target-specific tests after `23c2f2e6b2`.	2021-06-07 12:13:21 +01:00
Florian Hahn	23c2f2e6b2	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255	2021-06-07 10:47:52 +01:00
Eli Friedman	925cd6b467	Regenerate a few tests related to SCEV. In preparation for https://reviews.llvm.org/D103656	2021-06-04 13:35:00 -07:00
Sander de Smalen	d41cb6bb26	[LV] Build and cost VPlans for scalable VFs. This patch uses the calculated maximum scalable VFs to build VPlans, cost them and select a suitable scalable VF. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98722	2021-06-02 14:47:47 +01:00
Douglas Yung	18225d4576	Mark test as requiring asserts.	2021-06-01 02:01:01 -07:00
Florian Hahn	aa00b1d763	[LV] Try to sink users recursively for first-order recurrences. Update isFirstOrderRecurrence to explore all uses of a recurrence phi and check if we can sink them. If there are multiple users to sink, they are all mapped to the previous instruction. Fixes PR44286 (and another PR or two). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D84951	2021-05-31 19:55:33 +01:00
Juneyoung Lee	7161bb87c9	[InsCombine] Fix a few remaining vec transforms to use poison instead of undef This is a patch that replaces shufflevector and insertelement's placeholder value with poison. Underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead (D93818) The consensus has been made in the late 2020 via mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 . This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.	2021-05-31 18:47:09 +09:00
Luke	c4c3869554	[RISCV] Enable interleaved vectorization for RVV Enable interleaved vectorization for RVV. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101469	2021-05-29 11:03:27 +08:00

1 2 3 4 5 ...

1401 Commits