clang-p2996

Author	SHA1	Message	Date
Florian Hahn	3d3634e8bd	[LV] Add extra test for D139927.	2022-12-14 22:47:05 +00:00
Florian Hahn	e898479f2b	[VPlan] Sink non-uniform recipes for scalar plans. In scalar plans, replicate recipes will only generate a single value per UF, independent of whether they are uniform or not. So don't consider uniformity for plans with scalar VFs only. This allows us to handle a few additional cases in VPlan sinking instead of non-VPlan sinkScalarOperands. Depends on D133762. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D134218	2022-12-14 17:55:31 +00:00
Nikita Popov	5b40015063	[LoopVectorize] Convert some tests to opaque pointers (NFC) For these tests update_test_checks.py had to be rerun.	2022-12-14 15:27:31 +01:00
Nikita Popov	7d7577256b	[LoopVectorize] Convert some tests to opaque pointers (NFC)	2022-12-14 15:16:59 +01:00
Philip Reames	b0f904b6da	[LV] Account for minimum vscale when rejecting scalable vectorization of short loops The vectorizer has code to reject scalable vectorization of loops with very short trip counts, and instead use fixed length vectors. The current code doesn't account for the minimum vscale value known, and thus underestimates the number of lanes in the scalable type for RISCV's default configuration. This results in use of predication and a trivially dead loop where a single straight line piece of code would suffice. Note that the code quality of the original scalable vectorization could (and probably should) be improved in other ways as well. This patch is solely about whether the scalable vectorization was the right choice to begin with. This bit of code - both with and without my change - does make the unchecked assumption that the target knows how to lower fixed length vectors whose length is provably less than the vector length. Differential Revision: https://reviews.llvm.org/D137285	2022-12-09 11:29:41 -08:00
liqinweng	6efb45f5ab	[AARCH64][CostModel] Modified the cost of mask vector load/store Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D134413	2022-12-09 14:11:21 +08:00
Sanjay Patel	05dbdb0088	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try)" This reverts commit `e71b81cab0`. As discussed in the planned follow-on to this patch (D138874), this and the subsequent patches in this set can cause trouble for the backend, and there's probably no quick fix. We may even want to canonicalize in the opposite direction (towards insertelt).	2022-12-08 14:16:46 -05:00
Bjorn Pettersson	51ee10747d	[test] Remove duplicate RUN lines A few more that I missed in commit `3528e63d89`. There could be more duplicates remaining, since I've only focused on exactly duplicated "RUN: opt" lines (ignoring multi line RUN lines ending with '\').	2022-12-08 12:47:24 +01:00
Bjorn Pettersson	3528e63d89	[test] Remove duplicate RUN lines in Transform tests	2022-12-08 11:47:16 +01:00
Roman Lebedev	1e08a08a87	[NFC] Port all LoopVectorize tests to `-passes=` syntax	2022-12-08 02:38:47 +03:00
Roman Lebedev	be51fa4580	[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax	2022-12-05 22:17:30 +03:00
Florian Hahn	37809c867a	[VPlan] Support sinking VPScalarIVStepsRecipe. This patch extends VP-based sinking to also sink VPScalarStepsRecipe. This takes us a step closer towards retiring the IR based sinking. The main change is extending VPScalarIVStepsRecipe::execute to support executing in a replicate-region. Depends on D133758. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133760	2022-12-04 22:59:17 +00:00
Florian Hahn	fb84dad58b	[LV] Update test to use variables in CHECK lines. This makes the test more robust with respect to value numbering which will change with future changes.	2022-12-04 11:59:00 +00:00
Matt Arsenault	a74c5707be	Fix some test files with executable permissions	2022-12-02 17:12:03 -05:00
Mel Chen	7b5928e4a7	[NFC] Update Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll.	2022-12-02 01:07:16 -08:00
Philip Reames	73eacf94e0	[RISCV] Incorporate LMUL into costs for arithmetic and shuffles This reuses the routine implemented in `0e6f0b7` to implement several existing TODOs. Many of the operations scale linearly with LMUL; this change represents that in the cost model. Differential Revision: https://reviews.llvm.org/D139039	2022-12-01 10:46:27 -08:00
Sanjay Patel	e71b81cab0	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try) The first attempt was reverted because a clang test changed unexpectedly - the file is already marked with a FIXME, so I just updated it this time to pass. Original commit message: This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 14:52:20 -05:00
Sanjay Patel	5eacdcff06	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1" This reverts commit `a4c466766d`. This broke clang tests that are wrongly dependent on the optimizer.	2022-11-30 14:10:50 -05:00
Sanjay Patel	a4c466766d	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 13:22:04 -05:00
Florian Hahn	0c5df7cd2f	Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit `bf15f1e489`. The updated version fixes a crash by checking the induction kind instead of the opcode; for integer inductions, the step is always added, but the opcode might not be set.	2022-11-30 17:04:20 +00:00
Florian Hahn	c7ca21816a	[LV] Add test showing crash with `0fa666eced`.	2022-11-30 16:51:24 +00:00
Florian Hahn	06ed6edc87	[LV] Update test to use opaque pointers.	2022-11-30 15:21:52 +00:00
William Huang	be4b1dd35b	[InstCombine] Revert D125845 Reverting D125845 `[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back` because multiple users reported performance regression Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D138950	2022-11-29 22:02:40 +00:00
Simon Tatham	e45cbf9923	[ARM,MVE] Update MVE_VMLA_qr for architecture change. In revision B.q and before of the Armv8-M architecture reference manual, the vector/scalar forms of the `vmla` and `vmlas` instructions came in signed and unsigned integer forms, such as `vmla.s8 q0,q1,r2` or `vmlas.u32 q3,q4,r5`. Revision B.r has changed this. There are no longer signed and unsigned versions of these instructions, since they were functionally identical anyway. Now there is just `vmla.i8` (or `i16` or `i32`, and similarly for `vmlas`). Bit 28 of the instruction encoding, which was previously 0 for signed or 1 for unsigned, is now expected to be 0 always. This change updates LLVM to the new version of the architecture. The obsoleted encodings for unsigned integers are now decoding errors, and only the still-valid encoding is ever emitted. This shouldn't break any existing assembly code, because the old signed and unsigned versions of the mnemonic are still accepted by the assembler (which is standard practice anyway for all signedness-agnostic MVE integer instructions). Reviewed By: dmgreen, lenary Differential Revision: https://reviews.llvm.org/D138827	2022-11-29 08:47:00 +00:00
Florian Hahn	bf15f1e489	Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit `0fa666eced`. This triggers an assertion during AArch64 stage2 builds. Revert while I investigate. See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio	2022-11-28 22:43:11 +00:00
Philip Reames	db07d79ab0	[RISCV] Add cost model for integer and float vector arithmetic instructions. This patch implements getArithmeticInstrCost for RISCV, supports cost model for integer and float vector arithmetic instructions. Differential Revision: https://reviews.llvm.org/D133552 (Original patch by jacquesguan. Subset by me with todos added.)	2022-11-28 09:04:38 -08:00
Florian Hahn	0fa666eced	[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe. This patch splits off the logic to transform the canonical IV to a value for an induction with a different start and step. This transformation only needs to be done once (independent of VF/UF) and enables sinking of VPScalarIVStepsRecipe as follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133758	2022-11-28 16:32:31 +00:00
Florian Hahn	12bb5535d2	[VPlan] Move cast codegen to emitTransformedIndex (NFCI). This reduces duplication a bit. Suggested as simplification in D133758.	2022-11-26 22:47:13 +00:00
Florian Hahn	ed2fdace89	[LV] Use separate index to access StoredValues in vectorizeInterleave. StoredValues only has entries for members of the interleave group. If there are gaps, then using the index i here will either access a wrong entry or be out-of-bounds. Instead use a dedicated index that only gets incremented for members of the interleave group. Fixes #59090.	2022-11-25 15:28:05 +00:00
Mel Chen	846cdf0198	[RISCV] Enable reduction pattern SelectICmp and SelectFCmp. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D137940	2022-11-21 00:31:44 -08:00
David Green	662b5f1846	[ARM] Add an extra test for low trip count MVE vectorization. NFC This is quite reduced from the original example, but hopefully shows where vectorization is unprofitable because of multiple factors including the low trip count of the loop.	2022-11-17 15:07:28 +00:00
Roman Lebedev	8e37b53360	[X86] Rewrite `getScalarizationOverhead()` All of our insert/extract ops work on 128-bit lanes. For `Insert`, we need to extract the affected 128-bit lane, unless it's being fully overwritten (FIXME: do we need to be careful about legalization-induced padding that we obviously don't demand?), perform insertions, and then insert the 128-bit lane back. But hold on. If we are operating on a 256-bit legal vector, and thus have two 128-bit subvectors, and are fully overwriting them both, we don't actually need to insert both subvectors, only the second one, into the implicitly-widened first one. Also, `Insert` wasn't actually querying the costs, but just assuming them to be `1`. `getShuffleCost(TTI::SK_ExtractSubvector)` notes: ``` // Note that in general, the insertion starting at the beginning of a vector // isn't free, because we need to preserve the rest of the wide vector. ``` ... so as far as I can tell, we didn't account for that. I was hoping this would allow vectorization at a higher VF in one case I looked at, but the subvector insertion cost is still advising against that. The change for `Extract` is NFC, and is for consistency only; I wanted to get rid of that weird explicit discounting of insertion of 0'th element, since the general code should already deal with that. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D137913	2022-11-15 21:07:12 +03:00
Nikita Popov	458ae539df	[AST] Remove legacy AliasSetPrinter pass A NewPM version of this pass exists, drop the legacy version of this testing-only pass.	2022-11-14 15:50:38 +01:00
Florian Hahn	758699c399	[VectorUtils] Skip interleave members with diff type and alloca sizes. Currently, codegen doesn't support cases where the type size doesn't match the alloc size. Skip them for now. Fixes #58722.	2022-11-13 22:06:20 +00:00
Karthik Senthil	d9c52c31a0	[LV][IVDescriptors] Fix recurrence identity element for FMin and FMax reductions For min and max reduction idioms, the identity (i.e. neutral) element should be the datatype's highest and lowest possible values respectively. The current implementation in IVDescriptors incorrectly returns -Inf for FMin reduction and +Inf for FMax reduction. This patch fixes this bug which was causing incorrect reduction computation results in loops vectorized by LV. Differential Revision: https://reviews.llvm.org/D137220	2022-11-04 10:39:37 -04:00
LiDongjin	d1cee3539f	[LoopVectorize] Fix crash on "Cannot dereference end iterator!"(PR56627) Check hasOneUser before user_back(). Differential Revision: https://reviews.llvm.org/D136227	2022-11-03 23:13:37 +08:00
Craig Topper	b6ad7ab89e	[RISCV] Prevent autovectorization using vscale with Zvl32b. RVVBitsPerBlock is 64. If VLen==32, VLen/RVVBitsPerBlock is 0. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D137280	2022-11-02 13:55:21 -07:00
Philip Reames	86f9655373	[LV][RISCV] Add test showing poor choice of VF for short loop	2022-11-02 13:06:45 -07:00
Patrick Walton	f3d49dbcb1	[test] Remove readonly from some parameters that are written through in tests. In D136659 I found a few tests that write through readonly parameters: * Analysis/BasicAA/pr18573.ll: @foo1 writes through %arr.ptr, but declares it readonly. I removed the readonly annotation. * CodeGen/ARM/ParallelDSP/aliasing.ll: @restrict writes through the readonly %arg3, @store_alias_arg3_illegal_1 writes through the readonly %arg3, and @store_alias_arg3_illegal_2 writes through the readonly %arg3. I removed readonly from all three. Also, I added some CHECK-LABEL directives to make it harder for FileCheck output to be mixed up. * Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll: @gather_nxv4i32_ind64_stride2 writes through the readonly %a. I removed the readonly attribute. * Transforms/LoopVectorize/interleaved-accesses.ll: @load_gap_reverse writes through the readonly %P1 and %P2. Also, the corresponding C code in the comment didn't match the test. I removed the readonly attribute from both parameters and corrected the C code. Differential Revision: https://reviews.llvm.org/D136880	2022-10-29 15:05:20 -07:00
Florian Hahn	43f0f1a66f	[VPlan] Use onlyFirstLaneUsed in sinkScalarOperands. Replace custom code to check if only the first lane is used by generic helper `onlyFirstLaneUsed`. This enables VPlan-based sinking in a few additional cases and was suggested in D133760. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D136368	2022-10-29 19:45:19 +01:00
Philip Reames	269bc684e7	[LV][RISCV] Disable vectorization of epilogue loops Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder. In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV. In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization is intended to handle are likely better handled via tail folding on RISCV. As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination. As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV. Differential Revision: https://reviews.llvm.org/D136695	2022-10-25 14:28:02 -07:00
David Green	093b4011e8	[ARM] Add a test demonstrating reductions with reused extend. NFC D136227 showed that tests for this case in getReductionPatternCost were missing.	2022-10-24 19:38:19 +01:00
Florian Hahn	7eb4ec1c75	[VPlan] Print predicates for widened cmp instructions (NFC).	2022-10-21 08:54:11 +01:00
William Huang	6c767cef5a	[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back Canonicalize GEP of GEP by swapping GEP with some suffix constant indices to the back (and GEP with all constant indices to the back of that), this allows more constant index GEP merging to happen. Exceptions are: If swapping violates use-def relations, or anti-optimizes LICM For constant indexed GEP of GEP, if they cannot be merged directly, they will be cast to i8* and merged. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D125845	2022-10-20 17:41:26 +00:00
Florian Hahn	e25ed058bc	[LV] Use buildScalarSteps to also handle VF = 1. (NFCI) The code in buildScalarSteps already properly handles creating the scalar induction values with VF = 1. Use it directly instead of using extra code to handle that case. Suggested by @Ayal in D133760.	2022-10-20 14:30:01 +01:00
Sander de Smalen	137459aff6	[AArch64][SME] Disable (SLP|Loop)Vectorizer when function may be executed in streaming mode. When the SME attributes tell that a function is or may be executed in Streaming SVE mode, we currently need to be conservative and disable _any_ vectorization (fixed or scalable) because the code-generator does not yet support generating streaming-compatible code. Scalable auto-vec will be gradually enabled in the future when we have confidence that the loop-vectorizer won't use any SVE or NEON instructions that are illegal in Streaming SVE mode. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D135950	2022-10-19 16:42:20 +00:00
Craig Topper	44f0b13494	[RISCV] Correct RISCVTTIImpl::getRegUsageForType for vectors of pointers. getPrimitiveSizeInBits returns 0 for pointers, we need to query the size via DataLayout instead. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D135976	2022-10-14 11:34:12 -07:00
Florian Hahn	518bccfd6e	[LV] Add epilogue test with variable induction start value. Add additional test mentioned by @venkataramanan.kumar.llvm in D92132.	2022-10-13 15:56:27 +01:00
Florian Hahn	26c8632f22	[LV] Add extra tests for epilogue vectorization with widened inductions. Extend test coverage to also include inductions with step > 1 and also with runtime trip counts.	2022-10-12 15:21:38 +01:00
Florian Hahn	c1fe52bfa6	[VPlan] Remove dead recipes before sinking. optimizeInductions may leave dead recipes which can prevent sinking. Sinking on the other hand should not introduce new dead recipes, so clean up dead recipes before sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133762	2022-10-12 12:49:42 +01:00


1907 Commits
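Commit d9c52c31a0 turns on what "identity" means for a vectorized reduction. The value that fills the unused lanes must not change the result. Below is a minimal Python model of that idea. It assumes a simple lane-wise reduction; `identity` and `vector_reduce` are hypothetical names, not LLVM APIs. It shows why FMin needs +Inf and FMax needs -Inf.

```python
import math
from functools import reduce


def identity(kind: str) -> float:
    """Neutral element: min(x, +inf) == x and max(x, -inf) == x."""
    return {"fmin": math.inf, "fmax": -math.inf}[kind]


def vector_reduce(values, kind, vf=4, ident=None):
    """Hypothetical model of a VF-wide reduction: every lane starts at the
    identity, accumulates every VF-th element, then the lanes are combined."""
    op = min if kind == "fmin" else max
    start = identity(kind) if ident is None else ident
    lanes = [start] * vf
    for i, v in enumerate(values):
        lanes[i % vf] = op(lanes[i % vf], v)
    return reduce(op, lanes)


# Three elements with VF = 4 leave one lane holding only the identity.
print(vector_reduce([3.0, 1.5, 2.0], "fmin"))                   # 1.5
print(vector_reduce([3.0, 1.5, 2.0], "fmin", ident=-math.inf))  # -inf: the old, wrong identity
```

With the swapped identity, every lane starts below any real input, so the min reduction can only ever return -inf.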
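Two RISC-V entries, b6ad7ab89e and b0f904b6da, depend on the same arithmetic:

- The minimum vscale is VLEN / RVVBitsPerBlock, with RVVBitsPerBlock = 64 per the commit message.
- The guaranteed lane count of a scalable type is that minimum times its known minimum element count.

The Python sketch below models only this arithmetic. The helper names and the short-loop rule are hypothetical simplifications, not LLVM's actual cost logic.

```python
RVV_BITS_PER_BLOCK = 64  # value quoted in commit b6ad7ab89e


def min_vscale(vlen_bits: int) -> int:
    # Integer division, like VLen/RVVBitsPerBlock in the commit message:
    # VLEN = 32 yields 0.
    return vlen_bits // RVV_BITS_PER_BLOCK


def min_scalable_lanes(known_min_elts: int, vlen_bits: int) -> int:
    """Lanes guaranteed for <vscale x known_min_elts x T>."""
    return known_min_elts * min_vscale(vlen_bits)


def scalable_vf_usable(vlen_bits: int) -> bool:
    # With a minimum vscale of 0 no scalable vector can be relied on.
    return min_vscale(vlen_bits) > 0


def short_loop_fits(trip_count: int, known_min_elts: int, vlen_bits: int) -> bool:
    """Hypothetical rule: if the whole trip count fits in one scalable vector
    at the minimum vscale, a fixed-length vector is the better choice."""
    return trip_count <= min_scalable_lanes(known_min_elts, vlen_bits)
```

Treating vscale as 1 instead of its minimum would report 2 lanes rather than 4 for `<vscale x 2 x T>` at VLEN = 128. That is the kind of underestimate b0f904b6da describes; VLEN = 128 is an assumption here, not taken from the log.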
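The bug fixed in ed2fdace89 is an indexing mismatch: StoredValues has one entry per actual group member, while the loop index also counts gaps. Below is a small Python model of the corrected scheme, using a dedicated counter that advances only for real members. `map_stored_values` and the `None`-for-gap encoding are hypothetical.

```python
def map_stored_values(group_slots, stored_values):
    """group_slots[i] is the member at index i of the interleave group, or
    None for a gap; stored_values holds one entry per actual member."""
    mapping = {}
    stored_idx = 0  # advances only for real members, never for gaps
    for member in group_slots:
        if member is None:
            continue
        mapping[member] = stored_values[stored_idx]
        stored_idx += 1
    return mapping
```

For a group `["a", None, "c"]` with stored values `[10, 30]`, indexing by slot position would read `stored_values[2]`, which is out of bounds. The dedicated counter reads index 1 instead.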
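The trunc + insert canonicalization (a4c466766d / e71b81cab0, later reverted in 05dbdb0088) relies on an endian-dependent fact. After `bitcast iN to <K x iM>`, the element equal to `trunc iN to iM` is lane 0 on little-endian targets and lane K-1 on big-endian ones. The Python model below assumes integer widths that divide evenly; the function names are hypothetical.

```python
def trunc(value: int, to_bits: int) -> int:
    """Model `trunc iN to iM`: keep the low M bits."""
    return value & ((1 << to_bits) - 1)


def bitcast_to_lanes(value: int, total_bits: int, lane_bits: int, big_endian: bool):
    """Model `bitcast iN to <K x iM>` (K = N / M). Element 0 is whatever sits
    at the lowest address once the integer is stored to memory."""
    count = total_bits // lane_bits
    mask = (1 << lane_bits) - 1
    # chunks[k] holds bits [k*M, (k+1)*M), least significant first
    chunks = [(value >> (lane_bits * k)) & mask for k in range(count)]
    return chunks[::-1] if big_endian else chunks


def lane_of_trunc(total_bits: int, lane_bits: int, big_endian: bool) -> int:
    """Index of the bitcast element that equals trunc(value)."""
    return total_bits // lane_bits - 1 if big_endian else 0
```

For `0x11223344` split into i8 lanes, little-endian gives `[0x44, 0x33, 0x22, 0x11]` and big-endian gives `[0x11, 0x22, 0x33, 0x44]`. Either way, the lane picked by `lane_of_trunc` holds `0x44`.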
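Commit 0fa666eced (recommitted as 0c5df7cd2f) splits out the step that derives an induction with its own start and step from the canonical IV. VPScalarIVStepsRecipe then expands that derived value per unrolled part and lane. The Python sketch below assumes a plain integer induction (no casts, floating-point or pointer inductions), and both function names are hypothetical.

```python
def derived_iv(canonical_iv: int, start: int, step: int) -> int:
    """Value of an integer induction for a given canonical IV value."""
    return start + canonical_iv * step


def scalar_iv_steps(base: int, step: int, vf: int, uf: int):
    """Per-part, per-lane scalar values: base + (part * VF + lane) * step."""
    return [[base + (part * vf + lane) * step for lane in range(vf)]
            for part in range(uf)]
```

`derived_iv` does not depend on VF or UF, so it can be computed once. The commit message names that property as what enables sinking the scalar-steps recipe.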