This fixes an inconsistency between RV32 and RV64. Still considering
trying to do this peephole during isel, but wanted to fix the
inconsistency first.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D126986
Test changes are because isBaseWithConstantOffset uses computeKnownBits,
which is able to see that an earlier AND instruction guaranteed
alignment, so we can treat an OR as an ADD.
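To make the OR-as-ADD step concrete, here is a standalone check (my example, not part of the patch): once an AND has cleared the low bits, OR-ing in a small constant cannot carry, so it behaves exactly like an ADD.

#include <assert.h>
#include <stdint.h>

int main(void) {
  for (uint32_t x = 0; x < 1024; x++) {
    uint32_t aligned = x & ~3u;             // AND guarantees 4-byte alignment
    assert((aligned | 2) == (aligned + 2)); // OR of disjoint bits == ADD
  }
  return 0;
}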
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D126970
If the imm is out of range for an ADDI, we will materialize it in
a register using multiple instructions. If the ADD is used by a
load/store, doPeepholeLoadStoreADDI can try to pull an ADDI from
the constant materialization into the load/store offset. This only
works if the ADD has a single use; otherwise, the peephole would have
to rebuild multiple nodes.
This patch instead tries to solve the problem when the ADD is selected.
We check that the ADD is only used by loads/stores, and if it is,
we select it to (ADDI (ADD X, Imm-Lo12), Lo12). This will enable
the simple case in doPeepholeLoadStoreADDI that can bypass an ADDI
used as a pointer. As a result we can remove the more complicated
peephole from doPeepholeLoadStoreADDI.
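To illustrate the arithmetic behind (ADDI (ADD X, Imm-Lo12), Lo12), here is a standalone sketch (my example, not the in-tree code):

#include <assert.h>
#include <stdint.h>

int main(void) {
  int64_t imm = 0x12345678;                       // out of range for one ADDI
  int64_t lo12 = ((imm & 0xfff) ^ 0x800) - 0x800; // sign-extended low 12 bits
  int64_t hi = imm - lo12;                        // becomes the ADD operand
  assert(lo12 >= -2048 && lo12 <= 2047);          // fits an ADDI...
  assert(hi + lo12 == imm);                       // ...which the load/store can fold
  return 0;
}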
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D126576
If C is non-negative, the result of the smax must also be
non-negative, so all sign bits of the result are 0.
This allows DAGCombiner to remove a zext_inreg in the modified test.
This zext_inreg started as a sext that became a zext before type
legalization, and was then promoted to a zext_inreg.
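As a standalone sanity check of that known-bits fact (my example):

#include <assert.h>
#include <stdint.h>

static int64_t smax(int64_t a, int64_t b) { return a > b ? a : b; }

int main(void) {
  // smax(x, C) >= C, so with C non-negative the sign bit is known zero.
  for (int64_t x = -100; x <= 100; x++)
    assert(smax(x, 7) >= 0);
  return 0;
}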
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D126896
One of the operands of the smax is a positive value, so computeKnownBits
determines the result of the smax must always be positive. This allows
DAG combiner to convert the sign extend to zero extend before type
legalization.
After type legalization, the smax is promoted to i64 by sign extending
its inputs, and the zero extend becomes an AND instruction. We are unable
to remove the AND at this point and it becomes a pair of shifts or a
zext.w.
The result of smax has at least as many sign bits as the minimum of the
sign bit counts of its inputs. Had we kept the sign extend instead of
turning it into a zero extend, it would have been removed by DAG combiner
after type legalization.
Once we've computed the incoming predecessor state, we should use the same compatibility check with knowledge of MI as we did in phase 2 in order to be consistent across all phases.
Differential Revision: https://reviews.llvm.org/D126574
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.
Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.
Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.
Differential Revision: https://reviews.llvm.org/D125247
We enable a custom handler to optimize conversions between scalars
and fixed vectors. Unfortunately, the custom handler picks up scalar
to scalar conversions as well. If the scalar types are both legal,
we wouldn't match any of the fixed vector cases and would return
SDValue(), causing LegalizeDAG to expand the bitcast through memory.
This patch fixes this by checking for a scalar to scalar conversion
and returning `Op` if both types are legal.
Differential Revision: https://reviews.llvm.org/D126739
As mentioned in D125947, we can reduce codegen results by
adding an explicit hard single-float ABI.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D126640
If the adjustment doesn't fit in 12 bits, try to break it into
two 12-bit values before falling back to movImm+add/sub.
This is based on a similar idea from isel.
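As a sketch of the idea (the exact split the patch picks may differ; this is just the arithmetic):

#include <assert.h>
#include <stdint.h>

// Break an adjustment outside the ADDI range [-2048, 2047] into two
// immediates that both fit, avoiding a full constant materialization.
static int split(int64_t adj, int64_t *first, int64_t *second) {
  if (adj > 4094 || adj < -4096)
    return 0;                        // needs more than two 12-bit values
  *first = adj > 0 ? 2047 : -2048;   // hypothetical choice of first half
  *second = adj - *first;
  return *second >= -2048 && *second <= 2047;
}

int main(void) {
  int64_t a, b;
  assert(split(4000, &a, &b) && a + b == 4000);
  return 0;
}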
Reviewed By: luismarques, reames
Differential Revision: https://reviews.llvm.org/D126392
The immediate for LUI is stored as a 20-bit unsigned value. We need
to sign extend it after shifting by 12 to match the instruction
behavior.
If we find an LUI+ADDI on RV64, it means the constant isn't a
simm32. If it were, we would have emitted LUI+ADDIW from constant
materialization. Make sure the constant is a simm32 before folding.
This appears to match gcc.
A future patch will add support for LUI+ADDIW on RV64.
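For reference, here is a standalone model of the instruction semantics involved (my reading of the ISA, for illustration only):

#include <assert.h>
#include <stdint.h>

// LUI: 20-bit immediate shifted left by 12, then sign extended to XLEN.
static int64_t lui(uint32_t imm20) { return (int64_t)(int32_t)(imm20 << 12); }
// ADDIW sign extends the low 32 bits of the sum; ADDI does not.
static int64_t addiw(int64_t r, int64_t imm) { return (int32_t)(r + imm); }
static int64_t addi(int64_t r, int64_t imm)  { return r + imm; }

int main(void) {
  // Sign extension after the shift by 12 matters:
  assert(lui(0x80000) == (int64_t)0xFFFFFFFF80000000LL);
  // A simm32 like INT32_MAX comes out of LUI+ADDIW...
  assert(addiw(lui(0x80000), -1) == INT32_MAX);
  // ...while LUI+ADDI on the same inputs yields a non-simm32 value.
  assert(addi(lui(0x80000), -1) == (int64_t)0xFFFFFFFF7FFFFFFFLL);
  return 0;
}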
Originally, `OptLevel` wasn't passed into the `MachineFunctionPass`.
This let the default parameter of `SelectionDAGISel`, which is
`CodeGenOpt::Default`, be passed in. `OptLevelChanger` captures the
optimization level from the parameter, rather than the value
within `TargetMachine`. This let the optimization level be
unintentionally overwritten if a value other than `CodeGenOpt::Default`
was passed.
This patch fixes this by passing the optimization level rather
than using the default value.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D126641
The optimization level should not be restored to O2.
This is a pre-commit test case to show the fix in D126641.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D126677
This pattern is what we get after DAG combine for C code like this:

  short f(short *ptr1, short *ptr2, short *ptr3) {
    unsigned diff = ptr1 - ptr2;
    return ptr3[diff];
  }
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D126588
The tests here show the codegen for something like this C code:

  short f(short *ptr1, short *ptr2, short *ptr3) {
    unsigned diff = ptr1 - ptr2;
    return ptr3[diff];
  }
The pointer difference is truncated to 32-bits before being used
again as an index. In SelectionDAG this appears as an AND between
a SRL and a SHL. DAGCombiner will remove the shifts leaving only
an AND. The mask now has 1, 2, or 3 trailing zeros and 31, 30, or 29
leading zeros. We end up falling back to constant materialization
to create this mask.
We could instead use srli followed by slli.uw. Or, since
we have an add, we can use srli followed by shXadd.uw.
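The equivalence being exploited, as a standalone check (the mask here is the 1-trailing-zero shape; my example):

#include <assert.h>
#include <stdint.h>

int main(void) {
  const uint64_t mask = 0x1FFFFFFFEULL;  // 1 trailing zero, 31 leading zeros
  for (uint64_t x = 0; x < (1ULL << 40); x += 0xABCDEF123ULL) {
    uint64_t via_and    = x & mask;
    // srli 1, then slli.uw 1: zero extend the low 32 bits, shift back left.
    uint64_t via_shifts = ((x >> 1) & 0xFFFFFFFFULL) << 1;
    assert(via_and == via_shifts);
  }
  return 0;
}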
Differential Revision: https://reviews.llvm.org/D126589
This is a follow-up to address a review comment from D124869. When deciding whether to PRE a vsetvli, we can allow non-LMUL1 vsetvlis.
Differential Revision: https://reviews.llvm.org/D126563
These should be aligned to the natural alignment of the element.
Probably a copy/paste mistake from the i32 tests.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D126567
When lowering GlobalAddress nodes, we were removing a non-zero offset and
creating a separate ADD.
The address already comes out of SelectionDAGBuilder with a separate ADD,
but that ADD was being removed by DAGCombiner.
This patch disables the DAG combine so we don't have to reverse it.
Test changes all look to be instruction order changes, probably due to
different DAG node ordering.
Differential Revision: https://reviews.llvm.org/D126558
Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on the PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, with `-Wl,-keep-text-section-prefix=true`, Propeller cannot enforce a global section ordering, as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).
This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.
The new implementation refactors the parsing of the basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the function's text prefix. `BasicBlockSectionsProfileReader` will be used by both the `BasicBlockSections` pass and `CodeGenPrepare`.
Differential Revision: https://reviews.llvm.org/D122930
A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature.
There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to have actually been standardized, and b) it isn't quite what we want here anyway due to the perf comment.
Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them.
Differential Revision: https://reviews.llvm.org/D126085
With a fix for an expensive-checks build failure exposed by the new RISC-V tests.
Something about expanding two rotates in type legalization caused a change
in the remapping tables that the expensive-checks verification wasn't expecting.
See comment in the code for how it was fixed.
Tests came from this commit that exposed the bug:
[RISCV] Add test cases showing failure to remove mask on rotate amounts.
If the masking AND has multiple users, we fail to remove it.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D126036
During insertion of VSETVLI, we have two related bits of code which decide whether we can reuse a previous vsetvli result. As was pointed out in the original review, these cases can allow any prior state for which we know that VL is the same for any value of AVL.
This was originally separated out of a desire for separate tests and review. As it turns out, finding a test case for this has been quite challenging. Most of the cases I tried, we manage to already get through other chains of logic. We do have one correct test change, but that only exercises one of the two changes.
Differential Revision: https://reviews.llvm.org/D126400
Reapply 62a9b36fcf and fix the module build
failure:
1: remove MachineCycleInfoWrapperPass from MachinePassRegistry.def.
MachineCycleInfoWrapperPass is an analysis pass and should not be there.
2: move the definition of MachineCycleInfoPrinterPass to the cpp file.
Otherwise, there is a module conflict for MachineCycleInfoWrapperPass
between MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf.
MachineCycle can handle irreducible loops. Natural loop
analysis (MachineLoop) cannot return the correct loop depth if
the loop is irreducible, and MachineSink is sensitive
to the loop depth; see MachineSinking::isProfitableToSinkTo().
This patch uses MachineCycle so that we can handle
irreducible loops better.
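For reference, an irreducible loop is a cycle with more than one entry, e.g. this hypothetical C fragment:

// The cycle l1 <-> l2 can be entered at l1 (fallthrough) or at l2 (goto),
// so it is not a natural loop and MachineLoop assigns it no loop depth.
int sum(const int *a, int n, int start_at_test) {
  int s = 0, i = 0;
  if (start_at_test)
    goto l2;
l1:
  s += a[i];
l2:
  if (++i < n)
    goto l1;
  return s;
}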
Reviewed By: sameerds, MatzeB
Differential Revision: https://reviews.llvm.org/D123995
This moves mutation entirely out of the main algorithm.
The immediate trigger is that we hit another case of the same issue I thought we'd fixed in 72925d9. It turns out we hadn't considered the cross block case.
As a brief summary, the issue being fixed is that if we mutate a previous vsetvli in phase 3, there's a possibility that some later use of that vsetvli changes "compatibility". In the cross_block_mutate test, this later vsetvli occurs in another block (and is thus visit-order dependent too!). This causes us to fail strict asserts. (To be explicit, the current on-by-default workaround should compensate. It's only when we turn that off that we have problems.)
Now, I want to explicitly call out an alternate workaround. We could leave the mutation in phase 3, and simply restrict it to the case where the previous vsetvli's GPR result is unused. That covers the case we've actually seen. (I'll note that codegen regressions with a simple form of this were significant. We might have to check specifically for the use-outside-block case to keep them reasonable, which complicates the workaround slightly.)
Personally, I'm at the point where I want the mutation pulled out just for robustness' sake. I'm worried there's yet one more form of this bug we haven't thought about.
The other motivation for this change is that it does give us a couple of minor codegen wins. None appear to be hugely significant, but improvements never hurt, right?
Differential Revision: https://reviews.llvm.org/D125270
Update test to check MIR after finalize-isel instead of debug output.
This is of course not the only place we should preserve FMF, but
it's the most obvious one.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D126306
This is a straightforward extension of the PRE transform introduced in D124869 to handle the VLMAX case.
The test changes here look quite positive. This surprised me until I realized that all the tests are using @llvm.vscale to figure out VLMAX, not the llvm.riscv.vsetvlimax intrinsic. If they'd used the latter, these would have been full redundancy cases and fully handled by the data flow. I'm not really sure whether the use of vscale here is representative or not. If it is, we should probably look at using VSETVLI to lower vscale rather than a raw read of vlenb and some math.
Differential Revision: https://reviews.llvm.org/D126338
When optimizing for size, this pass searches for instructions that are
prevented from being compressed by one of the following:
1. The use of a single uncompressed register.
2. A base register + offset where the offset is too large to be
compressed and the base register may or may not already be compressed.
In the first case, if there is a compressed register available, then the
uncompressed register is copied to the compressed register and its uses
replaced. This is only done if there are enough uses that code size
would be improved.
In the second case, if a compressed register is available, then the
original base register is copied and adjusted such that:
new_base_register = base_register + adjustment
base_register + large_offset = new_base_register + small_offset
and the uses of the base register are replaced with the new base
register. Again this is only done if there are enough uses for code size
to be improved.
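To make the rebasing arithmetic concrete, here is a toy version (the 0..124 compressed word-offset range is my assumption for illustration, not taken from the patch):

#include <assert.h>
#include <stdint.h>

int main(void) {
  int64_t base = 0x10000, large_offset = 600;  // too big for a compressed load
  int64_t adjustment   = large_offset & ~(int64_t)0x7C; // keep a 0..124 remainder
  int64_t new_base     = base + adjustment;
  int64_t small_offset = large_offset - adjustment;
  assert(small_offset >= 0 && small_offset <= 124);
  assert(base + large_offset == new_base + small_offset);
  return 0;
}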
This pass was authored by Lewis Revill, with large offset optimization
added by Craig Blackmore.
Differential Revision: https://reviews.llvm.org/D92105
When the AVL value does not fit in 5 bits, the register in which this value is stored may be dead when we want to forward it. This patch ensures the kill flags on the register are cleared before forwarding.
Patch by: loralb
Differential Revision: https://reviews.llvm.org/D125971
This patch teaches the VSETVLI insertion pass to perform a very limited form of partial redundancy elimination. The motivating example comes from the fixed length vectorization of a simple loop such as:
  for (unsigned i = 0; i < a_len; i++)
    a[i] += b;
Without this change, the core vector loop and preheader are as follows:
.LBB0_3: # %vector.ph
andi a1, a6, -8
addi a4, a0, 16
mv a5, a1
.LBB0_4: # %vector.body
# =>This Inner Loop Header: Depth=1
addi a3, a4, -16
vsetivli zero, 4, e32, m1, ta, mu
vle32.v v8, (a3)
vle32.v v9, (a4)
vadd.vx v8, v8, a2
vadd.vx v9, v9, a2
vse32.v v8, (a3)
vse32.v v9, (a4)
addi a5, a5, -8
addi a4, a4, 32
bnez a5, .LBB0_4
The key thing to note here is that the execution of the vsetivli only needs to happen once. Since there's no tail folding happening here, the values of the vector configuration registers are invariant through the loop.
After this patch, we hoist the configuration into the preheader and perform it once.
.LBB0_3: # %vector.ph
andi a1, a6, -8
vsetivli zero, 4, e32, m1, ta, mu
addi a4, a0, 16
mv a5, a1
.LBB0_4: # %vector.body
# =>This Inner Loop Header: Depth=1
addi a3, a4, -16
vle32.v v8, (a3)
vle32.v v9, (a4)
vadd.vx v8, v8, a2
vadd.vx v9, v9, a2
vse32.v v8, (a3)
vse32.v v9, (a4)
addi a5, a5, -8
addi a4, a4, 32
bnez a5, .LBB0_4
Differential Revision: https://reviews.llvm.org/D124869
This reverts commit dfe513ae1b.
Tests have been changed to avoid the type legalization bug being
fixed in D126036.
Original commit message:
This will remove masks on the shift amount. We usually get this with
SimplifyDemandedBits in DAGCombine, but that's restricted to cases
where the AND has a single use. selectShiftMaskXLen does not have
that restriction.
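In C terms, the redundancy looks like this (standalone sketch; sllw-style word shifts read only the low 5 bits of the amount, so the mask is a no-op):

#include <assert.h>
#include <stdint.h>

static uint32_t shl_masked(uint32_t x, uint32_t amt) { return x << (amt & 31); }
static uint32_t shl_plain(uint32_t x, uint32_t amt)  { return x << amt; }

int main(void) {
  // For all legal shift amounts the mask changes nothing, so isel can
  // select the unmasked amount directly.
  for (uint32_t amt = 0; amt < 32; amt++)
    assert(shl_masked(0xDEADBEEFu, amt) == shl_plain(0xDEADBEEFu, amt));
  return 0;
}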
This patch fixes another bug in the RVV frame lowering. While some frame
objects with non-default stack IDs (such as scalable-vector alloca
instructions) are considered in the target-independent max alignment
calculations, others (for example, during calling-convention lowering)
are not. This means we'd occasionally align the base of the stack to
only 16 bytes, with no way to ensure that the RVV section contained
within that is aligned to anything higher.
Reviewed By: StephenFan
Differential Revision: https://reviews.llvm.org/D125973