clang-p2996

Author	SHA1	Message	Date
Philip Reames	de34d39b66	[RISCV] Cap build vector cost to avoid quadratic cost at high LMULs Each vslide1down operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do a VL unique inserts, each with a cost linear in LMUL, the overall cost is O(VL*LMUL). Since VL is a linear function of LMUL, this means the current lowering is quadratic in both LMUL and VL. To avoid the degenerate case, fallback to the stack if the cost is more than a fixed (linear) threshold. For context, here's the sifive-x280 llvm-mca results for the current lowering and stack based lowering for each LMUL (using e64). Assumes code was compiled for V (i.e. zvl128b). buildvector_m1_via_stack.mca:Total Cycles: 1904 buildvector_m2_via_stack.mca:Total Cycles: 2104 buildvector_m4_via_stack.mca:Total Cycles: 2504 buildvector_m8_via_stack.mca:Total Cycles: 3304 buildvector_m1_via_vslide1down.mca:Total Cycles: 804 buildvector_m2_via_vslide1down.mca:Total Cycles: 1604 buildvector_m4_via_vslide1down.mca:Total Cycles: 6400 buildvector_m8_via_vslide1down.mca:Total Cycles: 25599 There are other schemes we could use to cap the cost. The next best is recursive decomposition of the vector into smaller LMULs. That's still quadratic, but with a better constant. However, stack based seems to cost better on all LMULs, so we can just go with the simpler scheme. Arguably, this patch is fixing a regression introduced with my D149667 as before that change, we'd always fallback to the stack, and thus didn't have the non-linearity. Differential Revision: https://reviews.llvm.org/D159332	2023-09-05 09:03:26 -07:00
Luke Lau	6098d7d5f6	[RISCV] Lower shuffles as rotates without zvbb Now that the codegen for the expanded ISD::ROTL sequence has been improved, it's probably profitable to lower a shuffle that's a rotate to the vsll+vsrl+vor sequence to avoid a vrgather where possible, even if we don't have the vror instruction. This patch relaxes the restriction on ISD::ROTL being legal in lowerVECTOR_SHUFFLEAsRotate. It also attempts to do the lowering twice: Once if zvbb is enabled before any of the interleave/deinterleave/vmerge lowerings, and a second time unconditionally just before it falls back to the vrgather. This way it doesn't interfere with any of the above patterns that may be more profitable than the expanded ISD::ROTL sequence. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D159353	2023-09-04 09:35:12 +01:00
Kazu Hirata	e2e68468f5	[RISCV] Use isNullConstant (NFC)	2023-09-04 00:31:38 -07:00
Matt Arsenault	b14e83d1a4	IR: Add llvm.exp10 intrinsic We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10 to fix this asymmetry. AMDGPU already has most of the code for f32 exp10 expansion implemented alongside exp, so the current implementation is duplicating nearly identical effort between the compiler and library which is inconvenient. https://reviews.llvm.org/D157871	2023-09-01 19:45:03 -04:00
Craig Topper	319aba645f	[RISCV] Teach MatInt to use (ADD_UW X, (SLLI X, 32)) to materialize some constants. If the high and low 32 bits are the same, we try to use (ADD X, (SLLI X, 32)) but that only works if bit 31 is clear since the low 32 bits will be sign extended. If we have Zba we can use add.uw to zero the sign extended bits. Reviewed By: reames, wangpc Differential Revision: https://reviews.llvm.org/D159253	2023-08-31 20:24:34 -07:00
Luke Lau	1664eb05d0	[RISCV] Fix crash during i1 vector bitreverse lowering A shuffle of v256i1 with a large enough minimum vlen might make it through type legalization and into lowering. In this case, zvl1024b was enough. The bitreverse shuffle lowering would then try to convert this to a v1i256 type which is invalid (v1i128 exists though, which is why the existing v128i1 tests were fine). This patch checks to make sure that the new type is not only legal but also valid. Reviewed By: craig.topper, reames Differential Revision: https://reviews.llvm.org/D159215	2023-08-31 19:39:08 +01:00
Luke Lau	7b33f60f13	[RISCV] Remove vmv_v_x_vl workaround for constant splat. NFC Now that DAG.getConstant uses splat_vector_parts if needed on RV32, we can use it directly without having to manually lower to a vmv_v_x_vl. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D159287	2023-08-31 19:36:09 +01:00
Philip Reames	3e89aca446	[RISCV] Rename getELEN to getELen [nfc] Let's follow the naming scheme used for DLen, XLen, and FLen.	2023-08-31 11:27:00 -07:00
Craig Topper	d1c3784adf	[RISCV] Prefer ShortForwardBranch over the fully generic Zicond expansion. Short forward branch is shorter than (or (czero.eqz), (czero.nez)). Reviewed By: reames Differential Revision: https://reviews.llvm.org/D159295	2023-08-31 11:07:35 -07:00
Philip Reames	079c968eb9	[RISCV] Form vmv.s.f/x from single element splats via DAG combine This re-implements the special casing we had in lowerScalarSplat as a DAG combine. As can be seen in the tests, this ends up triggering in a bunch more cases. The semantically interesting bit of this change is the use of the implicit truncate semantics for when XLEN > SEW. We'd already been doing this for vmv.v.x, but this change extends e.g. the constant matching to make the same assumption about vmv.s.x. Per my reading of the specification, this should be fine, and if anything, is more obviously true of vmv.s.x than vmv.v.x. Differential Revision: https://reviews.llvm.org/D158874	2023-08-30 12:44:36 -07:00
Philip Reames	fd465f377c	[RISCV] Move vmv_s_x and vfmv_s_f special casing to DAG combine We'd discussed this in the original set of patches months ago, but decided against it. I think we should reverse ourselves here as the code is significantly more readable, and we do pick up cases we'd missed by not calling the appropriate helper routine. Differential Revision: https://reviews.llvm.org/D158854	2023-08-30 12:04:48 -07:00
Luke Lau	976244bb84	[RISCV] Canonicalize vrot{l,r} to vrev8 when lowering shuffle as rotate A rotate of 8 bits of an e16 vector in either direction is equivalent to a byteswap, i.e. vrev8. There is a generic combine on ISD::ROT{L,R} to canonicalize these rotations to byteswaps, but on fixed vectors they are legalized before they have the chance to be combined. This patch teaches the rotate vector_shuffle lowering to emit these rotations as byteswaps to match the scalable vector behaviour. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D158195	2023-08-30 11:01:49 +01:00
Luke Lau	a61c4a0ef6	[RISCV][SelectionDAG] Lower shuffles as bitrotates with vror.vi when possible Given a shuffle mask like <3, 0, 1, 2, 7, 4, 5, 6> for v8i8, we can reinterpret it as a shuffle of v2i32 where the two i32s are bit rotated, and lower it as a vror.vi (if legal with zvbb enabled). We also need to make sure that the larger element type is a valid SEW, hence the tests for zve32x. X86 already did this, so I've extracted the logic for it and put it inside ShuffleVectorSDNode so it could be reused by RISC-V. I originally tried to add this as a generic combine in DAGCombiner.cpp, but it ended up causing worse codegen on X86 and PPC. Reviewed By: reames, pengfei Differential Revision: https://reviews.llvm.org/D157417	2023-08-30 11:01:47 +01:00
Craig Topper	7b5cf52f32	[RISCV] Improve splatPartsI64WithVL for fixed vector constants where Hi and Lo are the same and the VL is constant. If doubling the VL will fit in a vsetivli, use it. It will be cheap to change and cheap to change back. This improves codegen from D158896. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D158896	2023-08-29 09:27:48 -07:00
Craig Topper	398c855457	[RISCV] Improve splatPartsI64WithVL for vlmax scalable vector constants where Hi and Lo are the same. We can use a 32-bit splat and bitcast to i64 vector. This only handles the case where we are using vlmax so that the new vl is cheap to compute. This could be generalized to double the VL. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D158879	2023-08-25 14:15:41 -07:00
Craig Topper	4184bafa9b	[RISCV] Refactor lowerSPLAT_VECTOR_PARTS to use splatPartsI64WithVL for scalable vectors. There was quite a bit of duplication between splatPartsI64WithVL and the scalable vector handling in lowerSPLAT_VECTOR_PARTS, but scalable vector had one additional case. Move that case to splatPartsI64WithVL which improves some fixed vector tests. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D158876	2023-08-25 14:15:40 -07:00
LiaoChunyu	1b12427c01	[VP][RISCV] Add vp.is.fpclass and RISC-V support There is no vp.fpclass after FCLASS_VL(D151176), try to support vp.fpclass. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152993	2023-08-25 15:40:55 +08:00
Luke Lau	e772c0ecd8	[RISCV] Use vmv.v.x if Hi bits are undef when lowering splat_vector_parts When lowering a splat_vector_parts, if the hi bits are undefined then we can splat the lo bits without having to check if it's going to be sign extended or not, because those bits will be undefined anyway. I've handled it for both fixed and scalable vectors, but there's no diff on the scalable vror tests, since the hi bits aren't combined away to undef in SimplifyDemanded for scalable vectors. I'm not sure why that is. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158625	2023-08-24 12:19:09 +01:00
Luke Lau	06d3ee9603	[RISCV] Fix wrong operand being used for VL in shift combine At some point a merge operand was added to the binary vl ops, so this combine was using the mask for the VL. This causes a crash when trying to select the vmv_v_x_vl, which showed up locally when messing about with selectVSplat, but thankfully in ToT the vmv_v_x_vl gets pattern matched away into the .vx and .vi operands every time, so there's no noticeable change. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158634	2023-08-23 17:44:21 +01:00
Jianjian GUAN	879e801a91	[RISCV] Apply promotion for f16 vector ops when only have zvfhmin For most fp16 vector ops, we can promote them to fp32 vector ops when zvfhmin is enabled but zvfh is not. But for nxv32f16, we need to split it first since nxv32f32 is not a valid MVT. Reviewed By: michaelmaitland Differential Revision: https://reviews.llvm.org/D153848	2023-08-23 16:49:20 +08:00
Jianjian GUAN	759903568f	[RISCV] Add Zvfhmin extension support for llvm RISCV backend This patch supports Zvfhmin for RISCV codegen. Reviewed By: michaelmaitland Differential Revision: https://reviews.llvm.org/D151414	2023-08-23 16:47:47 +08:00
Philip Reames	c3b48ec6ff	[RISCV] Match strided loads with reversed indexing sequences This extends the concat_vector of loads to strided_load transform to handle reversed index pattern. The previous code expected indexing of the form (a0, a1+S, a2+S,...). However, we can also see indexing of the form (a1+S, a2+S, a3+S, .., aS). This form is a strided load starting at address aN + S(n-1) with stride -S. Note that this is also fixing what looks to be a bug in the memory location reasoning for forward strided case. A strided load with negative stride access eltsize bytes past base ptr, and then bytes before* base ptr. (That is, the range should extend from before base ptr to after base ptr.) Differential Revision: https://reviews.llvm.org/D157886	2023-08-22 07:59:49 -07:00
Philip Reames	ecb855a5a8	[RISCV] Reduce LMUL for vector extracts If we have a known (or bounded) index which definitely fits in a smaller LMUL register group size, we can reduce the LMUL of the slide and extract instructions. This loosens constraints on register allocation, and allows the hardware to do less work, at the potential cost of some additional VTYPE toggles. In practice, we appear (after prior patches) to do a decent job of eliminating the additional VTYPE toggles in most cases. Differential Revision: https://reviews.llvm.org/D158460	2023-08-22 07:36:17 -07:00
Craig Topper	b441fd60b2	[RISCV] Separate hasRoundModeOpNum into separate VXRM and FRM functions. Preparation for developing a new rounding mode insertion algorithm that is going to be different between them since VXRM doesn't need to be save/restored. This also unifies the FRM handling in RISCVISelLowering.cpp between scalar and vector. Fixes outdated comments in RISCVAsmPrinter and sorts the predicate function by the reverse order of the operands being skipped. Reviewed By: eopXD Differential Revision: https://reviews.llvm.org/D158326	2023-08-21 10:00:23 -07:00
Craig Topper	078eb4bd85	[RISCV] Fix a UBSAN failure for passing INT64_MIN to std::abs. clang recently started checking for INT64_MIN being passed to 64-bit std::abs. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D158304	2023-08-18 12:47:52 -07:00
Craig Topper	42dad521e3	[RISCV] Add RISCVII::getRoundModeOpNum to reduce code duplication. NFC	2023-08-16 12:00:02 -07:00
wangpc	ac00cca3d9	[RISCV] Fix assertion when passing f64 vectors via integer registers The vector arguments are split but assignments won't be pending. Fixes #64645 Reviewed By: asb Differential Revision: https://reviews.llvm.org/D157847	2023-08-15 12:11:08 +08:00
Luke Lau	9f369a4c43	[RISCV] Lower reverse shuffles of fixed i1 vectors to vbrev.v If we can fit an entire vector of i1 into a single element, e.g. v32i1 -> v1i32, then we can reverse it via vbrev.v. We need to handle the case where the vector doesn't exactly fit into the larger element type, e.g. v4i1 -> v1i8. In this case we shift up the reversed bits afterwards. Reviewed By: fakepaper56, 4vtomat Differential Revision: https://reviews.llvm.org/D157614	2023-08-14 16:36:58 +01:00
wangpc	8a98f24ec5	[RISCV] Truncate constants to EltSize when combine store of BUILD_VECTOR The constants can be with larger bit width, so we need to truncate them to EltSize or we will exceed the width of fixed-length vector. Fixes #64588 Reviewed By: luke, craig.topper, bjope, michaelmaitland Differential Revision: https://reviews.llvm.org/D157603	2023-08-14 10:55:53 +08:00
Craig Topper	2df9328fe3	[RISCV] Stop performFP_TO_INTCombine from folding with ISD::FRINT. FRINT was added to matchRoundingOp after this function was written. So FRINT was not tested originally. For vectors, folding this causes us to create a CSR swap that tries to write 7 to FRM. This is an illegal value and will cause the CSR write to fail. While this might be a legal fold we could do, I'm disabling it for now so we can backport to LLVM 17 with the least risk. Differential Revision: https://reviews.llvm.org/D157583	2023-08-10 09:30:36 -07:00
Patrick O'Neill	fcad2bbcfc	[RISC-V] Add proposed mapping for Ztso Currently LLVM emits Ztso code for fences, loads, and stores (behind an experimental flag) [1]. This patch updates the mapping and implements support for LR/SC and AMO ops. This updated mapping is compatible with the RVWMO ABI present in the psABI. Additional context can be found in the psABI pull request [2]. [1] https://reviews.llvm.org/D143076 [2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/391 Differential Revision: https://reviews.llvm.org/D155517	2023-08-10 15:59:06 +01:00
Luke Lau	5d510ea724	[RISCV] Lower vro{l,r} for fixed vectors We need to add new VL nodes to mirror ISD::ROTL and ISD::ROTR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157295	2023-08-08 09:47:00 +01:00
Luke Lau	768740ef77	[RISCV] Lower unary zvbb ops for fixed vectors This reuses the same strategy for fixed vectors as other ops, i.e. custom lower to a scalable *_vl SD node. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157294	2023-08-08 09:46:57 +01:00
Philip Reames	f0a9aacdb9	[RISCV] Use vmv.s.x for a constant build_vector when the entire size is less than 32 bits We have a variant of this for splats already, but hadn't handled the case where a single copy of the wider element can be inserted producing the entire required bit pattern. This shows up mostly in very small vector shuffle tests. Differential Revision: https://reviews.llvm.org/D157299	2023-08-07 17:15:05 -07:00
Craig Topper	7cc615413f	[RISCV] Add back handling of X > -1 to ISD::SETCC lowering. There are cases where the -1 doesn't become visible until lowering so the folding doesn't have a chance to run. I think in these cases there is a missed DAGCombine for truncate (undef), which I may fix separately, but RISC-V backend should protect itself. Fixes #64503. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D157314	2023-08-07 13:00:57 -07:00
Philip Reames	47fe3b3b9a	[RISCV] Use v(f)slide1down for build_vector with dominant values If we have a dominant value, we can still use a v(f)slide1down to handle the last value in the vector if that value is neither undef nor the dominant value. Note that we can extend this idea to any tail of elements, but that ends up being a near complete merge of the v(f)slide1down insert path, and requires a bit more untangling on profitability heuristics first. Differential Revision: https://reviews.llvm.org/D157120	2023-08-07 07:54:29 -07:00
Alex Bradbury	7a1b2adc45	[RISCV] Implement straight-forward bf16<->int conversion cases This ports over the test cases half-convert.ll and implements patterns or RISCVISelLowering.cpp changes for all of the most straight-forward cases (those that don't require changes outside of lib/Target/RISCV). The remaining cases and noted poor codegen for saturating conversions will be handled in follow-up patches. Differential Revision: https://reviews.llvm.org/D156943	2023-08-07 11:12:51 +01:00
Craig Topper	f36bbb0bd2	[RISCV] Use static_assert to check ranges in hasMergeOp and hasMaskOp. If the ranges are wrong it is better to catch at compile time.	2023-08-04 13:23:23 -07:00
Philip Reames	9f4a2a8636	[RISCV] Separate lowering of constant build vector into a helper [nfc] We have a bunch of special casing for constant vectors, and the costing is generally different. Separate out the logic so that it's easier to follow.	2023-08-04 08:38:18 -07:00
Craig Topper	814250191d	[RISCV] Add vector legalization for fmaximum/fminimum. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D156937	2023-08-04 08:07:14 -07:00
Bjorn Pettersson	4ce7c4a92a	[llvm] Drop some typed pointer handling/bitcasts Differential Revision: https://reviews.llvm.org/D157016	2023-08-03 22:54:33 +02:00
Craig Topper	a8c502a589	[RISCV] Add bf16 to isFPImmLegal. Part of this test file was stolen from D156895. We should merge them when committing. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156926	2023-08-03 08:27:38 -07:00
Alex Bradbury	8a71f44e00	[RISCV] Expand test coverage of bf16 operations with Zfbfmin and fix gaps This doesn't bring us to parity with the test/CodeGen/RISCV/half-* test cases, it simply picks off an initial set that can be supported especially easy. In order to make the review more manageable, I'll follow up with other cases. There is zero innovation in the test cases - they simply take the existing half/float cases and replace f16->bf16 and half->bfloat. Differential Revision: https://reviews.llvm.org/D156895	2023-08-03 07:06:57 +01:00
Jim Lin	40cc106fa0	[RISCV] Scalarize binop followed by extractelement to custom lowered instruction isOperationLegalOrCustomOrPromote returns true only if VT is other or legal and operation action is Legal, Custom or Promote. Permit a vector binary operation to be converted to a scalar binary operation that is custom lowered with an illegal type. One such case: i32 isn't a legal type on RV64 and its ALU operations are set to custom lowering, so vadd for element type i32 can be converted to addw. Reviewed By: jacquesguan, craig.topper Differential Revision: https://reviews.llvm.org/D156692	2023-08-03 13:02:49 +08:00
Yeting Kuo	cd79599304	[RISCV] Teach lowerScalarInsert to handle scalar value is the first element of a fixed vector. D155929 taught lowerScalarInsert to handle a start value of (extractelement scalable_vector, 0) and specifically converts fixed extracted vectors to scalable vectors when lowering vector reduction. That is not enough because there is another way to create (extractelement fixed_vector, 0) as a start value of lowerScalarInsert, as in #64327. #64327: https://github.com/llvm/llvm-project/issues/64327. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156863	2023-08-03 10:53:14 +08:00
Alex Bradbury	667602793b	[RISCV] Implement support for bf16 select when zfbfmin is enabled These test cases previously caused an error. RISCVInstrInfo::copyPhysReg also needed a tweak in order to account for copying bf16 values in FPR16 registers. Differential Revision: https://reviews.llvm.org/D156883	2023-08-02 20:04:30 +01:00
Craig Topper	d8f9663f1a	[RISCV] Rename RISCVISD::FMINNUM_VL/FMAXNUM_VL to VFMIN_VL/VFMAX_VL. NFC I want these to have RISC-V semantics not LLVM IR semantics. Specifically that -0.0 comes before +0.0. This is needed to emulate FMAXIMUM/FMINIMUM for vectors.	2023-08-02 11:53:06 -07:00
4vtomat	346c1f2641	[RISCV] Support vector crypto extension LLVM IR Depends on D141672 Differential Revision: https://reviews.llvm.org/D138809	2023-08-02 10:25:36 -07:00
Alex Bradbury	be0dac268d	[RISCV] Improve codegen for i8/i16 'atomicrmw xchg a, {0,-1}' As noted in <https://github.com/llvm/llvm-project/issues/64090>, it's more efficient to lower a partword `atomicrmw xchg a, 0` to an amoand with an appropriate mask. There are a range of possible ways to go about this - e.g. writing a combine based on the `llvm.riscv.masked.atomicrmw.xchg` intrinsic, or introducing a new interface to AtomicExpandPass to allow target-specific atomics conversions, or trying to lift the conversion into AtomicExpandPass itself based on querying some target hook. Ultimately I've gone with what appears to be the simplest approach - just covering this case in emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a different name way back when it was introduced. This also handles the `atomicrmw xchg a, -1` case suggested by Craig during review. Fixes https://github.com/llvm/llvm-project/issues/64090 Differential Revision: https://reviews.llvm.org/D156801	2023-08-02 09:48:50 +01:00
Philip Reames	e938217f81	[RISCV] Implement getOptimalMemOpType for memcpy/memset lowering This patch implements the getOptimalMemOpType callback which is used by the generic mem* lowering in SelectionDAG to pick the widest type used. This patch only changes the behavior when vector instructions are available, as the default is reasonable for scalar. Without this change, we were emitting either XLEN sized stores (for aligned operations) or byte sized stores (for unaligned operations.) Interestingly, the final codegen was nowhere near as bad as that would seem to imply. Generic load combining and store merging kicked in, and frequently (but not always) produced pretty reasonable vector code. The primary effects of this change are: * Enable the use of vector operations for memset of non-constant. Our generic store merging logic doesn't know how to merge a broadcast store, and thus we were seeing the generic (and awful) byte expansion lowering for unaligned memset. * Enable the generic misaligned overlap trick where we write to some of the same bytes twice. The alternative is to either a) use an increasingly small sequence of stores for the tail or b) use VL to restrict the vector store. The latter is not implemented at this time, so the former is what previously happened. Interestingly, I'm not sure that changing VL (as opposed to the overlap trick) is even obviously profitable here. Differential Revision: https://reviews.llvm.org/D156249	2023-08-01 12:14:50 -07:00
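
Illustration for commit 319aba645f (MatInt with ADD_UW). The message says that (ADD X, (SLLI X, 32)) only works when bit 31 is clear, and that add.uw fixes the other case. The sketch below models that arithmetic in plain C++. It is not the MatInt code; all function names are hypothetical, and add.uw is modeled as "rs2 plus the zero-extended low 32 bits of rs1".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a 32-bit value as it sits sign-extended in a 64-bit register.
constexpr uint64_t signExtend32(uint32_t lo) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(lo)));
}

// Hypothetical model of Zba add.uw: rs2 + zext(rs1[31:0]).
constexpr uint64_t addUW(uint64_t rs1, uint64_t rs2) {
  return rs2 + (rs1 & 0xFFFFFFFFull);
}

// The constant we want: the same 32-bit pattern in both halves.
constexpr uint64_t repeatHalves(uint32_t lo) {
  return (static_cast<uint64_t>(lo) << 32) | lo;
}

// (ADD X, (SLLI X, 32)): the sign-extension bits of X leak into the high half.
constexpr uint64_t materializeWithAdd(uint32_t lo) {
  uint64_t x = signExtend32(lo);
  return x + (x << 32);
}

// (ADD_UW X, (SLLI X, 32)): add.uw zeroes the sign-extension bits first.
constexpr uint64_t materializeWithAddUW(uint32_t lo) {
  uint64_t x = signExtend32(lo);
  return addUW(x, x << 32);
}

// Small self-check covering one value with bit 31 clear and one with it set.
inline void checkMaterialization() {
  assert(materializeWithAdd(0x12345678u) == repeatHalves(0x12345678u));
  assert(materializeWithAdd(0x80000001u) != repeatHalves(0x80000001u));
  assert(materializeWithAddUW(0x80000001u) == repeatHalves(0x80000001u));
}
```

With bit 31 clear both sequences agree; with bit 31 set only the add.uw form reproduces the repeated pattern.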

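Illustration for commit a61c4a0ef6 (shuffles as bit rotates). The message states that the v8i8 mask <3, 0, 1, 2, 7, 4, 5, 6> can be reinterpreted as a bit rotate of each i32 in a v2i32. The sketch below checks that claim on plain arrays. It assumes little-endian byte order within each 32-bit element, and every name in it is hypothetical; it does not use any LLVM API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8i8 = std::array<uint8_t, 8>;

// Generic shuffle: result[i] = src[mask[i]].
inline V8i8 shuffleBytes(const V8i8 &src, const std::array<int, 8> &mask) {
  V8i8 out{};
  for (int i = 0; i < 8; ++i)
    out[i] = src[mask[i]];
  return out;
}

// Rotate a 32-bit value left by n bits (n taken modulo 32).
inline uint32_t rotl32(uint32_t x, unsigned n) {
  n &= 31u;
  return n == 0 ? x : (x << n) | (x >> (32u - n));
}

// Treat the 8 bytes as two little-endian i32 lanes and rotate each lane left.
inline V8i8 rotateLanesLeft(const V8i8 &src, unsigned bits) {
  V8i8 out{};
  for (int lane = 0; lane < 2; ++lane) {
    uint32_t x = 0;
    for (int b = 0; b < 4; ++b)
      x |= static_cast<uint32_t>(src[lane * 4 + b]) << (8 * b);
    x = rotl32(x, bits);
    for (int b = 0; b < 4; ++b)
      out[lane * 4 + b] = static_cast<uint8_t>(x >> (8 * b));
  }
  return out;
}

// The mask from the commit message equals a left rotate by 8 bits per lane
// (equivalently a right rotate by 24, the form a vror.vi immediate would take).
inline bool rotateMatchesShuffle(const V8i8 &src) {
  const std::array<int, 8> mask = {3, 0, 1, 2, 7, 4, 5, 6};
  return shuffleBytes(src, mask) == rotateLanesLeft(src, 8);
}
```

For example, rotateMatchesShuffle applied to the bytes 1 through 8 holds, since each lane's top byte moves to the bottom and the other three bytes move up by one position.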

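Illustration for commit de34d39b66 (capping build vector cost). The llvm-mca totals quoted in that message can be tabulated to see the growth rates it describes. The cycle counts below are copied from the message; the struct, the helper names, and the linear fit are assumptions made here for illustration, not part of the patch.

```cpp
#include <array>
#include <cassert>

// One row per LMUL, using the sifive-x280 llvm-mca totals quoted in the commit message.
struct McaSample {
  int lmul;
  long stackCycles;  // buildvector_m*_via_stack
  long slideCycles;  // buildvector_m*_via_vslide1down
};

constexpr std::array<McaSample, 4> kSamples = {{
    {1, 1904, 804},
    {2, 2104, 1604},
    {4, 2504, 6400},
    {8, 3304, 25599},
}};

// Growth factor between two measurements.
constexpr double growth(long from, long to) {
  return static_cast<double>(to) / static_cast<double>(from);
}

// Observation made here, not stated in the commit: the four stack numbers
// fit 1704 + 200 * LMUL exactly, i.e. they are linear in LMUL.
constexpr long stackLinearFit(int lmul) { return 1704 + 200L * lmul; }

// Check that the fit reproduces every quoted stack total.
inline bool stackFitMatchesSamples() {
  for (const McaSample &s : kSamples)
    if (stackLinearFit(s.lmul) != s.stackCycles)
      return false;
  return true;
}
```

Going from m4 to m8, growth(6400, 25599) is close to 4 for the vslide1down lowering, which is what a quadratic cost predicts when LMUL doubles, while the stack lowering grows by roughly a third over the same step.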
1241 Commits