1241 Commits

Author SHA1 Message Date
Philip Reames
de34d39b66 [RISCV] Cap build vector cost to avoid quadratic cost at high LMULs
Each vslide1down operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique inserts, each with a cost linear in LMUL, the overall cost is O(VL*LMUL). Since VL is a linear function of LMUL, this means the current lowering is quadratic in both LMUL and VL. To avoid the degenerate case, fall back to the stack if the cost is more than a fixed (linear) threshold.

For context, here's the sifive-x280 llvm-mca results for the current lowering and stack based lowering for each LMUL (using e64). Assumes code was compiled for V (i.e. zvl128b).
  buildvector_m1_via_stack.mca:Total Cycles: 1904
  buildvector_m2_via_stack.mca:Total Cycles: 2104
  buildvector_m4_via_stack.mca:Total Cycles: 2504
  buildvector_m8_via_stack.mca:Total Cycles: 3304
  buildvector_m1_via_vslide1down.mca:Total Cycles:  804
  buildvector_m2_via_vslide1down.mca:Total Cycles:  1604
  buildvector_m4_via_vslide1down.mca:Total Cycles:  6400
  buildvector_m8_via_vslide1down.mca:Total Cycles: 25599
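
As a rough check of the scaling claim, the cycle counts listed above can be compared across LMULs. This is only a sketch over the quoted numbers; the dictionary and function names are illustrative and not part of the patch.

```python
# Total cycles from the llvm-mca results above, keyed by LMUL (e64, zvl128b).
via_stack = {1: 1904, 2: 2104, 4: 2504, 8: 3304}
via_vslide1down = {1: 804, 2: 1604, 4: 6400, 8: 25599}

def growth(cycles):
    """Ratio of total cycles each time LMUL doubles."""
    lmuls = sorted(cycles)
    return [cycles[b] / cycles[a] for a, b in zip(lmuls, lmuls[1:])]

stack_growth = growth(via_stack)
slide_growth = growth(via_vslide1down)

# Extra stack cycles per additional unit of LMUL: constant if cost is linear.
stack_slope = [
    (via_stack[b] - via_stack[a]) / (b - a)
    for a, b in zip([1, 2, 4], [2, 4, 8])
]

for name, ratios in (("stack", stack_growth), ("vslide1down", slide_growth)):
    print(name, [round(r, 2) for r in ratios])
print("stack cycles per LMUL step:", stack_slope)
```

On these numbers the stack lowering grows by a constant amount per unit of LMUL, while the vslide1down lowering roughly quadruples for each doubling from m2 upward.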

There are other schemes we could use to cap the cost. The next best is recursive decomposition of the vector into smaller LMULs. That's still quadratic, but with a better constant. However, stack based seems to cost better on all LMULs, so we can just go with the simpler scheme.

Arguably, this patch fixes a regression introduced by my D149667: before that change, we'd always fall back to the stack, and thus didn't have the non-linearity.

Differential Revision: https://reviews.llvm.org/D159332
2023-09-05 09:03:26 -07:00
Luke Lau
6098d7d5f6 [RISCV] Lower shuffles as rotates without zvbb
Now that the codegen for the expanded ISD::ROTL sequence has been improved,
it's probably profitable to lower a shuffle that's a rotate to the
vsll+vsrl+vor sequence to avoid a vrgather where possible, even if we don't
have the vror instruction.

This patch relaxes the restriction on ISD::ROTL being legal in
lowerVECTOR_SHUFFLEAsRotate. It also attempts to do the lowering twice: Once
if zvbb is enabled before any of the interleave/deinterleave/vmerge lowerings,
and a second time unconditionally just before it falls back to the vrgather.
This way it doesn't interfere with any of the above patterns that may be more
profitable than the expanded ISD::ROTL sequence.
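
For reference, the expanded rotate amounts to two shifts and an OR per element. The sketch below models that on plain integers; the function name and the per-element loop are illustrative assumptions, not the actual lowering code.

```python
def rotl_expanded(x, amount, sew):
    """Rotate-left of one SEW-bit element using only shifts and an OR,
    mirroring the vsll + vsrl + vor sequence."""
    mask = (1 << sew) - 1
    amount %= sew
    left = (x << amount) & mask                    # vsll
    right = (x & mask) >> ((sew - amount) % sew)   # vsrl
    return left | right                            # vor

# Applied element-wise to a small e32 vector.
vec = [0x12345678, 0x000000FF]
print([hex(rotl_expanded(e, 8, 32)) for e in vec])
```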

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D159353
2023-09-04 09:35:12 +01:00
Kazu Hirata
e2e68468f5 [RISCV] Use isNullConstant (NFC) 2023-09-04 00:31:38 -07:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Craig Topper
319aba645f [RISCV] Teach MatInt to use (ADD_UW X, (SLLI X, 32)) to materialize some constants.
If the high and low 32 bits are the same, we try to use
(ADD X, (SLLI X, 32)) but that only works if bit 31 is clear since
the low 32 bits will be sign extended.

If we have Zba we can use add.uw to zero the sign extended bits.
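
The difference between the two sequences can be seen by emulating 64-bit registers. This is a minimal sketch; the helper names are made up for illustration and only model the instruction semantics.

```python
MASK64 = (1 << 64) - 1

def sext32(v):
    """Sign-extend the low 32 bits to a 64-bit register value."""
    v &= 0xFFFFFFFF
    return (v | 0xFFFFFFFF00000000) if v & 0x80000000 else v

def slli(x, sh):
    return (x << sh) & MASK64

def add(a, b):
    return (a + b) & MASK64

def add_uw(rs1, rs2):
    """Zba add.uw: zero-extend the low 32 bits of rs1, then add rs2."""
    return ((rs1 & 0xFFFFFFFF) + rs2) & MASK64

def materialize(lo32, use_add_uw):
    x = sext32(lo32)   # the 32-bit value arrives sign extended
    hi = slli(x, 32)
    return add_uw(x, hi) if use_add_uw else add(x, hi)

# Bit 31 set: plain ADD corrupts the upper half, add.uw does not.
print(hex(materialize(0x80000001, use_add_uw=False)))
print(hex(materialize(0x80000001, use_add_uw=True)))
```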

Reviewed By: reames, wangpc

Differential Revision: https://reviews.llvm.org/D159253
2023-08-31 20:24:34 -07:00
Luke Lau
1664eb05d0 [RISCV] Fix crash during i1 vector bitreverse lowering
A shuffle of v256i1 with a large enough minimum vlen might make it through type
legalization and into lowering. In this case, zvl1024b was enough. The
bitreverse shuffle lowering would then try to convert this to a v1i256 type
which is invalid (v1i128 exists though, which is why the existing v128i1 tests
were fine).

This patch checks to make sure that the new type is not only legal but also
valid.

Reviewed By: craig.topper, reames

Differential Revision: https://reviews.llvm.org/D159215
2023-08-31 19:39:08 +01:00
Luke Lau
7b33f60f13 [RISCV] Remove vmv_v_x_vl workaround for constant splat. NFC
Now that DAG.getConstant uses splat_vector_parts if needed on RV32, we can use
it directly without having to manually lower to a vmv_v_x_vl.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D159287
2023-08-31 19:36:09 +01:00
Philip Reames
3e89aca446 [RISCV] Rename getELEN to getELen [nfc]
Let's follow the naming scheme used for DLen, XLen, and FLen.
2023-08-31 11:27:00 -07:00
Craig Topper
d1c3784adf [RISCV] Prefer ShortForwardBranch over the fully generic Zicond expansion.
Short forward branch is shorter than (or (czero.eqz), (czero.nez)).
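
For context, the generic Zicond expansion of a select can be modeled as below. This sketch only illustrates the (or (czero.eqz), (czero.nez)) pattern; the function names are illustrative.

```python
def czero_eqz(rs1, rs2):
    """Zicond czero.eqz: result is 0 if rs2 == 0, otherwise rs1."""
    return 0 if rs2 == 0 else rs1

def czero_nez(rs1, rs2):
    """Zicond czero.nez: result is 0 if rs2 != 0, otherwise rs1."""
    return 0 if rs2 != 0 else rs1

def select_zicond(cond, tval, fval):
    # Exactly one of the two czero results is non-zeroed, so OR picks it.
    return czero_eqz(tval, cond) | czero_nez(fval, cond)

print(select_zicond(1, 5, 9), select_zicond(0, 5, 9))
```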

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D159295
2023-08-31 11:07:35 -07:00
Philip Reames
079c968eb9 [RISCV] Form vmv.s.f/x from single element splats via DAG combine
This re-implements the special casing we had in lowerScalarSplat as a DAG combine. As can be seen in the tests, this ends up triggering in a bunch more cases.

The semantically interesting bit of this change is the use of the implicit truncate semantics for when XLEN > SEW. We'd already been doing this for vmv.v.x, but this change extends e.g. the constant matching to make the same assumption about vmv.s.x. Per my reading of the specification, this should be fine, and if anything, is more obviously true of vmv.s.x than vmv.v.x.

Differential Revision: https://reviews.llvm.org/D158874
2023-08-30 12:44:36 -07:00
Philip Reames
fd465f377c [RISCV] Move vmv_s_x and vfmv_s_f special casing to DAG combine
We'd discussed this in the original set of patches months ago, but decided against it. I think we should reverse ourselves here as the code is significantly more readable, and we do pick up cases we'd missed by not calling the appropriate helper routine.

Differential Revision: https://reviews.llvm.org/D158854
2023-08-30 12:04:48 -07:00
Luke Lau
976244bb84 [RISCV] Canonicalize vrot{l,r} to vrev8 when lowering shuffle as rotate
A rotate of 8 bits of an e16 vector in either direction is equivalent to a
byteswap, i.e. vrev8. There is a generic combine on ISD::ROT{L,R} to
canonicalize these rotations to byteswaps, but on fixed vectors they are
legalized before they have the chance to be combined. This patch teaches the
rotate vector_shuffle lowering to emit these rotations as byteswaps to match
the scalable vector behaviour.
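
The equivalence is easy to confirm exhaustively for 16-bit elements. A small sketch, with helper names chosen for illustration:

```python
def rotl16(x, amount):
    amount %= 16
    return ((x << amount) | (x >> (16 - amount))) & 0xFFFF

def rotr16(x, amount):
    return rotl16(x, 16 - (amount % 16))

def bswap16(x):
    """Swap the two bytes of a 16-bit value."""
    return ((x & 0xFF) << 8) | (x >> 8)

# Rotating by 8 in either direction matches a byteswap for every e16 value.
equivalent = all(
    rotl16(x, 8) == rotr16(x, 8) == bswap16(x) for x in range(1 << 16)
)
print(equivalent)
```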

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158195
2023-08-30 11:01:49 +01:00
Luke Lau
a61c4a0ef6 [RISCV][SelectionDAG] Lower shuffles as bitrotates with vror.vi when possible
Given a shuffle mask like <3, 0, 1, 2, 7, 4, 5, 6> for v8i8, we can
reinterpret it as a shuffle of v2i32 where the two i32s are bit rotated, and
lower it as a vror.vi (if legal with zvbb enabled).
We also need to make sure that the larger element type is a valid SEW, hence
the tests for zve32x.
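
The reinterpretation can be checked on concrete bytes. The sketch below assumes little-endian element layout and uses made-up byte values; with that layout the mask corresponds to a rotate left by 8 bits, which is the same as a rotate right by 24.

```python
import struct

def shuffle(src, mask):
    """result[i] = src[mask[i]], as in a single-source vector shuffle."""
    return [src[i] for i in mask]

def rotl32(x, amount):
    amount %= 32
    return ((x << amount) | (x >> (32 - amount))) & 0xFFFFFFFF

mask = [3, 0, 1, 2, 7, 4, 5, 6]
v8i8 = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]

shuffled = bytes(shuffle(v8i8, mask))

# Same data viewed as v2i32, with each element bit-rotated.
as_i32 = struct.unpack("<2I", bytes(v8i8))
rotated = struct.pack("<2I", *(rotl32(x, 8) for x in as_i32))

print(shuffled == rotated)
```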

X86 already did this, so I've extracted the logic for it and put it inside
ShuffleVectorSDNode so it could be reused by RISC-V. I originally tried to add
this as a generic combine in DAGCombiner.cpp, but it ended up causing worse
codegen on X86 and PPC.

Reviewed By: reames, pengfei

Differential Revision: https://reviews.llvm.org/D157417
2023-08-30 11:01:47 +01:00
Craig Topper
7b5cf52f32 [RISCV] Improve splatPartsI64WithVL for fixed vector constants where Hi and Lo are the same and the VL is constant.
If doubling the VL will fit in a vsetivli, use it. It will be cheap
to change and cheap to change back.
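
The idea can be illustrated by comparing byte images: an i64 splat whose halves are equal is the same memory as an i32 splat with twice the VL. In this sketch the helper names are hypothetical, and the limit of 31 reflects vsetivli's 5-bit unsigned AVL immediate.

```python
import struct

VSETIVLI_MAX_AVL = 31  # vsetivli encodes AVL as a 5-bit unsigned immediate

def splat_i64_bytes(hi, lo, vl):
    """Little-endian bytes of vl copies of the i64 built from (hi, lo)."""
    return struct.pack(f"<{vl}Q", *([(hi << 32) | lo] * vl))

def splat_i32_bytes(value, vl):
    """Little-endian bytes of vl copies of a 32-bit value."""
    return struct.pack(f"<{vl}I", *([value] * vl))

def can_double_vl(vl):
    """True if the doubled VL still fits in a vsetivli immediate."""
    return 2 * vl <= VSETIVLI_MAX_AVL

vl = 4
same = splat_i64_bytes(0xDEADBEEF, 0xDEADBEEF, vl) == splat_i32_bytes(0xDEADBEEF, 2 * vl)
print(same, can_double_vl(vl))
```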

This improves codegen from D158896.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158896
2023-08-29 09:27:48 -07:00
Craig Topper
398c855457 [RISCV] Improve splatPartsI64WithVL for vlmax scalable vector constants where Hi and Lo are the same.
We can use a 32-bit splat and bitcast to i64 vector.

This only handles the case where we are using vlmax so that the new
vl is cheap to compute. This could be generalized to double the VL.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158879
2023-08-25 14:15:41 -07:00
Craig Topper
4184bafa9b [RISCV] Refactor lowerSPLAT_VECTOR_PARTS to use splatPartsI64WithVL for scalable vectors.
There was quite a bit of duplication between splatPartsI64WithVL
and the scalable vector handling in lowerSPLAT_VECTOR_PARTS, but
scalable vector had one additional case. Move that case to
splatPartsI64WithVL which improves some fixed vector tests.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158876
2023-08-25 14:15:40 -07:00
LiaoChunyu
1b12427c01 [VP][RISCV] Add vp.is.fpclass and RISC-V support
After FCLASS_VL (D151176) there is still no vp.fpclass; this patch tries to support vp.fpclass.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152993
2023-08-25 15:40:55 +08:00
Luke Lau
e772c0ecd8 [RISCV] Use vmv.v.x if Hi bits are undef when lowering splat_vector_parts
When lowering a splat_vector_parts, if the hi bits are undefined then we can
splat the lo bits without having to check if it's going to be sign extended or
not, because those bits will be undefined anyway.

I've handled it for both fixed and scalable vectors, but there's no diff
on the scalable vror tests, since the hi bits aren't combined away to
undef in SimplifyDemanded for scalable vectors. I'm not sure why that is.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158625
2023-08-24 12:19:09 +01:00
Luke Lau
06d3ee9603 [RISCV] Fix wrong operand being used for VL in shift combine
At some point a merge operand was added to the binary vl ops, so this combine
was using the mask for the VL. This causes a crash when trying to
select the vmv_v_x_vl, which showed up locally when messing about with
selectVSplat, but thankfully in ToT the vmv_v_x_vl gets pattern matched
away into the .vx and .vi operands every time, so there's no noticeable
change.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D158634
2023-08-23 17:44:21 +01:00
Jianjian GUAN
879e801a91 [RISCV] Apply promotion for f16 vector ops when only have zvfhmin
For most fp16 vector ops, we can promote them to fp32 vector ops when zvfhmin is enabled but zvfh is not.
But for nxv32f16, we need to split it first since nxv32f32 is not a valid MVT.

Reviewed By: michaelmaitland

Differential Revision: https://reviews.llvm.org/D153848
2023-08-23 16:49:20 +08:00
Jianjian GUAN
759903568f [RISCV] Add Zvfhmin extension support for llvm RISCV backend
This patch supports Zvfhmin for RISCV codegen.

Reviewed By: michaelmaitland

Differential Revision: https://reviews.llvm.org/D151414
2023-08-23 16:47:47 +08:00
Philip Reames
c3b48ec6ff [RISCV] Match strided loads with reversed indexing sequences
This extends the concat_vector of loads to strided_load transform to handle reversed index pattern. The previous code expected indexing of the form (a0, a1+S, a2+S,...). However, we can also see indexing of the form (a1+S, a2+S, a3+S, .., aS). This form is a strided load starting at address aN + S*(n-1) with stride -S.

Note that this is also fixing what looks to be a bug in the memory location reasoning for the forward strided case. A strided load with negative stride accesses eltsize bytes past base ptr, and then bytes *before* base ptr. (That is, the range should extend from before base ptr to after base ptr.)
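
The address arithmetic behind the reversed pattern and the memory range can be sketched as follows. The helper names and the concrete addresses are illustrative assumptions; only the start/stride relationship comes from the description above.

```python
def strided_addresses(base, stride, n):
    """Element addresses touched by a strided load of n elements."""
    return [base + i * stride for i in range(n)]

def reversed_as_strided(base, stride, n):
    """A reversed forward sequence is a strided load that starts at the
    last element and uses the negated stride."""
    return base + stride * (n - 1), -stride

def touched_range(base, stride, n, eltsize):
    """Half-open byte range [lo, hi) covered by the access."""
    addrs = strided_addresses(base, stride, n)
    return min(addrs), max(addrs) + eltsize

forward = strided_addresses(0x1000, 16, 4)
start, neg_stride = reversed_as_strided(0x1000, 16, 4)
backward = strided_addresses(start, neg_stride, 4)

print(backward == forward[::-1])
# With a negative stride the range reaches below the base pointer and
# eltsize bytes past it.
print([hex(a) for a in touched_range(start, neg_stride, 4, 8)])
```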

Differential Revision: https://reviews.llvm.org/D157886
2023-08-22 07:59:49 -07:00
Philip Reames
ecb855a5a8 [RISCV] Reduce LMUL for vector extracts
If we have a known (or bounded) index which definitely fits in a smaller LMUL register group size, we can reduce the LMUL of the slide and extract instructions. This loosens constraints on register allocation, and allows the hardware to do less work, at the potential cost of some additional VTYPE toggles. In practice, we appear (after prior patches) to do a decent job of eliminating the additional VTYPE toggles in most cases.
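
As a rough illustration of the "fits in a smaller LMUL" test, the sketch below picks the smallest integer LMUL whose register group is guaranteed to contain a given index. The function name, the restriction to integer LMULs, and the default minimum VLEN of 128 are assumptions made for this example, not details of the patch.

```python
def smallest_lmul_for_index(max_index, sew, min_vlen=128):
    """Smallest integer LMUL whose register group is guaranteed to hold
    element `max_index`, given the minimum VLEN in bits. Returns None if
    even LMUL=8 is not guaranteed to hold it."""
    elts_per_reg = min_vlen // sew
    for lmul in (1, 2, 4, 8):
        if max_index < lmul * elts_per_reg:
            return lmul
    return None

# e64 with a 128-bit minimum VLEN: two elements per register.
print(smallest_lmul_for_index(1, 64), smallest_lmul_for_index(3, 64))
```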

Differential Revision: https://reviews.llvm.org/D158460
2023-08-22 07:36:17 -07:00
Craig Topper
b441fd60b2 [RISCV] Separate hasRoundModeOpNum into separate VXRM and FRM functions.
Preparation for developing a new rounding mode insertion algorithm
that is going to be different between them since VXRM doesn't need
to be saved/restored.

This also unifies the FRM handling in RISCVISelLowering.cpp between
scalar and vector.

Fixes outdated comments in RISCVAsmPrinter and sorts the predicate
function by the reverse order of the operands being skipped.

Reviewed By: eopXD

Differential Revision: https://reviews.llvm.org/D158326
2023-08-21 10:00:23 -07:00
Craig Topper
078eb4bd85 [RISCV] Fix a UBSAN failure for passing INT64_MIN to std::abs.
clang recently started checking for INT64_MIN being passed to 64-bit std::abs.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D158304
2023-08-18 12:47:52 -07:00
Craig Topper
42dad521e3 [RISCV] Add RISCVII::getRoundModeOpNum to reduce code duplication. NFC 2023-08-16 12:00:02 -07:00
wangpc
ac00cca3d9 [RISCV] Fix assertion when passing f64 vectors via integer registers
The vector arguments are split but assignments won't be pending.

Fixes #64645

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D157847
2023-08-15 12:11:08 +08:00
Luke Lau
9f369a4c43 [RISCV] Lower reverse shuffles of fixed i1 vectors to vbrev.v
If we can fit an entire vector of i1 into a single element, e.g. v32i1 ->
v1i32, then we can reverse it via vbrev.v.
We need to handle the case where the vector doesn't exactly fit into the larger
element type, e.g. v4i1 -> v1i8. In this case we shift up the reversed bits
afterwards.
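
A small model of this on plain integers: pack the i1 elements into one wider element, bit-reverse it, then realign. The realigning shift is modeled here as a logical right shift by the number of unused bits, which is an assumption about the shift described above; the helper names are illustrative.

```python
def brev(x, width):
    """Reverse the low `width` bits of x (models vbrev.v on one element)."""
    return int(format(x & ((1 << width) - 1), f"0{width}b")[::-1], 2)

def reverse_i1_vector(bits, container_width):
    """Reverse a list of i1 values by way of a single wider element."""
    n = len(bits)
    packed = sum(b << i for i, b in enumerate(bits))  # element i -> bit i
    # After the bit reverse, the payload sits in the top n bits.
    realigned = brev(packed, container_width) >> (container_width - n)
    return [(realigned >> i) & 1 for i in range(n)]

# v4i1 held in an 8-bit element, and a v8i1 that fits exactly.
print(reverse_i1_vector([1, 0, 0, 0], 8))
print(reverse_i1_vector([1, 1, 0, 1, 0, 0, 1, 0], 8))
```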

Reviewed By: fakepaper56, 4vtomat

Differential Revision: https://reviews.llvm.org/D157614
2023-08-14 16:36:58 +01:00
wangpc
8a98f24ec5 [RISCV] Truncate constants to EltSize when combine store of BUILD_VECTOR
The constants can have a larger bit width, so we need to truncate
them to EltSize or we will exceed the width of the fixed-length vector.

Fixes #64588

Reviewed By: luke, craig.topper, bjope, michaelmaitland

Differential Revision: https://reviews.llvm.org/D157603
2023-08-14 10:55:53 +08:00
Craig Topper
2df9328fe3 [RISCV] Stop performFP_TO_INTCombine from folding with ISD::FRINT.
FRINT was added to matchRoundingOp after this function was written.
So FRINT was not tested originally.

For vectors, folding this causes us to create a CSR swap that tries
to write 7 to FRM. This is an illegal value and will cause the CSR
write to fail.

While this might be a legal fold we could do, I'm disabling it for
now so we can backport to LLVM 17 with the least risk.

Differential Revision: https://reviews.llvm.org/D157583
2023-08-10 09:30:36 -07:00
Patrick O'Neill
fcad2bbcfc [RISC-V] Add proposed mapping for Ztso
Currently LLVM emits Ztso code for fences, loads, and stores (behind an
experimental flag) [1]. This patch updates the mapping and implements
support for LR/SC and AMO ops. This updated mapping is compatible with
the RVWMO ABI present in the psABI. Additional context can be found in
the psABI pull request [2].

[1] https://reviews.llvm.org/D143076
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/391

Differential Revision: https://reviews.llvm.org/D155517
2023-08-10 15:59:06 +01:00
Luke Lau
5d510ea724 [RISCV] Lower vro{l,r} for fixed vectors
We need to add new VL nodes to mirror ISD::ROTL and ISD::ROTR.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157295
2023-08-08 09:47:00 +01:00
Luke Lau
768740ef77 [RISCV] Lower unary zvbb ops for fixed vectors
This reuses the same strategy for fixed vectors as other ops, i.e. custom lower
to a scalable *_vl SD node.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157294
2023-08-08 09:46:57 +01:00
Philip Reames
f0a9aacdb9 [RISCV] Use vmv.s.x for a constant build_vector when the entire size is less than 32 bits
We have a variant of this for splats already, but hadn't handled the case where a single copy of the wider element can be inserted producing the entire required bit pattern. This shows up mostly in very small vector shuffle tests.

Differential Revision: https://reviews.llvm.org/D157299
2023-08-07 17:15:05 -07:00
Craig Topper
7cc615413f [RISCV] Add back handling of X > -1 to ISD::SETCC lowering.
There are cases where the -1 doesn't become visible until lowering
so the folding doesn't have a chance to run.

I think in these cases there is a missed DAGCombine for truncate (undef),
which I may fix separately, but RISC-V backend should protect itself.

Fixes #64503.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D157314
2023-08-07 13:00:57 -07:00
Philip Reames
47fe3b3b9a [RISCV] Use v(f)slide1down for build_vector with dominant values
If we have a dominant value, we can still use a v(f)slide1down to handle the last value in the vector if that value is neither undef nor the dominant value.

Note that we can extend this idea to any tail of elements, but that ends up being a near complete merge of the v(f)slide1down insert path, and requires a bit more untangling of profitability heuristics first.

Differential Revision: https://reviews.llvm.org/D157120
2023-08-07 07:54:29 -07:00
Alex Bradbury
7a1b2adc45 [RISCV] Implement straight-forward bf16<->int conversion cases
This ports over the test cases half-convert.ll and implements patterns
or RISCVISelLowering.cpp changes for all of the most straight-forward
cases (those that don't require changes outside of lib/Target/RISCV).
The remaining cases and noted poor codegen for saturating conversions
will be handled in follow-up patches.

Differential Revision: https://reviews.llvm.org/D156943
2023-08-07 11:12:51 +01:00
Craig Topper
f36bbb0bd2 [RISCV] Use static_assert to check ranges in hasMergeOp and hasMaskOp.
If the ranges are wrong it is better to catch at compile time.
2023-08-04 13:23:23 -07:00
Philip Reames
9f4a2a8636 [RISCV] Separate lowering of constant build vector into a helper [nfc]
We have a bunch of special casing for constant vectors, and the costing is generally different.  Separate out the logic so that it's easier to follow.
2023-08-04 08:38:18 -07:00
Craig Topper
814250191d [RISCV] Add vector legalization for fmaximum/fminimum.
Reviewed By: fakepaper56

Differential Revision: https://reviews.llvm.org/D156937
2023-08-04 08:07:14 -07:00
Bjorn Pettersson
4ce7c4a92a [llvm] Drop some typed pointer handling/bitcasts
Differential Revision: https://reviews.llvm.org/D157016
2023-08-03 22:54:33 +02:00
Craig Topper
a8c502a589 [RISCV] Add bf16 to isFPImmLegal.
Part of this test file was stolen from D156895. We should merge them
when committing.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D156926
2023-08-03 08:27:38 -07:00
Alex Bradbury
8a71f44e00 [RISCV] Expand test coverage of bf16 operations with Zfbfmin and fix gaps
This doesn't bring us to parity with the test/CodeGen/RISCV/half-* test
cases; it simply picks off an initial set that can be supported
especially easily. In order to make the review more manageable, I'll
follow up with other cases.

There is zero innovation in the test cases - they simply take the
existing half/float cases and replace f16->bf16 and half->bfloat.

Differential Revision: https://reviews.llvm.org/D156895
2023-08-03 07:06:57 +01:00
Jim Lin
40cc106fa0 [RISCV] Scalarize binop followed by extractelement to custom lowered instruction
isOperationLegalOrCustomOrPromote returns true only if VT is Other or legal
and the operation action is Legal, Custom or Promote.
Permit a vector binary operation to be converted to a scalar binary operation that is custom lowered with an illegal type.
One such case: i32 isn't a legal type on RV64 and its ALU operations are set to custom lowering,
so a vadd with element type i32 can be converted to addw.

Reviewed By: jacquesguan, craig.topper

Differential Revision: https://reviews.llvm.org/D156692
2023-08-03 13:02:49 +08:00
Yeting Kuo
cd79599304 [RISCV] Teach lowerScalarInsert to handle scalar value is the first element of a fixed vector.
D155929 taught lowerScalarInsert to handle a start value of the form (extractelement scalable_vector, 0)
and specifically converts fixed extracted vectors to scalable vectors when
lowering vector reductions. That's not enough because there is another way to
create (extractelement fixed_vector, 0) as a start value of lowerScalarInsert,
as in #64327.

#64327: https://github.com/llvm/llvm-project/issues/64327.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156863
2023-08-03 10:53:14 +08:00
Alex Bradbury
667602793b [RISCV] Implement support for bf16 select when zfbfmin is enabled
These test cases previously caused an error. RISCVInstrInfo::copyPhysReg also needed a tweak in order to account for copying bf16 values in FPR16 registers.

Differential Revision: https://reviews.llvm.org/D156883
2023-08-02 20:04:30 +01:00
Craig Topper
d8f9663f1a [RISCV] Rename RISCVISD::FMINNUM_VL/FMAXNUM_VL to VFMIN_VL/VFMAX_VL. NFC
I want these to have RISC-V semantics not LLVM IR semantics. Specifically
that -0.0 comes before +0.0.

This is needed to emulate FMAXIMUM/FMINIMUM for vectors.
2023-08-02 11:53:06 -07:00
4vtomat
346c1f2641 [RISCV] Support vector crypto extension LLVM IR
Depends on D141672

Differential Revision: https://reviews.llvm.org/D138809
2023-08-02 10:25:36 -07:00
Alex Bradbury
be0dac268d [RISCV] Improve codegen for i8/i16 'atomicrmw xchg a, {0,-1}'
As noted in <https://github.com/llvm/llvm-project/issues/64090>, it's
more efficient to lower a partword `atomicrmw xchg a, 0` to an amoand
with an appropriate mask. There are a range of possible ways to go about
this - e.g. writing a combine based on the
`llvm.riscv.masked.atomicrmw.xchg` intrinsic, or introducing a new
interface to AtomicExpandPass to allow target-specific atomics
conversions, or trying to lift the conversion into AtomicExpandPass
itself based on querying some target hook. Ultimately I've gone with
what appears to be the simplest approach - just covering this case in
emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a
different name way back when it was introduced.

This also handles the `atomicrmw xchg a, -1` case suggested by Craig
during review.
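
The partword trick can be modeled on a 32-bit word held in a dictionary. This is a sketch only: the helper names are hypothetical, and mapping the -1 case to an OR with the mask is an assumption made here by symmetry with the AND case.

```python
def amoand_w(memory, addr, value):
    """Model of amoand.w: AND into the word, return the old word."""
    old = memory[addr]
    memory[addr] = old & value & 0xFFFFFFFF
    return old

def amoor_w(memory, addr, value):
    """Model of amoor.w: OR into the word, return the old word."""
    old = memory[addr]
    memory[addr] = (old | value) & 0xFFFFFFFF
    return old

def xchg_i8(memory, word_addr, byte_index, new_byte):
    """i8 exchange with 0 or 0xFF (-1) using one AMO on the containing word."""
    shift = 8 * byte_index
    mask = 0xFF << shift
    if new_byte == 0:
        old = amoand_w(memory, word_addr, ~mask & 0xFFFFFFFF)  # clear the byte
    elif new_byte == 0xFF:
        old = amoor_w(memory, word_addr, mask)                 # set the byte
    else:
        raise ValueError("only 0 and -1 are handled by this sketch")
    return (old >> shift) & 0xFF

mem = {0: 0x11223344}
print(hex(xchg_i8(mem, 0, 1, 0)), hex(mem[0]))
```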

Fixes https://github.com/llvm/llvm-project/issues/64090

Differential Revision: https://reviews.llvm.org/D156801
2023-08-02 09:48:50 +01:00
Philip Reames
e938217f81 [RISCV] Implement getOptimalMemOpType for memcpy/memset lowering
This patch implements the getOptimalMemOpType callback which is used by the generic mem* lowering in SelectionDAG to pick the widest type used. This patch only changes the behavior when vector instructions are available, as the default is reasonable for scalar.

Without this change, we were emitting either XLEN sized stores (for aligned operations) or byte sized stores (for unaligned operations.) Interestingly, the final codegen was nowhere near as bad as that would seem to imply. Generic load combining and store merging kicked in, and frequently (but not always) produced pretty reasonable vector code.

The primary effects of this change are:
* Enable the use of vector operations for memset of non-constant. Our generic store merging logic doesn't know how to merge a broadcast store, and thus we were seeing the generic (and awful) byte expansion lowering for unaligned memset.
* Enable the generic misaligned overlap trick where we write to some of the same bytes twice. The alternative is to either a) use an increasingly small sequence of stores for the tail or b) use VL to restrict the vector store. The latter is not implemented at this time, so the former is what previously happened. Interestingly, I'm not sure that changing VL (as opposed to the overlap trick) is even obviously profitable here.
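
To make the overlap trick concrete, the sketch below compares it with the shrinking-tail alternative for a fixed store width. The function names are hypothetical and the model ignores alignment; it only shows which byte ranges get written.

```python
def store_offsets_with_overlap(length, width):
    """Byte offsets of fixed-width stores covering [0, length), letting the
    final store overlap the previous one instead of shrinking."""
    if length < width:
        raise ValueError("overlap trick needs at least one full-width store")
    offsets = list(range(0, length - width + 1, width))
    if offsets[-1] + width < length:
        offsets.append(length - width)  # rewrites some bytes already stored
    return offsets

def store_sizes_with_shrinking_tail(length, width):
    """Store sizes when the tail is covered by increasingly small stores.
    Assumes width is a power of two."""
    sizes = []
    while length:
        while width > length:
            width //= 2
        sizes.append(width)
        length -= width
    return sizes

# 31 bytes with 16-byte stores: two overlapping stores versus five stores.
print(store_offsets_with_overlap(31, 16))
print(store_sizes_with_shrinking_tail(31, 16))
```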

Differential Revision: https://reviews.llvm.org/D156249
2023-08-01 12:14:50 -07:00