Commit Graph

3647 Commits

Author SHA1 Message Date
Jianjian Guan
fd50151180 [RISCV] Only support SPLAT_VECTOR for Zvfhmin when also enable the scalar extension of half fp (#88275) 2024-04-11 10:23:26 +08:00
Craig Topper
f27f369710 [RISCV] Remove interrupt handler special case from RISCVFrameLowering::determineCalleeSaves. (#88069)
This code was trying to save temporary argument registers in interrupt
handler functions that contain calls, with the exception that all FP
registers were saved, including the normally callee saved registers.

If all of the callees use an FP ABI and the interrupt handler doesn't
touch the normally callee saved FP registers, we don't need to save
them.

It doesn't appear that we need to special case functions with calls. The
normal callee saved register handling will already check each of the calls
and consider a register clobbered if the call doesn't explicitly say it is preserved.

All of the test changes are from the removal of the FP callee saved
registers. There are tests for interrupt handlers with F and D extension
that use ilp32 or lp64 ABIs that are not affected by this change. They
still save the FP callee saved registers as they should.

gcc appears to have a bug where the D extension being enabled with the
ilp32f or lp64f ABI does not save the FP callee saved regs. The callee
would only save/restore the lower 32 bits and clobber the upper bits.
LLVM saves the FP callee saved regs in this case and there is an
unchanged test for it.

The unnecessary save/restore was raised in this thread
https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1
2024-04-10 10:28:54 -07:00
Craig Topper
323d3ab257 [RISCV] Optimize undef Even vector in getWideningInterleave. (#88221)
We recently optimized the code when the Odd vector was undef to fix a
poison bug.

There are additional optimizations we can do if the even vector is
undef. With Zvbb, we can use a single vwsll. Without Zvbb, we can use a
vzext.vf2 and a vsll.
2024-04-10 09:08:50 -07:00
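
As a rough illustration of 323d3ab257 above (a Python model, not the LLVM implementation): an interleave puts each even element in the low half and each odd element in the high half of a double-width element. When the even vector is undef, any low half is acceptable, so zero-extending the odd vector and shifting it left by SEW is enough. The function names and the 8-bit SEW are assumptions chosen for the example.

```python
SEW = 8  # element width in bits, chosen only for illustration


def interleave_even_undef(odd):
    """Model vzext.vf2 followed by vsll (or a single vwsll with Zvbb)."""
    return [(o & ((1 << SEW) - 1)) << SEW for o in odd]


def high_halves(wide):
    """Recover the odd elements from the double-width result."""
    return [w >> SEW for w in wide]


odd = [0x12, 0xFF, 0x00, 0x80]
wide = interleave_even_undef(odd)
```

The low half of every wide element is zero here, which is one legal choice for an undef even vector.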
Craig Topper
7f1b9adfc8 [RISCV] Add MachineCombiner to fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add (sh3add Y, Z), X). (#87884)
This improves a pattern that occurs in 531.deepsjeng_r, reducing the
dynamic instruction count by 0.5%.

This may be possible to improve in SelectionDAG, but given the special
cases around shXadd formation, it's not obvious it can be done in a
robust way without adding multiple special cases.

I've used a GEP with 2 indices because that most closely resembles the
motivating case. Most of the test cases are the simplest GEP case. One
test has a logical right shift on an index which is closer to the
deepsjeng code. This requires special handling in isel to reverse a
DAGCombiner canonicalization that turns a pair of shifts into (srl (and
X, C1), C2).
2024-04-10 08:39:56 -07:00
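
The reassociation in 7f1b9adfc8 above can be checked with a small Python model of the instructions involved. This is only a sketch of the arithmetic identity on 64-bit registers; the helper names are made up for the example and the MachineCombiner logic itself is not modeled.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers (RV64)


def slli(x, shamt):
    return (x << shamt) & MASK64


def add(x, y):
    return (x + y) & MASK64


def sh3add(rs1, rs2):
    """Zba sh3add: rs2 + (rs1 << 3)."""
    return add(rs2, slli(rs1, 3))


def before(x, y, z):
    # (sh3add Z, (add X, (slli Y, 6))): three instructions
    return sh3add(z, add(x, slli(y, 6)))


def after(x, y, z):
    # (sh3add (sh3add Y, Z), X): two instructions
    return sh3add(sh3add(y, z), x)
```

Both forms compute X + (Y << 6) + (Z << 3) modulo 2^64, so the fold saves one instruction without changing the result.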
Chia
469caa31e7 [RISCV] Use vwadd.vx for splat vector with extension (#87249)
This patch allows `combineBinOp_VLToVWBinOp_VL` to handle patterns like
`(splat_vector (sext op))` or `(splat_vector (zext op))`. Then we can
use `vwadd.vx` and `vwadd.w` for such a case.

### Source code
```
define <vscale x 8 x i64> @vwadd_vx_splat_sext(<vscale x 8 x i32> %va, i32 %b) {
     %sb = sext i32 %b to i64
     %head = insertelement <vscale x 8 x i64> poison, i64 %sb, i32 0
     %splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
     %vc = sext <vscale x 8 x i32> %va to <vscale x 8 x i64>
     %ve = add <vscale x 8 x i64> %vc, %splat
     ret <vscale x 8 x i64> %ve
}
```

### Before this patch
[Compiler Explorer](https://godbolt.org/z/sq191PsT4)
```
vwadd_vx_splat_sext:
  sext.w a0, a0
  vsetvli a1, zero, e64, m8, ta, ma
  vmv.v.x v16, a0
  vsetvli zero, zero, e32, m4, ta, ma
  vwadd.wv v16, v16, v8
  vmv8r.v v8, v16
  ret
```
### After this patch
```
vwadd_vx_splat_sext:
  vsetvli a1, zero, e32, m4, ta, ma
  vwadd.vx v16, v8, a0
  vmv8r.v v8, v16
  ret
```
2024-04-10 15:26:17 +09:00
Philip Reames
e47fd09f8e [RISCV] Use shNadd for scalable stack offsets (#88062)
If we need to multiply VLENB by 2, 4, or 8 and add it to the stack
pointer, we can do so with a shNadd instead of separate shift and add
instructions.
2024-04-09 07:29:10 -07:00
Luke Lau
24e8c6a09b [RISCV] Convert remaining constant splats in tests to use splat shorthand. NFC (#88099)
This follows on from #87616, but includes the tests with codegen
differences. These are presumably due to the fact that the splat is now
a constant expression. They don't seem to affect anything that we were
specifically testing for.
2024-04-09 17:15:15 +08:00
Luke Lau
9c660362c4 [RISCV] Support vwsll in combineBinOp_VLToVWBinOp_VL (#87620)
If the subtarget has +zvbb then we can attempt folding shl and shl_vl to
vwsll nodes.

There are a few test cases where we still don't pick up the vwsll:
- For fixed vector vwsll.vi on RV32, see the FIXME for VMV_V_X_VL in
fillUpExtensionSupport about supporting implicit sign extension
- For scalable vector vwsll.vi we need to support ISD::SPLAT_VECTOR, see
#87249
2024-04-09 16:10:35 +08:00
Luke Lau
0f20b9b92f [RISCV] Don't require mask or VL to be the same in combineBinOp_VLToVWBinOp_VL (#87997)
In NodeExtensionHelper we keep track of the VL and mask of the operand
being extended and check that they are the same as the root node's.
However for the nodes that we support, none of them have a passthru
operand with the exception of RISCV::VMV_V_X_VL, but we check that its
passthru is undef anyway.

So it's safe to just discard the extend node's VL and mask and just use
the root's instead. (This is the same type of reasoning we use to treat
any vmset_vl as an all ones mask)

This allows us to match some more cases where we mix VP/non-VP/VL nodes,
but these don't seem to appear in practice. The main benefit from this
would be to simplify the code.
2024-04-09 16:04:10 +08:00
Luke Lau
d8d131dfa9 [RISCV] Convert more constant splats in tests to splat shorthand. NFC (#87616)
A handy shorthand for specifying the shufflevector(insertelement(poison,
foo, 0), poison, zeroinitializer) splat pattern was introduced in
#74620.

Some of the RISC-V tests were converted over to use this new form in
dbb65dd330, this patch handles the rest
which didn't have any codegen diffs.

This not only converts some constant expressions to the new form, but
also instruction sequences that weren't previously constant expressions
to constant expressions as well. In some cases this affects codegen, but
these have been omitted here and will be handled in a separate PR.
2024-04-09 15:46:38 +08:00
Craig Topper
4e98adf677 [RISCV] Add tests for F/D with non-FP ABI to interrupt-attr.ll. NFC
Without a floating point aware ABI for callees, an interrupt handler
needs to save all floating point registers, even the normally callee
saved ones.

We are currently unnecessarily saving callee saved FP registers when
a floating point ABI is used by the callee. This is different than gcc
as noted in this discourse
post https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1
2024-04-08 16:12:36 -07:00
Craig Topper
472ea6e015 [RISCV] Resolve CHECK prefix conflict in fixed-vectors-vitofp-constrained-sdnode.ll. NFC 2024-04-08 16:01:18 -07:00
Craig Topper
afc7cc7b12 [RISCV] Fix missing CHECK prefixes in vector lrint test files. NFC
All of these test cases had iXLen in their name which got replaced
by sed. This prevented FileCheck from finding the function. The other
test cases in these files do not have that issue.
2024-04-08 16:01:18 -07:00
Craig Topper
89ebb56152 [RISCV] Resolve CHECK prefix conflict in fixed-vectors-vwsll.ll. NFC
riscv32 and riscv64 generate different code for one test case so we need
RV32 and RV64 CHECK lines.
2024-04-08 15:45:07 -07:00
Philip Reames
eb26edbbf8 [RISCV] Exploit sh3add/sh2add for stack offsets by shifted 12-bit constants (#87950)
If we're falling back to generic constant formation in a register +
add/sub, we can check if we have a constant which is 12-bits but left
shifted by 2 or 3. If so, we can use a sh2add or sh3add to perform the
shift and add in a single instruction.

This is profitable when the unshifted constant would require two
instructions (LUI/ADDI) to form, but is never harmful since we're going
to need at least two instructions regardless of the constant value.

Since stacks are aligned to 16 bytes by default, sh3add allows addressing
(aligned) data out to 2^14 (i.e. 16kb) in at most two instructions
w/zba.
2024-04-08 14:53:21 -07:00
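
A minimal Python sketch of the check described in eb26edbbf8 above: decide whether a stack offset is a signed 12-bit immediate shifted left by 2 or 3, so that it can be formed with one ADDI plus one sh2add/sh3add. The function names, and the choice to try the shift of 3 first, are assumptions for illustration and are not taken from the LLVM source. The real code only reaches this point after simpler forms (such as a single ADDI) have been ruled out.

```python
def is_simm12(value):
    """True if value fits in a signed 12-bit immediate."""
    return -2048 <= value <= 2047


def pick_shadd_shift(offset):
    """Return 3 or 2 if offset == imm12 << shift, otherwise None."""
    for shift in (3, 2):
        if offset % (1 << shift) == 0 and is_simm12(offset >> shift):
            return shift
    return None
```

For example, `pick_shadd_shift(16376)` picks a shift of 3 because 16376 is 2047 << 3, while 16384 is rejected because 2048 no longer fits in 12 signed bits.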
Philip Reames
f5cf98c026 [RISCV] Improve test coverage for #87950
Noticed in review that we want both the LUI and LUI/ADDI cases
with different behavior for each.
2024-04-08 14:39:37 -07:00
Pengcheng Wang
364028a1a5 [RISCV] Zimop/Zcmop are ratified
Remove them from experimental.

See also:
https://github.com/riscv/riscv-isa-manual/blob/main/src/zimop.adoc

Reviewers: kito-cheng

Reviewed By: kito-cheng

Pull Request: https://github.com/llvm/llvm-project/pull/87966
2024-04-08 16:40:02 +08:00
David Green
ac321cbb03 [AArch64][GlobalISel] Legalize Insert vector element (#81453)
This attempts to standardize and extend some of the insert vector
element lowering. Most notably:
- More types are handled by splitting illegal vectors.
- The index type for G_INSERT_VECTOR_ELT is canonicalized to
  TLI.getVectorIdxTy(), similar to extract_vector_element.
- Some of the existing patterns now have the index type specified to
  make sure they can apply to GISel too.
- The C++ selection code has been removed, relying on tablegen patterns.
- G_INSERT_VECTOR_ELT with small GPR input elements are pre-selected to
  use a i32 type, allowing the existing patterns to apply.
- Variable index inserts are lowered in post-legalizer lowering,
  expanding into a stack store and reload.
2024-04-08 08:44:13 +01:00
Pengcheng Wang
f3b5597364 [RISCV] Use larger copies when register tuples are aligned
When the encoding of register tuples is aligned, we can use a copy
with larger LMUL to reduce copies.

Reviewers: preames, topperc, lukel97

Reviewed By: topperc, lukel97

Pull Request: https://github.com/llvm/llvm-project/pull/84455
2024-04-08 13:24:57 +08:00
Philip Reames
da675b922c [RISCV] Expand test coverage of stack offsets between 2^11 and 2^15
Adds two sets of tests.  First, one for prolog/epilogue insertions where
the second stack adjustment can be done with shNadd for zba.  Second, a
set of tests with offsets off SP in the same ranges, but also adding
varying alignments.
2024-04-07 15:22:25 -07:00
Jianjian Guan
bc8726b16b [RISCV] Support codegen of vfmv.v.f for bfloat vector with both Zvfbfmin and Zfbfmin (#87318)
vfmv and vfmerge should support bfloat vectors when we have both
Zvfbfmin and Zfbfmin. This patch supports vfmv first.
2024-04-07 10:41:47 +08:00
Craig Topper
4abb722ffa [RISCV] Add tests for opportunities to reassociate to form more shXadd instructions. NFC
These tests consist of patterns like (sh3add Z, (add X, (slli Y, 6)))
that can be reassociated to form (sh3add (sh3add Y, Z), X).
2024-04-05 12:50:48 -07:00
Craig Topper
0a6a40d62e [RISCV] Add Zca predicate to BrccCompressOpt patterns used for MinSize.
Previously we only checked for C.
2024-04-05 12:39:39 -07:00
Craig Topper
e7e78274a6 [RISCV] Remove uses of sed from compress-opt-branch.ll. NFC
sed was being used to use the same test functions with eq/ne branch
condition.

This commit duplicates the test functions so that we have a version
with each condition. This allows us to remove 2 RUN lines.

I plan to add Zca testing to this file which now requires 1 new
RUN line instead of 2.
2024-04-05 12:35:46 -07:00
Craig Topper
3c37f926a1 [RISCV] Fix comment in compress-opt-branch.ll to match description. NFC
Test description says constant does not fit in 12 bits, but the constant
used was -2048 which does fit in 12 bits. Update to -2049.

Also remove uses of -NOT in favor of positive checks. One of the -NOT
should have been using RESBROPT instead of "c.beqz" so that it would
check for the absence of the correct instruction based on the sed
replacement on the RUN line.
2024-04-05 11:52:46 -07:00
Luke Lau
4e0b8eae4c [RISCV] Add tests for vwsll for extends > .vf2. NFC
These cannot be picked up by TableGen patterns alone and need to be handled
by combineBinOp_VLToVWBinOp_VL
2024-04-04 18:43:15 +08:00
Luke Lau
3a7b5223a6 [DAGCombiner][RISCV] Handle truncating splats in isNeutralConstant (#87338)
On RV64, we legalize zexts of i1s to (vselect m, (splat_vector i64 1),
(splat_vector i64 0)), where the splat_vectors are implicitly
truncating.

When the vselect is used by a binop we want to pull the vselect out via
foldSelectWithIdentityConstant. But because vectors with an element size
< i64 will truncate, isNeutralConstant will return false.

This patch handles truncating splats by getting the APInt value and
truncating it. We almost don't need to do this since most of the neutral
elements are either one/zero/all ones, but it will make a difference for
smax and smin.

I wasn't able to figure out a way to write the tests in terms of select,
since we need the i1 zext legalization to create a truncating
splat_vector.

This supersedes #87236. Fixed vectors are unfortunately not handled by
this patch (since they get legalized to _VL nodes), but they don't seem
to appear in the wild.
2024-04-04 12:36:15 +08:00
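
To illustrate why truncation matters for smax in 3a7b5223a6 above, here is a small Python model. It assumes the neutral element of smax is the minimum signed value of the element type (smax(x, INT_MIN) == x); the function and parameter names are hypothetical and do not mirror the DAGCombiner API.

```python
def trunc(value, bits):
    """Keep the low `bits` bits, like an APInt truncation."""
    return value & ((1 << bits) - 1)


def is_smax_neutral(const, const_bits, elt_bits, truncate):
    """Check whether a splat constant is the signed minimum.

    With truncate=False the constant is tested at its own width,
    with truncate=True it is first truncated to the element width.
    """
    if truncate:
        const, const_bits = trunc(const, elt_bits), elt_bits
    return const == 1 << (const_bits - 1)


# An i64 splat of -128 feeding a vector of i8 elements.
splat_const = trunc(-128, 64)
```

Tested at 64 bits, the constant is not INT64_MIN and the fold is missed; truncated to 8 bits it becomes 0x80, the i8 signed minimum.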
Luke Lau
07d5f49186 [RISCV] Add patterns for fixed vector vwsll (#87316)
Fixed vectors have their sext/zext operands legalized to _VL nodes, so
we need to handle them in the patterns.

This adds a riscv_ext_vl_oneuse pattern since we don't care about the
type of extension used for the shift amount, and extends
Low8BitsSplatPat to handle other _VL nodes. We don't actually need to
check the mask or VL there since none of the _VL nodes have passthru
operands.

The remaining test cases that are widening from i8->i64 need to be
handled by extending combineBinOp_VLToVWBinOp_VL.

This also fixes Low8BitsSplatPat incorrectly checking the vector size
instead of the element size to determine if the splat value might have
been truncated below 8 bits.
2024-04-04 11:30:23 +08:00
Michael Maitland
63c925ca80 [RISCV][GISEL] Instruction selection for G_ZEXT, G_SEXT, and G_ANYEXT with scalable vector type 2024-04-03 15:56:08 -07:00
Michael Maitland
188ca374ee [RISCV][GISEL] Regbankselect for G_ZEXT, G_SEXT, and G_ANYEXT with scalable vector type 2024-04-03 15:56:04 -07:00
Michael Maitland
35a9393a3f [RISCV][GISEL] Instruction selection for G_ICMP 2024-04-03 15:47:34 -07:00
Michael Maitland
05f673bcef [RISCV][GISEL] Regbank select for scalable vector G_ICMP 2024-04-03 15:47:34 -07:00
Michael Maitland
8aa3a77eaf [RISCV][GISEL] Legalize G_ZEXT, G_SEXT, and G_ANYEXT, G_SPLAT_VECTOR, and G_ICMP for scalable vector types
This patch legalizes G_ZEXT, G_SEXT, and G_ANYEXT. If the type is a
legal mask type, then the instruction is legalized as the element-wise
select, where the condition on the select is the mask typed source
operand, and the true and false values are 1 or -1 (for
zero/any-extension and sign extension) and zero. If the type is a legal integer
or vector integer type, then the instruction is marked as legal.

The legalization of the extends may introduce a G_SPLAT_VECTOR, which
needs to be legalized in this patch for the extend test cases to pass.

A G_SPLAT_VECTOR is legal if the vector type is a legal integer or
floating point vector type and the source operand is sXLen type. This is
because the SelectionDAG patterns only support sXLen typed
ISD::SPLAT_VECTORS, and we'd like to reuse those patterns. A
G_SPLAT_VECTOR is custom legalized if it has a legal s1 element vector
type and s1 scalar operand. It is legalized to G_VMSET_VL or G_VMCLR_VL
if the splat is all ones or all zeros respectively. In the case of a
non-constant mask splat, we legalize by promoting the scalar value to
s8.

In order to get the s8 element vector back into s1 vector, we use a
G_ICMP. In order for the splat vector and extend tests to pass, we also
need to legalize G_ICMP in this patch.

A G_ICMP is legal if the destination type is a legal bool vector and the LHS and
RHS are legal integer vector types.
2024-04-03 15:27:15 -07:00
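
The legalization strategy in 8aa3a77eaf above can be modeled on plain Python lists. This is a sketch under assumptions: masks are lists of booleans, the function names are invented, and the compare used to recover the mask is assumed to be "not equal to zero", which the commit message does not spell out.

```python
def zext_mask(mask):
    """G_ZEXT / G_ANYEXT of a mask: select(mask, 1, 0) per element."""
    return [1 if m else 0 for m in mask]


def sext_mask(mask):
    """G_SEXT of a mask: select(mask, -1, 0) per element."""
    return [-1 if m else 0 for m in mask]


def splat_s1(scalar, num_elts):
    """Non-constant s1 splat: promote to s8, splat, compare back to a mask."""
    promoted = scalar & 1             # s1 value widened to s8
    wide = [promoted] * num_elts      # s8 element splat
    return [elt != 0 for elt in wide]  # assumed G_ICMP ne 0
```

The s8 detour exists only because the splat patterns want an integer element type; the compare turns the result back into a mask vector.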
Michael Maitland
07d3f2a8de [RISCV][GISEL] Run update_mir_test_checks on llvm/test/CodeGen/RISCV/GlobalISel/legalizer/rvv/legalize-xor.mir 2024-04-03 10:37:44 -07:00
AinsleySnow
52b18430ae [VP][DAGCombine] Use simplifySelect when combining vp.select. (#87342)
Hi all,

This patch is a follow-up of #79101. It migrates logic from
`visitVSELECT` to `visitVP_SELECT` to simplify `vp.select`. With this
patch we can do the following combinations:

```
vp.select undef, T, F --> T (if T is a constant), F otherwise
vp.select <condition>, undef, F --> F
vp.select <condition>, T, undef --> T
vp.select false, T, F --> F
vp.select <condition>, T, T --> T
```

I'm a total newbie to llvm and I'm sure there's room for improvements in
this patch. Please let me know if you have any advice. Thank you in
advance!
2024-04-03 07:45:50 -04:00
Craig Topper
a9af66a90e [RISCV] Lower (vector_interleave X, undef) to (vzext_vl X). (#87283)
If the odd vector is undef or poison, the widening add and multiply trick
doesn't work unless we freeze the odd vector.

Unfortunately, freezing doesn't work when the operand is provably
undef/poison. MIR doesn't have a representation for freeze so it
just becomes a COPY from IMPLICIT_DEF which freely propagates undef
to each operand independently.

To work around this, check for undef explicitly and lower to a VZEXT_VL
of the even vector. This produces better code than we'd get from a
freeze anyway.

I've left a FIXME for adding a freeze. I'll do that as a separate patch
as it affects other tests and doesn't help with the new test.
2024-04-02 11:58:41 -07:00
Craig Topper
8c1dc5dd58 [RISCV] Add test for miscompile of vector.interleave when odd vector is literal poison.
The interleave lowering relies on a math trick that requires passing
the odd vector to two math instructions. In order to be correct
these instructions must see the same value.

If the odd vector is provably poison or undef, SelectionDAG will
create a vwadd and vwmaccu where the operand is a copy from IMPLICIT_DEF.
Later this will become just the undef flag on the operand. This
gives the register allocator freedom to pick a different register
for each instruction.
2024-04-02 11:49:08 -07:00
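
The math trick behind the miscompile in 8c1dc5dd58 above can be sketched per element in Python. The sketch assumes unsigned widening ops and a multiplier of 2^SEW - 1, so that even + odd + (2^SEW - 1) * odd equals even + (odd << SEW). The 8-bit SEW and the function names are illustrative only.

```python
SEW = 8
MASK = (1 << SEW) - 1
WIDE_MASK = (1 << (2 * SEW)) - 1


def vwaddu(a, b):
    """Widening unsigned add of two SEW-bit elements."""
    return (a + b) & WIDE_MASK


def vwmaccu(acc, scalar, v):
    """Widening unsigned multiply-accumulate: acc + scalar * v."""
    return (acc + scalar * v) & WIDE_MASK


def interleave_elt(even, odd_for_add, odd_for_macc):
    """Interleave one pair; the two odd operands should be the same value."""
    return vwmaccu(vwaddu(even, odd_for_add), MASK, odd_for_macc)


def expected(even, odd):
    return (odd << SEW) | even
```

If the register allocator hands each instruction a different undef register, `odd_for_add` and `odd_for_macc` differ and the low half no longer holds the even element: `interleave_elt(0x34, 1, 2) & MASK` gives 0x33 rather than 0x34.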
Michael Maitland
153b8431bb [RISCV][GISEL] Legalize G_BITCAST for scalable vectors (#85970)
SelectionDAG marks ISD::BITCAST as legal between scalable vector types
and ISelDAGToDAG deletes them.

We mark G_BITCAST between scalable vectors as legal in GISel. A future
patch will handle what to do with them after the legalizer (likely
either drop them in an isel-preprocess or convert them to COPYs).

BITCAST is needed for legalization of G_INSERT and G_EXTRACT. This is a
precommit for legalization of G_INSERT and G_EXTRACT.
2024-04-02 12:30:51 -04:00
Luke Lau
59dd10faf8 [RISCV] Add tests for fixed vector vwsll. NFC
We are missing patterns for fixed vectors, where the sexts and zexts are
legalized to _vl nodes.
2024-04-02 13:02:03 +08:00
Vitaly Buka
20f56e1f8e [CodeGen] Add default lowering for llvm.allow.{runtime,ubsan}.check() (#86049)
RFC:
https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
2024-03-31 22:19:33 -07:00
Brandon Wu
29e8bfc13c [RISCV] RISCV vector calling convention (2/2) (#79096)
This commit handles vector arguments/return for function definition/call,
the new class RVVArgDispatcher is added for doing all vector register
assignment including mask types, data types as well as tuple types.
It precomputes the register number for each argument as per
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#standard-vector-calling-convention-variant
and it's passed to calling convention function to handle all vector arguments.

Depends on: #78550
2024-03-30 21:05:33 +08:00
Shilei Tian
3a106e5b2c [GlobalISel] Fold G_ICMP if possible (#86357)
This patch tries to fold `G_ICMP` if possible.
2024-03-29 15:59:50 -04:00
Luke Lau
3f69d90351 [RISCV] Add missing RISCVMaskedPseudo for TIED pseudos (#86787)
This was preventing us from folding away the vmerge into its mask.
2024-03-29 22:21:22 +08:00
Luke Lau
76ba3c8e64 [RISCV] Add test case for vmerge fold for tied pseudos with rounding mode. NFC 2024-03-29 19:47:09 +08:00
Luke Lau
2a315d800b [RISCV] Combine (or disjoint ext, ext) -> vwadd (#86929)
DAGCombiner (or InstCombine) will convert an add to an or if the bits
are disjoint, which can prevent what was originally an (add {s,z}ext,
{s,z}ext) from being selected as a vwadd.

This teaches combineBinOp_VLToVWBinOp_VL to recover it by treating it as
an add.
2024-03-29 19:45:24 +08:00
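
The reason 2a315d800b above may treat a disjoint or as an add: when two values share no set bits, no carries occur, so or and add give the same result. A tiny Python check follows, with values chosen arbitrarily.

```python
def is_disjoint(a, b):
    """True if a and b have no set bits in common."""
    return a & b == 0


a, b = 0b1010_0000, 0b0000_0101  # disjoint operands
c, d = 0b0011, 0b0001            # overlapping operands
```

For `a` and `b` the or and the add agree; for `c` and `d` they do not (3 versus 4), which is why the combine requires the disjoint flag.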
Luke Lau
131be5de90 [RISCV] Add more disjoint or tests for vwadd[u].{w,v}v. NFC 2024-03-29 19:11:26 +08:00
Wang Pengcheng
610b9e23c5 [SDAG] Use shifts if ISD::MUL is illegal when lowering ISD::CTPOP (#86505)
We can avoid libcalls.

Fixes #86205
2024-03-29 15:38:39 +08:00
Sudharsan Veeravalli
e005a09df5 [RISCV][TypePromotion] Dont generate truncs if PromotedType is greater than Source Type (#86941)
We currently check if the source and promoted types are not equal before
generating truncate instructions. This does not work for RV64 where the
promoted type is i64 and this led to a crash due to the generation of
truncate instructions from i32 to i64.

Fixes #86400
2024-03-28 21:22:05 -07:00
Philip Reames
9ea0396f16 [RISCV] Extend pattern matches involving shNadd to support disjoint or (#87001)
I tried to add representative tests while not duplicating complete
coverage.  If there's other tests you'd like to see, let me know.
2024-03-28 16:34:04 -07:00
Luke Lau
a3c2d8c072 [RISCV] Combine ({s,u}{div,rem} (zext, zext)) -> (zext ({s,u}{div,rem} (zext, zext))) (#86779)
This narrows unsigned and signed div and rem nodes via
combineBinOpOfZExt.

Unlike other binary ops, there are no widening div or rem instructions.
So we will end up with an extra vzext.vf2.

However I'm assuming that div/rem are expensive enough that by reducing
their EMUL we will gain back the cost.

Alive2 proof: https://alive2.llvm.org/ce/z/Et_L6y
2024-03-29 05:55:38 +08:00
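
A brute-force Python check of the narrowing in a3c2d8c072 above, for the signed case: zero-extended operands are non-negative, so dividing at a narrower width gives the same value as long as the narrow type still has a clear sign bit. The bit widths and helper names are assumptions for the example; Alive2 remains the actual proof.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's complement value."""
    return x - (1 << bits) if x >> (bits - 1) else x


def sdiv(a, b, bits):
    """Signed division truncating toward zero, result wrapped to `bits`."""
    sa, sb = to_signed(a, bits), to_signed(b, bits)
    q = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        q = -q
    return q & ((1 << bits) - 1)


def srem(a, b, bits):
    """Signed remainder with the sign of the dividend."""
    sa, sb = to_signed(a, bits), to_signed(b, bits)
    r = abs(sa) % abs(sb)
    if sa < 0:
        r = -r
    return r & ((1 << bits) - 1)


def narrowing_is_safe(src_bits=8, narrow_bits=16, wide_bits=32):
    """Compare wide and narrow results for every zero-extended input pair."""
    for a in range(1 << src_bits):
        for b in range(1, 1 << src_bits):  # skip division by zero
            for op in (sdiv, srem):
                if op(a, b, wide_bits) != op(a, b, narrow_bits):
                    return False
    return True
```

With i8 sources narrowed from 32 to 16 bits the results always match. Passing `narrow_bits=8` makes the check fail, because values such as 255 would then be read as negative.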