Commit Graph

2138 Commits

Author SHA1 Message Date
Craig Topper
f1fd5c9b36 [RISCV] Remove pseudos for whole register load, store, and move.
The MC layer instructions have the correct register classes, and
the pseudos don't have any additional operands. So there doesn't
seem to be any reason for them to exist.

The pseudos were incorrectly going through code in RISCVMCInstLower
that converted LMUL>1 register classes to LMUL1 register class.
This makes the MCInst technically malformed, and prevented the
vl2r.v, vl4r.v, and vl8r.v InstAliases from matching. This accounts
for all of the .ll test diffs.

Differential Revision: https://reviews.llvm.org/D139511
2022-12-07 10:19:58 -08:00
Craig Topper
938d0d6d7b [RISCV] Replace uses of hasStdExtC with COrZca.
Except MakeCompressible which will need more work.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139504
2022-12-07 09:34:01 -08:00
Anton Sidorenko
f8ed709345 [MachineCombiner] Extend reassociation logic to handle inverse instructions
Machine combiner supports generic reassociation only of associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A), where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.

Differential Revision: https://reviews.llvm.org/D136754
2022-12-07 13:50:28 +03:00
Yeting Kuo
0f8c761c48 [VP][RISCV] Recommit "Add vp.fshl/fshr and RISC-V support."
This reverts commit 7883e5b061.

The original commit was reverted that it didn't update test files after D136263
landed. The recommit fixed those.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D139509
2022-12-07 15:58:12 +08:00
Kazu Hirata
7883e5b061 Revert "[VP][RISCV] Add vp.fshl/fshr and RISC-V support."
This reverts commit 70de0e0140.

I'm seeing:

Failed Tests (2):
  LLVM :: CodeGen/RISCV/rvv/fixed-vectors-fshr-fshl-vp.ll
  LLVM :: CodeGen/RISCV/rvv/fshr-fshl-vp.ll

Also reported at:

https://lab.llvm.org/buildbot/#/builders/123/builds/14531
2022-12-06 22:27:43 -08:00
Monk Chiang
7b50c18360 [RISCV] Codegen support for Zfhmin.
The Zfhmin subset only has FLH, FSH, FMV.X.H, FMV.H.X, FCVT.S.H, and FCVT.H.S.
If the D extension is present, the FCVT.D.H and FCVT.H.D instructions are also included.
Since most instructions are not included for Zfhmin, so most operations are promoted.
The patch primarily about making f16 a legal type.

RISC-V ISA info:
https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D139391
2022-12-06 22:14:15 -08:00
Yeting Kuo
70de0e0140 [VP][RISCV] Add vp.fshl/fshr and RISC-V support.
The patch made VectorLegalizer expand ISD::VP_FSHL and ISD::VP_FSHR to
achieve the codegen.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D138379
2022-12-07 12:16:36 +08:00
Craig Topper
8d30b9e64f [RISCV] Move VSPILL/VRELOAD expansion for vector tuples to eliminateFrameIndex.
We need a scratch GPR to increment the base pointer for each subsequent
register. We currently reuse the input GPR for the base pointer without
declaring it as a Def of the pseudo.

We can't add it as a Def of the pseudo at creation time because it doesn't
get register allocated. This was tried in D109405.

Seems the only choice we have is to scavenge the GPR. This patch
moves the expansion to eliminateFrameIndex where we can create
virtual registers that will be scavenged. This also eliminates the
extra operand for passing vlenb from frame lowering to expand pseudos.

I need to do more testing on real world code, but wanted to get this
up for early review.

I hope this will fix the issue reported in D123394, but I haven't
checked yet.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139169
2022-12-06 15:42:00 -08:00
Craig Topper
1806ce9097 [RISCV] Teach RISCVMatInt to prefer li+slli over lui+addi(w) for compressibility.
With C extension, li with a 6 bit immediate followed by slli is 4 bytes.
The lui+addi(w) sequence is at least 6 bytes.

The two sequences probably have similar execution latency. The exception
being if the target supports lui+addi(w) macrofusion.

Since the execution latency is probably the same I didn't restrict
this to C extension.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139135
2022-12-06 10:31:17 -08:00
Sanjay Patel
772c2f461b [AArch64][RISCV][x86] add tests for masked val equality with 0; NFC 2022-12-06 11:34:48 -05:00
Sergey Kachkov
132dc442ba [RISCV] Generate .cfi_def_cfa_expression for RVV stack adjustment
Cannonical frame address after RVV stack adjustment is sp + StackSize +
RVVStackSize * vlenb, and since vlenb is unknown at compile-time (but it
is a constant for particular HW implementation), emit
.cfi_def_cfa_expression so libunwind can read VLENB CSR register at
run-time and obtain correct frame address.

Fixes https://github.com/llvm/llvm-project/issues/58356 (but additional
run-time support for reading CSR may be required)

Differential Revision: https://reviews.llvm.org/D136263
2022-12-06 12:45:59 +03:00
jacquesguan
f7a46aa8fb [RISCV] Fold vector binary operatrion into select with identity constant.
This patch implements shouldFoldSelectWithIdentityConstant for RISCV. It would try to generate vmerge after the binary instruction and let them folded to maksed instruction later.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131551
2022-12-06 11:19:31 +08:00
jacquesguan
6392cf331a [RISCV][test] Add pre-commit test for D131551.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D131950
2022-12-06 10:38:25 +08:00
ChunyuLiao
85834d8685 [RISCV]Keep (select c, 0/-1, X) during PerformDAGCombine
D135833, lowerSelect: (select C, -1/0, X) -> or/and
Keep (select c, 0/-1, X), thus making better use of lowerSelect to eliminate branch instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D139272
2022-12-06 09:26:29 +08:00
Dmitry Vyukov
dbe8c2c316 Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-12-05 14:40:31 +01:00
Philip Reames
b775333068 [RISCV] Fold low 12 bits into instruction during frame index elimination
Fold the low 12 bits of an immediate offset into the offset field of the using instruction. That using instruction will be a load, store, or addi which performs an add of a signed 12-bit immediate as part of it's operation. Splitting out the low bits allows the high bits to be generated via a single LUI instead of needing an LUI/ADDI pair.

The codegen effect of this is mostly converting cases where "split addi" kicks in to using LUI + a folded offset. There are a couple of straight dynamic instruction count wins, and using a canonical LUI is probably better than a chain of SP adds if the dynamic instruction count is equal.

Differential Revision: https://reviews.llvm.org/D139037
2022-12-02 11:54:06 -08:00
Craig Topper
e00e20a055 [RISCV] Add ADDW/AND/OR/XOR/SUB/SUBW to getRegAllocHints.
These instructions requires both register operands to be compressible
so I've only applied the hint if we already have a GPRC physical register
assigned for the other register operand.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139079
2022-12-01 11:09:38 -08:00
ZHU Zijia
010a8f7a90 [CodeGen] Fix restore blocks' BasicBlock information in branch relaxation
In branch relaxation pass, restore blocks are created and placed before
the jump destination if indirect branches are required. For example:

        foo
        sd      s11, 0(sp)
        jump    .restore, s11
        bar
        bar
        bar
        j       .dest
.restore:
        ld      s11, 0(sp)
.dest:
        baz

The BasicBlock information of the restore MachineBasicBlock should be
identical to the dest MachineBasicBlock.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D131863
2022-12-02 02:42:22 +08:00
ZHU Zijia
3adf828a0e [CodeGen][test] Pre-commit test for D131863
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D131862
2022-12-02 02:39:14 +08:00
Jonas Paulsson
8ef4632681 Revert "[CodeGen] Add new pass for late cleanup of redundant definitions."
Temporarily revert and fix buildbot failure.

This reverts commit 6d12599fd4.
2022-12-01 13:29:24 -05:00
Jonas Paulsson
6d12599fd4 [CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-01 13:21:35 -05:00
Freddy Ye
89f36dd8f3 [X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp convertions with more
than 128 bits. The expanded nodes are referring from the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
and etc.

Corner cases:
1. For fp16: as there is no related builtins added in compliler-rt. So I
mainly utilized the fp32 <-> fp16 lib calls to implement.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can
help in this problem. I recommend users to deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as clang FE currently doesn't support bf16 algorithm operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc are
ignoring "returns 0 for negative input" spec. This pass follows this old way
to ignore unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241
2022-12-01 13:47:43 +08:00
Craig Topper
df7ab6a52e [RISCV] Add ANDI to getRegAllocationHints. 2022-11-30 20:59:02 -08:00
Marco Elver
b95646fe70 Revert "Use-after-return sanitizer binary metadata"
This reverts commit d3c851d3fc.

Some bots broke:

- https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview
- https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio
2022-11-30 23:35:50 +01:00
Craig Topper
a8c79121bf [RISCV] Teach getRegAllocationHints about compressible SRAI/SRLI.
Similar to previous patches for ADDI/ADDIW/SLLI/ADD, but restricted
to only cases where the register is x8-x15(GPRC reg class).

I've restricted it so that we can be precise about whether the
resulting instruction would be compressible. Changing the register
allocation may make some other instruction not compressible so we
should try to be accurate.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D138740
2022-11-30 10:28:57 -08:00
Philip Reames
ac1ec9e290 [RISCV] Share code for fixed offsets adjustRegs (thus materializing fewer constants)
This reuses the existing optimized implementation of adjustReg, and commons up code. This has the effect of enabling two code changes for the new caller. First, we enable the "split andi" lowering (with no alignment requirement), and second we use a sub with smaller constant in register instead of a add with negative constant in register.

Differential Revision: https://reviews.llvm.org/D132839
2022-11-30 09:28:29 -08:00
Dmitry Vyukov
d3c851d3fc Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-30 14:50:22 +01:00
Philip Reames
fc0efb7e78 [SDAG] Allow scalable vectors in ComputeNumSignBits (try 2)
I had reverted this before the holiday week because a problem was reported with a related change (D137140 - scalable vector known bits in DAG).  I had initially confused the two patches, and then decided to leave this reverted out an abundance of caution.  Now that we're through the holiday week, reapplying.

I also roled in fixes for several post commit review comments that hadn't landed with the original change.

Original commit message

This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

Differential Revision: https://reviews.llvm.org/D137141
2022-11-29 08:25:05 -08:00
Philip Reames
5583972fe1 [RISCV] Simplify eliminateFrameIndex in advance of reuse [nfc-ish]
The prior code intermixed several concerns - the actual materialization of the offset, the choice of destination register, and whether to prune the ADDI. This version factors the first part out, and then reasons only about the later two. My intention is to merge the adjustReg routine with the one from frame lowering, and then explore using the merged result to simplify frame setup and tear down.

This change is conceptually NFC, but since it results in slightly different vreg usage, the end result can change register allocation in minor ways.

Differential Revision: https://reviews.llvm.org/D138502
2022-11-28 10:09:37 -08:00
Craig Topper
64612f5d8e [RISCV] Add ADD to getRegAllocationHints to improve to improve use of c.add.
add can always be compressed to c.add if one of the sources is the
same as the destination.

The same is not true for c.addw where the registers need to be x8-x15.
2022-11-25 08:59:27 -08:00
Craig Topper
a2b5b584a5 [RISCV] Use register allocation hints to improve use of compressed instructions.
Compressed instructions usually require one of the source registers
to also be the source register. The register allocator doesn't have
that bias on its own.

This patch adds register allocation hints to introduce this bias.
I've started with ADDI, ADDIW, and SLLI. These all have a 5-bit
field for the register. If the source and dest register are the
same they are guaranteed to compress as long as the immediate is
also 6 bits.

This code was inspired by similar code from the SystemZ target.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D138242
2022-11-25 08:39:44 -08:00
LiaoChunyu
aa14f002d5 [RISCV] Branchless lowering for (select (x < 0), TrueConstant, FalseConstant) and (select (x >= 0), TrueConstant, FalseConstant)
This patch reduces the number of unpredictable branches

(select (x < 0), y, z)  -> x >> (XLEN - 1) & (y - z) + z
(select (x >= 0), y, z) -> x >> (XLEN - 1) & (z - y) + y

Reviewed By: craig.topper, reames

Differential Revision: https://reviews.llvm.org/D137949
2022-11-25 20:18:30 +08:00
wangpc
241accea2a [RISCV] Lower unmasked zero-stride vector load to (scalar load + splat)
So we have the opportunity to fold splat into .vx instruction as what
D101138 has done. If failed, we can select zero-stride vector load
again.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D138101
2022-11-24 11:09:45 +08:00
wangpc
25b51b7924 [RISCV] Precommit test for D138101
Add a test that splat can't be folded.

Reviewed By: pcwang-thead

Differential Revision: https://reviews.llvm.org/D138543
2022-11-24 11:07:19 +08:00
WuXinlong
219417b2c6 [RISCV] Add CodeGen support and MC testcase of RISCV Zca Extension
This patch add the support of RISCV Zca ext

`Zca` is a subset of C extension instructions that are compatible with the Zc extension.

So this patch implements Zca code generation with reference to the C extension and sets the 2-byte alignment for the Zca extension, just like C extension does.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130483
2022-11-22 17:22:26 +08:00
Craig Topper
24810acb62 [RISCV] Add isel patterns to select slli+shXadd.uw.
This matches what we get for something like.
%0 = shl i32 %x, C
%1 = zext i32 %0 to i64
%2 = getelementptr i32, ptr %y, %1

The shift before the zext and the shift implied by the GEP get
combined with an AND after them. We need to split it back into
2 shifts so we can fold one into shXadd.uw.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D137886
2022-11-21 09:32:51 -08:00
Philip Reames
06e2b44c46 [RISCV] Optimize scalable frame setup when VLEN is precisely known
If we know the exact value of VLEN, the frame offset adjustment for scalable stack slots becomes a fixed constant. This avoids the need to read vlenb, and may allow the offset to be folded into the immediate field of an add/sub.

We could go further here, and fold the offset into a single larger frame adjustment - instead of having a separate scalable adjustment step - but that requires a bit more code reorganization. I may (or may not) return to that in a future patch.

Differential Revision: https://reviews.llvm.org/D137593
2022-11-18 15:30:39 -08:00
Philip Reames
102f05bd34 Revert "[SDAG] Allow scalable vectors in ComputeNumSignBits" and follow up
This reverts commits 3fb08d14a6 and f8c63a7fbf.

There was a "timeout for a Halide Hexagon test" reported.  Revert until investigation complete.
2022-11-18 15:25:59 -08:00
Philip Reames
f8c63a7fbf [SDAG] Allow scalable vectors in ComputeNumSignBits
This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

Differential Revision: https://reviews.llvm.org/D137141
2022-11-18 10:50:06 -08:00
Philip Reames
18fda867f4 [RISCV] Optimize scalable frame offset calculation when VLEN is precisely known
When we have a precisely known VLEN, we can replace runtime usage of VLENB with compile time constants. This converts offsets involving both fixed and scalable components into fixed offsets. The result is that we avoid the csr read of vlenb, and can often fold the multiply as well.

Differential Revision: https://reviews.llvm.org/D137591
2022-11-18 09:56:55 -08:00
luxufan
18c5f3c35d [RegisterScavenger][RISCV] Don't search for FrameSetup instrs if we were searching from Non-FrameSetup instrs
Otherwise, the spill position may point to position where before
FrameSetup instructions. In which case, the spill instruction may store
to caller's frame since the stack pointer has not been adjustted.

Fixes https://github.com/llvm/llvm-project/issues/58286

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D135693
2022-11-18 15:13:52 +08:00
Han-Kuan Chen
7e6dbfcd9d [RISCV] Make lowerVECTOR_SHUFFLEAsVSlidedown follow source until not EXTRACT_SUBVECTOR.
Current lowerVECTOR_SHUFFLEAsVSlidedown only seeks whether input are
EXTRACT_SUBVECTOR and their source are same. The commit will make the
function seek input and their source until they are not
EXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D138025
2022-11-17 22:32:53 -08:00
Han-Kuan Chen
2e58d4bc4b [RISCV] Pre-commit test.
Differential Revision: https://reviews.llvm.org/D138024
2022-11-17 22:32:53 -08:00
YingChi Long
7a715bf317 [VP] Add support for vp.inttoptr & vp.ptrtoint
Add vp.inttoptr & vp.ptrtoint support by lowering them into
vp.zext / vp.truncate with in SelectionDAGBuilder.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137169
2022-11-18 10:42:24 +08:00
Anton Sidorenko
b6c790736e [MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns
This patch adds tranformation of fmul+fadd/fsub chains to fused multiply
instructions:
  * fmul+fadd->fmadd
  * fmul+fsub->fmsub/fnmsub

We also will try to combine these instructions if the fmul has more than one use
and cannot be deleted. However, removing the dependence between fmul and fadd can
still be profitable, and we rely on machine combiner approximations of scheduling.

Differential Revision: https://reviews.llvm.org/D136764
2022-11-17 13:24:04 +03:00
Anton Sidorenko
374d076563 [MachineCombiner][RISCV] Precommit tests for D136764 2022-11-17 12:12:46 +03:00
Craig Topper
7e15ea102f [RISCV] Add a DAG combine to pre-promote (i1 (truncate (i32 (srl X, Y)))) with Zbs on RV64.
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.

This is similar to what we do for (i32 (and (srl X, Y), 1)).
2022-11-16 19:07:33 -08:00
Yeting Kuo
ed9638c44b [VP][RISCV] Add vp.nearbyint and RISC-V support.
nearbyint has the property to execute without exception.
For not modifying fflags, the patch added new machine opcode
PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137685
2022-11-16 14:05:35 +08:00
Yeting Kuo
5c3ca10b09 [VP][RISCV] Add vp.bswap and RISC-V support.
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137928
2022-11-16 11:36:38 +08:00
wangpc
a214c521f8 [RISCV] Don't use zero-stride vector load for gather if not optimized
We may form a zero-stride vector load when lowering gather to strided
load. As what D137699 has done, we use `load+splat` for this form if
there is no optimized implementation.
We restrict this to unmasked loads currently in consideration of the
complexity of hanlding all falses masks.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D137931
2022-11-16 10:43:10 +08:00