Commit Graph

1038 Commits

Author SHA1 Message Date
Craig Topper
75620fadf5 [RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0.
This patch changes the register class to avoid accidentally setting
the AVL operand to X0 through MachineIR optimizations.

There are cases where we really want to use X0, but we can't get that
past the MachineVerifier with the register class as GPRNoX0. So I've
use a 64-bit -1 as a sentinel for X0. All other immediate values should
be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI
insertion pass to avoid touching the rest of the algorithm. In
SelectionDAG lowering I'm using a -1 TargetConstant to hide it from
instruction selection and treat it differently than if the user
used -1. A user -1 should be selected to a register since it doesn't
fit in uimm5.

This is the rest of the changes started in D109110. As mentioned there,
I don't have a failing test from MachineIR optimizations anymore.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109116
2021-09-03 09:19:25 -07:00
Evandro Menezes
cd6064bb9e [RISCV] Improve shrink wrap test (NFC)
Restore test for shrink wrapping disabled.
2021-09-02 12:14:04 -05:00
Craig Topper
498e8ae412 [RISCV] Add Zba command line to rv64i-exhaustive-w-insts.ll
Zba adds a zext.w pseudoinstruction using ADDUW. This can simplify
the generated code for many of these tests.

There are at least 2 suboptimal cases in this config that I've marked
with TODOs.
2021-09-02 08:36:27 -07:00
Craig Topper
eaa560582a [RISCV] Remove stale TODOs from test. NFC
These were fixed by D106230.
2021-09-02 08:36:27 -07:00
Craig Topper
b5fd6b46f5 [RISCV] Teach instruction selection to elide sext.w in some cases.
If a sext_inreg is up for isel, and all its users are W instructions,
we can skip emitting the sext_inreg. This helpful if the producing
instruction can't become a W instruction.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D108966
2021-09-02 07:54:34 -07:00
Evandro Menezes
5ebdb07e7e [RISCV] Enable shrink wrap by default
Differential Revision: https://reviews.llvm.org/D109037
2021-09-02 09:47:58 -05:00
Craig Topper
e4e69ba4d1 [RISCV] Split PseudoVSETVLI into 2 instructions to allow different register classes for rs1.
X0 has special meaning for vsetvli, we need to make sure we never
create it a vsetvli that uses it by accident. This could happen
if the register coalescer coalesces a copy from X0 into this
instruction.

This patch splits the instruction so that we can have GPRNoX0
register class to use for the cases where we don't want the source
to be X0. The verifier won't let us explicitly use X0 on a GPRNoX0
operand so we need a separate pseudo for those cases.

I don't currently have a failing example for this. There was a
failure in D107957, but the coalescable copy from that example
should have been optimized away much earlier so I've fixed that.

This is not a complete fix. We still need to prevent the same
possible issue on the AVL operand of all of the vector instruction
pseudos. I don't want to make two versions of all of those so we
need to find a different solution for those. I have an idea I'm
going to try.

Differential Revision: https://reviews.llvm.org/D109110
2021-09-02 07:45:31 -07:00
Ben Shi
9621bbdf62 [RISCV][test] Add more tests for (mul (add x, c1), c2)
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D108606
2021-09-02 17:30:03 +08:00
Ben Shi
e47ab56398 [RISCV][test] Add tests for optimization with SH*ADD in the zba extension
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D108915
2021-09-02 17:30:03 +08:00
Fraser Cormack
ef78f2106c [LegalizeTypes][VP] Add splitting support for binary VP ops
This patch extends D107904's introduction of vector-predicated (VP)
operation legalization to include vector splitting.

When the result of a binary VP operation needs splitting, all of its
operands are split in kind. The two operands and the mask are split as
usual, and the vector-length parameter EVL is "split" such that the low
and high halves each execute the correct number of elements.

Tests have been added to the RISC-V target to show splitting several
scenarios for fixed- and scalable-vector types. Without support for
`umax` (e.g. in the `B` extension) the generated code starts to branch.
Ideally a cost model would prevent their insertion in the first place.

Through these tests many opportunities for better codegen can be seen:
combining known-undef VP operations and for constant-folding operations
on `ISD::VSCALE`, to name but a few.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107957
2021-09-02 10:15:53 +01:00
Craig Topper
ccbb4c8b4f [RISCV] Fold (RISCVISD::SELECT_CC X, Y, CC, Z, Z) -> Z.
If the true and false values are the same, we don't need a SELECT_CC.

This would normally be folded before a select is legalized to
select_cc. The test case exploits the late legalization of vscale
to trigger a case where they become identical after legalization.

This works around an issue found on a test case in D107957. In that
case the true/false values were both eventually 0 and the select was
used by a vector AVL operand. The select_cc got expanded to control
flow and a phi, but the phi inputs were both copies from X0. MachineIR
optimizations simplified this to a single copy from X0 going into the
vector instruction. This became the input of a vsetvli after vsetvli
insertion. Then register coalescing folded the copy into the vsetvli.
X0 as the source of a vsetvli is a special encoding and should not be
created by coalesing. We need to fix our vsetvli handling to make sure
this can never happen any other way, but removing the unneeded select
is still a worthwhile optimization.
2021-09-01 12:37:52 -07:00
Craig Topper
af1ca4353e [RISCV] Add a test case showing an extra sext.w near a sh2add with multiple uses. NFC
See description in test.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D108965
2021-09-01 11:01:05 -07:00
Nick Desaulniers
e9b3f25730 [RISCVISelLowering] avoid emitting libcalls to __mulodi4() and __multi3()
Similar to D108842, D108844, D108926, D108928, and D108936.

__has_builtin(builtin_mul_overflow) returns true for 32b RISCV targets,
but Clang is deferring to compiler RT when encountering long long types.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108939
2021-08-31 11:23:56 -07:00
Craig Topper
0560a4adb3 [RISCV] Enable CONCAT_VECTORS for fixed FP vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D108487
2021-08-30 08:47:45 -07:00
Craig Topper
705d005781 [DAGCombiner][RISCV] Don't use vector types in DAGCombiner::tryStoreMergeOfLoads if we need a rotate.
The check for whether a rotate is possible occurs before the
memory legality checks for the integer type. So it's possible we
decide we can use a rotate, but then fail the legality checks. If
that happens we should not fall back to a vector type. This triggers
an assertion in the rotate handling when it finds a vector type
instead of an integer type.

In theory we could use a shufflevector in place of the rotate, but
right now I'd just like to fix the crash.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108839
2021-08-30 08:47:15 -07:00
Craig Topper
0eeab8b282 [RISCV] Add -riscv-v-fixed-length-vector-elen-max to limit the ELEN used for fixed length vectorization.
This adds an ELEN limit for fixed length vectors. This will scalarize
any elements larger than this. It will also disable some fractional
LMULs. For example, if ELEN=32 then mf8 becomes illegal, i32/f32
vectors can't use any fractional LMULs, i16/f16 can only use mf2,
and i8 can use mf2 and mf4.

We may also need something for the scalable vectors, but that has
interactions with the intrinsics and we can't scalarize a scalable
vector.

Longer term this should come from one of the Zve* features
2021-08-27 10:17:35 -07:00
Craig Topper
1b9417454e [RISCV] Insert a sext_inreg when type legalizing i32 shl by constant on RV64.
Similar to what we do for add/sub/mul.

This can help remove some sext.w. There are some regressions on
some bswap tests, but I have an idea how to fix that for a follow up.

A new PACKW pattern is added to handle the new sext_inreg placement.

Differential Revision: https://reviews.llvm.org/D108663
2021-08-26 10:20:19 -07:00
Craig Topper
8bb24289f3 [SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants.
We can halve the number of mask constants by masking before shl
and after srl.

This can reduce the number of mov immediate or constant
materializations. Or reduce the number of constant pool loads
for X86 vectors.

I think we might be able to do something similar for bswap. I'll
look at it next.

Differential Revision: https://reviews.llvm.org/D108738
2021-08-26 09:33:24 -07:00
Craig Topper
ccd364286b [RISCV] Fix the check prefixes in some B extension tests. NFC
Looks like a bad merge happened after these were renamed in
D107992.
2021-08-25 14:26:51 -07:00
Stanislav Mekhanoshin
92c1fd19ab Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-24 11:09:02 -07:00
Ben Shi
f69fb7ac72 [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
2021-08-22 16:53:32 +08:00
Ben Shi
5b6c9a5ab0 [RISCV] Optimize add in the zba extension with SH*ADD
Optimize (add x, c) to (SH*ADD (c>>b), x) if c is not simm12
while (c>>b) is simm12 and c has b trailing zeros.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D108193
2021-08-20 22:41:49 +08:00
Fraser Cormack
5b06cbac11 [RISCV] Fix reporting of incorrect commutable operand indices
This patch fixes an issue where RISCV's `findCommutedOpIndices` would
incorrectly return the pseudo `CommuteAnyOperandIndex` as a commutable
operand index, rather than fixing a specific index.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D108206
2021-08-20 10:27:15 +01:00
David Green
d10f23a25d [ISel] Expand saddsat and ssubsat via asr and xor
This changes the lowering of saddsat and ssubsat so that instead of
using:
  r,o = saddo x, y
  c = setcc r < 0
  s = c ? INTMAX : INTMIN
  ret o ? s : r
into using asr and xor to materialize the INTMAX/INTMIN constants:
  r,o = saddo x, y
  s = ashr r, BW-1
  x = xor s, INTMIN
  ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD

This seems to reduce the instruction count in most testcases across most
architectures. X86 has some custom lowering added to compensate for
cases where it can increase instruction count.

Differential Revision: https://reviews.llvm.org/D105853
2021-08-19 16:08:07 +01:00
Ben Shi
b10e74389e [RISCV][test] Improve tests for (add (mul x, c1), c2)
Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107710
2021-08-19 21:04:35 +08:00
Fraser Cormack
e6b1ac8546 [LegalizeTypes][VP] Add widening support for binary VP ops
This patch adds the beginnings of more thorough support in the
legalizers for vector-predicated (VP) operations.

The first step is the ability to widen illegal vectors. The more
complicated scenario in which the result/operands need widening but the
mask doesn't has not been handled here. That would require a lot of code
without an in-tree target on which to test it.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D107904
2021-08-19 13:08:47 +01:00
Ben Shi
9e40a32620 [RISCV][test] Add new tests for add optimization in the zba extension
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D108188
2021-08-19 19:59:23 +08:00
Craig Topper
3f9b37ccb1 [RISCV] Remove sext_inreg+add/sub/mul/shl isel patterns.
Let the sext_inreg be selected to sext.w. Remove unneeded sext.w
during PostProcessISelDAG.

This gives opportunities for some other isel patterns to match
like the ADDIPair or matching mul with immediate to shXadd.

This becomes possible after D107658 started selecting W instructions
based on users. The sext.w will be considered a W user so isel
will often select a W instruction for the sext.w input and we can
just remove the sext.w. Otherwise we can combine the sext.w with
a ADD/SUB/MUL/SLLI to create a new W instruction in parallel
to the the original instruction.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107708
2021-08-18 11:07:11 -07:00
Craig Topper
6d7ea597ef [RISCV] Insert sext_inreg when type legalizing add/sub/mul with constant LHS.
We already do this for non-constants RHS. This just removes the
special case. I believe the special case may have been needed
because the ANY_EXTEND of a constant used to create zero extended
constants, but we recently changed that to produce sign extended
constants.

D107658 is needed to prevent some regressions.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107697
2021-08-18 10:44:25 -07:00
Craig Topper
20e6265873 [RISCV] Improve constant materialization for stores of i16 or i32 negative constants.
DAGCombiner::visitStore can clear the upper bits of constants
used by stores. This leads prevents them from being recognized as
sign extended negative values making them more expensive to
materialize.

This patch uses the hasAllNBitUsers method from D107658 to make
a negative constant if none of the users care about the upper bits.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D108052
2021-08-18 10:25:12 -07:00
Craig Topper
d9ba1a9c5c [RISCV] Teach isel to select ADDW/SUBW/MULW/SLLIW when only the lower 32-bits are used.
We normally select these when the root node is a sext_inreg, but
SimplifyDemandedBits can sometimes bypass the sext_inreg for some
users. This can create situation where sext_inreg+add/sub/mul/shl
is selected to a W instruction, and then the add/sub/mul/shl is
separately selected to a non-W instruction with the same inputs.

This patch tries to detect when it would still be ok to use a W
instruction without the sext_inreg by checking the direct users.
This can allow the W instruction to CSE with one created for a
sext_inreg+add/sub/mul/shl. To minimize complexity and cost of
checking, we make no attempt to determine if the CSE will happen
and just always use a W instruction when we can.

Differential Revision: https://reviews.llvm.org/D107658
2021-08-18 10:22:00 -07:00
Petr Hosek
2d4470ab89 Revert "Allow rematerialization of virtual reg uses"
This reverts commit 877572cc19 which
introduced PR51516.
2021-08-18 00:12:41 -07:00
jacquesguan
a7ebc4d145 [DAGCombiner] Teach isKnownToBeAPowerOfTwo handle SPLAT_VECTOR
Make DAGCombine turn mul by power of 2 into shl for scalable vector.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107883
2021-08-18 10:10:40 +08:00
Stanislav Mekhanoshin
877572cc19 Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-16 12:42:42 -07:00
Simon Pilgrim
d6fe8d37c6 [DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b)
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.

This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.

Differential Revision: https://reviews.llvm.org/D107597
2021-08-16 16:06:54 +01:00
Craig Topper
d63f117210 [RISCV] Support RISCVISD::SELECT_CC in ComputeNumSignBitsForTargetNode. 2021-08-13 18:00:09 -07:00
Craig Topper
a2556bf44c [RISCV] Improve check prefixes in B extension tests. NFC
-Add Z for the B extension subextensions.
-Don't mention I along with B or its sub extensions.

This is based on comments in D107817.

Differential Revision: https://reviews.llvm.org/D107992
2021-08-12 12:41:40 -07:00
Craig Topper
e25665f52e [RISCV] Add test cases showing inefficient materialization for stores of immediates. NFC
DAGCombiner::visitStore can call GetDemandedBits which will remove
upper bits from immediates. The upper bits are important for good
materialization of negative constants on RISCV. GetDemandedBits is a
different mechanism than SimplifyDemandedBits so
TargetShrinkDemandedConstant can't block it.

As far as I know this behavior is unique to stores.

I think we can fix this in isel using a concept similar to D107658.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107860
2021-08-12 10:14:07 -07:00
Craig Topper
79fbddbea0 [RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases.
For unit-stride and strided load/stores we set the SEW operand of
the pseudo instruction equal the EEW in the opcode. The LMUL
of the pseudo instruction is the LMUL we want.

These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use
this to avoid changing vtype if the SEW/LMUL of the previous
vtype matches the EEW/EMUL ratio we need for the instruction.

Due to how the global analysis works, we can only do this
optimization when the previous vsetvli was produced in the block
containing the store. We need to know in the first phase if the
vsetvli will be inserted so we can propagate information to
the successors in the second phase correctly. This means we can't
depend on predecessors.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D106601
2021-08-12 10:05:27 -07:00
Ben Shi
0247403910 [RISCV][test] Add new tests for mul optimization in the zba extension with SH*ADD
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107817
2021-08-11 09:56:45 +08:00
Craig Topper
6d5b14d854 [RISCV] Remove stale TODO from test. NFC 2021-08-10 13:13:48 -07:00
Craig Topper
6f5edc3487 [RISCV] Fold (add (select lhs, rhs, cc, 0, y), x) -> (select lhs, rhs, cc, x, (add x, y))
Similar for sub except sub isn't commutative.

Modify the existing and/or/xor folds to also work on ISD::SELECT
and not just RISCVISD::SELECT_CC. This is needed to make sure
we do this transform before type legalization turns i32 add/sub
into add/sub+sign_extend_inreg on RV64. If we don't do this before
that, the sign_extend_inreg will still be after the select.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D107603
2021-08-10 09:02:56 -07:00
Fraser Cormack
2b4a1d4b86 [RISCV] Improve codegen for shuffles with LHS/RHS splats
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".

This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107449
2021-08-09 10:31:40 +01:00
Fraser Cormack
b5c608c377 [RISCV] Add tests covering shuffles which can be optimized
These shuffles all take the form of a "splat" of the LHS and/or RHS to
some degree, with one or two elements needing patched up afterwards. We
currently lower all of these to full LHS/RHS vector-index shuffles with
vrgather.vv.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D107447
2021-08-09 10:20:42 +01:00
Craig Topper
2f3b738960 [RISCV] Add optimizations for FMV_X_ANYEXTH similar to FMV_X_ANYEXTW_RV64.
This enables the fneg and fabs combines we have for FMV_X_ANYEXTW_RV64.
2021-08-08 18:30:48 -07:00
Craig Topper
6606936322 [RISCV] Remove -target-abi from half-bitmanip-dagcombines.ll.
This should be testing the custom ISD nodes we use for passing
half values in GPRs.

We should optimize these to integer operations, but we currently
don't.
2021-08-08 18:19:35 -07:00
Craig Topper
88bc29f5f2 [RISCV] Introduce a RISCV CondCode enum instead of using ISD:SET* in MIR. NFC
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.

This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.

My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107400
2021-08-08 17:25:37 -07:00
Craig Topper
5894134c6e [RISCV] Autogenerate test. NFC 2021-08-07 17:11:11 -07:00
Craig Topper
d4ee84ceee [RISCV] Support FP_TO_S/UINT_SAT for i32 and i64.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of 0 required for the ISD opcodes.

This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.

We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D107230
2021-08-07 16:06:00 -07:00
Craig Topper
f7076cfd3a [DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits contant folding.
We don't have real demanded bits support for MULHU, but we can
still use the known bits based constant folding support at the end
of SimplifyDemandedBits to simplify a MULHU. This helps with cases
where we know the LHS and RHS have enough leading zeros so that
the high multiply result is always 0.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D106471
2021-08-05 08:31:26 -07:00