The test cases extract a fixed element from a vector and splat it
into a vector. This gets DAG combined into a splat shuffle.
I've used some very wide vectors in the test to make sure we have
at least a couple tests where the element doesn't fit into the
uimm5 immediate of vrgather.vi so we fall back to vrgather.vx.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96186
This patch optimizes a build_vector "index sequence" and lowers it to
the existing custom RISCVISD::VID node. This pattern is common in
autovectorized code.
The custom node was updated to allow it to be used by both scalable and
fixed-length vectors, thus avoiding pattern duplication.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96332
As of the current draft these are no longer being considered
for the bitmanip spec. It wasn't clear what sub extension they
belonged in in the 0.93 spec.
So remove them. They can always be added back if something changes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96157
In vector v0.10, there are whole vector register load/store
instructions. I suggest to use the whole register load/store
instructions for generic load/store for scalable vector types. It could
save up vset{i}vl{i} for these load/store.
For fractional LMUL, I keep to use vle{eew}.v/vse{eew}.v instructions to
load/store partial vector registers.
Differential Revision: https://reviews.llvm.org/D95853
Building on the fixed vector support from D95705
I've added ISD nodes for vmv.v.x and vfmv.v.f and switched to
lowering the intrinsics to it. This allows us to share the same
isel patterns for both.
This doesn't handle splats of i64 on RV32 yet. The build_vector
gets converted to a vXi32 build_vector+bitcast during type
legalization. Not sure the best way to handle this at the moment.
Differential Revision: https://reviews.llvm.org/D96108
This is an alternative to D95563.
This is modeled after a similar feature for AArch64's SVE that uses
predicated scalable vector instructions.a
Rather than use predication, this patch uses an explicit VL operand.
I've limited it to always use LMUL=1 for now, but we can improve this
in the future.
This requires a bunch of new ISD opcodes to carry the VL operand.
I think we can probably lower intrinsics to these ISD opcodes to
cut down on the size of the isel table. Which is why I've added
patterns for all integer/float types and not just LMUL=1.
I'm only testing one vector width right now, but the width is
programmable via the command line.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95705
This adds support for commuting operands and converting between
vfmadd and vfmacc to avoid register copies.
To avoid messing up intrinsic behavior, I've added new pseudo
instructions that have the isCommutable flag set. These pseudos also
force a tail agnostic policy. The intrinsic version still use
the tail undisturbed policy.
For best results it looks like we need to start with fmadd and only
pick fmacc if its beneficial. MachineCSE commutes without contraining
the operands and then commutes back if it didn't help with CSE. So
I've made sure that when the operand choice isn't constrained, we
will keep fmadd for MachineCSE and when it does the second commute,
we get back the original instruction.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95800
This patch adds support for both the fadd reduction intrinsic, in both
the ordered and unordered modes.
The fmin and fmax intrinsics are not currently supported due to a
discrepancy between the LLVM semantics and the RVV ISA behaviour with
regards to signaling NaNs. This behaviour is likely fixed in version 2.3
of the RISC-V F/D/Q extension, but until then the intrinsics can be left
unsupported.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95870
This patch adds support for the integer reduction intrinsics supported
by RVV. This excludes "mul" which has no corresponding instruction.
The reduction instructions in RVV have slightly complicated type
constraints given they always produce a single "M1" vector register.
They are lowered to custom nodes including the second "scalar" reduction
operand to simplify the patterns and in the hope that they can be useful
for future DAG combines.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95620
This patch custom-legalizes all integer EXTRACT_VECTOR_ELT nodes where
SEW < XLEN to VMV_S_X nodes to help the compiler infer sign bits from
the result. This allows us to eliminate redundant sign extensions.
For parity, all integer EXTRACT_VECTOR_ELT nodes are legalized this way
so that we don't need TableGen patterns for some and not others.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95741
This patch adds support for lowering the sqrt intrinsic to the RVV
vfsqrt instruction.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96012
The vrgather.vv instruction uses a vector of indices with the same
SEW as operand 0. The vrgather.vx instructions use a scalar index
operand of XLen bits.
By splitting this into 2 intrinsics we are able to use LLVMatchType
in the definition to avoid specifying the type for the index operand
when creating the IR for the intrinsic. For .vv it will match the
operand 0 type. And for .vx it will match the type of the vl operand
we already needed to specify a type for.
I'm considering splitting more intrinsics. This was a somewhat
odd one because the .vx doesn't use the element type, it always
use XLen.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D95979
This improves our coverage of these operations and shows that we
use really large constants for division by constant on i8/i16
especially on RV64. The issue is that BuildSDIV/BuildUDIV are
limited to legal types so we have to promote to i64 before it
kicks in. At that point we've lost the range information for the
original type.
Due to a clerical error, the sdiv operation was mapping to vdivu and
udiv to vdiv, when the opposite mapping is the correct one.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95869
A follow up patch will add support for commuting operands or
changing opcode to vfmacc and friends.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95662
Rather than materializing the 0xffff immediate for the AND, use
a shift left to remove the upper bits and then shift in zeros
from the right.
This pattern occurs when type legalizing an i16 right shift.
I've implemented this with custom selection code for a number of
reasons. I've limited this to the AND having a single use. We need
to compensate for SimplifyDemandedBits altering the AND mask. I'm
using *W opcodes on RV64. We may want to generlize this in the
future. For all these reason it seemed easiest to do it this way.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95774
If we're going to end up expanding anyway, we should do it early
so we don't create extra operations to handle the bytes added by
promotion.
This is helfpul on RISCV where we might have to promote i16 all
the way to i64.
Differential Revision: https://reviews.llvm.org/D95756
This demonstrates a missed optimization: the `vmv.x.s` instruction is
used to extract the element from the vector, and this instruction
already sign-extends the value to XLEN.
These instructions have been removed from the 0.94 bitmanip spec.
We should focus on optimizing the codegen without using them.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D95302
This patch adds support for the full range of vector int-to-float,
float-to-int, and float-to-float conversions on legal types.
Many conversions are supported natively in RVV so are lowered with
patterns. These include conversions between (element) types of the same
size, and those that are half/double the size of the input. When
conversions take place between types that are less than half or more
than double the size we must lower them using sequences of instructions
which go via intermediate types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95447
In d2927f786e, I added patterns
to remove (and X, 31) from sllw/srlw/sraw shift amounts.
There is code in SelectionDAGISel.cpp that knows to use
computeKnownBits to fill in bits of the mask that were removed
by SimplifyDemandedBits based on bits being known zero.
The non-W shift patterns use immbottomxlenset which allows the
mask to have more than log2(xlen) trailing ones, but doesn't
have a call to computeKnownBits to fill in bits of the mask that may
have been cleared by SimplifyDemandedBits.
This patch copies code from X86 to handle more than log2(xlen)
bottom bits set and uses computeKnownBits to fill in missing bits
before counting.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95422
This patch fixes some crashes coming from
`RISCVISelLowering::getSetCCResultType`, which would occasionally return
an EVT constructed from an invalid MVT, which has a null Type pointer.
The attached test shows this happening currently for some fixed-length
vectors, which hit this issue when the V extension was enabled, even
though they're not legal types under the V extension. The fix was also
pre-emptively extended to scalable vectors which can't be represented as
an MVT, even though a test case couldn't be found for them.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95434
Move the Suffix string into the VTypeInfo class so we don't need a helper class to get to it.
Adjust pseudo naming scheme for FPRs to put F16/F32/F64 in
place of F in the pseudo instruction name rather than as a suffix.
This avoids special cases like VFMERGE from the original patch.
Differential Revision: https://reviews.llvm.org/D95404
When spilling, the spill size will depend on the size of register class.
For .vf vector instructions, it may spill the floating point scalar
argument. In order to use the correct load/store instructions for
spilling, we need to provide the correct floating point register class
for the .vf vector pseudo instructions.
In this commit, we define the .vf pseudo instructions as three
different kinds of pseudo instructions for half/float/double. For
example, PseudoVFADD_M1 will become as PseudoVFADD_F16_M1,
PseudoVFADD_F32_M1, and PseudoVFADD_F64_M1.
Differential Revision: https://reviews.llvm.org/D95234
RISCV has to use 2 shifts for (i64 (zext_inreg X, i32)), but we
can use addiw rd, rs1, x0 for sext_inreg. We already understood this
when type legalizing i32 seteq/ne on rv64. But this transform in
SimplifySetCC would sometimes undo it.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95289
This pattern can occur when an unsigned is used to index an array
on RV64.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95290
Original patch by @rogfer01.
This patch adds support for insertelt and extractelt operations on
scalable vectors.
Special care must be taken on RV32 when dealing with i64 vectors as
there are no straightforward ways to insert a 64-bit element without a
register of that size. To that end, both are custom-lowered to different
sequences.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94615
This makes our i8/i16 codegen more similar to the i32 codegen.
I've also added computeKnownBits support for DIVUW/REMUW so
that we can remove zero extending ANDs from the output. Without
this we end up turning DIVUW/REMUW back into DIVU/REMU via some
isel patterns.
Reviewed By: frasercrmck, luismarques
Differential Revision: https://reviews.llvm.org/D95322
As far as I know 32 bits arguments and returns on RV64 are always
sign extended to i64. So I think we should be taking this into
account around libcalls.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95285
This patch adds support for scalable-vector splats in DAGCombiner's
`isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions,
which enable the SelectionDAG div/rem-by-constant optimizations for
scalable vector types.
It also fixes up one case where the UDIV optimization was generating a
SETCC without first consulting the target for its preferred SETCC result
type.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94501
This adds support for ".attribute arch" for all extensions that are
currently supported by the compiler.
Differential Revision: https://reviews.llvm.org/D94931
The patterns that use this really want to know if the operand has at
least 32 sign/zero bits.
This increases opportunities to use W instructions when the original
source used i8/i16. Not sure how much this matters for performance,
but it makes i8/i16 code more consistent with i32.
We try to do this during DAG combine with SimplifyDemandedBits,
but it fails if there are multiple nodes using the AND. For
example, multiple shifts using the same shift amount.