This code was trying to save temporary argument registers in interrupt
handler functions that contain calls, except that all FP registers were
saved, including the normally callee saved registers.
If all of the callees use an FP ABI and the interrupt handler doesn't
touch the normally callee saved FP registers, we don't need to save
them.
It doesn't appear that we need to special case functions with calls. The
normal callee saved register handling will already check each of the calls
and consider a register clobbered if the call doesn't explicitly say it is preserved.
All of the test changes are from the removal of the FP callee saved
registers. There are tests for interrupt handlers with F and D extension
that use ilp32 or lp64 ABIs that are not affected by this change. They
still save the FP callee saved registers as they should.
gcc appears to have a bug where the D extension being enabled with the
ilp32f or lp64f ABI does not save the FP callee saved regs. The callee
would only save/restore the lower 32 bits and clobber the upper bits.
LLVM saves the FP callee saved regs in this case and there is an
unchanged test for it.
The unnecessary save/restore was raised in this thread
https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1
We recently optimized the code when the odd vector was undef to fix a
poison bug.
There are additional optimizations we can do if the even vector is
undef. With Zvbb, we can use a single vwsll. Without Zvbb, we can use a
vzext.vf2 and a vsll.
This improves a pattern that occurs in 531.deepsjeng_r, reducing the
dynamic instruction count by 0.5%.
This may be possible to improve in SelectionDAG, but given the special
cases around shXadd formation, it's not obvious it can be done in a
robust way without adding multiple special cases.
I've used a GEP with 2 indices because that most closely resembles the
motivating case. Most of the test cases are the simplest GEP case. One
test has a logical right shift on an index which is closer to the
deepsjeng code. This requires special handling in isel to reverse a
DAGCombiner canonicalization that turns a pair of shifts into (srl (and
X, C1), C2).
This patch allows `combineBinOp_VLToVWBinOp_VL` to handle patterns like
`(splat_vector (sext op))` or `(splat_vector (zext op))`. Then we can
use `vwadd.vx` and `vwadd.w` for such a case.
### Source code
```
define <vscale x 8 x i64> @vwadd_vx_splat_sext(<vscale x 8 x i32> %va, i32 %b) {
%sb = sext i32 %b to i64
%head = insertelement <vscale x 8 x i64> poison, i64 %sb, i32 0
%splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
%vc = sext <vscale x 8 x i32> %va to <vscale x 8 x i64>
%ve = add <vscale x 8 x i64> %vc, %splat
ret <vscale x 8 x i64> %ve
}
```
### Before this patch
[Compiler Explorer](https://godbolt.org/z/sq191PsT4)
```
vwadd_vx_splat_sext:
sext.w a0, a0
vsetvli a1, zero, e64, m8, ta, ma
vmv.v.x v16, a0
vsetvli zero, zero, e32, m4, ta, ma
vwadd.wv v16, v16, v8
vmv8r.v v8, v16
ret
```
### After this patch
```
vwadd_vx_splat_sext:
vsetvli a1, zero, e32, m4, ta, ma
vwadd.vx v16, v8, a0
vmv8r.v v8, v16
ret
```
This follows on from #87616, but includes the tests with codegen
differences. These are presumably due to the fact that the splat is now
a constant expression. They don't seem to affect anything that we were
specifically testing for.
If the subtarget has +zvbb then we can attempt folding shl and shl_vl to
vwsll nodes.
There are a few test cases where we still don't pick up the vwsll:
- For fixed vector vwsll.vi on RV32, see the FIXME for VMV_V_X_VL in
fillUpExtensionSupport about supporting implicit sign extension
- For scalable vector vwsll.vi we need to support ISD::SPLAT_VECTOR, see
#87249
In NodeExtensionHelper we keep track of the VL and mask of the operand
being extended and check that they are the same as the root node's.
However for the nodes that we support, none of them have a passthru
operand with the exception of RISCV::VMV_V_X_VL, but we check that its
passthru is undef anyway.
So it's safe to just discard the extend node's VL and mask and just use
the root's instead. (This is the same type of reasoning we use to treat
any vmset_vl as an all ones mask)
This allows us to match some more cases where we mix VP/non-VP/VL nodes,
but these don't seem to appear in practice. The main benefit from this
would be to simplify the code.
A handy shorthand for specifying the shufflevector(insertelement(poison,
foo, 0), poison, zeroinitializer) splat pattern was introduced in
#74620.
Some of the RISC-V tests were converted over to use this new form in
dbb65dd330; this patch handles the rest, which didn't have any codegen
diffs.
This not only converts some constant expressions to the new form, but
also instruction sequences that weren't previously constant expressions
to constant expressions as well. In some cases this affects codegen, but
these have been omitted here and will be handled in a separate PR.
All of these test cases had iXLen in their name which got replaced
by sed. This prevented FileCheck from finding the function. The other
test cases in these files do not have that issue.
If we're falling back to generic constant formation in a register +
add/sub, we can check if we have a constant which is 12-bits but left
shifted by 2 or 3. If so, we can use a sh2add or sh3add to perform the
shift and add in a single instruction.
This is profitable when the unshifted constant would require two
instructions (LUI/ADDI) to form, but is never harmful since we're going
to need at least two instructions regardless of the constant value.
Since stacks are aligned to 16 bytes by default, sh3add allows addressing
(aligned) data out to 2^14 (i.e. 16 KiB) in at most two instructions
w/zba.
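A minimal sketch of the check described above, written as a Python model rather than the actual backend code. The helper names `is_simm12` and `shnadd_shift` are hypothetical, and the order in which the two shift amounts are tried is an assumption made for illustration only.

```python
def is_simm12(value: int) -> bool:
    """True if value fits a signed 12-bit immediate (the ADDI range)."""
    return -2048 <= value <= 2047


def shnadd_shift(offset: int):
    """Return 2 or 3 if offset is a simm12 shifted left by that amount.

    Offsets that already fit in a simm12 return None, since a single ADDI
    handles them and no generic constant formation is needed.
    """
    if is_simm12(offset):
        return None
    for shift in (3, 2):  # hypothetical order: try sh3add, then sh2add
        if offset % (1 << shift) == 0 and is_simm12(offset >> shift):
            return shift
    return None
```

For example, 16376 is 2047 shifted left by 3, so it qualifies for sh3add, while 16384 would need an unshifted value of 2048, which no longer fits in 12 bits.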
This attempts to standardize and extend some of the insert vector
element lowering. Most notably:
- More types are handled by splitting illegal vectors.
- The index type for G_INSERT_VECTOR_ELT is canonicalized to
TLI.getVectorIdxTy(), similar to extract_vector_element.
- Some of the existing patterns now have the index type specified to
make sure they can apply to GISel too.
- The C++ selection code has been removed, relying on tablegen patterns.
- G_INSERT_VECTOR_ELT with small GPR input elements are pre-selected to
use an i32 type, allowing the existing patterns to apply.
- Variable index inserts are lowered in post-legalizer lowering,
expanding into a stack store and reload.
When the encoding of register tuples is aligned, we can use a copy
with larger LMUL to reduce copies.
Reviewers: preames, topperc, lukel97
Reviewed By: topperc, lukel97
Pull Request: https://github.com/llvm/llvm-project/pull/84455
Adds two sets of tests. First, one for prolog/epilogue insertions where
the second stack adjustment can be done with shNadd for zba. Second, a
set of tests with offsets off SP in the same ranges, but also adding
varying alignments.
sed was being used to use the same test functions with eq/ne branch
condition.
This commit duplicates the test functions so that we have a version
with each condition. This allows us to remove 2 RUN lines.
I plan to add Zca testing to this file, which now requires 1 new
RUN line instead of 2.
Test description says constant does not fit in 12 bits, but the constant
used was -2048 which does fit in 12 bits. Update to -2049.
Also remove uses of -NOT in favor of positive checks. One of the -NOT
should have been using RESBROPT instead of "c.beqz" so that it would
check for the absence of the correct instruction based on the sed
replacement on the RUN line.
On RV64, we legalize zexts of i1s to (vselect m, (splat_vector i64 1),
(splat_vector i64 0)), where the splat_vectors are implicitly
truncating.
When the vselect is used by a binop we want to pull the vselect out via
foldSelectWithIdentityConstant. But because vectors with an element size
< i64 will truncate, isNeutralConstant will return false.
This patch handles truncating splats by getting the APInt value and
truncating it. We almost don't need to do this since most of the neutral
elements are either one/zero/all ones, but it will make a difference for
smax and smin.
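To illustrate why the truncation matters for smax, here is a small Python model. It is only a sketch: `truncate_to_signed` and `is_neutral_for_smax` are hypothetical names, not the APInt or isNeutralConstant API.

```python
def truncate_to_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits of value and reinterpret them as signed."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def is_neutral_for_smax(splat_value: int, elt_bits: int) -> bool:
    """smax(x, C) == x for every x only when C is the minimum signed value."""
    return truncate_to_signed(splat_value, elt_bits) == -(1 << (elt_bits - 1))
```

A splat of the i64 value 0x80 into i8 elements truncates to -128, which is the neutral element for smax on i8, whereas the untruncated value 128 is not neutral for an i64 smax.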
I wasn't able to figure out a way to write the tests in terms of select,
since we need the i1 zext legalization to create a truncating
splat_vector.
This supersedes #87236. Fixed vectors are unfortunately not handled by
this patch (since they get legalized to _VL nodes), but they don't seem
to appear in the wild.
Fixed vectors have their sext/zext operands legalized to _VL nodes, so
we need to handle them in the patterns.
This adds a riscv_ext_vl_oneuse pattern since we don't care about the
type of extension used for the shift amount, and extends
Low8BitsSplatPat to handle other _VL nodes. We don't actually need to
check the mask or VL there since none of the _VL nodes have passthru
operands.
The remaining test cases that are widening from i8->i64 need to be
handled by extending combineBinOp_VLToVWBinOp_VL.
This also fixes Low8BitsSplatPat incorrectly checking the vector size
instead of the element size to determine if the splat value might have
been truncated below 8 bits.
This patch legalizes G_ZEXT, G_SEXT, and G_ANYEXT. If the type is a
legal mask type, then the instruction is legalized as the element-wise
select, where the condition on the select is the mask typed source
operand, and the true and false values are 1 or -1 (for
zero/any-extension and sign extension) and zero. If the type is a legal integer
or vector integer type, then the instruction is marked as legal.
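As a rough illustration of the mask case, the element-wise select can be modelled in Python as below. `extend_mask` is a hypothetical helper and the list of booleans stands in for a mask vector; this is not the legalizer code.

```python
def extend_mask(mask, signed: bool):
    """Model extending a mask vector as select(mask, true_value, 0).

    The true value is -1 for sign extension and 1 for zero/any-extension.
    """
    true_value = -1 if signed else 1
    return [true_value if bit else 0 for bit in mask]
```

For instance, the mask `[True, False, True]` becomes `[1, 0, 1]` under zero extension and `[-1, 0, -1]` under sign extension.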
The legalization of the extends may introduce a G_SPLAT_VECTOR, which
needs to be legalized in this patch for the extend test cases to pass.
A G_SPLAT_VECTOR is legal if the vector type is a legal integer or
floating point vector type and the source operand is sXLen type. This is
because the SelectionDAG patterns only support sXLen typed
ISD::SPLAT_VECTORS, and we'd like to reuse those patterns. A
G_SPLAT_VECTOR is custom legalized if it has a legal s1 element vector
type and s1 scalar operand. It is legalized to G_VMSET_VL or G_VMCLR_VL
if the splat is all ones or all zeros respectively. In the case of a
non-constant mask splat, we legalize by promoting the scalar value to
s8.
In order to get the s8 element vector back into s1 vector, we use a
G_ICMP. In order for the splat vector and extend tests to pass, we also
need to legalize G_ICMP in this patch.
A G_ICMP is legal if the destination type is a legal bool vector and the LHS and
RHS are legal integer vector types.
Hi all,
This patch is a follow-up of #79101. It migrates logic from
`visitVSELECT` to `visitVP_SELECT` to simplify `vp.select`. With this
patch we can do the following combinations:
```
vp.select undef, T, F --> T (if T is a constant), F otherwise
vp.select <condition>, undef, F --> F
vp.select <condition>, T, undef --> T
vp.select false, T, F --> F
vp.select <condition>, T, T --> T
```
I'm a total newbie to llvm and I'm sure there's room for improvements in
this patch. Please let me know if you have any advice. Thank you in
advance!
If the odd vector is undef or poison, the widening add and multiply trick
doesn't work unless we freeze the odd vector.
Unfortunately, freezing doesn't work when the operand is provably
undef/poison. MIR doesn't have a representation for freeze so it
just becomes a COPY from IMPLICIT_DEF which freely propagates undef
to each operand independently.
To work around this, check for undef explicitly and lower to a VZEXT_VL
of the even vector. This produces better code than we'd get from a
freeze anyway.
I've left a FIXME for adding a freeze. I'll do that as a separate patch
as it affects other tests and doesn't help with the new test.
The interleave lowering relies on a math trick that requires passing
the odd vector to two math instructions. In order to be correct
these instructions must see the same value.
If the odd vector is provably poison or undef, SelectionDAG will
create a vwadd and vwmaccu where the operand is a copy from IMPLICIT_DEF.
Later this will become just the undef flag on the operand. This
gives the register allocator freedom to pick a different register
for each instruction.
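The following Python model sketches the math trick for one pair of elements, assuming the lowering is a widening unsigned add followed by a widening multiply-accumulate with the scalar 2^bits - 1. `interleave_pair` is a hypothetical name, and the two separate `odd` parameters exist only to show what happens when the two instructions read different registers.

```python
def interleave_pair(even: int, odd_for_add: int, odd_for_mul: int, bits: int = 8) -> int:
    """Model one widened lane: even + odd + (2**bits - 1) * odd."""
    wide_mask = (1 << (2 * bits)) - 1
    acc = (even + odd_for_add) & wide_mask                      # models vwaddu
    acc = (acc + ((1 << bits) - 1) * odd_for_mul) & wide_mask   # models vwmaccu
    return acc
```

When both uses see the same odd value, the result is `(odd << bits) | even`. When they differ, as can happen with an undef operand, even the low half no longer equals `even`: `interleave_pair(0x12, 0x34, 0x00)` yields 0x46 rather than a value whose low byte is 0x12.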
SelectionDAG marks ISD::BITCAST as legal between scalable vector types
and ISelDAGToDAG deletes them.
We mark G_BITCAST between scalable vectors as legal in GISel. A future
patch will handle what to do with them after the legalizer (likely
either drop them in an isel-preprocess or convert them to COPYs).
BITCAST is needed for legalization of G_INSERT and G_EXTRACT. This is a
precommit for legalization of G_INSERT and G_EXTRACT.
DAGCombiner (or InstCombine) will convert an add to an or if the bits
are disjoint, which can prevent what was originally an (add {s,z}ext,
{s,z}ext) from being selected as a vwadd.
This teaches combineBinOp_VLToVWBinOp_VL to recover it by treating it as
an add.
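The equivalence being relied on can be checked with a short Python model. The helper names are hypothetical; the point is only that an `or` of operands with no common set bits computes the same value as an `add`.

```python
def bits_are_disjoint(a: int, b: int) -> bool:
    """True when no bit position is set in both operands."""
    return (a & b) == 0


def recover_add(a: int, b: int) -> int:
    """Treat a disjoint `or` as the `add` it was before canonicalization."""
    if not bits_are_disjoint(a, b):
        raise ValueError("operands share set bits; or and add differ")
    return a + b
```

For example, `0b1010 | 0b0101` and `0b1010 + 0b0101` are both 15, while `3 | 1` is 3 but `3 + 1` is 4, so the recovery is only valid in the disjoint case.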
We currently check if the source and promoted types are not equal before
generating truncate instructions. This does not work for RV64 where the
promoted type is i64 and this led to a crash due to the generation of
truncate instructions from i32 to i64.
Fixes #86400
This narrows unsigned and signed div and rem nodes via
combineBinOpOfZExt.
Unlike other binary ops, there are no widening div or rem instructions.
So we will end up with an extra vzext.vf2.
However I'm assuming that div/rem are expensive enough that by reducing
their EMUL we will gain back the cost.
Alive2 proof: https://alive2.llvm.org/ce/z/Et_L6y
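As an informal companion to the proof, the Python model below checks the identity for 8-bit operands: dividing the zero-extended values gives the same result as dividing in the narrow type and zero-extending afterwards, because the quotient never exceeds the dividend and the remainder is smaller than the divisor. The function names are hypothetical and the 8-bit width is an assumption chosen to keep the check exhaustive.

```python
def zext(value: int, from_bits: int) -> int:
    """Zero-extend by keeping only the low from_bits bits."""
    return value & ((1 << from_bits) - 1)


def div_rem_wide(a: int, b: int, narrow_bits: int = 8):
    """Divide after zero-extending both operands (the original form)."""
    wide_a, wide_b = zext(a, narrow_bits), zext(b, narrow_bits)
    return wide_a // wide_b, wide_a % wide_b


def div_rem_narrow(a: int, b: int, narrow_bits: int = 8):
    """Divide in the narrow type, then zero-extend the results."""
    mask = (1 << narrow_bits) - 1
    quotient = (a // b) & mask
    remainder = (a % b) & mask
    return zext(quotient, narrow_bits), zext(remainder, narrow_bits)
```

Since zero-extended operands are non-negative in the wide type, the signed and unsigned forms agree there, which is why the same narrowing applies to both.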