This includes a fix for the tramp3d failure from the llvm-testsuite
that caused the last revert. Hopefully the other failures were the
same issue.
Original commit message:
For RISC-V, load/store instructions (excluding vector loads/stores) have only a 12-bit immediate operand. If an offset is out of range, a temporary register must be used to materialize it. If several such offsets lie within a small (isInt<12>) relative offset of each other, the LocalStackSlotAllocation pass can pick one value as a frame base register's value and replace each original offset with this register's value plus the relative offset.
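The grouping idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the LocalStackSlotAllocation implementation; the function names and the greedy strategy are assumptions made for the example.

```python
def is_int12(v: int) -> bool:
    """True if v fits a signed 12-bit immediate (-2048..2047)."""
    return -2048 <= v <= 2047


def assign_frame_bases(offsets):
    """Greedily map each offset to (base, relative_offset).

    Hypothetical helper: offsets that already fit in 12 bits need no base
    (base is None). Out-of-range offsets share a base as long as their
    distance from that base still fits in a signed 12-bit immediate.
    """
    result = {}
    base = None
    for off in sorted(offsets):
        if is_int12(off):
            result[off] = (None, off)   # directly encodable
            continue
        if base is None or not is_int12(off - base):
            base = off                  # a new base register value is needed
        result[off] = (base, off - base)
    return result
```

For example, offsets 4096, 4104 and 6000 can all be addressed from a single base of 4096, so only one temporary register value has to be materialized for the three accesses.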
Co-authored-by: luxufan <luxufan@iscas.ac.cn>
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Differential Revision: https://reviews.llvm.org/D98101
Add patterns with seteq/setne conditions.
We don't have instructions for seteq/setne except for comparing
with zero and need to emit an ADDI or XOR before a seqz/snez to
compare other values.
The select ISD node takes a 0/1 value for the condition, but the
VT_MASKC(N) instructions check all XLen bits for zero or non-zero.
We can use this to avoid the seqz/snez in many cases.
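A small Python model of the idea follows. It assumes the usual XVentanaCondOps semantics (vt.maskc keeps rs1 when rs2 is non-zero, vt.maskcn keeps rs1 when rs2 is zero); the function names are illustrative and this is not the TableGen pattern itself.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def vt_maskc(rs1, rs2):
    """Assumed semantics: rs1 if rs2 != 0 else 0 (all XLEN bits of rs2 are checked)."""
    return rs1 if rs2 != 0 else 0


def vt_maskcn(rs1, rs2):
    """Assumed semantics: rs1 if rs2 == 0 else 0."""
    return rs1 if rs2 == 0 else 0


def select_eq(a, b, t, f):
    """select (seteq a, b), t, f without a seqz/snez.

    The XOR result is zero exactly when a == b, and the mask instructions
    test the whole register, so no 0/1 normalization is needed.
    """
    diff = (a ^ b) & MASK
    return vt_maskcn(t, diff) | vt_maskc(f, diff)
```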
This is a pretty ridiculous number of patterns. I wonder if we could
use some ComplexPatterns to merge them, but I'd like to do that as
a follow up and focus on correctness of the result in this patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D140421
This is based on @frasercrmck's D107290. At least some of the clang
portion of D107290 has already been committed.
This uses vscale_range for min/max vector width unless the command
line overrides are used.
As a follow up, I plan to add a max or exact VLEN option to clang
to control the vscale_range. This will eliminate many of the reasons
for users to use the overrides through the -mllvm interface.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D139873
Fix a crash in the vsetvli insertion pass.
We have a testcase with 3 vsetvli instructions:
1. vsetivli zero, 2, e8, m4, ta, ma
2. li a1, 32; vsetvli zero, a1, e8, m4, ta, mu
3. vsetivli zero, 2, e8, m4, ta, ma
We then try to optimize the 2nd vsetvli: since its only user is vmv.x.s, we
can mutate its AVL operand to the AVL operand of the 3rd vsetvli.
So we propagate 2 to the vsetvli, but it is a vsetvli, not a vsetivli, and it
expects a register rather than an immediate value, so we have to update the
opcode if needed.
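As a rough illustration of the opcode choice, here is a toy Python model. It assumes registers are represented as strings and immediates as ints, and that the immediate form takes a 5-bit unsigned AVL; the function name is hypothetical and does not correspond to code in the pass.

```python
def pick_vsetvli_form(avl):
    """Return (mnemonic, avl) matching the kind of AVL operand.

    Toy model: a str is a register name, an int is an immediate.
    The immediate form only encodes a 5-bit unsigned AVL (0..31).
    """
    if isinstance(avl, int):
        if 0 <= avl < 32:
            return ("vsetivli", avl)
        raise ValueError("immediate AVL does not fit uimm5; a register is needed")
    return ("vsetvli", avl)
```

Mutating only the operand (register a1 to immediate 2) without also switching the form is the mismatch described above.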
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D141061
The patch also adds expandVPCTLZ and expandVPCTTZ to expand vp.ctlz/cttz nodes
and the cost model of vp.ctlz/cttz.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140370
The scalar move instructions (vmv.s.x and vfmv.s.f) depend only on whether VL is zero or non-zero, not on its exact value. By tracking the fact that we only demand the zeroness and not the whole VL value, we can allow VL to change across a scalar move. This helps to eliminate vsetvli toggles.
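The compatibility rule can be sketched as follows. This is a simplified Python model with hypothetical names, not the demanded-fields logic in the pass.

```python
def vl_compatible(cur_vl, new_vl, demands_exact_vl):
    """Can an instruction that wants new_vl run under the current VL?

    Ordinary vector instructions demand the exact VL. A scalar move only
    demands that "VL == 0" has the same truth value in both states.
    """
    if demands_exact_vl:
        return cur_vl == new_vl
    return (cur_vl == 0) == (new_vl == 0)
```

Under this model a scalar move written for VL=2 is still correct when VL is 32, so no vsetvli is needed between the two states.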
Differential Revision: https://reviews.llvm.org/D140157
This is mostly geared at consolidating logic into one form to reduce code duplication, but also has the effect of being a slight generalization. Since these operations aren't masked, we can ignore the mask policy bit when deciding on compatibility. The previous code was overly strict in checking that both policy bits matched.
Note: There's a slight difference from the reviewed version. The reviewed version was based on a local revision which included the isCompatible change to only check AVL if VL is used. I apparently never landed that change, and while it works on its own, its functional effect isn't visible without this patch. I chose to roll the extra change into this patch.
Differential Revision: https://reviews.llvm.org/D140147
The patch tries to make more vslideup nodes use the tail agnostic policy. The idea comes
from D125546 authored by Zack Chen.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140669
isel is now capable of turning the SUB into XOR for shift amounts,
though it uses NOT instead of XOR with ShiftSize-1.
By using SUB during lowering we enable more DAG combines with
other arithmetic on the shift amount.
If the shift amount is (sub C, X) where C is -1 modulo the size of
the shift, we can replace the sub with a NOT.
We could also use XORI X, size-1, but NOT would work better with
c.not from the future Zce extension.
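The identity behind this replacement is easy to check numerically. The sketch below models the fact that a shift only reads its amount modulo the shift size; the helper names are made up for the example.

```python
def shamt_via_sub(c, x, size=64):
    """Effective shift amount of (sub C, X): only the value modulo size matters."""
    return (c - x) % size


def shamt_via_not(x, size=64):
    """Effective shift amount of (not X), since ~X == -1 - X."""
    return (~x) % size


def sub_can_become_not(c, size=64):
    """True when C is congruent to -1 modulo the shift size."""
    return c % size == size - 1
```

For a 64-bit shift, C = 63, C = 127 and C = -1 all qualify, and the two amounts agree for every X.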
If the dividend has leading zeros, we can use them to reduce the
size of the multiplier and avoid the fixup cases.
This patch is for scalars only, but we might be able to do this
for vectors in a follow up.
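To see why leading zeros help, here is a sketch using the classic round-up multiply-and-shift scheme for unsigned division by a constant. It is an illustration of the principle under that assumption, not the algorithm LLVM uses, and the function names are hypothetical.

```python
def magic_for(d, precision):
    """Return (m, shift) with (x * m) >> shift == x // d for 0 <= x < 2**precision."""
    l = (d - 1).bit_length()      # ceil(log2(d))
    shift = precision + l
    m = (1 << shift) // d + 1     # round the reciprocal up
    return m, shift


def udiv_by_const(x, d, width=32, leading_zeros=0):
    """Divide x by d, assuming x has at least `leading_zeros` known zero top bits."""
    m, shift = magic_for(d, width - leading_zeros)
    return (x * m) >> shift
```

With an 8-bit width and d = 7, the full-precision multiplier needs 9 bits, which is what forces a fixup sequence. Knowing just one leading zero bit in the dividend lets the multiplier fit in 8 bits.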
Differential Revision: https://reviews.llvm.org/D140750
Follow-up patch of D140530.
We can add FMIN, FMAX to isAssociativeAndCommutative to
increase instruction-level parallelism by the existing MachineCombiner
pass.
Differential Revision: https://reviews.llvm.org/D140602
Inspired by D138107.
We can add ADD, AND, OR, XOR, MUL, MIN[U]/MAX[U] to isAssociativeAndCommutative
to increase instruction-level parallelism by the existing MachineCombiner pass.
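The benefit of marking an operation associative and commutative is that a serial chain can be rebalanced into a tree with fewer dependent steps. The Python sketch below only illustrates that rebalancing on values; it is not the MachineCombiner, and the helper names are invented for the example.

```python
import operator
from functools import reduce


def chain(op, values):
    """Serial evaluation: ((v0 op v1) op v2) op v3 ..., each step waits on the last."""
    return reduce(op, values)


def balanced(op, values):
    """Pairwise evaluation: (v0 op v1) op (v2 op v3), pairs are independent."""
    values = list(values)
    while len(values) > 1:
        paired = [op(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])   # odd element carries over to the next level
        values = paired
    return values[0]
```

For four operands the chain has three dependent operations while the balanced form has two, and for the operations listed above both forms give the same result.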
Differential Revision: https://reviews.llvm.org/D140530
There is no compressed form of ORI but there is a compressed form
for ADDI.
This also works for XORI since DAGCombine will turn Xor with disjoint
bits into Or.
Note: The compressed forms require a simm6 immediate, but I'm doing
this for the full simm12 range.
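The replacement is valid because OR and ADD agree whenever the two operands have no set bits in common. A minimal Python check of that condition, with a hypothetical helper name and a known-zero mask standing in for the compiler's known-bits analysis:

```python
def can_use_addi(known_zero_mask, imm):
    """True if every bit set in imm is known to be zero in the other operand."""
    return (imm & ~known_zero_mask) == 0


def ori(x, imm):
    return x | imm


def addi(x, imm):
    return x + imm
```

For instance, if x is produced by a left shift by 4, its low four bits are known zero, so `ori x, 5` and `addi x, 5` compute the same value.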
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D140674
There were 4 RUN lines, but only 2 of them were unique. I believe
we were trying to test LMUL=1 and LMUL=8 with riscv32 and riscv64,
but put riscv32 on both LMUL=1 lines and riscv64 on both LMUL=8 lines.
We can recursively look through SRLI if the shift amount is less
than the demanded bits. We can reduce the demanded bit count by
the shift amount and check the users of the SRLI.
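The bit accounting can be illustrated with a short Python sketch. The helper names are hypothetical; the point is only that a right shift by shamt moves the low `bits` bits of the input into the low `bits - shamt` bits of the result.

```python
def low_bits(v, n):
    """Keep only the low n bits of v."""
    return v & ((1 << n) - 1)


def bits_for_srli_users(bits, shamt):
    """Demanded bit count to check on the users of (srli X, shamt).

    If only the low `bits` bits of X are meaningful, only the low
    `bits - shamt` bits of the shifted value are. Returns None when the
    shift amount is not less than the demanded bits.
    """
    return bits - shamt if shamt < bits else None
```

So if the users of the SRLI read no more than `bits - shamt` bits, whatever sits above bit `bits` in X cannot affect them.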
Two comparison operations and a logical operation are combined into a selection using MIN or MAX followed by a single comparison operation.
For the optimization to be applied, the following conditions have to be satisfied:
1. The comparison operations must have one common operand.
2. Only signed and unsigned integers are supported.
3. Both comparisons must be the same with respect to the common operand.
4. The comparisons have no users other than the logical operation.
5. Every combination of comparison with AND or OR is supported.
It will convert
%l0 = %a < %c
%l1 = %b < %c
%res = %l0 or %l1
into
%sel = min(%a, %b)
%res = %sel < %c
It supports several comparison operations (<, <=, >, >=), signed and unsigned values, and different operand orders as long as they do not violate the conditions above.
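The equivalence can be checked exhaustively on a small range. This Python sketch covers the OR and AND forms of the `<` case only; the function names are illustrative.

```python
def or_of_lt(a, b, c):
    return (a < c) or (b < c)


def min_then_lt(a, b, c):
    # At least one of a, b is below c exactly when the smaller one is.
    return min(a, b) < c


def and_of_lt(a, b, c):
    return (a < c) and (b < c)


def max_then_lt(a, b, c):
    # Both a and b are below c exactly when the larger one is.
    return max(a, b) < c
```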
Differential Revision: https://reviews.llvm.org/D134277
Two comparison operations and a logical operation are combined into a selection using MIN or MAX followed by a single comparison operation.
For the optimization to be applied, the following conditions have to be satisfied:
1. The comparison operations must have one common operand.
2. Only signed or unsigned integers are supported.
3. Both comparisons must be the same with respect to the common operand.
4. The comparisons have no users other than the logical operation.
5. Every combination of comparison with AND or OR is supported.
It will convert
%l0 = %a < %c
%l1 = %b < %c
%res = %l0 or %l1
into
%sel = min(%a, %b)
%res = %sel < %c
It supports several comparison operations (<, <=, >, >=), signed and unsigned values, and different operand orders as long as they do not violate the conditions above.
SLLI and ADD are more compressible than SLLIW and ADDW. SLLI/ADD both have a 5-bit register encoding. SLLIW/ADDW have a 3-bit register encoding. They both require the dest to also be one of the sources.
We aggressively form ADDW/SLLIW as it helps hasAllWBitUsers in RISCVISelDAGToDAG to not require recursion. So we need a pass to remove excessive -w suffixes.
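Dropping the suffix is safe when every user reads only the low 32 bits, because the W and non-W forms agree there. A Python model of the 64-bit register semantics (helper names are made up for the example):

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def sext32(v):
    """Sign-extend the low 32 bits of v into a 64-bit register value."""
    v &= MASK32
    return (v | ~MASK32) & MASK64 if v & 0x80000000 else v


def add(a, b):
    return (a + b) & MASK64


def addw(a, b):
    return sext32(a + b)


def slli(a, sh):
    return (a << sh) & MASK64


def slliw(a, sh):
    return sext32(a << sh)
```

The two forms can differ in bits 63:32, so the rewrite is only valid when no user observes those bits.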
Differential Revision: https://reviews.llvm.org/D139948
Similar for sub, or, and xor. These are all operations that have 0
as a neutral value. This is based on a similar transform in InstCombine.
This allows us to remove some XVentanaCondOps patterns and
some code from DAGCombine for RISCVISD::SELECT_CC.
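The identity that makes 0 useful here can be sketched in Python. The commit title is not part of this text, so the exact shape and direction of the fold shown below are assumptions; the sketch only demonstrates that the two forms are equal for add, sub, or and xor.

```python
import operator


def select(c, t, f):
    return t if c else f


def op_inside_select(op, c, x, y):
    """select c, (op y, x), y"""
    return select(c, op(y, x), y)


def select_inside_op(op, c, x, y):
    """op y, (select c, x, 0): with c false, op(y, 0) is just y."""
    return op(y, select(c, x, 0))
```

AND is not covered by this identity because its neutral value is all ones rather than 0.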
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D140465
These are tests for (select ((and x, 0x1) == 0), (z ^ y), y) and (select ((and x, 0x1) == 0), (z | y), y).
These can be made branchless by using ((x-1) & z) ^ y, with x here standing for the tested bit (and x, 0x1); the OR case is analogous.
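A quick Python check of the branchless form follows. It reads the x-1 in the formula as applied to the tested low bit, which is an assumption about the shorthand, and models 64-bit registers with a mask; the function names are illustrative.

```python
MASK = (1 << 64) - 1


def select_xor(x, y, z):
    return (z ^ y) if (x & 1) == 0 else y


def select_or(x, y, z):
    return (z | y) if (x & 1) == 0 else y


def bit_mask(x):
    """All ones when the low bit of x is 0, zero when it is 1."""
    return ((x & 1) - 1) & MASK


def branchless_xor(x, y, z):
    return (bit_mask(x) & z) ^ y


def branchless_or(x, y, z):
    return (bit_mask(x) & z) | y
```

When the low bit is clear the mask keeps z, giving z ^ y (or z | y); when it is set the mask zeroes z and the result is y.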
Most of our existing tests use i1 arguments for the conditions.
With icmp conditions there are opportunities for improving the
generated code.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D140403
Primarily this allows us to fold the addi from PseudoLLA expansion
into a load.
If the linker is able to GP relax the constant pool access we'll
end up with a GP relative load.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D140341