IR intrinsics were already defined, but no codegen support had been
added.
I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
Zvfbfmin does not have any scalar operands, making this an unnecessary
dependency. The spec was just updated to remove it. See
86d7a74f4b
This fixes a correctness issue where Xsfvfwmaccqqq was incorrectly
depending on Zfbfmin. The SiFive CPUs that support Xsfvfwmaccqqq do not
implement Zfbfmin, but do implement Zvfbfmin based on a previous
understanding that it only requires Zve32f. I've added tests for this
feature to raise the bar for adding dependencies to it in the future.
The SiFive7 scheduler model has been using AcquireAtCycles and
ReleaseAtCycles for some time. Without EnableIntervals, the scheduler
was not making decisions based on this information. This patch sets
EnableIntervals to true, and the test case demonstrates that the VADD
instructions can be issued one cycle earlier since the VCQ is not
reserved. This leads to better saturation of the SiFive7VA.
The RISC-V vector crypto extensions have been ratified. This patch
updates the Clang and LLVM support for these extensions to be
non-experimental, while leaving the C intrinsics as experimental since
the C intrinsics are not yet standardized.
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
This reverts commit 08b306dc8e.
It causes the following assertion failure:
llvm/include/llvm/CodeGen/MachineFrameInfo.h:530: int64_t
llvm::MachineFrameInfo::getObjectOffset(int) const: Assertion
`!isDeadObjectIndex(ObjectIdx) && "Getting frame offset for a dead
object?"' failed.
If we're lowering a fixed length vector load or store which happens to
be exactly VLEN in size (when VLEN is exactly known), we can use a whole
register load or store instead of the unit-strided variants. This avoids
a vsetvli in some cases, allows additional flexibility in vsetvli
placement in others, and has no runtime dependency on the value of VL.
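Roughly, the check might look like the following sketch, assuming the
exact VLEN is statically known; `canUseWholeRegisterAccess` and its
parameters are illustrative, not the upstream API:

```cpp
#include <cstdint>
#include <optional>

// Sketch: a fixed-length vector that exactly fills one vector register
// can use a whole-register load/store (vl1r/vs1r style), which needs no
// vsetvli and does not depend on the runtime value of VL.
bool canUseWholeRegisterAccess(uint64_t VecSizeBits,
                               std::optional<uint64_t> ExactVLenBits) {
  return ExactVLenBits && VecSizeBits == *ExactVLenBits;
}
```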
These test cases were intended to get a single vsetvli by using the
vmv.s.x intrinsic with the same LMUL as the reduction. This works for
FP, but does not work for integer.
I believe #71501 will break this for FP too. Hopefully the vsetvli pass
can be taught to fix this.
This ensures we clip the index to be in bounds of the vector we are
inserting into. If the index is out of bounds, the result of the
insertelement is poison. If we don't clip the index, we can write to
memory that was not part of the original store.
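A minimal sketch of the clamping step; the name `clampInsertIndex` is
illustrative:

```cpp
#include <algorithm>
#include <cstdint>

// An out-of-bounds insertelement produces poison, so any in-bounds index
// is a valid lowering, and clamping keeps the scalar store inside the
// memory backing the original vector.
uint64_t clampInsertIndex(uint64_t Idx, uint64_t NumElts) {
  return std::min(Idx, NumElts - 1);
}
```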
Fixes #74248 and #75557.
Currently we default to the lp64 ABI for rv64if and ilp32 for rv32if,
which differs from riscv-gnu-toolchain. In
8e9fb09a0c/configure (L3385)
when F is present but D is not, it prefers lp64f/ilp32f rather than soft
float. This patch makes their behaviors consistent.
When doing our backwards walk, we were not handling the case where the
AVL was defined by a register whose definition was an ADDI xN, x0,
<imm>. Doing so (as we already do in the forward pass) allows us to
prune a few more transitions.
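A hedged sketch of the newly handled case; the `DefInstr` struct is an
illustrative stand-in for inspecting the defining MachineInstr:

```cpp
#include <cstdint>
#include <optional>

// An `ADDI xN, x0, <imm>` just materializes <imm>, so an AVL register
// defined this way can be treated as the immediate during the backwards
// walk, exactly as the forward pass already does.
struct DefInstr {
  bool IsADDI;  // the defining instruction is an ADDI
  bool SrcIsX0; // its source register is x0
  int64_t Imm;  // its immediate operand
};

std::optional<int64_t> getConstantAVL(const DefInstr &Def) {
  if (Def.IsADDI && Def.SrcIsX0)
    return Def.Imm;
  return std::nullopt;
}
```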
If we know the exact VLEN, then we can tell if the AVL for a particular
operation is equivalent to the vsetvli xN, zero, <vtype> encoding. Using
this encoding is better than having to materialize an immediate in a
register, but worse than being able to use the vsetivli zero, imm,
<type> encoding.
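A sketch of the equivalence test under those assumptions; names are
illustrative, and LMUL is passed scaled by 8 so fractional LMUL
(mf8..mf2) stays integral:

```cpp
#include <cstdint>

// VLMAX = VLEN / SEW * LMUL; an AVL equal to VLMAX is exactly what
// `vsetvli xN, zero, <vtype>` selects.
bool avlIsVLMAX(uint64_t AVL, uint64_t VLenBits, uint64_t SEW,
                uint64_t LMULTimes8) {
  return AVL == VLenBits / SEW * LMULTimes8 / 8;
}
```

This form mainly matters when VLMAX exceeds 31, since smaller AVLs
already fit the 5-bit immediate of the vsetivli zero, imm, <vtype>
encoding.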
Middle end optimizations can speculate away the short-circuit behavior
of C/C++ && and ||, using i1 and/or or logical select instructions and a
single branch.
SelectionDAGBuilder can turn i1 and/or/select back into multiple
branches, but this is disabled when jump is expensive.
RISC-V can use slt(u)(i) to evaluate a condition into any GPR, which
makes us better off than targets that use a flag register. RISC-V also
has a single-instruction compare-and-branch. So it's not clear from a
code size perspective that using compare+and/or is better.
If the full condition depends on multiple loads, using logic ops delays
branch resolution until all the loads have resolved, even if a cheap
condition would have made the loads unnecessary.
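A hypothetical example of the load case:

```cpp
// As two branches, the loads only execute when x < 0. Folded into
// slt + load + load + and plus one branch, the branch cannot resolve
// until both loads complete, even though x >= 0 already decides the
// result.
bool cond(const int *a, const int *b, int x) {
  return x < 0 && *a < *b; // short-circuit form
}
```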
PowerPC and Lanai are the only CPU targets that use setJumpIsExpensive.
NVPTX and AMDGPU also use it but they are GPU targets. PowerPC appears
to have a MachineIR pass that turns AND/OR of CR bits into multiple
branches. I don't know anything about Lanai and their reason for using
setJumpIsExpensive.
I think the decision to use logic vs branches is much more nuanced than
this big hammer. So I propose to make RISC-V match other CPU targets.
Anyone who wants the old behavior can still pass -mllvm
-jump-is-expensive=true.
Previously we bailed if we encountered a pseudo without a VL op, e.g.
vmv.x.s, which prevented us from preserving VL and VTYPE. It looks like
this was copied over from a time when this code operated on the
MachineInstrs in place, see https://reviews.llvm.org/D127870
However, because we no longer mutate the MIs, we can just get rid of
this early exit, which allows us to preserve VL and VTYPE when dealing
with vmv.x.s.
Support was added for the following fusions:
auipc-addi, slli-srli, ld-add
Some parts of the code became repetitive, so a small refactoring of the
existing lui-addi fusion was done.
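For illustration, a hedged sketch of what one such fusion predicate can
look like; placeholder types and opcodes, not the upstream
MachineInstr-based code:

```cpp
// The auipc-addi pair fuses when the addi consumes the auipc result,
// the usual back-to-back pattern for PC-relative addressing.
struct Instr { unsigned Opc, Dst, Src; };
enum : unsigned { AUIPC = 1, ADDI = 2 }; // placeholder opcodes

bool isAuipcAddiPair(const Instr &First, const Instr &Second) {
  return First.Opc == AUIPC && Second.Opc == ADDI &&
         Second.Src == First.Dst;
}
```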
This test is added to be the counterpart of the SelectionDAG
llvm/test/CodeGen/RISCV/vararg.ll test. Minor changes were made compared
to the other version, all of which are commented in the test file added
this commit.
In the future, we can consider adding a G_VACOPY opcode instead of going
through the GIntrinsic for all targets. We take this approach in this
patch because that is what other targets do today.
G_VAARG can be expanded similarly to SelectionDAG::expandVAArg through
LegalizerHelper::lower. This patch implements the lowering through this
style of expansion.
The expansion gets the head of the va_list by loading the pointer to
va_list. Then, the head of the list is adjusted depending on argument
alignment information. This gives a pointer to the element to be read
out of the va_list. Next, the head of the va_list is bumped to the next
element in the list. The new head of the list is stored back to the
original pointer to the head of the va_list so that subsequent G_VAARG
instructions get the next element in the list. Lastly, the element is
loaded from the alignment adjusted pointer constructed earlier.
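A small runnable model of those steps, using raw pointers in place of
G_LOAD/G_PTR_ADD/G_STORE; illustrative only, not the GISel code:

```cpp
#include <cstdint>
#include <cstring>

// Align must be a power of two.
template <typename T>
T vaArg(char **ListPtr, uintptr_t Align) {
  char *Head = *ListPtr;                           // 1. load va_list head
  uintptr_t P = reinterpret_cast<uintptr_t>(Head);
  P = (P + Align - 1) & ~(Align - 1);              // 2. align element ptr
  char *Elt = reinterpret_cast<char *>(P);
  *ListPtr = Elt + sizeof(T);                      // 3. bump head, store back
  T Val;
  std::memcpy(&Val, Elt, sizeof(T));               // 4. load the element
  return Val;
}
```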
This change is stacked on #73062.
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison generating flags (nsw, nuw, exact, nneg, etc..)
which may not be valid for the newly added users.
This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expander to reinfer
the flags on the CSE candidate if legal for all possible users.
This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.
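A self-contained sketch of that decision; the types are illustrative
stand-ins, with isGuaranteedNotToBePoison being the real ValueTracking
query:

```cpp
// The expander strips the flags first (conservatively correct), then
// restores them only when the value cannot be poison, which makes the
// flags valid for any possible user.
struct Candidate {
  bool HadNNeg;             // the reused zext originally carried 'nneg'
  bool NotPoisonForAllUses; // isGuaranteedNotToBePoison(I) held
};

bool canRestoreNNeg(const Candidate &C) {
  return C.HadNNeg && C.NotPoisonForAllUses;
}
```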
This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile time wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tentative plan is to land this, and then rework the
code to sink the logic into more core interfaces.