GFX10 image instructions use one or more address operands starting at
vaddr0, instead of a single vaddr operand, to allow for NSA forms.
Differential Revision: https://reviews.llvm.org/D81675
Fix the division/remainder algorithm by adding a second quotient
refinement step, which is required in some cases like
0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212).
Also document, rewrite and simplify it by ensuring that we always have a
lower bound on inv(y), which simplifies the UNR step and the quotient
refinement steps.
Differential Revision: https://reviews.llvm.org/D83381
handleAssignments was assuming every argument type is an MVT, and
assignArg would always fail. This fixes one of the hacks in the
current AMDGPU calling convention code that pre-processes the
arguments.
The tests in a5b9ad7e9a actually failed
the verifier, which for some reason is not the default. Also add tests
for 0-sized function arguments, which do not add entries to the
expected register lists.
Even though wide vectors are legal they still cost more as we
will have to eventually split them. Not all operations can
be uniformly done on vector types.
Conservatively add the cost of splitting at least to 8 dwords,
which is our widest possible load.
We are more or less lying to cost mode with this change but
this can prevent vectorizer from creation of wide vectors which
results in RA problems for us.
Differential Revision: https://reviews.llvm.org/D83078
Summary:
Avoid exposing details about how roots are stored. This enables subsequent
type-erasure changes.
v5:
- cleanup a unit test by using EXPECT_EQ instead of EXPECT_TRUE
Change-Id: I532b774cc71f2224e543bc7d79131d97f63f093d
Reviewers: arsenm, RKSimon, mehdi_amini, courbet
Subscribers: jvesely, wdng, hiraditya, kuhar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83085
This was resulting in a missing vreg def in the use select
instruction.
The output of the pseudo doesn't make sense, since it really shouldn't
have the vreg output in the first place, and instead an implicit scc
def to match the real scalar behavior.
We could have easier to understand tests if we selected scalar
versions of the [us]{add|sub}.with.overflow intrinsics.
This does still end up producing vector code in the end, since it gets
moved later.
The default constructor wasn't setting isSet o the ArgDescriptor, so
while these had the value set, they were treated as missing. This only
ended up mattering in the indirect call case (and for regular calls in
GlobalISel, which current doesn't have a way to support the variable
ABI).
Exit early if the exec mask is zero at the end of control flow.
Mark the ends of control flow during control flow lowering and
convert these to exits during the insert skips pass.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D82737
Exit early if the exec mask is zero at the end of control flow.
Mark the ends of control flow during control flow lowering and
convert these to exits during the insert skips pass.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D82737
Generate a single early exit block out-of-line and branch to this
if all lanes are killed. This avoids branching if lanes are active.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D82641
Summary:
If amdgpu-flat-work-group-size is not specified in LLVM IR, the backend
uses default value of 1024. For this, minimum waves per EU should be 4.
However, backend is still setting minimum value to 1 instead of calculated
value. This is not observed normally as frontend always provide
amdgpu-flat-work-group-size attribute.
Reviewers: rampitec, b-sumner, sameerds, msearles
Reviewed By: rampitec
Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81991
If the original register operand had a subregister, it wasn't getting
cleared. This resulted in reinterpreted the subreg index as
unrecognized target flags, which produced unparseable MIR.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.
Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.
Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D79794
Change imm with timm in pattern for SI_INIT_EXEC_LO and
remove regbank mappings for non register operands.
Differential Revision: https://reviews.llvm.org/D82885
In case of more than wavesize CSR SGPR spills, lanes of reserved VGPR were getting
overwritten due to wrap around.
Reserve a VGPR (when NumVGPRSpillLanes = 0, WaveSize, 2*WaveSize, ..) and when one
of the two conditions is true:
1. One reserved VGPR being tracked by VGPRReservedForSGPRSpill is not yet reserved.
2. All spill lanes of reserved VGPR(s) are full and another spill lane is required.
Reviewed By: arsenm, kerbowa
Differential Revision: https://reviews.llvm.org/D82463
Condition `LiteralCount` is checked both in an outer and in an inner
`if` statement in `SIInstrInfo::verifyInstruction()`. This patch removes
the redundant inner check.
The issue was found using `clang-tidy` check under review
`misc-redundant-condition`. See https://reviews.llvm.org/D81272.
Differential Revision: https://reviews.llvm.org/D82555
Select into corresponding V_CMP instruction based on CmpInst predicate,
stored as immediate, in last operand.
Differential Revision: https://reviews.llvm.org/D82652
For now, moving it to SIPreEmitPeephole.
Should find a right place to have this code.
Reviewed By: nhaehnle
Differential revision: https://reviews.llvm.org/D77544
Also fix an SSA violation in a test the MIRParser/verifier fails to
catch. It's illegal to define a subregister in SSA. For the purpose of
the test, it just needs to define the super-register to use the
subregister in the use operand.
This avoids many instances of failing to legalize a vector truncstore
of <4 x s8> to 2 bytes. We don't perfectly handle every truncstore
yet, largely because the given set of legalization actions can't
actually differentiate between changing the result type and changing
the memory type.
Summary:
Add patterns to select s_cselect in the isel.
Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits
Tags: #llvm
Re-commit D81925 with a bugfix D82370.
Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
Summary:
Without fixImplicitOperands we may end up creating default implicit operands
that are the wrong wave size
Includes simple test that provokes insertBranch in the correct way to expose the
issue being fixed.
Change-Id: I92bdcdee9fcb7b4d91529b84e76a48ac8218483e
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82459