
491 Commits

Author SHA1 Message Date
Matt Arsenault
90eb7d5283 AMDGPU: Fix spilling of 96-bit AGPRs
2020-08-06 12:42:07 -04:00
Matt Arsenault
d188a608bd AMDGPU: Fix code duplication between the selectors
Not sure this is the right place for this helper.
2020-08-06 10:42:15 -04:00
Stanislav Mekhanoshin
0bcda1a261 [AMDGPU] Scavenge temp reg for AGPR spill
Differential Revision: https://reviews.llvm.org/D85234
2020-08-05 13:29:19 -07:00
Vitaly Buka
b0eb40ca39 [NFC] Remove unused GetUnderlyingObject parameter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka
89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Matt Arsenault
e56e9022bc AMDGPU: Fix liveness errors when copying AGPR tuples
Avoid recursively calling copyPhysReg for AGPR handling. This was
dropping the necessary super register implicit defs to avoid liveness
verifier errors.
2020-07-30 18:13:04 -04:00
hsmahesha
33fd4a18e7 [AMDGPU/MemOpsCluster] Clean-up FIXMEs around mem ops clustering logic
Get rid of all FIXMEs and base the heuristic on `num-clustered-dwords`. The main intuition behind this is as
follows. The existing heuristic can be roughly summarized as:

* Assume all the mem ops participating in the clustering process load/store the same number of bytes
* If each mem op loads 4 bytes, cluster at most 5 mem ops, that is, at most 20 bytes
* If each mem op loads 8 bytes, cluster at most 3 mem ops, that is, at most 24 bytes
* If each mem op loads 16 bytes, cluster at most 2 mem ops, that is, at most 32 bytes

So we need to make sure that the new heuristic does not deviate completely from the above one, and that it
properly handles both sub-word loads and wide loads.

Reviewed By: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D84354
2020-07-30 21:41:13 +05:30
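
The limits listed in 33fd4a18e7 can be tabulated, and a dword-based count computed, with a small sketch. The function names and the 8-dword budget in `withinDwordBudget` are hypothetical and chosen only for illustration, because the commit message does not state the threshold. Like the summary, the sketch assumes that every op in a cluster accesses the same number of bytes.

```cpp
#include <cassert>

// Limits of the existing heuristic as summarized above: the maximum number
// of mem ops that may be clustered, keyed on the bytes each op accesses.
// Returns 0 for sizes the summary does not cover.
unsigned oldMaxClusterLength(unsigned BytesPerOp) {
  switch (BytesPerOp) {
  case 4:
    return 5; // at most 20 bytes
  case 8:
    return 3; // at most 24 bytes
  case 16:
    return 2; // at most 32 bytes
  default:
    return 0;
  }
}

// Dwords touched by a cluster of NumOps ops accessing NumBytes in total,
// assuming every op accesses the same number of bytes. Each op is rounded
// up to whole dwords, so a sub-word op still counts as one dword.
unsigned numClusteredDwords(unsigned NumBytes, unsigned NumOps) {
  assert(NumOps > 0 && "empty cluster");
  unsigned BytesPerOp = NumBytes / NumOps;
  return ((BytesPerOp + 3) / 4) * NumOps;
}

// Hypothetical dword-based check; the default budget of 8 is illustrative.
bool withinDwordBudget(unsigned NumBytes, unsigned NumOps,
                       unsigned MaxDwords = 8) {
  return numClusteredDwords(NumBytes, NumOps) <= MaxDwords;
}
```

Rounding each op up to whole dwords keeps sub-word loads in check, so eight 2-byte loads count as 8 dwords, not 4. With the illustrative budget, two 16-byte loads (8 dwords) pass, but three (12 dwords) do not.
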
Matt Arsenault
6a7b6dd54b AMDGPU: Don't assert in canInsertSelect
Currently GlobalISel doesn't force all VGPR phi operands to VGPRs, so
this hit a case where it was queried with a VGPR and SGPR. This could
arguably be a verifier error, but it's currently not.
2020-07-28 21:01:06 -04:00
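
The change in 6a7b6dd54b can be pictured with a toy model in which each select input carries a hypothetical `Bank` tag. A mixed SGPR/VGPR pair, which previously hit an assertion, is now simply reported as unsuitable for a select. The names are illustrative and do not reproduce the real `canInsertSelect` signature or its other checks.

```cpp
#include <cassert>

// Hypothetical register-bank tag for a select input.
enum class Bank { SGPR, VGPR };

// Toy canInsertSelect-style query. A mixed SGPR/VGPR pair used to trip an
// assertion; here it is simply reported as "cannot insert a select".
bool canInsertSelectToy(Bank TrueBank, Bank FalseBank) {
  if (TrueBank != FalseBank)
    return false;
  // A real implementation would apply further legality checks here.
  return true;
}
```
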
hsmahesha
4905536086 Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size"
This reverts commit cc9d693856.
2020-07-17 12:20:37 +05:30
Matt Arsenault
1912ace968 AMDGPU: Move handling of AGPR copies to a separate function
This is in preparation for fixing multiple problems with the way AGPR
copies are handled, but this change is NFC itself. First, it's relying
on recursively calling copyPhysReg, which is losing information
necessary to get correct super register handling.

Second, it's constructing a new RegScavenger and doing an O(N^2) walk
on every single sub-spill for every AGPR tuple copy. Third, it's using
the forward form of the scavenger, and not using the preferred
backwards scan.
2020-07-16 14:32:24 -04:00
Matt Arsenault
79f67cae91 AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to have the gfx9/gfx10 names. We were using the original SI/CI
v_add_i32/v_sub_i32 names. Later targets reintroduced these names as
carryless instructions with a saturating clamp bit, which we do not
define. Do this rename so we can unambiguously add these missing
instructions.

The carry-in versions should also be renamed, but at least those had a
consistent _u32 name to begin with. The 16-bit instructions were also
renamed, but aren't ambiguous.

This does regress assembler error message quality in some cases. In
mismatched wave32/wave64 situations, this will switch from
"unsupported instruction" to "invalid operand", with the error
pointing at the wrong position. I couldn't quite follow how the
assembler selects these, but the previous behavior seemed accidental
to me. It looked like there was a partial attempt to handle this which
was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it
isn't used for anything).
2020-07-16 13:16:30 -04:00
Matt Arsenault
d2e74fad20 AMDGPU: Set more mov flags on V_ACCVGPR_{READ|WRITE}_B32
This fixes extra copies when materializing constants in AGPRs. This
made it a lot harder to trigger the spilling in spill-agpr.ll.
2020-07-01 18:58:59 -04:00
Matt Arsenault
14fe4607f1 AMDGPU: Support commuting register and global operand
2020-07-01 13:59:13 -04:00
Matt Arsenault
a21544ad11 AMDGPU: Fix handling of target flags when commuting instruction
If the original register operand had a subregister, it wasn't getting
cleared. This resulted in reinterpreting the subreg index as
unrecognized target flags, which produced unparseable MIR.
2020-07-01 13:59:13 -04:00
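
The failure in a21544ad11 follows from a storage trick. In LLVM's `MachineOperand`, one bitfield holds the sub-register index for register operands and the target flags for all other operand kinds. The toy model below (`ToyOperand` and its helpers are hypothetical, not LLVM API) shows how a stale subreg index reappears as target flags if that field is not cleared when a register operand is rewritten into a global-address operand.

```cpp
#include <cassert>
#include <cstdint>

enum class Kind { Register, GlobalAddress };

// Toy operand: one field is shared between the subreg index (registers)
// and the target flags (every other kind).
struct ToyOperand {
  Kind K;
  uint16_t SubRegOrTargetFlags;

  unsigned getSubReg() const {
    return K == Kind::Register ? SubRegOrTargetFlags : 0;
  }
  unsigned getTargetFlags() const {
    return K == Kind::Register ? 0 : SubRegOrTargetFlags;
  }
};

ToyOperand makeReg(uint16_t SubReg) {
  return ToyOperand{Kind::Register, SubReg};
}

// Buggy rewrite: only the kind changes, so the old subreg index is now
// read back as target flags.
ToyOperand toGlobalBuggy(ToyOperand Op) {
  Op.K = Kind::GlobalAddress;
  return Op;
}

// Fixed rewrite: overwrite the shared field with the intended flags.
ToyOperand toGlobalFixed(ToyOperand Op, uint16_t TargetFlags) {
  Op.K = Kind::GlobalAddress;
  Op.SubRegOrTargetFlags = TargetFlags;
  return Op;
}
```
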
James Y Knight
4b0aa5724f Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
Adam Balogh
71c6a36018 [AMDGPU][NFC] Remove redundant condition
Condition `LiteralCount` is checked both in an outer and in an inner
`if` statement in `SIInstrInfo::verifyInstruction()`. This patch removes
the redundant inner check.

The issue was found using `clang-tidy` check under review
`misc-redundant-condition`. See https://reviews.llvm.org/D81272.

Differential Revision: https://reviews.llvm.org/D82555
2020-07-01 09:04:25 +02:00
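
The cleanup in 71c6a36018 targets a classic nested redundant test. Inside `if (LiteralCount)`, checking `LiteralCount` again can never change the outcome. The functions below are hypothetical stand-ins, not the code from `SIInstrInfo::verifyInstruction()`. They only demonstrate that dropping the inner check preserves behavior.

```cpp
#include <cassert>

// Before: the inner LiteralCount test repeats the outer one and is
// always true whenever it is reached.
bool verifyBefore(unsigned LiteralCount, bool Conflict) {
  if (LiteralCount) {
    if (LiteralCount && Conflict)
      return false; // report a verifier error
  }
  return true;
}

// After: the redundant inner test is removed; behavior is unchanged.
bool verifyAfter(unsigned LiteralCount, bool Conflict) {
  if (LiteralCount) {
    if (Conflict)
      return false; // report a verifier error
  }
  return true;
}
```
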
Piotr Sobczak
0045786f14 [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Re-commit D81925 with a bugfix D82370.

Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
2020-06-25 10:38:23 +02:00
dstuttar
e8775c8d81 [AMDGPU] Make sure to fix implicit operands on insertBranch
Summary:
Without fixImplicitOperands we may end up creating default implicit operands
of the wrong wave size.

Includes simple test that provokes insertBranch in the correct way to expose the
issue being fixed.

Change-Id: I92bdcdee9fcb7b4d91529b84e76a48ac8218483e

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82459
2020-06-24 16:50:48 +01:00
Matt Arsenault
778351df77 Revert "[AMDGPU] Enable compare operations to be selected by divergence"
This reverts commit 521ac0b5ce.

Reported to break thousands of piglit tests.
2020-06-24 11:21:30 -04:00
alex-t
521ac0b5ce [AMDGPU] Enable compare operations to be selected by divergence
Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.

Reviewers: rampitec, arsenm

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82194
2020-06-24 11:50:40 +03:00
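
The idea in 521ac0b5ce (reverted in 778351df77) can be sketched with a toy divergence rule, assumed here for illustration: a compare is divergent if either input is divergent. Uniform compares map to the scalar `S_CMP_*` family, which writes SCC. Divergent ones map to the per-lane `V_CMP_*` family, which writes a lane mask. `ToyValue` and `selectCompare` are hypothetical, and real divergence analysis and instruction selection consider much more.

```cpp
#include <cassert>
#include <string>

struct ToyValue {
  bool Divergent; // true if the value may differ between lanes
};

// Toy rule: a compare result is divergent if either operand is.
bool isDivergentCompare(ToyValue LHS, ToyValue RHS) {
  return LHS.Divergent || RHS.Divergent;
}

// Choose the opcode family for a SETCC node from its divergence alone.
std::string selectCompare(ToyValue LHS, ToyValue RHS) {
  return isDivergentCompare(LHS, RHS) ? "V_CMP_*" : "S_CMP_*";
}
```
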
Your Name
cc9d693856 [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size
Summary:
Use both (1) the number of clustered bytes and (2) the cluster length to decide on
the max number of mem ops that can be clustered. On average, when loads
are dword or smaller, consider `5` as the max threshold, otherwise `4`. This
heuristic is based purely on experimentation; there is
no analytical reasoning behind it.

Reviewers: foad, rampitec, arsenm, vpykhtin

Reviewed By: rampitec

Subscribers: llvm-commits, kerbowa, hiraditya, t-tye, Anastasia, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl, thakis

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82393
2020-06-24 00:39:41 +05:30
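
The rule in cc9d693856 (and in its earlier version 40a632a335, both later reverted) can be written down directly. This sketch rests on three assumptions: the average access size is `ClusteredBytes / ClusterLength` with integer division, "dword or smaller" means an average of at most 4 bytes, and the function names are hypothetical.

```cpp
#include <cassert>

// Maximum cluster length under the rule above: 5 when the average access is
// a dword (4 bytes) or smaller, otherwise 4.
unsigned maxClusterLength(unsigned ClusteredBytes, unsigned ClusterLength) {
  assert(ClusterLength > 0 && "empty cluster");
  unsigned AvgBytes = ClusteredBytes / ClusterLength;
  return AvgBytes <= 4 ? 5 : 4;
}

// True if a cluster with these totals stays within the limit.
bool shouldClusterToy(unsigned ClusteredBytes, unsigned ClusterLength) {
  return ClusterLength <= maxClusterLength(ClusteredBytes, ClusterLength);
}
```

Unlike the per-size table it replaced, this rule gives the same limit to every access wider than a dword. That is one reason a later revision moved to counting clustered dwords.
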
hsmahesha
5832950adb [AMDGPU/MemOpsCluster] Compute width for MIMG instruction class.
Summary:
`width` computation is missing for newly added `MIMG`
instruction class. Add it.

Reviewers: foad, rampitec, arsenm

Reviewed By: foad

Subscribers: MatzeB, javed.absar, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81649
2020-06-23 17:32:17 +05:30
Carl Ritson
4a7de36afc [AMDGPU] Avoid use of V_READLANE into EXEC in SGPR spills
Always prefer to clobber input SGPRs and restore them after the
spill.  This applies to both spills to VGPRs and scratch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D81914
2020-06-20 12:10:47 +09:00
Piotr Sobczak
6d9565d6d5 Revert "[AMDGPU] Select s_cselect"
This caused some failures detected by the buildbot with
expensive checks enabled.

This reverts commit 4067de569f.
2020-06-19 16:41:04 +02:00
Piotr Sobczak
4067de569f [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81925
2020-06-19 16:17:46 +02:00
Matt Arsenault
5f5f566b26 AMDGPU: Don't use 16-bit FP inline constants in integer operands
It seems to be a hardware defect that the half inline constants do not
work as expected for the 16-bit integer operations (the inverse does
work correctly). Experimentation seems to show these are really
reading the 32-bit inline constants, which can be observed by writing
inline asm using op_sel to see what's in the high half of the
constant. Theoretically we could fold the high halves of the 32-bit
constants using op_sel.

The *_asm_all.s MC tests are broken, and I don't know where the script
to autogenerate them is. I started manually fixing them, but there are
just too many cases to fix. This also does break the
assembler/disassembler support for these values, and I'm not sure what
to do about it. These are still valid encodings, so it seems like you
should be able to use them in some way. If you wrote assembly using
them, you could have really meant it (perhaps to read the high bits
with op_sel?). The disassembler will print the invalid literal
constant which will fail to re-assemble. The behavior is also
different depending on the use context. Consider this example, which
was previously accepted and encoded using the inline constant:

  v_mad_i16 v5, v1, -4.0, v3
  ; encoding: [0x05,0x00,0xec,0xd1,0x01,0xef,0x0d,0x04]

In contexts where an inline immediate is required (such as on gfx8/9),
this will now be rejected. For gfx10, this will produce the literal
encoding and change the printed format:

  v_mad_i16 v5, v1, 0xc400, v3
  ; encoding: [0x05,0x00,0x5e,0xd7,0x01,0xff,0x0d,0x04,0x00,0xc4,0x00,0x00]

This is just another variation of the issue that we don't perfectly
handle round trip assembly/disassembly due to not tracking how
immediates were encoded. This doesn't matter much in practice, since
compilers don't emit the suboptimal encoding. I doubt any users are
relying on this behavior (although I did make use of the old behavior
to figure out what was wrong).

Fixes bug 46302.
2020-06-17 19:14:10 -04:00
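
The raw bit patterns make the observation in 5f5f566b26 easier to see. Suppose the hardware really reads the 32-bit inline constant, as the experiments described there suggest. Then a 16-bit integer operation sees different bits than the half encoding implies. For example, -4.0 is 0xC400 as a half (the literal printed in the gfx10 example above) but 0xC0800000 as a float, whose low half is 0x0000 and high half is 0xC080. The helper names are hypothetical. `f16BitsExact` handles only normal values that are exactly representable as halves, such as the inline constants ±0.5, ±1.0, ±2.0 and ±4.0.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw IEEE-754 single-precision bits of F.
uint32_t f32Bits(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Half-precision bits for a normal float that is exactly representable as a
// half. Not a general conversion: no rounding, denormals, infinities or NaNs.
uint16_t f16BitsExact(float F) {
  uint32_t B = f32Bits(F);
  uint32_t Sign = B >> 31;
  int32_t Exp = int32_t((B >> 23) & 0xFF) - 127 + 15; // rebias 8-bit -> 5-bit
  uint32_t Mant = (B & 0x7FFFFF) >> 13;                // keep top 10 bits
  return uint16_t((Sign << 15) | (uint32_t(Exp) << 10) | Mant);
}
```
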
Matt Arsenault
46579471fd AMDGPU: Fix spill/restore of 192-bit registers
I tried to use an IR inline asm test, but that doesn't work since the
inline asm handling asserts without an MVT to use.
2020-06-14 13:12:01 -04:00
Sebastian Neubauer
29a6ad94fd [AMDGPU] Add G16 support to image instructions
Add G16 feature for GFX10 and support A16 and G16 in GlobalISel.

Differential Revision: https://reviews.llvm.org/D76836
2020-06-12 11:26:31 +02:00
hsmahesha
7410571ce9 Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size"
This reverts commit 40a632a335.
2020-06-09 19:27:17 +05:30
hsmahesha
40a632a335 [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size
Summary:
Use both (1) the number of clustered bytes and (2) the cluster length to decide on
the max number of mem ops that can be clustered. On average, when loads
are dword or smaller, consider `5` as the max threshold, otherwise `4`. This heuristic
is based purely on experimentation; there is no analytical
reasoning behind it.

Reviewers: foad, rampitec, arsenm, vpykhtin

Reviewed By: foad, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81085
2020-06-09 14:09:14 +05:30
Jay Foad
275ecaae16 [AMDGPU] Cluster MIMG instructions
Differential Revision: https://reviews.llvm.org/D74035
2020-06-08 14:01:53 +01:00
hsmahesha
29c17ed96e [AMDGPU/MemOpsCluster] Code clean-up around accessing of memory operand width
Summary:
Clean up the logic that computes the width of a memory operand, and re-arrange code to avoid
code duplication.

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80946
2020-06-03 14:03:52 +05:30
Matt Arsenault
452e0d9023 AMDGPU: Don't run mode switches with exec 0
These are scalar instructions that change the behavior of vector instructions, so they
should not be executed without any active lanes.

The implementation of -amdgpu-skip-threshold also seems to be backwards
from what is expected, since decreasing it prevents removal.
2020-06-02 13:47:48 -04:00
hsmahesha
0ed2c04636 [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary:
While clustering mem ops, the AMDGPU target needs to consider the number of clustered bytes
to decide on the max number of mem ops that can be clustered. This patch adds support for passing
the number of clustered bytes to the target mem ops clustering logic.

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80545
2020-06-01 22:52:34 +05:30
Matt Arsenault
f012c58abd AMDGPU: Move MIMG MMO check to verifier
2020-05-29 20:58:23 -04:00
Matt Arsenault
1a9e0d7092 AMDGPU: Make S_DENORM_MODE not be a scheduling boundary
Now that the mode register uses/defs should be properly modeled, we
don't need to treat the FP mode switch as an arbitrary side effect.
2020-05-28 10:39:33 -04:00
Stanislav Mekhanoshin
7392bbc301 AMDGPU/GlobalISel: Fixed insert element for non-standard vectors
Differential Revision: https://reviews.llvm.org/D80653
2020-05-27 16:26:22 -07:00
alex-t
eb1092ada3 [AMDGPU] Fix for the lost CarryOut/CarryIn register operands in S_ADD/SUB_CO_PSEUDO.
Summary: This fixes a bug introduced by 5b898bddff, where the carry-in and carry-out registers were lost when lowering S_ADD/SUB_CO_PSEUDO.

Reviewers: rampitec, arsenm

Reviewed By: arsenm

Subscribers: msearles, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80158
2020-05-27 22:41:04 +03:00
Matt Arsenault
d37ce53ad3 AMDGPU: Set StackPointerRegisterToSaveRestore
This will enable selecting non-entry block allocas. Skip the SP write
check in the base isSchedulingBoundary implementation to preserve the
previous scheduling behavior and avoid test churn. It's apparently for
compile time reasons, but if we were to use this more work would be
needed since in some of the failing tests, we seem to incorrectly get
hazard nops inserted.
2020-05-27 13:44:05 -04:00
Matt Arsenault
07cd19efa2 AMDGPU: Fix dropping MI flags when rewriting instructions
All 3 passes that change instruction encodings were dropping MI
flags. This avoids scheduling regressions caused by setting
mayRaiseFPExceptions on FP instructions for non-strictfp functions.
2020-05-27 13:27:06 -04:00
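
The fix in 07cd19efa2 amounts to copying the flags whenever an instruction is rebuilt in another encoding. In the toy model below (all names hypothetical), losing a "no FP exception" style flag makes the rewritten instruction look as if it may raise FP exceptions. The commit identifies exactly this kind of constraint as the cause of the scheduling regressions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-instruction flag bit.
enum : uint16_t {
  FlagNoFPExcept = 1u << 0, // instruction cannot raise FP exceptions
};

struct ToyInstr {
  unsigned Opcode;
  uint16_t Flags;
};

// Buggy rewrite into a new encoding: the original flags are dropped.
ToyInstr rewriteBuggy(const ToyInstr & /*MI*/, unsigned NewOpcode) {
  return ToyInstr{NewOpcode, 0};
}

// Fixed rewrite: carry the original flags over to the replacement.
ToyInstr rewriteFixed(const ToyInstr &MI, unsigned NewOpcode) {
  return ToyInstr{NewOpcode, MI.Flags};
}

// Toy scheduling query: without the flag, assume the instruction may trap.
bool mayRaiseFPException(const ToyInstr &MI) {
  return (MI.Flags & FlagNoFPExcept) == 0;
}
```
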
Matt Arsenault
833996cef1 AMDGPU: Fix backwards s_cselect_* operands
The vector equivalent has backwards operands, but the scalar version
does not. The passes that use these hooks aren't enabled by default,
so this doesn't really change anything.
2020-05-27 09:26:09 -04:00
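
The asymmetry mentioned in 833996cef1 follows from the ISA definitions. `S_CSELECT_B32` computes `D = SCC ? S0 : S1`, while `V_CNDMASK_B32` computes, per lane, `D = VCC[lane] ? S1 : S0`. The one-lane model below uses hypothetical function names. It shows that a helper building both forms from a single (condition, true value, false value) triple must swap the sources for the vector form only.

```cpp
#include <cassert>
#include <cstdint>

// S_CSELECT_B32: D = SCC ? S0 : S1 (first source selected when true).
uint32_t sCselectB32(bool SCC, uint32_t S0, uint32_t S1) {
  return SCC ? S0 : S1;
}

// V_CNDMASK_B32, one lane: D = VCC[lane] ? S1 : S0 (second source when true).
uint32_t vCndmaskB32Lane(bool VCCBit, uint32_t S0, uint32_t S1) {
  return VCCBit ? S1 : S0;
}

// Lower a generic select(Cond, TrueVal, FalseVal) to each form.
uint32_t selectViaScalar(bool Cond, uint32_t TrueVal, uint32_t FalseVal) {
  return sCselectB32(Cond, TrueVal, FalseVal); // operands in natural order
}
uint32_t selectViaVector(bool Cond, uint32_t TrueVal, uint32_t FalseVal) {
  return vCndmaskB32Lane(Cond, FalseVal, TrueVal); // operands swapped
}
```
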
David Blaikie
025cd300cd Collapse variable into assert to remove non-assert unused variable
2020-05-05 11:04:43 -07:00
Stanislav Mekhanoshin
9ef166e657 [AMDGPU] Fix FoldImmediate for 16 bit operand
Differential Revision: https://reviews.llvm.org/D79362
2020-05-05 10:19:14 -07:00
Stanislav Mekhanoshin
c85eda74b8 [AMDGPU] fix copies between 32 and 16 bit
This is a hack to fix illegal 32 to 16 bit copies.
The problem is that making 16 bit subregs legal creates
a huge number of failures, which could only be resolved all at once
without a temporary hack like this.

The next step is to change operands, instruction definitions
and patterns until this hack is not needed.

Differential Revision: https://reviews.llvm.org/D79119
2020-05-04 08:54:22 -07:00
alex-t
5b898bddff [AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection.
Summary: This change enables all kinds of carry-out ISD opcodes to be selected according to the node divergence.

Reviewers: rampitec, arsenm, vpykhtin

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78091
2020-05-04 16:42:25 +03:00
Stanislav Mekhanoshin
26777ad7a0 [AMDGPU] Adapt GCNRegBankReassign for 16 bit subregs
This allows the pass to analyze 16 bit subregs, instead of crashing,
if those appear in instructions. At the same time it does not attempt
to reassign them. It can still correctly identify register
banks, so that larger registers can be reassigned.

More work, and more tests, will be needed here once real instructions
use these registers.

Differential Revision: https://reviews.llvm.org/D78772
2020-04-28 16:16:04 -07:00
Stanislav Mekhanoshin
8a30460697 [AMDGPU] Define AGPR subregs
These are only needed as counterparts of the VGPR subregs.

Differential Revision: https://reviews.llvm.org/D78597
2020-04-28 15:30:43 -07:00
Stanislav Mekhanoshin
46a75436f8 [AMDGPU] Define special SGPR subregs
These are used in SReg_32, and when we start to use SGPR_LO16
there will be complaints that not all registers in RC support
all subreg indexes. For now it is NFC.

Unused regunits are reserved so that the verifier does not complain
about missing phys reg live-ins.

Differential Revision: https://reviews.llvm.org/D78591
2020-04-28 14:57:46 -07:00
Stanislav Mekhanoshin
395d93358e Revert "[AMDGPU] Define special SGPR subregs"
This reverts commit 1baaa080e0.
2020-04-28 13:53:15 -07:00
Stanislav Mekhanoshin
1baaa080e0 [AMDGPU] Define special SGPR subregs
These are used in SReg_32, and when we start to use SGPR_LO16
there will be complaints that not all registers in RC support
all subreg indexes. For now it is NFC.

Unused regunits are reserved so that the verifier does not complain
about missing phys reg live-ins.

Differential Revision: https://reviews.llvm.org/D78591
2020-04-28 13:34:24 -07:00