Commit Graph

3444 Commits

Author SHA1 Message Date
Jay Foad
3d76824b7f [AMDGPU] Better support for VMEM soft clauses in GCNHazardRecognizer
VMEM soft clauses only contain VMEM and FLAT instructions. Teaching
GCNHazardRecognizer::checkSoftClauseHazards that other kinds of
instructions will naturally break the clause means there are far fewer
cases where it has to insert an s_nop instruction to forcibly break the
clause.

Differential Revision: https://reviews.llvm.org/D79353
2020-05-05 15:49:09 +01:00
Sebastian Neubauer
1de4e56933 [AMDGPU] Don't mark the .note section as ALLOC
Marking a section as ALLOC tells the ELF loader to load the section into memory.
As we do not want to load the notes into VRAM, the flag should not be there.

On AMDHSA, .note is still marked as ALLOC, apparently this is currently
needed for OpenCL (see https://reviews.llvm.org/D74995).

Differential Revision: https://reviews.llvm.org/D76278
2020-05-05 14:21:45 +02:00
Stanislav Mekhanoshin
c85eda74b8 [AMDGPU] fix copies between 32 and 16 bit
This a hack to fix illegal 32 to 16 bit copies.
The problem is when we make 16 bit subregs legal it creates
a huge amount of failures which can only be resolved at once
without a temporary hack like this.

The next step is to change operands, instruction definitions
and patterns until this hack is not needed.

Differential Revision: https://reviews.llvm.org/D79119
2020-05-04 08:54:22 -07:00
alex-t
5b898bddff [AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection.
Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence.

Reviewers: rampitec, arsenm, vpykhtin

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78091
2020-05-04 16:42:25 +03:00
LemonBoy
6d103ca855 [SelectionDAG] Unify scalarizeVectorLoad and VectorLegalizer::ExpandLoad
The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.

The technique employed by `ExpandLoad`  is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.

Differential Revision: https://reviews.llvm.org/D79096
2020-05-02 15:18:10 -07:00
Jay Foad
5f7ea85e78 [AMDGPU] Remove unnecessary s_waitcnt between VMEM loads
VMEM loads of the same type (sampler vs no sampler) are guaranteed to
write their result registers in order, so there is no need for an
s_waitcnt even if they write to overlapping vgprs.

Differential Revision: https://reviews.llvm.org/D79176
2020-05-01 10:10:23 +01:00
Stanislav Mekhanoshin
26777ad7a0 [AMDGPU] Adapt GCNRegBankReassign for 16 bit subregs
It allows it not to crash and analyze 16 bit subregs if those
appear in the instructions. At the same time it does not attempt
to reassign these. It still can correctly identify register
banks to let larger registers to be reassigned.

More work will be needed here when real instructions will use
these registers and more tests as well.

Differential Revision: https://reviews.llvm.org/D78772
2020-04-28 16:16:04 -07:00
Stanislav Mekhanoshin
8a30460697 [AMDGPU] Define AGPR subregs
These are only needed as VGPR counterpart.

Differential Revision: https://reviews.llvm.org/D78597
2020-04-28 15:30:43 -07:00
Stanislav Mekhanoshin
46a75436f8 [AMDGPU] Define special SGPR subregs
These are used in SReg_32 and when we start to use SGPR_LO16
there will be compaints that not all registers in RC support
all subreg indexes. For now it is NFC.

Unused regunits are reserved so that verifier does not complain
about missing phys reg live-ins.

Differential Revision: https://reviews.llvm.org/D78591
2020-04-28 14:57:46 -07:00
Stanislav Mekhanoshin
395d93358e Revert "[AMDGPU] Define special SGPR subregs"
This reverts commit 1baaa080e0.
2020-04-28 13:53:15 -07:00
Stanislav Mekhanoshin
1baaa080e0 [AMDGPU] Define special SGPR subregs
These are used in SReg_32 and when we start to use SGPR_LO16
there will be compaints that not all registers in RC support
all subreg indexes. For now it is NFC.

Unused regunits are reserved so that verifier does not complain
about missing phys reg live-ins.

Differential Revision: https://reviews.llvm.org/D78591
2020-04-28 13:34:24 -07:00
Matt Arsenault
4cef9812eb AMDGPU: Add some missing atomics tests
We had no FP atomic load/store coverage.
2020-04-26 15:09:35 -04:00
Matt Arsenault
35e6a9c839 AMDGPU: Break read2/write2 search range on a memory fence
This is to fix performance regressions introduced by
86c944d790.

The old search would collect all potentially mergeable instructions in
the entire block. In this case, the same address is written in
multiple places in the block on the other side of a fence. When sorted
by offset, the two unmergeable, identical addresses would be next to
each other and the merge would give up.

Break the search space when we encounter an instruction we won't be
able to merge across. This will keep the identical addresses in
different merge attempts.

This may also improve compile time by reducing the merge list size.
2020-04-24 15:53:30 -04:00
Piotr Sobczak
7631af3af2 [AMDGPU] Skip generating cache invalidating instructions on AMDPAL
Summary:
Frontend guarantees that coherent accesses have
corresponding cache policy bits set (glc, dlc).
Therefore there is no need for extra instructions
that invalidate cache.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78800
2020-04-24 13:53:44 +02:00
Christudasan Devadasan
207cd5f68f [AMDGPU] Add the SGPR used for FP copy to block livein lists.
The temporary register used for FP copy
should be live throughout the function.
2020-04-24 11:47:38 +05:30
Matt Arsenault
89c8c80bd5 AMDGPU: Change pre-gfx9 implementation of fcanonicalize to mul
If f32 denormals were enabled pre-gfx9, we would still try to
implement this with v_max_f32. Pre-gfx9, these instructions ignored
the denormal mode and did not flush. Switch to the multiply form for
f32 as a workaround which should always work in any case.

This fixes conformance failures when the library implementation of
fmin/fmax were accidentally not inlined, forcing the assumption of no
flushing on targets where denormals are not enabled by default. This
is a workaround, since really we should not be mixing code with
different FP mode expectations, but prefer the lowering that will work
in any mode.

Now this will always use max to implement canonicalize on gfx9+. This
is only really beneficial for f64. For f32/f16 it's a neutral choice
(and worse in terms of code size in 1 case), but possibly worse for
the compiler since it does add an extra register use operand. Leave
this change for later.
2020-04-23 15:24:13 -04:00
Matt Arsenault
5fe3f06596 AMDGPU/GlobalISel: Add new baseline checks for canonicalize 2020-04-23 15:04:32 -04:00
Jay Foad
0337017a9f [AMDGPU] Use SGPR instead of SReg classes
12994a70cf did this for 128-bit classes:

    SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
    the additional non-allocatable TTMP registers. There's no point in
    allocating SReg_128 vregs. This shrinks the size of the classes
    regalloc needs to consider, which is usually good.

This patch extends it to all classes > 64 bits, for consistency.

Differential Revision: https://reviews.llvm.org/D78622
2020-04-23 11:45:22 +01:00
Jay Foad
dbdffe3ee9 [AMDGPU] Add 192-bit register classes
Differential Revision: https://reviews.llvm.org/D78312
2020-04-22 13:10:37 +01:00
Piotr Sobczak
c48ceaf37b Revert "[AMDGPU] Set the CostPerUse value for vgpr registers."
This reverts commit 728b878de6.

D76417 has caused vgpr count to go up significantly in real-world
graphics content.
2020-04-20 22:47:31 +02:00
Stanislav Mekhanoshin
992fbce4e9 [AMDGPU] copyPhysReg() for 16 bit SGPR subregs
Differential Revision: https://reviews.llvm.org/D78255
2020-04-17 11:59:39 -07:00
Stanislav Mekhanoshin
fde2aefa22 [AMDGPU] Use SDWA for 16 bit subreg copy
This simplifies the logic and allows to use it on GFX8.

Differential Revision: https://reviews.llvm.org/D78150
2020-04-17 11:45:44 -07:00
Dominik Montada
55e3a7c6b2 [GlobalISel][AMDGPU] add legalization for G_FREEZE
Summary:
Copy the legalization rules from SelectionDAG:
-widenScalar using anyext
-narrowScalar using intermediate merges
-scalarize/fewerElements using unmerge
-moreElements using G_IMPLICIT_DEF and insert

Add G_FREEZE legalization actions to AMDGPULegalizerInfo.
Use the same legalization actions as G_IMPLICIT_DEF.

Depends on D77795.

Reviewers: dsanders, arsenm, aqjune, aditya_nandakumar, t.p.northover, lebedev.ri, paquette, aemerson

Reviewed By: arsenm

Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, volkan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78092
2020-04-17 16:44:46 +02:00
Stanislav Mekhanoshin
2e94a64b57 [AMDGPU] Define 16 bit SGPR subregs
These are needed as a counterpart for VGPR subregs even though
there are no scalar instructions which can operate 16 bit values.
When we are materializing a constant that is done into an SGPR
and that SGPR may/will be copied into a 16 bit VGPR subreg. Such
copy is illegal. There are also similar problems if a source
operand of a 16 bit VALU instruction is an SGPR. In addition
we need to get a register with a lo16 subregister of an SGPR
RC during selection and this fails as well.

All of that makes me believe we need these subregisters as a
syntactic glue.

Differential Revision: https://reviews.llvm.org/D78250
2020-04-16 10:31:39 -07:00
Konstantin Schwarz
1a3e89aa2b [MIR] Add comments to INLINEASM immediate flag MachineOperands
Summary:
The INLINEASM MIR instructions use immediate operands to encode the values of some operands.
The MachineInstr pretty printer function already handles those operands and prints human readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however uses the new MIROperandComment feature.

Reviewers: SjoerdMeijer, arsenm, efriedma

Reviewed By: arsenm

Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78088
2020-04-16 13:46:14 +02:00
Dominik Montada
e5d666d768 Revert "Revert "[GlobalISel] Fix invalid combine of unmerge(merge) with intermediate cast""
This reverts commit 1265899c5f.
2020-04-16 09:30:34 +02:00
Stanislav Mekhanoshin
8a9d48b46d [AMDGPU] Fixed lane mask in test. NFC. 2020-04-15 15:26:53 -07:00
Amara Emerson
c22cb5bd31 [GlobalISel] Enable artifact combiner to combine starting from a G_MERGE_VALUES.
We generally only combine starting from users to defs in the artifact combiner,
but this doesn't catch cases where at the point of combining a G_UNMERGE we don't
yet have the opposite G_MERGE on input yet since we haven't legalized that far.

This change adds the users of a G_MERGE to the artifact combiner worklist if one
of the uses is a G_UNMERGE or G_TRUNC.

Differential Revision: https://reviews.llvm.org/D77931
2020-04-15 10:34:13 -07:00
Dominik Montada
1265899c5f Revert "[GlobalISel] Fix invalid combine of unmerge(merge) with intermediate cast"
This reverts commit bddac41b9f.
2020-04-15 18:47:39 +02:00
Dominik Montada
bddac41b9f [GlobalISel] Fix invalid combine of unmerge(merge) with intermediate cast
Summary:
The combine for unmerge(cast(merge)) is only valid for vectors, but was
missing a corresponding check. Add a check that the operands are vectors
to avoid an invalid combine.

Without this check, the combiner would emit incorrect code for scalars
and pointers because the artifact cast (trunc/ext) only affects bits at
the end of the type, while this combine assumes that the casted bits
appear between meaningful bits.

This also uncovered a segmentation fault in the AMDGPU
InstructionSelector. The tests triggering this bug have been moved to
their own file and a check for the segmentation fault has been added.

Reviewers: arsenm, dsanders, aemerson, paquette, aditya_nandakumar

Reviewed By: arsenm

Subscribers: tpr, jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78191
2020-04-15 17:19:14 +02:00
Matt Arsenault
ef2cb8db34 AMDGPU/GlobalISel: Add some artifact combiner tests 2020-04-15 09:03:07 -04:00
Matt Arsenault
cc149172da AMDGPU/GlobalISel: Fix selection of scalar f64 G_FABS
This wasn't covered by existing tablegen patterns, but also suffers
the same issues as G_FNEG. Workaround them by manually selecting, like
G_FNEG.
2020-04-14 22:05:22 -04:00
Matt Arsenault
cb5dc3765b TableGen/GlobalISel: Fix constraining REG_SEQUENCE operands
This was hitting the default instruction constraint code which uses
the register classes in the instruction def, which REG_SEQUENCE does
not have.

Fixes not constraining the register class for AMDGPU fneg/fabs
patterns, which would fail when the use was another generic,
unconstrained instruction.

Another oddity I noticed is that the temporary registers are created
with an unnecessary, but incorrect 16-bit LLT but this shouldn't
matter.

I'm also still unclear why root and sub-instructions have to be
handled differently.
2020-04-14 22:05:22 -04:00
Eli Friedman
2876b3eef3 [SelectionDAG] Always preserve offset in MachinePointerInfo
Previously, getWithOffset() would drop the offset if the base was null.
Because of this, MachineMemOperand would return the wrong result from
getAlign() in these cases.  MachineMemOperand stores the alignment of
the pointer without the offset.

A bunch of MIR tests changed because we print the offset now.

Split off from D77687.

Differential Revision: https://reviews.llvm.org/D78049
2020-04-14 15:29:41 -07:00
Matt Arsenault
f48fe2c36e GlobalISel: Fix casted unmerge of G_CONCAT_VECTORS
This was assuming a scalarizing unmerge, and would fail assert if the
unmerge was to smaller vector types.
2020-04-13 22:03:05 -04:00
Matt Arsenault
0ba40d4ccf AMDGPU/GlobalISel: Combines for V_CVT_F32_UBYTE[0-3]
Ports the existing DAG combines, minus the simplify demanded bits
which seems to have no equivalent now. Without these, this isn't
particularly helpful in most of the IR sample cases.
2020-04-13 19:18:19 -04:00
Austin Kerbow
a69b3e010c [AMDGPU][GlobalISel] Fix div_scale in FDIV lowering
Differential Revision: https://reviews.llvm.org/D78004
2020-04-13 15:54:49 -07:00
Eli Friedman
89e0662dee Make IRBuilder automatically set alignment on load/store/alloca.
This is equivalent in terms of LLVM IR semantics, but we want to
transition away from using MaybeAlign to represent the alignment of
these instructions.

Differential Revision: https://reviews.llvm.org/D77984
2020-04-13 13:43:14 -07:00
Matt Arsenault
e6605a209c DAG: Fix wrong legality check for ISD::FMAD
Since 1725f28841, this should check
isFMADLegalForFAddFSub rather than the the plain isOperationLegal.

This would assert in a subset of cases due to an oddity in how FMAD is
selected. We will allow FMA formation pre-legalize, but not FMAD even
in cases where it would be valid.

The current hook requires passing in the root fadd/fsub. However, in
this distributed case, this would be far more complicated to pass in
the relevant operand. AMDGPU doesn't get any value from the node, and
only needs the type and is the only implementor, so I'm not sure why
we have this complexity. Just rename and expand the assert to avoid
the more complicated checks spread through the distribution logic.
2020-04-13 10:25:39 -07:00
Jon Roelofs
0b0bb1969f [llvm] Fix yet more missing FileCheck colons 2020-04-13 10:49:19 -06:00
Jonathan Roelofs
17bc995388 [llvm] Fix more missing FileCheck directive colons 2020-04-13 10:16:29 -06:00
Austin Kerbow
eab9a4f119 [AMDGPU] Don't assert on partial exec copy
After Machine CSE and coalescing we can end up with copies of exec to
subregister SGPRs.

Differential Revision: https://reviews.llvm.org/D77992
2020-04-12 21:14:36 -07:00
Matt Arsenault
96819011ca AMDGPU/GlobalISel: Fix RegBankSelect for v2s16 shifts
These need to be promoted and scalarized for the SALU.
2020-04-11 20:55:33 -04:00
Matt Arsenault
ac8d51a3c6 AMDGPU/GlobalISel: Legalize 16-bit shift amounts to s16
The current selector depends on 16-bit shifts using 16-bit shift
amount types, but really it should accept either for all types.
2020-04-11 18:12:26 -04:00
Matt Arsenault
c5497e5399 AMDGPU/GlobalISel: Fix legalizing <3 x s16> vselects 2020-04-11 15:59:51 -04:00
Matt Arsenault
cf29333f40 AMDGPU/GlobalISel: Work around forming illegal zextload after legalize
Selection would fail after the post legalize combiner put an illegal
zextload back together.

The base combiner has parameter to only allow legal operations, but
they appear to not be used. I also don't see a nice way to remove a
single entry from all_combines, so just hack around this.
2020-04-11 10:52:58 -04:00
Matt Arsenault
49ae0fc2f0 GlobalISel: Fix incorrect lowering G_FCOPYSIGN
In the basic case, this was reading the sign from the wrong operand.
2020-04-10 21:00:25 -04:00
Stanislav Mekhanoshin
44920e8566 [AMDGPU] Disable sub-dword scralar loads IR widening
These will be widened in the DAG. In the meanwhile early
widening prevents otherwise possible vectorization of
such loads.

Differential Revision: https://reviews.llvm.org/D77835
2020-04-10 08:20:49 -07:00
Simon Pilgrim
54502476e7 [AMDGPU] Refresh fmin_legacy.ll checks to fix issue reported on D77354
Some of the condition codes had inverted since this was generated
2020-04-08 17:05:29 +01:00
Simon Pilgrim
7e62684251 [AMDGPU] Regenerate vector-extract-insert test checks to fix issue reported on D77354 2020-04-08 13:18:32 +01:00