Commit Graph

2192 Commits

Author SHA1 Message Date
David Green
0861c87b06 Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.

I will try to follow this up with some better tests.

llvm-svn: 358113
2019-04-10 18:00:41 +00:00
Matt Arsenault
7187272b2b GlobalISel: Support legalizing G_CONSTANT with irregular breakdown
llvm-svn: 358109
2019-04-10 17:27:53 +00:00
Matt Arsenault
9e0eeba569 GlobalISel: Handle odd breakdowns for bit ops
llvm-svn: 358105
2019-04-10 17:07:56 +00:00
Stanislav Mekhanoshin
913ba8eeb4 Revert LIS handling in MachineDCE
One of out of tree targets has regressed with this patch. Reverting
it for now and let liveness to be fully reconstructed in case pass
was used after the LIS is created to resolve the regression.

Differential Revision: https://reviews.llvm.org/D60466

llvm-svn: 358015
2019-04-09 16:13:53 +00:00
Tom Stellard
206b9927f8 AMDGPU/GlobalISel: Implement call lowering for shaders returning values
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D57166

llvm-svn: 357964
2019-04-09 02:26:03 +00:00
Nikita Popov
3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
Stanislav Mekhanoshin
c8f78f8dd3 [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.

MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.

Differential Revision: https://reviews.llvm.org/D59626

llvm-svn: 357805
2019-04-05 20:11:32 +00:00
Matt Arsenault
4ed6ccab9b AMDGPU/GlobalISel: Fix non-power-of-2 select
llvm-svn: 357762
2019-04-05 14:03:04 +00:00
Piotr Sobczak
0376ac1d94 [SelectionDAG] Compute known bits of CopyFromReg
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.

This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.

Also add a missing truncation on X86, found by testing of this patch.

Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa

Reviewers: bogner, craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59535

llvm-svn: 357745
2019-04-05 07:44:09 +00:00
Matt Arsenault
396653f8a1 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

llvm-svn: 357634
2019-04-03 20:53:20 +00:00
Matt Arsenault
f426ddbfc7 AMDGPU: Assume ECC is enabled by default if supported
The test should really be checking for the property directly in the
code object headers, but there are problems with this. I don't see
this directly represented in the text form, and for the binary
emission this is depending on a function level subtarget feature to
emit a global flag.

llvm-svn: 357558
2019-04-03 01:58:57 +00:00
Matt Arsenault
2065680b47 AMDGPU: Don't use the default cpu in a few tests
Avoids unnecessary test changes in a future commit.

llvm-svn: 357539
2019-04-03 00:00:58 +00:00
Michael Liao
9bef688bc2 [AMDGPU] Add more test cases of D59608.
Summary: - Add more test cases.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60071

llvm-svn: 357442
2019-04-02 00:36:37 +00:00
Neil Henning
0a30f33ce2 [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.

Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).

This fix instead runs through the WWM sections and pre allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).

Differential Revision: https://reviews.llvm.org/D59295

llvm-svn: 357400
2019-04-01 15:19:52 +00:00
Matt Arsenault
055e4dce45 AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

llvm-svn: 357302
2019-03-29 19:14:54 +00:00
Nirav Dave
fe59e14031 [DAGCombine] Prune unnused nodes.
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

llvm-svn: 357283
2019-03-29 17:35:56 +00:00
Konstantin Zhuravlyov
2b766ed774 AMDGPU: Make sram-ecc off by default for Vega20
Differential Revision: https://reviews.llvm.org/D59718

llvm-svn: 357247
2019-03-29 12:04:18 +00:00
Matt Arsenault
5fddf09187 AMDGPU/GlobalISel: Insert waterfall loop for vector indexing
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.

llvm-svn: 357235
2019-03-29 03:54:56 +00:00
Matt Arsenault
a353fd572a AMDGPU: Make exec mask optimzations more resistant to block splits
Also improve the check for SALU instructions to also ignore
implicit_def and other fake instructions.

llvm-svn: 357170
2019-03-28 14:01:39 +00:00
Piotr Sobczak
f896785cb7 [SelectionDAG] Add 2 tests for selection across basic blocks
Summary:
Add tests for selection across basic block boundary:
 * one test containing a buffer load, where part of the offset
   computation is placed in the predecessor of the load
 * similar test, but containing two buffer loads and shared
   computations

Please note that the behaviour being tested will be updated in
a subsequent commit.

This commit was extracted from https://reviews.llvm.org/D59535.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: jvesely, nhaehnle, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59690

llvm-svn: 357149
2019-03-28 07:06:26 +00:00
Justin Bogner
b1650f0da9 [LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)

This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to have a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.

Patch by Guillaume Marques. Thanks!

Differential Revision: https://reviews.llvm.org/D56201

llvm-svn: 357120
2019-03-27 20:35:56 +00:00
Matt Arsenault
2e9ddcc30e RegPressure: Fix crash on blocks with only dbg_value
If there were only dbg_values in the block, recede would hit the
beginning of the block and try to use thet dbg_value as a real
instruction.

llvm-svn: 357105
2019-03-27 18:14:02 +00:00
Amara Emerson
381188f1f3 [GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead instructions.
The artifact combiners push instructions which have been marked for deletion
onto an list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).

This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.

As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.

Differential Revision: https://reviews.llvm.org/D59892

llvm-svn: 357101
2019-03-27 17:47:42 +00:00
Matt Arsenault
7b14b2425d Reapply "AMDGPU: Scavenge register instead of findUnusedReg"
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.

This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.

llvm-svn: 357098
2019-03-27 17:31:29 +00:00
Matt Arsenault
86e4fc0504 AMDGPU: Add testcase I meant to merge into r357093
llvm-svn: 357097
2019-03-27 17:31:26 +00:00
Quentin Colombet
89daf49e5c [PeepholeOpt] Don't stop simplifying copies on sequence of subregs
This patch removes an overly conservative check that would prevent
simplifying copies when the value we were tracking would go through
several subregister indices.
Indeed, the intend of this check was to not track values whenever
we have to compose subregister, but actually what the check was
doing was bailing anytime we see a second subreg, even if that
second subreg would actually be the new source of truth (as opposed
to a part of that subreg).

Differential Revision: https://reviews.llvm.org/D59891

llvm-svn: 357095
2019-03-27 17:27:56 +00:00
Matt Arsenault
17e39100a2 AMDGPU: Enable the scavenger for large frames
Another test is needed for the case where the scavenge fail, but
there's another issue with that which needs an additional fix.

llvm-svn: 357093
2019-03-27 17:14:32 +00:00
Matt Arsenault
4d47ac3b30 AMDGPU: Add additional MIR tests for exec mask optimizations
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.

Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.

llvm-svn: 357091
2019-03-27 16:58:30 +00:00
Matt Arsenault
4ab28b64b4 AMDGPU: Skip debug_instr when collapsing end_cf
Based on how these are inserted, I doubt this was causing a problem in
practice.

llvm-svn: 357090
2019-03-27 16:58:27 +00:00
Matt Arsenault
a42b7247d3 AMDGPU: Fix missing scc implicit def on s_andn2_b64_term
Introduce new helper class to copy properties directly from the base
instruction.

llvm-svn: 357089
2019-03-27 16:58:22 +00:00
Matt Arsenault
733b8571b4 MIR: Freeze reserved regs after parsing everything
The AMDGPU implementation of getReservedRegs depends on
MachineFunctionInfo fields that are parsed from the YAML section. This
was reserving the wrong register since it was setting the reserved
regs before parsing the correct one.

Some tests were relying on the default reserved set for the assumed
default calling convention.

llvm-svn: 357083
2019-03-27 16:12:26 +00:00
Matt Arsenault
e9ad7e9a71 AMDGPU: wave_barrier is not isBarrier
This is not a control flow instruction, so should not be marked as
isBarrier. This fixes a verifier error if followed by unreachable.

llvm-svn: 357081
2019-03-27 15:54:45 +00:00
Matt Arsenault
bbc59d8d0d AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics
The offset operand index is different for atomics.

llvm-svn: 357073
2019-03-27 15:41:00 +00:00
Matt Arsenault
b008b37b61 AMDGPU: Make collapse-endcf test more useful
Without a VALU instruction in the return block, these were mostly
testing the path to delete exec mask code before s_endpgm rather than
the end cf handling.

llvm-svn: 356955
2019-03-25 21:28:51 +00:00
Konstantin Zhuravlyov
51809cbc98 AMDGPU: Add support for cross address space synchronization scopes
Differential Revision: https://reviews.llvm.org/D59517

llvm-svn: 356946
2019-03-25 20:50:21 +00:00
Matt Arsenault
b27e4974d0 MISched: Don't schedule regions with 0 instructions
I think this is correct, but may not necessarily be the correct fix
for the assertion I'm really trying to solve. If a scheduling region
was found that only has dbg_value instructions, the RegPressure
tracker would end up in an inconsistent state because it would skip
over any debug instructions and point to an instruction outside of the
scheduling region. It may still be possible for this to happen if
there are some real schedulable instructions between dbg_values, but I
haven't managed to break this.

The testcase is extremely sensitive and I'm not sure how to make it
more resistent to future scheduler changes that would avoid stressing
this situation.

llvm-svn: 356926
2019-03-25 17:15:44 +00:00
Tim Renouf
6f0191a55a [AMDGPU] Use three- and five-dword result type in image ops
Some image ops return three or five dwords.  Previously, we modeled that
with a 4 or 8 dword register class.  The register allocator could
cleverly spot that some subregs were dead and allocate something else
there, but that caused the de-optimization that waitcnt insertion would
think that the result was used immediately.

This commit allows such an image op to have a result with a three or
five dword result, avoiding the above de-optimization.

Differential Revision: https://reviews.llvm.org/D58905

Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b
llvm-svn: 356757
2019-03-22 15:21:11 +00:00
Tim Renouf
677387d8dc [AMDGPU] Implemented dwordx3 variants of buffer/tbuffer load/store intrinsics
Now we have vec3 MVTs, this commit implements dwordx3 variants of the
buffer intrinsics.

On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4
instruction, and a dwordx3 buffer store intrinsic is not supported.
We need to support the dwordx3 load intrinsic because it is generated by
subtarget-unaware code in InstCombine.

Differential Revision: https://reviews.llvm.org/D58904

Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e
llvm-svn: 356755
2019-03-22 14:58:02 +00:00
Tim Renouf
033f99a2e5 [AMDGPU] Added v5i32 and v5f32 register classes
They are not used by anything yet, but a subsequent commit will start
using them for image ops that return 5 dwords.

Differential Revision: https://reviews.llvm.org/D58903

Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
llvm-svn: 356735
2019-03-22 10:11:21 +00:00
Matt Arsenault
b34afa311d GlobalISel: Fix RegBankSelect for REG_SEQUENCE
The AArch64 test was broken since the result register already had a
set register class, so this test was a no-op. The mapping verify call
would fail because the result size is not the same as the inputs like
in a copy or phi.

The AMDGPU testcases are half broken and introduce illegal VGPR->SGPR
copies which need much more work to handle correctly (same for phis),
but add them as a baseline.

llvm-svn: 356713
2019-03-21 20:45:36 +00:00
Tim Renouf
361b5b2193 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
2019-03-21 12:01:21 +00:00
Stanislav Mekhanoshin
0a11829ab2 Allow machine dce to remove uses in the same instruction
Machine DCE cannot remove a dead definition if there are non-dbg uses.
A use however can be in the same instruction:

  dead %0 = INST %0

Such instructions sometimes created by Detect dead lanes pass.

Allow this instruction to be deleted despite the use if the only use
belongs to the same instruction.

Differential Revision: https://reviews.llvm.org/D59565

llvm-svn: 356619
2019-03-20 21:42:05 +00:00
Matt Arsenault
2065206a9d AMDGPU: Don't look for constant in insert/extract_vector_elt regbankselect
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.

llvm-svn: 356611
2019-03-20 20:41:34 +00:00
Michael Liao
eea5177d30 [AMDGPU] Fix clamp bit DAG operand
Summary:
- Should use `targetconstant` instead of `constant` operand for clamp
  bit, which is expected as an immediate operand. Under certain
  conditions, such as a common `i1 false` constant is used in other
  place and selected before the instruction with clamp bit, register
  operand may be added instead of immediate one. Use `targetcosntant` to
  enforce that.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59608

llvm-svn: 356608
2019-03-20 20:18:56 +00:00
Tim Renouf
e7bd52f86e [AMDGPU] Added MsgPack format PAL metadata
Summary:
PAL metadata now supports both the old linear reg=val pairs format and
the new MsgPack format.

The MsgPack format uses YAML as its textual representation. On output to
YAML, a mnemonic name is provided for some hardware registers.

Differential Revision: https://reviews.llvm.org/D57028

Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
llvm-svn: 356591
2019-03-20 18:47:21 +00:00
Tim Renouf
d737b551e9 [AMDGPU] Factored PAL metadata handling out into its own class
Summary:
This commit introduces a new AMDGPUPALMetadata class that:
* is inside the AMDGPU target;
* keeps an in-memory representation of PAL metadata;
* provides a method to read the frontend-supplied metadata from LLVM IR;
* provides methods for the asm printer to set metadata items;
* provides methods to write the metadata as a binary blob to put in a
  .note record or as an asm directive;
* provides a method to read the metadata as a binary blob from a .note
  record.

Because llvm-readobj cannot call directly into a target, I had to remove
llvm-readobj's ability to dump PAL metadata, pending a resolution to
https://reviews.llvm.org/D52821

Differential Revision: https://reviews.llvm.org/D57027

Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64
llvm-svn: 356582
2019-03-20 17:42:00 +00:00
David Stuttard
fc2a747345 [AMDGPU] Allow MIMG with no uses in adjustWritemask in isel
Summary:
If an MIMG instruction has managed to get through to adjustWritemask in isel but
has no uses (and doesn't enable TFC) then prevent an assertion by not attempting
to adjust the writemask.

The instruction will be removed anyway.

Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58964

llvm-svn: 356540
2019-03-20 09:29:55 +00:00
Ryan Taylor
00e063ab92 [AMDGPU] Add buffer/load 8/16 bit overloaded intrinsics
Summary:
Add buffer store/load 8/16 overloaded intrinsics for buffer, raw_buffer and struct_buffer

Change-Id: I166a29f071b2ff4e4683fb0392564b1f223ac61d

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59265

llvm-svn: 356465
2019-03-19 16:07:00 +00:00
Neil Henning
e85f6bd64f [AMDGPU] Ban i8 min3 promotion.
I found this really weird WWM-related case whereby through the WWM
transformations our isel lowering was trying to promote 2 min's into a
min3 for the i8 type, which our hardware doesn't support.

The new min3_i8.ll test case would previously spew the error:

PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68

Before the simple fix to our isel lowering to not do it for i8 MVT's.

Differential Revision: https://reviews.llvm.org/D59543

llvm-svn: 356464
2019-03-19 15:50:24 +00:00
Michael Liao
efb4f9e568 [AMDGPU] Enable code selection using s_mul_hi_u32/s_mul_hi_i32.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59501

llvm-svn: 356405
2019-03-18 20:40:09 +00:00