Commit Graph

958 Commits

Author SHA1 Message Date
Matt Arsenault
dd8fd9dcfd AMDGPU: Actually write nops for writeNopData
Before this was just writing 0s, which ends up looking like a
v_cndmask_b32 v0, s0, v0, vcc. Write out an encoded s_nop instead.

llvm-svn: 299816
2017-04-08 21:28:38 +00:00
Stanislav Mekhanoshin
478b81982f [AMDGPU] Unroll more to eliminate phis and conditions
Increase threshold to unroll a loop which contains an "if" statement
whose condition defined by a PHI belonging to the loop. This may help
to eliminate if region and potentially even PHI itself, saving on
both divergence and registers used for the PHI.

Add a small bonus for each of such "if" statements.

Differential Revision: https://reviews.llvm.org/D31693

llvm-svn: 299779
2017-04-07 16:26:28 +00:00
Sam Kolton
6e79529db4 [AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passes
Summary:
Difference beetween PreRegAlloc() and MachineSSAOptimization() are that the former is run despite of -O0 optimization level. In my undestanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled.
With this change order of passes will not change.

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31705

llvm-svn: 299757
2017-04-07 10:53:12 +00:00
Konstantin Zhuravlyov
4b3847e865 AMDGPU/GFX9: Fix shared and private aperture queries
Differential Revision: https://reviews.llvm.org/D31786

llvm-svn: 299727
2017-04-06 23:02:33 +00:00
Eli Friedman
5fba1e53f2 Turn on -addr-sink-using-gep by default.
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.

Differential Revision: https://reviews.llvm.org/D30596

llvm-svn: 299723
2017-04-06 22:42:18 +00:00
Matt Arsenault
21a438255d AMDGPU: Diagnose illegal SGPR to VGPR copies
This is possible in ways that are not compiler bugs,
so stop asserting on them.

This emits an extra error when emitting objects when it
can't encode the new pseudo, but I'm not sure that matters.

llvm-svn: 299712
2017-04-06 21:09:53 +00:00
Matt Arsenault
5cf4271883 AMDGPU: Replace fp16SrcZerosHighBits with a whitelist
FCOPYSIGN is lowered to bit operations which don't clear the high
bits.

llvm-svn: 299708
2017-04-06 20:58:30 +00:00
Stanislav Mekhanoshin
ea57c38521 [AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront size
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guarantied
to come to the same point at the same time.

Differential Revision: https://reviews.llvm.org/D31731

llvm-svn: 299659
2017-04-06 16:48:30 +00:00
Sam Kolton
9fa169601f [AMDGPU] Resubmit SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299654
2017-04-06 15:03:28 +00:00
Ivan Krasin
d4f70c70b9 Revert r299536. [AMDGPU] SDWA peephole: enable by default.
Reason: breaks multiple bots:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173

Original Review URL: https://reviews.llvm.org/D31671

llvm-svn: 299583
2017-04-05 19:58:12 +00:00
Sam Kolton
34e29784fb [AMDGPU] SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299536
2017-04-05 12:00:45 +00:00
Matt Arsenault
3333968771 Verifier: Check some amdgpu calling convention restrictions
llvm-svn: 299457
2017-04-04 18:43:11 +00:00
Matt Arsenault
3e90f84806 AMDGPU: Remove legacy export intrinsic
llvm-svn: 299444
2017-04-04 16:34:39 +00:00
Matt Arsenault
236da200f1 AMDGPU: Remove legacy image intrinsics
llvm-svn: 299443
2017-04-04 16:34:35 +00:00
Matt Arsenault
b600e138cc AMDGPU: Remove llvm.SI.vs.load.input
llvm-svn: 299391
2017-04-03 21:45:13 +00:00
Matt Arsenault
c82768290d DAG: Fix missing legalization for any_extend_vector_inreg operands
llvm-svn: 299389
2017-04-03 21:28:13 +00:00
Matt Arsenault
754dd3eaef AMDGPU: Remove legacy bfe intrinsics
llvm-svn: 299372
2017-04-03 18:08:08 +00:00
Matt Arsenault
8edfaee7be AMDGPU: Remove unnecessary ands when f16 is legal
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.

Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.

llvm-svn: 299246
2017-03-31 19:53:03 +00:00
Jan Vesely
3c99441ef4 AMDGPU/R600: Fix amdgpu alias analysis pass.
R600 uses higher AS number to access kernel parameters

Fixes: r298846
Differential Revision: https://reviews.llvm.org/D31520

llvm-svn: 299245
2017-03-31 19:26:23 +00:00
Sam Kolton
27e0f8bc72 [AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns
Previously compiler often extracted common immediates into specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change peephole check if operand is either immediate or register that is copy of immediate.

llvm-svn: 299202
2017-03-31 11:42:43 +00:00
Matt Arsenault
79f837c254 AMDGPU: Add all atomicrmw fields to atomic.inc/dec
Add scope, order, isVolatile

llvm-svn: 299122
2017-03-30 22:21:40 +00:00
Stanislav Mekhanoshin
89653dfd2a [AMDGPU] Add GlobalOpt parameter to Always Inliner pass
If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.

Differential Revision: https://reviews.llvm.org/D31489

llvm-svn: 299108
2017-03-30 20:16:02 +00:00
Stanislav Mekhanoshin
baf31ac7c8 [AMDGPU] Boost unroll threshold for loops reading local memory
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.

Differential Revision: https://reviews.llvm.org/D31412

llvm-svn: 298948
2017-03-28 22:13:51 +00:00
Stanislav Mekhanoshin
9053f22eeb [AMDGPU] Split -amdgpu-early-inline-all option
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.

Differential Revision: https://reviews.llvm.org/D31429

llvm-svn: 298935
2017-03-28 18:23:24 +00:00
Yaxun Liu
14834c3e3d [AMDGPU] Switch data layout by triple environment amdgiz
Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0.

amdgiz is for non-OpenCL environment where generic address space is 0.

amdgizcl is for OpenCL environment where generic address space is 0.

Differential Revision: https://reviews.llvm.org/D31211

llvm-svn: 298758
2017-03-25 02:05:44 +00:00
Matt Arsenault
0607a4427b AMDGPU: Fix annotating loops with nested loop conditions
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.

llvm-svn: 298737
2017-03-24 20:57:10 +00:00
Matt Arsenault
b5d23271e2 AMDGPU: Implement f16 fround
llvm-svn: 298730
2017-03-24 20:04:18 +00:00
Matt Arsenault
b8f8dbc227 AMDGPU: Unify divergent function exits.
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.

llvm-svn: 298729
2017-03-24 19:52:05 +00:00
Stanislav Mekhanoshin
70603dcef2 [AMDGPU] Fold V_CNDMASK with identical source operands
Such instructions sometimes appear after lowering and folding.

Differential Revision: https://reviews.llvm.org/D31318

llvm-svn: 298723
2017-03-24 18:55:20 +00:00
Konstantin Zhuravlyov
4986d9fb45 [AMDGPU] Rename Kind to ValueKind in metadata to be consistent
llvm-svn: 298722
2017-03-24 18:43:15 +00:00
Stanislav Mekhanoshin
a27b2cac03 [AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline
Previously it was added only to the BE.

Differential Revision: https://reviews.llvm.org/D31323

llvm-svn: 298721
2017-03-24 18:01:14 +00:00
Konstantin Zhuravlyov
4cbb68959b [AMDGPU] Do not emit isa info as code object metadata
- It was decided to expose this information through other means (rocr)

Differential Revision: https://reviews.llvm.org/D30970

llvm-svn: 298560
2017-03-22 23:27:09 +00:00
Konstantin Zhuravlyov
a780ffaac2 [AMDGPU] Emit kernel debug properties as code object metadata
Differential Revision: https://reviews.llvm.org/D30969

llvm-svn: 298558
2017-03-22 23:10:46 +00:00
Konstantin Zhuravlyov
ca0e7f6472 [AMDGPU] Emit kernel code properties as code object metadata
- These are not required for low level runtime

Differential Revision: https://reviews.llvm.org/D29949

llvm-svn: 298556
2017-03-22 22:54:39 +00:00
Konstantin Zhuravlyov
7498cd61fb [AMDGPU] Restructure code object metadata creation
- Rename runtime metadata -> code object metadata
  - Make metadata not flow
  - Switch enums to use ScalarEnumerationTraits
  - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc
  - Introduce in-memory representation for attributes
  - Code object metadata streamer
  - Create metadata for isa and printf during EmitStartOfAsmFile
  - Create metadata for kernel during EmitFunctionBodyStart
  - Finalize and emit metadata to .note during EmitEndOfAsmFile
  - Other minor improvements/bug fixes

Differential Revision: https://reviews.llvm.org/D29948

llvm-svn: 298552
2017-03-22 22:32:22 +00:00
Matt Arsenault
5b20fbb748 AMDGPU: Rename SI_RETURN
This is used for a specific type of return to a shader part's
epilog code. Rename to try avoiding confusion from a true
call's return.

llvm-svn: 298452
2017-03-21 22:18:10 +00:00
Matthias Braun
8445cbd1ca SplitKit: Fix subreg copy related problems
Fix two problems related to r298025:
- SplitKit would create duplicate VNIs in some cases leading to crashs
  when hoisting copies.
- VirtRegMap could fail expanding copies at the beginning of a basic
  block.

This fixes http://llvm.org/PR32353

llvm-svn: 298448
2017-03-21 21:58:08 +00:00
Matt Arsenault
3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
George Burgess IV
56c7e88c2c Let llvm.objectsize be conservative with null pointers
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.

This fixes PR23277.

Differential Revision: https://reviews.llvm.org/D28494

llvm-svn: 298430
2017-03-21 20:08:59 +00:00
Marek Olsak
5c7a61d221 AMDGPU: Buffer descriptor changes for GFX9
Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31158

llvm-svn: 298397
2017-03-21 17:00:39 +00:00
Marek Olsak
e22fdb9cac AMDGPU: Always use VGPR indexing on GFX9
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31157

llvm-svn: 298396
2017-03-21 17:00:32 +00:00
Matt Arsenault
f8fb605a68 AMDGPU: Fix asserting on 0 dmask for image intrinsics
Fold these to undef during lowering so users get eliminated.

llvm-svn: 298387
2017-03-21 16:32:17 +00:00
Matt Arsenault
964a848514 AMDGPU: Convert image intrinsic uses in tests
llvm-svn: 298386
2017-03-21 16:24:12 +00:00
Matt Arsenault
dce313c3cf DAG: Fold bitcast/extract_vector_elt of undef to undef
Fixes not eliminating store when intrinsic is lowered to undef.

llvm-svn: 298385
2017-03-21 16:20:16 +00:00
Valery Pykhtin
fd4c410f4d [AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler
Differential revision: https://reviews.llvm.org/D31046

llvm-svn: 298368
2017-03-21 13:15:46 +00:00
Sam Kolton
f60ad58dad [ADMGPU] SDWA peephole optimization pass.
Summary:
First iteration of SDWA peephole.

This pass tries to combine several instruction into one SDWA instruction. E.g. it converts:
'''
    V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
    V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
    V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
   V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''

Pass structure:
    1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''.
    2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0'''
    3. Iterate over all potential instructions and check if they can be converted into SDWA.
    4. Convert instructions to SDWA.

This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done).
There are several ways this pass can be improved:
    1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass.
    2. Introduce more SDWA patterns
    3. Introduce mnemonics to limit when SDWA patterns should apply

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038

llvm-svn: 298365
2017-03-21 12:51:34 +00:00
Konstantin Zhuravlyov
2534bc07f4 [AMDGPU] Run always inliner early in opt
Differential Revision: https://reviews.llvm.org/D31141

llvm-svn: 298281
2017-03-20 18:06:45 +00:00
Konstantin Zhuravlyov
8a67eb144f Revert "[AMDGPU] Run always inliner early in opt"
This reverts commit r297958, it breaks device-libs build.

llvm-svn: 298239
2017-03-20 09:26:08 +00:00
Ahmed Bougacha
931904d777 [GlobalISel] Don't select trivially dead instructions.
Folding instructions when selecting can cause them to become dead.
Don't select these dead instructions (if they don't have other side
effects, and don't define physical registers).

Preserve existing tests by adding COPYs.

In some tests, the G_CONSTANT vregs never get constrained to a class:
the only use of the vreg was folded into another instruction, so the
G_CONSTANT, now dead, never gets selected.

llvm-svn: 298224
2017-03-19 16:13:00 +00:00
Stanislav Mekhanoshin
8e45acfc38 [AMDGPU] Add address space based alias analysis pass
This is direct port of HSAILAliasAnalysis pass, just cleaned for
style and renamed.

Differential Revision: https://reviews.llvm.org/D31103

llvm-svn: 298172
2017-03-17 23:56:58 +00:00