Commit Graph

442 Commits

Author SHA1 Message Date
Nicolai Haehnle
e40530ea7b AMDGPU: Fix return of non-void-returning shaders
Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the last
basic block was effectively disabled, since SI_RETURN is now used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21975

llvm-svn: 274612
2016-07-06 08:35:17 +00:00
Matt Arsenault
2d79389508 DAGCombiner: Fold away vector extract of insert with the same index
This only really matters when the index is non-constant since the
constant case already gets taken care of by other combines.

llvm-svn: 274569
2016-07-05 18:25:02 +00:00
Matt Arsenault
ffc8275f2b AMDGPU: Fix folding SGPRs into madak/madmk src0
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.

r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.

llvm-svn: 274564
2016-07-05 17:09:01 +00:00
Matt Arsenault
accddacb70 TII: Fix inlineasm size counting comments as insts
The main problem was counting comments on their own
line as instructions.

llvm-svn: 274405
2016-07-01 23:26:50 +00:00
Matt Arsenault
7f681ac7a9 AMDGPU: Add feature for unaligned access
llvm-svn: 274398
2016-07-01 23:03:44 +00:00
Matt Arsenault
8af47a09e5 AMDGPU: Expand unaligned accesses early
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.

llvm-svn: 274397
2016-07-01 22:55:55 +00:00
Matt Arsenault
327bb5ad82 AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

llvm-svn: 274394
2016-07-01 22:47:50 +00:00
Nikolay Haustov
beb24f5b20 Resubmit r268719 - AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
This was reverted in r268740 because of problems with corresponding Clang change.
Clang change was updated and resubmitted in r274220.

Check calling convention in AMDGPUMachineFunction::isKernel

This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

Also, in the future unused non-kernels may be optimized.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19917

llvm-svn: 274341
2016-07-01 10:00:58 +00:00
Matt Arsenault
b4d9503171 AMDGPU: Fix out of bounds indirect indexing errors
This was producing acceses to registers beyond the super
register's limits, resulting in verifier failures.

llvm-svn: 273977
2016-06-28 01:09:00 +00:00
Matt Arsenault
59c0ffa22a AMDGPU: Implement per-function subtargets
llvm-svn: 273940
2016-06-27 20:48:03 +00:00
Matt Arsenault
03d8584590 AMDGPU: Move subtarget feature checks into passes
llvm-svn: 273937
2016-06-27 20:32:13 +00:00
Matt Arsenault
21a4625a16 AMDGPU: Fix verifier errors with undef vector indices
Also fix pointlessly adding exec to liveins.

llvm-svn: 273916
2016-06-27 19:57:44 +00:00
Matt Arsenault
f0f721a682 DAGCombiner: Don't narrow volatile vector loads + extract
llvm-svn: 273909
2016-06-27 19:31:04 +00:00
Jan Vesely
3bc1af2be4 AMDGPU/R600: Fix GlobalValue regressions.
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly.
Fixes: r272705

Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32.
Fixes: r273166

Differential Revision: http://reviews.llvm.org/D21633

llvm-svn: 273785
2016-06-25 18:24:16 +00:00
Konstantin Zhuravlyov
f2f3d14774 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335

llvm-svn: 273769
2016-06-25 03:11:28 +00:00
Tom Stellard
b164a9843b AMDGPU/SI: Make sure not to fold offsets into local address space globals
Summary:
Offset folding only works if you are emitting relocations, and we don't
emit relocations for local address space globals.

Reviewers: arsenm, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21647

llvm-svn: 273765
2016-06-25 01:59:16 +00:00
Matthias Braun
6ad3d05b68 MachineScheduler: Fully compare top/bottom candidates
In bidirectional scheduling this gives more stable results than just
comparing the "reason" fields of the top/bottom node because the reason
field may be higher depending on what other nodes are in the queue.

Differential Revision: http://reviews.llvm.org/D19401

llvm-svn: 273755
2016-06-25 00:23:00 +00:00
Matthias Braun
1e374a7aa6 AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.

Differential Revision: http://reviews.llvm.org/D21540

llvm-svn: 273751
2016-06-24 23:52:11 +00:00
Matt Arsenault
86de486d31 AMDGPU: Add stub custom CodeGenPrepare pass
This will do various things including ones
CodeGenPrepare does, but with knowledge of uniform
values.

llvm-svn: 273657
2016-06-24 07:07:55 +00:00
Matt Arsenault
0534f4aa79 AMDGPU: Un-xfail and add tests
Un XFAIL a few tests plus a few more I had lying around
in my tree, which seem to all work now but I don't see tests
that quite test the same things.

llvm-svn: 273655
2016-06-24 06:58:01 +00:00
Matt Arsenault
c581611e11 AMDGPU: Remove disable-irstructurizer subtarget feature
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.

llvm-svn: 273653
2016-06-24 06:30:22 +00:00
Diana Picus
e440f99913 [AMDGPU] Remove exit-on-error in test (PR27761)
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.

We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.

Fixes PR27761.

Differential Revision: http://reviews.llvm.org/D20852

llvm-svn: 273550
2016-06-23 09:19:16 +00:00
Matt Arsenault
3cb4ddeb4e AMDGPU: Fix liveness when expanding m0 loop
llvm-svn: 273514
2016-06-22 23:40:57 +00:00
Changpeng Fang
47efe1f6db AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32
Reviewers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D21533

llvm-svn: 273496
2016-06-22 21:33:49 +00:00
Matt Arsenault
9babdf4265 AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467
2016-06-22 20:15:28 +00:00
Wei Ding
0526e7f8d9 AMDGPU: Add convergent flag to INLINEASM instruction.
Differential Revision: http://reviews.llvm.org/D21214

llvm-svn: 273455
2016-06-22 18:51:08 +00:00
Jan Vesely
fea814d531 AMDGPU: Add implicitarg.ptr intrinsic.
Points to the start of implicit arguments (appended after explicit arguments)

Differential Revision: http://reviews.llvm.org/D20297

llvm-svn: 273317
2016-06-21 20:46:20 +00:00
Matt Arsenault
2209625387 AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.

llvm-svn: 273182
2016-06-20 18:34:00 +00:00
Matt Arsenault
b6d8c37e1a AMDGPU: Fold more custom nodes to undef
This will help sneak undefs past GVN into the DAG for
some tests.

Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.

llvm-svn: 273181
2016-06-20 18:33:56 +00:00
Matt Arsenault
ff98241f37 Generalize DiagnosticInfoStackSize to support other limits
Backends may want to report errors on resources other than
stack size.

llvm-svn: 273177
2016-06-20 18:13:04 +00:00
Matt Arsenault
a9720c67f1 AMDGPU: Use correct method for determining instruction size
llvm-svn: 273172
2016-06-20 17:51:32 +00:00
Tom Stellard
5350894265 AMDGPU: Add support for R_AMDGPU_REL32 relocations
Reviewers: arsenm, kzhuravl, rafael

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21401

llvm-svn: 273168
2016-06-20 17:33:43 +00:00
Tom Stellard
1c89eb7db0 AMDGPU: Emit R_AMDGPU_ABS32_{HI,LO} for scratch buffer relocations
Reviewers: arsenm, rafael, kzhuravl

Subscribers: rafael, arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21400

llvm-svn: 273166
2016-06-20 16:59:44 +00:00
Matt Arsenault
e935f05a94 AMDGPU: Fix kernel argument alignment impacting stack size
Don't use AllocateStack because kernel arguments have nothing
to do with the stack. The ensureMaxAlignment call was still
changing the stack alignment.

llvm-svn: 273080
2016-06-18 05:15:53 +00:00
Matt Arsenault
0bb294b224 AMDGPU: Temporarily select trap to s_endpgm
This should select to s_trap, but that requires
additonal work to setup and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.

Emit a warning that this will only terminate the wave and
not really trap.

llvm-svn: 273062
2016-06-17 22:27:03 +00:00
Matt Arsenault
8885910f8e AMDGPU: Remove llvm.SI.tid intrinsic
Mesa doesn't emit this for llvm >= 3.8 anymore.

llvm-svn: 273050
2016-06-17 21:18:41 +00:00
Matt Arsenault
191763026c AMDGPU: Disable scheduling in some slow tests
Disabling the pre-RA scheduler on large-work-group-registers
causes it to be ~50% slower.

llvm-svn: 272860
2016-06-16 00:56:47 +00:00
Nicolai Haehnle
a609259832 AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.

Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.

An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21326

llvm-svn: 272761
2016-06-15 07:13:05 +00:00
Matt Arsenault
f42c69206d AMDGPU: Run pointer optimization passes
llvm-svn: 272736
2016-06-15 00:11:01 +00:00
Marek Olsak
e93f6d6923 AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1

This makes one particular compute shader 8x faster.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21136

llvm-svn: 272556
2016-06-13 16:05:57 +00:00
Tom Stellard
f3af841462 AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21153

llvm-svn: 272417
2016-06-10 19:26:38 +00:00
Matt Arsenault
58ddad5bd6 AMDGPU: v_cndmask_b32 does not def vcc
Fixes verifier errors after SIShrinkInstructions.

llvm-svn: 272351
2016-06-10 00:18:41 +00:00
Tom Stellard
26a2ab7477 AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.

Reviewers: scchan, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19994

llvm-svn: 272349
2016-06-10 00:01:04 +00:00
Matt Arsenault
7757c59e48 AMDGPU: Fix flat atomics
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.

llvm-svn: 272345
2016-06-09 23:42:54 +00:00
Matt Arsenault
887018179a AMDGPU: Fix i64 global cmpxchg
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.

There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.

llvm-svn: 272344
2016-06-09 23:42:48 +00:00
Matt Arsenault
25363d37fc AMDGPU: Fix missing and broken check lines in atomic tests
llvm-svn: 272343
2016-06-09 23:42:44 +00:00
Wei Ding
ed0f97fad2 AMDGPU/SI: Fix 32-bit fdiv lowering
We were using the fast fdiv lowering for all division, implementation of
IEEE754 fdiv is added.

http://reviews.llvm.org/D20557

llvm-svn: 272292
2016-06-09 19:17:15 +00:00
Jan Vesely
2da0cba5fb SelectionDAG: Implement expansion of {S,U}MIN/MAX in integer legalization
Fixes {u,}long_{min,max,clamp} opencl piglit regressions on EG.

Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D17898

llvm-svn: 272272
2016-06-09 16:04:00 +00:00
Nicolai Haehnle
c00e03b8f5 AMDGPU: Add amdgpu-ps-wqm-outputs function attributes
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.

The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.

Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20839

llvm-svn: 272063
2016-06-07 21:37:17 +00:00
Eric Christopher
538d09d0dd Revert "Differential Revision: http://reviews.llvm.org/D20557"
Author: Wei Ding <wei.ding2@amd.com>
Date:   Tue Jun 7 19:04:44 2016 +0000

    Differential Revision: http://reviews.llvm.org/D20557

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
    91177308-0d34-0410-b5e6-96231b3b80d8

as it was breaking the bots.

This reverts commit r272044.

llvm-svn: 272056
2016-06-07 20:27:12 +00:00