Commit Graph

988 Commits

Author SHA1 Message Date
Nico Weber
085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of s_cselect_b64.""
This reverts commit 859ebca744.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
2022-01-05 13:10:25 -05:00
David Salinas
859ebca744 Revert D109159 "[amdgpu] Enable selection of s_cselect_b64."
This reverts commit 640beb38e7.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Ron Lieberman
09b53296cf Revert "[AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range"
This reverts commit 9075009d1f.

 Failed amdgpu runtime buildbot # 3514
2021-12-22 11:39:28 -05:00
RamNalamothu
9075009d1f [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D114652
2021-12-22 20:51:12 +05:30
rkorsa
c680fb69d6 [AMDGPU] Fixes in ISelDAG path and GlobalISel path for 'bias' operand with A16 bit on
The LOD bias operand is of type 'half' when the A16-bit is ON' for MIMG instructions.
'bias' is only 16-bit but occupies 32-bits with upper 16-bits containing junk.
The patch fixes both the paths(ISelDAG and GlobalISel) for proper encoding of LOD bias operand.

Differential Revision: https://reviews.llvm.org/D111754
2021-12-17 16:11:51 +05:30
Neubauer, Sebastian
26924b57e8 [AMDGPU] Ignore special ABI registers for graphics
Fixed ABI arguments are compute specific and should not be added to
graphics shaders or functions, so do not try to add them.

Differential Revision: https://reviews.llvm.org/D115344
2021-12-13 16:44:37 +01:00
Matt Arsenault
06b90175e7 AMDGPU: Remove fixed function ABI option 2021-12-10 19:41:19 -05:00
Michael Liao
53fc971a4b Fix -Wunused-variable warning. NFC. 2021-12-04 23:34:55 -05:00
Jay Foad
2774bad112 [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args
The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to
match how the hardware instruction works.

Don't change the API of the corresponding OpenCL builtins.

Differential Revision: https://reviews.llvm.org/D115032
2021-12-04 10:32:11 +00:00
Petar Avramovic
ab01f4d264 AMDGPU/GlobalISel: Do not fcanonicalize const splat padded with undef
Recognize constant splat padded with undef in isCanonicalized.
Fcanonicalize will be removed by RemoveFcanonicalize in post-legalizer
combiner. We will treat undef as value that will result in a splat
in clamp combine after regbankselect.

Differential Revision: https://reviews.llvm.org/D104408
2021-12-03 12:49:38 +01:00
Daniil Fukalov
ab05ab59a7 [CostModel][AMDGPU] Fix instructions costs estimation for vector types.
1. Fixed vector instructions costs estimations incosistency - removed different
   logic for "not simple types" since it biases costs for these types.
2. Fixed legalization penalty for vectors too big for the target: changed from
   overwrite default legalization cost value estimation to added penalty.
3. Fixed few typos in tests.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D114893
2021-12-03 03:08:08 +03:00
Mirko Brkusanin
8951136216 [AMDGPU][GlobalISel] Transform (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D97937
2021-11-29 16:27:21 +01:00
Mirko Brkusanin
881840fc26 [AMDGPU][GlobalISel] Transform (fadd (fmul x, y), z) -> (fma x, y, z)
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D93305
2021-11-29 16:27:21 +01:00
Kazu Hirata
387927bbaf [Target] Use range-based for loops (NFC) 2021-11-26 21:21:17 -08:00
Christudasan Devadasan
654c89d85a [AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300
2021-11-26 00:42:12 -05:00
Jay Foad
d7e03df719 [AMDGPU] Implement widening multiplies with v_mad_i64_i32/v_mad_u64_u32
Select SelectionDAG ops smul_lohi/umul_lohi to
v_mad_i64_i32/v_mad_u64_u32 respectively, with an addend of 0.
v_mul_lo, v_mul_hi and v_mad_i64/u64 are all quarter-rate instructions
so it is better to use one instruction than two.

Further improvements are possible to make better use of the addend
operand, but this is already a strict improvement over what we have
now.

Differential Revision: https://reviews.llvm.org/D113986
2021-11-24 11:25:02 +00:00
kpyzhov
c9690092c8 [AMDGPU] Small correction in SITargetLowering::performOrCombine().
Differential Revision: https://reviews.llvm.org/D113203
2021-11-10 21:07:27 -05:00
Thomas Symalla
76cbe62262 [AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D111637
2021-11-04 21:50:18 +01:00
Jay Foad
21a1d4cf71 [AMDGPU] Change numBitsSigned for simplicity and document it. NFC.
Change numBitsSigned to return the minimum size of a signed integer that
can hold the value. This is different by one from the previous result
but is more consistent with numBitsUnsigned. Update all callers. All
callers are now more consistent between the signed and unsigned cases,
and some callers get simpler, especially the ones that deal with
quantities like numBitsSigned(LHS) + numBitsSigned(RHS).

Differential Revision: https://reviews.llvm.org/D112813
2021-10-29 14:22:06 +01:00
Vang Thao
52b43d1549 [AMDGPU] Fix cvt_f32_ubyte combine with shl
Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112733
2021-10-28 21:43:06 -07:00
Neubauer, Sebastian
487f15603e [AMDGPU] Fix setcc combine for i128
The combine asserted if constants could not be represented as uint64_t.
Use APInts to fix this.

Differential Revision: https://reviews.llvm.org/D112416
2021-10-26 13:39:50 +02:00
Ruiling Song
52785989e9 AMDGPU: Broadcast scalar boolean to vector boolean explicitly
This is used to fix wrong code generation of s_add_co_select_user in
test/CodeGen/AMDGPU/expand-scalar-carry-out-select-user.ll

  s_addc_u32 s4, s6, 0
  s_cselect_b64 vcc, 1, 0    <-- vcc set as 0x1 if SCC==1
  v_mov_b32_e32 v1, s4
  s_cmp_gt_u32 s6, 31
  v_cndmask_b32_e32 v1, 0, v1, vcc

If the s_addc_u32 set SCC, then we will get value 0x1 in VCC.
The v_cndmask will do per thread selection with VCC as condition
register. As VCC only gets the first bit being set, only the first
thread/lane in destination register can get correct result if the
very first lane is active. In fact, we should broadcast the value to all
active lanes of the final register.

The idea here is doing this broadcast to vector boolean explicitly
instead of lowering it into a COPY from SCC which would be interpreted as
selecting between 0/1.

This is used to replace D109754.

Reviewed-by: foad, alex-t

Differential Revision: https://reviews.llvm.org/D109889
2021-09-30 10:15:01 +08:00
Matt Arsenault
c305513cc2 AMDGPU: Fix assert with indirect call with known required inputs
The attributor can determine that some indirect calls do not require
special inputs. The special inputs will still be present in the ABI,
so we need to allocate the registers and pass undefs.
2021-09-13 22:54:11 -04:00
Matt Arsenault
0197cd0bd4 AMDGPU: Optimize amdgpu-no-* attributes
This allows clobbering a few extra registers in the fixed ABI, and
avoids some workitem ID packing instructions.
2021-09-09 18:24:28 -04:00
Craig Topper
9af8f1b18e [SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode.
Soft deprecrate isNullValue/isAllOnesValue and update in tree
callers. This matches the changes to the APInt interface from
D109483.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D109535
2021-09-09 13:28:30 -07:00
Michael Liao
640beb38e7 [amdgpu] Enable selection of s_cselect_b64.
Differential Revision: https://reviews.llvm.org/D109159
2021-09-07 10:45:07 -04:00
Joe Nash
c96839265a [AMDGPU] Enable ds_min/ds_max on more subtargets
Adds patterns for f64 ds_min/ds_max. Shrinks HasLDSFPAtomics
scope to enable f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108994

Change-Id: Id890b677841ee588b20d42b1bb3f4cdbf6e9ba1a
2021-08-31 13:22:31 -04:00
Matt Arsenault
98d7aa435f AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr
We no longer use this intrinsic outside of the backend and no longer
support using it outside of kernels.
2021-08-26 20:30:03 -04:00
Anshil Gandhi
508b06699a [Remarks] [AMDGPU] Emit optimization remarks for atomics generating hardware instructions
Produce remarks when atomic instructions are expanded into hardware instructions
in SIISelLowering.cpp. Currently, these remarks are only emitted for atomic fadd
instructions.

Differential Revision: https://reviews.llvm.org/D108150
2021-08-19 20:51:19 -06:00
David Stuttard
ebdb0d09a4 AMDGPU: During img instruction ret value construction cater for non int values
Make sure return type is int type.

Differential Revision: https://reviews.llvm.org/D108131

Change-Id: Ic02f07d1234cd51b6ed78c3fecd2cb1d6acd5644
2021-08-17 09:08:24 +01:00
Carl Ritson
99c790dc21 [AMDGPU] Make BVH isel consistent with other MIMG opcodes
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to diassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108117
2021-08-17 10:42:22 +09:00
Arthur Eubanks
92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Amara Emerson
2b067e3335 Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG.
DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.
2021-08-06 12:57:53 -07:00
Jay Foad
2b63933115 [AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz
Differential Revision: https://reviews.llvm.org/D107566
2021-08-05 15:57:40 +01:00
Jay Foad
40202b13b2 [AMDGPU] Legalize operands of V_ADDC_U32_e32 and friends
These instructions have an implicit use of vcc which counts towards the
constant bus limit. Pre gfx10 this means that the explicit operands
cannot be sgprs. Use the custom inserter hook to call legalizeOperands
to enforce that restriction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51217

Differential Revision: https://reviews.llvm.org/D106868
2021-08-03 09:04:52 +01:00
Carl Ritson
675c942373 [AMDGPU] Disable NSA for BVH instructions when appropriate
Check maximum NSA size when selecting NSA or non-NSA BVH instructions.

Differential Revision: https://reviews.llvm.org/D103230
2021-08-02 20:09:26 +09:00
Carl Ritson
fbaa35e169 [AMDGPU] Add SelectionDAG support for insert_subvector on v4f64
Enable custom insert_subvector for larger vector types.
This is necessary now that SelectionDAG can attempt v3f64 insert
to v4f64, etc.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D105385
2021-07-27 10:11:34 +09:00
Jay Foad
9ac10658ae [AMDGPU] Fix MMO for raw/struct buffer access with non-constant offset
Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.

Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.

Differential Revision: https://reviews.llvm.org/D106284
2021-07-26 14:27:30 +01:00
Carl Ritson
7d4baf25aa [AMDGPU] Add maximum NSA size limit ISA feature
Add maximum NSA size limit as an ISA feature.
Use this to reduce NSA usage on GFX10.1 to avoid stability issues
with 4 and 5 dwords NSA instructions.
Maintain use of longer NSA instructions on GFX10.3.

Note: this also contains some minor fixes for GlobalISel which
did not work correctly with non-NSA form instructions on GFX10.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103348
2021-07-23 16:16:06 +09:00
Carl Ritson
6efb3220b4 [AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256 this saves VGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103800
2021-07-22 10:42:15 +09:00
Jay Foad
3ed29f960c [AMDGPU] NFC refactoring in isel for buffer access intrinsics
Rename getBufferOffsetForMMO to updateBufferMMO and pass in the MMO to
be updated, in preparation for the bug fix in D106284.

Call updateBufferMMO consistently for all buffer intrinsics, even the
ones that use setBufferOffsets to decompose a combined offset
expression.

Add a getIdxEn helper function.

Differential Revision: https://reviews.llvm.org/D106354
2021-07-21 11:12:49 +01:00
Jay Foad
96d8f2a1e0 [AMDGPU] Fix typo in comments idexen -> idxen 2021-07-19 13:39:30 +01:00
Carl Ritson
98f48723f2 [AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc.
Link 224-bit types with v7i32/v7f32.
Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D104622
2021-06-24 12:41:22 +09:00
Matt Arsenault
a7786badb7 AMDGPU: Move zeroed FP high bits optimization to patterns 2021-06-22 12:47:56 -04:00
Carl Ritson
a10aeb3b32 [AMDGPU] Remove duplicate setOperationAction for v4i16/v4f16 (NFC) 2021-06-18 12:38:54 +09:00
Brendon Cahoon
294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Brendon Cahoon
211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Brendon Cahoon
ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Carl Ritson
f8816c7400 [AMDGPU] Add v5f32/VReg_160 support for MIMG instructions
Avoid having to round up to v8f32/VReg_256 when only 5 VGPRs are
required for a MIMG address operand.

Maintain _V8 instruction variants of pseudo instructions allowing
assembly prior to GFX10 to work as-is.  Currently the validator
can tell for GFX10 what the correct size is, so will disallow
oversize address registers.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103672
2021-06-08 11:11:40 +09:00
Nikita Popov
9914200393 [CodeGen] Add missing includes (NFC)
These currently rely on the IRBuilder.h include in TargetLowering.h.
Make them explicit.
2021-06-06 15:48:27 +02:00