374 Commits

Author SHA1 Message Date
Joe Nash
05d04a0180 [AMDGPU] NFC. Refactor GISel for cmp intrinsics
Combine the logic for fcmp and icmp intrinsics and use operand presence
instead.

Reviewed By: kosarev, foad

Differential Revision: https://reviews.llvm.org/D148716
2023-04-19 11:33:47 -04:00
Jay Foad
6b5067a81a [AMDGPU] Don't assert that image intrinsics are supported
Unsupported intrinsics should give a regular "cannot select" error.

Differential Revision: https://reviews.llvm.org/D148147
2023-04-16 19:54:55 +01:00
Chen Zheng
a3d5ec51ba [AMDGPU][Global-ISel] reuse extension related patterns in td file
However, the imported rules cannot be used for now because GlobalISel's
selectImpl() seems to have a bug or limitation that creates an illegal COPY
from VGPR to SGPR. For now, work around this by not auto-selecting these
patterns.

Fixes #61468

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D147780
2023-04-10 02:11:33 +00:00
Jessica Del
04317d4da7 [AMDGPU][GISel] Add inverse ballot intrinsic
The inverse ballot intrinsic takes in a boolean mask for all lanes and
returns the boolean for the current lane. See SPIR-V's
`subgroupInverseBallot()` in the [[ https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt | GL_KHR_shader_subgroup extension ]].
This allows decision making via branch and select instructions with a manually
manipulated mask.

Implemented in GlobalISel and SelectionDAG, since currently both are supported.
The SelectionDAG required pseudo instructions to use the custom inserter.

The boolean mask needs to be uniform for all lanes.
Therefore we expect SGPR input. In case the source is in a
VGPR, we insert one or more `v_readfirstlane` instructions.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D146287
2023-04-06 07:46:50 +02:00
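
The inverse ballot semantics described in the commit above can be pictured with a small scalar model. This is only a sketch: the function name `inverseBallotModel` and the use of a 64-bit mask are assumptions made for illustration, not the intrinsic's actual implementation.

```cpp
#include <cstdint>

// Hypothetical scalar model of an inverse ballot.
// `mask` holds one bit per lane and is assumed to be uniform across lanes;
// `lane` is the index of the current lane (assumed to be below 64).
inline bool inverseBallotModel(std::uint64_t mask, unsigned lane) {
  // The result for a lane is simply that lane's bit in the mask.
  return ((mask >> lane) & 1u) != 0;
}
```

With a mask of `0xA`, lanes 1 and 3 would see `true` and lanes 0 and 2 would see `false`.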
Jay Foad
c75e266d31 [AMDGPU] Remove two unused ComplexRendererFns
These were left over after https://reviews.llvm.org/D98663
2023-03-30 10:44:45 +01:00
Petar Avramovic
ded69779be Fix SGPR + VGPR + offset Scratch offset folding
Values in SGPR and VGPR registers are treated as unsigned by hardware.

When the value in a 32-bit SGPR or VGPR base can be negative, calculate the
offset using 32-bit add instructions; otherwise use
sgpr(unsigned) + vgpr(unsigned) + offset.

LoopStrengthReduce.cpp changes offsets to negative, and in some
iterations the value in the SGPR or VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144957
2023-03-09 10:53:41 +01:00
Petar Avramovic
3ae310d0ae Fix VGPR + offset Scratch offset folding
Values in VGPR registers are treated as unsigned by hardware.

When the value in a 32-bit VGPR base can be negative, calculate the offset
using a 32-bit add instruction; otherwise use vgpr base(unsigned) + offset.
Does not affect the case where the whole offset comes from a VGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative, and in some
iterations the value in the VGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144956
2023-03-09 10:52:44 +01:00
Petar Avramovic
5e56d59999 Fix SGPR + offset Scratch offset folding
Values in SGPR registers are treated as unsigned by hardware.

When the value in a 32-bit SGPR base can be negative, calculate the offset
using a 32-bit add instruction; otherwise use sgpr base(unsigned) + offset.
Does not affect the case where the whole offset comes from an SGPR register
(immediate offset is 0).

LoopStrengthReduce.cpp changes offsets to negative, and in some
iterations the value in the SGPR register could be negative.

Differential Revision: https://reviews.llvm.org/D144955
2023-03-09 10:52:44 +01:00
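
The reasoning shared by the scratch offset folding fixes above can be sketched with plain integer arithmetic. The function names and the 64-bit width of the modeled address are assumptions chosen for illustration; the real address computation is done by hardware and may differ in detail.

```cpp
#include <cstdint>

// Model of folding: the base register is treated as unsigned (zero-extended)
// and the immediate offset is added on top. A negative base goes wrong here.
inline std::uint64_t foldedUnsigned(std::int32_t base, std::uint32_t imm) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(base)) + imm;
}

// Model of the fallback: perform a wrapping 32-bit add first, then use the
// 32-bit result as the address.
inline std::uint64_t addedIn32Bits(std::int32_t base, std::uint32_t imm) {
  return static_cast<std::uint32_t>(static_cast<std::uint32_t>(base) + imm);
}
```

For a base of -16 and an immediate of 32, the 32-bit add gives 16, while the zero-extended sum gives `0x100000010`. For a non-negative base the two agree, which is why folding stays valid in that case.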
Justin Bogner
c083c89744 [AMDGPU] Move V_FMA_MIX pattern matching into tablegen. NFC
The matching for V_FMA_MIX was partially implemented with a C++
matcher (for fmas with 32 bit results and 16 bit inputs) and partially
in tablegen (for fmas with 16 bit results). Move the C++ matcher logic
into tablegen to make this more consistent and so we can remove the
duplication between SDAG and GISel.

Differential Revision: https://reviews.llvm.org/D144612
2023-02-23 10:23:34 -08:00
Jay Foad
dcb834843e [AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC.
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies
future changes to make some of it depend on the subtarget.

Differential Revision: https://reviews.llvm.org/D144650
2023-02-23 16:38:15 +00:00
Mirko Brkusanin
926746d22a [AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions
If more registers are needed for VAddr than the NSA format allows, then the
final register can act as a contiguous set of remaining addresses. Update the
legalizer to pack registers for this new format and allow instruction
selection to use NSA encoding when the number of addresses exceeds max size.
Also update SIShrinkInstructions to handle partial NSA.

Differential Revision: https://reviews.llvm.org/D144034
2023-02-23 13:33:34 +01:00
Piotr Sobczak
a3d7b3121c [AMDGPU][NFC] Add getMaxMUBUFImmOffset
Replace magic constant 4095 with the function getMaxMUBUFImmOffset().

Differential Revision: https://reviews.llvm.org/D144623
2023-02-23 11:29:59 +01:00
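
The refactoring named in the commit above follows a common pattern: a magic number is given a name by wrapping it in a function. The sketch below assumes a free `constexpr` function with no arguments; the signature of the real `getMaxMUBUFImmOffset()` is not shown in the commit message, and `fitsMUBUFImmOffset` is a hypothetical caller.

```cpp
// Sketch only: the real helper may take arguments or be a class member.
constexpr unsigned getMaxMUBUFImmOffset() { return 4095; }

// Hypothetical caller that previously compared against the literal 4095.
constexpr bool fitsMUBUFImmOffset(unsigned imm) {
  return imm <= getMaxMUBUFImmOffset();
}
```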
Joe Nash
80a8e6805a [AMDGPU] Don't set src mods on permlane16
v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src
modifiers on any input, but they can set op_sel on src0 or src1 to
represent fi or bc when desired. The ISel patterns were setting
the src_modifier bits to -1, effectively setting abs and neg as well,
whenever it was intended to set op_sel, due to an error in ISel. ISel
should now correctly only set the op_sel bits.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144519
2023-02-22 11:41:52 -05:00
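
The bug described in the permlane16 commit above comes down to a bitmask written as -1 when only one bit was intended. The sketch below uses an assumed, illustrative bit layout (`SrcModsModel`); the actual encoding of source modifiers in the backend may differ.

```cpp
#include <cstdint>

// Assumed bit layout for a source-modifier operand, for illustration only.
enum SrcModsModel : std::uint32_t {
  NEG = 1u << 0,
  ABS = 1u << 1,
  OP_SEL_0 = 1u << 2,
};

inline bool hasNeg(std::uint32_t mods) { return (mods & NEG) != 0; }
inline bool hasAbs(std::uint32_t mods) { return (mods & ABS) != 0; }
inline bool hasOpSel(std::uint32_t mods) { return (mods & OP_SEL_0) != 0; }
```

Writing all ones sets every modifier, including neg and abs, whereas writing only `OP_SEL_0` sets op_sel and nothing else.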
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Kazu Hirata
cbde2124f1 Use APInt::popcount instead of APInt::countPopulation (NFC)
This is for consistency with the C++20-style bit manipulation
functions in <bit>.
2023-02-19 11:29:12 -08:00
Mirko Brkusanin
43924cbd29 [AMDGPU][GlobalISel] Fix selection of image sample g16 instructions
Pre-GFX10 A16 modifier would imply G16. From GFX10 and onwards there are
separate instructions for 16bit gradients. This fixes the condition for
selecting G16 opcodes. Also stop adding G16 flag to instructions that do not
use gradients for GFX10 onwards.
2023-02-09 16:26:55 +01:00
Matt Arsenault
93ec3fa402 AMDGPU: Support atomicrmw uinc_wrap/udec_wrap
For now keep the existing intrinsics working.
2023-01-27 22:17:16 -04:00
Kazu Hirata
22cdc6a126 [llvm] Use llvm::bit_ceil instead of PowerOf2Ceil (NFC)
The arguments to PowerOf2Ceil in this patch are all known to be
nonzero, so we can safely use llvm::bit_ceil here.
2023-01-25 00:05:33 -08:00
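
The nonzero condition in the commit above matters because the two helpers are understood to differ only at zero. The sketch below uses stand-in models (`powerOf2CeilModel` and `bitCeilModel` are hypothetical names, not the LLVM functions) and assumes inputs no larger than 2^63.

```cpp
#include <cstdint>

// Model of the older helper: zero maps to zero.
inline std::uint64_t powerOf2CeilModel(std::uint64_t a) {
  if (a == 0)
    return 0;
  std::uint64_t p = 1;
  while (p < a) // assumes a <= 2^63, so p never overflows
    p <<= 1;
  return p;
}

// Model of bit_ceil: the smallest power of two >= a, so zero maps to one.
inline std::uint64_t bitCeilModel(std::uint64_t a) {
  std::uint64_t p = 1;
  while (p < a) // assumes a <= 2^63, so p never overflows
    p <<= 1;
  return p;
}
```

For every nonzero input in range the two models return the same value, so the replacement does not change behavior when the argument is known to be nonzero.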
Kazu Hirata
caa99a01f5 Use llvm::popcount instead of llvm::countPopulation (NFC) 2023-01-22 12:48:51 -08:00
Fangrui Song
21c4dc7997 std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).

This fixes clang.
2022-12-17 00:42:05 +00:00
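
The style change in the commit above can be shown on a small C++17 example. The helper names are hypothetical; the point is only that `operator*` and `operator->` are used after an explicit check, instead of calling `value()`.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Access through operator-> after checking that a value is present.
inline std::size_t nameLength(const std::optional<std::string> &name) {
  if (!name)
    return 0;
  return name->size(); // instead of name.value().size()
}

// Access through operator* after checking that a value is present.
inline int valueOrDefault(const std::optional<int> &v, int fallback) {
  return v ? *v : fallback; // instead of v.value()
}
```

Unlike `value()`, `operator*` and `operator->` do not check for an empty optional, so the explicit test before the access is what keeps the code safe.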
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
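
The C++17 feature that the commit above relies on is class template argument deduction. A minimal sketch, with hypothetical function names:

```cpp
#include <tuple>
#include <utility>

// std::pair(lo, hi) deduces std::pair<int, int>, like std::make_pair(lo, hi).
inline auto makeRange(int lo, int hi) { return std::pair(lo, hi); }

// std::tuple(a, b, c) deduces std::tuple<int, double, char>.
inline auto makeTriple(int a, double b, char c) { return std::tuple(a, b, c); }
```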
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Justin Bogner
916ae0a060 [AMDGPU] Handle nnan and fast on the call in fpmed3 patterns
We were only allowing these med3 patterns if the operands were known
to not be NaN, but we should also allow it if the calls to max/min
have the `nnan` or `fast` flags.

Differential Revision: https://reviews.llvm.org/D139506
2022-12-06 22:57:52 -08:00
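
Why NaN matters for a med3 pattern can be seen from a scalar model. The sketch assumes the pattern has the shape min(max(x, lo), hi) with lo <= hi, which the commit message above does not spell out; `med3Model` and `clampViaMinMax` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

// Median of three values; only meaningful when none of them is NaN.
inline float med3Model(float a, float b, float c) {
  return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// The min/max form that a med3 pattern would replace (assumes lo <= hi).
inline float clampViaMinMax(float x, float lo, float hi) {
  return std::fmin(std::fmax(x, lo), hi);
}
```

For non-NaN inputs with lo <= hi the two functions agree. That equivalence is not guaranteed once a NaN is involved, which is why the pattern needs either operands known not to be NaN or the `nnan` or `fast` flags.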
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Kazu Hirata
959c9cc7ac [AMDGPU] Use std::optional in AMDGPUInstructionSelector.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-25 22:23:09 -08:00
Pierre van Houtryve
9e7febb4f7 [AMDGPU][GISel] Select llvm.amdgcn.fcmp intrinsics
Adds FP CCs opcodes/selection logic, including src mods selection

Depends on D136591, D136448
Resolves #58326 (https://github.com/llvm/llvm-project/issues/58326)

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D136592
2022-11-22 14:18:58 +00:00
Pierre van Houtryve
a751676f98 [AMDGPU][GISel] Add llvm.amdgcn.icmp selection
Add missing logic to select i16 variants and enable GISel testing.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136448
2022-11-22 08:26:50 +00:00
Mirko Brkusanin
e58b116843 [AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11
Differential Revision: https://reviews.llvm.org/D133012
2022-11-18 18:19:27 +01:00
Petar Avramovic
0f3e72e86c AMDGPU/GlobalISel: Fix crash after mad/fma_mix fails selection
When selectVOP3PMadMixModsImpl fails, it can still create a new copy instr
via selectVOP3ModsImpl. When selectG_FMA_FMAD gives up, the new copy instr
will remain dead but will not be automatically removed.
InstructionSelect does not check if instructions created during selection
are dead.
Such a dead copy doesn't have a register class on its dst operand and causes a crash.
The fix is to build the copy when operands are being added to the selected instruction.

Differential Revision: https://reviews.llvm.org/D138044
2022-11-18 18:02:26 +01:00
Matt Arsenault
ae43420f39 AMDGPU/GlobalISel: Fix not selecting modifiers for f16 fma on gfx9
VOP3OpSel wasn't trying to match any modifiers. Just try to match the
basic case, like the DAG does.
2022-11-17 18:51:45 -08:00
Jay Foad
342642dc75 [AMDGPU][GISel] Smaller code for scalar 32 to 64-bit extensions
Differential Revision: https://reviews.llvm.org/D107639
2022-11-16 06:57:21 +00:00
Pierre van Houtryve
767999fca8 [AMDGPU][GlobalISel] Support mad/fma_mix selection
Adds support for selecting the following instructions using GlobalISel:
- v_mad_mix/v_fma_mix
- v_mad_mixhi/v_fma_mixhi
- v_mad_mixlo/v_fma_mixlo

To select those instructions properly, some additional changes were
needed which impacted other tests as well.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134354
2022-11-08 08:02:34 +00:00
Pierre van Houtryve
1809414fe1 [AMDGPU][GISel] Constrain selected operands in selectG_BUILD_VECTOR
Small bugfix. Currently harmless but a case in D134354 triggers it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136235
2022-10-21 06:50:16 +00:00
Jay Foad
ea09a426a9 [AMDGPU] Assume getDefIgnoringCopies will succeed. NFC.
getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on
valid MIR, so don't bother to check for failure.

Differential Revision: https://reviews.llvm.org/D136238
2022-10-19 11:10:00 +01:00
Pierre van Houtryve
c93104073c [AMDGPU] Always lower SHUFFLE_VECTOR
Make it illegal, remove InstructionSelector logic for it

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134967
2022-10-04 14:23:17 +00:00
Pierre van Houtryve
9a67a6b72a [AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR
Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal.
Also removes RegBankInfo's scalarization of small BUILD_VECTORs,
replacing it with InstructionSelector logic instead.

This allows for V2S16 BUILD_VECTOR instructions to survive
all the way to ISel so we can select FMA/MAD_MIX instructions
in D134354.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134433
2022-09-30 14:04:53 +00:00
Petar Avramovic
6db7921b65 AMDGPU: Use tablegen patterns for buffer global and flat atomic fadd
Remove manual selection for atomic fadd from global-isel.
Stop pre-isel translation to AtomicLoadFAdd/G_ATOMICRMW_FADD
which corresponds to llvm-ir's atomicrmw fadd instruction.

global and flat atomic fadd patterns changes:
Split rtn/no-rtn patterns
Add missing patterns or fix predicates
Remove atomicrmw patterns for v2f16 (atomic rmw doesn't support vectors).
Patterns now check addrspace of pointer; added patterns for flat intrinsic
with global addrspace pointer that selects into global atomic instruction.

buffer atomic fadd patterns changes:
Edit patterns to import into global-isel.
Remove gfx6/gfx7 _addr64 and _offset patterns.
Remove patterns that can't be reached (same pattern but different feature).

Differential Revision: https://reviews.llvm.org/D130579
2022-09-23 17:52:10 +02:00
Jay Foad
3822a01e0b [AMDGPU] Add GFX11 ds_bvh_stack_rtn_b32 instruction
Differential Revision: https://reviews.llvm.org/D133928
2022-09-15 16:46:14 +01:00
Ivan Kosarev
5db8d6fd2b [AMDGPU][CodeGen] Support (base | offset) SMEM loads.
Prevents generation of unnecessary s_or_b32 instructions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132552
2022-09-05 14:22:06 +01:00
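
The arithmetic behind treating (base | offset) as an address computation is that OR and ADD give the same result when the two operands share no set bits. The sketch below checks that condition on concrete values; how the compiler actually establishes it (for example through known-bits analysis) is an assumption, and `orActsAsAdd` is a hypothetical name.

```cpp
#include <cstdint>

// True when base and offset have no set bits in common, in which case
// (base | offset) == (base + offset).
inline bool orActsAsAdd(std::uint32_t base, std::uint32_t offset) {
  return (base & offset) == 0;
}
```

For example, `0x1000 | 0xF0` and `0x1000 + 0xF0` are both `0x10F0`, while `0x10 | 0x10` is `0x10` but `0x10 + 0x10` is `0x20`.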
Ivan Kosarev
f33645301e [AMDGPU][CodeGen] Support (soffset + offset) s_buffer_load's.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130263
2022-09-05 12:53:05 +01:00
Pierre van Houtryve
59cf9dd923 [AMDGPU][GISel] Enable Selection of ADD3 for G_PTR_ADD
Allows things like `(G_PTR_ADD (G_PTR_ADD a, b), c)` to be
simplified into a single ADD3 instruction instead of two adds.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D131254
2022-08-24 14:44:19 +00:00
Ivan Kosarev
75950be836 [AMDGPU][NFC] Validate G_MERGE_VALUES as we match zero-extended 32-bit scalars.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D130001
2022-07-21 14:49:57 +01:00
Stanislav Mekhanoshin
523a99c0eb [AMDGPU] Support for gfx940 fp8 smfmac
Differential Revision: https://reviews.llvm.org/D129908
2022-07-18 12:12:41 -07:00
Ivan Kosarev
432cbd7827 [AMDGPU][CodeGen] Support (register + immediate) SMRD offsets.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129381
2022-07-18 11:29:31 +01:00
Kazu Hirata
611ffcf4e4 [llvm] Use value instead of getValue (NFC) 2022-07-13 23:11:56 -07:00
Ivan Kosarev
8cd79bc12c [AMDGPU][GlobalISel] Support register offsets for SMRDs.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D128836
2022-07-05 13:41:06 +01:00
Piotr Sobczak
4874838a63 [AMDGPU] gfx11 WMMA instruction support
gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate)
instructions.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D128756
2022-06-30 11:13:45 -04:00
Jay Foad
3fbc945c3a [AMDGPU] llvm.amdgcn.exp.compr is not supported on GFX11
Differential Revision: https://reviews.llvm.org/D128259
2022-06-28 14:48:25 +01:00
Joe Nash
f1cfaa956d [AMDGPU] Use GFX11 S_PACK_HL instruction in more cases
Differential Revision: https://reviews.llvm.org/D128527
2022-06-28 14:35:19 +01:00
Kazu Hirata
a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
2022-06-25 21:42:52 -07:00