
800 Commits

Author SHA1 Message Date
Joe Nash
e29228efae [AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418)
Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on
GFX1150Plus

The w32 and w64 _e64_dpp assembler only real instructions were unused,
and erroneously constructed in a way that bugged parsing of the new
instructions. They are removed.

This patch is a follow up to PR
https://github.com/llvm/llvm-project/pull/87382
2024-04-03 14:51:27 -04:00
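A minimal sketch of the operand rule described in this entry and in #87382, under a simplified model. On the relaxed targets, src1 of a VOP3 (e64) DPP instruction may be a VGPR, an SGPR, or an inline immediate. Other targets are treated as VGPR-only. Several details are assumptions, not taken from the log: the target list, the choice to keep 32-bit literals illegal, and the names `OperandKind`, `RELAXED_TARGETS`, and `src1_allowed`. The names are hypothetical and are not LLVM's AsmParser API.

```python
from enum import Enum, auto

class OperandKind(Enum):
    VGPR = auto()
    SGPR = auto()
    INLINE_IMM = auto()  # inline constant such as 1.0 or -4
    LITERAL = auto()     # 32-bit literal that needs an extra dword

# Illustrative (not exhaustive) set of targets where the relaxed rule applies.
RELAXED_TARGETS = {"gfx1150", "gfx1151", "gfx1200", "gfx1201"}

def src1_allowed(target: str, kind: OperandKind) -> bool:
    """Simplified model: may `kind` be src1 of a VOP3 (e64) DPP instruction?"""
    if kind is OperandKind.VGPR:
        return True
    if target in RELAXED_TARGETS:
        # SGPRs and inline immediates become legal; literals are assumed illegal.
        return kind in (OperandKind.SGPR, OperandKind.INLINE_IMM)
    # Other targets (e.g. gfx1100) are modeled as VGPR-only.
    return False
```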
Joe Nash
6a13bbf92f [AMDGPU][MC] Enables sgpr or imm src1 for float VOP3 DPP, but excluding VOPC. (#87382)

Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions
should allow sgpr and imm operands.
PR #67461 added support for this with int operands, but it was missing a
piece for float.
Changing VOPC e64_dpp will be in a different patch because there is a
bug preventing that change.
2024-04-03 11:34:12 -04:00
Janek van Oirschot
1103a2a337 Reland [AMDGPU] MCExpr-ify MC layer kernel descriptor (#86494)
Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.

Relands #80855 with fixes
2024-03-27 11:59:56 +00:00
Janek van Oirschot
797336b127 Revert "[AMDGPU] MCExpr-ify MC layer kernel descriptor" (#86151)
Reverts llvm/llvm-project#80855
2024-03-21 10:19:54 -07:00
Joe Nash
d1f182c895 [AMDGPU][MC][True16] Rename and combine VINTERP MC tests (#85949)
NFC.
gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the
assembler and disassembler tests to be sorted based on real16 or fake16
instead of gfxip. Note, both GFX11 and GFX12 currently only have fake16
(fake16 in encoding, but not by name) upstream, so that is why the test
files have a -fake16 suffix.

One test input is changed, and that is the disassembler test for
unsupported bits in the instruction. It is now an input that is valid on
both GFX11 and GFX12. This was necessary because the size of the opcode
field changed.
2024-03-21 10:42:39 -04:00
Janek van Oirschot
857161c367 [AMDGPU] MCExpr-ify MC layer kernel descriptor (#80855)
Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.
2024-03-21 13:57:10 +00:00
Janek van Oirschot
aeb4dd1444 Fix macro expansion for AMDHSA_BITS_SET (#85661)
Corrects the `AMDHSA_BITS_SET` macro.
2024-03-19 10:47:40 +00:00
Stanislav Mekhanoshin
0b0e52836d [AMDGPU] Fix GFX11 sendmsg codes (#85299)
The code MSG_RTN_GET_TBA_TO_PC was missing, and the next code is off by
1 as a result.
2024-03-15 09:46:58 -07:00
Janek van Oirschot
f7bebc1914 Reland [AMDGPU] Add AMDGPU specific variadic operation MCExprs (#84562)
Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'. 

Relands #82022 with fixes
2024-03-14 14:31:00 +00:00
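To illustrate what a variadic 'max' or 'or' expression node means, here is a minimal sketch. Each node takes any number of operands, some of which may be symbols whose values are not yet known. The node only folds to an integer once every operand resolves. `Variadic` and `evaluate` are hypothetical Python names chosen for this sketch. They are not LLVM's C++ MCExpr classes.

```python
from dataclasses import dataclass
from functools import reduce
from operator import or_
from typing import Dict, Optional, Sequence, Union

Expr = Union[int, str, "Variadic"]  # constant, symbol name, or variadic node

@dataclass
class Variadic:
    op: str                # "max" or "or"
    args: Sequence["Expr"]

def evaluate(e: Expr, symbols: Dict[str, int]) -> Optional[int]:
    """Fold an expression to an integer, or return None if any symbol is
    still unresolved (so evaluation can be retried later)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return symbols.get(e)
    values = [evaluate(a, symbols) for a in e.args]
    if any(v is None for v in values):
        return None
    if e.op == "max":
        return max(values)
    if e.op == "or":
        return reduce(or_, values, 0)
    raise ValueError(f"unknown op {e.op!r}")
```

For example, `max(3, a, or(1, b))` folds to 9 once `a = 2` and `b = 8` are known. It stays unresolved while `b` is missing.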
Jay Foad
9fa8660203 [AMDGPU] Test new GFX12 opcode name buffer_atomic_min_num_f32
The old name buffer_atomic_min_f32 is still tested as part of the alias
tests.
2024-03-13 13:11:45 +00:00
Jay Foad
36dece0013 [AMDGPU] Add missing GFX10 buffer format d16 hi instructions (#84809) 2024-03-12 08:20:08 +00:00
Jay Foad
212604698c [AMDGPU] Add missing tests for GFX10 (t)buffer format d16 instructions (#84789) 2024-03-11 18:25:49 +00:00
Shilei Tian
e963d0740e [AMDGPU] Replace isInlinableLiteral16 with specific version (#84402)
The current implementation of `isInlinableLiteral16` assumes that a 16-bit
inlinable literal is either an `i16` or an `fp16`. This is not always true
because of `bf16`. However, `fp16` and `bf16` cannot be told apart just by
looking at the value. This patch splits `isInlinableLiteral16` into three
versions, for `i16`, `fp16`, and `bf16` respectively, and calls the
corresponding version.
2024-03-08 14:49:52 -05:00
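The reason a single 16-bit check is ambiguous can be shown directly. The same bit pattern decodes to different values as fp16 and as bf16. For example, 0x3C00 is 1.0 in fp16 but 0.0078125 in bf16, and 0x3F80 is 1.0 in bf16 but 1.875 in fp16. The sketch below decodes both ways and tests against an assumed set of inline float constants (±0.5, ±1.0, ±2.0, ±4.0). Integer inline values and 1/(2*pi) are left out. The helper names are hypothetical and only mirror the idea of per-type checks.

```python
import struct

def as_fp16(bits: int) -> float:
    """Interpret a 16-bit pattern as IEEE half precision."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]

def as_bf16(bits: int) -> float:
    """Interpret a 16-bit pattern as bfloat16, i.e. the top half of an fp32."""
    return struct.unpack("<f", (bits << 16).to_bytes(4, "little"))[0]

# Assumed inline float constants (inv2pi and integer inline values omitted).
INLINE_FLOATS = {0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 4.0, -4.0}

def is_inlinable_fp16(bits: int) -> bool:
    return as_fp16(bits) in INLINE_FLOATS

def is_inlinable_bf16(bits: int) -> bool:
    return as_bf16(bits) in INLINE_FLOATS
```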
Florian Mayer
0083c3eb83 Revert "[AMDGPU] Add AMDGPU specific variadic operation MCExprs" (#84273)
Reverts llvm/llvm-project#82022

Fails on hwasan build bot:
https://lab.llvm.org/buildbot/#/builders/236/builds/9874/steps/10/logs/stdio
2024-03-06 19:37:49 -08:00
Janek van Oirschot
bec2d105c7 [AMDGPU] Add AMDGPU specific variadic operation MCExprs (#82022)
Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'.
2024-03-06 21:01:54 +00:00
Ivan Kosarev
a888f5e4d7 [AMDGPU][NFC] Update tests to use -triple= instead of -arch=. (#84153) 2024-03-06 12:44:19 +00:00
Jay Foad
20fe83bc85 [AMDGPU] Add new aliases ds_subrev_rtn_u32/u64 for ds_rsub_rtn_u32/u64 (#83408)
Following on from #83118, this adds aliases for the "rtn" forms of these
instructions. The fact that they were missing from SP3 was an oversight
which has been fixed now.
2024-02-29 12:02:06 +00:00
Ivan Kosarev
680c780a36 [AMDGPU][AsmParser] Support structured HWREG operands. (#82805)
Symbolic values are to be supported separately.
2024-02-28 14:44:34 +00:00
Jay Foad
5ec535b1bd [AMDGPU] Regenerate mnemonic alias checks (#83130)
Regenerate checks for the full output from the assembler, not just the
encoding bytes, to make it obvious that the alias has been mapped to a
different mnemonic.
2024-02-27 14:07:44 +00:00
Jay Foad
d273a1970e [AMDGPU] Shorten mnemonic alias tests (#83121)
Only test one example of each alias. Do not test error cases which are
already tested in the normal (non-alias) tests.
2024-02-27 11:53:23 +00:00
Jay Foad
ca0560d8c8 [AMDGPU] Add new aliases ds_subrev_u32/u64 for ds_rsub_u32/u64 (#83118)
Note that the instructions have not been renamed and that there are no
corresponding aliases for ds_rsub_rtn_u32/u64. This matches SP3
behavior.
2024-02-27 10:58:20 +00:00
Stanislav Mekhanoshin
3dfca24dda [AMDGPU] Fix encoding of VOP3P dpp on GFX11 and GFX12 (#82710)
The bug affects dpp forms of v_dot2_f32_f16. The encoding does not match
SP3 and does not set op_sel_hi bits properly.
2024-02-23 03:50:00 -08:00
Stanislav Mekhanoshin
98db8d0cb7 [AMDGPU] Fix v_dot2_f16_f16/v_dot2_bf16_bf16 operands (#82423)
src0 and src1 are packed f16/bf16. We print literals such as 0x40002000
for these operands, but the parser could not parse them.
2024-02-20 16:34:40 -08:00
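A packed literal such as 0x40002000 holds two 16-bit elements. The sketch below splits it both ways, assuming the first element is in the low half. As fp16 the halves are 0.0078125 (0x2000) and 2.0 (0x4000). As bf16 they are 2^-63 and 2.0. The function names are hypothetical.

```python
import struct

def unpack_f16x2(literal: int) -> tuple:
    """Split a 32-bit literal into (lo, hi) fp16 halves."""
    lo, hi = struct.unpack("<ee", literal.to_bytes(4, "little"))
    return lo, hi

def unpack_bf16x2(literal: int) -> tuple:
    """Split a 32-bit literal into (lo, hi) bf16 halves."""
    def bf16(bits: int) -> float:
        # A bf16 value is the upper 16 bits of an fp32 encoding.
        return struct.unpack("<f", (bits << 16).to_bytes(4, "little"))[0]
    return bf16(literal & 0xFFFF), bf16(literal >> 16)
```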
Shilei Tian
2ad43fa467 [AMDGPU] Fix operand types for V_DOT2_F32_BF16 (#82044) 2024-02-20 08:25:01 -05:00
Stanislav Mekhanoshin
030d07574f [AMDGPU] Fix bf16 inv2pi inline constant handling (#82283)
Inline constant 1/(2*pi) has the truncated value 0x3e22. According to
the spec it is not rounded. A bf16 value, in a nutshell, is an fp32 value
with the low 16 bits of the mantissa cleared. The value 0x3e22 converted
to fp32 is 0.158203125, and the next representable value, 0x3e23, is
0.1591796875. The fp32 value of 1/(2*pi) = 0.15915494 cannot be
represented in bf16. However, since bf16 values are essentially truncated
fp32 values, we can use 0.15915494 as an idiomatic representation of the
1/(2*pi) inline constant. This is also consistent with sp3 behaviour. The
patch fixes the problem that the value we print for the inv2pi inline
constant is not parsed as inv2pi by the asm parser and gets rounded.
2024-02-19 15:34:09 -08:00
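The arithmetic in this message can be checked with a short sketch. 1/(2*pi) as fp32 is 0x3E22F983. Truncating to bf16 keeps the top 16 bits, giving 0x3E22. Round-to-nearest-even instead gives 0x3E23, because the dropped half 0xF983 is above the halfway point. That is the rounding the parser was applying. The helper names are illustrative, and NaN handling is omitted.

```python
import math
import struct

def fp32_bits(x: float) -> int:
    """Encoding of x as an IEEE single (round-to-nearest from double)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bf16_to_float(bits: int) -> float:
    """Widen a bf16 pattern back to a float by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def bf16_truncate(x: float) -> int:
    """bf16 by dropping the low 16 bits of the fp32 encoding."""
    return fp32_bits(x) >> 16

def bf16_round_nearest_even(x: float) -> int:
    """bf16 by round-to-nearest-even (NaN handling omitted)."""
    bits = fp32_bits(x)
    return (bits + 0x7FFF + ((bits >> 16) & 1)) >> 16

inv2pi = 1.0 / (2.0 * math.pi)
```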
Stanislav Mekhanoshin
13e64958a0 [AMDGPU] Fix decoder for BF16 inline constants (#82276)
Fix #82039.
2024-02-19 13:45:23 -08:00
Ivan Kosarev
0ec524b120 [AMDGPU][MC][True16] Support V_RCP/SQRT/RSQ/LOG/EXP_F16. (#81131)

Also add missing v_ceil/floor_f16 tests.

Includes https://github.com/llvm/llvm-project/pull/80892.
2024-02-19 15:50:48 +00:00
Shilei Tian
46734aa1e5 [AMDGPU] Use bf16 instead of i16 for bfloat (#80908)
Currently we generally use `i16` to represent `bf16` in those tablegen
files. This patch is trying to use `bf16` directly.

Fix #79369.
2024-02-16 15:58:30 -05:00
Mirko Brkušanin
815e0485a4 [AMDGPU][MC] Fix printing vcc(_lo) twice for VOPC DPP instructions (#81158) 2024-02-12 19:01:58 +01:00
Konstantin Zhuravlyov
cf55e61dd9 AMDGPU: Don't allow s_barrier on gfx12 (#81317)
- s_barrier is not present on gfx12
2024-02-12 11:32:46 -05:00
Ivan Kosarev
7d19dc50de [AMDGPU][True16] Support VOP3 source DPP operands. (#80892) 2024-02-08 16:23:00 +00:00
Pierre van Houtryve
500846d2f5 [AMDGPU] Introduce Code Object V6 (#76954)
Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same
as V5 except a new "generic version" flag can be present in EFLAGS. This
is related to new generic targets that'll be added in a follow-up patch.
It's also likely V6 will have new changes (possibly new metadata
entries) added later.

Docs changes are part of the follow-up patch #76955
2024-02-05 08:19:53 +01:00
Shilei Tian
6a21e00e39 [AMDGPU][AsmParser] Allow v_writelane_b32 to use SGPR and M0 as source operands at the same time (#78827)
Currently the asm parser treats `v_writelane_b32 v1, s13, m0` as an
illegal instruction for pre-gfx11 because it uses the constant bus twice,
while the hardware allows only one use. However, based on the comment of
`AMDGPUInstructionSelector::selectWritelane`, it is allowed to have M0 as
the lane selector and an SGPR used as SRC0, because the lane selector
doesn't count as a use of the constant bus. In fact, codegen can already
generate this form, but the inconsistency was not exposed because the
constant bus limitation is only validated when parsing assembly, and we
didn't have a test case with both an SGPR and M0 used as source operands
for the instruction.
2024-01-30 15:39:31 -05:00
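A minimal sketch of the validation rule described in this entry: count scalar sources (SGPRs and M0) against a constant bus limit of one, as stated in the message, but skip the operand that acts as the lane selector. The register classifier is deliberately crude. Two further points are assumptions, not taken from the log: counting each distinct SGPR only once, and all of the helper names.

```python
from typing import List, Optional

CONSTANT_BUS_LIMIT = 1  # limit of one, as described in the message above

def is_scalar(reg: str) -> bool:
    """Crude classifier: SGPRs look like 's13', plus the M0 register."""
    return reg == "m0" or (reg.startswith("s") and reg[1:].isdigit())

def constant_bus_uses(sources: List[str], lane_select: Optional[int] = None) -> int:
    """Count distinct scalar sources, skipping the lane-select operand."""
    used = {reg for i, reg in enumerate(sources)
            if i != lane_select and is_scalar(reg)}
    return len(used)

def fits_constant_bus(sources: List[str], lane_select: Optional[int] = None) -> bool:
    return constant_bus_uses(sources, lane_select) <= CONSTANT_BUS_LIMIT
```

For `v_writelane_b32 v1, s13, m0`, the sources are `["s13", "m0"]` with M0 at index 1. Without the lane-select exemption that is two uses and is rejected. With the exemption it is one use and fits.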
Mirko Brkušanin
7fdf608cef [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Mariusz Sikora
cfddb59be2 [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414)

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Ivan Kosarev
5a458767dd [AMDGPU][True16] Support source DPP operands. (#79025) 2024-01-23 09:52:49 +00:00
Stanislav Mekhanoshin
1000cefc04 [AMDGPU] Remove s_set_inst_prefetch_distance support from GFX12 (#78786)
This instruction is not supported by GFX12.
2024-01-22 14:31:17 -08:00
Emma Pilkington
bc82cfb38d [AMDGPU] Add an asm directive to track code_object_version (#76267)
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.

This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
2024-01-21 11:54:47 -05:00
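The new precedence order can be summarized in a small sketch. A code object version (COV) that comes from the file itself, either the IR module flag or the `.amdhsa_code_object_version` directive, wins over `--amdhsa-code-object-version`. The command-line value is used only as a fallback, and the default is passed in rather than hard-coded. The COV-to-`e_ident[ABIVERSION]` table reflects my understanding of the AMDHSA ABI version numbering and should be treated as an assumption. The function name is hypothetical.

```python
from typing import Optional

# Assumed COV -> e_ident[EI_ABIVERSION] mapping for the AMDHSA OS ABI.
ABI_VERSION = {4: 2, 5: 3, 6: 4}

def effective_cov(file_cov: Optional[int], cl_cov: Optional[int], default: int) -> int:
    """file_cov: value from the IR module flag or the
    .amdhsa_code_object_version directive; cl_cov: value of the
    --amdhsa-code-object-version command-line flag."""
    if file_cov is not None:
        return file_cov  # IR flag / asm directive takes precedence
    if cl_cov is not None:
        return cl_cov    # command line is only a fallback
    return default
```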
Mariusz Sikora
2c78f3b860 [AMDGPU][GFX12] Add tests for flat_atomic_pk (#78683) 2024-01-19 12:08:17 +01:00
Piotr Sobczak
57f6a3f7ea [AMDGPU] Add global_load_tr for GFX12 (#77772)
Support new amdgcn_global_load_tr instructions for load with transpose.

* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
2024-01-18 15:14:42 +01:00
Mariusz Sikora
3e6589f21c [AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917)
- image_atomic_pk_add_f16
- image_atomic_pk_add_bf16
- ds_pk_add_bf16
- ds_pk_add_f16
- ds_pk_add_rtn_bf16
- ds_pk_add_rtn_f16
- flat_atomic_pk_add_f16
- flat_atomic_pk_add_bf16
- global_atomic_pk_add_f16
- global_atomic_pk_add_bf16
- buffer_atomic_pk_add_f16
- buffer_atomic_pk_add_bf16
2024-01-18 14:01:09 +01:00
Mariusz Sikora
28b7e498b6 AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892)
Encoding is VOP3P. Tagged as deep/machine learning instructions. i32
type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and
src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32
fneg(neg_lo[2]) and f32 fabs(neg_hi[2]).

---------

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
2024-01-18 14:00:27 +01:00
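To make "v4fp8 or v4bf8 packed in i32" concrete, the sketch below packs four 8-bit codes into one 32-bit source operand and unpacks them again. It assumes lane 0 occupies bits 7:0, which the log does not state. It treats the bytes as opaque codes and does not decode fp8/bf8 values. The function names are hypothetical.

```python
from typing import List

def pack_4x8(lanes: List[int]) -> int:
    """Pack four 8-bit fp8/bf8 codes into an i32, lane 0 in bits 7:0."""
    if len(lanes) != 4 or any(not 0 <= b <= 0xFF for b in lanes):
        raise ValueError("expected four 8-bit codes")
    word = 0
    for i, b in enumerate(lanes):
        word |= b << (8 * i)
    return word

def unpack_4x8(word: int) -> List[int]:
    """Inverse of pack_4x8: extract the four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]
```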
Ivan Kosarev
2a869ced61 [AMDGPU][True16] Support V_FLOOR_F16. (#78446) 2024-01-18 08:43:47 +00:00
Mariusz Sikora
c99da46fc1 [AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2024-01-17 19:23:42 +01:00
Jay Foad
e4c8c58517 [AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 (#77929) 2024-01-17 15:57:36 +00:00
Mirko Brkušanin
3867e6689e [AMDGPU] Add new GFX12 image atomic float instructions (#76946) 2024-01-11 17:28:04 +01:00
Jay Foad
c9c8f0c2fc [AMDGPU] Update tests for GFX12 errors and unsupported instructions (#77624) 2024-01-11 08:26:23 +00:00
Ivan Kosarev
084f1c2ee0 [AMDGPU][True16] Support V_CEIL_F16. (#73108)
As not all fake instructions have their real counterparts implemented
yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow
both fake and real True16 instructions in assembler and disassembler
tests in the -mattr=+real-true16 mode during the transition period.

Source DPP and destination VOPDstOperand_t16 operands are still not
supported and will be addressed separately.
2024-01-10 08:46:19 +00:00
Jay Foad
b59b8d4182 [AMDGPU] Add GFX12 S_WAIT_* instructions (#77336)
GFX12 has separate wait instructions per counter e.g. S_WAIT_LOADCNT.
S_WAITCNT still exists but is deprecated and codegen should stop using
it. S_WAITCNT_* (e.g. S_WAITCNT_VSCNT) are removed.

This patch adds/removes MC layer support for these instructions.
2024-01-09 09:05:48 +00:00
Mirko Brkušanin
7ca4473dd9 [AMDGPU] Add new cache flushing instructions for GFX12 (#76944)
Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>
2024-01-08 14:06:58 +00:00