clang-p2996

Author	SHA1	Message	Date
Joe Nash	e29228efae	[AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418) Allows src1 of VOP3-encoded VOPC to be an SGPR or inline immediate on GFX1150Plus. The w32 and w64 _e64_dpp assembler-only real instructions were unused, and were erroneously constructed in a way that broke parsing of the new instructions. They are removed. This patch is a follow-up to PR https://github.com/llvm/llvm-project/pull/87382	2024-04-03 14:51:27 -04:00
Joe Nash	6a13bbf92f	[AMDGPU][MC] Enables sgpr or imm src1 for float VOP3 DPP, but excludi… (#87382) …ng VOPC. Fixes support on GFX1150 and GFX12 where src1 of e64_dpp instructions should allow sgpr and imm operands. PR #67461 added support for this with int operands, but it was missing a piece for float. Changing VOPC e64_dpp will be in a different patch because there is a bug preventing that change.	2024-04-03 11:34:12 -04:00
Janek van Oirschot	1103a2a337	Reland [AMDGPU] MCExpr-ify MC layer kernel descriptor (#86494) Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr. Relands #80855 with fixes	2024-03-27 11:59:56 +00:00
Janek van Oirschot	797336b127	Revert "[AMDGPU] MCExpr-ify MC layer kernel descriptor" (#86151) Reverts llvm/llvm-project#80855	2024-03-21 10:19:54 -07:00
Joe Nash	d1f182c895	[AMDGPU][MC][True16] Rename and combine VINTERP MC tests (#85949) NFC. gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the assembler and disassembler tests to be sorted based on real16 or fake16 instead of gfxip. Note that both GFX11 and GFX12 currently have only fake16 upstream (fake16 in encoding, but not by name), which is why the test files have a -fake16 suffix. One test input is changed: the disassembler test for unsupported bits in the instruction. It is now an input that is valid on both GFX11 and GFX12. This was necessary because the size of the opcode field changed.	2024-03-21 10:42:39 -04:00
Janek van Oirschot	857161c367	[AMDGPU] MCExpr-ify MC layer kernel descriptor (#80855) Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.	2024-03-21 13:57:10 +00:00
Janek van Oirschot	aeb4dd1444	Fix macro expansion for AMDHSA_BITS_SET (#85661) Corrects the `AMDHSA_BITS_SET` macro.	2024-03-19 10:47:40 +00:00
Stanislav Mekhanoshin	0b0e52836d	[AMDGPU] Fix GFX11 sendmsg codes (#85299) The code MSG_RTN_GET_TBA_TO_PC was missing, and the next code was off by 1 as a result.	2024-03-15 09:46:58 -07:00
Janek van Oirschot	f7bebc1914	Reland [AMDGPU] Add AMDGPU specific variadic operation MCExprs (#84562) Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'. Relands #82022 with fixes	2024-03-14 14:31:00 +00:00
Jay Foad	9fa8660203	[AMDGPU] Test new GFX12 opcode name buffer_atomic_min_num_f32 The old name buffer_atomic_min_f32 is still tested as part of the alias tests.	2024-03-13 13:11:45 +00:00
Jay Foad	36dece0013	[AMDGPU] Add missing GFX10 buffer format d16 hi instructions (#84809)	2024-03-12 08:20:08 +00:00
Jay Foad	212604698c	[AMDGPU] Add missing tests for GFX10 (t)buffer format d16 instructions (#84789)	2024-03-11 18:25:49 +00:00
Shilei Tian	e963d0740e	[AMDGPU] Replace `isInlinableLiteral16` with specific version (#84402) The current implementation of `isInlinableLiteral16` assumes that a 16-bit inlinable literal is either an `i16` or an `fp16`. This is not always true because of `bf16`. However, we can't tell `fp16` and `bf16` apart by just looking at the value. This patch splits `isInlinableLiteral16` into three versions, for `i16`, `fp16`, and `bf16` respectively, and calls the corresponding version.	2024-03-08 14:49:52 -05:00
Florian Mayer	0083c3eb83	Revert "[AMDGPU] Add AMDGPU specific variadic operation MCExprs" (#84273) Reverts llvm/llvm-project#82022 Fails on hwasan build bot: https://lab.llvm.org/buildbot/#/builders/236/builds/9874/steps/10/logs/stdio	2024-03-06 19:37:49 -08:00
Janek van Oirschot	bec2d105c7	[AMDGPU] Add AMDGPU specific variadic operation MCExprs (#82022) Adds AMDGPU specific variadic MCExpr operations 'max' and 'or'.	2024-03-06 21:01:54 +00:00
Ivan Kosarev	a888f5e4d7	[AMDGPU][NFC] Update tests to use -triple= instead of -arch=. (#84153)	2024-03-06 12:44:19 +00:00
Jay Foad	20fe83bc85	[AMDGPU] Add new aliases ds_subrev_rtn_u32/u64 for ds_rsub_rtn_u32/u64 (#83408) Following on from #83118, this adds aliases for the "rtn" forms of these instructions. The fact that they were missing from SP3 was an oversight which has been fixed now.	2024-02-29 12:02:06 +00:00
Ivan Kosarev	680c780a36	[AMDGPU][AsmParser] Support structured HWREG operands. (#82805) Symbolic values are to be supported separately.	2024-02-28 14:44:34 +00:00
Jay Foad	5ec535b1bd	[AMDGPU] Regenerate mnemonic alias checks (#83130) Regenerate checks for the full output from the assembler, not just the encoding bytes, to make it obvious that the alias has been mapped to a different mnemonic.	2024-02-27 14:07:44 +00:00
Jay Foad	d273a1970e	[AMDGPU] Shorten mnemonic alias tests (#83121) Only test one example of each alias. Do not test error cases which are already tested in the normal (non-alias) tests.	2024-02-27 11:53:23 +00:00
Jay Foad	ca0560d8c8	[AMDGPU] Add new aliases ds_subrev_u32/u64 for ds_rsub_u32/u64 (#83118) Note that the instructions have not been renamed and that there are no corresponding aliases for ds_rsub_rtn_u32/u64. This matches SP3 behavior.	2024-02-27 10:58:20 +00:00
Stanislav Mekhanoshin	3dfca24dda	[AMDGPU] Fix encoding of VOP3P dpp on GFX11 and GFX12 (#82710) The bug affects dpp forms of v_dot2_f32_f16. The encoding does not match SP3 and does not set op_sel_hi bits properly.	2024-02-23 03:50:00 -08:00
Stanislav Mekhanoshin	98db8d0cb7	[AMDGPU] Fix v_dot2_f16_f16/v_dot2_bf16_bf16 operands (#82423) src0 and src1 are packed f16/bf16; we were printing literals like 0x40002000 that we could not parse.	2024-02-20 16:34:40 -08:00
Shilei Tian	2ad43fa467	[AMDGPU] Fix operand types for `V_DOT2_F32_BF16` (#82044)	2024-02-20 08:25:01 -05:00
Stanislav Mekhanoshin	030d07574f	[AMDGPU] Fix bf16 inv2pi inline constant handling (#82283) Inline constant 1/(2pi) has the truncated value 0x3e22. According to the spec it is not rounded. A bf16 value, in a nutshell, is an fp32 value with the low 16 bits of the mantissa cleared. The value 0x3e22 converted to fp32 is 0.158203125, and the next representable value, 0x3e23, is 0.1591796875. The fp32 value of 1/(2pi) = 0.15915494 cannot be represented in bf16. However, since bf16 values are essentially truncated fp32 values, we can use 0.15915494 as an idiomatic representation of the 1/(2*pi) inline constant. This is also consistent with sp3 behaviour. The patch fixes a problem where the value printed for the inv2pi inline constant was not parsed as inv2pi by the asm parser and was rounded instead.	2024-02-19 15:34:09 -08:00
Stanislav Mekhanoshin	13e64958a0	[AMDGPU] Fix decoder for BF16 inline constants (#82276) Fix #82039.	2024-02-19 13:45:23 -08:00
Ivan Kosarev	0ec524b120	[AMDGPU][MC][True16] Support V_RCP/SQRT/RSQ/LOG/EXP_F16. (#81131) Also add missing v_ceil/floor_f16 tests. Includes https://github.com/llvm/llvm-project/pull/80892.	2024-02-19 15:50:48 +00:00
Shilei Tian	46734aa1e5	[AMDGPU] Use `bf16` instead of `i16` for bfloat (#80908) Currently we generally use `i16` to represent `bf16` in those tablegen files. This patch is trying to use `bf16` directly. Fix #79369.	2024-02-16 15:58:30 -05:00
Mirko Brkušanin	815e0485a4	[AMDGPU][MC] Fix printing vcc(_lo) twice for VOPC DPP instructions (#81158)	2024-02-12 19:01:58 +01:00
Konstantin Zhuravlyov	cf55e61dd9	AMDGPU: Don't allow s_barrier on gfx12 (#81317) - s_barrier is not present on gfx12	2024-02-12 11:32:46 -05:00
Ivan Kosarev	7d19dc50de	[AMDGPU][True16] Support VOP3 source DPP operands. (#80892)	2024-02-08 16:23:00 +00:00
Pierre van Houtryve	500846d2f5	[AMDGPU] Introduce Code Object V6 (#76954) Introduce Code Object V6 in Clang, LLD, Flang and LLVM. This is the same as V5 except a new "generic version" flag can be present in EFLAGS. This is related to new generic targets that'll be added in a follow-up patch. It's also likely V6 will have new changes (possibly new metadata entries) added later. Doc changes are part of the follow-up patch #76955	2024-02-05 08:19:53 +01:00
Shilei Tian	6a21e00e39	[AMDGPU][AsmParser] Allow `v_writelane_b32` to use SGPR and M0 as source operands at the same time (#78827) Currently the asm parser treats `v_writelane_b32 v1, s13, m0` as an illegal instruction for pre-gfx11 because it uses two constant buses while the hardware allows only one. However, based on the comment on `AMDGPUInstructionSelector::selectWritelane`, M0 may be used as the lane selector together with an SGPR as SRC0, because the lane selector doesn't count as a use of the constant bus. In fact, codegen can already generate this form, but the inconsistency was not exposed because the constant bus limitation is only validated when parsing assembly, and there was no test case with both an SGPR and M0 as source operands for the instruction.	2024-01-30 15:39:31 -05:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Ivan Kosarev	5a458767dd	[AMDGPU][True16] Support source DPP operands. (#79025)	2024-01-23 09:52:49 +00:00
Stanislav Mekhanoshin	1000cefc04	[AMDGPU] Remove s_set_inst_prefetch_distance support from GFX12 (#78786) This instruction is not supported by GFX12.	2024-01-22 14:31:17 -08:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Mariusz Sikora	2c78f3b860	[AMDGPU][GFX12] Add tests for flat_atomic_pk (#78683)	2024-01-19 12:08:17 +01:00
Piotr Sobczak	57f6a3f7ea	[AMDGPU] Add global_load_tr for GFX12 (#77772) Support new amdgcn_global_load_tr instructions for load with transpose. * MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128 * Intrinsic int_amdgcn_global_load_tr * Clang builtins amdgcn_global_load_tr*	2024-01-18 15:14:42 +01:00
Mariusz Sikora	3e6589f21c	[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917) - image_atomic_pk_add_f16 - image_atomic_pk_add_bf16 - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 - buffer_atomic_pk_add_bf16	2024-01-18 14:01:09 +01:00
Mariusz Sikora	28b7e498b6	AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892) Encoding is VOP3P. Tagged as deep/machine learning instructions. i32 type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32 fneg(neg_lo[2]) and f32 fabs(neg_hi[2]). --------- Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>	2024-01-18 14:00:27 +01:00
Ivan Kosarev	2a869ced61	[AMDGPU][True16] Support V_FLOOR_F16. (#78446)	2024-01-18 08:43:47 +00:00
Mariusz Sikora	c99da46fc1	[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2024-01-17 19:23:42 +01:00
Jay Foad	e4c8c58517	[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on GFX12 (#77929)	2024-01-17 15:57:36 +00:00
Mirko Brkušanin	3867e6689e	[AMDGPU] Add new GFX12 image atomic float instructions (#76946)	2024-01-11 17:28:04 +01:00
Jay Foad	c9c8f0c2fc	[AMDGPU] Update tests for GFX12 errors and unsupported instructions (#77624)	2024-01-11 08:26:23 +00:00
Ivan Kosarev	084f1c2ee0	[AMDGPU][True16] Support V_CEIL_F16. (#73108) As not all fake instructions have their real counterparts implemented yet, we specify no AssemblerPredicate for UseFakeTrue16Insts to allow both fake and real True16 instructions in assembler and disassembler tests in the -mattr=+real-true16 mode during the transition period. Source DPP and destination VOPDstOperand_t16 operands are still not supported and will be addressed separately.	2024-01-10 08:46:19 +00:00
Jay Foad	b59b8d4182	[AMDGPU] Add GFX12 S_WAIT_* instructions (#77336) GFX12 has separate wait instructions per counter e.g. S_WAIT_LOADCNT. S_WAITCNT still exists but is deprecated and codegen should stop using it. S_WAITCNT_* (e.g. S_WAITCNT_VSCNT) are removed. This patch adds/removes MC layer support for these instructions.	2024-01-09 09:05:48 +00:00
Mirko Brkušanin	7ca4473dd9	[AMDGPU] Add new cache flushing instructions for GFX12 (#76944) Co-authored-by: Diana Picus <Diana-Magda.Picus@amd.com>	2024-01-08 14:06:58 +00:00
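
The bf16 arithmetic described in commit 030d07574f can be checked with a short Python sketch. It assumes, as that commit message states, that a bf16 value is the upper 16 bits of an IEEE-754 fp32 bit pattern. The helper names below are hypothetical and are not LLVM APIs. The sketch shows that truncating 1/(2pi) to bf16 gives 0x3e22, while round-to-nearest-even gives 0x3e23, which appears to be the rounding the commit describes the asm parser doing.

```python
import math
import struct

# Hypothetical helpers for illustration only; not part of LLVM.

def fp32_bits(x: float) -> int:
    """Return the IEEE-754 single-precision bit pattern of x (rounded to fp32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bf16_to_float(bits: int) -> float:
    """Widen a bf16 bit pattern to fp32 by appending 16 zero mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def float_to_bf16_truncate(x: float) -> int:
    """Keep only the upper 16 bits of the fp32 pattern (no rounding)."""
    return fp32_bits(x) >> 16

def float_to_bf16_rne(x: float) -> int:
    """Round-to-nearest-even conversion to bf16 (NaN handling omitted)."""
    bits = fp32_bits(x)
    lsb = (bits >> 16) & 1
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF

inv2pi = 1.0 / (2.0 * math.pi)
print(hex(fp32_bits(inv2pi)))               # 0x3e22f983
print(bf16_to_float(0x3E22))                # 0.158203125
print(bf16_to_float(0x3E23))                # 0.1591796875
print(hex(float_to_bf16_truncate(inv2pi)))  # 0x3e22
print(hex(float_to_bf16_rne(inv2pi)))       # 0x3e23
```

Because the printed decimal 0.15915494 maps to the same fp32 pattern as 1/(2pi), a parser that recognizes that pattern before rounding can map it back to the truncated inline constant 0x3e22.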


800 Commits