Commit Graph

707 Commits

Author SHA1 Message Date
Mirko Brkušanin
7fdf608cef [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Mariusz Sikora
cfddb59be2 [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/bf8 instructions (#78414)

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
2024-01-24 12:21:15 +01:00
Saiyedul Islam
082f87c9d4 [AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Corresponding llvm-objdump AMDGPU lit tests are updated
in a follow-up PR.
2024-01-23 17:08:18 +05:30
Jay Foad
e21b0b083e [AMDGPU] Remove gws feature from GFX12 (#78711)
This was already done for LLVM. This patch just updates the Clang
builtin handling to match.
2024-01-19 15:45:53 +00:00
Jay Foad
ed12388082 [AMDGPU] Do not emit V_DOT2C_F32_F16_e32 on GFX12 (#78709)
That instruction is not supported on GFX12.
Added a testcase which previously crashed without this change.

Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>
2024-01-19 14:36:27 +00:00
Piotr Sobczak
57f6a3f7ea [AMDGPU] Add global_load_tr for GFX12 (#77772)
Support new amdgcn_global_load_tr instructions for load with transpose.

* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
2024-01-18 15:14:42 +01:00
Mariusz Sikora
3e6589f21c [AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917)
- image_atomic_pk_add_f16
- image_atomic_pk_add_bf16
- ds_pk_add_bf16
- ds_pk_add_f16
- ds_pk_add_rtn_bf16
- ds_pk_add_rtn_f16
- flat_atomic_pk_add_f16
- flat_atomic_pk_add_bf16
- global_atomic_pk_add_f16
- global_atomic_pk_add_bf16
- buffer_atomic_pk_add_f16
- buffer_atomic_pk_add_bf16
2024-01-18 14:01:09 +01:00
Mariusz Sikora
28b7e498b6 AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892)
Encoding is VOP3P. Tagged as deep/machine learning instructions. i32
type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and
src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32
fneg(neg_lo[2]) and f32 fabs(neg_hi[2]).

---------

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
2024-01-18 14:00:27 +01:00
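The operand layout described in 28b7e498b6 (four 8-bit floats packed in an i32 for src0 and src1, an f32 accumulator in src2) can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the hardware definition: the lane order (lane 0 in the low byte), the 1/4/3-bit field layout with exponent bias 7 for the fp8 case, and the helper names `decode_e4m3`, `unpack_v4` and `dot4_f32_fp8` are all hypothetical. The bf8 variant, NaN encodings, source modifiers and rounding are not modelled.

```python
def decode_e4m3(byte):
    """Decode one byte as 1 sign / 4 exponent / 3 mantissa bits.

    Assumption: exponent bias 7; special (NaN) encodings are not handled.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0:
        # Subnormal: no implicit leading one.
        return sign * (mant / 8.0) * 2.0 ** (1 - 7)
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)


def unpack_v4(word):
    """Split a 32-bit integer into four bytes (assumed: lane 0 is the low byte)."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


def dot4_f32_fp8(src0, src1, src2):
    """Sum of the four lane products plus the f32 accumulator src2."""
    lhs = [decode_e4m3(raw) for raw in unpack_v4(src0)]
    rhs = [decode_e4m3(raw) for raw in unpack_v4(src1)]
    return sum(x * y for x, y in zip(lhs, rhs)) + src2
```

For example, `dot4_f32_fp8(0x38383838, 0x40404040, 0.5)` multiplies four lanes of 1.0 by four lanes of 2.0 and adds the accumulator.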
Jay Foad
4c65787f1e [AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (#77926) 2024-01-18 10:14:01 +00:00
Mariusz Sikora
264fd9e13e [AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (#78439) 2024-01-18 08:46:53 +01:00
Valery Pykhtin
9791e54149 Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429)
Reverts llvm/llvm-project#71556

Fixes failures:
https://lab.llvm.org/buildbot/#/builders/188/builds/40541
https://lab.llvm.org/buildbot/#/builders/91/builds/21847
https://lab.llvm.org/buildbot/#/builders/98/builds/31671
https://lab.llvm.org/buildbot/#/builders/139/builds/57289
2024-01-17 14:12:07 +01:00
Valery Pykhtin
57b50ef017 [AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556)
Substitute with zero-extended to i64 ballot.i32 intrinsic.
2024-01-17 17:02:05 +07:00
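The substitution in 57b50ef017 relies on the fact that a wave32 wavefront has only 32 lanes, so a 64-bit ballot can never set its upper half. A minimal Python model of that reasoning follows; the function names and the "lane 0 is bit 0" convention are assumptions for illustration, not LLVM APIs.

```python
WAVE32_LANES = 32


def ballot_i32(predicates):
    """Pack up to 32 per-lane booleans into a 32-bit mask (assumed: lane 0 = bit 0)."""
    mask = 0
    for lane, active in enumerate(predicates[:WAVE32_LANES]):
        if active:
            mask |= 1 << lane
    return mask


def ballot_i64_wave32(predicates):
    """Model of ballot.i64 in wave32: the 32-bit ballot, zero-extended to 64 bits."""
    # Python integers are unbounded, so zero-extension is just the same value
    # with the guarantee that bits 32..63 are clear.
    return ballot_i32(predicates) & 0xFFFFFFFF
```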
Nikita Popov
158d72d728 [Clang] Set writable and dead_on_unwind attributes on sret arguments (#77116)
Set the writable and dead_on_unwind attributes for sret arguments. These
indicate that the argument points to writable memory (and it's legal to
introduce spurious writes to it on entry to the function) and that the
argument memory will not be used if the call unwinds.

This enables additional MemCpyOpt/DSE/LICM optimizations.
2024-01-11 09:46:54 +01:00
Yingwei Zheng
1228becf7d [FuncAttrs] Deduce noundef attributes for return values (#76553)
This patch deduces `noundef` attributes for return values.
IIUC, a function returns `noundef` values iff all of its return values
are guaranteed not to be `undef` or `poison`.
Definition of `noundef` from LangRef:
```
noundef
This attribute applies to parameters and return values. If the value representation contains any 
undefined or poison bits, the behavior is undefined. Note that this does not refer to padding 
introduced by the type’s storage representation.
```
Alive2: https://alive2.llvm.org/ce/z/g8Eis6

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u
|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.01%|+0.01%|-0.01%|+0.01%|+0.03%|-0.04%|+0.01%|

The motivation of this patch is to reduce the number of `freeze` insts
and enable more optimizations.
2023-12-31 20:44:48 +08:00
Nikita Popov
a3d2d34e84 [Clang] Use poison as base for vector literals
When constructing vectors from elements, use poison instead of
undef as the base value. These literals always initialize all
elements (padding the remainder with zero), so that the choice
of base value does not affect semantics.
2023-12-19 11:53:18 +01:00
Jessica Del
32f9983c06 [AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
2023-12-15 15:49:25 +01:00
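The 192-bit pointer described in 32f9983c06 can be pictured with a small Python sketch. The field sizes (128-bit descriptor, 32-bit offset, 32-bit index) and the `index * stride` addressing come from the commit message; the bit order used by `pack`, the choice to add the offset to `index * stride`, and all names below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StridedBufferPointer:
    descriptor: int  # 128-bit buffer descriptor
    offset: int      # 32-bit byte offset
    index: int       # 32-bit element index

    def pack(self):
        """Pack into one 192-bit integer.

        Assumed order: descriptor in the low 128 bits, then offset, then index.
        """
        assert 0 <= self.descriptor < 1 << 128
        assert 0 <= self.offset < 1 << 32
        assert 0 <= self.index < 1 << 32
        return self.descriptor | (self.offset << 128) | (self.index << 160)


def element_byte_offset(ptr, stride):
    """Byte position inside the buffer: index * stride, plus the offset (assumed)."""
    return ptr.index * stride + ptr.offset
```

With a 16-byte struct as the element type, `StridedBufferPointer(descriptor=0, offset=4, index=3)` refers to byte 4 of the fourth element.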
Mariusz Sikora
966416b9e8 [AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
Mariusz Sikora
7f55d7de1a [AMDGPU] GFX12: Add Split Workgroup Barrier (#74836)
Co-authored-by: Vang Thao <Vang.Thao@amd.com>
2023-12-13 15:01:13 +01:00
Romaric Jodin
d56e0d07cc clang/OpenCL: set sqrt fp accuracy on call to Z4sqrt (#66651)
This reverts the previous implementation to avoid adding inline
functions to the OpenCL headers.
That implementation was breaking the clspv flow (google/clspv#1231), while
https://reviews.llvm.org/D156743 mentioned that just decorating the call
node with `!fpmath` was enough.
This PR implements that idea.
The test has been updated with this implementation.
2023-12-01 16:34:44 +09:00
serge-sans-paille
afe8b93ffd [clang] Avoid memcopy for small structure with padding under -ftrivial-auto-var-init (#71677)
Recommit of 0d2860b795 with extra test
cases fixed.
2023-11-25 00:11:20 +01:00
Florian Hahn
419a4e41fc Revert "[clang] Avoid memcopy for small structure with padding under -ftrivial-auto-var-init (#71677)"
This reverts commit fe5c360a9a.
The commit causes the tests below to fail on many buildbots, e.g.
https://lab.llvm.org/buildbot/#/builders/245/builds/17047

  Clang :: CodeGen/aapcs-align.cpp
  Clang :: CodeGen/aapcs64-align.cpp
2023-11-23 20:18:55 +00:00
Jay Foad
cf1e0c0b07 [AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
serge-sans-paille
fe5c360a9a [clang] Avoid memcopy for small structure with padding under -ftrivial-auto-var-init (#71677)
Recommit of 0d2860b795 with extra test
cases fixed.
2023-11-23 17:37:03 +01:00
Jessica Del
b025864af8 [AMDGPU] - Add clang builtins for tied WMMA intrinsics (#70669)
Add clang builtins for the new tied wmma intrinsics.
These variations tie the destination accumulator matrix to the input
accumulator matrix.

See https://github.com/llvm/llvm-project/pull/69903 for context.
2023-11-13 13:23:26 +01:00
Rana Pratap Reddy
13ea1146a7 [AMDGPU] Lower __builtin_amdgcn_read_exec_hi to use amdgcn_ballot (#69567)
Currently __builtin_amdgcn_read_exec_hi lowers to llvm.read_register;
this patch lowers it to use amdgcn_ballot instead.
2023-10-26 10:26:11 +05:30
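One way to read the lowering in 13ea1146a7: a ballot over an always-true predicate yields the mask of active lanes, and the "hi" builtin is the upper 32 bits of that 64-bit mask. The Python sketch below models only this arithmetic; `read_exec_hi` here is a hypothetical helper, and the shift-and-truncate shape is an assumption about the lowering rather than a quote from the patch.

```python
def read_exec_hi(exec_mask):
    """Upper 32 bits of a 64-bit lane mask, as a ballot of 'true' would produce."""
    return (exec_mask >> 32) & 0xFFFFFFFF
```

In a wave32 configuration the mask has no bits above bit 31, so this model returns 0 there.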
Nikita Popov
3b25407d97 [IR] Mark zext/sext constant expressions as undesirable
Introduce isDesirableCastOp() which determines whether IR builder
and constant folding should produce constant expressions for a
given cast type. This mirrors what we do for binary operators.

Mark zext/sext as undesirable, which prevents most creations of such
constant expressions. This is still somewhat incomplete and there
are a few more places that can create zext/sext expressions.

This is part of the work for
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.

The reason for the odd result in the constantexpr-fneg.c test is
that initially the "a[]" global is created with an [0 x i32] type,
at which point the icmp expression cannot be folded. Later it is
replaced with an [1 x i32] global and the icmp gets folded away.
But at that point we no longer fold the zext.
2023-10-02 12:40:20 +02:00
Matt Arsenault
ddc3346a6b clang/AMDGPU: Fix accidental behavior change for __builtin_amdgcn_ldexph (#66340) 2023-09-14 18:15:44 +03:00
Matt Arsenault
15e0fe0b61 clang/OpenCL: Add inline implementations of sqrt in builtin header
We want the !fpmath metadata to be attached to the sqrt intrinsic to
make it to the backend lowering. Emit an available_externally
definition which uses the builtin, which emits the !fpmath.

Fixes #64264

https://reviews.llvm.org/D156743
2023-09-12 23:23:00 +03:00
Saiyedul Islam
466a8149b3 Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)
This reverts commit 0a8d17e79b.
2023-09-12 15:13:59 +05:30
Saiyedul Islam
0a8d17e79b [AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818
2023-09-12 13:53:31 +05:30
Matt Arsenault
6a08cf12d9 clang: Add __builtin_exp10* and use new llvm.exp10 intrinsic
https://reviews.llvm.org/D157911
2023-09-09 23:14:12 +03:00
Saiyedul Islam
f616c3eeb4 [OpenMP][DeviceRTL][AMDGPU] Support code object version 5
Update DeviceRTL and the AMDGPU plugin to support code
object version 5. Default is code object version 4.

CodeGen for __builtin_amdgpu_workgroup_size generates code
for cov4 as well as cov5 if -mcode-object-version=none
is specified. DeviceRTL compilation passes this argument
via Xclang option to generate abi-agnostic code.

Generated code for the above builtin uses a clang
control constant "llvm.amdgcn.abi.version" to branch on
the abi version, which is available during linking of
user's OpenMP code. Load of this constant gets eliminated
during linking.

AMDGPU plugin queries the ELF for code object version
and then prepares various implicitargs accordingly.

Differential Revision: https://reviews.llvm.org/D139730

Reviewed By: jhuber6, yaxunl
2023-08-29 06:35:44 -05:00
Yaxun (Sam) Liu
b8a9c50f22 [AMDGPU] Add target feature gws to clang
Reviewed by: Matt Arsenault

Differential Revision: https://reviews.llvm.org/D158367
2023-08-25 11:50:47 -04:00
Matt Arsenault
61c8af6792 AMDGPU: InstCombine amdgcn.sqrt.f16 to sqrt.f16
There's nothing special about f16 sqrt handling.

https://reviews.llvm.org/D158090
2023-08-23 20:30:40 -04:00
Changpeng Fang
d77c62053c [clang][AMDGPU]: Don't use byval for struct arguments in function ABI
Summary:
  Byval requires allocating additional stack space, and always requires an implicit copy to be inserted in codegen,
where it can be difficult to optimize. In this work, we use byref/IndirectAliased promotion method instead of
byval with the implicit copy semantics.

Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D155986
2023-08-11 16:37:42 -07:00
Matt Arsenault
9e3d9c9eae clang: Add __builtin_elementwise_sqrt
This will be used in the opencl builtin headers to provide direct
intrinsic access with proper !fpmath metadata.

https://reviews.llvm.org/D156737
2023-08-11 19:32:39 -04:00
Changpeng Fang
4608686111 [clang][test] Fix LIT test failures for the following commit
commit c1803d5366 (HEAD -> main, origin/main, origin/HEAD)
Author: Changpeng Fang <changpeng.fang@amd.com>
Date:   Wed Aug 9 17:49:14 2023 -0700

    [FunctionAttrs] Unconditionally perform argument attribute inference in the first function-attrs pass

Differential Revision:
  https://reviews.llvm.org/D156397
2023-08-09 18:23:18 -07:00
Yaxun (Sam) Liu
ac72531043 [Driver] Add -f[no-]offload-uniform-block
By default, clang assumes HIP kernels are launched with uniform block size,
which is the case for kernels launched through triple chevron or
hipLaunchKernelGGL. Clang adds uniform-work-group-size function attribute
to HIP kernels to allow the backend to do optimizations on that.

However, in some rare cases, HIP kernels can be launched
through hipExtModuleLaunchKernel where global work size is specified,
which may result in non-uniform block size.

To be able to support non-uniform block size for HIP kernels,
an option `-f[no-]offload-uniform-block` is added. This option
is generic for offloading languages. Its default value is on for
CUDA/HIP and off otherwise.

Make -cl-uniform-work-group-size an alias to -foffload-uniform-block.

Reviewed by: Siu Chi Chan, Matt Arsenault, Fangrui Song, Johannes Doerfert

Differential Revision: https://reviews.llvm.org/D155213

Fixes: SWDEV-406592
2023-07-27 16:36:02 -04:00
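To see why a launch that specifies the global work size (ac72531043) can produce a non-uniform block size, consider one dimension where the global size is not a multiple of the block size. The sketch below is a simplified model; it assumes the runtime covers the remainder with one smaller trailing block, and the helper names are hypothetical.

```python
def block_sizes(global_size, block_size):
    """Sizes of the blocks covering global_size work-items in one dimension."""
    full, rest = divmod(global_size, block_size)
    sizes = [block_size] * full
    if rest:
        sizes.append(rest)  # trailing partial block (assumed runtime behaviour)
    return sizes


def is_uniform(global_size, block_size):
    """True when every block has exactly block_size work-items."""
    return global_size % block_size == 0
```

A grid specified as a number of blocks always satisfies `is_uniform`, which is the case the default `-foffload-uniform-block` setting assumes.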
ranapratap55
970569b6cc [AMDGPU] __builtin_amdgcn_read_exec_* should be implemented with llvm.amdgcn.ballot
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156219
2023-07-26 16:21:31 +05:30
Nick Desaulniers
b54294e2c9 [clang][ConstantEmitter] have tryEmitPrivate[ForVarInit] try ConstExprEmitter fast-path first
As suggested by @efriedma in:
https://reviews.llvm.org/D76096#4370369

This should speed up evaluating whether an expression is constant or
not, but due to the complexity of these two different implementations,
we may start getting different answers for edge cases for which we do
not yet have test cases in-tree (or perhaps even performance regressions
for some cases). As such, contributors have carte blanche to revert if
necessary.

For additional historical context about ExprConstant vs CGExprConstant,
here are snippets from a private conversation on Discord:

  ndesaulniers:
  why do we have clang/lib/AST/ExprConstant.cpp and
  clang/lib/CodeGen/CGExprConstant.cpp? Does clang constant fold during
  ast walking/creation AND during LLVM codegen?
  efriedma:
  originally, clang needed to handle two things: integer constant
  expressions (the "5" in "int x[5];"), and constant global initializers
  (the "5" in "int x = 5;").  pre-C++11, the two could be handled mostly
  separately; so we had the code for integer constants in AST/, and the
  code for globals in CodeGen/.  C++11 constexpr sort of destroyed that
  separation, though. so now we do both kinds of constant evaluation on
  the AST, then CGExprConstant translates the result of that evaluation
  to LLVM IR.  but we kept around some bits of the old cgexprconstant to
  avoid performance/memory usage regressions on large arrays.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D151587
2023-07-24 13:50:45 -07:00
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Matt Arsenault
bac2a07540 clang: Attach !fpmath metadata to __builtin_sqrt based on language flags
OpenCL and HIP have -cl-fp32-correctly-rounded-divide-sqrt and
-fno-hip-correctly-rounded-divide-sqrt. The corresponding fpmath metadata
was only set on fdiv, and not sqrt. The backend is currently underutilizing
sqrt lowering options, and the responsibility is split between the libraries
and backend, so this metadata is needed.

CUDA/NVCC has -prec-div and -prec-sqrt but clang doesn't appear to be
aiming for compatibility with those. Don't know if OpenMP has a similar
control.
2023-07-14 18:46:18 -04:00
Kevin P. Neal
91f886a40d [FPEnv][TableGen] Add strictfp attribute to constrained intrinsics by default.
In D146869 @arsenm pointed out that the constrained intrinsics aren't
getting the strictfp attribute by default. They should be since they are
required to have it anyway.

TableGen did not know about this attribute until now. This patch adds
strictfp to TableGen, and it uses it on all of the constrained intrinsics.

Differential Revision: https://reviews.llvm.org/D154991
2023-07-12 09:55:53 -04:00
Matt Arsenault
42d4c85ca8 clang: Stop emitting "strictfp"
The attribute is a proper enum attribute, strictfp. We were getting
strictfp and "strictfp" set on every function with
-fexperimental-strict-floating-point.

https://reviews.llvm.org/D139629
2023-07-07 15:28:21 -04:00
Matt Arsenault
75b7901901 clang: Regenerate test checks 2023-07-07 15:28:21 -04:00
Matt Arsenault
b15bf305ca Reapply "clang: Use new frexp intrinsic for builtins and add f16 version"
This reverts commit 0c545a4412.

ARM libcall expansion was fixed in 160d7227e0
2023-06-30 09:07:23 -04:00
Hans Wennborg
0c545a4412 Revert "clang: Use new frexp intrinsic for builtins and add f16 version"
This caused asserts in some Android and Windows builds:

SelectionDAGNodes.h:1138: llvm::SDValue::SDValue(SDNode *, unsigned int):
Assertion `(!Node || !ResNo || ResNo < Node->getNumValues()) && "Invalid result number for the given node!"' failed.

See comment on 85bdea023f

Also revert "HIP: Use frexp builtins in math headers"
which seems to depend on this change.

This reverts commit 85bdea023f.
This reverts commit bf8e92c0e7.
2023-06-30 13:26:25 +02:00
Sameer Sahasrabuddhe
7a101798b7 Revert "[AMDGPU] Mark mbcnt as convergent"
This reverts commit 37114036aa.

The output of mbcnt does not depend on other active lanes, and hence it is not
convergent. The original change was made as a possible fix for

https://github.com/ROCm-Developer-Tools/HIP/issues/3172

But changing mbcnt does not fix that issue.

Reviewed By: ruiling, foad, yaxunl

Differential Revision: https://reviews.llvm.org/D153953
2023-06-30 13:10:44 +05:30
Matt Arsenault
85bdea023f clang: Use new frexp intrinsic for builtins and add f16 version 2023-06-28 14:50:17 -04:00
Arthur Eubanks
457dc72fdd Reland [InstCombine] Infer inbounds for more GEPs of dereferenceable pointers
Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments.

Fixed clang test broken in initial land.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D153815
2023-06-27 09:31:20 -07:00
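The idea behind 457dc72fdd can be reduced to a bounds check: if a pointer is known to be dereferenceable for some number of bytes, a GEP with a constant offset inside that range cannot leave the object. The Python sketch below is a simplified model, not the InstCombine code; the function name is hypothetical, and treating the one-past-the-end offset as acceptable is an assumption based on the usual meaning of inbounds.

```python
def can_infer_inbounds(dereferenceable_bytes, constant_offset):
    """True when a constant byte offset stays within [0, dereferenceable_bytes]."""
    return 0 <= constant_offset <= dereferenceable_bytes
```

Under this model the source of the dereferenceable size does not matter, which matches the commit's point that call results and arguments can qualify as well as allocas.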