Commit Graph

52 Commits

Author SHA1 Message Date
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Konstantin Zhuravlyov
9d05727972 AMDGPU: Add basic gfx942 target
Differential Revision: https://reviews.llvm.org/D149983
2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov
1fc70210a6 AMDGPU: Add basic gfx941 target
Differential Revision: https://reviews.llvm.org/D149982
2023-05-10 11:51:06 -04:00
Anshil Gandhi
a955a31896 [AMDGPU] Replace target feature for global fadd32
Change target feature of __builtin_amdgcn_global_atomic_fadd_f32
to atomic-fadd-rtn-insts. Enable atomic-fadd-rtn-insts for gfx90a,
gfx940 and gfx1100 as they all support the return variant of
`global_atomic_add_f32`.

Fixes https://github.com/llvm/llvm-project/issues/61331.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D146840
2023-03-28 15:58:30 -06:00
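The commit above gates a builtin on a target feature: the builtin is accepted only when the selected GPU's feature set contains the feature the builtin requires. The following is a hypothetical Python model of that check, not clang code. `REQUIRED_FEATURE`, `GPU_FEATURES` and `builtin_available` are illustrative names, and the feature sets are deliberately partial, listing only the three targets this commit names.

```python
# Hypothetical model of a per-builtin target-feature check.
# Only the facts stated in commit a955a31896 are encoded here.
REQUIRED_FEATURE = {
    "__builtin_amdgcn_global_atomic_fadd_f32": "atomic-fadd-rtn-insts",
}

# Assumption: partial feature sets, listing only the feature this commit enables.
GPU_FEATURES = {
    "gfx90a": {"atomic-fadd-rtn-insts"},
    "gfx940": {"atomic-fadd-rtn-insts"},
    "gfx1100": {"atomic-fadd-rtn-insts"},
}


def builtin_available(builtin: str, gpu: str) -> bool:
    """Return True if `gpu` has the feature that `builtin` requires."""
    required = REQUIRED_FEATURE.get(builtin)
    if required is None:
        return True  # no feature requirement recorded for this builtin
    # GPUs missing from the table are treated as lacking every feature.
    return required in GPU_FEATURES.get(gpu, set())
```

In this model, `builtin_available("__builtin_amdgcn_global_atomic_fadd_f32", "gfx90a")` is `True`, while a GPU absent from the table, such as `gfx908` here, is rejected.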
Mariusz Sikora
ea064ee2a3 [AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions
Introducing Subtarget Features for instructions:
- ds_pk_add_bf16
- ds_pk_add_f16
- ds_pk_add_rtn_bf16
- ds_pk_add_rtn_f16
- flat_atomic_pk_add_f16
- flat_atomic_pk_add_bf16
- global_atomic_pk_add_f16
- global_atomic_pk_add_bf16
- buffer_atomic_pk_add_f16

Differential Revision: https://reviews.llvm.org/D146701
2023-03-24 13:10:40 +01:00
Stanislav Mekhanoshin
df0488369d [AMDGPU] Split dot7 feature
Differential Revision: https://reviews.llvm.org/D142507
2023-01-26 10:34:36 -08:00
Stanislav Mekhanoshin
870b92977e [AMDGPU] Split dot8 feature
Differential Revision: https://reviews.llvm.org/D142407
2023-01-24 11:16:07 -08:00
Stanislav Mekhanoshin
4ab2246d48 [AMDGPU] Remove dot1 and dot6 features from clang for gfx11
These are unsupported.

Differential Revision: https://reviews.llvm.org/D142493
2023-01-24 10:52:42 -08:00
Matt Arsenault
81849497b4 clang/AMDGPU: Remove flat-address-space from feature map
This was only used for checking if is_shared/is_private were legal,
which we're not bothering to do anymore.

This is apparently visible to more than the target attribute (which
seems to silently ignore unrecognized features), so this has the
potential to break something (e.g. see the OpenMP test change).
2023-01-05 16:35:04 -05:00
Matt Arsenault
f4bcd7f598 AMDGPU/clang: Add builtins for llvm.amdgcn.ballot
Use explicit _w32/_w64 suffixes for the wave size to be consistent
with the other existing wave-dependent intrinsics. Also start
diagnosing trying to use both wave32 and wave64.

I would have preferred to avoid the +wavefrontsize64 spam on targets
where that's the only option, but avoiding this seems to be more work
than I expected.
2022-12-29 17:58:55 -05:00
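A ballot gives each lane one bit, set when that lane's predicate is true, so the wave size fixes the width of the result (32 or 64 bits). That width is what the `_w32`/`_w64` suffixes encode. The CPU-side Python sketch below illustrates this under stated assumptions: `ballot_w32`, `ballot_w64` and `check_wave_features` are hypothetical stand-ins, inactive lanes are simply modeled as `False`, and the error text is made up rather than clang's actual diagnostic.

```python
# Hypothetical CPU-side model of the wave-size-specific ballot builtins.
# Lane predicates are a list of bools, one per lane; inactive lanes are False.

def _ballot(preds, wave_size):
    if len(preds) != wave_size:
        raise ValueError(f"expected {wave_size} lane predicates, got {len(preds)}")
    mask = 0
    for lane, pred in enumerate(preds):
        if pred:
            mask |= 1 << lane  # bit i is set when lane i's predicate is true
    return mask


def ballot_w32(preds):
    return _ballot(preds, 32)  # result fits in 32 bits


def ballot_w64(preds):
    return _ballot(preds, 64)  # result fits in 64 bits


def check_wave_features(features):
    # Models the new diagnostic: requesting both wave sizes is an error.
    if "+wavefrontsize32" in features and "+wavefrontsize64" in features:
        raise ValueError("conflicting features: +wavefrontsize32 and +wavefrontsize64")
```

For example, `ballot_w32([i % 2 == 0 for i in range(32)])` sets every even bit and yields `0x55555555`.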
Stanislav Mekhanoshin
9fa5a6b7e8 [AMDGPU] Support for gfx940 fp8 conversions
Differential Revision: https://reviews.llvm.org/D129902
2022-07-18 11:48:43 -07:00
Joe Nash
8bdfc73f63 [AMDGPU][clang] Definition of gfx11 subtarget
Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537
2022-04-29 13:55:56 -04:00
Aakanksha
840695814a [AMDGPU] Add gfx1036 target
Differential Revision: https://reviews.llvm.org/D120846
2022-03-02 23:26:38 +00:00
Stanislav Mekhanoshin
2e2e64df4a [AMDGPU] Add gfx940 target
This is the target definition only.

Differential Revision: https://reviews.llvm.org/D120688
2022-03-02 13:54:48 -08:00
Aakanksha Patil
3453f3dd46 [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Brendon Cahoon
294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Brendon Cahoon
211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Brendon Cahoon
ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Aakanksha Patil
464e4dc50f [AMDGPU] Add gfx1034 target
Differential Revision: https://reviews.llvm.org/D102306
2021-05-13 14:25:18 -04:00
Jay Foad
967b64beb4 [AMDGPU] Split dot2-insts feature
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.

Differential Revision: https://reviews.llvm.org/D98717
2021-03-17 09:42:21 +00:00
Jay Foad
99682bc039 Revert "Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030""
This reverts commit e58d68fcd0.

This reinstates commit fc28f600e5
with a fix to initialize HasShaderCyclesRegister. See
https://reviews.llvm.org/D97928.
2021-03-06 09:00:01 +00:00
Mitch Phillips
e58d68fcd0 Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030"
Broke the ASan/MSan buildbots. See more comments in the original patch,
https://reviews.llvm.org/D97928.

Build failure at http://lab.llvm.org:8011/#/builders/5/builds/5327

This reverts commit fc28f600e5.
2021-03-05 18:24:59 -08:00
Jay Foad
fc28f600e5 [AMDGPU] Restore the s_memtime instruction in gfx1030
gfx1030 added a new way to implement readcyclecounter using the
SHADER_CYCLES hardware register, but the s_memtime instruction still
exists, so the MC layer should still accept it and the
llvm.amdgcn.s.memtime intrinsic should still work.

Differential Revision: https://reviews.llvm.org/D97928
2021-03-05 20:19:11 +00:00
Stanislav Mekhanoshin
a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Stanislav Mekhanoshin
8e661d3d9c [AMDGPU] Set s-memtime-inst feature from clang
Differential Revision: https://reviews.llvm.org/D95733
2021-02-01 14:20:43 -08:00
Tony
92ab6ed667 [AMDGPU] Add missing targets to amdgpu-features.cl
Differential Revision: https://reviews.llvm.org/D93017
2020-12-12 18:19:02 +00:00
Tim Renouf
89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf
ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Tony
5984097823 [AMDGPU] Add missing support for targets
- Add missing tests.

Differential Revision: https://reviews.llvm.org/D90212
2020-10-27 15:36:31 +00:00
Stanislav Mekhanoshin
d1beb95d12 [AMDGPU] gfx1032 target
Differential Revision: https://reviews.llvm.org/D89487
2020-10-15 12:41:18 -07:00
Tim Renouf
666ef0db20 [AMDGPU] Add gfx602, gfx705, gfx805 targets
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:

* The "Oland" and "Hainan" variants of gfx601 are now split out into
  gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
  that to avoid using the shaderZExport workaround on gfx602.

* One variant of gfx703 is now split out into gfx705. LLPC and other
  front-ends could use that to avoid using the
  shaderSpiCsRegAllocFragmentation workaround on gfx705.

* The "TongaPro" variant of gfx802 is now split out into gfx805.
  TongaPro has a faster 64-bit shift than its former friends in gfx802,
  and a subtarget feature could be set up for that to take advantage of
  it. This commit does not make that change; it just adds the target.

V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
    so fix the GPUKind order.

Differential Revision: https://reviews.llvm.org/D88916

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-10-10 17:22:22 +01:00
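The V3 note above is easiest to understand as a lookup-order requirement. The sketch assumes the table is searched with a binary search keyed on the GPU kind, as `llvm::lower_bound` would do. Under that assumption, entries must appear in `GPUKind` order, or lookups can silently miss. The Python model below uses `bisect`. The `GPUKind` values, table contents and `lookup` helper are hypothetical and deliberately abbreviated.

```python
import bisect
from enum import IntEnum


# Hypothetical, abbreviated stand-in for the GPUKind enumeration.
class GPUKind(IntEnum):
    GFX601 = 1
    GFX602 = 2
    GFX703 = 3
    GFX705 = 4


def lookup(table, kind):
    """Binary-search `table`, a list of (kind, name) pairs, for `kind`.

    Only correct if `table` is sorted by kind, which is why the table
    has to follow enum order.
    """
    keys = [k for k, _ in table]
    i = bisect.bisect_left(keys, kind)
    if i < len(table) and table[i][0] == kind:
        return table[i][1]
    return None


ordered = [(GPUKind.GFX601, "gfx601"), (GPUKind.GFX602, "gfx602"),
           (GPUKind.GFX703, "gfx703"), (GPUKind.GFX705, "gfx705")]
misordered = [ordered[3]] + ordered[:3]  # gfx705 moved to the front
```

With `ordered`, every kind is found. With `misordered`, `lookup(misordered, GPUKind.GFX705)` returns `None` even though the entry is present: it is in the table but unreachable by the search.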
Stanislav Mekhanoshin
ea7d0e2996 [AMDGPU] gfx1031 target
Differential Revision: https://reviews.llvm.org/D85337
2020-08-05 12:36:26 -07:00
Stanislav Mekhanoshin
9ee272f13d [AMDGPU] Add gfx1030 target
Differential Revision: https://reviews.llvm.org/D81886
2020-06-15 16:18:05 -07:00
Stanislav Mekhanoshin
58de24ce6c [AMDGPU] Sorted targets in amdgpu-features.cl. NFC.
2020-06-12 11:57:40 -07:00
Matt Arsenault
ce2258c1cd clang/AMDGPU: Stop setting old denormal subtarget features
2020-04-02 17:17:12 -04:00
Matt Arsenault
00b2a9df45 Reapply "clang: Treat ieee mode as the default for denormal-fp-math"
This reverts commit 737394c490.

The fp-model test was failing on platforms that enable denormal flushing
based on -ffast-math. This needs to reset to IEEE, not the default in
these cases.

Change-Id: Ibbad32f66d0d0b89b9c1173a3a96fb1a570ddd89
2020-03-06 11:46:55 -08:00
Jeremy Morse
737394c490 Revert "clang: Treat ieee mode as the default for denormal-fp-math"
This reverts commit c64ca93053.

This patch tripped a few build bots:

  http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/24703/
  http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/13465/
  http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15994/

Reverting to clear the bots.
2020-03-05 10:55:24 +00:00
Matt Arsenault
c64ca93053 clang: Treat ieee mode as the default for denormal-fp-math
The IR hasn't switched the default yet, so explicitly add the ieee
attributes.

I'm still not really sure how the target default denormal mode should
interact with -fno-unsafe-math-optimizations. The target may have
selected the default mode to be non-IEEE based on the flags or based
on its true behavior, but we don't know which is the case. Since the
only users of a non-IEEE mode without a flag still support IEEE mode,
just reset to IEEE.
2020-03-04 23:34:02 -05:00
Konstantin Pyzhov
987aa3435f Corrected clang amdgpu-features.cl test for 6d614a82a4 (AMDGPU MFMA built-ins)
Differential Revision: https://reviews.llvm.org/D72723
2020-01-28 05:41:42 -05:00
Matt Arsenault
a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fdenormal-fp-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attributes are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
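The denormal modes named in the commit above differ only in how they treat a subnormal value. `ieee` keeps it, `preserve-sign` flushes it to a zero of the same sign, and `positive-zero` flushes it to +0. The Python sketch below models this for single f32 values. It assumes round-to-nearest conversion through `struct`. `flush_f32` and passing the mode as a string argument are illustrative only, not an LLVM API.

```python
import math
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal float32


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_f32_subnormal(x: float) -> bool:
    return x != 0.0 and abs(x) < F32_MIN_NORMAL


def flush_f32(x: float, mode: str) -> float:
    """Apply a denormal-fp-math style mode to one float32 value."""
    x = to_f32(x)
    if mode == "ieee" or not is_f32_subnormal(x):
        return x  # normals, zeros, infinities and NaNs pass through
    if mode == "preserve-sign":
        return math.copysign(0.0, x)  # -denormal -> -0.0, +denormal -> +0.0
    if mode == "positive-zero":
        return 0.0  # any denormal -> +0.0
    raise ValueError(f"unknown denormal mode: {mode}")
```

Per the commit, flags such as `-cl-denorms-are-zero` now select between the first two of these modes for f32 only, leaving f16/f64 behavior to the general `denormal-fp-math` setting.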
Matt Arsenault
281f2e2c37 AMDGPU: Add builtins for is_shared/is_private
llvm-svn: 371010
2019-09-05 03:00:43 +00:00
Stanislav Mekhanoshin
0cfd75a07d [AMDGPU] gfx908 clang target
Differential Revision: https://reviews.llvm.org/D64430

llvm-svn: 365528
2019-07-09 18:19:00 +00:00
Matt Arsenault
fc84925208 AMDGPU: Fix target builtins for gfx10
This wasn't setting some of the features from older generations.

llvm-svn: 364123
2019-06-22 01:30:00 +00:00
Stanislav Mekhanoshin
cafccd7a53 [AMDGPU] gfx1011/gfx1012 clang support
Differential Revision: https://reviews.llvm.org/D63308

llvm-svn: 363345
2019-06-14 00:33:59 +00:00
Stanislav Mekhanoshin
91792f1b93 [AMDGPU] gfx1010 clang target
Differential Revision: https://reviews.llvm.org/D61875

llvm-svn: 360634
2019-05-13 23:15:59 +00:00
Stanislav Mekhanoshin
1d9f286ecb [AMDGPU] rename vi-insts into gfx8-insts
Differential Revision: https://reviews.llvm.org/D60293

llvm-svn: 357792
2019-04-05 18:25:00 +00:00
Stanislav Mekhanoshin
1607a37308 [AMDGPU] Split dot-insts feature
Differential Revision: https://reviews.llvm.org/D57972

llvm-svn: 353588
2019-02-09 00:34:41 +00:00
Stanislav Mekhanoshin
6332f4d0d4 [AMDGPU] Separate feature dot-insts
Differential Revision: https://reviews.llvm.org/D56525

llvm-svn: 350794
2019-01-10 03:25:47 +00:00
Matt Arsenault
45bc148093 AMDGPU: Fix enabling denormals by default on pre-VI targets
Fast FMAF is not a sufficient condition to enable denormals.
Before VI, enabling denormals caused F32 instructions to
run at F64 speeds.

llvm-svn: 339278
2018-08-08 17:48:37 +00:00
Matt Arsenault
31c895ecdf AMDGPU: Add builtin for s_dcache_wb
llvm-svn: 339110
2018-08-07 07:49:13 +00:00