Commit Graph

273 Commits

Author SHA1 Message Date
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Jon Chesterfield
6043d4dfec [amdgpu] Accept an optional max to amdgpu-lds-size attribute for use in PromoteAlloca 2023-07-15 21:37:21 +01:00
Jon Chesterfield
74e928a081 [amdgpu][lds] Remove recalculation of LDS frame from backend
Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.

Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal_errors catching inconsistencies in the second calculation.

After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.

Deleted the now dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155190
2023-07-13 23:54:38 +01:00
Matt Arsenault
30fd35f59c AMDGPU: Add some notes about amdgpu-flat-work-group-size 2023-07-07 19:02:46 -04:00
Matt Arsenault
5491666248 AMDGPU: Correctly lower llvm.exp.f32
The library expansion has too many paths for all the permutations of
DAZ, unsafe and the 3 exp functions. It's easier to expand it in the
backend when we know all of these things. The library currently misses
the no-infinity check on the overflow, which this handles optimizing
out.

Some of the <3 x half> fast tests regress due to vector widening
dropping flags which will be fixed separately.

Apparently there is no exp10 intrinsic, but there should be. Adds some
deadish code in preparation for adding one while I'm following along
with the current library expansion.
2023-07-05 17:23:49 -04:00
Matt Arsenault
ed556a1ad5 AMDGPU: Correctly lower llvm.exp2.f32
Previously this did a fast math expansion only.
2023-07-05 17:23:48 -04:00
Matt Arsenault
4e15f378ee AMDGPU: Correctly lower llvm.log.f32 and llvm.log10.f32
Previously we expanded these in a fast-math way and the device
libraries were relying on this behavior. The libraries have a pending
change to switch to the new target intrinsic.

Unlike the library version, this takes advantage of no-infinities on
the result overflow check.
2023-07-05 15:30:35 -04:00
Matt Arsenault
003b58f65b IR: Add llvm.frexp intrinsic
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively could introduce a pair of intrinsics to
separately return the fractional and exponent parts.

AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally antique targets need a hardware workaround which
would be better handled in the backend rather than in library code
where it is now.
2023-06-28 14:50:16 -04:00
Matt Arsenault
89ccfa1b39 AMDGPU: Use correct lowering for llvm.log2.f32
We previously directly codegened to v_log_f32, which is broken for
denormals. The lowering isn't complicated, you simply need to scale
denormal inputs and adjust the result. Note log and log10 are still
not accurate enough, and will be fixed separately.
2023-06-23 08:37:37 -04:00
Joe Nash
83d47ba15a [AMDGPU] Add _e64_dpp asm suffix to docs
The _e64_dpp suffix can be added to an instruction to force the
AsmParser to encode it as VOP3 with DPP if possible on GFX11+. This has
been the behavior since GFX11 was introduced; this patch only updates
the documentation.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153564
2023-06-22 14:31:09 -04:00
Diana Picus
041bfe40a7 [AMDGPU] Document amdgpu_cs_chain[_preserve] CCs. NFC
Co-authored-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Differential Revision: https://reviews.llvm.org/D151997
2023-06-20 10:45:58 +02:00
Diana Picus
5c5eff44ce [AMDGPU] Start documenting calling conventions. NFC
Add a section to AMDGPUUsage.rst about calling conventions and list the
ones from the CallingConv enum. Full descriptions can come later (help
appreciated).

Differential Revision: https://reviews.llvm.org/D151996
2023-06-20 10:45:58 +02:00
Diana Picus
99610fa77d Revert "[AMDGPU] Start documenting calling conventions. NFC"
This reverts commit aa7b127cb7.

...because I really ought to install sphinx.
2023-06-20 09:55:52 +02:00
Diana Picus
d116f20e37 Fixup D151996 2023-06-20 09:37:16 +02:00
Diana Picus
aa7b127cb7 [AMDGPU] Start documenting calling conventions. NFC
Add a section to AMDGPUUsage.rst about calling conventions and list the
ones from the CallingConv enum. Full descriptions can come later (help
appreciated).

Differential Revision: https://reviews.llvm.org/D151996
2023-06-20 09:29:19 +02:00
Aaron Ballman
582cf31862 Fix the LLVM Sphinx build
This addresses the issue found in:
https://lab.llvm.org/buildbot/#/builders/30/builds/36346
2023-06-15 07:56:42 -04:00
Matt Arsenault
28f3edd2be AMDGPU: Add llvm.amdgcn.exp2 intrinsic
Provide direct access to v_exp_f32 and v_exp_f16, so we can start
correctly lowering the generic exp intrinsics.

Unfortunately have to break from the usual naming convention of
matching the instruction name and stripping the v_ prefix. exp is
already taken by the export intrinsic. On the clang builtin side, we
have a choice of maintaining the convention to the instruction name,
or following the intrinsic name.
2023-06-15 07:00:07 -04:00
Matt Arsenault
ccf216fe2c AMDGPU: Add llvm.amdgcn.log to intrinsic documentation 2023-06-15 07:00:07 -04:00
Aaron Ballman
540b934ff9 Fix LLVM sphinx build
Addresses the issue found by:
https://lab.llvm.org/buildbot/#/builders/30/builds/35886
2023-06-05 14:14:23 -04:00
Krzysztof Drewniak
23098bd454 [AMDGPU] Add intrinsic for converting global pointers to resources
Define the function @llvm.amdgcn.make.buffer.rsrc, which take a 64-bit
pointer, the 16-bit stride/swizzling constant that replace the high 16
bits of an address in a buffer resource, the 32-bit extent/number of
elements, and the 32-bit flags (the latter two being the 3rd and 4th
wards of the resource), and combines them into a ptr addrspace(8).

This intrinsic is lowered during the early phases of the backend.

This intrinsic is needed so that alias analysis can correctly infer
that a certain buffer resource points to the same memory as some
global pointer. Previous methods of constructing buffer resources,
which relied on ptrtoint, would not allow for such an inference.

Depends on D148184

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D148957
2023-06-05 17:07:59 +00:00
Kazu Hirata
96ddbd6dd8 [llvm] Fix typos in documentation 2023-05-12 23:47:46 -07:00
Konstantin Zhuravlyov
9d05727972 AMDGPU: Add basic gfx942 target
Differential Revision: https://reviews.llvm.org/D149983
2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov
1fc70210a6 AMDGPU: Add basic gfx941 target
Differential Revision: https://reviews.llvm.org/D149982
2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov
a1be6f0290 AMDGPU: Reserve 0x048, 0x049, 0x04a MACHs
Differential Revision: https://reviews.llvm.org/D149856
2023-05-05 11:05:07 -04:00
Krzysztof Drewniak
f0415f2a45 Re-land "[AMDGPU] Define data layout entries for buffers""
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.

This reverts commit 3f2fbe92d0.

Differential Revision: https://reviews.llvm.org/D149776
2023-05-03 19:43:56 +00:00
Krzysztof Drewniak
3f2fbe92d0 Revert "[AMDGPU] Define data layout entries for buffers"
This reverts commit f9c1ede254.

Differential Revision: https://reviews.llvm.org/D149758
2023-05-03 16:11:00 +00:00
Krzysztof Drewniak
f9c1ede254 [AMDGPU] Define data layout entries for buffers
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.

The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.

The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take them instead of <4 x i32> as resource arguments. ptr
addrspace(8). These pointers are 128-bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, [addr_max]] value from applying to them.

Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.

This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.

Depends on D143437

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145441
2023-05-03 15:25:58 +00:00
Diana Picus
d9bf8aba23 [AMDGPU] Add MMOs for GFX11 Streamout Instructions
The GFX11 NGG Streamout Instructions perform atomic operations on
dedicated registers. At the moment, they lack machine memory operands,
which causes the si-memory-legalizer pass to treat them conservatively
and introduce several unnecessary waits and cache invalidations.

This patch introduces a new address space to represent these special
registers and teaches instruction selection to add memory operands with
this new address space to DS_ADD/SUB_GS_REG_RTN.

Since this address space is meant to be compiler-internal, we move it
up a bit from the other address spaces and give it the number 128.
According to the LLVM Language Reference, address space numbers can go
all the way up to 2^24, but I'm not sure how well this is supported in
practice [1], so using a smaller number seems safer.

[1] 0107513fe7/llvm/utils/TableGen/IntrinsicEmitter.cpp (L401)

Differential Revision: https://reviews.llvm.org/D146031
2023-04-11 11:11:32 +02:00
Aaron Ballman
f8f329a224 Fix the LLVM Sphinx build
This addresses the issue found by:
https://lab.llvm.org/buildbot/#/builders/30/builds/32299
2023-02-24 07:57:25 -05:00
Matt Arsenault
c0db240e51 AMDGPU: Document denormal behavior
Not sure this is the right place for it next to the table.
2023-02-24 07:41:29 -04:00
Tony Tye
3138fda3d9 [AMDGPU][NFC] Heterogeneous DWARF extensions update
- Clarify CFI rules in heterogeneous DWARF extensions

- Added DWARF source language memory spaces.

- Added DWARF architecture address spaces.

- Other minor corrections.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D141548
2023-01-13 02:06:20 +00:00
Matt Arsenault
4d4894ab92 Partially reapply "AMDGPU: Invert handling of enqueued block detection"
This mostly reverts commit 270e96f435.

Keep the attributor related changes around, but functionally restore
the old behavior as a workaround. Device enqueue goes back to not
working at -O0 with this version.
2023-01-12 15:02:16 -05:00
Matt Arsenault
270e96f435 Revert "AMDGPU: Invert handling of enqueued block detection"
This reverts commit 47288cc977.

The runtime is having trouble with this at -O0 when the inputs are
always enabled.
2023-01-07 21:48:07 -05:00
Matt Arsenault
47288cc977 AMDGPU: Invert handling of enqueued block detection
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.

There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
2023-01-06 21:16:08 -05:00
Vang Thao
25d72330ff [AMDGPU] Add .uniform_work_group_size metadata to v5
Amdgpu kernel with function attribute "uniform-work-group-size"="true" requires
uniform work group size (i.e. each dimension of global size is a multiple of
corresponding dimension of work group size). hipExtModuleLaunchKernel allows to
launch HIP kernel with non-uniform workgroup size, which makes it necessary for
runtime to check and enforce uniform workgroup size if kernel requires it. To
let runtime be able to enforce that, this metadata is needed to indicate that
the kernel requires uniform workgroup size.

Reviewed By: kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D141012
2023-01-05 21:29:56 +00:00
Dmitry Preobrazhensky
b8e1071a29 [AMDGPU][GFX11][DOC][NFC] Add GFX11 assembler syntax description 2022-12-21 12:49:48 +03:00
Pierre van Houtryve
9fa46200ea [AMDGPU] Add .workgroup_processor_mode to v5 MD
Adds Workgroup Processor Mode (WGP) to the HSA Metadata for Code Object v5/GFX10+.
The field is already present as an asm directive and in the compute program resource register but is also needed in the MD.

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D139931
2022-12-13 10:44:52 -05:00
Abinav Puthan Purayil
bb4abb1a00 [docs] Fix warning in AMDGPUUsage.rst after 3d9f011a9c 2022-10-11 23:57:04 +05:30
Abinav Puthan Purayil
3d9f011a9c [AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor and the metadata map specific to code object v5 and later
Unfortunately, we have a broken handling of this in the runtime of rocm
5.3. The runtime is expected to handle this correctly when v5 becomes
the default.

Differential Revision: https://reviews.llvm.org/D134714
2022-10-11 23:28:43 +05:30
Abinav Puthan Purayil
3759398b4b [AMDGPU] Report minimum scratch size in code object v5 and later by default
This change sets
-amdgpu-assume-{external-call-stack-size | dynamic-stack-object-size}
options to zero by default for code object v5 and later. The runtime is
expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit
in the kernel descriptor is set.

Differential Revision: https://reviews.llvm.org/D128346
2022-09-29 09:52:45 +05:30
Abinav Puthan Purayil
d96361d714 [AMDGPU] Add the uses_dynamic_stack field to the kernel descriptor and the kernel metadata map
This change introduces the dynamic stack boolean field to code-object-v3
and above under the code properties of the kernel descriptor and under
the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to
the is_dynamic_callstack field of amd_kernel_code_t.

Differential Revision: https://reviews.llvm.org/D128344
2022-07-18 10:07:13 +05:30
Jay Foad
044b8f4bc8 [AMDGPU] Restore documentation of .amdhsa_shared_vgpr_count
This was accidentally lost in D127402.
2022-06-10 17:06:08 +01:00
Jay Foad
b0a3849439 [AMDGPU] Update dlc usage for GFX11
In GFX10 dlc controlled L1 cache bypass. In GFX11 it has been repurposed
to control MALL NOALLOC, and glc controls L1 as well as L0 cache bypass.

Update the documentation and SIMemoryLegalizer accordingly. Set dlc for
nontemporal and volatile accesses.

Differential Revision: https://reviews.llvm.org/D127405
2022-06-10 08:10:34 +01:00
Tony
802e3f4f57 [AMDGPU] Add GFX11 documentation to AMDGPUUsage
Update most of the document to include GFX11. Memory model changes will
come later.

Differential Revision: https://reviews.llvm.org/D127402
2022-06-10 08:10:34 +01:00
Dmitry Preobrazhensky
62c46093f1 [AMDGPU][DOC][NFC] Add GFX90C and GFX940 assembler syntax description 2022-05-31 14:29:06 +03:00
Brian Tracy
87a55137e2 Fix "the the" typo in documentation and user facing strings
There are many more instances of this pattern, but I chose to limit this change to .rst files (docs), anything in libcxx/include, and string literals. These have the highest chance of being seen by end users.

Reviewed By: #libc, Mordante, martong, ldionne

Differential Revision: https://reviews.llvm.org/D124708
2022-05-05 17:52:08 +02:00
Joe Nash
ec6d1a0278 Fix sphinx build error in AMDGPUUsage.rst
Corrects error from
813e521e55
2022-04-29 13:32:06 -04:00
Joe Nash
813e521e55 [AMDGPU] Add gfx11 subtarget ELF definition
This is the first patch of a series to upstream support for the new
subtarget.

Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 1/N for upstreaming AMDGPU gfx11 architectures.

Reviewed By: foad, kzhuravl, #amdgpu

Differential Revision: https://reviews.llvm.org/D124536
2022-04-29 12:27:17 -04:00
Changpeng Fang
8edaf25986 AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary:
  Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default.
We use implicitarg_ptr + offset to check whether the multigrid synchronization
pointer is used. If yes, we remove this attribute and also remove
amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg
only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.

Reviewers: arsenm, sameerds, b-sumner and foad

Differential Revision: https://reviews.llvm.org/D123548
2022-04-12 12:36:30 -07:00
Scott Linder
09f33a430b [AMDGPU][OpenCL] Remove "printf and hostcall" diagnostic
The diagnostic is unreliable, and triggers even for dead uses of
hostcall that may exist when linking the device-libs at lower
optimization levels.

Eliminate the diagnostic, and directly document the limitation for
OpenCL before code object V5.

Make some NFC changes to clarify the related code in the
MetadataStreamer.

Add a clang test to tie OCL sources containing printf to the backend IR
tests for this situation.

Reviewed By: sameerds, arsenm, yaxunl

Differential Revision: https://reviews.llvm.org/D121951
2022-04-05 19:10:23 +00:00