Do the LDS frame calculation once, in the IR pass, instead of repeating the work in the backend.
Prior to this patch:
The IR lowering pass sets up a per-kernel LDS frame and annotates the variables with absolute_symbol
metadata so that the assembler can build lookup tables out of it. There is a fragile association between
kernel functions and named structs which is used to recompute the frame layout in the backend, with
fatal errors catching inconsistencies in the second calculation.
After this patch:
The IR lowering pass additionally sets a frame size attribute on kernels. The backend uses the same
absolute_symbol metadata that the assembler uses to place objects within that frame size.
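A minimal sketch of the resulting IR shape (names, sizes, and the exact attribute
and metadata spellings here are illustrative, not authoritative):

    @lds = internal addrspace(3) global [64 x i32] poison, align 16, !absolute_symbol !0

    define amdgpu_kernel void @kern() #0 {
      store i32 0, ptr addrspace(3) @lds
      ret void
    }

    ; total LDS frame size recorded once by the IR pass
    attributes #0 = { "amdgpu-lds-size"="256" }
    ; placement of @lds within the frame as an address range; [0, 1) encodes
    ; an exact offset of 0
    !0 = !{i32 0, i32 1}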
Deleted the now-dead allocation code from the backend. Left for a later cleanup:
- enabling lowering for anonymous functions
- removing the elide-module-lds attribute (test churn, it's not used by llc any more)
- adjusting the dynamic alignment check to not use symbol names
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D155190
The library expansion has too many paths for all the permutations of
DAZ, unsafe math, and the three exp functions. It's easier to expand it in
the backend, where all of these are known. The library also currently misses
the no-infinity check on the overflow case, which this lowering optimizes
out.
Some of the <3 x half> fast tests regress due to vector widening
dropping flags; this will be fixed separately.
Apparently there is no exp10 intrinsic, but there should be. This adds some
currently dead code in preparation for adding one, while following along
with the current library expansion.
Previously we expanded these in a fast-math way and the device
libraries were relying on this behavior. The libraries have a pending
change to switch to the new target intrinsic.
Unlike the library version, this takes advantage of no-infinities on
the result overflow check.
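For example, the no-infinities case can be expressed with a fast-math flag on
the generic intrinsic; a minimal sketch (the wrapper function name is invented):

    declare float @llvm.exp.f32(float)

    define float @exp_ninf(float %x) {
      ; ninf asserts that neither operands nor result are +/-inf, so the
      ; expansion can omit the infinity clamp on the overflow check
      %y = call ninf float @llvm.exp.f32(float %x)
      ret float %y
    }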
Add an intrinsic which returns the two pieces as multiple return
values. Alternatively, a pair of intrinsics could be introduced to
separately return the fractional and exponent parts.
AMDGPU has native instructions to return the two halves, but could use
some generic legalization and optimization handling. For example, we
should be able to handle legalization of f16 on older targets, and for
bf16. Additionally, antique targets need a hardware workaround, which
would be better handled in the backend than in library code where it
is now.
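A sketch of the multiple-return-value form described above (the intrinsic name
and mangling follow the generic llvm.frexp shape and should be treated as
illustrative):

    declare { float, i32 } @llvm.frexp.f32.i32(float)

    define { float, i32 } @split(float %x) {
      ; first element: fractional part, magnitude in [0.5, 1)
      ; second element: integral binary exponent
      %r = call { float, i32 } @llvm.frexp.f32.i32(float %x)
      ret { float, i32 } %r
    }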
We previously directly codegened to v_log_f32, which is broken for
denormals. The lowering isn't complicated: scale denormal inputs and
adjust the result. Note that log and log10 are still not accurate
enough, and will be fixed separately.
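A hedged sketch of the scale-and-adjust lowering in log2 space, using the raw
hardware log2 intrinsic (constants are the standard ones, but illustrative
here; positive inputs assumed):

    declare float @llvm.amdgcn.log.f32(float)

    define float @log2_safe(float %x) {
      %denorm = fcmp olt float %x, 0x3810000000000000            ; x < 2^-126
      %s      = select i1 %denorm, float 0x41F0000000000000, float 1.0 ; 2^32 : 1
      %scaled = fmul float %x, %s        ; scale denormals into the normal range
      %l      = call float @llvm.amdgcn.log.f32(float %scaled)   ; hardware log2
      %adj    = select i1 %denorm, float 3.200000e+01, float 0.0
      %r      = fsub float %l, %adj      ; undo the scale: log2(x * 2^32) - 32
      ret float %r
    }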
The _e64_dpp suffix can be added to an instruction to force the
AsmParser to encode it as VOP3 with DPP if possible on GFX11+. This has
been the behavior since GFX11 was introduced; this patch only updates
the documentation.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153564
Add a section to AMDGPUUsage.rst about calling conventions and list the
ones from the CallingConv enum. Full descriptions can come later (help
appreciated).
Differential Revision: https://reviews.llvm.org/D151996
Provide direct access to v_exp_f32 and v_exp_f16, so we can start
correctly lowering the generic exp intrinsics.
Unfortunately we have to break from the usual naming convention of
matching the instruction name and stripping the v_ prefix: exp is
already taken by the export intrinsic. On the clang builtin side, we
have a choice between maintaining the convention of matching the
instruction name and following the intrinsic name.
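A minimal use of the resulting target intrinsics (a sketch; the wrapper name
is invented):

    declare float @llvm.amdgcn.exp2.f32(float)
    declare half @llvm.amdgcn.exp2.f16(half)

    define float @raw_exp2(float %x) {
      %y = call float @llvm.amdgcn.exp2.f32(float %x)  ; selects to v_exp_f32
      ret float %y
    }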
Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit
pointer, the 16-bit stride/swizzling constant that replaces the high 16
bits of an address in a buffer resource, the 32-bit extent/number of
elements, and the 32-bit flags (the latter two being the 3rd and 4th
words of the resource), and combines them into a ptr addrspace(8).
This intrinsic is lowered during the early phases of the backend.
This intrinsic is needed so that alias analysis can correctly infer
that a certain buffer resource points to the same memory as some
global pointer. Previous methods of constructing buffer resources,
which relied on ptrtoint, would not allow for such an inference.
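A sketch of the described signature (the .p0 overload suffix follows the input
pointer type and should be treated as illustrative):

    declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr, i16, i32, i32)

    define ptr addrspace(8) @make_rsrc(ptr %base, i16 %stride,
                                       i32 %numrecs, i32 %flags) {
      ; base: 64-bit address; stride: replaces the high 16 address bits;
      ; numrecs and flags become the 3rd and 4th words of the resource
      %rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(
                  ptr %base, i16 %stride, i32 %numrecs, i32 %flags)
      ret ptr addrspace(8) %rsrc
    }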
Depends on D148184
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D148957
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.
The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.
The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take ptr addrspace(8) instead of <4 x i32> as resource arguments.
These pointers are 128 bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a value in [0, addr_max] from applying to them.
Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.
This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.
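A hedged illustration of the fat-pointer side (function name invented):

    define float @load_elt(ptr addrspace(7) %buf, i32 %i) {
      ; GEP only adjusts the 32-bit offset; the 128-bit descriptor rides along
      %p = getelementptr float, ptr addrspace(7) %buf, i32 %i
      ; a normal load, legal here; rewritten into buffer operations before
      ; code generation
      %v = load float, ptr addrspace(7) %p
      ret float %v
    }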
Depends on D143437
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D145441
The GFX11 NGG Streamout Instructions perform atomic operations on
dedicated registers. At the moment, they lack machine memory operands,
which causes the si-memory-legalizer pass to treat them conservatively
and introduce several unnecessary waits and cache invalidations.
This patch introduces a new address space to represent these special
registers and teaches instruction selection to add memory operands with
this new address space to DS_ADD/SUB_GS_REG_RTN.
Since this address space is meant to be compiler-internal, we move it
up a bit from the other address spaces and give it the number 128.
According to the LLVM Language Reference, address space numbers can go
all the way up to 2^24, but I'm not sure how well this is supported in
practice [1], so using a smaller number seems safer.
[1] 0107513fe7/llvm/utils/TableGen/IntrinsicEmitter.cpp (L401)
Differential Revision: https://reviews.llvm.org/D146031
This mostly reverts commit 270e96f435.
Keep the attributor-related changes around, but functionally restore
the old behavior as a workaround. Device enqueue goes back to not
working at -O0 with this version.
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.
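For example, such a front end could emit something along these lines (a sketch
using the attribute names above):

    define amdgpu_kernel void @k() #0 {
      ret void
    }

    attributes #0 = { "amdgpu-no-default-queue" "amdgpu-no-completion-action" }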
There are also so many of these attributes now that the offset-use API
should probably consider all of them at once. Maybe they should be
merged into one attribute with used fields. Having separate functions
for each field in AMDGPUBaseInfo is also not the greatest API (might
as well fix this when the patch to get the code object version from
the module lands).
An AMDGPU kernel with the function attribute "uniform-work-group-size"="true"
requires a uniform work group size (i.e. each dimension of the global size is a
multiple of the corresponding dimension of the work group size).
hipExtModuleLaunchKernel allows launching a HIP kernel with a non-uniform
workgroup size, which makes it necessary for the runtime to check and enforce a
uniform workgroup size when the kernel requires it. To let the runtime enforce
that, this metadata is needed to indicate that the kernel requires a uniform
workgroup size.
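In IR terms, the requirement is the existing attribute; this patch surfaces it
to the runtime via the kernel metadata. A minimal sketch:

    define amdgpu_kernel void @k() #0 {
      ret void
    }

    ; the metadata emitted by this patch mirrors this attribute
    attributes #0 = { "uniform-work-group-size"="true" }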
Reviewed By: kzhuravl, arsenm
Differential Revision: https://reviews.llvm.org/D141012
Adds Workgroup Processor Mode (WGP) to the HSA metadata for code object v5/GFX10+.
The field is already present as an asm directive and in the compute program
resource register, but is also needed in the metadata.
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D139931
Unfortunately, the ROCm 5.3 runtime has broken handling of this. The
runtime is expected to handle it correctly once v5 becomes the
default.
Differential Revision: https://reviews.llvm.org/D134714
This change sets
-amdgpu-assume-{external-call-stack-size | dynamic-stack-object-size}
options to zero by default for code object v5 and later. The runtime is
expected to adjust the scratch size if the amdhsa_uses_dynamic_stack bit
in the kernel descriptor is set.
Differential Revision: https://reviews.llvm.org/D128346
This change introduces the dynamic stack boolean field to code-object-v3
and above under the code properties of the kernel descriptor and under
the kernel metadata map of NT_AMDGPU_METADATA. This field corresponds to
the is_dynamic_callstack field of amd_kernel_code_t.
Differential Revision: https://reviews.llvm.org/D128344
In GFX10, dlc controlled L1 cache bypass. In GFX11 it has been repurposed
to control MALL NOALLOC, and glc now controls L1 as well as L0 cache bypass.
Update the documentation and SIMemoryLegalizer accordingly. Set dlc for
nontemporal and volatile accesses.
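For example, a nontemporal store is the kind of access now emitted with dlc
set on GFX11 (a sketch; the function shape is invented):

    define void @stream_store(ptr addrspace(1) %p, float %v) {
      ; on GFX11 the legalizer sets dlc (MALL NOALLOC) for this access
      store float %v, ptr addrspace(1) %p, !nontemporal !0
      ret void
    }

    !0 = !{i32 1}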
Differential Revision: https://reviews.llvm.org/D127405
There are many more instances of this pattern, but I chose to limit this change to .rst files (docs), anything in libcxx/include, and string literals. These have the highest chance of being seen by end users.
Reviewed By: #libc, Mordante, martong, ldionne
Differential Revision: https://reviews.llvm.org/D124708
This is the first patch of a series to upstream support for the new
subtarget.
Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>
Patch 1/N for upstreaming AMDGPU gfx11 architectures.
Reviewed By: foad, kzhuravl, #amdgpu
Differential Revision: https://reviews.llvm.org/D124536
Summary:
Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is set by default.
We use implicitarg_ptr + offset to check whether the multigrid synchronization
pointer is used. If yes, we remove this attribute and also remove
amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg
only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.
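A hedged sketch of the detection pattern (the actual ABI offset of the hidden
pointer is not spelled out here; %offset is a stand-in):

    declare ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()

    define ptr @multigrid_sync_arg(i64 %offset) {
      ; a load reachable from implicitarg_ptr at the relevant offset means
      ; the multigrid synchronization pointer is used
      %impl = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
      %p = getelementptr i8, ptr addrspace(4) %impl, i64 %offset
      %mgs = load ptr, ptr addrspace(4) %p
      ret ptr %mgs
    }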
Reviewers: arsenm, sameerds, b-sumner and foad
Differential Revision: https://reviews.llvm.org/D123548
The diagnostic is unreliable, and triggers even for dead uses of
hostcall that may exist when linking the device-libs at lower
optimization levels.
Eliminate the diagnostic, and directly document the limitation for
OpenCL before code object V5.
Make some NFC changes to clarify the related code in the
MetadataStreamer.
Add a clang test to tie OCL sources containing printf to the backend IR
tests for this situation.
Reviewed By: sameerds, arsenm, yaxunl
Differential Revision: https://reviews.llvm.org/D121951