clang-p2996

Author	SHA1	Message	Date
Changpeng Fang	350bda4419	AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313 ) Rename the intrinsics to close to the instruction mnemonic names: Use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr. This patch also removes f16/bf16 versions of builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types in implementing builtins. We can always use bitcast.	2024-03-25 16:55:22 -07:00
Changpeng Fang	3054d0dae7	AMDGPU: Rename and add bf16 support for global_load_tr builtins (#86202 ) Make the name of a clang builtin as close to the mnemonic instruction name as possible. The data type suffix may not be enough to tell what instruction the builtin is going to produce. This patch also add the bf16 support for global_load_tr_b128 builtins.	2024-03-22 08:51:53 -07:00
Antonio Frighetto	b433076fcb	[clang][CodeGen] Allow `memcpy` replace with trivial auto var init When emitting the storage (or memory copy operations) for constant initializers, the decision whether to split a constant structure or array store into a sequence of field stores or to use `memcpy` is based upon the optimization level and the size of the initializer. In `afe8b93ffd`, we extended this by allowing constants to be split when the array (or struct) type does not match the type of data the address to the object (constant) is expected to contain. This may happen when `emitStoresForConstant` is called by `EmitAutoVarInit`, as the element type of the address gets shrunk. When this occurs, let the initializer be split into a bunch of stores only under `-ftrivial-auto-var-init=pattern`. Fixes: https://github.com/llvm/llvm-project/issues/84178.	2024-03-21 09:55:04 +01:00
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Changpeng Fang	96813de52d	AMDGPU: Define a feature for v_dot4_f32_* instructions (#84248 ) FeatureDot11Insts (dot11-insts) for: v_dot4_f32_fp8_fp8, v_dot4_f32_fp8_bf8, v_dot4_f32_bf8_fp8, v_dot4_f32_bf8_bf8	2024-03-06 14:37:03 -08:00
Emma Pilkington	4490003a22	[AMDGPU] Rename COV module flag to amdhsa_code_object_version (#79905 ) The previous name 'amdgpu_code_object_version', was misleading since this is really a property of the HSA OS. The new spelling also matches the asm directive I added in `bc82cfb`.	2024-03-06 09:51:48 -05:00
Joseph Huber	1fc5e50ceb	[AMDGPU] Implement 'llvm.get.fpenv' and 'llvm.set.fpenv' (#83906 ) Summary: This patch implements the LLVM floating point environment control intrinsics and also exposes it through clang. We encode the floating point environment as a 64-bit value that simply concatenates the values of the mode registers and the current trap status. We only fetch the bits relevant for floating point instructions. That is, rounding mode, denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16 overflow, and active exceptions.	2024-03-06 08:11:54 -06:00
Shilei Tian	2ad43fa467	[AMDGPU] Fix operand types for `V_DOT2_F32_BF16` (#82044 )	2024-02-20 08:25:01 -05:00
Shilei Tian	46734aa1e5	[AMDGPU] Use `bf16` instead of `i16` for bfloat (#80908 ) Currently we generally use `i16` to represent `bf16` in those tablegen files. This patch is trying to use `bf16` directly. Fix #79369.	2024-02-16 15:58:30 -05:00
Logikable	5fdd094837	[clang][CodeGen] Emit atomic IR in place of optimized libcalls. (#73176 ) In the beginning, Clang only emitted atomic IR for operations it knew the underlying microarch had instructions for, meaning it required significant knowledge of the target. Later, the backend acquired the ability to lower IR to libcalls. To avoid duplicating logic and improve logic locality, we'd like to move as much as possible to the backend. There are many ways to describe this change. For example, this change reduces the variables Clang uses to decide whether to emit libcalls or IR, down to only the atomic's size.	2024-02-12 09:33:09 -08:00
Joseph Huber	3c707310a3	[NVPTX] Add clang builtin for `__nvvm_reflect` intrinsic (#81277 ) Summary: Some recent support made usage of `__nvvm_reflect` more consistent. We should expose it as a builtin rather than forcing users to externally define the function.	2024-02-09 14:11:01 -06:00
Joseph Huber	dbed89814e	[AMDGPU] Add missing `__builtin_amdgcn_wavefrontsize` builtin (#80741 ) Summary: The backend supports the wavefrontsize intrinsic, and suggests that it is tied to a corresponding clang builtin, but it is not actually present. This simply adds it in so it can be used from clang. This attribute likely isn't the best to rely on, but for the `libc` use-case we will need to detect a struct's differing size in a way that will depend on the wavefront size.	2024-02-05 17:58:19 -06:00
Valery Pykhtin	b8025d1482	Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303 ) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit `9791e54149`.	2024-02-02 13:09:25 +01:00
Jay Foad	b5c0b67bc2	[AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (#79980 )	2024-02-01 10:49:42 +00:00
Mariusz Sikora	f96e85b949	[AMDGPU][GFX12] Add tests for unsupported builtins (#78729 ) __builtin_amdgcn_mfma* and __builtin_amdgcn_smfmac*	2024-01-31 14:04:04 +01:00
Joseph Huber	f2a78e68ee	[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#80035 ) Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into another program via `-mlink-builtin-bitcode` to inherit its attributes. However, there are a handful of macros that can leak incorrect information when compiling for an unspecified architecture. Currently, things like the wavefront size will default to 64, which is actually variable. We should not expose these macros unless it is known.	2024-01-30 13:05:29 -06:00
Joseph Huber	72d4fc1b4d	Revert "[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660 )" This reverts commit `c9a6e993f7`. This breaks HIP code that incorrectly depended on GPU-specific macros to be set. The code is totally wrong as using `__WAVEFRTONSIZE__` on the host is absolutely meaningless, but it seems this entire corner of the toolchain is fundmentally broken. Reverting for now to avoid breakages.	2024-01-29 11:11:25 -06:00
Joseph Huber	c9a6e993f7	[AMDGPU] Do not emit arch dependent macros with unspecified cpu (#79660 ) Summary: Currently, the AMDGPU toolchain accepts not passing `-mcpu` as a means to create a sort of "generic" IR. The resulting IR will not contain any target dependent attributes and can then be inserted into another program via `-mlink-builtin-bitcode` to inherit its attributes. However, there are a handful of macros that can leak incorrect information when compiling for an unspecified architecture. Currently, things like the wavefront size will default to 64, which is actually variable. We should not expose these macros unless it is known.	2024-01-29 08:46:14 -06:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Mariusz Sikora	cfddb59be2	[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414 ) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>	2024-01-24 12:21:15 +01:00
Saiyedul Islam	082f87c9d4	[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Corresponding llvm-objdump AMDGPU lit tests are updated in a follow-up PR.	2024-01-23 17:08:18 +05:30
Jay Foad	e21b0b083e	[AMDGPU] Remove gws feature from GFX12 (#78711 ) This was already done for LLVM. This patch just updates the Clang builtin handling to match.	2024-01-19 15:45:53 +00:00
Jay Foad	ed12388082	[AMDGPU] Do not emit `V_DOT2C_F32_F16_e32` on GFX12 (#78709 ) That instruction is not supported on GFX12. Added a testcase which previously crashed without this change. Co-authored-by: pvanhout <pierre.vanhoutryve@amd.com>	2024-01-19 14:36:27 +00:00
Piotr Sobczak	57f6a3f7ea	[AMDGPU] Add global_load_tr for GFX12 (#77772 ) Support new amdgcn_global_load_tr instructions for load with transpose. * MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128 * Intrinsic int_amdgcn_global_load_tr * Clang builtins amdgcn_global_load_tr*	2024-01-18 15:14:42 +01:00
Mariusz Sikora	3e6589f21c	[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917 ) - image_atomic_pk_add_f16 - image_atomic_pk_add_bf16 - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 - buffer_atomic_pk_add_bf16	2024-01-18 14:01:09 +01:00
Mariusz Sikora	28b7e498b6	AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892 ) Endoding is VOP3P. Tagged as deep/machine learning instructions. i32 type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32 fneg(neg_lo[2]) and f32 fabs(neg_hi[2]). --------- Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>	2024-01-18 14:00:27 +01:00
Jay Foad	4c65787f1e	[AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (#77926 )	2024-01-18 10:14:01 +00:00
Mariusz Sikora	264fd9e13e	[AMDGPU][NFC] Rename feature FP8Insts to FP8ConversionInsts (#78439 )	2024-01-18 08:46:53 +01:00
Valery Pykhtin	9791e54149	Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429 ) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289	2024-01-17 14:12:07 +01:00
Valery Pykhtin	57b50ef017	[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556 ) Substitute with zero-extended to i64 ballot.i32 intrinsic.	2024-01-17 17:02:05 +07:00
Nikita Popov	158d72d728	[Clang] Set writable and dead_on_unwind attributes on sret arguments (#77116 ) Set the writable and dead_on_unwind attributes for sret arguments. These indicate that the argument points to writable memory (and it's legal to introduce spurious writes to it on entry to the function) and that the argument memory will not be used if the call unwinds. This enables additional MemCpyOpt/DSE/LICM optimizations.	2024-01-11 09:46:54 +01:00
Yingwei Zheng	1228becf7d	[FuncAttrs] Deduce `noundef` attributes for return values (#76553 ) This patch deduces `noundef` attributes for return values. IIUC, a function returns `noundef` values iff all of its return values are guaranteed not to be `undef` or `poison`. Definition of `noundef` from LangRef: ``` noundef This attribute applies to parameters and return values. If the value representation contains any undefined or poison bits, the behavior is undefined. Note that this does not refer to padding introduced by the type’s storage representation. ``` Alive2: https://alive2.llvm.org/ce/z/g8Eis6 Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=30dcc33c4ea3ab50397a7adbe85fe977d4a400bd&to=c5e8738d4bfbf1e97e3f455fded90b791f223d74&stat=instructions:u \|stage1-O3\|stage1-ReleaseThinLTO\|stage1-ReleaseLTO-g\|stage1-O0-g\|stage2-O3\|stage2-O0-g\|stage2-clang\| \|--\|--\|--\|--\|--\|--\|--\| \|+0.01%\|+0.01%\|-0.01%\|+0.01%\|+0.03%\|-0.04%\|+0.01%\| The motivation of this patch is to reduce the number of `freeze` insts and enable more optimizations.	2023-12-31 20:44:48 +08:00
Nikita Popov	a3d2d34e84	[Clang] Use poison as base for vector literals When constructing vectors from elements, use poison instead of undef as the base value. These literals always initialize all elements (padding the remainder with zero), so that the choice of base value does not affect semantics.	2023-12-19 11:53:18 +01:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471 ) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475 )	2023-12-15 10:14:38 +01:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Romaric Jodin	d56e0d07cc	clang/OpenCL: set sqrt fp accuracy on call to Z4sqrt (#66651 ) This is reverting the previous implementation to avoid adding inline function in opencl headers. This was breaking clspv flow google/clspv#1231, while https://reviews.llvm.org/D156743 mentioned that just decorating the call node with `!pfmath` was enough. This PR is implementing this idea. The test has been updated with this implementation.	2023-12-01 16:34:44 +09:00
serge-sans-paille	afe8b93ffd	[clang] Avoid memcopy for small structure with padding under -ftrivial-auto-var-init (#71677 ) Recommit of `0d2860b795` with extra test cases fixed.	2023-11-25 00:11:20 +01:00
Florian Hahn	419a4e41fc	Revert "[clang] Avoid memcopy for small structure with padding under -ftrivial-auto-var-init (#71677 )" This reverts commit `fe5c360a9a`. The commit causes the tests below to fail on many buildbots, e.g. https://lab.llvm.org/buildbot/#/builders/245/builds/17047 Clang :: CodeGen/aapcs-align.cpp Clang :: CodeGen/aapcs64-align.cpp	2023-11-23 20:18:55 +00:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
serge-sans-paille	fe5c360a9a	[clang] Avoid memcopy for small structure with padding under -ftrivial-auto-var-init (#71677 ) Recommit of `0d2860b795` with extra test cases fixed.	2023-11-23 17:37:03 +01:00
Jessica Del	b025864af8	[AMDGPU] - Add clang builtins for tied WMMA intrinsics (#70669 ) Add clang builtins for the new tied wmma intrinsics. These variations tie the destination accumulator matrix to the input accumulator matrix. See https://github.com/llvm/llvm-project/pull/69903 for context.	2023-11-13 13:23:26 +01:00
Rana Pratap Reddy	13ea1146a7	[AMDGPU] Lower __builtin_amdgcn_read_exec_hi to use amdgcn_ballot (#69567 ) Currently __builtin_amdgcn_read_exec_hi lowers to llvm.read_register, this patch lowers it to use amdgcn_ballot.	2023-10-26 10:26:11 +05:30
Nikita Popov	3b25407d97	[IR] Mark zext/sext constant expressions as undesirable Introduce isDesirableCastOp() which determines whether IR builder and constant folding should produce constant expressions for a given cast type. This mirrors what we do for binary operators. Mark zext/sext as undesirable, which prevents most creations of such constant expressions. This is still somewhat incomplete and there are a few more places that can create zext/sext expressions. This is part of the work for https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179. The reason for the odd result in the constantexpr-fneg.c test is that initially the "a[]" global is created with an [0 x i32] type, at which point the icmp expression cannot be folded. Later it is replaced with an [1 x i32] global and the icmp gets folded away. But at that point we no longer fold the zext.	2023-10-02 12:40:20 +02:00
Matt Arsenault	ddc3346a6b	clang/AMDGPU: Fix accidental behavior change for __builtin_amdgcn_ldexph (#66340 )	2023-09-14 18:15:44 +03:00
Matt Arsenault	15e0fe0b61	clang/OpenCL: Add inline implementations of sqrt in builtin header We want the !fpmath metadata to be attached to the sqrt intrinsic to make it to the backend lowering. Emit an available_externally definition which uses the builtin, which emits the !fpmath. Fixes #64264 https://reviews.llvm.org/D156743	2023-09-12 23:23:00 +03:00
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 )" (#66060 ) This reverts commit `0a8d17e79b`.	2023-09-12 15:13:59 +05:30
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
Matt Arsenault	6a08cf12d9	clang: Add __builtin_exp10* and use new llvm.exp10 intrinsic https://reviews.llvm.org/D157911	2023-09-09 23:14:12 +03:00
Saiyedul Islam	f616c3eeb4	[OpenMP][DeviceRTL][AMDGPU] Support code object version 5 Update DeviceRTL and the AMDGPU plugin to support code object version 5. Default is code object version 4. CodeGen for __builtin_amdgpu_workgroup_size generates code for cov4 as well as cov5 if -mcode-object-version=none is specified. DeviceRTL compilation passes this argument via Xclang option to generate abi-agnostic code. Generated code for the above builtin uses a clang control constant "llvm.amdgcn.abi.version" to branch on the abi version, which is available during linking of user's OpenMP code. Load of this constant gets eliminated during linking. AMDGPU plugin queries the ELF for code object version and then prepares various implicitargs accordingly. Differential Revision: https://reviews.llvm.org/D139730 Reviewed By: jhuber6, yaxunl	2023-08-29 06:35:44 -05:00

1 2 3 4 5 ...

725 Commits