clang-p2996

Author	SHA1	Message	Date
Nikita Popov	a3d2d34e84	[Clang] Use poison as base for vector literals When constructing vectors from elements, use poison instead of undef as the base value. These literals always initialize all elements (padding the remainder with zero), so that the choice of base value does not affect semantics.	2023-12-19 11:53:18 +01:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475)	2023-12-15 10:14:38 +01:00
Mariusz Sikora	7f55d7de1a	[AMDGPU] GFX12: Add Split Workgroup Barrier (#74836) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2023-12-13 15:01:13 +01:00
Romaric Jodin	d56e0d07cc	clang/OpenCL: set sqrt fp accuracy on call to Z4sqrt (#66651) This reverts the previous implementation to avoid adding an inline function to the OpenCL headers. This was breaking clspv flow google/clspv#1231, while https://reviews.llvm.org/D156743 mentioned that just decorating the call node with `!fpmath` was enough. This PR implements this idea. The test has been updated with this implementation.	2023-12-01 16:34:44 +09:00
serge-sans-paille	afe8b93ffd	[clang] Avoid memcopy for small structure with padding under -ftrivial-auto-var-init (#71677) Recommit of `0d2860b795` with extra test cases fixed.	2023-11-25 00:11:20 +01:00
Florian Hahn	419a4e41fc	Revert "[clang] Avoid memcopy for small structure with padding under -ftrivial-auto-var-init (#71677)" This reverts commit `fe5c360a9a`. The commit causes the tests below to fail on many buildbots, e.g. https://lab.llvm.org/buildbot/#/builders/245/builds/17047 Clang :: CodeGen/aapcs-align.cpp Clang :: CodeGen/aapcs64-align.cpp	2023-11-23 20:18:55 +00:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
serge-sans-paille	fe5c360a9a	[clang] Avoid memcopy for small structure with padding under -ftrivial-auto-var-init (#71677) Recommit of `0d2860b795` with extra test cases fixed.	2023-11-23 17:37:03 +01:00
Jessica Del	b025864af8	[AMDGPU] - Add clang builtins for tied WMMA intrinsics (#70669) Add clang builtins for the new tied wmma intrinsics. These variations tie the destination accumulator matrix to the input accumulator matrix. See https://github.com/llvm/llvm-project/pull/69903 for context.	2023-11-13 13:23:26 +01:00
Rana Pratap Reddy	13ea1146a7	[AMDGPU] Lower __builtin_amdgcn_read_exec_hi to use amdgcn_ballot (#69567) Currently __builtin_amdgcn_read_exec_hi lowers to llvm.read_register; this patch lowers it to use amdgcn_ballot.	2023-10-26 10:26:11 +05:30
Nikita Popov	3b25407d97	[IR] Mark zext/sext constant expressions as undesirable Introduce isDesirableCastOp() which determines whether IR builder and constant folding should produce constant expressions for a given cast type. This mirrors what we do for binary operators. Mark zext/sext as undesirable, which prevents most creations of such constant expressions. This is still somewhat incomplete and there are a few more places that can create zext/sext expressions. This is part of the work for https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179. The reason for the odd result in the constantexpr-fneg.c test is that initially the "a[]" global is created with an [0 x i32] type, at which point the icmp expression cannot be folded. Later it is replaced with an [1 x i32] global and the icmp gets folded away. But at that point we no longer fold the zext.	2023-10-02 12:40:20 +02:00
Matt Arsenault	ddc3346a6b	clang/AMDGPU: Fix accidental behavior change for __builtin_amdgcn_ldexph (#66340)	2023-09-14 18:15:44 +03:00
Matt Arsenault	15e0fe0b61	clang/OpenCL: Add inline implementations of sqrt in builtin header We want the !fpmath metadata to be attached to the sqrt intrinsic to make it to the backend lowering. Emit an available_externally definition which uses the builtin, which emits the !fpmath. Fixes #64264 https://reviews.llvm.org/D156743	2023-09-12 23:23:00 +03:00
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060) This reverts commit `0a8d17e79b`.	2023-09-12 15:13:59 +05:30
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
Matt Arsenault	6a08cf12d9	clang: Add __builtin_exp10* and use new llvm.exp10 intrinsic https://reviews.llvm.org/D157911	2023-09-09 23:14:12 +03:00
Saiyedul Islam	f616c3eeb4	[OpenMP][DeviceRTL][AMDGPU] Support code object version 5 Update DeviceRTL and the AMDGPU plugin to support code object version 5. Default is code object version 4. CodeGen for __builtin_amdgpu_workgroup_size generates code for cov4 as well as cov5 if -mcode-object-version=none is specified. DeviceRTL compilation passes this argument via Xclang option to generate abi-agnostic code. Generated code for the above builtin uses a clang control constant "llvm.amdgcn.abi.version" to branch on the abi version, which is available during linking of user's OpenMP code. Load of this constant gets eliminated during linking. AMDGPU plugin queries the ELF for code object version and then prepares various implicitargs accordingly. Differential Revision: https://reviews.llvm.org/D139730 Reviewed By: jhuber6, yaxunl	2023-08-29 06:35:44 -05:00
Yaxun (Sam) Liu	b8a9c50f22	[AMDGPU] Add target feature gws to clang Reviewed by: Matt Arsenault Differential Revision: https://reviews.llvm.org/D158367	2023-08-25 11:50:47 -04:00
Matt Arsenault	61c8af6792	AMDGPU: InstCombine amdgcn.sqrt.f16 to sqrt.f16 There's nothing special about f16 sqrt handling. https://reviews.llvm.org/D158090	2023-08-23 20:30:40 -04:00
Changpeng Fang	d77c62053c	[clang][AMDGPU]: Don't use byval for struct arguments in function ABI Summary: Byval requires allocating additional stack space, and always requires an implicit copy to be inserted in codegen, where it can be difficult to optimize. In this work, we use byref/IndirectAliased promotion method instead of byval with the implicit copy semantics. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D155986	2023-08-11 16:37:42 -07:00
Matt Arsenault	9e3d9c9eae	clang: Add __builtin_elementwise_sqrt This will be used in the opencl builtin headers to provide direct intrinsic access with proper !fpmath metadata. https://reviews.llvm.org/D156737	2023-08-11 19:32:39 -04:00
Changpeng Fang	4608686111	[clang][test] Fix LIT test failures for the following commit commit `c1803d5366` (HEAD -> main, origin/main, origin/HEAD) Author: Changpeng Fang <changpeng.fang@amd.com> Date: Wed Aug 9 17:49:14 2023 -0700 [FunctionAttrs] Unconditionally perform argument attribute inference in the first function-attrs pass Differential Revision: https://reviews.llvm.org/D156397	2023-08-09 18:23:18 -07:00
Yaxun (Sam) Liu	ac72531043	[Driver] Add `-f[no-]offload-uniform-block` By default, clang assumes HIP kernels are launched with uniform block size, which is the case for kernels launched through triple chevron or hipLaunchKernelGGL. Clang adds uniform-work-group-size function attribute to HIP kernels to allow the backend to do optimizations on that. However, in some rare cases, HIP kernels can be launched through hipExtModuleLaunchKernel where global work size is specified, which may result in non-uniform block size. To be able to support non-uniform block size for HIP kernels, an option `-f[no-]offload-uniform-block` is added. This option is generic for offloading languages. Its default value is on for CUDA/HIP and off otherwise. Make -cl-uniform-work-group-size an alias to -foffload-uniform-block. Reviewed by: Siu Chi Chan, Matt Arsenault, Fangrui Song, Johannes Doerfert Differential Revision: https://reviews.llvm.org/D155213 Fixes: SWDEV-406592	2023-07-27 16:36:02 -04:00
ranapratap55	970569b6cc	[AMDGPU] __builtin_amdgcn_read_exec_* should be implemented with llvm.amdgcn.ballot Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156219	2023-07-26 16:21:31 +05:30
Nick Desaulniers	b54294e2c9	[clang][ConstantEmitter] have tryEmitPrivate[ForVarInit] try ConstExprEmitter fast-path first As suggested by @efriedma in: https://reviews.llvm.org/D76096#4370369 This should speed up evaluating whether an expression is constant or not, but due to the complexity of these two different implementations, we may start getting different answers for edge cases for which we do not yet have test cases in-tree (or perhaps even performance regressions for some cases). As such, contributors have carte blanche to revert if necessary. For additional historical context about ExprConstant vs CGExprConstant, here's snippets from a private conversation on discord: ndesaulniers: why do we have clang/lib/AST/ExprConstant.cpp and clang/lib/CodeGen/CGExprConstant.cpp? Does clang constant fold during ast walking/creation AND during LLVM codegen? efriedma: originally, clang needed to handle two things: integer constant expressions (the "5" in "int x[5];"), and constant global initializers (the "5" in "int x = 5;"). pre-C++11, the two could be handled mostly separately; so we had the code for integer constants in AST/, and the code for globals in CodeGen/. C++11 constexpr sort of destroyed that separation, though. so now we do both kinds of constant evaluation on the AST, then CGExprConstant translates the result of that evaluation to LLVM IR. but we kept around some bits of the old cgexprconstant to avoid performance/memory usage regressions on large arrays. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D151587	2023-07-24 13:50:45 -07:00
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Matt Arsenault	bac2a07540	clang: Attach !fpmath metadata to __builtin_sqrt based on language flags OpenCL and HIP have -cl-fp32-correctly-rounded-divide-sqrt and -fno-hip-correctly-rounded-divide-sqrt. The corresponding fpmath metadata was only set on fdiv, and not sqrt. The backend is currently underutilizing sqrt lowering options, and the responsibility is split between the libraries and backend and this metadata is needed. CUDA/NVCC has -prec-div and -prec-sqrt but clang doesn't appear to be aiming for compatibility with those. Don't know if OpenMP has a similar control.	2023-07-14 18:46:18 -04:00
Kevin P. Neal	91f886a40d	[FPEnv][TableGen] Add strictfp attribute to constrained intrinsics by default. In D146869 @arsenm pointed out that the constrained intrinsics aren't getting the strictfp attribute by default. They should be since they are required to have it anyway. TableGen did not know about this attribute until now. This patch adds strictfp to TableGen, and it uses it on all of the constrained intrinsics. Differential Revision: https://reviews.llvm.org/D154991	2023-07-12 09:55:53 -04:00
Matt Arsenault	42d4c85ca8	clang: Stop emitting "strictfp" The attribute is a proper enum attribute, strictfp. We were getting strictfp and "strictfp" set on every function with -fexperimental-strict-floating-point. https://reviews.llvm.org/D139629	2023-07-07 15:28:21 -04:00
Matt Arsenault	75b7901901	clang: Regenerate test checks	2023-07-07 15:28:21 -04:00
Matt Arsenault	b15bf305ca	Reapply "clang: Use new frexp intrinsic for builtins and add f16 version" This reverts commit `0c545a4412`. ARM libcall expansion was fixed in `160d7227e0`	2023-06-30 09:07:23 -04:00
Hans Wennborg	0c545a4412	Revert "clang: Use new frexp intrinsic for builtins and add f16 version" This caused asserts in some Android and Windows builds: SelectionDAGNodes.h:1138: llvm::SDValue::SDValue(SDNode *, unsigned int): Assertion `(!Node || !ResNo || ResNo < Node->getNumValues()) && "Invalid result number for the given node!"' failed. See comment on `85bdea023f` Also revert "HIP: Use frexp builtins in math headers" which seems to depend on this change. This reverts commit `85bdea023f`. This reverts commit `bf8e92c0e7`.	2023-06-30 13:26:25 +02:00
Sameer Sahasrabuddhe	7a101798b7	Revert "[AMDGPU] Mark mbcnt as convergent" This reverts commit `37114036aa`. The output of mbcnt does not depend on other active lanes, and hence it is not convergent. The original change was made as a possible fix for https://github.com/ROCm-Developer-Tools/HIP/issues/3172 But changing mbcnt does not fix that issue. Reviewed By: ruiling, foad, yaxunl Differential Revision: https://reviews.llvm.org/D153953	2023-06-30 13:10:44 +05:30
Matt Arsenault	85bdea023f	clang: Use new frexp intrinsic for builtins and add f16 version	2023-06-28 14:50:17 -04:00
Arthur Eubanks	457dc72fdd	Reland [InstCombine] Infer inbounds for more GEPs of dereferenceable pointers Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments. Fixed clang test broken in initial land. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153815	2023-06-27 09:31:20 -07:00
Matt Arsenault	b84721df63	clang/AMDGPU: Emit atomicrmw for atomic_inc/dec builtins This makes the scope and ordering arguments actually do something. Also add some new OpenCL tests since the existing HIP tests didn't cover address spaces.	2023-06-16 20:18:50 -04:00
Matt Arsenault	28f3edd2be	AMDGPU: Add llvm.amdgcn.exp2 intrinsic Provide direct access to v_exp_f32 and v_exp_f16, so we can start correctly lowering the generic exp intrinsics. Unfortunately have to break from the usual naming convention of matching the instruction name and stripping the v_ prefix. exp is already taken by the export intrinsic. On the clang builtin side, we have a choice of maintaining the convention to the instruction name, or following the intrinsic name.	2023-06-15 07:00:07 -04:00
Matt Arsenault	eccc89b26c	AMDGPU: Add llvm.amdgcn.log intrinsic This will map directly to the hardware instruction which does not handle denormals for f32. This will allow moving the generic intrinsic to be lowered correctly. Also handles selecting the f16 version, but there's no reason to use it over the generic intrinsic.	2023-06-12 21:10:30 -04:00
Nikita Popov	066fb7a58c	[Clang] Remove -no-opaque-pointers cc1 flag Migration of clang tests to opaque pointers is finished, so remove the -no-opaque-pointers flag. Differential Revision: https://reviews.llvm.org/D152447	2023-06-08 17:52:20 +02:00
Matt Arsenault	8a21ea1d0a	clang: Start emitting intrinsic for __builtin_ldexp* Also introduce __builtin_ldexpf16.	2023-06-06 17:07:19 -04:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Sergei Barannikov	cc7dc90481	[test] Fix const-str-array-decay.cl failure on PowerPC D150520 converted the test to use opaque pointers. The updated version fails on PowerPC because of different return type of the function. This patch resolves the failure by removing the return type check; it also makes the test look more like it was before the conversion to prevent other potential issues caused by ABI differences across targets.	2023-05-15 19:56:28 +03:00
Sergei Barannikov	f46b0e6d75	[clang] Convert a few tests to opaque pointers Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D150520	2023-05-14 21:00:15 +03:00
Konstantin Zhuravlyov	9d05727972	AMDGPU: Add basic gfx942 target Differential Revision: https://reviews.llvm.org/D149983	2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov	1fc70210a6	AMDGPU: Add basic gfx941 target Differential Revision: https://reviews.llvm.org/D149982	2023-05-10 11:51:06 -04:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit `3f2fbe92d0`. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit `f9c1ede254`. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. These ptr addrspace(8) pointers are 128 bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00

693 Commits