clang-p2996

Author	SHA1	Message	Date
Matt Arsenault	bac2a07540	clang: Attach !fpmath metadata to __builtin_sqrt based on language flags OpenCL and HIP have -cl-fp32-correctly-rounded-divide-sqrt and -fno-hip-correctly-rounded-divide-sqrt. The corresponding fpmath metadata was only set on fdiv, and not sqrt. The backend is currently underutilizing sqrt lowering options, and the responsibility is split between the libraries and backend and this metadata is needed. CUDA/NVCC has -prec-div and -prev-sqrt but clang doesn't appear to be aiming for compatibility with those. Don't know if OpenMP has a similar control.	2023-07-14 18:46:18 -04:00
boxu.zhang	f05b58a946	[clang] Support '-fgpu-default-stream=per-thread' for NVIDIA CUDA I'm using clang to compile CUDA code. And just found that clang doesn't support the per-thread stream option for NV CUDA. I don't know if there is another solution. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D154822	2023-07-13 16:54:57 -07:00
root	250f2bb2c6	adding bf16 support to NVPTX Currently, bf16 has been scatteredly added to the PTX codegen. This patch aims to complete the set of instructions and code path required to support bf16 data type. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D144911 Co-authored-by: Artem Belevich <tra@google.com>	2023-06-28 11:57:13 -07:00
pvanhout	23431b5246	[clang][CodeGen] Fix GPU-specific attributes being dropped by bitcode linking Device libs make use of patterns like this: ``` __attribute__((target("gfx11-insts"))) static unsigned do_intrin_stuff(void) { return __builtin_amdgcn_s_sendmsg_rtnl(0x0); } ``` For functions that are assumed to be eliminated if the currennt GPU target doesn't support them. At O0 such functions aren't eliminated by common optimizations but often by AMDGPURemoveIncompatibleFunctions instead, which sees the "+gfx11-insts" attribute on, say, GFX9 and knows it's not valid, so it removes the function. D142907 accidentally made it so such attributes were dropped during bitcode linking, making it impossible for RemoveIncompatibleFunctions to catch the functions and causing ISel to catch fire eventually. This fixes the issue and adds a new test to ensure we don't accidentally fall into this trap again. Fixes SWDEV-403642 Reviewed By: arsenm, yaxunl Differential Revision: https://reviews.llvm.org/D152251	2023-06-07 15:51:52 +02:00
Yaxun (Sam) Liu	f2677afe91	[CUDA][HIP] Externalize device var in anonymous namespace Device variables in an anonymous namespace may be referenced by host code, therefore they need to be externalized in a similar way as a static device variables or kernels in an anonymous namespace. Fixes: https://github.com/ROCm-Developer-Tools/HIP/issues/3246 Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D152164	2023-06-06 12:03:48 -04:00
Artem Belevich	dc90f42ea7	Coalesce 16-bit FP types to use integer register classes. i16/f16/bf16 will use the same .b16 registers and i32/v2f16 and v2bf16 will share .b32 registers. The changes are mostly mechanical, intended to remove unnecessary register classes which tend to produce redundant register moves. Differential Revision: https://reviews.llvm.org/D151601 v2f16 regtype conversion to i32	2023-06-05 12:21:52 -07:00
Yaxun (Sam) Liu	00448a548c	[clang] Allow fp in atomic fetch max/min builtins LLVM IR already allows floating point type in atomicrmw. Update clang atomic fetch max/min builtins to accept floating point type like we did for fetch add/sub. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D150985 Fixes: SWDEV-401056	2023-05-31 15:19:31 -04:00
Luke Drummond	e3fbede7f3	[HIP] Add missing __hip_atomic_fetch_sub support The rest of the fetch/op intrinsics were added in `e13246a2ec` but sub was conspicuous by its absence. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D151701	2023-05-30 22:22:43 +01:00
M. Zeeshan Siddiqui	e621757365	[Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling, and extend excess precision support Pursuant to discussions at https://discourse.llvm.org/t/rfc-c-23-p1467r9-extended-floating-point-types-and-standard-names/70033/22, this commit enhances the handling of the __bf16 type in Clang. - Firstly, it upgrades __bf16 from a storage-only type to an arithmetic type. - Secondly, it changes the mangling of __bf16 to DF16b on all architectures except ARM. This change has been made in accordance with the finalization of the mangling for the std::bfloat16_t type, as discussed at https://github.com/itanium-cxx-abi/cxx-abi/pull/147. - Finally, this commit extends the existing excess precision support to the __bf16 type. This applies to hardware architectures that do not natively support bfloat16 arithmetic. Appropriate tests have been added to verify the effects of these changes and ensure no regressions in other areas of the compiler. Reviewed By: rjmccall, pengfei, zahiraam Differential Revision: https://reviews.llvm.org/D150913	2023-05-27 13:33:50 +08:00
Artem Belevich	25708b3df6	[NVPTX, CUDA] barrier intrinsics and builtins for sm_90 Differential Revision: https://reviews.llvm.org/D151363	2023-05-25 11:57:57 -07:00
Artem Belevich	0a0bae1e9f	[CUDA] plumb through new sm_90-specific builtins. Differential Revision: https://reviews.llvm.org/D151168	2023-05-25 11:57:56 -07:00
Sergei Barannikov	f46b0e6d75	[clang] Convert a few tests to opaque pointers Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D150520	2023-05-14 21:00:15 +03:00
Matt Arsenault	bc37be1855	LangRef: Add "dynamic" option to "denormal-fp-math" This is stricter than the default "ieee", and should probably be the default. This patch leaves the default alone. I can change this in a future patch. There are non-reversible transforms I would like to perform which are legal under IEEE denormal handling, but illegal with flushing zero behavior. Namely, conversions between llvm.is.fpclass and fcmp with zeroes. Under "ieee" handling, it is legal to translate between llvm.is.fpclass(x, fcZero) and fcmp x, 0. Under "preserve-sign" handling, it is legal to translate between llvm.is.fpclass(x, fcSubnormal\|fcZero) and fcmp x, 0. I would like to compile and distribute some math library functions in a mode where it's callable from code with and without denormals enabled, which requires not changing the compares with denormals or zeroes. If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0, it is no longer possible to call the function from code with denormals enabled, or write an optimization to move the function into a denormal flushing mode. For the original function, if x was a denormal, the class would evaluate to false. If the function compiled with denormal handling was converted to or called from a preserve-sign function, the fcmp now evaluates to true. This could also be of use for strictfp handling, where code may be changing the denormal mode. Alternative name could be "unknown". Replaces the old AMDGPU custom inlining logic with more conservative logic which tries to permit inlining for callees with dynamic handling and avoids inlining other mismatched modes.	2023-04-29 08:44:59 -04:00
Artem Belevich	2aa90da012	[CUDA] Update cached kernel handle when the function instance changes. Fixes clang crash caused by a stale function pointer. The bug has been present for a pretty long time, but we were lucky not to trigger it until D140663. Differential Revision: https://reviews.llvm.org/D146448	2023-03-21 15:36:12 -07:00
Nikita Popov	06621ecdaf	[Clang] Convert some tests to opaque pointers (NFC)	2023-02-17 15:08:50 +01:00
Matt Arsenault	7cbbbc0a04	clang: Rename misleading test name This is a test for attribute propagation, not metadata	2023-02-15 08:32:57 -04:00
Yaxun (Sam) Liu	f4d8b8781d	[AMDGPU ASAN] Remove reference to asan bitcode library The asan functions are now attributed as used in the device library, no need to keep the declaration of asan device preserve function. Patch by: Praveen Velliengiri Reviewed by: Yaxun Liu Differential Revision: https://reviews.llvm.org/D143495	2023-02-14 11:52:41 -05:00
Daniele Castagna	32c26e27b6	CUDA/HIP: Use kernel name to map to symbol Currently CGCUDANV uses an llvm::Function as a key to map kernels to a symbol in host code. HIP adds one level of indirection and uses the llvm::Function to map to a global variable that will be initialized to the kernel stub ptr. Unfortunately there is no garantee that the llvm::Function created by GetOrCreateLLVMFunction will be the same. In fact, the first time we encounter GetOrCrateLLVMFunction for a kernel, the type might not be completed yet, and the type of llvm::Function will be a generic {}, since the complete type is not required to get a symbol to a function. In this case we end up creating two global variables, one for the llvm::Function with the incomplete type and one for the function with the complete type. The first global variable will be declared by not defined, resulting in a linking error. This change uses the mangled name of the llvm::Function as key in the KernelHandles map, in this way the same llvm::Function will be associated to the same kernel handle even if they types are different. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D140663	2023-01-19 15:02:14 -08:00
Nikita Popov	b164b047f2	[Clang] Convert test to opaque pointers (NFC)	2023-01-17 10:08:57 +01:00
Nikita Popov	02856565ac	[Clang] Emit noundef metadata next to range metadata To preserve the previous semantics after D141386, adjust places that currently emit !range metadata to also emit !noundef metadata. This retains range violation as immediate undefined behavior, rather than just poison. Differential Revision: https://reviews.llvm.org/D141494	2023-01-12 10:03:05 +01:00
Matt Arsenault	2e9c663ab4	clang/AMDGPU: Add missing tests for some builtin These were tested under opencl but need hip testing for the potential addrspacecasts.	2023-01-10 13:07:01 -05:00
Pierre van Houtryve	678d8946ba	[AMDGPU] Add bf16 storage support - [Clang] Declare AMDGPU target as supporting BF16 for storage-only purposes on amdgcn - Add Sema & CodeGen tests cases. - Also add cases that D138651 would have covered as this patch replaces it. - [AMDGPU] Add BF16 storage-only support - Support legalization/dealing with bf16 operations in DAGIsel. - bf16 as a type remains illegal and is represented as i16 for storage purposes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139398	2022-12-13 10:34:26 -05:00
Nikita Popov	0419465fa4	[Clang] Update some CUDA tests to opaque pointers (NFC)	2022-12-13 11:50:08 +01:00
Nikita Popov	9466b49171	[Clang] Convert various tests to opaque pointers (NFC) These were all tests where no manual fixup was required.	2022-12-12 17:11:46 +01:00
Yaxun (Sam) Liu	262c3034bb	Revert "[AMDGPU] Disable bool range metadata to workaround backend issue" This reverts commit `107ee26130` to facilitate investigating and fixing the root cause. Differential Revision: https://reviews.llvm.org/D135269	2022-12-08 17:35:10 -05:00
Ron Lieberman	ca856fff1c	Revert "enable code-object-version=5" very sorry wrong repo. This reverts commit `d882ba7aea`.	2022-11-29 15:21:09 -06:00
Ron Lieberman	d882ba7aea	enable code-object-version=5	2022-11-29 15:11:57 -06:00
Pierre van Houtryve	c05f1639f7	[clang][cuda/hip] Allow `__noinline__` lambdas D124866 seem to have had an unintended side effect: __noinline__ on lambdas was no longer accepted. This fixes the regression and adds a test case for it. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D137251	2022-11-04 07:33:31 +00:00
Artem Belevich	0e8a414ab3	[CUDA, NVPTX] Added basic __bf16 support for NVPTX. Recent Clang changes expose _bf16 types for SSE2-enabled host compilations and that makes those types visible furing GPU-side compilation, where it currently fails with Sema complaining that __bf16 is not supported. Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's enabled on the host should pose no issues, correctness-wise. Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better support for __bf16 on NVPTX going forward. Differential Revision: https://reviews.llvm.org/D136311	2022-10-25 11:08:06 -07:00
Artem Belevich	a10eb07d1a	Do not append terminating NUL to the binary string with embedded fatbin. Extra NUL does not impact functionality of the generated code, but it confuses various NVIDIA tools used to examine embedded GPU binaries. Differential Revision: https://reviews.llvm.org/D135832	2022-10-17 15:39:39 -07:00
Fangrui Song	ccd89be645	[Driver] Remove legacy cc1 alias -mlink-cuda-bitcode r340193 added -mlink-builtin-bitcode as the canonical spelling.	2022-10-15 14:02:13 -07:00
Zahira Ammarguellat	84a9ec2ff1	Remove redundant option -menable-unsafe-fp-math. There are currently two options that are used to tell the compiler to perform unsafe floating-point optimizations: '-ffast-math' and '-funsafe-math-optimizations'. '-ffast-math' is enabled by default. It automatically enables the driver option '-menable-unsafe-fp-math'. Below is a table illustrating the special operations enabled automatically by '-ffast-math', '-funsafe-math-optimizations' and '-menable-unsafe-fp-math' respectively. Special Operations -ffast-math -funsafe-math-optimizations -menable-unsafe-fp-math MathErrno 0 1 1 FiniteMathOnly 1 0 0 AllowFPReassoc 1 1 1 NoSignedZero 1 1 1 AllowRecip 1 1 1 ApproxFunc 1 1 1 RoundingMath 0 0 0 UnsafeFPMath 1 0 1 FPContract fast on on '-ffast-math' enables '-fno-math-errno', '-ffinite-math-only', '-funsafe-math-optimzations' and sets 'FpContract' to 'fast'. The driver option '-menable-unsafe-fp-math' enables the same special options than '-funsafe-math-optimizations'. This is redundant. We propose to remove the driver option '-menable-unsafe-fp-math' and use instead, the setting of the special operations to set the function attribute 'unsafe-fp-math'. This attribute will be enabled only if those special operations are enabled and if 'FPContract' is either 'fast' or set to the default value. Differential Revision: https://reviews.llvm.org/D135097	2022-10-14 10:55:29 -04:00
Yaxun (Sam) Liu	eb26baf4ad	Fix test bool-range.cu Promoting kernel arg pointer to global addr space is only available with registered amdgcn target. Fix test so that it does not require registered amdgcn target.	2022-10-07 11:17:28 -04:00
Yaxun (Sam) Liu	107ee26130	[AMDGPU] Disable bool range metadata to workaround backend issue Currently there is a middle-end or backend issue https://github.com/llvm/llvm-project/issues/58176 which causes values loaded from bool pointer incorrect when bool range metadata is emitted. Temporarily disable bool range metadata until the backend issue is fixed. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D135269 Fixes: SWDEV-344137	2022-10-07 10:46:04 -04:00
Yaxun (Sam) Liu	5e25284dbc	[AMDGPU] Emit module flag for all code object versions Reviewed by: Changpeng Fang, Matt Arsenault, Brian Sumner Differential Revision: https://reviews.llvm.org/D134355	2022-09-22 16:51:33 -04:00
Fangrui Song	74742147ee	[test] Change cc1 -fvisibility to -fvisibility=	2022-09-02 12:36:44 -07:00
Johannes Doerfert	48d6f52401	[CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive A copy-paste error caused UB in the definition of the unsigned long long versions of the shfl intrinsics. Reported and diagnosed by @trws. Differential Revision: https://reviews.llvm.org/D129536	2022-07-21 12:36:54 -05:00
Joseph Huber	b370be37cc	[CUDA] Allow the new driver to compile CUDA in non-RDC mode The new driver primarily allows us to support RDC-mode compilations with proper linking. This is not needed for non-RDC mode compilation, but we still would like the new driver to be able to handle this mode so we can transition away from the old driver in the future. This patch adds the necessary code to support creating a fatbinary for CUDA code generation as well as removing old assumptions and errors about RDC-mode with the new driver. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D129655	2022-07-13 21:49:15 -04:00
Joseph Huber	e88d53d25f	[HIP] Generate offloading entries for HIP with the new driver. This patch adds the small change required to output offloading entried for HIP instead of CUDA. These should be placed in different sections so because they need to be distinct to the offloading toolchain, otherwise we'd have HIP trying to register CUDA kernels or vice-versa. This patch will precede support for HIP in the linker wrapper. Reviewed By: yaxunl, tra Differential Revision: https://reviews.llvm.org/D128850	2022-07-11 15:49:21 -04:00
Nikita Popov	935570b2ad	[ConstExpr] Don't create div/rem expressions This removes creation of udiv/sdiv/urem/srem constant expressions, in preparation for their removal. I've added a ConstantExpr::isDesirableBinOp() predicate to determine whether an expression should be created for a certain operator. With this patch, div/rem expressions can still be created through explicit IR/bitcode, forbidding them entirely will be the next step. Differential Revision: https://reviews.llvm.org/D128820	2022-07-05 15:54:53 +02:00
Yaxun (Sam) Liu	8ad4c6e4b1	[HIP] add -fhip-kernel-arg-name Add option -fhip-kernel-arg-name to emit kernel argument name metadata, which is needed for certain HIP applications. Reviewed by: Artem Belevich, Fangrui Song, Brian Sumner Differential Revision: https://reviews.llvm.org/D128022	2022-06-24 11:15:36 -04:00
Nikita Popov	6f258c0fd3	[Clang] Don't test register allocation This test was broken by `719658d078`. How did an assembly test get into clang/test?	2022-06-23 12:46:00 +02:00
Yaxun (Sam) Liu	af9ee3357c	[HIP] fix long double size For amdgpu target long double type is the same as double type. The width and align of long double type was incorrectly overridden when copying aux target properties, which caused assertion in codegen when emitting global variables with long double type. This patch fix that by saving and restoring width and align of long double type. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D127771 Fixes: SWDEV-335515	2022-06-14 21:57:56 -04:00
Nikita Popov	41d5033eb1	[IR] Enable opaque pointers by default This enabled opaque pointers by default in LLVM. The effect of this is twofold: * If IR that contains neither explicit ptr nor %T* types is passed to tools, we will now use opaque pointer mode, unless -opaque-pointers=0 has been explicitly passed. * Users of LLVM as a library will now default to opaque pointers. It is possible to opt-out by calling setOpaquePointers(false) on LLVMContext. A cmake option to toggle this default will not be provided. Frontends or other tools that want to (temporarily) keep using typed pointers should disable opaque pointers via LLVMContext. Differential Revision: https://reviews.llvm.org/D126689	2022-06-02 09:40:56 +02:00
Joseph Huber	1bae02b773	[Cuda] Use fallback method to mangle externalized decls if no CUID given CUDA requires that static variables be visible to the host when offloading. However, The standard semantics of a stiatc variable dictate that it should not be visible outside of the current file. In order to access it from the host we need to perform "externalization" on the static variable on the device. This requires generating a semi-unique name that can be affixed to the variable as to not cause linker errors. This is currently done using the CUID functionality, an MD5 hash value set up by the clang driver. This allows us to achieve is mostly unique ID that is unique even between multiple compilations of the same file. However, this is not always availible. Instead, this patch uses the unique ID from the file to generate a unique symbol name. This will create a unique name that is consistent between the host and device side compilations without requiring the CUID to be entered by the driver. The one downside to this is that we are no longer stable under multiple compilations of the same file. However, this is a very niche use-case and is not supported by Nvidia's CUDA compiler so it likely to be good enough. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D125904	2022-05-26 09:18:22 -04:00
Joseph Huber	0035f7154c	[CUDA] Create offloading entries when using the new driver The changes made in D123460 generalized the code generation for OpenMP's offloading entries. We can use the same scheme to register globals for CUDA code. This patch adds the code generation to create these offloading entries when compiling using the new offloading driver mode. The offloading entries are simple structs that contain the information necessary to register the global. The struct used is as follows: ``` Type struct __tgt_offload_entry { void addr; // Pointer to the offload entry info. // (function or global) char name; // Name of the function or global. size_t size; // Size of the entry info (0 if it a function). int32_t flags; int32_t reserved; }; ``` Currently CUDA handles RDC code generation by deferring the registration of globals in the current TU to a callback function containing the modules ID. Later all the module IDs will be used to register all of the globals at once. Rather than mimic this, offloading entries allow us to mimic the way OpenMP registers globals. That is, we create a simple global struct for each device global to be registered. These are placed at a special section `cuda_offloading_entires`. Because this section is a valid C-identifier, the linker will profide a `__start` and `__stop` pointer that we can use to iterate and register all globals at runtime. the registration requires a flag variable to indicate which registration function to use. I have assigned the flags somewhat arbitrarily, but these use the following values. Kernel: 0 Variable: 0 Managed: 1 Surface: 2 Texture: 3 Depends on D120272 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D123471	2022-05-11 07:30:21 -04:00
Yaxun (Sam) Liu	afc9d674fe	[CUDA][HIP] support __noinline__ as keyword CUDA/HIP programs use __noinline__ like a keyword e.g. __noinline__ void foo() {} since __noinline__ is defined as a macro __attribute__((noinline)) in CUDA/HIP runtime header files. However, gcc and clang supports __attribute__((__noinline__)) the same as __attribute__((noinline)). Some C++ libraries use __attribute__((__noinline__)) in their header files. When CUDA/HIP programs include such header files, clang will emit error about invalid attributes. This patch fixes this issue by supporting __noinline__ as a keyword, so that CUDA/HIP runtime could remove the macro definition. Reviewed by: Aaron Ballman, Artem Belevich Differential Revision: https://reviews.llvm.org/D124866	2022-05-10 14:32:27 -04:00
Yaxun (Sam) Liu	11d3e31c60	[CUDA][HIP] Fix mangling number for local struct MSVC and Itanium mangling use different mangling numbers for function-scope structs, which causes inconsistent mangled kernel names in device and host compilations. This patch uses Itanium mangling number for structs in for mangling device side names in CUDA/HIP host compilation on Windows to fix this issue. A state is added to ASTContext to indicate whether the current name mangling is for device side names in host compilation. Device and host mangling number are encoded/decoded as upper and lower half of 32 bit unsigned integer to fit into the original mangling number field for AST. Diagnostic will be emitted if a manglining number exceeds limit. Reviewed by: Artem Belevich, Reid Kleckner Differential Revision: https://reviews.llvm.org/D122734 Fixes: SWDEV-328515	2022-04-28 19:54:43 -04:00
Yaxun (Sam) Liu	57a210e5b7	[CUDA][HIP] Fix linkage of __clang_gpu_used_external Different TU's may have this globl var. appending linkage can only be used with lld recognized special variables. Change it to internal linkage. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D124466	2022-04-26 20:43:39 -04:00
David Green	9727c77d58	[NFC] Rename Instrinsic to Intrinsic	2022-04-25 18:13:23 +01:00

1 2 3 4 5 ...

340 Commits