clang-p2996

Author	SHA1	Message	Date
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Matt Arsenault	bac2a07540	clang: Attach !fpmath metadata to __builtin_sqrt based on language flags OpenCL and HIP have -cl-fp32-correctly-rounded-divide-sqrt and -fno-hip-correctly-rounded-divide-sqrt. The corresponding fpmath metadata was only set on fdiv, and not sqrt. The backend is currently underutilizing sqrt lowering options, and the responsibility is split between the libraries and backend and this metadata is needed. CUDA/NVCC has -prec-div and -prec-sqrt but clang doesn't appear to be aiming for compatibility with those. Don't know if OpenMP has a similar control.	2023-07-14 18:46:18 -04:00
Kevin P. Neal	91f886a40d	[FPEnv][TableGen] Add strictfp attribute to constrained intrinsics by default. In D146869 @arsenm pointed out that the constrained intrinsics aren't getting the strictfp attribute by default. They should be since they are required to have it anyway. TableGen did not know about this attribute until now. This patch adds strictfp to TableGen, and it uses it on all of the constrained intrinsics. Differential Revision: https://reviews.llvm.org/D154991	2023-07-12 09:55:53 -04:00
Matt Arsenault	42d4c85ca8	clang: Stop emitting "strictfp" The attribute is a proper enum attribute, strictfp. We were getting strictfp and "strictfp" set on every function with -fexperimental-strict-floating-point. https://reviews.llvm.org/D139629	2023-07-07 15:28:21 -04:00
Matt Arsenault	75b7901901	clang: Regenerate test checks	2023-07-07 15:28:21 -04:00
Matt Arsenault	b15bf305ca	Reapply "clang: Use new frexp intrinsic for builtins and add f16 version" This reverts commit `0c545a4412`. ARM libcall expansion was fixed in `160d7227e0`	2023-06-30 09:07:23 -04:00
Hans Wennborg	0c545a4412	Revert "clang: Use new frexp intrinsic for builtins and add f16 version" This caused asserts in some Android and Windows builds: SelectionDAGNodes.h:1138: llvm::SDValue::SDValue(SDNode *, unsigned int): Assertion `(!Node || !ResNo || ResNo < Node->getNumValues()) && "Invalid result number for the given node!"' failed. See comment on `85bdea023f` Also revert "HIP: Use frexp builtins in math headers" which seems to depend on this change. This reverts commit `85bdea023f`. This reverts commit `bf8e92c0e7`.	2023-06-30 13:26:25 +02:00
Sameer Sahasrabuddhe	7a101798b7	Revert "[AMDGPU] Mark mbcnt as convergent" This reverts commit `37114036aa`. The output of mbcnt does not depend on other active lanes, and hence it is not convergent. The original change was made as a possible fix for https://github.com/ROCm-Developer-Tools/HIP/issues/3172 But changing mbcnt does not fix that issue. Reviewed By: ruiling, foad, yaxunl Differential Revision: https://reviews.llvm.org/D153953	2023-06-30 13:10:44 +05:30
Matt Arsenault	85bdea023f	clang: Use new frexp intrinsic for builtins and add f16 version	2023-06-28 14:50:17 -04:00
Arthur Eubanks	457dc72fdd	Reland [InstCombine] Infer inbounds for more GEPs of dereferenceable pointers Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments. Fixed clang test broken in initial land. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153815	2023-06-27 09:31:20 -07:00
Matt Arsenault	b84721df63	clang/AMDGPU: Emit atomicrmw for atomic_inc/dec builtins This makes the scope and ordering arguments actually do something. Also add some new OpenCL tests since the existing HIP tests didn't cover address spaces.	2023-06-16 20:18:50 -04:00
Matt Arsenault	28f3edd2be	AMDGPU: Add llvm.amdgcn.exp2 intrinsic Provide direct access to v_exp_f32 and v_exp_f16, so we can start correctly lowering the generic exp intrinsics. Unfortunately have to break from the usual naming convention of matching the instruction name and stripping the v_ prefix. exp is already taken by the export intrinsic. On the clang builtin side, we have a choice of maintaining the convention to the instruction name, or following the intrinsic name.	2023-06-15 07:00:07 -04:00
Matt Arsenault	eccc89b26c	AMDGPU: Add llvm.amdgcn.log intrinsic This will map directly to the hardware instruction which does not handle denormals for f32. This will allow moving the generic intrinsic to be lowered correctly. Also handles selecting the f16 version, but there's no reason to use it over the generic intrinsic.	2023-06-12 21:10:30 -04:00
Nikita Popov	066fb7a58c	[Clang] Remove -no-opaque-pointers cc1 flag Migration of clang tests to opaque pointers is finished, so remove the -no-opaque-pointers flag. Differential Revision: https://reviews.llvm.org/D152447	2023-06-08 17:52:20 +02:00
Matt Arsenault	8a21ea1d0a	clang: Start emitting intrinsic for __builtin_ldexp* Also introduce __builtin_ldexpf16.	2023-06-06 17:07:19 -04:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Sergei Barannikov	cc7dc90481	[test] Fix const-str-array-decay.cl failure on PowerPC D150520 converted the test to use opaque pointers. The updated version fails on PowerPC because of different return type of the function. This patch resolves the failure by removing the return type check; it also makes the test look more like it was before the conversion to prevent other potential issues caused by ABI differences across targets.	2023-05-15 19:56:28 +03:00
Sergei Barannikov	f46b0e6d75	[clang] Convert a few tests to opaque pointers Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D150520	2023-05-14 21:00:15 +03:00
Konstantin Zhuravlyov	9d05727972	AMDGPU: Add basic gfx942 target Differential Revision: https://reviews.llvm.org/D149983	2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov	1fc70210a6	AMDGPU: Add basic gfx941 target Differential Revision: https://reviews.llvm.org/D149982	2023-05-10 11:51:06 -04:00
Krzysztof Drewniak	f0415f2a45	Re-land "[AMDGPU] Define data layout entries for buffers" Re-land D145441 with data layout upgrade code fixed to not break OpenMP. This reverts commit `3f2fbe92d0`. Differential Revision: https://reviews.llvm.org/D149776	2023-05-03 19:43:56 +00:00
Krzysztof Drewniak	3f2fbe92d0	Revert "[AMDGPU] Define data layout entries for buffers" This reverts commit `f9c1ede254`. Differential Revision: https://reviews.llvm.org/D149758	2023-05-03 16:11:00 +00:00
Krzysztof Drewniak	f9c1ede254	[AMDGPU] Define data layout entries for buffers Per discussion at https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798, we define two new address spaces for AMDGCN targets. The first is address space 7, a non-integral address space (which was already in the data layout) that has 160-bit pointers (which are 256-bit aligned) and uses a 32-bit offset. These pointers combine a 128-bit buffer descriptor and a 32-bit offset, and will be usable with normal LLVM operations (load, store, GEP). However, they will be rewritten out of existence before code generation. The second of these is address space 8, the address space for "buffer resources". These will be used to represent the resource arguments to buffer instructions, and new buffer intrinsics will be defined that take them instead of <4 x i32> as resource arguments. ptr addrspace(8). These pointers are 128-bits long (with the same alignment). They must not be used as the arguments to getelementptr or otherwise used in address computations, since they can have arbitrarily complex inherent addressing semantics that can't be represented in LLVM. Even though, like their address space 7 cousins, these pointers have deterministic ptrtoint/inttoptr semantics, they are defined to be non-integral in order to prevent optimizations that rely on pointers being a [0, [addr_max]] value from applying to them. Future work includes: - Defining new buffer intrinsics that take ptr addrspace(8) resources. - A late rewrite to turn address space 7 operations into buffer intrinsics and offset computations. This commit also updates the "fallback address space" for buffer intrinsics to the buffer resource, and updates the alias analysis table. Depends on D143437 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D145441	2023-05-03 15:25:58 +00:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Anshil Gandhi	a955a31896	[AMDGPU] Replace target feature for global fadd32 Change target feature of __builtin_amdgcn_global_atomic_fadd_f32 to atomic-fadd-rtn-insts. Enable atomic-fadd-rtn-insts for gfx90a, gfx940 and gfx1100 as they all support the return variant of `global_atomic_add_f32`. Fixes https://github.com/llvm/llvm-project/issues/61331. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D146840	2023-03-28 15:58:30 -06:00
Mariusz Sikora	69061f9627	[AMDGPU] Add clang builtin for __builtin_amdgcn_ds_atomic_fadd_v2f16 Differential Revision: https://reviews.llvm.org/D146808	2023-03-24 16:27:44 +01:00
Mariusz Sikora	ea064ee2a3	[AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions Introducing Subtarget Features for instructions: - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 Differential Revision: https://reviews.llvm.org/D146701	2023-03-24 13:10:40 +01:00
Joshua Cranmer	bcad161db3	[Clang][SPIR-V] Emit target extension types for OpenCL types on SPIR-V. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D141008	2023-03-13 14:20:24 -04:00
Yaxun (Sam) Liu	37114036aa	[AMDGPU] Mark mbcnt as convergent since it depends on CFG. Otherwise some passes will try to merge them and cause incorrect results. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D145072	2023-03-02 11:56:32 -05:00
Nikita Popov	3d84f4268d	[Clang] Convert some tests to opaque pointers (NFC)	2023-02-17 09:49:03 +01:00
Diana Picus	819dfc338b	[AMDGPU] Autogenerate checks for several tests. NFCI	2023-02-16 10:54:34 +01:00
Matt Arsenault	647925648a	clang/OpenCL: Apply default attributes to enqueued blocks This was missing important environment context, like denormal-fp-math and target-features. Curiously this seems to be losing nounwind. Note this only fixes the actual invoke kernel. The invoke function is already setting the default attribute set for internal functions. However that is still buggy since it's not applying any use function attributes (it's also missing uniform-work-group-size). There seem to be too many different functions for setting attributes with inconsistent behavior. The Function overload of addDefaultFunctionAttributes seems to miss the target-cpu and target-features. The AttrBuilder one seems to miss optnone (but that seems to be disallowed on blocks anyway). Neither one calls setTargetAttributes, when it probably should. uniform-work-group-size is also set through AMDGPU code when it should be emitting generically as a language property. I also noticed update_cc_test_checks for attributes seem to not connect the captured attribute variables to the attributes at the end (although I think the numbers happen to work out correctly).	2023-01-30 15:03:15 -04:00
Matt Arsenault	d12ee4bf7c	clang/OpenCL: Extend tests for enqueued block attributes Baseline tests showing that enqueued blocks are not getting the correct attributes applied.	2023-01-30 15:03:15 -04:00
Matt Arsenault	00f6a7f02f	clang/OpenCL: Fix not setting convergent on block invoke kernels Yet another example how convergent not being the default is dangerous and backwards.	2023-01-30 15:03:14 -04:00
Stanislav Mekhanoshin	df0488369d	[AMDGPU] Split dot7 feature Differential Revision: https://reviews.llvm.org/D142507	2023-01-26 10:34:36 -08:00
Nikita Popov	eaea793d5e	[Clang] Convert some tests to opaque pointers (NFC) These are all tests that end up running SROA.	2023-01-26 11:33:19 +01:00
Stanislav Mekhanoshin	870b92977e	[AMDGPU] Split dot8 feature Differential Revision: https://reviews.llvm.org/D142407	2023-01-24 11:16:07 -08:00
Stanislav Mekhanoshin	4ab2246d48	[AMDGPU] Remove dot1 and dot6 features from clang for gfx11 These are unsupported. Differential Revision: https://reviews.llvm.org/D142493	2023-01-24 10:52:42 -08:00
Sven van Haastregt	1495210914	[OpenCL] Always add nounwind attribute for OpenCL Neither OpenCL nor C++ for OpenCL support exceptions, so add the `nounwind` attribute unconditionally for those languages. Differential Revision: https://reviews.llvm.org/D142033	2023-01-20 12:01:22 +00:00
Matt Arsenault	7f2f6eec3e	clang/OpenCL: Check calling convention in test update_cc_test_checks misses this, so make sure at least one block enqueue test manually checks the calling convention for the kernel.	2023-01-12 13:39:23 -05:00
Nikita Popov	02856565ac	[Clang] Emit noundef metadata next to range metadata To preserve the previous semantics after D141386, adjust places that currently emit !range metadata to also emit !noundef metadata. This retains range violation as immediate undefined behavior, rather than just poison. Differential Revision: https://reviews.llvm.org/D141494	2023-01-12 10:03:05 +01:00
Paul Walker	eae26b6640	[IRBuilder] Use canonical i64 type for insertelement index used by vector splats. Instcombine prefers this canonical form (see getPreferredVectorIndex), as does IRBuilder when passing the index as an integer so we may as well use the preferred form from creation. NOTE: All test changes are mechanical with nothing else expected beyond a change of index type from i32 to i64. Differential Revision: https://reviews.llvm.org/D140983	2023-01-11 14:08:06 +00:00
Matt Arsenault	f9559b1e30	clang: Convert test to generated checks and opaque pointers	2023-01-10 20:35:49 -05:00
Matt Arsenault	81849497b4	clang/AMDGPU: Remove flat-address-space from feature map This was only used for checking if is_shared/is_private were legal, which we're not bothering to do anymore. This is apparently visible to more than the target attribute (which seems to silently ignore unrecognized features), so this has the potential to break something (i.e. see the OpenMP test change)	2023-01-05 16:35:04 -05:00
Nikita Popov	aae20a7421	[CodeGenOpenCL] Convert some tests to opaque pointers (NFC)	2023-01-05 10:57:30 +01:00
Matt Arsenault	e630d9b299	AMDGPU/clang: Remove target features from address space test builtins It turns out we can codegen these on targets without flat addressing, although the runtime probably didn't put anything useful there. The proper diagnostic would be to disallow flat pointer uses or languages with them, not this one edge case. Allows removing one of the special cases requiring subtarget support in the device libraries.	2022-12-29 18:46:41 -05:00
Matt Arsenault	f4bcd7f598	AMDGPU/clang: Add builtins for llvm.amdgcn.ballot Use explicit _w32/_w64 suffixes for the wave size to be consistent with the existing other wave dependent intrinsics. Also start diagnosing trying to use both wave32 and wave64. I would have preferred to avoid the +wavefrontsize64 spam on targets where that's the only option, but avoiding this seems to be more work than I expected.	2022-12-29 17:58:55 -05:00
Roman Lebedev	96d3c82645	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3)" While the PPC little-endian miscompile did get addressed by https://reviews.llvm.org/D140046 the PPC big-endian bots are still unhappy. https://lab.llvm.org/buildbot/#/builders/93/builds/12560 This reverts commit 7bd358bcb4e358b4351c69e02ef76939e08acdc7.	2022-12-16 22:58:41 +03:00
Roman Lebedev	cfd594f8bb	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3) * This is a recommit of `3c4d2a0396`, * which was reverted in `25f01d593c`, because it exposed a miscompile in PPC backend, which was resolved in https://reviews.llvm.org/D140089 / `cb3f415cd2`. * which was a recommit of `cf624b23bc`, * which was reverted in `5cfc22cafe`, because the cut-off on the number of vector elements was not low enough, and it triggered both SDAG SDNode operand number assertions, and caused compile time explosions in some cases. Let's try with something really REALLY conservative first, just to get somewhere, and try to bump it later. FIXME: should this respect TTI reg width * num vec regs? Original commit message: Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-12-16 19:27:38 +03:00
Nikita Popov	9466b49171	[Clang] Convert various tests to opaque pointers (NFC) These were all tests where no manual fixup was required.	2022-12-12 17:11:46 +01:00

667 Commits