clang-p2996

Author	SHA1	Message	Date
Stanislav Mekhanoshin	ba1a09da8d	[AMDGPU] Allow overload of __builtin_amdgcn_mov_dpp8 (#113610 ) The same handling as for __builtin_amdgcn_mov_dpp.	2024-10-31 02:19:20 -07:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Stanislav Mekhanoshin	03fef62b84	[AMDGPU] Relax __builtin_amdgcn_update_dpp sema check (#113341 ) A recent change applied too strict a check that the old and src operands match. These must be compatible, but not necessarily exactly the same. Fixes: SWDEV-493072	2024-10-22 12:32:08 -07:00
Alex Voicu	6e0b0038cd	[clang][OpenCL][CodeGen][AMDGPU] Do not use `private` as the default AS for when `generic` is available (#112442 ) Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use `private` as the default address space. This is wrong for cases where the `generic` address space is available, and is corrected via this patch. In general, this AS map abuse is a bad hack and we should re-work it altogether, but at least after this patch we will stop being incorrect for e.g. OpenCL 2.0.	2024-10-22 12:05:48 +01:00
Stanislav Mekhanoshin	622e398d88	[AMDGPU] Allow overload of __builtin_amdgcn_mov/update_dpp (#112447 ) We need to support 64-bit data types (intrinsics do support it). We are also silently converting FP to integer argument now, also fixed.	2024-10-21 11:57:18 -07:00
Matt Arsenault	51b4ada458	clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (#102462 )	2024-10-17 17:10:45 +04:00
Alex Voicu	fc362521a3	[clang][OpenCL][NFC] Switch two tests to being generated (#112554 ) Turns out these tests are a bit unwieldy to hand-update, so switch them over to being generated, as requested in #112442.	2024-10-16 18:48:17 +01:00
Sven van Haastregt	caa7301bc8	[OpenCL] Restore addrspacecast for pipe builtins (#112514 ) Commit `84ee629bc5` ("clang: Remove some pointer bitcasts (#112324)", 2024-10-15) triggered some "Call parameter type does not match function signature!" errors when using the OpenCL pipe builtin functions under the spir triple, due to a missing addrspacecast. This would have been caught by the pipe_builtin.cl test if that had used the `spir-unknown-unknown` triple, so extend the test to use that triple too.	2024-10-16 13:58:12 +02:00
Alex Voicu	e13cbaca69	[clang][CodeGen][SPIR-V] Fix incorrect SYCL usage, implement missing interface (#109415 ) This is primarily meant to address the issue identified in #109182, around incorrect usage of `-fsycl-is-device`; we now have AMDGCN flavoured SPIR-V which retains the desired behaviour around the default AS and does not depend on the SYCL language being enabled to do so. Overall, there are three changes: 1. We unconditionally use the `SPIRDefIsGen` AS map for AMDGCNSPIRV target, as there is no case where the hack of setting default to private would be desirable, and it can be used for languages other than OCL/HIP; 2. We implement `SPIRVTargetCodeGenInfo::getGlobalVarAddressSpace` for SPIR-V in general, because otherwise using it from languages other than HIP or OpenCL would yield 0, incorrectly; 3. We remove the incorrect usage of `-fsycl-is-device`.	2024-09-26 14:06:14 +01:00
Alex Voicu	3cfd0c0d36	[SPIRV][RFC] Rework / extend support for memory scopes (#106429 ) This change adds support for correctly lowering the `__scoped` Clang builtins, and corresponding scoped LLVM instructions. These were previously unconditionally lowered to Device scope, which is possibly incorrect. Furthermore, the default / implicit scope is changed from Device (an OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is not OpenCL specific / can ingest IR coming from other language front-ends. OpenCL defaulting to Device scope is now reflected in the front-end handling of atomic ops, which seems preferable.	2024-09-25 00:44:57 +01:00
Nikita Popov	5a4c6f9799	[Loads] Check context instruction for context-sensitive derefability (#109277 ) If a dereferenceability fact is provided through `!dereferenceable` (or similar), it may only hold on the given control flow path. When we use `isSafeToSpeculativelyExecute()` to check multiple instructions, we might make use of `!dereferenceable` information that does not hold at the speculation target. This doesn't happen when speculating instructions one by one, because `!dereferenceable` will be dropped while speculating. Fix this by checking whether the instruction with `!dereferenceable` dominates the context instruction. If this is not the case, it means we are speculating, and cannot guarantee that it holds at the speculation target. Fixes https://github.com/llvm/llvm-project/issues/108854.	2024-09-23 09:13:09 +02:00
Stanislav Mekhanoshin	0745219d4a	[AMDGPU] Add target intrinsic for s_buffer_prefetch_data (#107293 )	2024-09-06 11:41:21 -07:00
Matt Arsenault	a291fe5ed4	clang/AMDGPU: Update test message order Order of atomic expansion remarks is backwards since `100d9b8994`	2024-09-06 21:18:41 +04:00
Stanislav Mekhanoshin	bd840a4004	[AMDGPU] Add target intrinsic for s_prefetch_data (#107133 )	2024-09-05 15:14:31 -07:00
Matt Arsenault	93e0f312c2	clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max f64 builtins (#96876 )	2024-08-20 23:24:15 +04:00
Matt Arsenault	5822cc271b	clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (#96875 )	2024-08-20 23:20:03 +04:00
Matt Arsenault	0a22655f31	clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64} builtins (#96874 )	2024-08-20 23:15:55 +04:00
Matt Arsenault	ce132a58b8	clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (#96873 )	2024-08-20 23:01:15 +04:00
Matt Arsenault	b5e63cc533	clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (#96872 ) Need to emit syncscope and new metadata to get the native instruction, most of the time.	2024-08-15 22:59:24 +04:00
Hari Limaye	94473f4db6	[IRBuilder] Generate nuw GEPs for struct member accesses (#99538 ) Generate nuw GEPs for struct member accesses, as inbounds + non-negative implies nuw. Regression tests are updated using update scripts where possible, and by find + replace where not.	2024-08-09 13:25:04 +01:00
Eli Friedman	1762e01cca	Fix codegen of consteval functions returning an empty class, and related issues (#93115 ) If a class is empty, don't store it to memory: the store might overwrite useful data. Similarly, if a class has tail padding that might overlap other fields, don't store the tail padding to memory. The problem here turned out a bit more general than I initially thought: basically all uses of EmitAggregateStore were broken. Call lowering had a method that did mostly the right thing, though: CreateCoercedStore. Adapt CreateCoercedStore so it always does the conservatively right thing, and use it for both calls and ConstantExpr. Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was set incorrectly for empty classes in some cases. Fixes #93040.	2024-08-01 16:18:20 -07:00
Zahira Ammarguellat	c9c91f59c3	Remove FiniteMathOnly and use only NoHonorINFs and NoHonorNANs. (#97342 ) Currently `__FINITE_MATH_ONLY__` is set when `FiniteMathOnly` is set, and `FiniteMathOnly` is set when `NoHonorInfs` && `NoHonorNans` is true. But what happens when one of the latter flags is false? To avoid potential inconsistencies, the internal option `FiniteMathOnly` is removed and the macro `__FINITE_MATH_ONLY__` is set when `NoHonorInfs` && `NoHonorNans`.	2024-07-26 08:16:38 -04:00
Vikash Gupta	d65f037591	[Clang] Use private address space for builtin_alloca return type for OpenCL (#95750 ) The __builtin_alloca was returning a flat pointer with no address space when compiled using OpenCL 1.2 or below, but worked fine with OpenCL 2.0 and above. This is because the latter uses the concept of a generic address space, which supports casts to other address spaces (i.e. to the private address space, which is used for stack allocation). But, in actuality, as it returns a pointer to the stack, it should point to the private address space irrespective of the OpenCL version, because builtin_alloca allocates stack memory for the function in which it is called. Thus, it requires redefinition of the builtin function with an appropriate return pointer to the private address space.	2024-07-26 15:24:06 +05:30
Farzon Lotfi	a14baec0f3	[clang] Emit constrained intrinsics for arc and hyperbolic trig clang builtins (#98949 ) ## Change(s) - `Builtins.td` - Add f16 support for libm arc and hyperbolic trig functions - `CGBuiltin.cpp` - Emit constrained intrinsics for trig clang builtins ## History This change is part of an implementation of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics, which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds wasm lowering cases for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. https://github.com/llvm/llvm-project/issues/70079 https://github.com/llvm/llvm-project/issues/70080 https://github.com/llvm/llvm-project/issues/70081 https://github.com/llvm/llvm-project/issues/70083 https://github.com/llvm/llvm-project/issues/70084 https://github.com/llvm/llvm-project/issues/95966 ## Precursor PR(s) Note this PR needs to merge after: - #98937 - #98755	2024-07-19 10:19:41 -04:00
Shilei Tian	af5352fe8e	[Clang][AMDGPU] Use unsigned data type for `__builtin_amdgcn_raw_buffer_store_*` (#99546 )	2024-07-18 16:34:59 -04:00
Shilei Tian	892c58cf74	[Clang][AMDGPU] Add builtins for intrinsic `llvm.amdgcn.raw.ptr.buffer.load` (#99258 )	2024-07-18 15:33:03 -04:00
Changpeng Fang	280d90d0fd	AMDGPU: Add back half and bfloat support for global_load_tr16 pats (#99540 ) half and bfloat are common types for 16-bit elements. Support for them was originally there and was dropped for some reason. This work adds support for these float types back.	2024-07-18 11:23:35 -07:00
Stanislav Mekhanoshin	f363e30f15	[AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633 )	2024-07-09 14:25:58 -07:00
Matt Arsenault	8f63d154ec	clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (#96738 )	2024-06-27 15:32:08 +02:00
Vikram Hegde	35f7b60aa6	[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725 ) These are incremental changes over #89217 , with core logic being the same. This patch along with #89217 and #91190 should get us ready to enable 64 bit optimizations in atomic optimizer.	2024-06-26 09:24:09 +05:30
Shilei Tian	c9f083a994	[Clang][AMDGPU] Add builtins for intrinsic `llvm.amdgcn.raw.ptr.buffer.store` (#94576 ) Depends on https://github.com/llvm/llvm-project/pull/96313.	2024-06-25 09:55:37 -04:00
Vikram Hegde	5feb32ba92	[AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217 ) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64 bit readlane, writelane and readfirstlane ops pre-ISel --------- Co-authored-by: vikramRH <vikhegde@amd.com>	2024-06-25 14:35:19 +05:30
Shilei Tian	e3eb12cce9	[Clang][AMDGPU] Add a builtin for `llvm.amdgcn.make.buffer.rsrc` intrinsic (#95276 ) Depends on https://github.com/llvm/llvm-project/pull/94830.	2024-06-20 11:01:54 -04:00
Andreas Jonson	01ba3fa37b	[Clang] Swap range and noundef metadata to attribute for intrinsics. (#94851 )	2024-06-19 17:23:53 +02:00
Shilei Tian	ad599211a7	[Clang][AMDGPU] Add a new builtin type for buffer rsrc (#94830 ) This patch adds a new builtin type for AMDGPU's buffer rsrc data type, which is effectively an AS 8 pointer. This is needed because we'd like to expose certain intrinsics to users via builtins which take buffer rsrc as argument.	2024-06-18 20:46:53 -04:00
Matt Arsenault	76894c5e6e	clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (#95395 ) We should have done this for the f32/f64 case a long time ago. Now that codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting it instead. This also does upgrade the behavior to respect a volatile qualified pointer, which was previously ignored (for the cases that don't have an explicit volatile argument).	2024-06-18 20:51:14 +02:00
Stephen Tozer	094572701d	[RemoveDIs] Print IR with debug records by default (#91724 ) This patch makes the final major change of the RemoveDIs project, changing the default IR output from debug intrinsics to debug records. This is expected to break a large number of tests: every single one that tests for uses or declarations of debug intrinsics and does not explicitly disable writing records. If this patch has broken your downstream tests (or upstream tests on a configuration I wasn't able to run): 1. If you need to immediately unblock a build, pass `--write-experimental-debuginfo=false` to LLVM's option processing for all failing tests (remember to use `-mllvm` for clang/flang to forward arguments to LLVM). 2. For most test failures, the changes are trivial and mechanical, enough that they can be done by script; see the migration guide for a guide on how to do this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates 3. If any tests fail for reasons other than FileCheck check lines that need updating, such as assertion failures, that is most likely a real bug with this patch and should be reported as such. For more information, see the recent PSA: https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578	2024-06-14 15:07:27 +01:00
Farzon Lotfi	189d471191	[clang] Reland Add tanf16 builtin and support for tan constrained intrinsic (#94559 ) Relanding this PR now that https://github.com/llvm/llvm-project/pull/90503 has merged. with `FTAN` landing in [TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63 ) There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm backends. In LLVM, the `llvm.experimental.constrained.cos` and `llvm.experimental.constrained.sin` intrinsics are used for performing cosine and sine calculations with additional constraints on floating-point operations. This behavior is expected for all floating-point math intrinsics. This change adds these constraints for the `tan` intrinsic. - `Builtins.td` - replace TanF128 with F16F128MathTemplate - `CGBuiltin.cpp` - map existing tan builtins to `tan` and `constrained_tan` intrinsic - `ConstrainedOps.def` map tan and constrained_tan to an ISDOpcode. resolves #91421 --------- Co-authored-by: Farzon Lotfi <farzon@farzon.com>	2024-06-10 20:46:26 -04:00
Alex Voicu	88e2bb4092	[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796 ) This change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features: - AMDGCN inline ASM is allowed/supported, under the assumption that the [SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc) extension is enabled/used - AMDGCN target specific builtins are allowed/supported, under the assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is enabled when using the downstream translator - the featureset matches the union of AMDGCN targets' features - the datalayout string is overspecified to affix both the program address space and the alloca address space, the latter under the assumption that the [SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc) extension is enabled/used, case in which the extant SPIRV datalayout string would lead to pointers to function pointing to the private address space, which would be wrong. Existing AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible. I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though).	2024-06-07 11:50:23 +01:00
Shilei Tian	1ca0055f45	[AMDGPU] Add a new target gfx1152 (#94534 )	2024-06-06 12:16:11 -04:00
Farzon Lotfi	7348bb23ab	Revert "[clang] Add tanf16 builtin and support for tan constrained intrinsic (#93314 )" (#93721 ) This reverts commit `b15a0a3740`. This should undo PR: https://github.com/llvm/llvm-project/pull/93314 will need to re-open https://github.com/llvm/llvm-project/issues/91421 wait for https://github.com/llvm/llvm-project/pull/90503 to land	2024-05-29 15:32:38 -04:00
Farzon Lotfi	b15a0a3740	[clang] Add tanf16 builtin and support for tan constrained intrinsic (#93314 ) In LLVM, the `llvm.experimental.constrained.cos` and `llvm.experimental.constrained.sin` intrinsics are used for performing cosine and sine calculations with additional constraints on floating-point operations. This behavior is expected for all floating-point math intrinsics. This change adds these constraints for the `tan` intrinsic. - `Builtins.td` - replace TanF128 with F16F128MathTemplate - `CGBuiltin.cpp` - map existing tan builtins to `tan` and `constrained_tan` intrinsic - `ConstrainedOps.def` map tan and constrained_tan to an ISDOpcode. - `ISDOpcodes.h` - define tan and strict tan opcodes resolves #91421	2024-05-29 11:16:18 -04:00
Shilei Tian	d53c6cdbc1	[AMDGPU][Clang] Builtin for GLOBAL_LOAD_LDS on GFX940 (#92962 ) Fixes: SWDEV-459212	2024-05-22 00:03:59 -04:00
Fangrui Song	7e59223ac4	[test] %clang_cc1: remove redundant actions ParseFrontendArgs takes the last OPT_Action_Group option. The other actions are overridden.	2024-05-05 10:46:06 -07:00
Fangrui Song	7c1d9b15ee	[test] %clang_cc1: remove redundant actions	2024-05-04 23:08:11 -07:00
Fangrui Song	0d501f38f3	[test] %clang_cc1 -emit-llvm: remove redundant -S Also replace aarch64-none-linux-gnu (none can indicate an OS as well) with aarch64	2024-05-04 17:15:51 -07:00
Fangrui Song	c5de4dd1ea	[test] %clang_cc1 -emit-llvm: remove redundant -S And replace -emit-llvm -o - with -emit-llvm-only	2024-05-04 17:00:29 -07:00
Joseph Huber	70b79a9ccd	[AMDGPU] Allow the `__builtin_flt_rounds` functions on AMDGPU (#90994 ) Summary: Previous patches added support for the LLVM rounding intrinsic functions. This patch allows them to be emitted using the clang builtins when targeting AMDGPU.	2024-05-03 14:01:09 -05:00
Andreas Jonson	b8f3024a31	[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop (#88776 ) Since all optimizations that use range metadata now also handle range attribute, this patch replaces writes of range metadata for call instructions to range attributes.	2024-04-25 01:45:50 +08:00
Corbin Robeck	27ce513788	[AMDGPU] Add Clang builtins for amdgcn s_ttrace intrinsics (#88076 )	2024-04-11 22:05:01 -04:00


776 Commits