Commit Graph

776 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
ba1a09da8d [AMDGPU] Allow overload of __builtin_amdgcn_mov_dpp8 (#113610)
The same handling as for __builtin_amdgcn_mov_dpp.
2024-10-31 02:19:20 -07:00
Carl Ritson
076aac59ac [AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Stanislav Mekhanoshin
03fef62b84 [AMDGPU] Relax __builtin_amdgcn_update_dpp sema check (#113341)
Recent change applied too strict check for old and src operands match.
These shall be compatible, but not necessarily exactly the same.

Fixes: SWDEV-493072
2024-10-22 12:32:08 -07:00
Alex Voicu
6e0b0038cd [clang][OpenCL][CodeGen][AMDGPU] Do not use private as the default AS for when generic is available (#112442)
Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private` as the default address space. This is wrong for cases where
the `generic` address space is available, and is corrected via this
patch. In general, this AS map abuse is a bad hack and we should re-work
it altogether, but at least after this patch we will stop being
incorrect for e.g. OpenCL 2.0.
2024-10-22 12:05:48 +01:00
Stanislav Mekhanoshin
622e398d88 [AMDGPU] Allow overload of __builtin_amdgcn_mov/update_dpp (#112447)
We need to support 64-bit data types (intrinsics do support it). We are
also silently converting FP to integer argument now, also fixed.
2024-10-21 11:57:18 -07:00
Matt Arsenault
51b4ada458 clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (#102462) 2024-10-17 17:10:45 +04:00
Alex Voicu
fc362521a3 [clang][OpenCL][NFC] Switch two tests to being generated (#112554)
Turns out these tests are a bit unwieldy to hand-update, so switch them over to being generated, as requested in #112442.
2024-10-16 18:48:17 +01:00
Sven van Haastregt
caa7301bc8 [OpenCL] Restore addrspacecast for pipe builtins (#112514)
Commit 84ee629bc5 ("clang: Remove some pointer bitcasts (#112324)",
2024-10-15) triggered some "Call parameter type does not match function
signature!" errors when using the OpenCL pipe builtin functions under
the spir triple, due to a missing addrspacecast.

This would have been caught by the pipe_builtin.cl test if that had used
the `spir-unknown-unknown` triple, so extend the test to use that
triple too.
2024-10-16 13:58:12 +02:00
Alex Voicu
e13cbaca69 [clang][CodeGen][SPIR-V] Fix incorrect SYCL usage, implement missing interface (#109415)
This is primarily meant to address the issue identified in #109182,
around incorrect usage of `-fsycl-is-device`; we now have AMDGCN
flavoured SPIR-V which retains the desired behaviour around the default
AS and does not depend on the SYCL language being enabled to do so.
Overall, there are three changes:

1. We unconditionally use the `SPIRDefIsGen` AS map for AMDGCNSPIRV
target, as there is no case where the hack of setting default to private
would be desirable, and it can be used for languages other than OCL/HIP;
2. We implement `SPIRVTargetCodeGenInfo::getGlobalVarAddressSpace` for
SPIR-V in general, because otherwise using it from languages other than
HIP or OpenCL would yield 0, incorrectly;
3. We remove the incorrect usage of `-fsycl-is-device`.
2024-09-26 14:06:14 +01:00
Alex Voicu
3cfd0c0d36 [SPIRV][RFC] Rework / extend support for memory scopes (#106429)
This change adds support for correctly lowering the `__scoped` Clang
builtins, and corresponding scoped LLVM instructions. These were
previously unconditionally lowered to Device scope, which is possibly incorrect. 
Furthermore, the default / implicit scope is changed from Device (an 
OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is not 
OpenCL specific / can ingest IR coming from other language front-ends. OpenCL 
defaulting to Device scope is now reflected in the front-end handling of atomic 
ops, which seems preferable.
2024-09-25 00:44:57 +01:00
Nikita Popov
5a4c6f9799 [Loads] Check context instruction for context-sensitive derefability (#109277)
If a dereferenceability fact is provided through `!dereferenceable` (or
similar), it may only hold on the given control flow path. When we use
`isSafeToSpeculativelyExecute()` to check multiple instructions, we
might make use of `!dereferenceable` information that does not hold at
the speculation target. This doesn't happen when speculating
instructions one by one, because `!dereferenceable` will be dropped
while speculating.

Fix this by checking whether the instruction with `!dereferenceable`
dominates the context instruction. If this is not the case, it means we
are speculating, and cannot guarantee that it holds at the speculation
target.

Fixes https://github.com/llvm/llvm-project/issues/108854.
2024-09-23 09:13:09 +02:00
Stanislav Mekhanoshin
0745219d4a [AMDGPU] Add target intrinsic for s_buffer_prefetch_data (#107293) 2024-09-06 11:41:21 -07:00
Matt Arsenault
a291fe5ed4 clang/AMDGPU: Update test message order
Order of atomic expansion remarks is backwards since
100d9b8994
2024-09-06 21:18:41 +04:00
Stanislav Mekhanoshin
bd840a4004 [AMDGPU] Add target intrinsic for s_prefetch_data (#107133) 2024-09-05 15:14:31 -07:00
Matt Arsenault
93e0f312c2 clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max f64 builtins (#96876) 2024-08-20 23:24:15 +04:00
Matt Arsenault
5822cc271b clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (#96875) 2024-08-20 23:20:03 +04:00
Matt Arsenault
0a22655f31 clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64} builtins (#96874) 2024-08-20 23:15:55 +04:00
Matt Arsenault
ce132a58b8 clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (#96873) 2024-08-20 23:01:15 +04:00
Matt Arsenault
b5e63cc533 clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (#96872)
Need to emit syncscope and new metadata to get the native instruction,
most of the time.
2024-08-15 22:59:24 +04:00
Hari Limaye
94473f4db6 [IRBuilder] Generate nuw GEPs for struct member accesses (#99538)
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.

Regression tests are updated using update scripts where possible, and by
find + replace where not.
2024-08-09 13:25:04 +01:00
Eli Friedman
1762e01cca Fix codegen of consteval functions returning an empty class, and related issues (#93115)
Fix codegen of consteval functions returning an empty class, and related
issues

If a class is empty, don't store it to memory: the store might overwrite
useful data. Similarly, if a class has tail padding that might overlap
other fields, don't store the tail padding to memory.

The problem here turned out a bit more general than I initially thought:
basically all uses of EmitAggregateStore were broken. Call lowering had
a method that did mostly the right thing, though: CreateCoercedStore.
Adapt CreateCoercedStore so it always does the conservatively right
thing, and use it for both calls and ConstantExpr.

Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was
set incorrectly for empty classes in some cases.

Fixes #93040.
2024-08-01 16:18:20 -07:00
Zahira Ammarguellat
c9c91f59c3 Remove FiniteMathOnly and use only NoHonorINFs and NoHonorNANs. (#97342)
Currently `__FINITE_MATH_ONLY__` is set when `FiniteMathOnly`. And
`FiniteMathOnly` is set when `NoHonorInfs` && `NoHonorNans` is true. But
what happens one of the latter flags is false?
To avoid potential inconsistencies, the internal option `FiniteMathOnly`
is removed option and the macro `__FINITE_MATH_ONLY__` is set when
`NoHonorInfs` && `NoHonorNans`.
2024-07-26 08:16:38 -04:00
Vikash Gupta
d65f037591 [Clang] Use private address space for builtin_alloca return type for OpenCL (#95750)
The __builtin_alloca was returning a flat pointer with no address space
when compiled using openCL1.2 or below but worked fine with openCL2.0
and above. This accounts to the fact that later uses the concept of
generic address space which supports cast to other address space(i.e to
private address space which is used for stack allocation) .

But, in actuality, as it returns pointer to the stack, it should be
pointing to private address space irrespective of openCL version becuase
builtin_alloca allocates stack memory used for current function in which
it is called. Thus,it requires redefintion of the builtin function with
appropraite return pointer to private address space.
2024-07-26 15:24:06 +05:30
Farzon Lotfi
a14baec0f3 [clang] Emit constraint intrinsics for arc and hyperbolic trig clang builtins (#98949)
## Change(s)
- `Builtins.td` - Add f16 support for libm arc and hyperbolic trig
functions
- `CGBuiltin.cpp` - Emit constraint intrinsics for trig clang builtins

## History
This change is part of an implementation of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics. Which was discussed in
this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This change adds wasm lowering cases for `acos`, `asin`, `atan`, `cosh`,
`sinh`, and `tanh`.

https://github.com/llvm/llvm-project/issues/70079
https://github.com/llvm/llvm-project/issues/70080
https://github.com/llvm/llvm-project/issues/70081
https://github.com/llvm/llvm-project/issues/70083
https://github.com/llvm/llvm-project/issues/70084
https://github.com/llvm/llvm-project/issues/95966

## Precursor PR(s)

Note this PR needs Merge after:
- #98937
- #98755
2024-07-19 10:19:41 -04:00
Shilei Tian
af5352fe8e [Clang][AMDGPU] Use unsigned data type for __builtin_amdgcn_raw_buffer_store_* (#99546) 2024-07-18 16:34:59 -04:00
Shilei Tian
892c58cf74 [Clang][AMDGPU] Add builtins for instrinsic llvm.amdgcn.raw.ptr.buffer.load (#99258) 2024-07-18 15:33:03 -04:00
Changpeng Fang
280d90d0fd AMDGPU: Add back half and bfloat support for global_load_tr16 pats (#99540)
half and bfloat are common types for 16-bit elements. The support of
them was original there and dropped due to some reasons. This work adds
the support of the float types back.
2024-07-18 11:23:35 -07:00
Stanislav Mekhanoshin
f363e30f15 [AMDGPU] Report error in clang if wave32 is requested where unsupported (#97633) 2024-07-09 14:25:58 -07:00
Matt Arsenault
8f63d154ec clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (#96738) 2024-06-27 15:32:08 +02:00
Vikram Hegde
35f7b60aa6 [AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (#92725)
These are incremental changes over #89217 , with core logic being the
same. This patch along with #89217 and #91190 should get us ready to enable 64
bit optimizations in atomic optimizer.
2024-06-26 09:24:09 +05:30
Shilei Tian
c9f083a994 [Clang][AMDGPU] Add builtins for instrinsic llvm.amdgcn.raw.ptr.buffer.store (#94576)
Depends on https://github.com/llvm/llvm-project/pull/96313.
2024-06-25 09:55:37 -04:00
Vikram Hegde
5feb32ba92 [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (#89217)
This patch is intended to be the first of a series with end goal to
adapt atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64 bit readlane,
writelane and readfirstlane ops pre-ISel

---------

Co-authored-by: vikramRH <vikhegde@amd.com>
2024-06-25 14:35:19 +05:30
Shilei Tian
e3eb12cce9 [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (#95276)
Depends on https://github.com/llvm/llvm-project/pull/94830.
2024-06-20 11:01:54 -04:00
Andreas Jonson
01ba3fa37b [Clang] Swap range and noundef metadata to attribute for intrinsics. (#94851) 2024-06-19 17:23:53 +02:00
Shilei Tian
ad599211a7 [Clang][AMDGPU] Add a new builtin type for buffer rsrc (#94830)
This patch adds a new builtin type for AMDGPU's buffer rsrc data type,
which is effectively an AS 8 pointer. This is needed because we'd like
to expose certain intrinsics to users via builtins which take buffer
rsrc as argument.
2024-06-18 20:46:53 -04:00
Matt Arsenault
76894c5e6e clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (#95395)
We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.

This also does upgrade the behavior to respect a volatile qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
2024-06-18 20:51:14 +02:00
Stephen Tozer
094572701d [RemoveDIs] Print IR with debug records by default (#91724)
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records. 

If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.

For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
2024-06-14 15:07:27 +01:00
Farzon Lotfi
189d471191 [clang] Reland Add tanf16 builtin and support for tan constrained intrinsic (#94559)
Relanding this PR now that
https://github.com/llvm/llvm-project/pull/90503 has merged. with `FTAN`
landing in
[TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63
) There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm
backends.

In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.

-  `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
-   `ConstrainedOps.def` map tan and constrained_tan  to an ISDOpcode.

resolves  #91421

---------

Co-authored-by: Farzon Lotfi <farzon@farzon.com>
2024-06-10 20:46:26 -04:00
Alex Voicu
88e2bb4092 [clang][SPIR-V] Add support for AMDGCN flavoured SPIRV (#89796)
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:

- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.

Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
2024-06-07 11:50:23 +01:00
Shilei Tian
1ca0055f45 [AMDGPU] Add a new target gfx1152 (#94534) 2024-06-06 12:16:11 -04:00
Farzon Lotfi
7348bb23ab Revert "[clang] Add tanf16 builtin and support for tan constrained intrinsic (#93314)" (#93721)
This reverts commit b15a0a3740.

This should undo PR: https://github.com/llvm/llvm-project/pull/93314
will need to re-open https://github.com/llvm/llvm-project/issues/91421

wait for https://github.com/llvm/llvm-project/pull/90503 to land
2024-05-29 15:32:38 -04:00
Farzon Lotfi
b15a0a3740 [clang] Add tanf16 builtin and support for tan constrained intrinsic (#93314)
In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.

-  `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
-   `ConstrainedOps.def` map tan and constrained_tan  to an ISDOpcode.
-  `ISDOpcodes.h` - define tan and strict tan  opcodes

resolves  #91421
2024-05-29 11:16:18 -04:00
Shilei Tian
d53c6cdbc1 [AMDGPU][Clang] Builtin for GLOBAL_LOAD_LDS on GFX940 (#92962)
Fixes: SWDEV-459212
2024-05-22 00:03:59 -04:00
Fangrui Song
7e59223ac4 [test] %clang_cc1: remove redundant actions
ParseFrontendArgs takes the last OPT_Action_Group option. The other
actions are overridden.
2024-05-05 10:46:06 -07:00
Fangrui Song
7c1d9b15ee [test] %clang_cc1: remove redundant actions 2024-05-04 23:08:11 -07:00
Fangrui Song
0d501f38f3 [test] %clang_cc1 -emit-llvm: remove redundant -S
Also replace aarch64-none-linux-gnu (none can indicate an OS as well) with aarch64
2024-05-04 17:15:51 -07:00
Fangrui Song
c5de4dd1ea [test] %clang_cc1 -emit-llvm: remove redundant -S
And replace -emit-llvm -o - with -emit-llvm-only
2024-05-04 17:00:29 -07:00
Joseph Huber
70b79a9ccd [AMDGPU] Allow the __builtin_flt_rounds functions on AMDGPU (#90994)
Summary:
Previous patches added support for the LLVM rounding intrinsic
functions. This patch allows them to me emitted using the clang builtins
when targeting AMDGPU.
2024-05-03 14:01:09 -05:00
Andreas Jonson
b8f3024a31 [InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop (#88776)
Since all optimizations that use range metadata now also handle range attribute, this patch replaces writes of
range metadata for call instructions to range attributes.
2024-04-25 01:45:50 +08:00
Corbin Robeck
27ce513788 [AMDGPU] Add Clang builtins for amdgcn s_ttrace intrinsics (#88076) 2024-04-11 22:05:01 -04:00