Recent change applied too strict check for old and src operands match.
These shall be compatible, but not necessarily exactly the same.
Fixes: SWDEV-493072
Currently, for AMDGPU, when compiling for OpenCL, we unconditionally use
`private` as the default address space. This is wrong for cases where
the `generic` address space is available, and is corrected via this
patch. In general, this AS map abuse is a bad hack and we should re-work
it altogether, but at least after this patch we will stop being
incorrect for e.g. OpenCL 2.0.
Commit 84ee629bc5 ("clang: Remove some pointer bitcasts (#112324)",
2024-10-15) triggered some "Call parameter type does not match function
signature!" errors when using the OpenCL pipe builtin functions under
the spir triple, due to a missing addrspacecast.
This would have been caught by the pipe_builtin.cl test if that had used
the `spir-unknown-unknown` triple, so extend the test to use that
triple too.
This is primarily meant to address the issue identified in #109182,
around incorrect usage of `-fsycl-is-device`; we now have AMDGCN
flavoured SPIR-V which retains the desired behaviour around the default
AS and does not depend on the SYCL language being enabled to do so.
Overall, there are three changes:
1. We unconditionally use the `SPIRDefIsGen` AS map for AMDGCNSPIRV
target, as there is no case where the hack of setting default to private
would be desirable, and it can be used for languages other than OCL/HIP;
2. We implement `SPIRVTargetCodeGenInfo::getGlobalVarAddressSpace` for
SPIR-V in general, because otherwise using it from languages other than
HIP or OpenCL would yield 0, incorrectly;
3. We remove the incorrect usage of `-fsycl-is-device`.
This change adds support for correctly lowering the `__scoped` Clang
builtins, and corresponding scoped LLVM instructions. These were
previously unconditionally lowered to Device scope, which is possibly incorrect.
Furthermore, the default / implicit scope is changed from Device (an
OpenCL assumption) to AllSvmDevices (aka System), since the SPIR-V BE is not
OpenCL specific / can ingest IR coming from other language front-ends. OpenCL
defaulting to Device scope is now reflected in the front-end handling of atomic
ops, which seems preferable.
If a dereferenceability fact is provided through `!dereferenceable` (or
similar), it may only hold on the given control flow path. When we use
`isSafeToSpeculativelyExecute()` to check multiple instructions, we
might make use of `!dereferenceable` information that does not hold at
the speculation target. This doesn't happen when speculating
instructions one by one, because `!dereferenceable` will be dropped
while speculating.
Fix this by checking whether the instruction with `!dereferenceable`
dominates the context instruction. If this is not the case, it means we
are speculating, and cannot guarantee that it holds at the speculation
target.
Fixes https://github.com/llvm/llvm-project/issues/108854.
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.
Regression tests are updated using update scripts where possible, and by
find + replace where not.
Fix codegen of consteval functions returning an empty class, and related
issues
If a class is empty, don't store it to memory: the store might overwrite
useful data. Similarly, if a class has tail padding that might overlap
other fields, don't store the tail padding to memory.
The problem here turned out a bit more general than I initially thought:
basically all uses of EmitAggregateStore were broken. Call lowering had
a method that did mostly the right thing, though: CreateCoercedStore.
Adapt CreateCoercedStore so it always does the conservatively right
thing, and use it for both calls and ConstantExpr.
Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was
set incorrectly for empty classes in some cases.
Fixes#93040.
Currently `__FINITE_MATH_ONLY__` is set when `FiniteMathOnly`. And
`FiniteMathOnly` is set when `NoHonorInfs` && `NoHonorNans` is true. But
what happens one of the latter flags is false?
To avoid potential inconsistencies, the internal option `FiniteMathOnly`
is removed option and the macro `__FINITE_MATH_ONLY__` is set when
`NoHonorInfs` && `NoHonorNans`.
The __builtin_alloca was returning a flat pointer with no address space
when compiled using openCL1.2 or below but worked fine with openCL2.0
and above. This accounts to the fact that later uses the concept of
generic address space which supports cast to other address space(i.e to
private address space which is used for stack allocation) .
But, in actuality, as it returns pointer to the stack, it should be
pointing to private address space irrespective of openCL version becuase
builtin_alloca allocates stack memory used for current function in which
it is called. Thus,it requires redefintion of the builtin function with
appropraite return pointer to private address space.
half and bfloat are common types for 16-bit elements. The support of
them was original there and dropped due to some reasons. This work adds
the support of the float types back.
These are incremental changes over #89217 , with core logic being the
same. This patch along with #89217 and #91190 should get us ready to enable 64
bit optimizations in atomic optimizer.
This patch is intended to be the first of a series with end goal to
adapt atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64 bit readlane,
writelane and readfirstlane ops pre-ISel
---------
Co-authored-by: vikramRH <vikhegde@amd.com>
This patch adds a new builtin type for AMDGPU's buffer rsrc data type,
which is effectively an AS 8 pointer. This is needed because we'd like
to expose certain intrinsics to users via builtins which take buffer
rsrc as argument.
We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.
This also does upgrade the behavior to respect a volatile qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
Relanding this PR now that
https://github.com/llvm/llvm-project/pull/90503 has merged. with `FTAN`
landing in
[TargetLoweringBase.cpp:L1021](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1020C23-L1021C63
) There is now a llvm tan intrinsic 32\64\128 Expand case for all llvm
backends.
In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.
- `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
- `ConstrainedOps.def` map tan and constrained_tan to an ISDOpcode.
resolves #91421
---------
Co-authored-by: Farzon Lotfi <farzon@farzon.com>
This change seeks to add support for vendor flavoured SPIRV - more
specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that
carries some extra bits of information that are only usable by AMDGCN
targets, forfeiting absolute genericity to obtain greater expressiveness
for target features:
- AMDGCN inline ASM is allowed/supported, under the assumption that the
[SPV_INTEL_inline_assembly](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc)
extension is enabled/used
- AMDGCN target specific builtins are allowed/supported, under the
assumption that e.g. the `--spirv-allow-unknown-intrinsics` option is
enabled when using the downstream translator
- the featureset matches the union of AMDGCN targets' features
- the datalayout string is overspecified to affix both the program
address space and the alloca address space, the latter under the
assumption that the
[SPV_INTEL_function_pointers](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc)
extension is enabled/used, case in which the extant SPIRV datalayout
string would lead to pointers to function pointing to the private
address space, which would be wrong.
Existing AMDGCN tests are extended to cover this new target. It is
currently dormant / will require some additional changes, but I thought
I'd rather put it up for review to get feedback as early as possible. I
will note that an alternative option is to place this under AMDGPU, but
that seems slightly less natural, since this is still SPIRV, albeit
relaxed in terms of preconditions & constrained in terms of
postconditions, and only guaranteed to be usable on AMDGCN targets (it
is still possible to obtain pristine portable SPIRV through usage of the
flavoured target, though).
In LLVM, the `llvm.experimental.constrained.cos` and
`llvm.experimental.constrained.sin` intrinsics are used for performing
cosine and sine calculations with additional constraints on
floating-point operations. This behavior is expected for all
floating-point math intrinsics. This change adds these constraints for
the `tan` intrinsic.
- `Builtins.td` - replace TanF128 with F16F128MathTemplate
- `CGBuiltin.cpp` - map existing tan builtins to `tan` and
`constrained_tan` intrinsic
- `ConstrainedOps.def` map tan and constrained_tan to an ISDOpcode.
- `ISDOpcodes.h` - define tan and strict tan opcodes
resolves #91421
Summary:
Previous patches added support for the LLVM rounding intrinsic
functions. This patch allows them to me emitted using the clang builtins
when targeting AMDGPU.
Since all optimizations that use range metadata now also handle range attribute, this patch replaces writes of
range metadata for call instructions to range attributes.