OpenCL and HIP have -cl-fp32-correctly-rounded-divide-sqrt and
-fno-hip-correctly-rounded-divide-sqrt. The corresponding fpmath metadata
was only set on fdiv, and not sqrt. The backend is currently underutilizing
sqrt lowering options, and the responsibility is split between the libraries
and backend and this metadata is needed.
CUDA/NVCC has -prec-div and -prev-sqrt but clang doesn't appear to be
aiming for compatibility with those. Don't know if OpenMP has a similar
control.
In D146869 @arsenm pointed out that the constrained intrinsics aren't
getting the strictfp attribute by default. They should be since they are
required to have it anyway.
TableGen did not know about this attribute until now. This patch adds
strictfp to TableGen, and it uses it on all of the constrained intrinsics.
Differential Revision: https://reviews.llvm.org/D154991
The attribute is a proper enum attribute, strictfp. We were getting
strictfp and "strictfp" set on every function with
-fexperimental-strict-floating-point.
https://reviews.llvm.org/D139629
This caused asserts in some Android and Windows builds:
SelectionDAGNodes.h:1138: llvm::SDValue::SDValue(SDNode *, unsigned int):
Assertion `(!Node || !ResNo || ResNo < Node->getNumValues()) && "Invalid result number for the given node!"' failed.
See comment on 85bdea023f
Also revert "HIP: Use frexp builtins in math headers"
which seems to depend on this change.
This reverts commit 85bdea023f.
This reverts commit bf8e92c0e7.
Use Value::getPointerDereferenceableBytes() instead of hardcoding dereferenceable only for allocas. Allows us to infer inbounds GEPs for other Values like CallInsts and Arguments.
Fixed clang test broken in initial land.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153815
This makes the scope and ordering arguments actually do something.
Also add some new OpenCL tests since the existing HIP tests didn't
cover address spaces.
Provide direct access to v_exp_f32 and v_exp_f16, so we can start
correctly lowering the generic exp intrinsics.
Unfortunately have to break from the usual naming convention of
matching the instruction name and stripping the v_ prefix. exp is
already taken by the export intrinsic. On the clang builtin side, we
have a choice of maintaining the convention to the instruction name,
or following the intrinsic name.
This will map directly to the hardware instruction which does not
handle denormals for f32. This will allow moving the generic intrinsic
to be lowered correctly. Also handles selecting the f16 version, but
there's no reason to use it over the generic intrinsic.
Migration of clang tests to opaque pointers is finished, so remove
the -no-opaque-pointers flag.
Differential Revision: https://reviews.llvm.org/D152447
AMDGPU has native instructions and target intrinsics for this, but
these really should be subject to legalization and generic
optimizations. This will enable legalization of f16->f32 on targets
without f16 support.
Implement a somewhat horrible inline expansion for targets without
libcall support. This could be better if we could introduce control
flow (GlobalISel version not yet implemented). Support for strictfp
legalization is less complete but works for the simple cases.
D150520 converted the test to use opaque pointers. The update version
fails on PowerPC because of different return type of the function.
This patch resolves the failure by removing the return type check;
it also makes the test look more like it was before the conversion to
prevent other potential issues caused by ABI differences across targets.
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.
The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.
The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take them instead of <4 x i32> as resource arguments. ptr
addrspace(8). These pointers are 128-bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, [addr_max]] value from applying to them.
Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.
This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.
Depends on D143437
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D145441
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210
Change target feature of __builtin_amdgcn_global_atomic_fadd_f32
to atomic-fadd-rtn-insts. Enable atomic-fadd-rtn-insts for gfx90a,
gfx940 and gfx1100 as they all support the return variant of
`global_atomic_add_f32`.
Fixes https://github.com/llvm/llvm-project/issues/61331.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D146840
since it depends on CFG.
Otherwise some passes will try to merge them and cause
incorrect results.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D145072
This was missing important environment context, like denormal-fp-math
and target-features. Curiously this seems to be losing nounwind. Note
this only fixes the actual invoke kernel. The invoke function is
already setting the default attribute set for internal
functions. However that is still buggy since it's not applying any use
function attributes (it's also missing uniform-work-group-size).
There seem to be too many different functions for setting attributes
with inconsistent behavior. The Function overload of
addDefaultFunctionAttributes seems to miss the target-cpu and
target-features. The AttrBuilder one seems to miss optnone (but that
seems to be disallowed on blocks anyway). Neither one calls
setTargetAttributes, when it probably should. uniform-work-group-size
is also set through AMDGPU code when it should be emitting generically
as a language property.
I also noticed update_cc_test_checks for attributes seem to not
connect the captured attribute variables to the attributes at the end
(although I think the numbers happen to work out correctly).
Neither OpenCL nor C++ for OpenCL support exceptions, so add the
`nounwind` attribute unconditionally for those languages.
Differential Revision: https://reviews.llvm.org/D142033
To preserve the previous semantics after D141386, adjust places
that currently emit !range metadata to also emit !noundef metadata.
This retains range violation as immediate undefined behavior,
rather than just poison.
Differential Revision: https://reviews.llvm.org/D141494
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.
NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.
Differential Revision: https://reviews.llvm.org/D140983
This was only used for checking if is_shared/is_private were legal,
which we're not bothering to do anymore.
This is apparently visible to more than the target attribute (which
seems to silently ignore unrecognized features), so this has the
potential to break something (i.e. see the OpenMP test change)
It turns out we can codegen these on targets without flat addressing,
although the runtime probably didn't put anything useful there. The
proper diagnostic would be to disallow flat pointer uses or languages
with them, not this one edge case. Allows removing one of the special
cases requiring subtarget support in the device libraries.
Use explicit _w32/_w64 suffixes for the wave size to be consistent
with the existing other wave dependent intrinsics. Also start
diagnosing trying to use both wave32 and wave64.
I would have preferred to avoid the +wavefrontsize64 spam on targets
where that's the only option, but avoiding this seems to be more work
than I expected.
* This is a recommit of 3c4d2a0396,
* which was reverted in 25f01d593c,
because it exposed a miscompile in PPC backend, which was resolved
in https://reviews.llvm.org/D140089 / cb3f415cd2.
* which was a recommit of cf624b23bc,
* which was reverted in 5cfc22cafe,
because the cut-off on the number of vector elements was not low enough,
and it triggered both SDAG SDNode operand number assertions,
5and caused compile time explosions in some cases.
Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it later.
FIXME: should this respect TTI reg width * num vec regs?
Original commit message:
Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.
But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.
Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.
Fixes#59116.