Fix mapping of structs to device.
The following example fails:
```
#include <stdio.h>
#include <stdlib.h>
struct Descriptor {
int *datum;
long int x;
int xi;
long int arr[1][30];
};
int main() {
Descriptor dat = Descriptor();
dat.datum = (int *)malloc(sizeof(int)*10);
dat.xi = 3;
dat.arr[0][0] = 1;
#pragma omp target enter data map(to: dat.datum[:10]) map(to: dat)
#pragma omp target
{
dat.xi = 4;
dat.datum[dat.arr[0][0]] = dat.xi;
}
#pragma omp target exit data map(from: dat)
return 0;
}
```
This is a rework of the previous attempt:
https://github.com/llvm/llvm-project/pull/72410
Currently we are missing set up-boundary address for FinalArraySection
as highests elements in partial struct data.
Currently for:
\#pragma omp target map(D.a) map(D.b[:2])
The size is:
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%2 = getelementptr float, ptr %arrayidx, i32 1
%3 = ptrtoint ptr %2 to i64
%4 = ptrtoint ptr %a to i64
%5 = sub i64 %3, %4
%6 = sdiv exact i64 %5, ptrtoint (ptr getelementptr (i8, ptr null, i32
1) to i64)
Where %2 is wrong for (D.b[:2]) is pointer to first element of array
section. It should pointe to last element of array section.
The fix is to emit the pointer to the last element of array section and
use this pointer as the highest element in partial struct data.
After change IR:
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%b1 = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx2 = getelementptr inbounds [2 x float], ptr %b1, i64 0, i64 1
%1 = getelementptr float, ptr %arrayidx2, i32 1
%2 = ptrtoint ptr %1 to i64
%3 = ptrtoint ptr %a to i64
%4 = sub i64 %2, %3
%5 = sdiv exact i64 %4, ptrtoint (ptr getelementptr (i8, ptr null, i32
1) to i64)
Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors an destructors, removing a class of failures related to
shared library destruction order, and allows targets other than OpenMP
to use the same support without needing to change the frontend.
This is primarily done by calling kernels that the backend emits to
iterate a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard `dlopen` handling. For AMDGPU, we
emit `amdgcn.device.init` and `amdgcn.device.fini` functions which
handle everything atuomatically and simply need to be called. For NVPTX,
a patch https://github.com/llvm/llvm-project/pull/71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.
One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use `llvm.global_dtors` instead of
using `atexit`. This is because `atexit` is a separate runtime function
that does not mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:
```
struct S { ~S() { foo(); } };
void foo() {
static S s;
}
```
However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.
This changes the handling of ctors / dtors. This patch now outputs a
information message regarding the deprecation if the old format is used.
This will be completely removed in a later release.
Depends on: https://github.com/llvm/llvm-project/pull/71549
This patch converts `ImplicitParamDecl::ImplicitParamKind` into a scoped enum at namespace scope, making it eligible for forward declaring. This is useful for `preferred_type` annotations on bit-fields.
This patch moves `OMPDeclareReductionDecl::InitKind` to DeclBase.h, so that it's complete at the point where corresponding bit-field is declared. This patch also converts it to scoped enum named `OMPDeclareReductionInitKind`
This patch moves `ArraySizeModifier` before `Type` declaration so that it's complete at `ArrayTypeBitfields` declaration. It's also converted to scoped enum along the way.
We used to pass the min/max threads/teams values through different paths
from the frontend to the middle end. This simplifies the situation by
passing the values once, only when we will create the KernelEnvironment,
which contains the values. At that point we also manifest the metadata,
as appropriate. Some footguns have also been removed, e.g., our target
check is now triple-based, not calling convention-based, as the latter
is dependent on the ordering of operations. The types of the values have
been unified to int32_t.
We now provide the information about the min/max thread and team count
from to the OMPIRBuilder, no matter what the source was. That means we
unify `thread_limit`, `num_teams`, `num_threads` handling with the
target specific attriutes (`__launch_bounds__` and
`amdgpu_flat_work_group_size`). This is in preparation to pass the
values to the runtime, and to allow the middle-end (OpenMP-opt) to
tighten the values if it seems appropriate. There is no "real" change
after this commit.
Fixed assertion failure
Basic Block in function 'main' does not have terminator!
label %land.end
caused by premature setting of CodeGenIP upon entry to
emitTargetDataCalls, where subsequent evaluation of logical expression
created new basic blocks, leaving CodeGenIP pointing to the wrong basic
block. CodeGenIP is now set near the end of the function, just prior to
generating a comparison of the logical expression result (from the if
clause) which uses CodeGenIP to insert new IR.
This patch seeks to move the following functions to the OMPIRBuilder:
- getFlagMemberOffset
- getMemberOfFlag
- setCorrectMemberOfFlag
These small helper functions help set the end bits of the
OpenMPOffloadMappingFlags flag that correspond to the reserved segment
for OMP_MAP_MEMBER_OF.
They will be of use in the future for lowering MLIR types/values that
can contian members and can be lowered similarly to a structure or class
type within the OpenMPToLLVMIRTranslation step of the OpenMP dialects
lowering to LLVM-IR. In particular for Flang which currently uses this
flow. Types with descriptors like pointers/allocatables, and likely
derived types in certain cases can be lowered as if they were structures
with explicitly mapped members.
Hashing the sugared type instead of the canonical type meant that
a simple example like this would always fail under MSVC:
```
static auto l() {}
int main() {
auto a = l;
a();
}
```
`clang --target=x86_64-pc-windows-msvc -fno-exceptions
-fsanitize=function -g -O0 -fuse-ld=lld -o test.exe test.cc`
produces:
```
test.cc:4:3: runtime error: call to function l through pointer to incorrect function type 'void (*)()'
```
Default atomic ordering information is processed in the OpenMP dialect
to LLVM IR lowering stage at every spot where an operation can be
affected by it. The rest of clauses are stored globally in the
OpenMPIRBuilderConfig object before starting that lowering stage, so
that the OMPIRBuilder can conditionally modify code generation
depending on these. At the end of the process, the omp.requires
attribute is itself lowered into a global constructor that passes these
clauses as flags to the OpenMP runtime.
Depends on D147217, D147218 and D158278.
Differential Revision: https://reviews.llvm.org/D147219
This patch updates the `OpenMPIRBuilderConfig` structure to hold all
available 'requires' clauses, and it replicates part of the code
generation for the 'requires' registration function from clang in the
`OMPIRBuilder`, to be used with flang.
Porting the rest of features of the clang implementation to the IRBuilder
and sharing it between clang and flang remains for a future patch, due to the
complexity of the logic selecting the attributes of the generated
registration function.
Differential Revision: https://reviews.llvm.org/D147217
The goal of this change is to clean up some of the code surrounding
HLSL using CXXThisExpr as a non-pointer l-value. This change cleans up
a bunch of assumptions and inconsistencies around how the type of
`this` is handled through the AST and code generation.
This change is be mostly NFC for HLSL, and completely NFC for other
language modes.
This change introduces a new member to query for the this object's type
and seeks to clarify the normal usages of the this type.
With the introudction of HLSL to clang, CXXThisExpr may now be an
l-value and behave like a reference type rather than C++'s normal
method of it being an r-value of pointer type.
With this change there are now three ways in which a caller might need
to query the type of `this`:
* The type of the `CXXThisExpr`
* The type of the object `this` referrs to
* The type of the implicit (or explicit) `this` argument
This change codifies those three ways you may need to query
respectively as:
* CXXMethodDecl::getThisType()
* CXXMethodDecl::getThisObjectType()
* CXXMethodDecl::getThisArgType()
This change then revisits all uses of `getThisType()`, and in cases
where the only use was to resolve the pointee type, it replaces the
call with `getThisObjectType()`. In other cases it evaluates whether
the desired returned type is the type of the `this` expr, or the type
of the `this` function argument. The `this` expr type is used for
creating additional expr AST nodes and for member lookup, while the
argument type is used mostly for code generation.
Additionally some cases that used `getThisType` in simple queries could
be substituted for `getThisObjectType`. Since `getThisType` is
implemented in terms of `getThisObjectType` calling the later should be
more efficient if the former isn't needed.
Reviewed By: aaron.ballman, bogner
Differential Revision: https://reviews.llvm.org/D159247
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`
This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch.
Differential Revision: https://reviews.llvm.org/D158016
offloading
- This patch adds support for thread_limit clause on target directive according to OpenMP 51 [2.14.5]
- The idea is to create an outer task for target region, when there is a thread_limit clause, and manipulate the thread_limit of task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region.
Differential Revision: https://reviews.llvm.org/D152054
OpenMP 5.1 allows emission of the `indirect` clause on declare target
functions, see https://www.openmp.org/spec-html/5.1/openmpsu70.html#x98-1080002.14.7.
The intended use of this is to permit calling device functions via their
associated host pointer. In order to do this the first step will be
building a map associating these variables. Doing this will require the
same offloading entry handling we use for other kernels and globals.
We intentionally emit a new global on the device side. Although it's
possible to look up the device function's address directly, this would
require changing the visibility and would prevent us from making static
functions indirect. Also, the CUDA toolchain will optimize out unused
functions and using a global prevents that. The downside is that the
runtime will need to read the global and copy its value, but there
shouldn't be any other costs.
Note that this patch just performs the codegen, currently this new
offloading entry type is unused and will be ignored by the runtime.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D157738
We used to have two separate implementations to derive the number of
threads used in a target region. This lead us to sometimes miss out on
user provided thread bounds (num_threads, or thread_limit) when we
looked for "constant default values". If we might miss out on the
presence of those bounds, we cannot set the thread_limit statically
since the runtime will try to honor user input rather than cap it at the
"preferred default". This patch replaces the secondary implementation
with the primary in a mode that will not emit code but just look for the
presence, and potentially upper bounds, of thread limiting clauses.
The runtime test would not pass without this rewrite as we missed some
clauses, set the static limit on the device to the preferred value, but
then violated that value at runtime.
Fixes: https://github.com/llvm/llvm-project/issues/64845
Differential Revision: https://reviews.llvm.org/D158381
Migrate createForStaticInitFunction, createDispatchInitFunction, createDispatchNextFunction and createDispatchFiniFunction from Clang CodeGen to OMPIRBuilder.
Differential Revision: https://reviews.llvm.org/D157994
OpenMP runtime functions assume the pointers are aligned to sizeof(pointer),
but it is being aligned incorrectly. Fix with the proper alignment in the IR builder.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D157040
Currently, the precense of the OpenMP target declare metadata requires
that we always codegen a global declaration. This is undesirable in the
case that we could defer or omit this declaration as is common with
unused extern variables. This is important as it allows us, in the
runtime, to rely on static linking semantics to omit unused symbols so
they are not included when the user links it in.
This patch changes the check for always emitting these variables.
Because of this we also need to extend this logic to the generation of
the offloading entries. This has the result of derring the offload entry
generation to the canonical definitoin. So we are effectively assuming
whoever owns the storage for this variable will perform that operation.
This makes an exception for `link` attributes as those require their own
special handling.
Let me know if this is sound in the implementation, I do not have the
largest view of the standards here.
Fixes: https://github.com/llvm/llvm-project/issues/64133
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D156368
CUDA and HIP have kernel attributes to tune the code generation (in the
backend). To reuse this functionality for OpenMP target regions we
introduce the `ompx_attribute` clause that takes these kernel
attributes and emits code as if they had been attached to the kernel
fuction (which is implicitly generated).
To limit the impact, we only support three kernel attributes:
`amdgpu_waves_per_eu`, for AMDGPU
`amdgpu_flat_work_group_size`, for AMDGPU
`launch_bounds`, for NVPTX
The existing implementations of those attributes are used for error
checking and code generation. `ompx_attribute` can be attached to any
executable target region and it can hold more than one kernel attribute.
Differential Revision: https://reviews.llvm.org/D156184
This patch migrates the UseDevicePtr and UseDeviceAddr clause related code for handling privatisation from Clang codegen to the OMPIRBuilder
Depends on D150860
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D152554
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
The loop directive is a descriptive construct which allows the compiler
flexibility in how it generates code for the directive's associated
loop(s). See OpenMP specification 5.2 [257:8-9].
Codegen added in this patch for the combined 'loop' directives are:
'target teams loop' -> 'target teams distribute parallel for'
'teams loop' -> 'teams distribute parallel for'
'target parallel loop' -> 'target parallel for'
'parallel loop' -> 'parallel for'
NOTE: The implementation of the 'loop' directive itself is unchanged.
Differential Revision: https://reviews.llvm.org/D145823
This patch changes the emitTargetDataCalls function in clang to make use of the OpenMPIRBuilder::createTargetData function for Target Data directive code gen.
Reapplying commit 0d8d718171 after fixing libomptarget test failure related to debug info.
Depends on D146557
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D150860
Partial progress towards replacing uses of CreateElementBitCast, as it
no longer does what its name suggests.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D154229
This patch refactors the code generation that emits the offloading kernel
launch and moves the core portion to the OpenMPIRBuilder so that it can be used
from flang in the future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D151035
This patch changes the emitTargetDataCalls function in clang to make use of the OpenMPIRBuilder::createTargetData function for Target Data directive code gen.
Depends on D146557
Differential Revision: https://reviews.llvm.org/D150860
This patch migrates the emitOffloadingArrays and EmitNonContiguousDescriptor functions from Clang codegen to OpenMPIRBuilder.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149872
This change tries to move registerTargetglobalVariable and
getAddrOfDeclareTargetVar out of Clang's CGOpenMPRuntime
and into the OMPIRBuilder for shared use with MLIR's OpenMPDialect
and Flang (or other languages that may want to utilise it).
This primarily does this by trying to hoist the Clang specific
types into arguments or callback functions in the form of
lambdas, replacing it with LLVM equivelants and
utilising shared OMPIRBuilder enumerators for
the clauses, rather than Clang's own variation.
Reviewers: jsjodin, jdoerfert
Differential Revision: https://reviews.llvm.org/D149162
There are places in the runtime, like __kmp_init_indirect_csptr, which
assume these pointers are aligned to sizeof(void*), so make sure we emit
them with the correct alignment.
Fixes#62668
Reviewed By: jlpeyton
Differential Revision: https://reviews.llvm.org/D150723
Currently compiler assert when passing variable "memspace" in
omp_init_allocator.
omp_allocator_handle_t alloc=omp_init_allocator(memspace,1,traits)
The problem is memspace is not mapping to the target region. During
the call to emitAllocatorInit, calls to EmitVarDecl for "alloc", then
emit initialization of "alloc" that cause to assert.
If I understant correct, it is not necessary to emit variable
initialization, since "allocator" is private to target region.
To fix this call CGF.EmitAutoVarAlloca(allocator) instead
CGF.EmitVarDecl(allocator).
Differential Revision: https://reviews.llvm.org/D151743
It seems load of traits.addr should be passed in runtime call. Currently
the load of load traits.addr gets passed cause runtime to fail.
To fix this, skip the call to EmitLoadOfScalar for extra load.
Differential Revision: https://reviews.llvm.org/D151576
This patch fixes the issue that list items in `has_device_addr` are still mapped
to the target device because front end emits map type `OMP_MAP_TO`.
Fix#59160.
Reviewed By: jyu2
Differential Revision: https://reviews.llvm.org/D141627