IR for 'target teams loop' now depends on the suitability of the associated
loop-nest.
If a loop-nest:
- does not contain a function call, contains only calls to OpenMP API
  routines, or -fopenmp-assume-no-nested-parallelism has been specified, AND
- does not contain nested loop bind(parallel) directives
then it can be emitted as 'target teams distribute parallel for', which
is the current default. Otherwise, it is emitted as 'target teams
distribute'.
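The emission rule above can be restated as a small decision function. This is a minimal model under stated assumptions: `LoopNestInfo` and `emissionFor` are hypothetical names, not Clang's actual analysis. They only encode the conditions listed in this message.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of what an analysis might record about a loop-nest.
struct LoopNestInfo {
  bool HasFunctionCall = false;       // the nest contains at least one call
  bool CallsOnlyOpenMPAPI = false;    // every call targets an OpenMP API routine
  bool HasNestedBindParallel = false; // the nest contains 'loop bind(parallel)'
};

// Hypothetical restatement of the rule. The parallel form is chosen only if
// the calls are harmless AND there is no nested 'loop bind(parallel)'.
std::string emissionFor(const LoopNestInfo &Nest, bool AssumeNoNestedParallelism) {
  const bool CallsAreSafe = !Nest.HasFunctionCall || Nest.CallsOnlyOpenMPAPI ||
                            AssumeNoNestedParallelism;
  if (CallsAreSafe && !Nest.HasNestedBindParallel)
    return "target teams distribute parallel for";
  return "target teams distribute";
}
```

Under this reading, `-fopenmp-assume-no-nested-parallelism` only relaxes the function-call condition. A nested `loop bind(parallel)` still selects the distribute-only form.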
Added debug output indicating how 'target teams loop' was emitted. Flag
is -mllvm -debug-only=target-teams-loop-codegen
Added LIT tests explicitly verifying 'target teams loop' emitted as a
parallel loop and a distribute loop.
Updated other 'loop' related tests as needed to reflect change in IR.
- These updates account for most of the changed files and
additions/deletions.
Summary:
These entries are now generic for offloading with the new driver. Having
the `omp` prefix was a historical artifact and is confusing when used
for CUDA. This patch just renames them for now; future patches will
rework the binary format to make it more common.
Original commit message:
"
Commit https://github.com/llvm/llvm-project/commit/46f3ade introduced a notion
of printing the attributes on the left to improve the printing of attributes
attached to variable declarations. The intent was to produce more GCC compatible
code because clang tends to print the attributes on the right hand side which is
not accepted by gcc.
This approach has increased the complexity in tablegen and the attributes
themselves, as they are now supposed to know where they could appear. That led to
mishandling of the `override` keyword, which is modelled as an attribute in
clang.
This patch takes inspiration from the existing approach and tries to keep the
position of the attributes as they were written. To do so we use a simpler
heuristic which checks if the source location of the attribute precedes the
declaration. If so, the attribute is printed before the declaration.
Fixes https://github.com/llvm/llvm-project/issues/87151
"
The reason for the bot breakage is that attributes coming from ApiNotes are not
marked implicit even though they do not have source locations. This caused an
assert to trigger. This patch forces attributes with no source location
information to be printed on the left. That change is consistent with the overall
intent of the change: to increase the chances for attributes to compile across
toolchains while keeping the produced code as close as possible to what the
user wrote.
This test is the bottleneck for OpenMP lit tests, running about twice as
long as the others. Break it into five tests based on run lines with the
same version.
Summary:
This new attribute was introduced recently. We already do this for NVPTX
kernels so we should apply this for AMDGPU as well. This patch simply
applies this metadata in cases where a lower bound is known.
When emitting the storage (or memory copy operations) for constant
initializers, the decision whether to split a constant structure or
array store into a sequence of field stores or to use `memcpy` is
based upon the optimization level and the size of the initializer.
In afe8b93ffd, we extended this by
allowing constants to be split when the array (or struct) type does
not match the type of data the address to the object (constant) is
expected to contain. This may happen when `emitStoresForConstant` is
called by `EmitAutoVarInit`, as the element type of the address gets
shrunk. When this occurs, let the initializer be split into a bunch
of stores only under `-ftrivial-auto-var-init=pattern`.
Fixes: https://github.com/llvm/llvm-project/issues/84178.
As part of the migration to ptradd
(https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699),
we need to change the representation of the `inrange` attribute, which
is used for vtable splitting.
Currently, inrange is specified as follows:
```
getelementptr inbounds ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, inrange i32 1, i64 2)
```
The `inrange` is placed on a GEP index, and all accesses must be "in
range" of that index. The new representation is as follows:
```
getelementptr inbounds inrange(-16, 16) ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, i32 1, i64 2)
```
This specifies which offsets are "in range" of the GEP result. The new
representation will continue working when canonicalizing to ptradd
representation:
```
getelementptr inbounds inrange(-16, 16) (i8, ptr @vt, i64 48)
```
The inrange offsets are relative to the return value of the GEP. An
alternative design could make them relative to the source pointer
instead. The result-relative format was chosen on the off-chance that we
want to extend support to non-constant GEPs in the future, in which case
this variant is more expressive.
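Converting from the old index-based form to the new result-relative form is plain byte arithmetic. The sketch below assumes 8-byte pointers, which the `i64 48` in the ptradd form implies. It hard-codes the `{ [4 x ptr], [4 x ptr] }` vtable type, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Layout of { [4 x ptr], [4 x ptr] }, assuming 8-byte pointers.
constexpr int64_t kPtrSize = 8;
constexpr int64_t kFieldBytes = 4 * kPtrSize;     // one [4 x ptr] field: 32 bytes
constexpr int64_t kStructBytes = 2 * kFieldBytes; // whole struct: 64 bytes

struct Range {
  int64_t Start;
  int64_t End;
};

// Byte offset of 'getelementptr {..}, ptr @vt, i64 Outer, i32 Field, i64 Elem'.
constexpr int64_t gepResultOffset(int64_t Outer, int64_t Field, int64_t Elem) {
  return Outer * kStructBytes + Field * kFieldBytes + Elem * kPtrSize;
}

// Old form: 'inrange' on the field index limits accesses to that field.
// New form: the same field bounds, expressed relative to the GEP result.
constexpr Range inrangeRelativeToResult(int64_t Field, int64_t ResultOffset) {
  return {Field * kFieldBytes - ResultOffset,
          (Field + 1) * kFieldBytes - ResultOffset};
}
```

`gepResultOffset(0, 1, 2)` evaluates to 48, the offset that appears in the ptradd form. Relative to that result, the second field's bounds [32, 64) become (-16, 16), which matches `inrange(-16, 16)`.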
This implementation "upgrades" the old inrange representation in bitcode
by simply dropping it. This is a very niche feature, and I don't think
trying to upgrade it is worthwhile. Let me know if you disagree.
A kernel implicit parameter (dyn_ptr) was introduced some time back.
This patch increments the kernel args version for a compiler supporting
dyn_ptr. The version will be used by the runtime to determine whether
the implicit parameter is generated by the compiler. The versioning is
required to support use cases where code generated by an older compiler
is linked with a newer runtime.
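A runtime might consult such a version field roughly as follows. This is a hypothetical sketch: `KernelArgsHeader`, `kFirstVersionWithDynPtr` and its value are illustrative assumptions, not libomptarget's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical kernel-argument header; the real structure has more fields.
struct KernelArgsHeader {
  uint32_t Version;
};

// Hypothetical value: the first version produced by compilers that emit the
// implicit dyn_ptr kernel parameter.
constexpr uint32_t kFirstVersionWithDynPtr = 3;

// The runtime treats dyn_ptr as present only when the version says the
// compiler generated it. Binaries from older compilers therefore keep working.
bool kernelExpectsDynPtr(const KernelArgsHeader &Args) {
  return Args.Version >= kFirstVersionWithDynPtr;
}
```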
If approved, this patch should be backported to release 18.
The casting of FP atomic loads and stores was always done by the
front-end, even though the AtomicExpandPass will do it if the target
requests it (which is the default).
This patch removes this casting in the front-end entirely.
This PR implements the `[[omp::assume]]` spelling for the `__attribute__((assume))` attribute. It does not change anything about how that attribute is handled by the rest of Clang.
Modified clang/lib/CodeGen/CGStmtOpenMP.cpp to accept multiple use &
destroy clauses with the interop directive.
Modified clang/test/OpenMP/interop_codegen.cpp to check for the changes.
Co-authored-by: Sunil Kuravinakop <kuravina@pe28vega.us.cray.com>
This patch fixes #67002 ([OpenMP][Clang] Scan Directive not
supported for Generic types). It defers the Sema checks/analysis that
are run on the helper arrays used to implement the `omp scan`
directive until template instantiation happens.
Grateful to @alexey-bataev for suggesting these changes.
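For context, the affected pattern is an `omp scan` inside a function template, where the scan variable has a dependent type. The sketch below is an illustrative example, not the test case from the issue. Without `-fopenmp` the pragmas are ignored and the loop runs serially, producing the same result.

```cpp
#include <cassert>
#include <vector>

// Inclusive prefix sum over any arithmetic type T using an inscan reduction.
template <typename T>
std::vector<T> inclusivePrefixSum(const std::vector<T> &in) {
  std::vector<T> out(in.size());
  T sum = T(0);
  const long n = static_cast<long>(in.size());
#pragma omp parallel for reduction(inscan, + : sum)
  for (long i = 0; i < n; ++i) {
    sum += in[i]; // input phase: accumulate into the scan variable
#pragma omp scan inclusive(sum)
    out[i] = sum; // scan phase: read the inclusive prefix
  }
  return out;
}
```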
Summary:
Currently, OpenMP handles the `omp requires` clause by emitting a global
constructor into the runtime for every translation unit that requires
it. However, this is not a great solution because it prevents us from
having a defined order in which the runtime is accessed and used.
This patch changes the approach to no longer use global constructors,
but to instead group the flag with the other offloading entries that we
already handle. This still registers each flag per
requires TU, but now we have a single constructor that handles
everything.
This patch removes support for the old `__tgt_register_requires` and
replaces it with a warning message. We just had a recent release, and
the OpenMP policy for the past four releases since we switched to LLVM
is that we do not provide strict backwards compatibility between major
LLVM releases now that the library is versioned. This means that a user
will need to recompile if they have an old binary that relied on
`register_requires` having the old behavior. It is important that we
actively deprecate this, as otherwise it would not solve the problem of
having no defined init and shutdown order for `libomptarget`. The
problem of `libomptarget` not having a defined init and shutdown order
cascades into a lot of other issues, so I have a strong incentive to be
rid of it.
It is worth noting that the current `__tgt_offload_entry` only has space
for a 32-bit integer here. I am planning to overhaul these at some point
as well.
According to [dcl.fct] p23:
> An abbreviated function template can have a _template-head_. The
invented _template-parameters_ are appended to the
_template-parameter-list_ after the explicitly declared
_template-parameters_.
`template<>` is not a _template-head_ -- a _template-head_ must have at
least one _template-parameter_. This patch corrects our current behavior
of appending the invented template parameters to the innermost template
parameter list, regardless of whether it is empty. Example:
```
template<typename T>
struct A
{
void f(auto);
};
template<>
void A<int>::f(auto); // ok
template<>
template<> // warning: extraneous template parameter list in template specialization
void A<int>::f(auto);
```
According to [temp.pre] p5:
> In a template-declaration, explicit specialization, or explicit instantiation the init-declarator-list in the declaration shall contain at most one declarator.
A member-declaration that is a template-declaration or explicit-specialization contains a declaration, even though it declares a member. This means it _will_ contain an init-declarator-list (not a member-declarator-list), so [temp.pre] p5 applies.
This diagnoses declarations such as:
```
struct A
{
template<typename T>
static const int x = 0, f(); // error: a template declaration can only declare a single entity
template<typename T>
static const int g(), y = 0; // error: a template declaration can only declare a single entity
};
```
The diagnostic messages are the same as those of the equivalent namespace scope declarations.
Note: since we currently do not diagnose declarations with multiple abbreviated function template declarators at namespace scope, e.g., `void f(auto), g(auto);`, this patch does not add diagnostics for the equivalent member declarations.
This patch also refactors `ParseSingleDeclarationAfterTemplate` (now named `ParseDeclarationAfterTemplate`) to call `ParseDeclGroup` and return the resultant `DeclGroup`.
This adds support for `#pragma omp atomic compare weak`. For now, it
adds Parser & AST support only.
---------
Authored-by: Sunil Kuravinakop <kuravina@pe28vega.us.cray.com>
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.
This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.
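Canonicalizing to `i8` amounts to folding all constant indices into a single byte offset. The sketch below computes that offset for a hypothetical `{ i32, [4 x i16] }` type, mirrored as a C++ struct, and checks it against ordinary pointer arithmetic. The struct and helper names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ mirror of the LLVM type { i32, [4 x i16] }.
struct S {
  int32_t a;
  int16_t b[4];
};

// Byte offset of 'getelementptr %S, ptr %p, i64 I, i32 1, i64 J', from layout:
// I * sizeof(S) + offsetof(S, b) + J * sizeof(int16_t).
constexpr std::ptrdiff_t gepOffset(std::ptrdiff_t I, std::ptrdiff_t J) {
  return I * static_cast<std::ptrdiff_t>(sizeof(S)) +
         static_cast<std::ptrdiff_t>(offsetof(S, b)) +
         J * static_cast<std::ptrdiff_t>(sizeof(int16_t));
}

// The same address reached by ordinary pointer arithmetic, measured in bytes.
std::ptrdiff_t addressOffset(const S *Base, std::ptrdiff_t I, std::ptrdiff_t J) {
  return reinterpret_cast<const char *>(&Base[I].b[J]) -
         reinterpret_cast<const char *>(Base);
}
```

On common ABIs the offset is 20 bytes. So `getelementptr %S, ptr %p, i64 1, i32 1, i64 2` and `getelementptr i32, ptr %p, i64 5` address the same byte. After canonicalization, both would read roughly as `getelementptr i8, ptr %p, i64 20`.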
The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.
Fixes https://github.com/llvm/llvm-project/issues/69841.
This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.
The option does not have a -fno-* counterpart, since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, the two can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence no -fno- version.
This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode,
which I think is the only difference observable in code generation.
This runtime test checks for the (non-)occurrence of data movement between host
and device. It does one run without the flag and one with the flag to
also see that both versions behave as expected. Without the new flag,
data movement between host and device is expected; with the flag, such
data movement should not be present / reported.
Set the writable and dead_on_unwind attributes for sret arguments. These
indicate that the argument points to writable memory (and it's legal to
introduce spurious writes to it on entry to the function) and that the
argument memory will not be used if the call unwinds.
This enables additional MemCpyOpt/DSE/LICM optimizations.
Changes uploaded to Phabricator on Dec 16th were lost because
Phabricator is down, hence re-uploading them to GitHub.
Changes to be committed:
modified: clang/include/clang/Sema/Sema.h
modified: clang/lib/Sema/SemaOpenMP.cpp
modified: clang/test/OpenMP/generic_loop_ast_print.cpp
modified: clang/test/OpenMP/loop_bind_messages.cpp
modified: clang/test/PCH/pragma-loop.cpp
---------
Co-authored-by: Sunil Kuravinakop
This is a continuation of https://reviews.llvm.org/D123235 ([OpenMP]
atomic compare fail : Parser & AST support). This branch adds codegen
support for atomic compare fail.
---------
Co-authored-by: Sunil Kuravinakop
This patch makes `num_teams` and `thread_limit` mandatory for bare
kernels, similar to regular kernel languages, where the grid size has
to be set explicitly when launching a kernel.
Fix mapping of structs to device.
The following example fails:
```
#include <stdio.h>
#include <stdlib.h>
struct Descriptor {
int *datum;
long int x;
int xi;
long int arr[1][30];
};
int main() {
Descriptor dat = Descriptor();
dat.datum = (int *)malloc(sizeof(int)*10);
dat.xi = 3;
dat.arr[0][0] = 1;
#pragma omp target enter data map(to: dat.datum[:10]) map(to: dat)
#pragma omp target
{
dat.xi = 4;
dat.datum[dat.arr[0][0]] = dat.xi;
}
#pragma omp target exit data map(from: dat)
return 0;
}
```
This is a rework of the previous attempt:
https://github.com/llvm/llvm-project/pull/72410
Currently we do not set the upper-bound address for FinalArraySection
as the highest element in partial struct data.
Currently for:
`#pragma omp target map(D.a) map(D.b[:2])`
the size is computed as:
```
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%2 = getelementptr float, ptr %arrayidx, i32 1
%3 = ptrtoint ptr %2 to i64
%4 = ptrtoint ptr %a to i64
%5 = sub i64 %3, %4
%6 = sdiv exact i64 %5, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
```
Here %2 is wrong for `D.b[:2]`: it is computed from a pointer to the first
element of the array section, but it should be computed from a pointer to
the last element of the array section.
The fix is to emit the pointer to the last element of the array section and
use this pointer as the highest element in partial struct data.
IR after the change:
```
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%b1 = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx2 = getelementptr inbounds [2 x float], ptr %b1, i64 0, i64 1
%1 = getelementptr float, ptr %arrayidx2, i32 1
%2 = ptrtoint ptr %1 to i64
%3 = ptrtoint ptr %a to i64
%4 = sub i64 %2, %3
%5 = sdiv exact i64 %4, ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)
```
Fixes #69214
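The size computation can be checked with ordinary pointer arithmetic. This sketch assumes `%struct.DataTy` corresponds to a struct whose first field `a` is a `float`; the IR does not show that field's type. The division by `ptrtoint (ptr getelementptr (i8, ptr null, i32 1) to i64)` divides by 1, so the result is already in bytes. `bytesSpanned` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout for %struct.DataTy; the type of 'a' is assumed.
struct DataTy {
  float a;
  float b[2];
};

// Bytes covered from the lowest mapped element up to and including the
// element chosen as "highest" (computed as one-past-highest minus lowest).
std::ptrdiff_t bytesSpanned(const float *Lowest, const float *HighestElem) {
  return reinterpret_cast<const char *>(HighestElem + 1) -
         reinterpret_cast<const char *>(Lowest);
}
```

With the old highest element (`D.b[0]`), only 8 bytes would be mapped and `D.b[1]` would be dropped. Using the last element of the section covers all 12 bytes.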
In `emitOMPSimdRegion`, `EmitOMPPrivateLoopCounters` should be called after
`EmitOMPPrivateClause`.
Otherwise, the private variables are registered too early, which
`EmitOMPPrivateClause` does not allow.
Update all callers to pass through the Address.
For the older builtins such as `__sync_*` and MSVC `_Interlocked*`,
natural alignment of the atomic access is _assumed_. This change
preserves that behavior. It will pass through greater-than-required
alignments, however.
- The `nothing` directive was affecting the `if` block structure, which it
should not. To avoid this, return an empty statement instead of an error
statement while parsing.
Currently PresentModifierLocs is defined with size DefaultmapKindNum, where
DefaultmapKindNum = OMPC_DEFAULTMAP_pointer + 1.
Before OpenMP 5.0, the variable-category could not be omitted, so for a test like
`#pragma omp target map(tofrom: errors) defaultmap(present)`
an error would be emitted. Since 5.0 this is allowed.
When we then do:
`PresentModifierLocs[DMC->getDefaultmapKind()] = DMC->getDefaultmapModifierLoc();`
the array is accessed beyond its end.
To fix this, use OMPC_DEFAULTMAP_unknown instead of OMPC_DEFAULTMAP_pointer.
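The off-by-one can be seen with a small model of the table. The enumerators below are a hypothetical, simplified mirror of the `OMPC_DEFAULTMAP_*` kinds. As the fix implies, the model assumes an omitted variable-category maps to the enumerator just past `pointer`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified mirror of the defaultmap variable-category kinds.
enum DefaultmapKind {
  DEFAULTMAP_scalar,
  DEFAULTMAP_aggregate,
  DEFAULTMAP_pointer,
  DEFAULTMAP_unknown // what an omitted variable-category maps to in this model
};

// Old table size: no slot for the omitted-category case.
constexpr std::size_t OldKindNum = DEFAULTMAP_pointer + 1;
// Fixed table size: 'unknown' gets its own slot.
constexpr std::size_t NewKindNum = DEFAULTMAP_unknown + 1;

// True if indexing a table of TableSize entries with K stays in bounds.
constexpr bool indexInBounds(DefaultmapKind K, std::size_t TableSize) {
  return static_cast<std::size_t>(K) < TableSize;
}
```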
Hello!
This PR fixes #63871. Clang no longer crashes and instead emits an
error message.
Below is an example of the new error message:
```
~/dev/fork-llvm-project omp_dispatch_unimpl
❯ ./install/bin/clang -fopenmp -c -emit-llvm -Xclang -disable-llvm-passes test.c
test.c:6:5: error: cannot compile this OpenMP dispatch directive yet
6 | #pragma omp dispatch
| ^~~~~~~~~~~~~~~~~~~~
1 error generated.
```
Summary:
This patch provides the initial support for handling the new
driver's offloading entries. Normally, the ELF target can emit variables
in C-identifier named sections, and the linker will provide a pointer to
the section. For COFF targets, the linker instead merges sections whose
names contain a `$`, ordering them alphabetically by the text after the
`$`. We can thus emit these variables in such sections and then emit two
variables that are guaranteed to be sorted before and after the others,
which lets us traverse the entries between them. Previous patches
consolidated the handling of offloading entries so that this patch can
more easily map them to the appropriate section.
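The ordering trick can be modelled without a real linker. This sketch assumes the group suffixes `A`, `OE` and `Z` and the section name `omp_entries`, all of which are hypothetical. It models only the behavior described above: contributions sorted by the text after `$`, with the entries found between two sentinels.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One input contribution to a grouped COFF section such as "omp_entries$OE".
struct Contribution {
  std::string SectionName;
  std::string Payload; // stands in for the variable emitted into that section
};

// Text after the first '$' (empty if there is none).
std::string groupSuffix(const std::string &Name) {
  const auto Pos = Name.find('$');
  return Pos == std::string::npos ? std::string() : Name.substr(Pos + 1);
}

// Models the merge: contributions are ordered by suffix (stable for equal
// suffixes in this model). Then the entries strictly between the "$A" begin
// marker and the "$Z" end marker are collected, as a begin-to-end walk would.
std::vector<std::string> collectEntries(std::vector<Contribution> Parts) {
  std::stable_sort(Parts.begin(), Parts.end(),
                   [](const Contribution &L, const Contribution &R) {
                     return groupSuffix(L.SectionName) < groupSuffix(R.SectionName);
                   });
  std::vector<std::string> Entries;
  bool Inside = false;
  for (const Contribution &C : Parts) {
    const std::string Suffix = groupSuffix(C.SectionName);
    if (Suffix == "A") { Inside = true; continue; } // begin sentinel
    if (Suffix == "Z") break;                       // end sentinel
    if (Inside) Entries.push_back(C.Payload);
  }
  return Entries;
}
```

Because `A` sorts before any entry suffix and `Z` after, the sentinels bracket the entries no matter what order the object files contribute them in.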
Ideally, the only remaining step to allow the new driver to run on
Windows targets is to accurately map the following `ld.lld` arguments to
their `lld-link` equivalents. These are used inside the linker-wrapper,
so we should simply need to remap the arguments to the same
functionality if possible.
```
-o, -output
-l, --library
-L, --library-path
-v, --version
-rpath
-whole-archive, -no-whole-archive
```
I have not tested this at runtime as I do not have access to a Windows
machine.
This patch was adapted from some initial efforts in
https://reviews.llvm.org/D137470.
This reverts commit edd675ac28.
This breaks the clang build where every component is a shared library.
The file clang/lib/Basic/OpenMPKinds.cpp, which is a part of
libclangBasic.so, uses `getOpenMPClauseName`, which isn't provided by any
library libclangBasic.so links against:
```
/usr/bin/ld: CMakeFiles/obj.clangBasic.dir/OpenMPKinds.cpp.o: in function `clang::getOpenMPSimpleClauseTypeName(llvm::omp::Clause, unsigned int)':
OpenMPKinds.cpp:(.text._ZN5clang29getOpenMPSimpleClauseTypeNameEN4llvm3omp6ClauseEj+0x9b): undefined reference to `llvm::omp::getOpenMPClauseName(llvm::omp::Clause)'
```
In Clang 16, we implemented the ability to add a label at the end of a
compound statement. These changes complete the implementation by
allowing a label to be followed by a declaration in C.
Note, this seems to have fixed an issue with some OpenMP stand-alone
directives not being properly diagnosed as per:
https://www.openmp.org/spec-html/5.1/openmpsu19.html#x34-330002.1.3
(The same requirement exists in OpenMP 5.2 as well.)