- **[Inliner] Add tests for propagating more parameter attributes; NFC**
- **[Inliner] Propagate more attributes to params when inlining**
Add support for propagating:
- `derefereancable`
- `derefereancable_or_null`
- `align`
- `nonnull`
- `range`
These are only propagated if the parameter to the to-be-inlined callsite
match the exact parameter used in the to-be-inlined function.
Add the permutation clause for the interchange directive which will be
introduced in the upcoming OpenMP 6.0 specification. A preview has been
published in
[Technical Report12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf).
This test passes as-is on non-X86 hosts only because almost no target
implements `isValidFeatureName` (the default implementation
unconditionally returns true). RISC-V does implement it, and like X86
checks that the feature name is one supported by the architecture. This
means the test creates an additional warning on RISC-V due to
`_attribute__((target("avx512f")))`.
The simple solution here is to just explicitly target x86_64-linux-gnu.
Summary:
This patch causes us to respect the `-fconvergent-functions` and
`-fno-convergent-functions` options correctly. GPU targets should have
this set all the time, but we now offer `-fno-convergent-functions` to
opt-out if you want to test broken behavior. This munged about with a
lot of the old weird logic, but I don't think it makes any real changes.
Set up these tests so these are marked as unsupported when targeting
z/OS. Most would already be unsupported if you ran lit on z/OS. However,
they also need to be unsupported if the default triple is z/OS.
Added codegen for scope directive, enabled allocate and firstprivate
clauses, and added scope directive LIT test.
Testing
- LIT tests (including new scope test).
- OpenMP scope example test from 5.2 OpenMP API examples document.
- Three executable scope tests from OpenMP_VV/sollve_vv suite.
These transforms all perform a variant of (gep (gep p, x), y)
to (gep p, (x + y)). We can preserve both inbounds and nuw
during such transforms (https://alive2.llvm.org/ce/z/Stu4cN), but
not nusw, which would require proving that the new add is nsw.
For the constant offset case, I've conservatively retained the
logic that checks for negative intermediate offsets, though I'm
not sure it's still reachable nowadays.
Add nuw attribute to inbounds GEPs where the expression used to form the
GEP is an addition of unsigned indices.
Relands #105496, which was reverted because it exposed a miscompilation
arising from #98608. This is now fixed by #106512.
Summary:
Currently, we assign this to private memory. This causes failures on
some SOLLVE tests. The standard isn't clear on the semantics of this
allocation type, but there seems to be a consensus that it's supposed to
be shared memory.
This patch fixes a couple of cases where Clang aborts with loop nests
that are being collapsed (via the relevant OpenMP clause) into a new,
combined loop.
The problematic cases happen when a variable declared within the loop
nest is used in the (init, condition, iter) statement of a more
deeply-nested loop. I don't think these cases (generally?) fall under
the non-rectangular loop nest rules as defined in OpenMP 5.0+, but I
could be wrong (and anyway, emitting an error is better than crashing).
In terms of implementation: the crash happens because (to a first
approximation) all the loop bounds calculations are pulled out to the
start of the new, combined loop, but variables declared in the loop nest
"haven't been seen yet". I believe there is special handling for
iteration variables declared in "for" init statements, but not for
variables declared elsewhere in the "imperfect" parts of a loop nest.
So, this patch tries to diagnose the troublesome cases before they can
cause a crash. This is slightly awkward because at the point where we
want to do the diagnosis (SemaOpenMP.cpp), we don't have scope
information readily available. Instead we "manually" scan through the
AST of the loop nest looking for var decls (ForVarDeclFinder), then we
ensure we're not using any of those in loop control subexprs
(ForSubExprChecker). All that is only done when we have a "collapse"
clause.
Range-for loops can also cause crashes at present without this patch, so
are handled too.
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.
Regression tests are updated using update scripts where possible, and by
find + replace where not.
By the OpenMP standard, `num_teams` clause can only accept one
expression (for now). In this patch, we extend it to allow to accept
multiple expressions when it is used with `target teams ompx_bare`
construct. This will allow to launch a multi-dim grid, same as CUDA/HIP.
This is for mapping structure has data members, which have 'default'
mappers, where needs to map these members individually using
their 'default' mappers.
example map(tofrom: spp[0][0]), look at test case.
currently create 6 maps:
1>&spp, &spp[0], size 8, maptype TARGET_PARAM | FROM | TO
2>&spp[0], &spp[0][0], size(D)with maptype OMP_MAP_NONE, nullptr
3>&spp[0], &spp[0][0].e, size(e) with maptype MEMBER_OF | FROM | TO
4>&spp[0], &spp[0][0].h, size(h) with maptype MEMBER_OF | FROM | TO
5>&spp, &spp[0],size(8), maptype MEMBER_OF | IMPLICIT | FROM | TO
6>&spp[0], &spp[0][0].f size(D) with maptype MEMBER_OF |IMPLICIT
|PTR_AND_OBJ, @.omp_mapper._ZTS1C.default
maptype with/without OMP_MAP_PTR_AND_OBJ
For "2" and "5", since it is mapping pointer and pointee pair,
PTR_AND_OBJ should be set
But for "6" the PTR_AND_OBJ should not set.
However, "5" is duplicate with "1" can be skip.
To fix "2", during the call to emitCombinEntry with false with
NotTargetParams
instead !PartialStruct.PreliminaryMapData.BasePointers.empty(), since
all captures need to be TARGET_PARAM
And inside emitCombineEntry: check
!PartialStruct.PreliminaryMapData.BasePointers.empty() to set
PTR_AND_OBJ
For "5" and "6": the fix in generateInfoForComponentList:
Add new variable IsPartialMapped set with
!PartialStruct.PreliminaryMapData.BasePointers.empty();
When that is true, skip generate "5" and don"t set IsExpressionFirstInfo
to false, so that PTR_AND_OBJ would be set.
After fix: will have 5 maps instead 6
1>&spp, &spp[0], size 8, maptype TARGET_PARAM | FROM | TO
2>&spp[0], &spp[0][0], size(D), maptype PTR_AND_OBJ, nullptr
3>&spp[0], &spp[0][0].e, size(e), maptype MEMBER_OF_2 | FROM | TO
4>&spp[0], &spp[0][0].h, size(h), maptype MEMBER_OF_2 | FROM | TO
5>&spp[0], &spp[0][0].f size(32), maptype MEMBER_OF_2 | IMPLICIT,
@.omp_mapper._ZTS1C.default
For map(sppp[0][0][0]):
after fix: will have 6 maps instead 8.
https://github.com/llvm/llvm-project/pull/101903
This is a minimal patch to support parsing for "omp assume" directives.
These are meant to be hints to a compiler's optimisers: as such, it is
legitimate (if not very useful) to ignore them. The patch builds on top
of the existing support for "omp assumes" directives (note spelling!).
Unlike the "omp [begin/end] assumes" directives, "omp assume" is
associated with a compound statement, i.e. it can appear within a
function. The "holds" assumption could (theoretically) be mapped onto
the existing builtin "__builtin_assume", though the latter applies to a
single point in the program, and the former to a range (i.e. the whole
of the associated compound statement).
This patch fixes sollve's OpenMP 5.1 "omp assume"-based tests.
This is only for struct containing nested structs with user defined
mappers.
Add four functions:
1>buildImplicitMap: build map for default mapper
2>buildImplicitMapper: build default mapper.
3>hasUserDefinedMapper for given mapper name and mapper type, lookup
user defined map, if found one return true.
4>isImplicitMapperNeeded check if Mapper is needed
During create map, in checkMappableExpressionList, call
isImplicitMapperNeeded when it return true, call buildImplicitMapper to
generate implicit mapper and added to map clause.
https://github.com/llvm/llvm-project/pull/101101
Use this to replace the emission of the amdgpu-unsafe-fp-atomics
attribute in favor of per-instruction metadata. In the future
new fine grained controls should be introduced that also cover
the integer cases.
Add a wrapper around CreateAtomicRMW that appends the metadata,
and update a few use contexts to use it.
Fixes issue with defaultmap where scalar isn't handled correctly for
present modifier. Adds all variable cateogry introduced in OpenMP 5.2
and alters existing tests for error messages to check OpenMP 5.2
defaultmap messages.
In debug mode there is a wrapper (the kernel) around the function in
which we generate the kernel code. We worked around this before to get
the correct kernel name, but now we really distinguish both to attach
the launch bounds to the kernel, not the inner function.
Given "loop" construct, clang will try to treat it as "for",
"distribute" or "simd", depending on either the implied binding, or the
bind clause if present. This patch moves the code that performs this
construct remapping from sema to codegen.
For a "loop" construct without a bind clause, this patch will create an
implicit bind clause based on implied binding to simplify further
analysis.
During codegen the function `EmitOMPGenericLoopDirective` (i.e. "loop")
will invoke the "emit" functions for "for", "distribute" or "simd",
depending on the bind clause.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
This reverts the revert commit bee240367c.
This version includes updates to the tests to use patterns when matching
the pointer argument.
Original commit message:
This patch extends Clang's TBAA generation code to emit distinct tags
for incompatible pointer types.
Pointers with different element types are incompatible if the pointee
types are also incompatible (modulo sugar/modifiers).
Express this in TBAA by generating different tags for pointers based on
the pointer depth and pointee type. To get the TBAA tag for the pointee
type it uses getTypeInfoHelper on the pointee type.
(Moved from https://reviews.llvm.org/D122573)
PR: https://github.com/llvm/llvm-project/pull/76612
The expectation for multiple iterators used in a single depend clause
(`depend(iterator(i=0:5,j=0:5), in:x[i][j])`) is that the iterator space
is the product of the iteration vectors (25 in that case). The current
codeGen only works correctly, if `numIterators() = 1`. For more
iterators, the execution results in runtime assertions or segfaults.
The modified codeGen first calculates the iteration space, then
multiplies to the number of dependencies in the depend clause and
finally adds to the total number of iterator dependencies.
This is a follow-up from the conversation starting at
https://github.com/llvm/llvm-project/pull/93809#issuecomment-2173729801
The root problem that motivated the change are external AST sources that
compute `ASTRecordLayout`s themselves instead of letting Clang compute
them from the AST. One such example is LLDB using DWARF to get the
definitive offsets and sizes of C++ structures. Such layouts should be
considered correct (modulo buggy DWARF), but various assertions and
lowering logic around the `CGRecordLayoutBuilder` relies on the AST
having `[[no_unique_address]]` attached to them. This is a
layout-altering attribute which is not encoded in DWARF. This causes us
LLDB to trip over the various LLVM<->Clang layout consistency checks.
There has been precedent for avoiding such layout-altering attributes
from affecting lowering with externally-provided layouts (e.g., packed
structs).
This patch proposes to replace the `isZeroSize` checks in
`CGRecordLayoutBuilder` (which roughly means "empty field with
[[no_unique_address]]") with checks for
`CodeGen::isEmptyField`/`CodeGen::isEmptyRecord`.
**Details**
The main strategy here was to change the `isZeroSize` check in
`CGRecordLowering::accumulateFields` and
`CGRecordLowering::accumulateBases` to use the `isEmptyXXX` APIs
instead, preventing empty fields from being added to the `Members` and
`Bases` structures. The rest of the changes fall out from here, to
prevent lookups into these structures (for field numbers or base
indices) from failing.
Added `isEmptyRecordForLayout` and `isEmptyFieldForLayout` (open to
better naming suggestions). The main difference to the existing
`isEmptyRecord`/`isEmptyField` APIs, is that the `isEmptyXXXForLayout`
counterparts don't have special treatment for `unnamed bitfields`/arrays
and also treat fields of empty types as if they had
`[[no_unique_address]]` (i.e., just like the `AsIfNoUniqueAddr` in
`isEmptyField` does).
There are two problems with _BitInt prior to this patch:
1. For at least some values of N, we cannot use LLVM's iN for the type
of struct elements, array elements, allocas, global variables, and so
on, because the LLVM layout for that type does not match the high-level
layout of _BitInt(N).
Example: Currently for i128:128 targets correct implementation is
possible either for __int128 or for _BitInt(129+) with lowering to iN,
but not both, since we have now correct implementation of __int128 in
place after a21abc7.
When this happens, opaque [M x i8] types used, where M =
sizeof(_BitInt(N)).
2. LLVM doesn't guarantee any particular extension behavior for integer
types that aren't a multiple of 8. For this reason, all _BitInt types
are now have in-memory representation that is a whole number of bytes.
I.e. for example _BitInt(17) now will have memory layout type i32.
This patch also introduces concept of load/store type and adds an API to
CodeGenTypes that returns the IR type that should be used for load and
store operations. This is particularly useful for the case when a
_BitInt ends up having array of bytes as memory layout type. For
_BitInt(N), let M = sizeof(_BitInt(N)), and let BITS = M * 8. Loads and
stores of iM would both (1) produce far better code from the backends
and (2) be far more optimizable by IR passes than loads and stores of [M
x i8].
Fixes https://github.com/llvm/llvm-project/issues/85139
Fixes https://github.com/llvm/llvm-project/issues/83419
---------
Co-authored-by: John McCall <rjmccall@gmail.com>
This patch extends Clang's TBAA generation code to emit distinct tags
for incompatible pointer types.
Pointers with different element types are incompatible if the pointee
types are also incompatible (modulo sugar/modifiers).
Express this in TBAA by generating different tags for pointers based on
the pointer depth and pointee type. To get the TBAA tag for the pointee
type it uses getTypeInfoHelper on the pointee type.
(Moved from https://reviews.llvm.org/D122573)
PR: https://github.com/llvm/llvm-project/pull/76612
The previous check was inconsistent. For example, it would allow
```
#pragma omp target
#pragma omp parallel for
for (...) {
#pragma omp scan
}
```
but not
```
#pragma omp target parallel for
for (...) {
#pragma omp scan
}
```
Make the check conform to the wording on the specification.
Summary:
The parsing for this was implemented, but we never hooked up the default
value to the result of this clause. This patch adds the support by
making it default to the requires directive.
This patch implements (not yet published)
[P3144R2](https://wiki.edg.com/pub/Wg21stlouis2024/StrawPolls/p3144r2.pdf)
"Deleting a Pointer to an Incomplete Type Should be Ill-formed". Wording
changes (not yet merged into the working draft) read:
> 7.6.2.9 [expr.delete] Delete
> If the object being deleted has incomplete class type at the point of
deletion <del>and the complete class has a
non-trivial destructor or a deallocation function, the behavior is
undefined</del>, <ins>the program is ill-formed</ins>.
We preserve status quo of emitting a warning when deleting a pointer to
incomplete type up to, and including, C++23, but make it ill-formed
since C++26. Same goes for deleting pointers to `void`, which has been
allowed as an extension.
While lowering (#pragma omp target update from), clang's generated
.omp_task_entry. is setting up 9 arguments while calling
__tgt_target_data_update_nowait_mapper.
At the same time, in __tgt_target_data_update_nowait_mapper, call to
targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming
it has the argument count listed in the signature.
AARCH64 asm sequence for this is as follows (removed unrelated insns):
`
.omp_task_entry..108:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8. // stack canary
str xzr, [sp]
bl __tgt_target_data_update_nowait_mapper
__tgt_target_data_update_nowait_mapper:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8 // stack canary
// Sibcall argument setup
adrp x8,
:got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb
ldr x8, [x8,
:got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb]
stp x9, x8, x29, #16
adrp x8, .L.str.8
add x8, x8, :lo12:.L.str.8
str x8, x29, #32. <==. This is the insn that erases $fp
ldp x29, x30, sp, #16 // 16-byte Folded Reload
add sp, sp, #32
// Sibcall
b
ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD
`
On AArch64, call to __tgt_target_data_update_nowait_mapper in
.omp_task_entry. sets up only single space on stack and this results in
ovewriting $fp and subsequent stack corruption. This issue can be
credited to discrepancy of __tgt_target_data_update_nowait_mapper
signature in openmp/libomptarget/include/omptarget.h taking 13 arguments
while clang/lib/CodeGen/CGOpenMPRuntime.cpp and
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments.
This patch modifies __tgt_target_data_update_nowait_mapper signature to
match .omp_task_entry usage(and other 2 files mentioned above).
Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
Fix another runtime problem when explicit map both pointer and pointee
in target data region.
In #92210, problem is only addressed in target region, but missing for
target data region.
The change just passing AreBothBasePtrAndPteeMapped in
generateInfoForComponentList when processing target data.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches MLIR OpenMP translation would be making use of these functions.
Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>
Static verifier noticed the current code has logically dead code parsing
the clause where IsComma is assigned. Fix this and improve the error
message received when a bad adjust-op is specified.
This will now be handled like 'map' where a nice diagnostic is given
with the correct values, then parsing continues on the next clause
reducing unhelpful diagnostics.
The static verifier flagged dead code in the check since the loop will
only execute once and never reach the iterator increment.
The loop needs to iterate twice to correctly diagnose when a statement
is after the teams.
Since there are two iterations again, reset the iterator to the first
teams directive when the double teams case is seen so the diagnostic can
report both locations.
This checks if the layout of `std::initializer_list` is something Clang
can handle much earlier and deduplicates the checks in
CodeGen/CGExprAgg.cpp and AST/ExprConstant.cpp
Also now diagnose `union initializer_list` (Fixes#95495), bit-field for
the size (Fixes a crash that would happen during codegen if it were
unnamed), base classes (that wouldn't be initialized) and polymorphic
classes (whose vtable pointer wouldn't be initialized).
Some of the OpenMP code can change the instruction pointed at by the
insertion point. This leads to an assert in the compiler about
BB->getParent() and IP->getParent() not matching.
The fix is to rebuild the insertionpoint from the block, rather than use
builder.restoreIP.
Also, move some of the alloca generation, rather than skipping back and
forth between insert points (and ensure all the allocas are done before
their users are created).
A simple test, mainly to ensure the minimal reproducer doesn't fail to
compile in the future is also added.
This patch makes the final major change of the RemoveDIs project, changing the
default IR output from debug intrinsics to debug records. This is expected to
break a large number of tests: every single one that tests for uses or
declarations of debug intrinsics and does not explicitly disable writing
records.
If this patch has broken your downstream tests (or upstream tests on a
configuration I wasn't able to run):
1. If you need to immediately unblock a build, pass
`--write-experimental-debuginfo=false` to LLVM's option processing for all
failing tests (remember to use `-mllvm` for clang/flang to forward arguments to
LLVM).
2. For most test failures, the changes are trivial and mechanical, enough that
they can be done by script; see the migration guide for a guide on how to do
this: https://llvm.org/docs/RemoveDIsDebugInfo.html#test-updates
3. If any tests fail for reasons other than FileCheck check lines that need
updating, such as assertion failures, that is most likely a real bug with this
patch and should be reported as such.
For more information, see the recent PSA:
https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
Reapplying without changes. The flang+openmp buildbot failure
should be addressed by https://github.com/llvm/llvm-project/pull/94541.
-----
This is a followup to https://github.com/llvm/llvm-project/pull/93823
and drops the DataLayout-unaware GEP of GEP fold entirely. All cases are
now left to the DataLayout-aware constant folder, which will fold
everything to a single i8 GEP.
We didn't have any test coverage for this fold in LLVM, but some Clang
tests change.