Fixes#136357
The barrier needs to go between the copying into firstprivate variables
and the initialization call for the OpenMP construct (e.g. wsloop).
There is no way of expressing this in MLIR because for delayed
privatization that is all implicit (added in MLIR->LLVMIR conversion).
The previous approach put the barrier immediately before the wsloop (or
similar). For delayed privatization, the firstprivate copy code would
then be inserted after that, opening the possibility for the race
observed in the bug report.
This patch solves the issue by instead setting an attribute on the mlir
operation, which will instruct openmp dialect to llvm ir conversion to
insert a barrier in the correct place.
This now produces code equivalent to if there was an explicit private
clause on the SECTIONS construct.
The problem was that each SECTION construct got its own DSP, which tried
to privatize the same symbol for that SECTION. Privatization for
SECTION(S) happens on the outer SECTION construct and so the outer
construct's DSP should be shared.
Fixes#135108
This PR tries to fix `lastprivate` update issues in composite
constructs. In particular, pre-determined `lastprivate` symbols are
attached to the wrong leaf of the composite construct (the outermost
one). When using delayed privatization (should be the default mode in
the future), this results in trying to update the `lastprivate` symbol
in the wrong construct (outside the `omp.loop_nest` op).
For example, given the following input:
```fortran
!$omp target teams distribute parallel do simd collapse(2) private(y_max)
do i=x_min,x_max
do j=y_min,y_max
enddo
enddo
```
Without the fixes introduced in this PR, the `DataSharingProcessor`
tries to generate the `lastprivate` update ops in the `parallel` op
since this is the op for which the DSP instance is created.
The fix consists of 2 main parts:
1. Instead of creating a single DSP instance, one instance is created
for the leaf constructs that might need privatization (whether for
explicit, implicit, or pre-determined symbols).
2. When generating the `lastprivate` comparison ops, we don't directly
use the SSA values of the UBs and steps. Instead, we regenerated these
SSA values from the original loop bounds' expressions. We have to do
this to avoid using `host_eval` values in the `lastprivate` comparison
logic which is illegal.
Use hlfir dereferencing for pointers and allocatables and use hlfir
assign. Also, change the code updating IV in lastprivate.
Note: This is a small change. Modifications in existing tests are
changes from fir.store to hlfir.assign.
Fixes#121290
See RFC:
https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084
The aim here is to ensure that tasks which are not executed for a while
after they are created do not try to reference any data which are now
out of scope. This is done by packing the data referred to by the task
into a heap allocated structure (freed at the end of the task).
I decided to create the task context structure in
OpenMPToLLVMIRTranslation instead of adapting how it is done
CodeExtractor (via OpenMPIRBuilder] because CodeExtractor is (at least
in theory) generic code which could have other unrelated uses.
Whenever there is a `lastprivate` variable and another unrelated
variable sets the `mightHaveReadHostSym` flag during Flang lowering
privatization, this will result in the insertion of a barrier.
This patch modifies this behavior such that this barrier will not be
inserted unless the same symbol both sets the flag and has
`lastprivate`.
Fixes a bug in updating `lastprivate` variables. Previously, we only
iterated over the symbols collected from `lastprivate` clauses. This
meants that for pre-determined symbols, we did not implement the update
correctly (e.g. for loop iteration variables of `simd` constructs).
The intention of this work is to give MLIR->LLVMIR conversion freedom to
control how the private variable is allocated so that it can be
allocated on the stack in ordinary cases or as part of a structure used
to give closure context for tasks which might outlive the current stack
frame. See RFC:
https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084
For example, a privatizer for an integer used to look like
```mlir
omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc {
^bb0(%arg0: !fir.ref<i32>):
%0 = ... allocate proper memory for the private clone ...
omp.yield(%0 : !fir.ref<i32>)
}
```
After this change, allocation become implicit in the operation:
```mlir
omp.private {type = private} @x.privatizer : i32
```
For more complex types that require initialization after allocation, an
init region can be used:
``` mlir
omp.private {type = private} @x.privatizer : !some.type init {
^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
// initialize %arg1, using %arg0 as a mold for allocations
omp.yield(%arg1 : !some.pointer<!some.type>)
} dealloc {
^bb0(%arg0: !some.pointer<!some.type>):
... deallocate memory allocated by the init region ...
omp.yield
}
```
This patch lays the groundwork for delayed task execution but is not
enough on its own.
After this patch all gfortran tests which previously passed still pass.
There
are the following changes to the Fujitsu test suite:
- 0380_0009 and 0435_0009 are fixed
- 0688_0041 now fails at runtime. This patch is testing firstprivate
variables with tasks. Previously we got lucky with the undefined
behavior and won the race. After these changes we no longer get lucky.
This patch lays the groundwork for a proper fix for this issue.
In flang the lowering re-uses the existing lowering used for reduction
init and dealloc regions.
In flang, before this patch we hit a TODO with the same wording when
generating the copy region for firstprivate polymorphic variables. After
this patch the box-like fir.class is passed by reference into the copy
region, leading to a different path that didn't hit that old TODO but
the generated code still didn't work so I added a new TODO in
DataSharingProcessor.
This fixes a bug when the same variable is used in `firstprivate` and
`lastprivate` clauses on the same construct. The issue boils down to the
fact that `copyHostAssociateVar` was deciding the direction of the copy
assignment (i.e. the `lhs` and `rhs`) based on whether the
`copyAssignIP`
parameter is set. This is not the best way to do it since it is not
related to whether we doing a copy from host to localized copy or the
other way around. When we set the insertion for `firstprivate` in
delayed privatization, this resulted in switching the direction of the
copy assignment. Instead, this PR adds a new paramter to explicitely
tell
the function the direction of the assignment.
This is a follow up PR for
https://github.com/llvm/llvm-project/pull/122471, only the latest commit
is relevant.
InitializeClone(), implemented in #120295, was not handling top
level pointers and allocatables correctly.
Pointers and unallocated variables must be skipped.
This caused some regressions in the Fujitsu testsuite:
https://linaro.atlassian.net/browse/LLVM-1488
Allocatable members of privatized derived types must be allocated,
with the same bounds as the original object, whenever that member
is also allocated in it, but Flang was not performing such
initialization.
The `Initialize` runtime function can't perform this task unless
its signature is changed to receive an additional parameter, the
original object, that is needed to find out which allocatable
members, with their bounds, must also be allocated in the clone.
As `Initialize` is used not only for privatization, sometimes this
other object won't even exist, so this new parameter would need
to be optional.
Because of this, it seemed better to add a new runtime function:
`InitializeClone`.
To avoid unnecessary calls, lowering inserts a call to it only for
privatized items that are derived types with allocatable members.
Fixes https://github.com/llvm/llvm-project/issues/114888
Fixes https://github.com/llvm/llvm-project/issues/114889
Convert `DataSharingProcessor::symTable` from pointer to reference.
This avoids accidental null pointer dereferences and makes it
possible to use `symTable` when delayed privatization is disabled.
Both OpenMP privatization and DO CONCURRENT LOCAL lowering was incorrect
for pointers and derived type with default initialization.
For pointers, the descriptor was not established with the rank/type
code/element size, leading to undefined behavior if any inquiry was made
to it prior to a pointer assignment (and if/when using the runtime for
pointer assignments, the descriptor must have been established).
For derived type with default initialization, the copies were not
default initialized.
This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating
accordingly the verifier for the `LoopWrapperInterface`.
Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.
There is an extensive list of Lit tests that needed updating to remove
the `omp.terminator`s adding some noise to this patch, but actual
changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering for
those, `LoopWrapperInterface::verifyImpl()`, SCF to OpenMP conversion
and OpenMP dialect documentation.
OpenMP prohibits privatisation of variables that appear in expressions
for statement functions.
This is a re-working of an old patch https://reviews.llvm.org/D93213 by
@praveen-g-ctt.
The old patch couldn't be landed because of ordering concerns. Statement
functions are rewritten during parse tree rewriting, but this was done
after resolve-directives and so some array expressions were incorrectly
identified as statement functions. For this reason **I have opted to
re-order the semantics driver so that resolve-directives is run after
parse tree rewriting**.
Closes#54677
---------
Co-authored-by: Praveen <praveen@compilertree.com>
This patch creates a simple RAII wrapper class for `SymMap` to make it
easier to use and prevent a missing matching `popScope()` for a
`pushScope()` call on simple use cases.
Some push-pop pairs are replaced with instances of the new class by this
patch.
This patch removes unused and undefined method declarations from
`DataSharingProcessor`, as well as the unused `hasLastPrivateOp` class
member. The `insPt` class member is replaced by a local `InsertionGuard`
in the only place it is set and used.
This patch updates the `omp.parallel` operation according to the results
of the discussion in [this
RFC](https://discourse.llvm.org/t/rfc-disambiguation-between-loop-and-block-associated-omp-parallelop/79972).
It is removed from the set of loop wrapper operations, changing the
expected MLIR representation for composite `distribute parallel do/for`
into the following:
```mlir
omp.parallel {
...
omp.distribute {
omp.wsloop {
omp.loop_nest ... { ... }
omp.terminator
}
omp.terminator
}
...
omp.terminator
}
```
MLIR verifiers for operations impacted by this representation change are
updated, as well as related tests. The `LoopWrapperInterface` is also
updated, since it's no longer representing an optional "role" of an
operation but a mandatory set of restrictions instead.
Variables referenced in the body of statement functions need to be
handled as if they are explicitly referenced. Otherwise, they are
skipped during implicit privatization, because statement functions
are represented as procedures in the parse tree.
To avoid missing symbols referenced only in statement functions
during implicit privatization, new symbols, associated with them,
are created and inserted into the context of the directive that
privatizes them. They are later collected and processed in
lowering. To avoid confusing these new symbols with regular ones,
they are tagged with the new OmpFromStmtFunction flag.
Fixes https://github.com/llvm/llvm-project/issues/74273
Handles variables that are storage associated via `equivalence`. The
problem is that these variables are declared as `fir.ptr`s while their
privatized storage is declared as `fir.ref` which was triggering a
validation error in the OpenMP dialect.
This patch introduces a new OpenMP clause definition not defined by the spec.
Its main purpose is to define the `loop_inclusive` (previously "inclusive",
renamed according to the parent of this PR in the stack) argument of
`omp.loop_nest` in such a way that a followup implementation of a tablegen
backend to automatically generate clause and operation operand structures
directly from `OpenMP_Op` and `OpenMP_Clause` definitions can properly generate
the `LoopNestOperands` structure.
`collapse` clause arguments are also moved into this new definition, as they
represent information on the loop nests being collapsed rather than the
`collapse` clause itself.
Currently, there are some inconsistencies to how clause arguments are
named in the OpenMP dialect. Additionally, the clause operand structures
associated to them also diverge in certain cases. The purpose of this
patch is to normalize argument names across all `OpenMP_Clause` tablegen
definitions and clause operand structures.
This has the benefit of providing more consistent representations for
clauses in the dialect, but the main short-term advantage is that it
enables the development of an OpenMP-specific tablegen backend to
automatically generate the clause operand structures without breaking
dependent code.
The main re-naming decisions made in this patch are the following:
- Variadic arguments (i.e. multiple values) have the "_vars" suffix.
This and other similar suffixes are removed from array attribute
arguments.
- Individual required or optional value arguments do not have any suffix
added to them (e.g. "val", "var", "expr", ...), except for `if` which
would otherwise result in an invalid C++ variable name.
- The associated clause's name is prepended to argument names that don't
already contain it as part of its name. This avoids future collisions
between arguments named the same way on different clauses and adding
both clauses to the same operation.
- Privatization and reduction related arguments that contain lists of
symbols pointing to privatizer/reducer operations use the "_syms"
suffix. This removes the inconsistencies between the names for
"copyprivate_funcs", "[in]reductions", "privatizers", etc.
- General improvements to names, replacement of camel case for snake
case everywhere, etc.
- Renaming of operation-associated operand structures to use the
"Operands" suffix in place of "ClauseOps", to better differentiate
between clause operand structures and operation operand structures.
- Fields on clause operand structures are sorted according to the
tablegen definition of the same clause.
The assembly format for a few arguments is updated to better reflect the
clause they are associated with:
- `chunk_size` -> `dist_schedule_chunk_size`
- `grain_size` -> `grainsize`
- `simd` -> `par_level_simd`
Don't use `copyHostAssociateVar` for allocatable variables. It isn't
clear to me whether or not this should be addressed in
`copyHostAssociateVar` instead of inside OpenMP. I opted for OpenMP
to minimise how many things I effected. `copyHostAssociateVar` will
not update the destination variable if the destination variable
was unallocated. This is incorrect because assignment inside of the
openmp block can cause the allocation status of the variable to
change. Furthermore, `copyHostAssociateVar` seems to only copy the
variable address not other metadata like the size of the allocation.
Reallocation by assignment could cause this to change.
This patch enables the lastprivate clause to be used in the presence of
the collapse clause.
Note: the way we currently implement lastprivate means that this adds a
large number of compare instructions to the end of every iteration of
the loop. This is a clearly non-optimal thing to do, but lastprivate in
general will need re-implementing to prevent this. This is planned as
part of the delayed privatization work. This current implementation is
just a stop-gap measure as generating sub-optimal but working code is
better than crashing out.
This patch removes the introduction of `fir.undef` operations as a way
to keep track of insertion points inside of the `DataSharingProcessor`,
and it replaces them with an `InsertionGuard` to avoid creating such
operations inside of loop wrappers.
Leaving any `fir.undef` operation inside of a loop wrapper would result
in a verifier error, since they enforce strict requirements on the
contents of their code regions.
Extends delayed privatization support to `taraget .. private(..)`. With
this PR, `private` is support for `target` **only** is delayed
privatization mode.
The object identity requires more than just `Symbol`. Don't use `id()`
to get the Symbol associated with the object, becase the return value
will need to change. Instead use `sym()` which is added for that reason.
Fixes a bug in emiting deacllocation logic when delayed privatization is
disabled. I introduced the bug when implementing delayed privatization
for allocatables: when delayed privatization is disabled the
deacllocation ops are emitted for only one allocatable variables.
This PR contains 2 commits:
1. A commit to reapply changes introduced #91116 (was reverted earlier
due to test suite failures)
2. A commit containing a possible solution for the issue causing the
test suite failures. In particular, it introduces a simple symbol
visitor class to keep track of the current active OMP construct and
marking this active construct as the scope defining the symbol being
visisted.
Besides duplicating code, privatizing variables in every section
causes problems when synchronization barriers are used. This
happens because each section is executed by a given thread, which
will cause the program to hang if not all running threads execute
the barrier operation.
Fixes https://github.com/llvm/llvm-project/issues/72824
Current implementation of default clause privatization incorrectly fails
to privatize in presence of non-OpenMP constructs (i.e. nested
constructs with regions whose symbols need to be privatized in the scope
of the parent OpenMP construct). This patch fixes the same by
considering non-OpenMP constructs separately by collecting symbols of a
nested region if it is a non-OpenMP construct with a region, and
privatizing it in the scope of the parent OpenMP construct.
Fixes https://github.com/llvm/llvm-project/issues/71914 and
https://github.com/llvm/llvm-project/issues/71915
This patch updates lowering from PFT to MLIR of workshare loops to
follow the loop wrapper approach. Unit tests impacted by this change are
also updated.
As the last patch of the stack, this should compile and pass unit tests.
This patch replaces some `saveInsertionPoint`, `restoreInsertionPoint`
call pairs for an `InsertionGuard` instance where it makes sense within
Flang OpenMP lowering to make further modifications less error-prone.