The intention of this work is to give MLIR->LLVMIR conversion freedom to
control how the private variable is allocated so that it can be
allocated on the stack in ordinary cases or as part of a structure used
to give closure context for tasks which might outlive the current stack
frame. See RFC:
https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084
For example, a privatizer for an integer used to look like
```mlir
omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc {
^bb0(%arg0: !fir.ref<i32>):
%0 = ... allocate proper memory for the private clone ...
omp.yield(%0 : !fir.ref<i32>)
}
```
After this change, allocation become implicit in the operation:
```mlir
omp.private {type = private} @x.privatizer : i32
```
For more complex types that require initialization after allocation, an
init region can be used:
``` mlir
omp.private {type = private} @x.privatizer : !some.type init {
^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
// initialize %arg1, using %arg0 as a mold for allocations
omp.yield(%arg1 : !some.pointer<!some.type>)
} dealloc {
^bb0(%arg0: !some.pointer<!some.type>):
... deallocate memory allocated by the init region ...
omp.yield
}
```
This patch lays the groundwork for delayed task execution but is not
enough on its own.
After this patch all gfortran tests which previously passed still pass.
There
are the following changes to the Fujitsu test suite:
- 0380_0009 and 0435_0009 are fixed
- 0688_0041 now fails at runtime. This patch is testing firstprivate
variables with tasks. Previously we got lucky with the undefined
behavior and won the race. After these changes we no longer get lucky.
This patch lays the groundwork for a proper fix for this issue.
In flang the lowering re-uses the existing lowering used for reduction
init and dealloc regions.
In flang, before this patch we hit a TODO with the same wording when
generating the copy region for firstprivate polymorphic variables. After
this patch the box-like fir.class is passed by reference into the copy
region, leading to a different path that didn't hit that old TODO but
the generated code still didn't work so I added a new TODO in
DataSharingProcessor.
Scan directive allows to specify scan reductions within an worksharing
loop, worksharing loop simd or simd directive which should have an
`InScan` modifier associated with it. This change adds the mlir support
for the same.
Related PR: [Parsing and Semantic Support for
scan](https://github.com/llvm/llvm-project/pull/102792)
This patch adds the `host_eval` clause to the `omp.target` operation.
Additionally, it updates its op verifier to make sure all uses of block
arguments defined by this clause fall within one of the few cases where
they are allowed.
MLIR to LLVM IR translation fails on translation of this clause with a
not-yet-implemented error.
This PR adds translation support for task detach. Essentially, if the
`detach` clause is present on a task, emit a
`__kmpc_task_allow_completion_event` on it, and store its return (of
type `kmp_event_t*`) into the `event_handle`.
This PR extends the MLIR representation for `omp.target` ops by adding a
`map_idx` to `private` vars. This annotation stores the index of the map
info operand corresponding to the private var. If the variable does not
have a map operand, the `map_idx` attribute is either not present at all
or its value is `-1`.
This makes matching the private variable to its map info op easier (see
https://github.com/llvm/llvm-project/pull/116576 for usage).
Adds initial support for lowering the `loop` directive to MLIR.
The PR includes basic suport and testing for the following clauses:
* `collapse`
* `order`
* `private`
* `reduction`
Parent PR: #113911, only the latest commit is relevant to this PR.
This is one of 3 PRs in a PR stack that aims to add support for explicit
mapping of allocatable members in derived types.
The primary changes in this PR are the OpenMPToLLVMIRTranslation.cpp
changes, which are small and seek to alter the current member mapping to
add an additional map insertion for pointers. Effectively, if the member
is a pointer (currently indicated by having a varPtrPtr field) we add an
additional map for the pointer and then alter the subsequent mapping of
the member (the data) to utilise the member rather than the parents base
pointer. This appears to be necessary in certain cases when mapping
pointer data within record types to avoid segfaulting on device (due to
incorrect data mapping). In general this record type mapping may be
simplifiable in the future.
There are also additions of tests which should help to showcase the
affect of the changes above.
This patch unifies the handling of errors passed through the
OpenMPIRBuilder and removes some redundant error messages through the
introduction of a custom `ErrorInfo` subclass.
Additionally, the current list of operations and clauses unsupported by
the MLIR to LLVM IR translation pass is added to a new Lit test to check
they are being reported to the user.
This patch moves the part of operation verifiers dependent on the
contents of their regions to the corresponding `verifyRegions` method.
This ensures these are only triggered after the operations in the region
have themselved already been verified in advance, avoiding checks based
on invalid nested operations.
The `LoopWrapperInterface` is also updated so that its verifier runs
after operations in the region of ops with this interface have already
been verified.
This patch enables lowering to MLIR of the reduction clause of `simd`
constructs. Lowering from MLIR to LLVM IR remains unimplemented, so at
that stage it will result in errors being emitted rather than silently
ignoring it as it is currently done.
On composite `do simd` constructs, this lowering error will remain
untriggered, as the `omp.simd` operation in that case is currently
ignored. The MLIR representation, however, will now contain `reduction`
information.
This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating
accordingly the verifier for the `LoopWrapperInterface`.
Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.
There is an extensive list of Lit tests that needed updating to remove
the `omp.terminator`s adding some noise to this patch, but actual
changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering for
those, `LoopWrapperInterface::verifyImpl()`, SCF to OpenMP conversion
and OpenMP dialect documentation.
This fixes all the places in MLIR that hit the new assertion added in
#106524, in preparation for enabling it by default. That is, cases where
the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.
The fixes either set the correct value for isSigned, or set the
implicitTrunc flag to retain the old behavior. I've left TODOs for the
latter case in some places, where I think that it may be worthwhile to
stop doing implicit truncation in the future.
Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
This is just the MLIR changes split off from
https://github.com/llvm/llvm-project/pull/80309.
The `omp.section` operation is an outlier in that the block arguments it
has are defined by clauses on the required parent `omp.sections`
operation.
This patch updates the definition of this operation introducing the
`BlockArgOpenMPOpInterface` to simplify the handling and verification of
these block arguments, implemented based on the parent `omp.sections`.
This patch updates the `omp.target_data` operation to use the same
formatting as `map` clauses on `omp.target` for `use_device_addr` and
`use_device_ptr`. This is done so the mapping that is being enforced
between op arguments and associated entry block arguments is explicit.
The way it is achieved is by marking these clauses as entry block
argument-defining and adjusting printer/parsers accordingly.
As a result of this change, block arguments for `use_device_addr` come
before those for `use_device_ptr`, which is the opposite of the previous
undocumented situation. Some unit tests are updated based on this
change, in addition to those updated because of the format change.
This patch updates printing and parsing of operations including clauses
that define entry block arguments to the operation's region. This
impacts `in_reduction`, `map`, `private`, `reduction` and
`task_reduction`.
The proposed representation to be used by all such clauses is the
following:
```
<clause_name>([byref] [@<sym>] %value -> %block_arg [, ...] : <type>[, ...]) {
...
}
```
The `byref` tag is only allowed for reduction-like clauses and the
`@<sym>` is required and only allowed for the `private` and
reduction-like clauses. The `map` clause does not accept any of these
two.
This change fixes some currently broken op representations, like
`omp.teams` or `omp.sections` reduction:
```
omp.teams reduction([byref] @<sym> -> %value : <type>) {
^bb0(%block_arg : <type>):
...
}
```
Additionally, it addresses some redundancy in the representation of the
previously mentioned cases, as well as e.g. `map` in `omp.target`. The
problem is that the block argument name after the arrow is not checked
in any way, which makes some misleading representations legal:
```mlir
omp.target map_entries(%x -> %arg1, %y -> %arg0, %z -> %doesnt_exist : !llvm.ptr, !llvm.ptr, !llvm.ptr) {
^bb0(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr, %arg2 : !llvm.ptr):
...
}
```
In that case, `%x` maps to `%arg0`, contrary to what the representation
states, and `%z` maps to `%arg2`. `%doesnt_exist` is not resolved, so it
would likely cause issues if used anywhere inside of the operation's
region.
The solution implemented in this patch makes it so that values
introduced after the arrow on the representation of these clauses
implicitly define the corresponding entry block arguments, removing the
potential for these problematic representations. This is what is already
implemented for the `private` and `reduction` clauses of `omp.parallel`.
There are a couple of consequences of this change:
- Entry block argument-defining clauses must come at the end of the
operation's representation and in alphabetical order. This is because
they are printed/parsed as part of the region and a standardized
ordering is needed to reliably match op arguments with their
corresponding entry block arguments via the `BlockArgOpenMPOpInterface`.
- We can no longer define per-clause assembly formats to be reused by
all operations that take these clauses, since they must be passed to a
custom printer including the region and arguments of all other entry
block argument-defining clauses. Code duplication and potential for
introducing issues is minimized by providing the generic
`{print,parse}BlockArgRegion` helpers and associated structures.
MLIR and Flang lowering unit tests are updated due to changes in the
order and formatting of impacted operations.
This patch introduces a new MLIR interface for the OpenMP dialect aimed
at providing a uniform way of verifying and handling entry block
arguments defined by OpenMP clauses.
The approach consists in defining a set of overrideable methods that
return the number of block arguments the operation holds regarding each
of the clauses that may define them. These by default return 0, but they
are overriden by the corresponding clause through the
`extraClassDeclaration` mechanism.
Another set of interface methods to get the actual lists of block
arguments is defined, which is implemented based on the previously
described methods. These implicitly define a standardized ordering
between the list of block arguments associated to each clause, based on
the alphabetical ordering of their names. They should be the preferred
way of matching operation arguments and entry block arguments to that
operation's first region.
Some updates are made to the printing/parsing of `omp.parallel` to
follow the expected order between `private` and `reduction` clauses, as
well as the MLIR to LLVM IR translation pass to access block arguments
using the new interface. Unit tests of operations impacted by additional
verification checks and sorting of entry block arguments.
This patch moves verification code for the `LoopWrapperInterface` to the
interface itself, checking it automatically for each operation that has
that interface.
Starts delayed privatizaiton support for standalone `distribute`
directives. Other flavours of `distribute` are still TODO as well as
MLIR to LLVM IR lowering.
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
( 65b13610a5 for further reference )
* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
This patch adds the "gen-openmp-clause-ops" `mlir-tblgen` generator to
produce the structure definitions previously in OpenMPClauseOperands.h
automatically from the information contained in OpenMPOps.td and
OpenMPClauses.td.
The original header is maintained to enable the definition of similar
structures that are not directly related to any single `OpenMP_Clause`
or `OpenMP_Op` tablegen definition.
This patch updates the `omp.parallel` operation according to the results
of the discussion in [this
RFC](https://discourse.llvm.org/t/rfc-disambiguation-between-loop-and-block-associated-omp-parallelop/79972).
It is removed from the set of loop wrapper operations, changing the
expected MLIR representation for composite `distribute parallel do/for`
into the following:
```mlir
omp.parallel {
...
omp.distribute {
omp.wsloop {
omp.loop_nest ... { ... }
omp.terminator
}
omp.terminator
}
...
omp.terminator
}
```
MLIR verifiers for operations impacted by this representation change are
updated, as well as related tests. The `LoopWrapperInterface` is also
updated, since it's no longer representing an optional "role" of an
operation but a mandatory set of restrictions instead.
This region is intended to separate alloca operations from reduction
variable initialization. This makes it easier to hoist allocas to the
entry block before control flow and complex code for initialization.
The verifier checks that there is at most one block in the alloc region.
This is not sufficient to avoid control flow in general MLIR, but by the
time we are converting to LLVMIR structured control flow should already
have been lowered to the cf dialect.
1/3
Part 2: https://github.com/llvm/llvm-project/pull/102524
Part 3: https://github.com/llvm/llvm-project/pull/102525
This patch sets the omp.composite unit attr for composite wrapper ops
and also add appropriate checks to the verifiers of supported ops for
the presence/absence of the attribute.
This is patch 2/2 in a series of patches. Patch 1 - #102340.
This patch sorts the clause lists for the following OpenMP operations:
- omp.taskloop
- omp.taskgroup
- omp.target_data
- omp.target_enter_data
- omp.target_exit_data
- omp.target_update
- omp.target
This change results in the reordering of operation arguments, so
impacted unit tests are updated accordingly.
This patch sorts the clause lists for the following OpenMP operations:
- omp.parallel
- omp.teams
- omp.sections
- omp.wsloop
- omp.distribute
- omp.task
This change results in the reordering of operation arguments, so
impacted unit tests are updated accordingly.
This patch adds the missing `OpenMP_Clause` definitions to all existing
`OpenMP_Op`s and updates their operand structure based builders to
initialize the new arguments.
The result of this change is that operation operand structures are now
based in the same list of clauses as their tablegen counterparts. This
means that all of the information needed is now in place to
automatically generate OpenMP operand structures from tablegen
defitions.
Since this change doesn't involve the introduction of actual support for
these clauses, new arguments are not initialized from values stored in
the corresponding operand structure fields but rather set to empty or
null. Those should be updated when support for these clauses on the
corresponding operation is added.
This patch introduces a new OpenMP clause definition not defined by the spec.
Its main purpose is to define the `loop_inclusive` (previously "inclusive",
renamed according to the parent of this PR in the stack) argument of
`omp.loop_nest` in such a way that a followup implementation of a tablegen
backend to automatically generate clause and operation operand structures
directly from `OpenMP_Op` and `OpenMP_Clause` definitions can properly generate
the `LoopNestOperands` structure.
`collapse` clause arguments are also moved into this new definition, as they
represent information on the loop nests being collapsed rather than the
`collapse` clause itself.
Currently, there are some inconsistencies to how clause arguments are
named in the OpenMP dialect. Additionally, the clause operand structures
associated to them also diverge in certain cases. The purpose of this
patch is to normalize argument names across all `OpenMP_Clause` tablegen
definitions and clause operand structures.
This has the benefit of providing more consistent representations for
clauses in the dialect, but the main short-term advantage is that it
enables the development of an OpenMP-specific tablegen backend to
automatically generate the clause operand structures without breaking
dependent code.
The main re-naming decisions made in this patch are the following:
- Variadic arguments (i.e. multiple values) have the "_vars" suffix.
This and other similar suffixes are removed from array attribute
arguments.
- Individual required or optional value arguments do not have any suffix
added to them (e.g. "val", "var", "expr", ...), except for `if` which
would otherwise result in an invalid C++ variable name.
- The associated clause's name is prepended to argument names that don't
already contain it as part of its name. This avoids future collisions
between arguments named the same way on different clauses and adding
both clauses to the same operation.
- Privatization and reduction related arguments that contain lists of
symbols pointing to privatizer/reducer operations use the "_syms"
suffix. This removes the inconsistencies between the names for
"copyprivate_funcs", "[in]reductions", "privatizers", etc.
- General improvements to names, replacement of camel case for snake
case everywhere, etc.
- Renaming of operation-associated operand structures to use the
"Operands" suffix in place of "ClauseOps", to better differentiate
between clause operand structures and operation operand structures.
- Fields on clause operand structures are sorted according to the
tablegen definition of the same clause.
The assembly format for a few arguments is updated to better reflect the
clause they are associated with:
- `chunk_size` -> `dist_schedule_chunk_size`
- `grain_size` -> `grainsize`
- `simd` -> `par_level_simd`
The `OpenMPDialectFoldInterface` was originally introduced to prevent
constants from being hoisted out of `omp.target` regions. This hasn't
been necessary since the `IsolatedFromAbove` trait was added to that
operation, so it's safe to remove this interface.
Adding MLIR Op support for omp masked. Omp masked is introduced in 5.2
standard and allows a region to be executed by threads
specified by a programmer. This is achieved with the help of filter
clause which helps to specify thread id expected to execute the region.
This patch updates `OpenMP_Op` definitions to be based on the new set of
`OpenMP_Clause` definitions, and to take advantage of clause-based
automatically-generated argument lists, descriptions, assembly format
and class declarations.
There are also changes introduced to the clause operands structures to
match the current set of tablegen clause definitions. These two are very
closely linked and should be kept in sync. It would probably be a good
idea to try generating clause operands structures from the tablegen
`OpenMP_Clause` definitions in the future.
As a result of this change, arguments for some operations have been
reordered. This patch also addresses this by updating affected operation
build calls and unit tests. Some other updates to tests related to the
order of arguments in the resulting assembly format and others due to
certain previous inconsistencies in the printing/parsing of clauses are
addressed.
The printer and parser functions for the `map` clause are updated, so
that they are able to handle `map` clauses linked to entry block
arguments as well as those which aren't.
This PR causes a build failure in the flang subproject. This is addressed
by the next PR in the stack.
Now all operations with a reduction clause have an array of bools
controlling whether each reduction variable should be passed by
reference or value.
This was already supported for Wsloop and Parallel. The new operations
modified here currently have no flang lowering or translation to LLVMIR
and so further changes are not needed.
It isn't possible to check the verifier in
mlir/test/Dialect/OpenMP/invalid.mlir because there is no way of parsing
an operation to have an incorrect number of byref attributes. The
verifier exists to pick up buggy operation builders or in-place
operation modification.
Fixes#88935
Toggling reduction by-ref broke when multiple reduction clauses were
used. Decisions made for the by-ref status for later clauses could then
invalidate decisions for earlier clauses. For example,
```
reduction(+:scalar,scalar2) reduction(+:array)
```
The first clause would choose by value reduction and generate by-value
reduction regions, but then after this the second clause would force
by-ref to support the array argument. But by the time the second clause
is processed, the first clause has already had the wrong kind of
reduction regions generated.
This is solved by toggling whether a variable should be reduced by
reference per variable. In the above example, this allows only `array`
to be reduced by ref.
This PR adds two new fields to omp.map_info, one BoolAttr and one I64ArrayAttr.
The BoolAttr is named partial_map, and is a flag that indicates if the record type captured by
the map_info operation is a partial map, or if it is mapped in its entirety, this currently helps
the later lowering determine the type of map entries that need to be generated.
The I64ArrayAttr named members_index is intended to track the placement of each member
map_info operations (and by extension mapped member variable) placement in the parent
record type. This may need to be extended to an N-D array for nested member mapping.
Pull Request: https://github.com/llvm/llvm-project/pull/82851
Extends `omp.private` with a new region: `dealloc` where deallocation
logic for Fortran deallocatables will be outlined (this will happen in
later PRs).
This patch updates verifiers for `omp.ordered`, `omp.ordered.region`,
`omp.cancel` and `omp.cancellation_point`, which check for a parent
`omp.wsloop`.
After transitioning to a loop wrapper-based approach, the expected
direct parent will become `omp.loop_nest` instead, so verifiers need to
take this into account.
This PR on its own will not pass premerge tests. All patches in the
stack are needed before it can be compiled and passes tests.