This patch moves the part of operation verifiers dependent on the
contents of their regions to the corresponding `verifyRegions` method.
This ensures these are only triggered after the operations in the region
have themselved already been verified in advance, avoiding checks based
on invalid nested operations.
The `LoopWrapperInterface` is also updated so that its verifier runs
after operations in the region of ops with this interface have already
been verified.
This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating
accordingly the verifier for the `LoopWrapperInterface`.
Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.
There is an extensive list of Lit tests that needed updating to remove
the `omp.terminator`s adding some noise to this patch, but actual
changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering for
those, `LoopWrapperInterface::verifyImpl()`, SCF to OpenMP conversion
and OpenMP dialect documentation.
The `omp.section` operation is an outlier in that the block arguments it
has are defined by clauses on the required parent `omp.sections`
operation.
This patch updates the definition of this operation introducing the
`BlockArgOpenMPOpInterface` to simplify the handling and verification of
these block arguments, implemented based on the parent `omp.sections`.
This patch updates the `omp.target_data` operation to use the same
formatting as `map` clauses on `omp.target` for `use_device_addr` and
`use_device_ptr`. This is done so the mapping that is being enforced
between op arguments and associated entry block arguments is explicit.
The way it is achieved is by marking these clauses as entry block
argument-defining and adjusting printer/parsers accordingly.
As a result of this change, block arguments for `use_device_addr` come
before those for `use_device_ptr`, which is the opposite of the previous
undocumented situation. Some unit tests are updated based on this
change, in addition to those updated because of the format change.
This patch updates printing and parsing of operations including clauses
that define entry block arguments to the operation's region. This
impacts `in_reduction`, `map`, `private`, `reduction` and
`task_reduction`.
The proposed representation to be used by all such clauses is the
following:
```
<clause_name>([byref] [@<sym>] %value -> %block_arg [, ...] : <type>[, ...]) {
...
}
```
The `byref` tag is only allowed for reduction-like clauses and the
`@<sym>` is required and only allowed for the `private` and
reduction-like clauses. The `map` clause does not accept any of these
two.
This change fixes some currently broken op representations, like
`omp.teams` or `omp.sections` reduction:
```
omp.teams reduction([byref] @<sym> -> %value : <type>) {
^bb0(%block_arg : <type>):
...
}
```
Additionally, it addresses some redundancy in the representation of the
previously mentioned cases, as well as e.g. `map` in `omp.target`. The
problem is that the block argument name after the arrow is not checked
in any way, which makes some misleading representations legal:
```mlir
omp.target map_entries(%x -> %arg1, %y -> %arg0, %z -> %doesnt_exist : !llvm.ptr, !llvm.ptr, !llvm.ptr) {
^bb0(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr, %arg2 : !llvm.ptr):
...
}
```
In that case, `%x` maps to `%arg0`, contrary to what the representation
states, and `%z` maps to `%arg2`. `%doesnt_exist` is not resolved, so it
would likely cause issues if used anywhere inside of the operation's
region.
The solution implemented in this patch makes it so that values
introduced after the arrow on the representation of these clauses
implicitly define the corresponding entry block arguments, removing the
potential for these problematic representations. This is what is already
implemented for the `private` and `reduction` clauses of `omp.parallel`.
There are a couple of consequences of this change:
- Entry block argument-defining clauses must come at the end of the
operation's representation and in alphabetical order. This is because
they are printed/parsed as part of the region and a standardized
ordering is needed to reliably match op arguments with their
corresponding entry block arguments via the `BlockArgOpenMPOpInterface`.
- We can no longer define per-clause assembly formats to be reused by
all operations that take these clauses, since they must be passed to a
custom printer including the region and arguments of all other entry
block argument-defining clauses. Code duplication and potential for
introducing issues is minimized by providing the generic
`{print,parse}BlockArgRegion` helpers and associated structures.
MLIR and Flang lowering unit tests are updated due to changes in the
order and formatting of impacted operations.
This patch introduces a new MLIR interface for the OpenMP dialect aimed
at providing a uniform way of verifying and handling entry block
arguments defined by OpenMP clauses.
The approach consists in defining a set of overrideable methods that
return the number of block arguments the operation holds regarding each
of the clauses that may define them. These by default return 0, but they
are overriden by the corresponding clause through the
`extraClassDeclaration` mechanism.
Another set of interface methods to get the actual lists of block
arguments is defined, which is implemented based on the previously
described methods. These implicitly define a standardized ordering
between the list of block arguments associated to each clause, based on
the alphabetical ordering of their names. They should be the preferred
way of matching operation arguments and entry block arguments to that
operation's first region.
Some updates are made to the printing/parsing of `omp.parallel` to
follow the expected order between `private` and `reduction` clauses, as
well as the MLIR to LLVM IR translation pass to access block arguments
using the new interface. Unit tests of operations impacted by additional
verification checks and sorting of entry block arguments.
This patch moves verification code for the `LoopWrapperInterface` to the
interface itself, checking it automatically for each operation that has
that interface.
This patch modifies the representation of `OpenMP_Clause` to allow
definitions to incorporate both required and optional arguments while
still allowing operations including them and overriding the
`assemblyFormat` to take advantage of automatically-populated format
strings.
The proposed approach is to split the `assemblyFormat` clause property
into `reqAssemblyFormat` and `optAssemblyFormat`, and remove the
`isRequired` template and associated `required` property. The
`OpenMP_Op` class, in turn, populates the new `clausesReqAssemblyFormat`
and `clausesOptAssemblyFormat` properties in addition to
`clausesAssemblyFormat`. These properties can be used by clause-based
OpenMP operation definitions to reconstruct parts of the
clause-inherited format string in a more flexible way when overriding
it.
Clause definitions are updated to follow this new approach and some
operation definitions overriding the `assemblyFormat` are simplified by
taking advantage of the improved flexibility, reducing code duplication.
The `verify-openmp-ops` tablegen pass is updated for the new
`OpenMP_Clause` representation.
Some MLIR and Flang unit tests had to be updated due to changes to the
default printing order of clauses on updated operations.
This patch updates the `omp.parallel` operation according to the results
of the discussion in [this
RFC](https://discourse.llvm.org/t/rfc-disambiguation-between-loop-and-block-associated-omp-parallelop/79972).
It is removed from the set of loop wrapper operations, changing the
expected MLIR representation for composite `distribute parallel do/for`
into the following:
```mlir
omp.parallel {
...
omp.distribute {
omp.wsloop {
omp.loop_nest ... { ... }
omp.terminator
}
omp.terminator
}
...
omp.terminator
}
```
MLIR verifiers for operations impacted by this representation change are
updated, as well as related tests. The `LoopWrapperInterface` is also
updated, since it's no longer representing an optional "role" of an
operation but a mandatory set of restrictions instead.
This patch updates MLIR tests for `omp.parallel` + `omp.wsloop`
reductions to move the reduction clause into `omp.wsloop` rather than
the parent `omp.parallel`, as mandated by the spec for these cases and
also to match what Flang is already producing for `parallel do
reduction(...)` combined constructs.
From the OpenMP Spec version 5.2, section 17.2:
> The effect of the reduction clause is as if it is applied to all leaf
constructs that permit the clause, except for the following constructs:
> - The `parallel` construct, when combined with the `sections`,
worksharing-loop, `loop`, or `taskloop` construct; [...]
This region is intended to separate alloca operations from reduction
variable initialization. This makes it easier to hoist allocas to the
entry block before control flow and complex code for initialization.
The verifier checks that there is at most one block in the alloc region.
This is not sufficient to avoid control flow in general MLIR, but by the
time we are converting to LLVMIR structured control flow should already
have been lowered to the cf dialect.
1/3
Part 2: https://github.com/llvm/llvm-project/pull/102524
Part 3: https://github.com/llvm/llvm-project/pull/102525
This patch sets the omp.composite unit attr for composite wrapper ops
and also add appropriate checks to the verifiers of supported ops for
the presence/absence of the attribute.
This is patch 2/2 in a series of patches. Patch 1 - #102340.
This patch sorts the clause lists for the following OpenMP operations:
- omp.taskloop
- omp.taskgroup
- omp.target_data
- omp.target_enter_data
- omp.target_exit_data
- omp.target_update
- omp.target
This change results in the reordering of operation arguments, so
impacted unit tests are updated accordingly.
This patch sorts the clause lists for the following OpenMP operations:
- omp.parallel
- omp.teams
- omp.sections
- omp.wsloop
- omp.distribute
- omp.task
This change results in the reordering of operation arguments, so
impacted unit tests are updated accordingly.
This patch adds the missing `OpenMP_Clause` definitions to all existing
`OpenMP_Op`s and updates their operand structure based builders to
initialize the new arguments.
The result of this change is that operation operand structures are now
based in the same list of clauses as their tablegen counterparts. This
means that all of the information needed is now in place to
automatically generate OpenMP operand structures from tablegen
defitions.
Since this change doesn't involve the introduction of actual support for
these clauses, new arguments are not initialized from values stored in
the corresponding operand structure fields but rather set to empty or
null. Those should be updated when support for these clauses on the
corresponding operation is added.
This patch introduces a new OpenMP clause definition not defined by the spec.
Its main purpose is to define the `loop_inclusive` (previously "inclusive",
renamed according to the parent of this PR in the stack) argument of
`omp.loop_nest` in such a way that a followup implementation of a tablegen
backend to automatically generate clause and operation operand structures
directly from `OpenMP_Op` and `OpenMP_Clause` definitions can properly generate
the `LoopNestOperands` structure.
`collapse` clause arguments are also moved into this new definition, as they
represent information on the loop nests being collapsed rather than the
`collapse` clause itself.
Currently, there are some inconsistencies to how clause arguments are
named in the OpenMP dialect. Additionally, the clause operand structures
associated to them also diverge in certain cases. The purpose of this
patch is to normalize argument names across all `OpenMP_Clause` tablegen
definitions and clause operand structures.
This has the benefit of providing more consistent representations for
clauses in the dialect, but the main short-term advantage is that it
enables the development of an OpenMP-specific tablegen backend to
automatically generate the clause operand structures without breaking
dependent code.
The main re-naming decisions made in this patch are the following:
- Variadic arguments (i.e. multiple values) have the "_vars" suffix.
This and other similar suffixes are removed from array attribute
arguments.
- Individual required or optional value arguments do not have any suffix
added to them (e.g. "val", "var", "expr", ...), except for `if` which
would otherwise result in an invalid C++ variable name.
- The associated clause's name is prepended to argument names that don't
already contain it as part of its name. This avoids future collisions
between arguments named the same way on different clauses and adding
both clauses to the same operation.
- Privatization and reduction related arguments that contain lists of
symbols pointing to privatizer/reducer operations use the "_syms"
suffix. This removes the inconsistencies between the names for
"copyprivate_funcs", "[in]reductions", "privatizers", etc.
- General improvements to names, replacement of camel case for snake
case everywhere, etc.
- Renaming of operation-associated operand structures to use the
"Operands" suffix in place of "ClauseOps", to better differentiate
between clause operand structures and operation operand structures.
- Fields on clause operand structures are sorted according to the
tablegen definition of the same clause.
The assembly format for a few arguments is updated to better reflect the
clause they are associated with:
- `chunk_size` -> `dist_schedule_chunk_size`
- `grain_size` -> `grainsize`
- `simd` -> `par_level_simd`
This patch replaces the `SingleBlockImplicitTerminator<"TerminatorOp">`
trait of loop wrapper operations for the `SingleBlock` trait. This
enables a more robust implementation of the
`LoopWrapperInterface::isWrapper()` method, since it does no longer have
to deal with the potentially missing (implicit) terminator.
The `LoopWrapperInterface::isWrapper()` method is also extended to not
identify as wrappers those operations which have a loop wrapper
operation inside that is not taking a wrapper role. This is important
for cases where `omp.parallel` is nested, which can but is not required
to work as a loop wrapper.
Tests are updated to integrate these representation and validation
changes.
Adding MLIR Op support for omp masked. Omp masked is introduced in 5.2
standard and allows a region to be executed by threads
specified by a programmer. This is achieved with the help of filter
clause which helps to specify thread id expected to execute the region.
This patch updates `OpenMP_Op` definitions to be based on the new set of
`OpenMP_Clause` definitions, and to take advantage of clause-based
automatically-generated argument lists, descriptions, assembly format
and class declarations.
There are also changes introduced to the clause operands structures to
match the current set of tablegen clause definitions. These two are very
closely linked and should be kept in sync. It would probably be a good
idea to try generating clause operands structures from the tablegen
`OpenMP_Clause` definitions in the future.
As a result of this change, arguments for some operations have been
reordered. This patch also addresses this by updating affected operation
build calls and unit tests. Some other updates to tests related to the
order of arguments in the resulting assembly format and others due to
certain previous inconsistencies in the printing/parsing of clauses are
addressed.
The printer and parser functions for the `map` clause are updated, so
that they are able to handle `map` clauses linked to entry block
arguments as well as those which aren't.
This PR causes a build failure in the flang subproject. This is addressed
by the next PR in the stack.
Now all operations with a reduction clause have an array of bools
controlling whether each reduction variable should be passed by
reference or value.
This was already supported for Wsloop and Parallel. The new operations
modified here currently have no flang lowering or translation to LLVMIR
and so further changes are not needed.
It isn't possible to check the verifier in
mlir/test/Dialect/OpenMP/invalid.mlir because there is no way of parsing
an operation to have an incorrect number of byref attributes. The
verifier exists to pick up buggy operation builders or in-place
operation modification.
Fixes#88935
Toggling reduction by-ref broke when multiple reduction clauses were
used. Decisions made for the by-ref status for later clauses could then
invalidate decisions for earlier clauses. For example,
```
reduction(+:scalar,scalar2) reduction(+:array)
```
The first clause would choose by value reduction and generate by-value
reduction regions, but then after this the second clause would force
by-ref to support the array argument. But by the time the second clause
is processed, the first clause has already had the wrong kind of
reduction regions generated.
This is solved by toggling whether a variable should be reduced by
reference per variable. In the above example, this allows only `array`
to be reduced by ref.
Previously this was only populated in the create method later. This
resolves some of invalid builder paths. This may also be sufficient that
type inference functions no longer have to consider whether property
conversion has happened (but haven't verified that yet).
This also makes Attributes corresponding to Properties as optional
inside the set from attributes method. Today that is in effect what
happens with Property value initialization and folks use it to define
custom C++ types whose default initialization is what they want. This is
the behavior users get if they use properties directly. Propagating
Attributes without allowing partial setting would require iterating over
the dictionary attribute considering the properties of the op type that
will be created. This could also have been an additional method
generated or optional behavior on the set method. But doing it
consistently seems better. In terms of whats lost, it doesn't seem like
anything compared to the pure Property path where Property is default
value initialized and then partially overwritten (this doesn't seem to
buy anything else verification wise).
Default valued Properties (as specified ODS side rather than C++ side)
triggered error as the containing class was not yet complete but
referenced nested class, so that we couldn't have default initializer
for them in the parent class. Added an additional forwarding builder to
avoid needing to update call sites. This could be split out to separate
change.
Inlined templated function in unit test that was only used once. Moved
initialization earlier where seen.
This PR adds two new fields to omp.map_info, one BoolAttr and one I64ArrayAttr.
The BoolAttr is named partial_map, and is a flag that indicates if the record type captured by
the map_info operation is a partial map, or if it is mapped in its entirety, this currently helps
the later lowering determine the type of map entries that need to be generated.
The I64ArrayAttr named members_index is intended to track the placement of each member
map_info operations (and by extension mapped member variable) placement in the parent
record type. This may need to be extended to an N-D array for nested member mapping.
Pull Request: https://github.com/llvm/llvm-project/pull/82851
Extends `omp.private` with a new region: `dealloc` where deallocation
logic for Fortran deallocatables will be outlined (this will happen in
later PRs).
This patch updates verifiers for `omp.ordered`, `omp.ordered.region`,
`omp.cancel` and `omp.cancellation_point`, which check for a parent
`omp.wsloop`.
After transitioning to a loop wrapper-based approach, the expected
direct parent will become `omp.loop_nest` instead, so verifiers need to
take this into account.
This PR on its own will not pass premerge tests. All patches in the
stack are needed before it can be compiled and passes tests.
This patch updates the definition of `omp.wsloop` to enforce the
restrictions of a loop wrapper operation.
Related tests are updated but this PR on its own will not pass premerge
tests. All patches in the stack are needed before it can be compiled and
passes tests.
This patch extends verification of the `omp.parallel` operation to check
it is correctly defined when taking a loop wrapper role.
In OpenMP, a PARALLEL construct can be either a (potenially combined)
block construct or a loop construct, when appearing as part of a
composite construct. This is currently the case for the DISTRIBUTE
PARALLEL DO/FOR and DISTRIBUTE PARALLEL DO/FOR SIMD exclusively.
When used to represent the PARALLEL leaf of a composite construct, it
must follow the rules of a wrapper loop operation in MLIR, and this is
what this patch ensures. No additional restrictions are introduced for
PARALLEL block constructs.
This patch updates the definition of `omp.simdloop` to enforce the
restrictions of a wrapper operation. It has been renamed to `omp.simd`,
to better reflect the naming used in the spec. All uses of "simdloop" in
function names have been updated accordingly.
Some changes to Flang lowering and OpenMP to LLVM IR translation are
introduced to prevent the introduction of compilation/test failures. The
eventual long term solution might be different.
This patch defines a common interface to be shared by all OpenMP loop
wrapper operations. The main restrictions these operations must meet in
order to be considered a wrapper are:
- They contain a single region.
- Their region contains a single block.
- Their block only contains another loop wrapper or `omp.loop_nest` and
a terminator.
The new interface is attached to the `omp.parallel`, `omp.wsloop`,
`omp.simdloop`, `omp.distribute` and `omp.taskloop` operations. It is
not currently enforced that these operations meet the wrapper
restrictions, which would break existing OpenMP loop-generating code.
Rather, this will be introduced progressively in subsequent patches.
This patch introduces an operation intended to hold loop information
associated to the `omp.distribute`, `omp.simdloop`, `omp.taskloop` and
`omp.wsloop` operations. This is a stopgap solution to unblock work on
transitioning these operations to becoming wrappers, as discussed in
[this
RFC](https://discourse.llvm.org/t/rfc-representing-combined-composite-constructs-in-the-openmp-dialect/76986).
Long-term, this operation will likely be replaced by
`omp.canonical_loop`, which is being designed to address missing support
for loop transformations, etc.
Added lowering support for IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses for
OMP TARGET directive and added related tests for these changes.
IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses apply to OMP TARGET directive
OpenMP spec states
The **is_device_ptr** clause indicates that its list items are device
pointers.
The **has_device_addr** clause indicates that its list items already
have device addresses and therefore they may be directly accessed from a
target device.
Whereas USE_DEVICE_PTR and USE_DEVICE_ADDR clauses apply to OMP TARGET
DATA directive and OpenMP spec for them states
Each list item in the **use_device_ptr** clause results in a new list
item that is a device pointer that refers to a device address
Each list item in a **use_device_addr** clause that is present in the
device data environment is treated as if it is implicitly mapped by a
map clause on the construct with a map-type of alloc
Fixed build error caused by Squash merge which needs rebase
Added lowering support for IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses for
OMP TARGET directive and added related tests for these changes.
IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses apply to OMP TARGET directive
OpenMP spec states
`The **is_device_ptr** clause indicates that its list items are device
pointers.`
`The **has_device_addr** clause indicates that its list items already
have device addresses and therefore they may be directly accessed from a
target device.`
Whereas USE_DEVICE_PTR and USE_DEVICE_ADDR clauses apply to OMP TARGET
DATA directive and OpenMP spec for them states
`Each list item in the **use_device_ptr** clause results in a new list
item that is a device pointer that refers to a device address`
`Each list item in a **use_device_addr** clause that is present in the
device data environment is treated as if it is implicitly mapped by a
map clause on the construct with a map-type of alloc`
Currently, by-ref reductions will allocate the per-thread reduction
variable in the initialization region. Adding a cleanup region allows
that allocation to be undone. This will allow flang to support reduction
of arrays stored on the heap.
This conflation of allocation and initialization in the initialization
should be fixed in the future to better match the OpenMP standard, but
that is beyond the scope of this patch.
Extends the `omp.parallel` op by adding a `private` clause to model
[first]private variables. This uses the `omp.private` op to map
privatized variables to their corresponding privatizers.
Example `omp.private` op with `private` variable:
```
omp.parallel private(@x.privatizer %arg0 -> %arg1 : !llvm.ptr) {
^bb0(%arg1: !llvm.ptr):
// ... use %arg1 ...
omp.terminator
}
```
Whether the variable is private or firstprivate is determined by the
attributes of the corresponding `omp.private` op.
This adds a new custom CopyPrivateVarList to the single operation.
Each list item is formed by a reference to the variable to be
updated, its type and the function to be used to perform the copy.
It will be translated to LLVM IR using OpenMP builder, that will
use the information in the copyprivate list to call
__kmpc_copyprivate.
This is patch 2 of 4, to add support for COPYPRIVATE in Flang.
Original PR: https://github.com/llvm/llvm-project/pull/73128
This patch reworks the way that wsloop reduction operations function to
better match the expected semantics from the OpenMP specification,
following the rework of parallel reductions.
The new semantics create a private reduction variable as a block
argument which should be used normally for all operations on that
variable in the region; this private variable is then combined with the
others into the shared variable. This way no special omp.reduction
operations are needed inside the region. These block arguments follow
the loop control block arguments.
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
This patch adds support for the depend clause in a number of OpenMP
directives/constructs related to offloading. Specifically, it adds the
handling of the depend clause when it is used with the following
constructs
- target
- target enter data
- target update data
- target exit data
This PR adds a new op to the OpenMP dialect: `PrivateClauseOp`. This op
will be later used to model `[first]private` clauses for differnt OpenMP
directives.
This is part of productizing the "delayed privatization" PoC which can
be found in #79862.
This patch reworks the way that parallel reduction operations function
to better match the expected semantics from the OpenMP specification.
Previously specific omp.reduction operations were used inside the
region, meaning that the reduction only applied when the correct
operation was used, whereas the specification states that any change to
the variable inside the region should be taken into account for the
reduction.
The new semantics create a private reduction variable as a block
argument which should be used normally for all operations on that
variable in the region; this private variable is then combined with the
others into the shared variable. This way no special omp.reduction
operations are needed inside the region.
This patch only makes the change for the `parallel` operation, the
change for the `wsloop` operation will be in a separate patch.
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>