This patch,
- Added a translation support for aligned clause in SIMD directive by passing the alignment details to "llvm.assume" intrinsic.
- Updated the insertion point for llvm.assume intrinsic call in "OMPIRBuilder.cpp".
- Added a check in aligned clause MLIR lowering, to ensure that the alignment value must be a power of 2.
Implementatoin details:
Both Mutexinoutset and Inoutset is recognized as flag=0x4
and 0x8 respectively, the flags is set to `kmp_depend_info` and
passed as argument to `__kmpc_omp_task_with_deps` runtime call
Again, this simplifies the semantic checks and lowering quite a bit.
Update the check for positive alignment to use a more informative
message, and to highlight the modifier itsef, not the whole clause.
Remove the checks for the allocator expression itself being positive:
there is nothing in the spec that says that it should be positive.
Remove the "simple" modifier from the AllocateT template, since both
simple and complex modifiers are the same thing, only differing in
syntax.
When compiling WORKSHARE construct in different compilation units, a
linker error happened, when two equal WORKSHARE constructs with a copy
operation have been compiled:
```
/usr/bin/ld: module2.o: in function `_workshare_copy_f64':
FIRModule:(.text+0x0): multiple definition of `_workshare_copy_f64'; module1.o:FIRModule:(.text+0x0): first defined here
```
Reason is that the generate copy function has the wrong linkage:
```
0000000000000000 T _workshare_copy_f64
```
while it should be
```
0000000000000000 t _workshare_copy_f64
```
This removes the specialized parsers and helper classes for these
clauses, namely ConcatSeparated, MapModifiers, and MotionModifiers. Map
and the motion clauses are now handled in the same way as all other
clauses with modifiers, with one exception: the commas separating their
modifiers are optional. This syntax is deprecated in OpenMP 5.2.
Implement version checks for modifiers: for a given modifier on a given
clause, check if that modifier is allowed on this clause in the
specified OpenMP version. This replaced several individual checks.
Add a testcase for handling map modifiers in a different order, and for
diagnosing an ultimate modifier out of position.
It also modifies the error message to specify it is the dependence-type
that is not supported.
Resolves the crash in
https://github.com/llvm/llvm-project/issues/115647. A fix can come in
later as part of future OpenMP version support.
Extends MLIR lowering support for the `loop` directive by adding
lowering support for the `bind` clause.
Parent PR: https://github.com/llvm/llvm-project/pull/114199, only the
latest commit is relevant to this PR.
This PR is one of 3 in a PR stack, this is the primary change set which
seeks to extend the current derived type explicit member mapping support
to handle descriptor member mapping at arbitrary levels of nesting. The
PR stack seems to do this reasonably (from testing so far) but as you
can create quite complex mappings with derived types (in particular when
adding allocatable derived types or arrays of allocatable derived types)
I imagine there will be hiccups, which I am more than happy to address.
There will also be further extensions to this work to handle the
implicit auto-magical mapping of descriptor members in derived types and
a few other changes planned for the future (with some ideas on
optimizing things).
The changes in this PR primarily occur in the OpenMP lowering and the
OMPMapInfoFinalization pass.
In the OpenMP lowering several utility functions were added or extended
to support the generation of appropriate intermediate member mappings
which are currently required when the parent (or multiple parents) of a
mapped member are descriptor types. We need to map the entirety of these
types or do a "deep copy" for lack of a better term, where we map both
the base address and the descriptor as without the copying of both of
these we lack the information in the case of the descriptor to access
the member or attach the pointers data to the pointer and in the latter
case we require the base address to map the chunk of data. Currently we
do not segment descriptor based derived types as we do with regular
non-descriptor derived types, we effectively map their entirety in all
cases at the moment, I hope to address this at some point in the future
as it adds a fair bit of a performance penalty to having nestings of
allocatable derived types as an example. The process of mapping all
intermediate descriptor members in a members path only occurs if a
member has an allocatable or object parent in its symbol path or the
member itself is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also generate
the appropriate address for the allocatable member within the derived
type to use as a the varPtr field of the map (for intermediate
allocatable maps and final allocatable mappings). In this case it's
necessary as we can't utilise the usual Fortran::lower functionality
such as gatherDataOperandAddrAndBounds without causing issues later in
the lowering due to extra allocas being spawned which seem to affect the
pointer attachment (at least this is my current assumption, it results
in memory access errors on the device due to incorrect map information
generation). This is similar to why we do not use the MLIR value
generated for this and utilise the original symbol provided when mapping
descriptor types external to derived types. Hopefully this can be
rectified in the future so this function can be simplified and more
closely aligned to the other type mappings. We also make use of
fir::CoordinateOp as opposed to the HLFIR version as the HLFIR version
doesn't support the appropriate lowering to FIR necessary at the moment,
we also cannot use a single CoordinateOp (similarly to a single GEP) as
when we index through a descriptor operation (BoxType) we encounter
issues later in the lowering, however in either case we need access to
intermediate descriptors so individual CoordinateOp's aid this
(although, being able to compress them into a smaller amount of
CoordinateOp's may simplify the IR and perhaps result in a better end
product, something to consider for the future).
The other large change area was in the OMPMapInfoFinalization pass,
where the pass had to be extended to support the expansion of box types
(or multiple nestings of box types) within derived types, or box type
derived types. This requires expanding each BoxType mapping from one
into two maps and then modifying all of the existing member indices of
the overarching parent mapping to account for the addition of these new
members alongside adjusting the existing member indices to support the
addition of these new maps which extend the original member indices (as
a base address of a box type is currently considered a member of the box
type at a position of 0 as when lowered to LLVM-IR it's a pointer
contained at this position in the descriptor type, however, this means
extending mapped children of this expanded descriptor type to
additionally incorporate the new member index in the correct location in
its own index list). I believe there is a reasonable amount of comments
that should aid in understanding this better, alongside the test
alterations for the pass.
A subset of the changes were also aimed at making some of the utilities
for packing and unpacking the DenseIntElementsAttr containing the member
indices shareable across the lowering and OMPMapInfoFinalization, this
required moving some functions to the Lower/Support/Utils.h header, and
transforming the lowering structure containing the member index data
into something more similar to the version used in
OMPMapInfoFinalization. There we also some other attempts at tidying
things up in relation to the member index data generation in the
lowering, some of which required creating a logical operator for the
OpenMP ID class so it can be utilised as a map key (it simply utilises
the symbol address for the moment as ordering isn't particularly
important).
Otherwise I have added a set of new tests encompassing some of the
mappings currently supported by this PR (unfortunately as you can have
arbitrary nestings of all shapes and types it's not very feasible to
cover them all).
Extract the SINK/SOURCE parse tree elements into a separate class
`OmpDoacross`, share them between DEPEND and DOACROSS clauses. Most of
the changes in Semantics are to accommodate the new contents of
OmpDependClause, and a mere introduction of OmpDoacrossClause.
There are no semantic checks specifically for DOACROSS.
Parse PRESENT modifier as well while we're at it (no MAPPER though). Add
semantic checks for these clauses in the TARGET UPDATE construct, TODO
messages in lowering.
Parse the locator list in OmpDependClause as an OmpObjectList (instead
of a list of Designators). When a common block appears in the locator
list, show an informative message.
Implement resolving symbols in DependSinkVec in a dedicated visitor
instead of having a visitor for OmpDependClause.
Resolve unresolved names common blocks in OmpObjectList.
Minor changes to the code organization:
- rename OmpDependenceType to OmpTaskDependenceType (to follow 5.2
terminology),
- rename Depend::WithLocators to Depend::DepType,
- add comments with more detailed spec references to parse-tree.h.
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Define `OmpIteratorSpecifier` and `OmpIteratorModifier` parser classes,
and add parsing for them. Those are reusable between any clauses that
use iterator modifiers.
Add support for iterator modifiers to the MAP clause up to lowering,
where a TODO message is emitted.
This commit adds parsing of type modifiers for the MAP clause: CLOSE,
OMPX_HOLD, and PRESENT. The support for ALWAYS has already existed.
The new modifiers are not yet handled in lowering: when present, a TODO
message is emitted and compilation stops.
The main purpose of this patch is to centralize the logic for creating
MLIR operation entry blocks and for binding them to the corresponding
symbols. This minimizes the chances of mixing arguments up for
operations having multiple entry block argument-generating clauses and
prevents divergence while binding arguments.
Some changes implemented to this end are:
- Split into two functions the creation of the entry block, and the
binding of its arguments and the corresponding Fortran symbol. This
enabled a significant simplification of the lowering of composite
constructs, where it's no longer necessary to manually ensure the lists
of arguments and symbols refer to the same variables in the same order
and also match the expected order by the `BlockArgOpenMPOpInterface`.
- Removed redundant and error-prone passing of types and locations from
`ClauseProcessor` methods. Instead, these are obtained from the values
in the appropriate clause operands structure. This also simplifies
argument lists of several lowering functions.
- Access block arguments of already created MLIR operations through the
`BlockArgOpenMPOpInterface` instead of directly indexing the argument
list of the operation, which is not scalable as more entry block
argument-generating clauses are added to an operation.
- Simplified the implementation of `genParallelOp` to no longer need to
define different callbacks depending on whether delayed privatization is
enabled.
This patch removes the template parameter of the
`ClauseProcessor::processMotionClauses()` method and instead processes
both `TO` and `FROM` as part of a single call. This also enables moving
the implementation out of the header and makes it simpler for a
follow-up patch to potentially refactor `processMap()`,
`processMotionClauses()`, `processUseDeviceAddr()` and
`processUseDevicePtr()`, and minimize code duplication among these.
Currently, Flang throws a "**not yet implemented: Unhandled clause
NONTEMPORAL in SIMD construct**" error when encountering nontemporal
clause. This patch adds support for this clause in SIMD construct.
This patch adds the "gen-openmp-clause-ops" `mlir-tblgen` generator to
produce the structure definitions previously in OpenMPClauseOperands.h
automatically from the information contained in OpenMPOps.td and
OpenMPClauses.td.
The original header is maintained to enable the definition of similar
structures that are not directly related to any single `OpenMP_Clause`
or `OpenMP_Op` tablegen definition.
This patch updates the use_device_ptr and use_device_addr clauses to use
the mapInfoOps for lowering. This allows all the types that are handle
by the map clauses such as derived types to also be supported by the
use_device_clauses.
This is patch 1/2 in a series of patches.
Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
This PR aims to unify the map argument generation behavior across both
the implicit capture (captured in a target region) and the explicit
capture (process map), currently the varPtr field of the MapInfo for the
same variable will be different depending on how it's captured. This PR
tries to align that across the generations of MapInfoOp in the OpenMP
lowering.
Currently, I have opted to utilise the rawInput (input memref to a HLFIR
DeclareInfoOp) as opposed to the addr field which includes more
information. The side affect of this is that we have to deal with
BoxTypes less often, which will result in simpler maps in these cases.
The negative side affect of this is that we don't have access to the
bounds information through the resulting value, however, I believe the
bounds information we require in our case is still appropriately stored
in the map bounds, and this seems to be the case from testing so far.
The other fix is for cases where we end up with a BoxType argument into
a function (certain assumed shape and sizes cases do this) that has no
fir.ref wrapping it. As we need the Box to be a reference type to
actually utilise the operation to access the base address stored inside
and create the correct mappings we currently generate an intermediate
allocation in these cases, and then store into it, and utilise this as
the map argument, as opposed to the original.
However, as we were not sharing the same intermediate allocation across
all of the maps for a variable, this resulted in errors in certain cases
when detatching/attatching the data e.g. via enter and exit. This PR
adjusts this for cases
Currently we only maintain tracking of all intermediate allocations for
the current function scope, as opposed to module. Primarily as the only
case I am aware of that this is required is in cases where we pass
certain types of arguments to functions (so I opted to minimize the
overhead of the pass for now). It could likely be extended to module
scope if required if we find other cases where it's applicable and
causing issues.
This patch removes the `ClauseProcessor::processDefault` method due to
it having been implemented in
`DataSharingProcessor::collectDefaultSymbols` instead.
Also, some `genXyzClauses` functions are updated to avoid triggering
TODO errors for clauses not supported by the corresponding construct and
to keep alphabetical sorting on the order in which clauses are
processed.
This patch introduces a new OpenMP clause definition not defined by the spec.
Its main purpose is to define the `loop_inclusive` (previously "inclusive",
renamed according to the parent of this PR in the stack) argument of
`omp.loop_nest` in such a way that a followup implementation of a tablegen
backend to automatically generate clause and operation operand structures
directly from `OpenMP_Op` and `OpenMP_Clause` definitions can properly generate
the `LoopNestOperands` structure.
`collapse` clause arguments are also moved into this new definition, as they
represent information on the loop nests being collapsed rather than the
`collapse` clause itself.
Currently, there are some inconsistencies to how clause arguments are
named in the OpenMP dialect. Additionally, the clause operand structures
associated to them also diverge in certain cases. The purpose of this
patch is to normalize argument names across all `OpenMP_Clause` tablegen
definitions and clause operand structures.
This has the benefit of providing more consistent representations for
clauses in the dialect, but the main short-term advantage is that it
enables the development of an OpenMP-specific tablegen backend to
automatically generate the clause operand structures without breaking
dependent code.
The main re-naming decisions made in this patch are the following:
- Variadic arguments (i.e. multiple values) have the "_vars" suffix.
This and other similar suffixes are removed from array attribute
arguments.
- Individual required or optional value arguments do not have any suffix
added to them (e.g. "val", "var", "expr", ...), except for `if` which
would otherwise result in an invalid C++ variable name.
- The associated clause's name is prepended to argument names that don't
already contain it as part of its name. This avoids future collisions
between arguments named the same way on different clauses and adding
both clauses to the same operation.
- Privatization and reduction related arguments that contain lists of
symbols pointing to privatizer/reducer operations use the "_syms"
suffix. This removes the inconsistencies between the names for
"copyprivate_funcs", "[in]reductions", "privatizers", etc.
- General improvements to names, replacement of camel case for snake
case everywhere, etc.
- Renaming of operation-associated operand structures to use the
"Operands" suffix in place of "ClauseOps", to better differentiate
between clause operand structures and operation operand structures.
- Fields on clause operand structures are sorted according to the
tablegen definition of the same clause.
The assembly format for a few arguments is updated to better reflect the
clause they are associated with:
- `chunk_size` -> `dist_schedule_chunk_size`
- `grain_size` -> `grainsize`
- `simd` -> `par_level_simd`
PR adds changes to the flang frontend to create the `MaskedOp` when
`masked` directive is used in the input program. Omp masked is
introduced in 5.2 standard and allows a parallel region to be executed
by threads specified by a programmer. This is achieved with the help of
filter clause which helps to specify thread id expected to execute the
region.
Other related PRs:
- [Fortran Parsing and Semantic
Support](https://github.com/llvm/llvm-project/pull/91432) - Merged
- [MLIR Support](https://github.com/llvm/llvm-project/pull/96022/files)
- Merged
- [Lowering Support](https://github.com/llvm/llvm-project/pull/98401) -
Under Review
The tricky bit here is that we need to generate the reduction symbol
mapping inside each of the nested SECTION constructs. This is a bit
similar to omp.canonical_loop inside of omp.wsloop, except the SECTION
constructs come from the PFT.
To make this work I moved the lowering of the SECTION constructs inside
of the lowering SECTIONS (where reduction information is still
available). This subverts the normal control flow for OpenMP lowering a
bit.
One alternative option I investigated would be to generate the SECTION
CONSTRUCTS as normal as though there were no reduction, and then to fix
them up after control returns back to genSectionsOp. The problem here is
that the code generated for the section body has the wrong symbol
mapping for the reduction variable, so all of the nested code has to be
patched up. In my prototype version this was even more hacky than what
the solution I settled upon.
This patch applies fixes after the updates to OpenMP clause operands, as
well as updating some tests that were impacted by changes to the
ordering or assembly format of some clauses in MLIR.
Now all operations with a reduction clause have an array of bools
controlling whether each reduction variable should be passed by
reference or value.
This was already supported for Wsloop and Parallel. The new operations
modified here currently have no flang lowering or translation to LLVMIR
and so further changes are not needed.
It isn't possible to check the verifier in
mlir/test/Dialect/OpenMP/invalid.mlir because there is no way of parsing
an operation to have an incorrect number of byref attributes. The
verifier exists to pick up buggy operation builders or in-place
operation modification.
The lowering of copyprivate clauses with allocatable or pointer
variables was incorrect. This happened because the values passed to
copyVar() are always wrapped in SymbolBox::Intrinsic, which
resulted in allocatable/pointer variables being handled as regular
ones.
This is fixed by providing to copyVar() the attributes of the
variables being copied, to make it possible to detect and handle
allocatable/pointer variables correctly.
Fixes#95801
This patch adds support for lowering the OpenMP DISTRIBUTE directive
from PFT to MLIR. It only supports standalone DISTRIBUTE, support for
composite constructs will come in follow-up PRs.
The object identity requires more than just `Symbol`. Don't use `id()`
to get the Symbol associated with the object, becase the return value
will need to change. Instead use `sym()` which is added for that reason.
Named location attribute added to `tgt_offload_entry` shall be used by
runtime calls like `ompx_dump_mapping_tables` to print the information
of variables that are mapped to the device. `ompx_dump_mapping_tables`
was printing the wrong location information and this change fixes it.
A sample execution of example before the change:
```
omptarget device 0 info: OpenMP Host-Device pointer mappings after block at libomptarget:0:0:
omptarget device 0 info: Host Ptr Target Ptr Size (B) DynRefCount HoldRefCount Declaration
omptarget device 0 info: 0x0000000000206df0 0x00007f02cdc00000 20000000 1 0 <program-file-loc> at unknown:18:35
```
The change replaces unknown to the mapped symbol and location to the
declaration location.
Fixes#88935
Toggling reduction by-ref broke when multiple reduction clauses were
used. Decisions made for the by-ref status for later clauses could then
invalidate decisions for earlier clauses. For example,
```
reduction(+:scalar,scalar2) reduction(+:array)
```
The first clause would choose by value reduction and generate by-value
reduction regions, but then after this the second clause would force
by-ref to support the array argument. But by the time the second clause
is processed, the first clause has already had the wrong kind of
reduction regions generated.
This is solved by toggling whether a variable should be reduced by
reference per variable. In the above example, this allows only `array`
to be reduced by ref.
This patch is one in a series of four patches that seeks to refactor
slightly and extend the current record type map support that was
put in place for Fortran's descriptor types to handle explicit
member mapping for record types at a single level of depth.
For example, the below case where two members of a Fortran
derived type are mapped explicitly:
''''
type :: scalar_and_array
real(4) :: real
integer(4) :: array(10)
integer(4) :: int
end type scalar_and_array
type(scalar_and_array) :: scalar_arr
!$omp target map(tofrom: scalar_arr%int, scalar_arr%real)
''''
Current cases of derived type mapping left for future work are:
> explicit member mapping of nested members (e.g. two layers of
record types where we explicitly map a member from the internal
record type)
> Fortran's automagical mapping of all elements and nested elements
of a derived type
> explicit member mapping of a derived type and then constituient members
(redundant in Fortran due to former case but still legal as far as I am aware)
> explicit member mapping of a record type (may be handled reasonably, just
not fully tested in this iteration)
> explicit member mapping for Fortran allocatable types (a variation of nested
record types)
This patch seeks to support this by extending the Flang-new OpenMP lowering to
support generation of this newly required information, creating the neccessary
parent <-to-> member map_info links, calculating the member indices and
setting if it's a partial map.
The OMPDescriptorMapInfoGen pass has also been generalized into a map
finalization phase, now named OMPMapInfoFinalization. This pass was extended
to support the insertion of member maps into the BlockArg and MapOperands of
relevant map carrying operations. Similar to the method in which descriptor types
are expanded and constituient members inserted.
Pull Request: https://github.com/llvm/llvm-project/pull/82853
The lowering produces fir.dummy_scope operation if the current
function has dummy arguments. Each hlfir.declare generated
for a dummy argument is then using the result of fir.dummy_scope
as its dummy_scope operand. This is only done for HLFIR.
I was not able to find a reliable way to identify dummy symbols
in `genDeclareSymbol`, so I added a set of registered dummy symbols
that is alive during the variables instantiation for the current
function. The set is initialized during the mapping of the dummy
argument symbols to their MLIR values. It is reset right after
all variables are instantiated - this is done to avoid generating
hlfir.declare operations with dummy_scope for the clones of
the dummy symbols (e.g. this happens with OpenMP privatization).
If this can be done in a cleaner way, please advise.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc75.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
This patch performs several cleanups with the main purpose of
normalizing the code patterns used to trigger codegen for MLIR OpenMP
operations and making the processing of clauses and constructs
independent. The following changes are made:
- Clean up unused `directive` argument to
`ClauseProcessor::processMap()`.
- Move general helper functions in OpenMP.cpp to the appropriate section
of the file.
- Create `gen<OpName>Clauses()` functions containing the clause
processing code specific for the associated OpenMP construct.
- Update `gen<OpName>Op()` functions to call the corresponding
`gen<OpName>Clauses()` function.
- Sort calls to `ClauseProcessor::process<ClauseName>()` alphabetically,
to avoid inadvertently relying on some arbitrary order. Update some
tests that broke due to the order change.
- Normalize `genOMP()` functions so they all delegate the generation of
MLIR to `gen<OpName>Op()` functions following the same pattern.
- Only process `nowait` clause on `TARGET` constructs if not compiling
for the target device.
A later patch can move the calls to `gen<OpName>Clauses()` out of
`gen<OpName>Op()` functions and passing completed clause structures
instead, in preparation to supporting composite constructs. That will
make it possible to reuse clause processing for a given leaf construct
when appearing alone or in a combined or composite construct, while
controlling where the associated code is produced.
This patch updates Flang lowering to use the new set of OpenMP clause
operand structures and their groupings into directive-specific sets of
clause operands.
It simplifies the passing of information from the clause processor and
the creation of operations.
The `DataSharingProcessor` is slightly modified to not hold delayed
privatization state. Instead, optional arguments are added to
`processStep1` which are only passed when delayed privatization is used.
This enables using the clause operand structure for `private` and
removes the need for the ad-hoc `DelayedPrivatizationInfo` structure.
The processing of the `schedule` clause is updated to process the
`chunk` modifier rather than requiring two separate calls to the
`ClauseProcessor`.
Lowering of a block-associated `ordered` construct is updated to emit a
TODO error if the `simd` clause is specified, since it is not currently
supported by the `ClauseProcessor` or later compilation stages.
Removed processing of `schedule` from `omp.simdloop`, as it doesn't
apply to `simd` constructs.