This patch adds PFT to MLIR lowering of teams reductions. Since there is
still no MLIR to LLVM IR translation implemented, compilation of
programs including these constructs will still trigger
not-yet-implemented errors.
Attempt to address the following example from causing an assert or ICE:
```
subroutine test(a)
implicit none
integer :: i
real(kind=real64), dimension(:) :: a
real(kind=real64), dimension(size(a, 1)) :: b
!$omp target map(tofrom: b)
do i = 1, 10
b(i) = i
end do
!$omp end target
end subroutine
```
Where we utilise a Fortran intrinsic (size) to calculate the size of
allocatable arrays and then map it to device.
Allow utility constructs (error and nothing) to appear in the
specification part as well as the execution part. The exception is
"ERROR AT(EXECUTION)" which should only be in the execution part.
In case of ambiguity (the boundary between the specification and the
execution part), utility constructs will be parsed as belonging to the
specification part. In such cases move them to the execution part in the
OpenMP canonicalization code.
Implementation details:
The UNTIED clause is recognized by setting the flag=0 for the default
case or performing logical OR to flag if other clauses are specified,
and this flag is passed as an argument to the `__kmpc_omp_task_alloc`
runtime call.
This re-applies #117867 with a small fix that hopefully prevents build
bot failures. The fix is avoiding `dyn_cast` for the result of
`getOperation()`. Instead we can assign the result to `mlir::ModuleOp`
directly since the type of the operation is known statically (`OpT` in
`OperationPass`).
This is a starting PR to implicitly map allocatable record fields.
This PR contains the following changes:
1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that
these utils work on the `mlir::Value` level rather than the
`semantics::Symbol` level. This takes one step towards to enabling
MLIR passes to more easily do some lowering themselves (e.g. creating
`omp.map.bounds` ops for implicitely caputured data like this PR
does).
2. Adds support for implicitely capturing and mapping allocatable fields
in record types.
There is quite some distant to still cover to have full support for
this. I added a number of todos to guide further development.
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
Convert `DataSharingProcessor::symTable` from pointer to reference.
This avoids accidental null pointer dereferences and makes it
possible to use `symTable` when delayed privatization is disabled.
This adds a minimalistic implementation of parsing and semantics for the
ATOMIC COMPARE feature from OpenMP 5.1.
There is no lowering, just a TODO for that part. Some of the Semantics
is also just a comment explaining that more is needed.
Introduces a new conversion pass that rewrites `omp.loop` ops to their
semantically equivalent op nests bases on the surrounding/binding
context of the `loop` op. Not all forms of `omp.loop` are supported yet.
See `isLoopConversionSupported` for more info on which forms are
supported.
Add FIR generation and LLVMIR translation support for mergeable clause
on task construct. If mergeable clause is present on a task, the
relevant flag in `ompt_task_flag_t` is set and passed to
`__kmpc_omp_task_alloc`.
Add ALL variable category, implement semantic checks to verify the
validity of the clause, improve error messages, add testcases.
The variable category modifier is optional since 5.0, make sure we allow
it to be missing. If it is missing, assume "all" in clause conversion.
Extends MLIR lowering support for the `loop` directive by adding
lowering support for the `bind` clause.
Parent PR: https://github.com/llvm/llvm-project/pull/114199, only the
latest commit is relevant to this PR.
Adds initial support for lowering the `loop` directive to MLIR.
The PR includes basic suport and testing for the following clauses:
* `collapse`
* `order`
* `private`
* `reduction`
Parent PR: #113911, only the latest commit is relevant to this PR.
This PR is one of 3 in a PR stack, this is the primary change set which
seeks to extend the current derived type explicit member mapping support
to handle descriptor member mapping at arbitrary levels of nesting. The
PR stack seems to do this reasonably (from testing so far) but as you
can create quite complex mappings with derived types (in particular when
adding allocatable derived types or arrays of allocatable derived types)
I imagine there will be hiccups, which I am more than happy to address.
There will also be further extensions to this work to handle the
implicit auto-magical mapping of descriptor members in derived types and
a few other changes planned for the future (with some ideas on
optimizing things).
The changes in this PR primarily occur in the OpenMP lowering and the
OMPMapInfoFinalization pass.
In the OpenMP lowering several utility functions were added or extended
to support the generation of appropriate intermediate member mappings
which are currently required when the parent (or multiple parents) of a
mapped member are descriptor types. We need to map the entirety of these
types or do a "deep copy" for lack of a better term, where we map both
the base address and the descriptor as without the copying of both of
these we lack the information in the case of the descriptor to access
the member or attach the pointers data to the pointer and in the latter
case we require the base address to map the chunk of data. Currently we
do not segment descriptor based derived types as we do with regular
non-descriptor derived types, we effectively map their entirety in all
cases at the moment, I hope to address this at some point in the future
as it adds a fair bit of a performance penalty to having nestings of
allocatable derived types as an example. The process of mapping all
intermediate descriptor members in a members path only occurs if a
member has an allocatable or object parent in its symbol path or the
member itself is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also generate
the appropriate address for the allocatable member within the derived
type to use as a the varPtr field of the map (for intermediate
allocatable maps and final allocatable mappings). In this case it's
necessary as we can't utilise the usual Fortran::lower functionality
such as gatherDataOperandAddrAndBounds without causing issues later in
the lowering due to extra allocas being spawned which seem to affect the
pointer attachment (at least this is my current assumption, it results
in memory access errors on the device due to incorrect map information
generation). This is similar to why we do not use the MLIR value
generated for this and utilise the original symbol provided when mapping
descriptor types external to derived types. Hopefully this can be
rectified in the future so this function can be simplified and more
closely aligned to the other type mappings. We also make use of
fir::CoordinateOp as opposed to the HLFIR version as the HLFIR version
doesn't support the appropriate lowering to FIR necessary at the moment,
we also cannot use a single CoordinateOp (similarly to a single GEP) as
when we index through a descriptor operation (BoxType) we encounter
issues later in the lowering, however in either case we need access to
intermediate descriptors so individual CoordinateOp's aid this
(although, being able to compress them into a smaller amount of
CoordinateOp's may simplify the IR and perhaps result in a better end
product, something to consider for the future).
The other large change area was in the OMPMapInfoFinalization pass,
where the pass had to be extended to support the expansion of box types
(or multiple nestings of box types) within derived types, or box type
derived types. This requires expanding each BoxType mapping from one
into two maps and then modifying all of the existing member indices of
the overarching parent mapping to account for the addition of these new
members alongside adjusting the existing member indices to support the
addition of these new maps which extend the original member indices (as
a base address of a box type is currently considered a member of the box
type at a position of 0 as when lowered to LLVM-IR it's a pointer
contained at this position in the descriptor type, however, this means
extending mapped children of this expanded descriptor type to
additionally incorporate the new member index in the correct location in
its own index list). I believe there is a reasonable amount of comments
that should aid in understanding this better, alongside the test
alterations for the pass.
A subset of the changes were also aimed at making some of the utilities
for packing and unpacking the DenseIntElementsAttr containing the member
indices shareable across the lowering and OMPMapInfoFinalization, this
required moving some functions to the Lower/Support/Utils.h header, and
transforming the lowering structure containing the member index data
into something more similar to the version used in
OMPMapInfoFinalization. There we also some other attempts at tidying
things up in relation to the member index data generation in the
lowering, some of which required creating a logical operator for the
OpenMP ID class so it can be utilised as a map key (it simply utilises
the symbol address for the moment as ordering isn't particularly
important).
Otherwise I have added a set of new tests encompassing some of the
mappings currently supported by this PR (unfortunately as you can have
arbitrary nestings of all shapes and types it's not very feasible to
cover them all).
…mposites (#112686)"
Lowering of reductions in composite operations can now be re-enabled,
since previous commits in this PR stack fix the MLIR representation
produced and it no longer triggers a compiler crash during translation
to LLVM IR.
This reverts commit c44860c8d2.
When composite constructs are lowered, clauses for each leaf construct
are lowered before creating the set of loop wrapper operations, using
these outside values to populate their operand lists. Then, when the
loop nest associated to that composite construct is lowered, the binding
of Fortran symbols to the entry block arguments defined by these loop
wrappers is performed, resulting in the creation of `hlfir.declare`
operations in the entry block of the `omp.loop_nest`.
This approach prevents `hlfir.declare` operations related to the binding
and other operations resulting from the evaluation of the clauses from
being inserted between loop wrapper operations, which would be an
illegal MLIR representation. However, this introduces the problem of
entry block arguments defined by a wrapper that then should be used by
one of its nested wrappers, because the corresponding Fortran symbol
would still be mapped to an outside value at the time of gathering the
list of operands for the nested wrapper.
This patch adds operand re-mapping logic to update wrappers without
changing when clauses are evaluated or where the `hlfir.declare`
creation is performed.
This patch adds methods to `EntryBlockArgs` to access the full list of
entry block argument-related symbols and variables, in their standard
order. This helps centralizing this logic in as few places as possible
to avoid future inconsistencies.
Implement parsing of the AFFINITY clause on TASK construct, conversion
from the parser class to omp::Clause.
Lowering to HLFIR is unsupported, a TODO message is displayed.
Currently, the `omp.simd` operation is ignored during MLIR to LLVM IR
translation when it takes part in a composite construct. One consequence
of this limitation is that any entry block arguments defined by that
operation will trigger a compiler crash if they are used anywhere, as
they are not bound to an LLVM IR value.
A previous PR introducing support for the `reduction` clause resulted in
the creation and use of entry block arguments attached to the `omp.simd`
operation, causing compiler crashes on 'do simd reduction(...)'
constructs.
This patch disables Flang lowering of simd reductions in 'do simd'
constructs to avoid triggering these errors while translation to LLVM IR
is still incomplete.
This patch enables lowering to MLIR of the reduction clause of `simd`
constructs. Lowering from MLIR to LLVM IR remains unimplemented, so at
that stage it will result in errors being emitted rather than silently
ignoring it as it is currently done.
On composite `do simd` constructs, this lowering error will remain
untriggered, as the `omp.simd` operation in that case is currently
ignored. The MLIR representation, however, will now contain `reduction`
information.
This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating
accordingly the verifier for the `LoopWrapperInterface`.
Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.
There is an extensive list of Lit tests that needed updating to remove
the `omp.terminator`s adding some noise to this patch, but actual
changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering for
those, `LoopWrapperInterface::verifyImpl()`, SCF to OpenMP conversion
and OpenMP dialect documentation.
The bulk of this change are new tests to check that we get a "Not yet
implemneted: *some stuff here*" message when using some not yet
supported OpenMP functionality.
For some of these cases, this also means adding additional clauses to a
filter list in OpenMP.cpp - this changes nothing [to the best of my
understanding] other than allowing the clause to get to the point where
it can be rejected in a TODO with a more clear message. One of the TOOD
filters were missing Mergeable clause, so this was also added and the
existing test updated for the new more specific error message.
There is no functional change intended here.
This patch reverts previous changes to create entry block arguments for
reduction variables attached to `simd` constructs.
This can't currently be done because reduction variables stored in the
corresponding clause structure are not added to the `omp.simd` operation
when created, as this is not supported yet. Adding block arguments for
non-existent reduction variables results in some tests from the Fujitsu
compiler testsuite breaking:
https://linaro.atlassian.net/browse/LLVM-1389.
The main purpose of this patch is to centralize the logic for creating
MLIR operation entry blocks and for binding them to the corresponding
symbols. This minimizes the chances of mixing arguments up for
operations having multiple entry block argument-generating clauses and
prevents divergence while binding arguments.
Some changes implemented to this end are:
- Split into two functions the creation of the entry block, and the
binding of its arguments and the corresponding Fortran symbol. This
enabled a significant simplification of the lowering of composite
constructs, where it's no longer necessary to manually ensure the lists
of arguments and symbols refer to the same variables in the same order
and also match the expected order by the `BlockArgOpenMPOpInterface`.
- Removed redundant and error-prone passing of types and locations from
`ClauseProcessor` methods. Instead, these are obtained from the values
in the appropriate clause operands structure. This also simplifies
argument lists of several lowering functions.
- Access block arguments of already created MLIR operations through the
`BlockArgOpenMPOpInterface` instead of directly indexing the argument
list of the operation, which is not scalable as more entry block
argument-generating clauses are added to an operation.
- Simplified the implementation of `genParallelOp` to no longer need to
define different callbacks depending on whether delayed privatization is
enabled.
Fixes a bug in handling unstructured control-flow in compound loop
constructs. The fix makes sure that unstructured CF does not get lowered
until we reach the last item of the compound construct. This way, we
avoid moving block of unstructured loops in-between the middle items of
the construct and messing (i.e. adding operations) to these block while
doing so.
This patch introduces a new MLIR interface for the OpenMP dialect aimed
at providing a uniform way of verifying and handling entry block
arguments defined by OpenMP clauses.
The approach consists in defining a set of overrideable methods that
return the number of block arguments the operation holds regarding each
of the clauses that may define them. These by default return 0, but they
are overriden by the corresponding clause through the
`extraClassDeclaration` mechanism.
Another set of interface methods to get the actual lists of block
arguments is defined, which is implemented based on the previously
described methods. These implicitly define a standardized ordering
between the list of block arguments associated to each clause, based on
the alphabetical ordering of their names. They should be the preferred
way of matching operation arguments and entry block arguments to that
operation's first region.
Some updates are made to the printing/parsing of `omp.parallel` to
follow the expected order between `private` and `reduction` clauses, as
well as the MLIR to LLVM IR translation pass to access block arguments
using the new interface. Unit tests of operations impacted by additional
verification checks and sorting of entry block arguments.
Starts delayed privatizaiton support for standalone `distribute`
directives. Other flavours of `distribute` are still TODO as well as
MLIR to LLVM IR lowering.
This patch removes the template parameter of the
`ClauseProcessor::processMotionClauses()` method and instead processes
both `TO` and `FROM` as part of a single call. This also enables moving
the implementation out of the header and makes it simpler for a
follow-up patch to potentially refactor `processMap()`,
`processMotionClauses()`, `processUseDeviceAddr()` and
`processUseDevicePtr()`, and minimize code duplication among these.
Currently, Flang throws a "**not yet implemented: Unhandled clause
NONTEMPORAL in SIMD construct**" error when encountering nontemporal
clause. This patch adds support for this clause in SIMD construct.
This patch creates a simple RAII wrapper class for `SymMap` to make it
easier to use and prevent a missing matching `popScope()` for a
`pushScope()` call on simple use cases.
Some push-pop pairs are replaced with instances of the new class by this
patch.
This patch updates the use_device_ptr and use_device_addr clauses to use
the mapInfoOps for lowering. This allows all the types that are handle
by the map clauses such as derived types to also be supported by the
use_device_clauses.
This is patch 1/2 in a series of patches.
Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com