When composite constructs are lowered, clauses for each leaf construct
are lowered before creating the set of loop wrapper operations, using
these outside values to populate their operand lists. Then, when the
loop nest associated to that composite construct is lowered, the binding
of Fortran symbols to the entry block arguments defined by these loop
wrappers is performed, resulting in the creation of `hlfir.declare`
operations in the entry block of the `omp.loop_nest`.
This approach prevents `hlfir.declare` operations related to the binding
and other operations resulting from the evaluation of the clauses from
being inserted between loop wrapper operations, which would be an
illegal MLIR representation. However, this introduces the problem of
entry block arguments defined by a wrapper that then should be used by
one of its nested wrappers, because the corresponding Fortran symbol
would still be mapped to an outside value at the time of gathering the
list of operands for the nested wrapper.
This patch adds operand re-mapping logic to update wrappers without
changing when clauses are evaluated or where the `hlfir.declare`
creation is performed.
This patch adds methods to `EntryBlockArgs` to access the full list of
entry block argument-related symbols and variables, in their standard
order. This helps centralizing this logic in as few places as possible
to avoid future inconsistencies.
Implement parsing of the AFFINITY clause on TASK construct, conversion
from the parser class to omp::Clause.
Lowering to HLFIR is unsupported, a TODO message is displayed.
Currently, the `omp.simd` operation is ignored during MLIR to LLVM IR
translation when it takes part in a composite construct. One consequence
of this limitation is that any entry block arguments defined by that
operation will trigger a compiler crash if they are used anywhere, as
they are not bound to an LLVM IR value.
A previous PR introducing support for the `reduction` clause resulted in
the creation and use of entry block arguments attached to the `omp.simd`
operation, causing compiler crashes on 'do simd reduction(...)'
constructs.
This patch disables Flang lowering of simd reductions in 'do simd'
constructs to avoid triggering these errors while translation to LLVM IR
is still incomplete.
This patch enables lowering to MLIR of the reduction clause of `simd`
constructs. Lowering from MLIR to LLVM IR remains unimplemented, so at
that stage it will result in errors being emitted rather than silently
ignoring it as it is currently done.
On composite `do simd` constructs, this lowering error will remain
untriggered, as the `omp.simd` operation in that case is currently
ignored. The MLIR representation, however, will now contain `reduction`
information.
This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating
accordingly the verifier for the `LoopWrapperInterface`.
Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.
There is an extensive list of Lit tests that needed updating to remove
the `omp.terminator`s adding some noise to this patch, but actual
changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering for
those, `LoopWrapperInterface::verifyImpl()`, SCF to OpenMP conversion
and OpenMP dialect documentation.
The bulk of this change are new tests to check that we get a "Not yet
implemneted: *some stuff here*" message when using some not yet
supported OpenMP functionality.
For some of these cases, this also means adding additional clauses to a
filter list in OpenMP.cpp - this changes nothing [to the best of my
understanding] other than allowing the clause to get to the point where
it can be rejected in a TODO with a more clear message. One of the TOOD
filters were missing Mergeable clause, so this was also added and the
existing test updated for the new more specific error message.
There is no functional change intended here.
This patch reverts previous changes to create entry block arguments for
reduction variables attached to `simd` constructs.
This can't currently be done because reduction variables stored in the
corresponding clause structure are not added to the `omp.simd` operation
when created, as this is not supported yet. Adding block arguments for
non-existent reduction variables results in some tests from the Fujitsu
compiler testsuite breaking:
https://linaro.atlassian.net/browse/LLVM-1389.
The main purpose of this patch is to centralize the logic for creating
MLIR operation entry blocks and for binding them to the corresponding
symbols. This minimizes the chances of mixing arguments up for
operations having multiple entry block argument-generating clauses and
prevents divergence while binding arguments.
Some changes implemented to this end are:
- Split into two functions the creation of the entry block, and the
binding of its arguments and the corresponding Fortran symbol. This
enabled a significant simplification of the lowering of composite
constructs, where it's no longer necessary to manually ensure the lists
of arguments and symbols refer to the same variables in the same order
and also match the expected order by the `BlockArgOpenMPOpInterface`.
- Removed redundant and error-prone passing of types and locations from
`ClauseProcessor` methods. Instead, these are obtained from the values
in the appropriate clause operands structure. This also simplifies
argument lists of several lowering functions.
- Access block arguments of already created MLIR operations through the
`BlockArgOpenMPOpInterface` instead of directly indexing the argument
list of the operation, which is not scalable as more entry block
argument-generating clauses are added to an operation.
- Simplified the implementation of `genParallelOp` to no longer need to
define different callbacks depending on whether delayed privatization is
enabled.
Fixes a bug in handling unstructured control-flow in compound loop
constructs. The fix makes sure that unstructured CF does not get lowered
until we reach the last item of the compound construct. This way, we
avoid moving block of unstructured loops in-between the middle items of
the construct and messing (i.e. adding operations) to these block while
doing so.
This patch introduces a new MLIR interface for the OpenMP dialect aimed
at providing a uniform way of verifying and handling entry block
arguments defined by OpenMP clauses.
The approach consists in defining a set of overrideable methods that
return the number of block arguments the operation holds regarding each
of the clauses that may define them. These by default return 0, but they
are overriden by the corresponding clause through the
`extraClassDeclaration` mechanism.
Another set of interface methods to get the actual lists of block
arguments is defined, which is implemented based on the previously
described methods. These implicitly define a standardized ordering
between the list of block arguments associated to each clause, based on
the alphabetical ordering of their names. They should be the preferred
way of matching operation arguments and entry block arguments to that
operation's first region.
Some updates are made to the printing/parsing of `omp.parallel` to
follow the expected order between `private` and `reduction` clauses, as
well as the MLIR to LLVM IR translation pass to access block arguments
using the new interface. Unit tests of operations impacted by additional
verification checks and sorting of entry block arguments.
Starts delayed privatizaiton support for standalone `distribute`
directives. Other flavours of `distribute` are still TODO as well as
MLIR to LLVM IR lowering.
This patch removes the template parameter of the
`ClauseProcessor::processMotionClauses()` method and instead processes
both `TO` and `FROM` as part of a single call. This also enables moving
the implementation out of the header and makes it simpler for a
follow-up patch to potentially refactor `processMap()`,
`processMotionClauses()`, `processUseDeviceAddr()` and
`processUseDevicePtr()`, and minimize code duplication among these.
Currently, Flang throws a "**not yet implemented: Unhandled clause
NONTEMPORAL in SIMD construct**" error when encountering nontemporal
clause. This patch adds support for this clause in SIMD construct.
This patch creates a simple RAII wrapper class for `SymMap` to make it
easier to use and prevent a missing matching `popScope()` for a
`pushScope()` call on simple use cases.
Some push-pop pairs are replaced with instances of the new class by this
patch.
This patch updates the use_device_ptr and use_device_addr clauses to use
the mapInfoOps for lowering. This allows all the types that are handle
by the map clauses such as derived types to also be supported by the
use_device_clauses.
This is patch 1/2 in a series of patches.
Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
This patch moves the creation of `DataSharingProcessor` instances for
loop constructs out of `genOMPDispatch()` and into their corresponding
codegen functions. This is a necessary first step to enable a proper
handling of privatization on composite constructs.
Some tests are updated due to a change of order between clause
processing and privatization.
There are some spots where all symbols to privatize collected by a
`DataSharingProcessor` instance are expected to have corresponding entry
block arguments associated regardless of whether delayed privatization
was enabled.
This can result in compiler crashes if a `DataSharingProcessor` instance
created with `useDelayedPrivatization=false` is queried in this way. The
solution proposed by this patch is to provide another public method to
query specifically delayed privatization symbols, which will either be
empty or point to the complete set of symbols to privatize accordingly.
This PR aims to unify the map argument generation behavior across both
the implicit capture (captured in a target region) and the explicit
capture (process map), currently the varPtr field of the MapInfo for the
same variable will be different depending on how it's captured. This PR
tries to align that across the generations of MapInfoOp in the OpenMP
lowering.
Currently, I have opted to utilise the rawInput (input memref to a HLFIR
DeclareInfoOp) as opposed to the addr field which includes more
information. The side affect of this is that we have to deal with
BoxTypes less often, which will result in simpler maps in these cases.
The negative side affect of this is that we don't have access to the
bounds information through the resulting value, however, I believe the
bounds information we require in our case is still appropriately stored
in the map bounds, and this seems to be the case from testing so far.
The other fix is for cases where we end up with a BoxType argument into
a function (certain assumed shape and sizes cases do this) that has no
fir.ref wrapping it. As we need the Box to be a reference type to
actually utilise the operation to access the base address stored inside
and create the correct mappings we currently generate an intermediate
allocation in these cases, and then store into it, and utilise this as
the map argument, as opposed to the original.
However, as we were not sharing the same intermediate allocation across
all of the maps for a variable, this resulted in errors in certain cases
when detatching/attatching the data e.g. via enter and exit. This PR
adjusts this for cases
Currently we only maintain tracking of all intermediate allocations for
the current function scope, as opposed to module. Primarily as the only
case I am aware of that this is required is in cases where we pass
certain types of arguments to functions (so I opted to minimize the
overhead of the pass for now). It could likely be extended to module
scope if required if we find other cases where it's applicable and
causing issues.
After decomposition of OpenMP compound constructs and assignment of
applicable clauses to each leaf construct, composite constructs are then
combined again into a single element in the construct queue. This helped
later lowering stages easily identify composite constructs.
However, as a result of the re-composition stage, the same list of
clauses is used to produce all MLIR operations corresponding to each
leaf of the original composite construct. This undoes existing logic
introducing implicit clauses and deciding to which leaf construct(s)
each clause applies.
This patch removes construct re-composition logic and updates Flang
lowering to be able to identify composite constructs from a list of leaf
constructs. As a result, the right set of clauses is produced for each
operation representing a leaf of a composite construct.
PR stack:
- #102612
- #102613
This patch adds an assert to `genLoopNestClauses` to ensure the number
of symbols and corresponding loop wrapper entry block arguments have the
same size. This is checked by some of the callers, but it makes more
sense moving it into the function itself and avoid having to replicate
it.
This patch removes the `ClauseProcessor::processDefault` method due to
it having been implemented in
`DataSharingProcessor::collectDefaultSymbols` instead.
Also, some `genXyzClauses` functions are updated to avoid triggering
TODO errors for clauses not supported by the corresponding construct and
to keep alphabetical sorting on the order in which clauses are
processed.
This patch sets the omp.composite unit attr for composite wrapper ops
and also add appropriate checks to the verifiers of supported ops for
the presence/absence of the attribute.
This is patch 2/2 in a series of patches. Patch 1 - #102340.
This patch replaces `ConstructQueue::iterator` arguments with
`ConstructQueue::const_iterator` where it's used as a pointer to an
element inside of a `const ConstructQueue &` passed along with it.
Since these functions don't intend to modify the list or any elements in
it, keeping constness consistent between both makes it simpler to work
with.
Flips the delayed privatization switch to be on by default. After the
recent fixes related to delayed privatization, the gfortran test suite
runs successfully with delayed privatization turned on by defuault for
`omp parallel`.
Currently, there are some inconsistencies to how clause arguments are
named in the OpenMP dialect. Additionally, the clause operand structures
associated to them also diverge in certain cases. The purpose of this
patch is to normalize argument names across all `OpenMP_Clause` tablegen
definitions and clause operand structures.
This has the benefit of providing more consistent representations for
clauses in the dialect, but the main short-term advantage is that it
enables the development of an OpenMP-specific tablegen backend to
automatically generate the clause operand structures without breaking
dependent code.
The main re-naming decisions made in this patch are the following:
- Variadic arguments (i.e. multiple values) have the "_vars" suffix.
This and other similar suffixes are removed from array attribute
arguments.
- Individual required or optional value arguments do not have any suffix
added to them (e.g. "val", "var", "expr", ...), except for `if` which
would otherwise result in an invalid C++ variable name.
- The associated clause's name is prepended to argument names that don't
already contain it as part of its name. This avoids future collisions
between arguments named the same way on different clauses and adding
both clauses to the same operation.
- Privatization and reduction related arguments that contain lists of
symbols pointing to privatizer/reducer operations use the "_syms"
suffix. This removes the inconsistencies between the names for
"copyprivate_funcs", "[in]reductions", "privatizers", etc.
- General improvements to names, replacement of camel case for snake
case everywhere, etc.
- Renaming of operation-associated operand structures to use the
"Operands" suffix in place of "ClauseOps", to better differentiate
between clause operand structures and operation operand structures.
- Fields on clause operand structures are sorted according to the
tablegen definition of the same clause.
The assembly format for a few arguments is updated to better reflect the
clause they are associated with:
- `chunk_size` -> `dist_schedule_chunk_size`
- `grain_size` -> `grainsize`
- `simd` -> `par_level_simd`
This patch enables the lastprivate clause to be used in the presence of
the collapse clause.
Note: the way we currently implement lastprivate means that this adds a
large number of compare instructions to the end of every iteration of
the loop. This is a clearly non-optimal thing to do, but lastprivate in
general will need re-implementing to prevent this. This is planned as
part of the delayed privatization work. This current implementation is
just a stop-gap measure as generating sub-optimal but working code is
better than crashing out.
PR adds changes to the flang frontend to create the `MaskedOp` when
`masked` directive is used in the input program. Omp masked is
introduced in 5.2 standard and allows a parallel region to be executed
by threads specified by a programmer. This is achieved with the help of
filter clause which helps to specify thread id expected to execute the
region.
Other related PRs:
- [Fortran Parsing and Semantic
Support](https://github.com/llvm/llvm-project/pull/91432) - Merged
- [MLIR Support](https://github.com/llvm/llvm-project/pull/96022/files)
- Merged
- [Lowering Support](https://github.com/llvm/llvm-project/pull/98401) -
Under Review
The tricky bit here is that we need to generate the reduction symbol
mapping inside each of the nested SECTION constructs. This is a bit
similar to omp.canonical_loop inside of omp.wsloop, except the SECTION
constructs come from the PFT.
To make this work I moved the lowering of the SECTION constructs inside
of the lowering SECTIONS (where reduction information is still
available). This subverts the normal control flow for OpenMP lowering a
bit.
One alternative option I investigated would be to generate the SECTION
CONSTRUCTS as normal as though there were no reduction, and then to fix
them up after control returns back to genSectionsOp. The problem here is
that the code generated for the section body has the wrong symbol
mapping for the reduction variable, so all of the nested code has to be
patched up. In my prototype version this was even more hacky than what
the solution I settled upon.
This patch adds support for lowering 'DISTRIBUTE SIMD' constructs to
MLIR. Translation of `omp.distribute` operations to LLVM IR is still not
supported, so its composition with `omp.simd` isn't either.
This patch adds support for lowering 'DO SIMD' constructs to MLIR. SIMD
information is now stored in an `omp.simd` loop wrapper, which is
currently ignored by the OpenMP dialect to LLVM IR translation stage.
The end result is that runtime behavior of compiled 'DO SIMD' constructs
does not change after this patch, so 'DO SIMD' still runs like 'DO'
(i.e. SIMD width = 1). However, all of the required information is now
present in the resulting MLIR representation.
To avoid confusion, the previous wsloop-simd.f90 lit test is renamed to
wsloop-schedule.f90 and a new wsloop-simd.f90 test is created to check
the addition of SIMD clauses to the `omp.simd` operation produced when a
'DO SIMD' construct is lowered to MLIR.
This patch splits the lowering for `omp.loop_nest` into its own function
and updates lowering for all supported loop wrappers to stop creating
this operation themselves.
Lowering functions for loop constructs are split into "wrapper" and
"standalone" variants, where the "wrapper" version only creates the
specific operation with nothing inside of it and the "standalone"
version calls the former and also handles clause processing and creates
the nested `omp.loop_nest`.
"Wrapper" lowering functions can be used by "composite" lowering
functions in follow-up patches, minimizing code duplication.
Tests broken as a result of reordering between the processing of the
loop wrapper's and the nested `omp.loop_nest`'s clauses are also
updated.
This patch moves the logic associated with the creation of a
`DataSharingProcessor` instance for loop-associated OpenMP leaf
constructs to the `genOMPDispatch` function, avoiding code duplication
for standalone and composite loop constructs.
This also prevents privatization-related allocations to be later made
inside of loop wrappers when support for composite constructs is
implemented.
This patch removes the `outerCombined`, `reductionSymbols` and
`reductionTypes` attributes from the `OpWithBodyGenInfo` structure and
their uses, as they never impact the lowering process or its output.
The `outerCombined` variable is always set to `false`, so in practice it
doesn't represent what its name indicates. Furthermore, initializing it
correctly can result in privatization not being performed in cases where
it should (at least tests doing this together with composite construct
support pointed me in that direction). It seems to be tied to the early
privatization approach, where a redundant alloca could possibly be
avoided in certain cases. With the transition to delayed privatization,
it seems like it won't serve that purpose anymore, since the decision of
what and where privatization-related allocations are inserted will be
postponed to the MLIR to LLVM IR translation stage. Since this feature
is already currently not being used, its potential benefit appears to be
minor and it won't make sense to do once the delayed privatization
approach is rolled out, I propose removing it.
The `reductionSymbols` and `reductionTypes` variables are set in certain
cases but never used. Unless there's a plan where these will be needed,
in which case it would be a better alternative to document it, I believe
we should also remove them.
This patch removes the `outerCombined` argument from `genTargetOp()` and
the `processReduction` argument from `genTargetClauses()`, as they
aren't used.
This PR is related to the following issue:
https://github.com/llvm/llvm-project/issues/63362
It tries to solve the crash (which is now slightly different, since the
issue has been languishing for a while sorry about that I missed the
original issue ping).
The crash occurs due to trying to access the symbol of an
undefined/unnamed main when trying to find a declare target symbol that
has not been specified (but can be assumed based on it's residence in a
function or interface).
The solution in this PR will check if we're trying to retrieve a main
symbol, and then if that is the case, we make sure it exists (due to
being named) before we attempt to retrieve it, this avoids the crash.
However, that's only part of the issue in the above example, the other
is the significant amount of nested directives, I think we are still a
little while away from handling this, I have added a reduced variation
of the test in the issue as a replicator which contains a lesser number
of nesting directives. To push the issue along further, it will likely
be a case of working through a number of variations of nested directives
in conjunction with target + parallel.
However, this PR pushes the issue above to the point where the issue
encountered is identical to the following:
https://github.com/llvm/llvm-project/issues/67231
This patch applies fixes after the updates to OpenMP clause operands, as
well as updating some tests that were impacted by changes to the
ordering or assembly format of some clauses in MLIR.
This PR attempts to fix common block mapping for regular mapping of
these types as well as when they have been marked as "declare target
link". This PR should allow correct mapping of both the members of a
common block and the full common block via its block symbol.
The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and
adjustments to the way the we handle target kernel map argument
rebinding inside of the OMPIRBuilder.
For the Fortran OpenMP lowering were two changes, one to prevent the
implicit capture of common block members when the common block symbol
itself has been marked and the other creates intermediate member access
inside of the target region to be used in-place of those external to the
target region, this prevents external usages breaking the
IsolatedFromAbove pact.
In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles renaming of declare target symbols in
the module to the reference pointer, now it will only apply to those
within the kernel that is currently being generated and we also perform
a modification to replace constants with instructions as necessary as we
cannot replace these with our reference pointer (non-constant and
constants do not mix nicely).
In the case of the OpenMPIRBuilder some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs bound.
Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.