Add FIR generation and LLVMIR translation support for mergeable clause
on task construct. If mergeable clause is present on a task, the
relevant flag in `ompt_task_flag_t` is set and passed to
`__kmpc_omp_task_alloc`.
This is one of 3 PRs in a PR stack that aims to add support for explicit
mapping of allocatable members in derived types.
The primary changes in this PR are the OpenMPToLLVMIRTranslation.cpp
changes, which are small and seek to alter the current member mapping to
add an additional map insertion for pointers. Effectively, if the member
is a pointer (currently indicated by having a varPtrPtr field) we add an
additional map for the pointer and then alter the subsequent mapping of
the member (the data) to utilise the member rather than the parents base
pointer. This appears to be necessary in certain cases when mapping
pointer data within record types to avoid segfaulting on device (due to
incorrect data mapping). In general this record type mapping may be
simplifiable in the future.
There are also additions of tests which should help to showcase the
affect of the changes above.
This patch primarily updates the MapInfoFinalization pass to utilise the
BlockArgument interface. It also shuffles newly added arguments the
MapInfoFinalization passes to the end of the BlockArg/Relevant MapInfo
lists, instead of one prior to the owning descriptor type.
During this it was noted that the use_device_ptr/addr handling of target
data was a little bit too order dependent so I've attempted to make it
less so, as we cannot depend on argument ordering to be the same as
Fortran for any future frontends.
Upstreaming some code cleanups ahead of supporting delayed task
execution.
- Make allocatePrivateVars not need to be a template (it will need to
operate separately on firstprivate and private variables for delayed
task execution so it can't index into lists of all variables in the
operation).
- Use llvm::SmallVectorImpl for function arguments
- collectPrivatizationDecls already reserves size for privateDecls so we
don't need to do that in callers
- Use llvm::zip_equal instead of C-style array indexing
This uses essentially an identical implementation to that used for
ParallelOp. The private variable allocation and deallocation use shared
functions to avoid code duplication. FIRSTPRIVATE variable copying uses
duplicated code for now because I anticipate the implementation
diverging in the near future once I store data for firstprivate
variables in the task description structure.
After enabling delayed privatisation for TASK in flang, one more test in
the fujitsu test suite passes (I haven't looked into why).
This patch improves not-yet-implemented error diagnostics to more
closely follow the format used by Flang lowering for the same kind of
errors. This helps keep some level of uniformity from a user
perspective.
When composite constructs are lowered, clauses for each leaf construct
are lowered before creating the set of loop wrapper operations, using
these outside values to populate their operand lists. Then, when the
loop nest associated to that composite construct is lowered, the binding
of Fortran symbols to the entry block arguments defined by these loop
wrappers is performed, resulting in the creation of `hlfir.declare`
operations in the entry block of the `omp.loop_nest`.
This approach prevents `hlfir.declare` operations related to the binding
and other operations resulting from the evaluation of the clauses from
being inserted between loop wrapper operations, which would be an
illegal MLIR representation. However, this introduces the problem of
entry block arguments defined by a wrapper that then should be used by
one of its nested wrappers, because the corresponding Fortran symbol
would still be mapped to an outside value at the time of gathering the
list of operands for the nested wrapper.
This patch adds operand re-mapping logic to update wrappers without
changing when clauses are evaluated or where the `hlfir.declare`
creation is performed.
This patch improves error reporting in the MLIR to LLVM IR translation
pass for the 'omp' dialect by emitting descriptive errors when
encountering clauses not yet supported by that pass.
Additionally, not-yet-implemented errors previously missing for some
clauses are added, to avoid silently ignoring them.
Error messages related to inlining of `omp.private` and
`omp.declare_reduction` regions have been updated to use the same
format.
This patch unifies the handling of errors passed through the
OpenMPIRBuilder and removes some redundant error messages through the
introduction of a custom `ErrorInfo` subclass.
Additionally, the current list of operations and clauses unsupported by
the MLIR to LLVM IR translation pass is added to a new Lit test to check
they are being reported to the user.
This patch updates the translation of `omp.wsloop` with a nested
`omp.simd` to prevent uses of block arguments defined by the latter from
triggering null pointer dereferences.
This happens because the inner `omp.simd` operation representing
composite `do simd` constructs is currently skipped and not translated,
but this results in block arguments defined by it not being mapped to an
LLVM value. The proposed solution is to map these block arguments to the
LLVM value associated to the corresponding operand, which is defined
above.
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.
This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
The existing conversion inlined private alloc regions and firstprivate
copy regions in mlir, then undoing the modification of the mlir module
before completing the conversion. To make this work, LLVM IR had to be
generated using the wrong mapping for privatised values and then later
fixed inside of OpenMPIRBuilder.
This approach violated an assumption in OpenMPIRBuilder that private
variables would be values not constants. Flang sometimes generates code
where private variables are promoted to globals, the address of which is
treated as a constant in LLVM IR. This caused the incorrect values for
the private variable from being replaced by OpenMPIRBuilder: ultimately
resulting in programs producing incorrect results.
This patch rewrites delayed privatisation for omp.parallel to work more
similarly to reductions: translating directly into LLVMIR with correct
mappings for private variables.
RFC:
https://discourse.llvm.org/t/rfc-openmp-fix-issue-in-mlir-to-llvmir-translation-for-delayed-privatisation/81225
Tested against the gfortran testsuite and our internal test suite.
Linaro's post-commit bots will check against the fujitsu test suite.
I decided to add the new tests as flang integration tests rather than in
mlir/test/Target/LLVMIR:
- The regression test is for an issue filed against flang. i wanted to
keep the reproducer similar to the code in the ticket.
- I found the "worst case" CFG test difficult to reason about in
abstract it helped me to think about what was going on in terms of a
Fortran program.
Fixes#106297
This patch enables lowering to MLIR of the reduction clause of `simd`
constructs. Lowering from MLIR to LLVM IR remains unimplemented, so at
that stage it will result in errors being emitted rather than silently
ignoring it as it is currently done.
On composite `do simd` constructs, this lowering error will remain
untriggered, as the `omp.simd` operation in that case is currently
ignored. The MLIR representation, however, will now contain `reduction`
information.
Adds MLIR to LLVM lowering support for `target ... nowait`. This
leverages the already existings code-gen patterns for `task` by treating
`target ... nowait` as `task ... if(1)` and `target` (without `nowait`)
as `task ... if(0)`; similar to what clang does.
This patch adds extra class declarations to the `omp.declare_reduction`
and `omp.private` operations to access the entry block arguments defined
by their regions. Some existing accesses to these arguments are updated
to use the new named methods to improve code readability.
This patch adds functionality to emit relevant libcalls in case
atomicrmw instruction can not be emitted (for instance, in case of
complex types). The IRBuilder is modified to directly emit __atomic_load
and __atomic_compare_exchange libcalls. The added functions follow a
similar codegen path as Clang, so that LLVM Flang generates almost
similar IR as Clang.
Fixes https://github.com/llvm/llvm-project/issues/83760 and
https://github.com/llvm/llvm-project/issues/75138
Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
This patch updates the `omp.target_data` operation to use the same
formatting as `map` clauses on `omp.target` for `use_device_addr` and
`use_device_ptr`. This is done so the mapping that is being enforced
between op arguments and associated entry block arguments is explicit.
The way it is achieved is by marking these clauses as entry block
argument-defining and adjusting printer/parsers accordingly.
As a result of this change, block arguments for `use_device_addr` come
before those for `use_device_ptr`, which is the opposite of the previous
undocumented situation. Some unit tests are updated based on this
change, in addition to those updated because of the format change.
This patch introduces a new MLIR interface for the OpenMP dialect aimed
at providing a uniform way of verifying and handling entry block
arguments defined by OpenMP clauses.
The approach consists in defining a set of overrideable methods that
return the number of block arguments the operation holds regarding each
of the clauses that may define them. These by default return 0, but they
are overriden by the corresponding clause through the
`extraClassDeclaration` mechanism.
Another set of interface methods to get the actual lists of block
arguments is defined, which is implemented based on the previously
described methods. These implicitly define a standardized ordering
between the list of block arguments associated to each clause, based on
the alphabetical ordering of their names. They should be the preferred
way of matching operation arguments and entry block arguments to that
operation's first region.
Some updates are made to the printing/parsing of `omp.parallel` to
follow the expected order between `private` and `reduction` clauses, as
well as the MLIR to LLVM IR translation pass to access block arguments
using the new interface. Unit tests of operations impacted by additional
verification checks and sorting of entry block arguments.
This patch adds support to translate the `private` clause on
`omp.target` ops from MLIR to LLVMIR. This first cut only handles
non-allocatables. Also, this is for delayed privatization.
This patch updates the use_device_ptr and use_device_addr clauses to use
the mapInfoOps for lowering. This allows all the types that are handle
by the map clauses such as derived types to also be supported by the
use_device_clauses.
This is patch 2/2 in a series of patches.
The intention of this change is to ensure that allocas end up in the
entry block not spread out amongst complex reduction variable
initialization code.
The tests we have are quite minimized for readability and
maintainability, making the benefits less obvious. The use case for this
is when there are multiple reduction variables each will multiple blocks
inside of the init region for that reduction.
2/3
Part 1: https://github.com/llvm/llvm-project/pull/102522
Part 3: https://github.com/llvm/llvm-project/pull/102525
Fixes#102935
Updates matching logic for finding the LLVM value that corresponds to an
MLIR value. We need that matching to find the delayed privatizer for an
LLVM value being privatized.
The issue occures when there is an "indirect" correspondence between
MLIR and LLVM values: in some cases the values we are trying to match
stem from a pair of load/store ops that point to the same memref. This
PR adds such matching logic.
This patch modifies MLIR to LLVM IR lowering of the OpenMP dialect to take into
consideration the contents of the `omp.target_triples` module attribute while
generating code for `omp.target` operations.
It adds the `OpenMPIRBuilderConfig::TargetTriples` field and initializes it
using the `amendOperation` flow of the `OpenMPToLLVMIRTranslation` pass. Some
changes are introduced into the `OpenMPIRBuilder` to allow passing the
information about whether a target region is intended to be offloaded from
outside.
The result of this change is that offloading calls are only generated when the
`--offload-arch` or `-fopenmp-targets` options are given to the compiler.
Otherwise, only the host fallback code is generated. This fixes linker errors
currently triggered by `flang-new` if a source file containing a `target`
construct is compiled without any of the aforementioned options.
Several unit tests impacted by these changes, which are intended to check host
code generated for `omp.target` operations, are updated to contain the new
attribute. Without it, no calls to `__tgt_target_kernel` and associated control
flow operations are generated.
Fixes#100209.
This patch adds the missing `OpenMP_Clause` definitions to all existing
`OpenMP_Op`s and updates their operand structure based builders to
initialize the new arguments.
The result of this change is that operation operand structures are now
based in the same list of clauses as their tablegen counterparts. This
means that all of the information needed is now in place to
automatically generate OpenMP operand structures from tablegen
defitions.
Since this change doesn't involve the introduction of actual support for
these clauses, new arguments are not initialized from values stored in
the corresponding operand structure fields but rather set to empty or
null. Those should be updated when support for these clauses on the
corresponding operation is added.
This patch introduces a new OpenMP clause definition not defined by the spec.
Its main purpose is to define the `loop_inclusive` (previously "inclusive",
renamed according to the parent of this PR in the stack) argument of
`omp.loop_nest` in such a way that a followup implementation of a tablegen
backend to automatically generate clause and operation operand structures
directly from `OpenMP_Op` and `OpenMP_Clause` definitions can properly generate
the `LoopNestOperands` structure.
`collapse` clause arguments are also moved into this new definition, as they
represent information on the loop nests being collapsed rather than the
`collapse` clause itself.
Currently, there are some inconsistencies to how clause arguments are
named in the OpenMP dialect. Additionally, the clause operand structures
associated to them also diverge in certain cases. The purpose of this
patch is to normalize argument names across all `OpenMP_Clause` tablegen
definitions and clause operand structures.
This has the benefit of providing more consistent representations for
clauses in the dialect, but the main short-term advantage is that it
enables the development of an OpenMP-specific tablegen backend to
automatically generate the clause operand structures without breaking
dependent code.
The main re-naming decisions made in this patch are the following:
- Variadic arguments (i.e. multiple values) have the "_vars" suffix.
This and other similar suffixes are removed from array attribute
arguments.
- Individual required or optional value arguments do not have any suffix
added to them (e.g. "val", "var", "expr", ...), except for `if` which
would otherwise result in an invalid C++ variable name.
- The associated clause's name is prepended to argument names that don't
already contain it as part of its name. This avoids future collisions
between arguments named the same way on different clauses and adding
both clauses to the same operation.
- Privatization and reduction related arguments that contain lists of
symbols pointing to privatizer/reducer operations use the "_syms"
suffix. This removes the inconsistencies between the names for
"copyprivate_funcs", "[in]reductions", "privatizers", etc.
- General improvements to names, replacement of camel case for snake
case everywhere, etc.
- Renaming of operation-associated operand structures to use the
"Operands" suffix in place of "ClauseOps", to better differentiate
between clause operand structures and operation operand structures.
- Fields on clause operand structures are sorted according to the
tablegen definition of the same clause.
The assembly format for a few arguments is updated to better reflect the
clause they are associated with:
- `chunk_size` -> `dist_schedule_chunk_size`
- `grain_size` -> `grainsize`
- `simd` -> `par_level_simd`
This patch handles dependencies specified by the `depend` clause on an
OpenMP target construct. It does this much the same way clang does it by
materializing an OpenMP `task` that is tagged with the dependencies.
The following functions are relevant to this patch -
1) `createTarget` - This function itself is largely unchanged except
that it now accepts a vector of `DependData` objects that it simply
forwards to `emitTargetCall`
2) `emitTargetCall` - This function has changed now to check if an outer
target-task needs to be materialized (i.e if `target` construct has
`nowait` or has `depend` clause). If yes, it calls `emitTargetTask` to
do all the heavy lifting for creating and dispatching the task.
3) `emitTargetTask` - Bulk of the change is here. See the large comment
explaining what it does at the beginning of this function
This shares code with WsloopOp (the changes to Wsloop should be NFC).
OpenMPIRBuilder basically implements SECTIONS as a wsloop over a case
statement with each SECTION as a case for a particular loopiv value.
Unfortunately it proved very difficult to share code between these and
ParallelOp. ParallelOp does quite a few things differently (doing more
work inside of the bodygen callback and laying out blocks differently).
Aligning reduction implementations for wsloop and parallel will probably
involve functional changes to both, so I won't attempt that in this
commit.
This patch adds support for lowering 'DO SIMD' constructs to MLIR. SIMD
information is now stored in an `omp.simd` loop wrapper, which is
currently ignored by the OpenMP dialect to LLVM IR translation stage.
The end result is that runtime behavior of compiled 'DO SIMD' constructs
does not change after this patch, so 'DO SIMD' still runs like 'DO'
(i.e. SIMD width = 1). However, all of the required information is now
present in the resulting MLIR representation.
To avoid confusion, the previous wsloop-simd.f90 lit test is renamed to
wsloop-schedule.f90 and a new wsloop-simd.f90 test is created to check
the addition of SIMD clauses to the `omp.simd` operation produced when a
'DO SIMD' construct is lowered to MLIR.
This patch updates `OpenMP_Op` definitions to be based on the new set of
`OpenMP_Clause` definitions, and to take advantage of clause-based
automatically-generated argument lists, descriptions, assembly format
and class declarations.
There are also changes introduced to the clause operands structures to
match the current set of tablegen clause definitions. These two are very
closely linked and should be kept in sync. It would probably be a good
idea to try generating clause operands structures from the tablegen
`OpenMP_Clause` definitions in the future.
As a result of this change, arguments for some operations have been
reordered. This patch also addresses this by updating affected operation
build calls and unit tests. Some other updates to tests related to the
order of arguments in the resulting assembly format and others due to
certain previous inconsistencies in the printing/parsing of clauses are
addressed.
The printer and parser functions for the `map` clause are updated, so
that they are able to handle `map` clauses linked to entry block
arguments as well as those which aren't.
This PR causes a build failure in the flang subproject. This is addressed
by the next PR in the stack.
This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches MLIR OpenMP translation would be making use of these functions.
Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>
This PR attempts to fix common block mapping for regular mapping of
these types as well as when they have been marked as "declare target
link". This PR should allow correct mapping of both the members of a
common block and the full common block via its block symbol.
The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and
adjustments to the way the we handle target kernel map argument
rebinding inside of the OMPIRBuilder.
For the Fortran OpenMP lowering were two changes, one to prevent the
implicit capture of common block members when the common block symbol
itself has been marked and the other creates intermediate member access
inside of the target region to be used in-place of those external to the
target region, this prevents external usages breaking the
IsolatedFromAbove pact.
In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles renaming of declare target symbols in
the module to the reference pointer, now it will only apply to those
within the kernel that is currently being generated and we also perform
a modification to replace constants with instructions as necessary as we
cannot replace these with our reference pointer (non-constant and
constants do not mix nicely).
In the case of the OpenMPIRBuilder some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs bound.
Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
Fix for Fujitsu test suite test: 0275_0032.f90. The MLIR to LLVM
translation logic assumed that reduction arguments to an `omp.parallel`
op are always the last set of arguments to the op. However, this is a
wrong assumption since private args come afterward.
Fixes a crash uncovered by test 0007_0019.f90 in the Fujitsu test suite.
Previously, in the `PrivCB`, we cloned the `omp.private` op without
inserting it in the parent module of the original op. This causes issues
whenever there is an op that needs to lookup the parent module (e.g. for
symbol lookup). This PR fixes the issue by cloning in the parent module
instead of creating an orphaned op.
It was incorrect to set the insertion point to the init block after
inlining the initialization region because the code generated in the
init block depends upon the value yielded from the init region. When
there were multiple reduction initialization regions each with multiple
blocks, this could lead to the initilization region being inlined after
the init block which depends upon it.
Moving the insertion point to before inlining the initialization block
turned up further issues around the handling of the terminator for the
initialization block, which are also fixed here.
This fixes a bug in #92430 (but the affected code couldn't compile
before #92430 anyway).
Some of the OpenMP code can change the instruction pointed at by the
insertion point. This leads to an assert in the compiler about
BB->getParent() and IP->getParent() not matching.
The fix is to rebuild the insertionpoint from the block, rather than use
builder.restoreIP.
Also, move some of the alloca generation, rather than skipping back and
forth between insert points (and ensure all the allocas are done before
their users are created).
A simple test, mainly to ensure the minimal reproducer doesn't fail to
compile in the future is also added.