Extend the `amendOperation` mechanism for translating dialect attributes
attached to operations from another dialect when translating MLIR to
LLVM IR. Previously, this mechanism would have no knowledge of the LLVM
IR instructions created for the given operation, making it impossible
for it to perform local modifications such as attaching operation-level
metadata. Collect instructions inserted by the LLVM IR builder and pass
them to `amendOperation`.
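As a rough illustration of the idea (not the actual MLIR/LLVM code), the sketch below records every instruction a builder inserts while one operation is translated and then hands that list to an amend hook. `ToyInstruction`, `ToyBuilder` and `translateOperation` are hypothetical stand-ins invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an LLVM IR instruction (hypothetical).
struct ToyInstruction {
  std::string opcode;
  std::vector<std::string> metadata;
};

// Toy builder that notifies an optional observer about each insertion.
class ToyBuilder {
public:
  using Observer = std::function<void(ToyInstruction *)>;

  void setObserver(Observer obs) { observer = std::move(obs); }

  ToyInstruction *insert(const std::string &opcode) {
    assert(!opcode.empty() && "instructions need an opcode");
    // std::deque keeps element addresses stable across push_back.
    instructions.push_back(ToyInstruction{opcode, {}});
    ToyInstruction *inst = &instructions.back();
    if (observer)
      observer(inst);
    return inst;
  }

  std::size_t size() const { return instructions.size(); }

private:
  std::deque<ToyInstruction> instructions;
  Observer observer;
};

using AmendFn = std::function<void(const std::vector<ToyInstruction *> &)>;

// Translates one "operation": collects the instructions created for it
// and passes exactly those to the amend hook.
std::vector<ToyInstruction *>
translateOperation(ToyBuilder &builder, const std::vector<std::string> &opcodes,
                   const AmendFn &amend) {
  std::vector<ToyInstruction *> created;
  builder.setObserver(
      [&created](ToyInstruction *inst) { created.push_back(inst); });
  for (const std::string &opcode : opcodes)
    builder.insert(opcode);
  builder.setObserver(nullptr); // stop collecting before the hook runs
  if (amend)
    amend(created);
  return created;
}
```

With this shape, the hook can attach per-instruction metadata without touching instructions that belong to other operations.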
This patch removes the val field from the `MapInfoOp`.
Previously, when lowering `TargetOp`, the bounds information for the
`BoxValues` was also being mapped. Instead, these ops are now cloned
inside the target region to prevent mapping of non-reference-typed
values.
This block of code was here to provide pseudo-handling of implicit
captures in target regions, preventing gfortran test regressions and
allowing certain pieces of code to function. With the introduction of
the IFA patch, which adds proper handling of implicit captures by adding
them to the map operands list alongside explicit mappings at the initial
Fortran -> MLIR generation phase, this should no longer be required and
may at worst cause adverse effects in the future.
Currently, when deleting the device functions in the second stage of filtering during MLIR to LLVM translation we can end up with invalid calls to these functions. This is because of the removal of the EarlyOutliningPass which would have otherwise gotten rid of any such calls.
This patch aims to alter the function filtering pass in the following way:
- Any host function is completely removed.
- Calls to the host function are also removed and their uses replaced with Undef values.
- Any host function with target region code is marked to be removed during the second stage.
- Calls to such functions are still removed and their uses replaced with Undef values.
Co-authored-by: Sergio Afonso <sergio.afonsofumero@amd.com>
This patch removes the OMPEarlyOutliningPass as it is no longer required. The implicit map operand capture has now been moved to the PFT lowering stage.
Depends on #67318.
This patch adds the MLIR translation changes required to add the IsolatedFromAbove and OutlineableOpenMPOpInterface traits to omp.target. It links the newly added block arguments to their corresponding LLVM values.
Depends on #67164.
This was a regression introduced by myself in:
6a62707c04
where I too hastily removed the basic handling of implicit captures
we have currently. This will be superseded by all implicit captures
being added to target operations map_info entries in a soon landing
series of patches, however, that is currently not the case so we must
continue to do some basic handling of these captures for the time
being. This patch re-adds that behaviour to avoid regressions.
Unfortunately this means some test changes as well as
getUsedValuesDefinedAbove grabs constants used outside
of the target region which aren't handled particularly
well currently.
This patch seeks to add initial lowering of OpenMP array sections within
target region map clauses from MLIR to LLVM IR.
This patch seeks to support fixed-size contiguous arrays initially (from
my reading, OpenMP does not support anything other than contiguous
sections, but I could be wrong), before looking toward assumed-size and
assumed-shape arrays. The patch does not currently include stride
support; it is left for future work.
That said, assumed size works in some fashion (dummy arguments) with
some minor alterations to the OMPEarlyOutliner, so it is possible that
changes made in the IsolatedFromAbove series may allow this to work with
no further patches required.
It utilises the generated omp.bounds to calculate the size of the mapped
OpenMP array (both for sectioned and un-sectioned arrays) as well as the
offset to be passed to the kernel argument structure.
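The arithmetic involved can be sketched as follows. This is only a toy model under stated assumptions: bounds are zero-based and inclusive, the section is contiguous, and the offset helper covers the one-dimensional case only. `ToyBound` and the function names are hypothetical and do not mirror the real lowering code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy bound for one dimension: zero-based, inclusive on both ends
// (an assumption made for this sketch).
struct ToyBound {
  std::int64_t lower;
  std::int64_t upper;
};

// Number of elements covered by a section: product of per-dimension extents.
std::int64_t sectionElementCount(const std::vector<ToyBound> &bounds) {
  std::int64_t count = 1;
  for (const ToyBound &b : bounds) {
    assert(b.upper >= b.lower && "reversed bound");
    count *= (b.upper - b.lower + 1);
  }
  return count;
}

// Size in bytes of the mapped section.
std::int64_t sectionSizeInBytes(const std::vector<ToyBound> &bounds,
                                std::int64_t elementSize) {
  return sectionElementCount(bounds) * elementSize;
}

// Byte offset of the first mapped element from the array base (1-D only).
std::int64_t sectionOffsetInBytes1D(const ToyBound &bound,
                                    std::int64_t elementSize) {
  return bound.lower * elementSize;
}
```

For a 4-byte element type and the section covering indices 2 through 5, this gives 4 elements, 16 bytes and an offset of 8 bytes.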
Alongside these changes some refactoring of how map data is handled is
attempted, using a new MapData structure to keep track of information
utilised in the lowering of mapped values.
The initial addition of a more complex createDeviceArgumentAccessor that
utilises capture kinds similarly to (and loosely based on) Clang to
generate different kernel argument accesses is also added.
A similar function for altering how the kernel argument is passed to the
kernel argument structure on the host is also utilised
(createAlteredByCaptureMap), which allows modification of the
pointer/basePointer based on their capture (and bounds information).
Of note, ByRef is the default for explicit mappings and ByCopy will be
the default for implicit captures, so the former is tested in this patch
and the latter is not for the moment.
Remove usage of getElementType in OpenMPTranslation to pave way for
switching to opaque pointers in MLIR and Flang. The approach chosen
stores the elementType in a new field in MapInfo called varType. A
similar approach was chosen for AtomicReadOp in
81767f52f4
This patch makes changes to the early outlining pass to avoid compiler
crashes due to not handling `hlfir.declare` operations correctly. That
pass is intended to eventually be removed (#67319), but in the meantime
this fixes some issues arising in different parts of the OpenMP
offloading compilation process.
The main changes included in this patch are the following:
- Added support for mapped values defined by an `hlfir.declare`
operation. These operations are now kept in outlined target functions,
so that both of their outputs (base and original base) are available to
the corresponding `omp.target`'s map arguments and region.
- Added a fix by @agozillon to prevent unused map clauses from producing
a compiler crash. All these unused mapped variables are added to the
outlined function's inputs.
- Added a fix to the OpenMP translation to MLIR to support integer
arguments to these outlined functions. This enables successfully
compiling and running the tests in
openmp/libomptarget/test/offloading/fortran using HLFIR.
Co-authored-by: agozillon <Andrew.Gozillon@amd.com>
This commit fixes a bug in the Mem2Reg operation erasure order.
Replacing the use-def based topological order with a dominance-based
weak order ensures that no operation is removed before all its uses have
been replaced. The order relation uses the topological order of blocks
and block internal ordering to determine a deterministic operation
order.
Additionally, the reliance on the `DenseMap` key order was eliminated by
switching to a `MapVector`, which gives a deterministic iteration order.
Example:
```
%ptr = alloca ...
...
%val0 = load %ptr ... // LOAD0
store %val0 %ptr ...
%val1 = load %ptr ... // LOAD1
```
When promoting the slot backing %ptr, LOAD0 could be cleaned up before
LOAD1. All uses of LOAD0 would then be replaced by its reaching
definition before LOAD1's result was replaced by LOAD0's result. The
subsequent erasure of LOAD0 could thus not succeed, as it still had
remaining uses.
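The failure mode can be reproduced in a small toy model. The sketch below is not the Mem2Reg implementation: `ToyLoad`, `comesBefore` and `cleanupSucceeds` are hypothetical, values are plain integer ids, and the assumption that dominated loads are visited first is made for this example only; the message above does not spell out the traversal direction.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy load: where it lives, the value id it defines, and the value id
// that reaches it (all hypothetical).
struct ToyLoad {
  int block;       // index of the block in topological order
  int position;    // position inside the block
  int result;      // value id produced by the load
  int reachingDef; // value id that should replace the result
};

// Deterministic order: block order first, then in-block position.
bool comesBefore(const ToyLoad &a, const ToyLoad &b) {
  if (a.block != b.block)
    return a.block < b.block;
  return a.position < b.position;
}

// Replaces every load result by its reaching definition in the chosen
// order, then reports whether all loads could be erased (no uses left).
bool cleanupSucceeds(std::vector<ToyLoad> loads, std::vector<int> uses,
                     bool dominatedFirst) {
  std::sort(loads.begin(), loads.end(), comesBefore);
  if (dominatedFirst)
    std::reverse(loads.begin(), loads.end());
  for (const ToyLoad &load : loads) {
    assert(load.result != load.reachingDef && "a load cannot reach itself");
    std::replace(uses.begin(), uses.end(), load.result, load.reachingDef);
  }
  for (const ToyLoad &load : loads)
    if (std::find(uses.begin(), uses.end(), load.result) != uses.end())
      return false; // erasing this load would leave dangling uses
  return true;
}
```

Modelling the example with LOAD0 defining value 1 (reaching definition 0) and LOAD1 defining value 2 (reaching definition 1), a single external use of value 2 survives cleanup only when LOAD1 is handled before LOAD0.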
This patch moves the existing copyInput function
into a lambda argument that can be defined
by a caller to the function.
This allows more flexibility in how the function
is defined, allowing Clang and MLIR to utilise
their own respective functions and types inside
of the lambda without affecting the OMPIRBuilder
itself.
The idea is to eventually replace/build on
the existing copyInput function that's used
and moved into OpenMPToLLVMIRTranslation.cpp
to a slightly more complex implementation
that uses MLIR's map information (primarily
ByRef and ByCapture information at the
moment).
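The callback-based design can be pictured with a small sketch. It uses strings in place of IR values, and `CopyInputFn` and `buildKernelArguments` are hypothetical names, not the OMPIRBuilder interface; the point is only that the shared routine stays fixed while each caller decides how an input is copied.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback: given an input, produce what the kernel receives.
using CopyInputFn = std::function<std::string(const std::string &input)>;

// Shared routine: walks the inputs and defers the copy strategy to the
// caller-provided callback.
std::vector<std::string>
buildKernelArguments(const std::vector<std::string> &inputs,
                     const CopyInputFn &copyInput) {
  assert(copyInput && "caller must supply a copy callback");
  std::vector<std::string> args;
  args.reserve(inputs.size());
  for (const std::string &input : inputs)
    args.push_back(copyInput(input));
  return args;
}
```

One caller could pass a lambda that forwards inputs by reference while another passes one that copies them, without the shared routine knowing the difference.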
The patch also moves kernel load stores to the top
of the kernel, prior to the first OpenMP runtime
invocation. This just makes the IR a little closer to Clang's.
At the moment, for device a reference pointer is generated in place of
the original declare target global value, this reference pointer is the
pointer that actually receives the data. In Clang the original global
value isn't generated for device, just the reference pointer.
Unfortunately for Flang/MLIR this is currently not the case, as the
declare target attribute is processed after the creation of the global
so we end up with a dead global on device effectively after rewriting
its uses to the new device reference pointer.
It appears I was a little overzealous with the deletion of the declare
target globals for device. The current method breaks in cases where the
same declare target global is used across two target regions (a runtime
reproducer is added in the patch), as it effectively deletes the global
before the second target gets a chance to be written to LLVM IR and have
its uses rewritten.
I'd like to remove this deletion, as the dead global isn't breaking any
code and will likely be removed in later dead code elimination passes;
the original approach was perhaps a little too heavy-handed.
This patch fixes two issues introduced by the D149368 patch. One is
a memory leak from using removeFromParent rather
than eraseFromParent (the erase also had to be moved to avoid creating
use-after-deletes).
The other is a possible iterator invalidation bug; better to be safe
than sorry.
This patch fixes:
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp:1525:3:
error: default label in switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp:1541:3:
error: default label in switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]
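For context, Clang emits this diagnostic when a switch already lists every enumerator and still carries a `default:` label. A common remedy, sketched below with a hypothetical `ToyCaptureKind` enum (not the enum from the file above), is to drop the label and handle the impossible case after the switch.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum used only for this illustration.
enum class ToyCaptureKind { ByRef, ByCopy };

std::string describe(ToyCaptureKind kind) {
  switch (kind) {
  case ToyCaptureKind::ByRef:
    return "by reference";
  case ToyCaptureKind::ByCopy:
    return "by copy";
  }
  // No `default:` label above: every enumerator is covered, so the compiler
  // can still warn if a new enumerator is added later and left unhandled.
  assert(false && "unknown capture kind");
  return "";
}
```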
This patch adds initial lowering for DeclareTargetAttr on
GlobalOps utilising registerTargetGlobalVariable
and getAddrOfDeclareTargetVar from the
OMPIRBuilder.
It also adds initial processing of declare target map
operands, populating the combinedInfo that the
OMPIRBuilder requires to generate kernels and
their kernel argument structure.
The combination of these additions allows simple mapping
of declare target globals to Target regions, as such a simple
runtime test showcasing this and testing it has been added.
The patch currently does not factor in filtering
based on device_type clauses (e.g. no emission of
globals for device if host specified); this will come in
a future iteration. For the moment it has only been
tested with 1-D arrays and basic Fortran data types;
more complex types (such as user defined derived
types from Fortran, allocatables or Fortran pointers)
may need further work.
reviewers: kiranchandramohan, skatrak
Differential Revision: https://reviews.llvm.org/D149368
This patch adjusts the lowering to LLVM-IR inside of
OpenMPToLLVMIRTranslation to facilitate the changes made
to Target related operations to add the new Map related
operations. It also includes adjustments to tests to support
these changes, primarily modifying the MLIR as opposed to
the LLVM-IR; the LLVM-IR should be identical after this patch.
Depends on D158735
Reviewers: kiranchandramohan, TIFitis, razvanlupusoru
Differential Revision: https://reviews.llvm.org/D158737
Default atomic ordering information is processed in the OpenMP dialect
to LLVM IR lowering stage at every spot where an operation can be
affected by it. The rest of the clauses are stored globally in the
OpenMPIRBuilderConfig object before starting that lowering stage, so
that the OMPIRBuilder can conditionally modify code generation
depending on these. At the end of the process, the omp.requires
attribute is itself lowered into a global constructor that passes these
clauses as flags to the OpenMP runtime.
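Passing clauses "as flags" amounts to packing them into a bitmask. The sketch below shows that packing in isolation; the flag names and bit positions are hypothetical values chosen for illustration and are not the constants the OpenMP runtime actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, for illustration only.
enum ToyRequiresFlag : std::uint32_t {
  ToyReverseOffload = 1u << 0,
  ToyUnifiedAddress = 1u << 1,
  ToyUnifiedSharedMemory = 1u << 2,
  ToyDynamicAllocators = 1u << 3
};

// Clauses gathered from a requires directive (toy representation).
struct ToyRequiresClauses {
  bool reverseOffload = false;
  bool unifiedAddress = false;
  bool unifiedSharedMemory = false;
  bool dynamicAllocators = false;
};

// Packs the clauses into a single word, as a constructor could pass it on.
std::uint32_t encodeRequiresFlags(const ToyRequiresClauses &clauses) {
  std::uint32_t flags = 0;
  if (clauses.reverseOffload)
    flags |= ToyReverseOffload;
  if (clauses.unifiedAddress)
    flags |= ToyUnifiedAddress;
  if (clauses.unifiedSharedMemory)
    flags |= ToyUnifiedSharedMemory;
  if (clauses.dynamicAllocators)
    flags |= ToyDynamicAllocators;
  return flags;
}

// Queries one concrete flag in an encoded word.
bool hasFlag(std::uint32_t flags, ToyRequiresFlag flag) {
  assert(flag != 0 && "query a concrete flag");
  return (flags & flag) != 0;
}
```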
Depends on D147217, D147218 and D158278.
Differential Revision: https://reviews.llvm.org/D147219
The OpenACC standard specifies an `atomic` construct in section 2.12 (of
3.3 spec), used to ensure that a specific location is accessed or
updated atomically. Four different clauses are allowed: `read`, `write`,
`update`, or `capture`. If no clause appears, it is as if `update` is
used.
The OpenMP specification defines the same clauses for `omp atomic`. The
types of expression and the clauses in the OpenACC spec match the OpenMP
spec exactly. The main difference is that the OpenMP specification is a
superset - it includes clauses for `hint` and `memory order`. It also
allows conditional expression statements. But otherwise, the expression
definition matches.
Thus, for OpenACC, we refactor and reuse the OpenMP implementation as
follows:
* The atomic operations are duplicated in OpenACC dialect. This is
preferable so that each language's semantics are precisely represented
even if specs have divergence.
* However, since semantics overlap, a common interface between the
atomic operations is being added. The semantics for the interfaces are
not generic enough to be used outside of OpenACC and OpenMP, and thus
new folders were added to hold common pieces of the two dialects.
* The atomic interfaces define common accessors (such as getting `x` or
`v`) which match the OpenMP and OpenACC specs. It also adds common
verifiers intended to be called by each dialect's operation verifier.
* The OpenMP write operation was updated to use `x` and `expr` to be
consistent with its other operations (that use naming based on spec).
The frontend lowering necessary to generate the dialect can also be
reused. This will be done in a follow up change.
This patch fixes a compiler crash that would happen during translation to LLVM
IR if the optional `map` argument of the `omp.target` operation was not
present. A unit test is added to ensure this has been fixed.
Differential Revision: https://reviews.llvm.org/D158722
This patch extends the existing WsLoop reduction IR generation to parallel blocks.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D155157
This patch adds code emission in emitTargetCall to call the OpenMP runtime to
launch a kernel, and to call the fallback host implementation if the launch
fails.
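The control flow of the emitted code can be modelled at runtime level as below. This is a sketch, not the IR that emitTargetCall produces: `runTargetRegion` is a hypothetical helper, and treating a zero status as success is an assumption of the example.

```cpp
#include <cassert>
#include <functional>

// Returns true when the kernel ran on the device, false when the host
// fallback had to be used instead.
bool runTargetRegion(const std::function<int()> &launchKernel,
                     const std::function<void()> &hostFallback) {
  assert(launchKernel && hostFallback && "both paths must be provided");
  const int status = launchKernel(); // assumption: zero means success
  if (status == 0)
    return true;
  hostFallback(); // the launch failed, so run the host version
  return false;
}
```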
Reviewed By: TIFitis, kiranchandramohan, jdoerfert
Differential Revision: https://reviews.llvm.org/D155633
This patch improves the implementation of a recent function filtering
workaround to address problems uncovered by D154247.
In particular, the problem was related to the removal of functions called from
within target regions. Since target regions have to remain until LLVM IR is
generated, removing these functions from MLIR results in undefined references
any time there are calls to them in a target region. This patch modifies the
MLIR function filtering pass to make these functions "external" rather than
removing them. This way, the processing and lowering of MLIR functions that
will eventually be discarded is still prevented, but no calls to undefined
functions remain either.
Additionally, the approach of just filtering host-only functions during device
compilation, and not filtering device-only functions during host compilation,
is maintained. This is because code generation for device-only functions is
required for host fallback to work.
Depends on D156988
Differential Revision: https://reviews.llvm.org/D155827
Added MLIR support for translating use_device_ptr and use_device_addr clauses to LLVM IR.
- use_device_ptr: The mapped variables marked with use_device_ptr are accessed through a copy of the base pointer mappers. The mapper is copied onto a new temporary pointer variable.
- use_device_addr: The mapped variables marked with use_device_addr are accessed directly through the base pointer mappers.
- If mapping information is not provided explicitly then default map_type of alloc/release is assumed and the map_size is set to 0.
Depends on D152554
Reviewed By: kiranchandramohan, raghavendhra
Differential Revision: https://reviews.llvm.org/D146648
This patch permits map operands to be not specified for the target
data operation. Also emit an error if none of the map, use_device_addr,
or use_device_ptr operands are specified.
Reviewed By: TIFitis
Differential Revision: https://reviews.llvm.org/D156170
This patch adds support for selecting which functions are lowered to LLVM IR
from MLIR depending on declare target information and whether host or device
code is being generated.
The approach proposed by this patch is to perform the filtering in two stages:
- An MLIR transformation pass, which is added to the Flang translation flow
after the `OMPEarlyOutliningPass`. The functions that are kept are those
that match the OpenMP processor (host or device) the compiler invocation
is targeting, according to the presence of the `-fopenmp-is-target-device`
compiler option and declare target information. All functions containing an
`omp.target` are also kept, regardless of the declare target information of
the function, due to the need for keeping target regions visible for both
host and device compilation.
- A filtering step during translation to LLVM IR, which is performed for those
functions that were kept because of the presence of a target region inside.
If the targeted OpenMP processor does not match the declare target
information of the function, then it is removed from the LLVM IR after its
contents have been processed and translated. Since they should only contain
an omp.target operation which, in turn, should have been outlined into
another LLVM IR function, the wrapper can be deleted at that point.
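The two stages can be summarised as a pair of predicates. The sketch below is a simplified model: `ToyDeclareTarget`, `ToyFunction` and the helper names are hypothetical, and treating functions without declare target information as host-only is an assumption of this example.

```cpp
#include <cassert>

// Toy declare target information attached to a function (hypothetical).
enum class ToyDeclareTarget { None, Host, NoHost, Any };

struct ToyFunction {
  ToyDeclareTarget declareTarget;
  bool containsTargetRegion;
};

// Does the function belong on the processor being compiled for?
bool matchesProcessor(ToyDeclareTarget kind, bool isTargetDevice) {
  switch (kind) {
  case ToyDeclareTarget::None: // assumption: undeclared means host-only
  case ToyDeclareTarget::Host:
    return !isTargetDevice;
  case ToyDeclareTarget::NoHost:
    return isTargetDevice;
  case ToyDeclareTarget::Any:
    return true;
  }
  assert(false && "unhandled declare target kind");
  return false;
}

// Stage 1 (MLIR pass): keep matching functions and anything that holds a
// target region.
bool keepInMLIR(const ToyFunction &fn, bool isTargetDevice) {
  return fn.containsTargetRegion ||
         matchesProcessor(fn.declareTarget, isTargetDevice);
}

// Stage 2 (during translation): drop wrappers that were kept only because
// of their target region.
bool removeAfterTranslation(const ToyFunction &fn, bool isTargetDevice) {
  return fn.containsTargetRegion &&
         !matchesProcessor(fn.declareTarget, isTargetDevice);
}
```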
Depends on D150328 and D150329.
Differential Revision: https://reviews.llvm.org/D147641
This patch implements an early outlining transform of omp.target operations in
flang. The pass is needed because optimizations may cross target op region
boundaries, but with the outlining the resulting functions only contain a
single omp.target op plus a func.return, so there should not be any opportunity
to optimize across region boundaries.
The patch also adds an interface to be able to store and retrieve the parent
function name of the original target operation. This is needed to be able to
create correct kernel function names when lowering to LLVM-IR.
Reviewed By: kiranchandramohan, domada
Differential Revision: https://reviews.llvm.org/D154879
Key changes:
- Refactor the createTargetData function to make use of the emitOffloadingArrays and emitOffloadingArraysArgument functions to generate code.
- Added a new emitIfClause helper function to allow handling if clauses in a similar fashion to Clang.
- Updated the MLIR side of code to account for changes to createTargetData.
Depends on D149872
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D146557
MLIR version attribute should be lowered to LLVM IR module metadata.
The lowering is done by OpenMPIRBuilder.
Differential Revision: https://reviews.llvm.org/D150574
Reviewed By: kiranchandramohan
This patch adds support in the `OpenMPIRBuilder` for generating working
device code for OpenMP target regions. It generates and handles the
result of a call to `__kmpc_target_init()` at the beginning of the
function resulting from outlining each target region, and it also
generates the matching `__kmpc_target_deinit()` call before returning.
It relies on the implementation of target region outlining for host
codegen to handle the production of the new function and the lowering of
its body based on the contents of the associated target region.
Depends on D147172
Differential Revision: https://reviews.llvm.org/D147940
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the methods to the
corresponding free function calls, which has been decided to be the more
consistent style.
Note that there still exist classes that only define the methods
directly, such as AffineExpr; this change does not include the work
needed to support free-function cast/isa calls for them.
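The difference between the two spellings is roughly `type.isa<T>()` versus `isa<T>(type)`. The sketch below mimics both on a toy class hierarchy; `ToyType` and the `toyIsa`/`toyCast`/`toyDynCast` helpers are hypothetical and only imitate the shape of the LLVM casting utilities.

```cpp
#include <cassert>

struct ToyType {
  enum class Kind { Integer, Float };
  explicit ToyType(Kind k) : kind(k) {}
  Kind kind;

  // Old style: casting exposed as a member template.
  template <typename T> bool isa() const { return T::classof(*this); }
};

struct ToyIntegerType : ToyType {
  explicit ToyIntegerType(unsigned w) : ToyType(Kind::Integer), width(w) {}
  static bool classof(const ToyType &t) { return t.kind == Kind::Integer; }
  unsigned width;
};

struct ToyFloatType : ToyType {
  ToyFloatType() : ToyType(Kind::Float) {}
  static bool classof(const ToyType &t) { return t.kind == Kind::Float; }
};

// New style: free functions that work uniformly on any class with classof.
template <typename T> bool toyIsa(const ToyType &t) { return T::classof(t); }

template <typename T> const T &toyCast(const ToyType &t) {
  assert(toyIsa<T>(t) && "invalid cast");
  return static_cast<const T &>(t);
}

template <typename T> const T *toyDynCast(const ToyType &t) {
  return toyIsa<T>(t) ? static_cast<const T *>(&t) : nullptr;
}
```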
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123