This pass will mark functions called from TargetOp's
and declare target functions as implicitly declare
target by adding the MLIR declare target attribute
directly to the function.
This pass executes after the initial lowering of Fortran's PFT
to MLIR (FIR/OMP+Arith etc.) and is one of a series of passes
that aim to clean up the MLIR for offloading (seperate passes
in different patches, one for early outlining, another for declare
target function filtering).
Reviewers: jsjodin, skatrak, kiaranchandramohan
Differential Revision: https://reviews.llvm.org/D154247
This patch adds support for selecting which functions are lowered to LLVM IR
from MLIR depending on declare target information and whether host or device
code is being generated.
The approach proposed by this patch is to perform the filtering in two stages:
- An MLIR transformation pass, which is added to the Flang translation flow
after the `OMPEarlyOutliningPass`. The functions that are kept are those
that match the OpenMP processor (host or device) the compiler invocation
is targeting, according to the presence of the `-fopenmp-is-target-device`
compiler option and declare target information. All functions contaning an
`omp.target` are also kept, regardless of the declare target information of
the function, due to the need for keeping target regions visible for both
host and device compilation.
- A filtering step during translation to LLVM IR, which is peformed for those
functions that were kept because of the presence of a target region inside.
If the targeted OpenMP processor does not match the declare target
information of the function, then it is removed from the LLVM IR after its
contents have been processed and translated. Since they should only contain
an omp.target operation which, in turn, should have been outlined into
another LLVM IR function, the wrapper can be deleted at that point.
Depends on D150328 and D150329.
Differential Revision: https://reviews.llvm.org/D147641
There was a bug with the -funderscoring / -fno-underscoring options from (https://reviews.llvm.org/D140795) that prevented the driver option from controlling the underscoring behaviour and instead the behaviour could only be controlled by the pass option instead of the driver option. The driver test case did not catch the bug and also needed to be updated.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D155042
This patch implements an early outlining transform of omp.target operations in
flang. The pass is needed because optimizations may cross target op region
boundaries, but with the outlining the resulting functions only contain a
single omp.target op plus a func.return, so there should not be any opportunity
to optimize across region boundaries.
The patch also adds an interface to be able to store and retrieve the parent
function name of the original target operation. This is needed to be able to
create correct kernel function names when lowering to LLVM-IR.
Reviewed By: kiranchandramohan, domada
Differential Revision: https://reviews.llvm.org/D154879
Currently the local builder used in IntrinsicCall doesn't have the
fastmath flags passed to it. This results in the fastmath attribute
not being added to certain runtime calls. This patch simply forwards
the fastmath flags from the parent builder.
Differential Revision: https://reviews.llvm.org/D154611
Previously only a constant reference was stored in the FirOpBuilder.
However, a lot of code was merged using
FirOpBuilder builder{rewriter, getKindMapping(mod)};
This is incorrect because the KindMapping returned will go out of scope
as soon as FirOpBuilder's constructor had run. This led to an infinite
loop running some tests using HLFIR (because the stack space containing
the kind mapping was re-used and corrupted).
One solution would have just been to fix the incorrect call sites,
however, as a large number of these had already made it past review, I
decided to instead change FirOpBuilder to store its own copy of the
KindMapping. This is not costly because nearly every time we construct a
KindMapping is exclusively to construct a FirOpBuilder. To make this
common pattern simpler, I added a new constructor to FirOpBuilder which
calls getKindMapping().
Differential Revision: https://reviews.llvm.org/D151881
The old fir.allocmem operation returned a !fir.heap<.> type. The new
fir.alloca operation returns a !fir.ref<.> type. This patch inserts a
fir.convert so that the old type is preserved. This prevents verifier
failures when types returned from fir.if statements don't match the
expected type.
Differential Revision: https://reviews.llvm.org/D151921
Despite me being convinced that the use of divide didn't produce any
divide instructions, it does in fact add more instructions than using
a plain shift operation.
This patch simply changes the divide to a shift right, with an
assert to check that the "divisor" is a power of two.
Reviewed By: kiranchandramohan, tblah
Differential Revision: https://reviews.llvm.org/D151880
In upstream mlir, the dialect conversion infrastructure is used for
lowering from one dialect to another: the passes are of the form
XToYPass. Whereas, transformations within the same dialect tend to use
applyPatternsAndFoldGreedily.
In this case, the full complexity of applyPatternsAndFoldGreedily isn't
needed so we can get away with the simpler applyOpPatternsAndFold.
This change was suggested by @jeanPerier
The old differential revision for this patch was
https://reviews.llvm.org/D150853
Re-applying here fixing the issue which led to the patch being reverted. The
issue was from erasing uses of the allocation operation while still iterating
over those uses (leading to a use-after-free). I have added a regression
test which catches this bug for -fsanitize=address builds, but it is
hard to reliably cause a crash from the use-after-free in normal builds.
Differential Revision: https://reviews.llvm.org/D151728
This patch makes more than 2D arrays work, with a fix for the way that
loop index is calculated. Removing the restriction of number of
dimensions.
This also changes the way that the actual index is calculated, such that
the stride is used rather than the extent of the previous dimension. Some
tests failed without fixing this - this was likely a latent bug in the
2D version too, but found in a test using 3D arrays, so wouldn't
have been found with 2D only. This introduces a division on the index
calculation - however it should be a nice and constant value allowing
a shift to be used to actually divide - or otherwise removed by using
other methods to calculate the result. In analysing code generated with
optimisation at -O3, there are no divides produced.
Some minor refactoring to avoid repeatedly asking for the "rank" of the
array being worked on.
This improves some of the SPEC-2017 ROMS code, in the same way as the
limited 2D array improvements - less overhead spent calculating array
indices in the inner-most loop and better use of vector-instructions.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D151140
In upstream mlir, the dialect conversion infrastructure is used for
lowering from one dialect to another: the passes are of the form
XToYPass. Whereas, transformations within the same dialect tend to use
applyPatternsAndFoldGreedily.
In this case, the full complexity of applyPatternsAndFoldGreedily isn't
needed so we can get away with the simpler applyOpPatternsAndFold.
This change was suggested by @jeanPerier
Differential Revision: https://reviews.llvm.org/D150853
The information needed for translation is now encoded in the dialect
operations and does not require a dedicated pass to be extracted.
Remove the obsolete passes that were performing operand legalization.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D150248
Remove old clause operands from acc.parallel operation since
the new dataOperands is now in place.
private, firstprivate and reductions will receive some redesign but are
not part of the new dataOperands.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D150207
Since the new data operand operations have been added in D148389 and
adopted on acc.data in D149673, the old clause operands are no longer
needed.
The LegalizeDataOpForLLVMTranslation will become obsolete when all
operations will be cleaned. For the time being only the appropriate
part are being removed.
processOperands will also receive some updates once all the operands
will be coming from an acc data operand operation.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D150155
Since the new data operand operations have been added in D148389 and
adopted on acc.exit_data in D149601, the old clause operands are no longer
needed.
The LegalizeDataOpForLLVMTranslation will become obsolete when all
operations will be cleaned. For the time being only the appropriate
part are being removed.
processOperands will also receive some updates once all the operands
will be coming from an acc data operand operation.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D150145
Since the new data operand operations have been added in D148389 and
adopted on acc.enter_data in D148721, the old clause operands are no longer
needed.
The LegalizeDataOpForLLVMTranslation will become obsolete when all
operations will be cleaned. For the time being only the appropriate
part are being removed.
processOperands will also receive some updates once all the operands
will be coming from an acc data operand operation.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D150132
Since the new data operand operations have been added in D148389 and
adopted on acc.update in D149909, the old clause operands are no longer
needed. This is a first patch to start cleaning the OpenACC operations
with data clause operands.
The `LegalizeDataOpForLLVMTranslation` will become obsolete when all
operations will be cleaned. For the time being only the appropriate
part are being removed.
`processOperands` will also receive some updates once all the operands
will be coming from an acc data operand operation.
Reviewed By: razvanlupusoru, jeanPerier
Differential Revision: https://reviews.llvm.org/D150053
When the default case requires block arguments, they have to be passed
through the cf.br - this piece was missing.
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D149484
The AbstractResult pass replaces allocation of function result on the callee
side per an extra argument so that the allocation of the result can be
done on the caller stack.
It looks for the result allocation from the fir.return op, so it needs
to handle (in a transparent way) a fir.declare in the chain between the
allocation and the fir.return.
Reviewed By: vzakhari, clementval
Differential Revision: https://reviews.llvm.org/D149057
Introduce conditional code to identify stride of "one element", and simplify the array accesses for that case.
This allows better loop performance in various benchmarks.
Reviewed By: tblah, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D141306
Similar to D148039 but for the FIR to LLVM IR
conversion pass.
The inner part of the acc.loop has been removed since the rest of the
pipeline is not ready and would raise an error here. This was passing
until now because the acc.loop was discarded completely.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D148057
After the extraction of the TypeConverter, move the header files
to the include dir so the shared library build is fine.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D147979
Previously the mask would be loaded as the appropriate integer type and cast to I1 to pass to
fir.if, however this truncates the integer and so would cast 6 to 0. By loading values as logicals
and casting to I1 this problem is avoided.
Reviewed By: Leporacanthicus
Differential Revision: https://reviews.llvm.org/D144974
Previously COUNT would cast the mask input to logical<4> before passing it
to the runtime function, this has been changed to allow different types of logical.
Reviewed By: tblah
Differential Revision: https://reviews.llvm.org/D144867
This patch adds minloc to the simplify intrinsics pass, supporting calls with KIND or MASK arguments while calls which have BACK, DIM or have a CHARACTER input array are rejected. This patch is targeting exchange2, and in benchmarks provides a ~11% improvement in performance.
Also included are some minor style changes / cleanup in simplifyIntrinsics.cpp.
Reviewed By: vzakhari
Differential Revision: https://reviews.llvm.org/D144103
Some functions (e.g. the main function) end with a call to the STOP
statement instead of a func.return. This is lowered as a call to the
stop runtime function followed by a fir.unreachable. fir.unreachable is
a terminator and so this can cause functions to have no func.return.
The stack arrays pass looks to see which heap allocations have always
been freed by the time a function returns. Without any returns, the pass
does not detect any freed allocations. This patch changes this behaviour
so that fir.unreachable is checked as well as func.return.
This allows 15 heap allocations for array temporaries in spec2017
exchange2's main function to be moved to the stack.
Differential Revision: https://reviews.llvm.org/D143918
When rank > 1, the inital value would be lost on inner loops, leading to the wrong
value to be returned, e.g. This would return T. This patch fixes this to use the correct
inital value for all cases.
```
Integer :: m(0,10)
Any(m .eq 0)
```
Reviewed By: vdonaldson
Differential Revision: https://reviews.llvm.org/D143899
This patch provides a simplified version of the Any intrinsic as well as the All intrinsic
that can be used for inlining or simpiler use cases. These changes are targeting exchange2, and
provide a ~9% performance increase.
Reviewed By: Leporacanthicus, vzakhari
Differential Revision: https://reviews.llvm.org/D142977
This pass implements the `-fstack-arrays` flag. See the RFC in
`flang/docs/fstack-arrays.md` for more information.
Differential revision: https://reviews.llvm.org/D140415
This fixes an issue where the symbols for operations that were not directly
handled by the rewriting in ExternalNameConversion.cpp were not updated
accurately when a FuncOp symbol was modified. Resulting in a name
mismatch between the FuncOp and the operation holding a symbol to
the FuncOp.
This fix works by updating all of the symbols relating to a FuncOp in a
module, this did not show up as an issue previously as fir::CallOps were
getting specific handling and only fir::CallOps were being tested. So
as the more larger case is now being handled the specific handling for
fir::CallOps has been removed (but is still handled by the fix).
Reviewers:
clementval
Differential Revision: https://reviews.llvm.org/D142918