Commit Graph

79 Commits

Author SHA1 Message Date
mlevesquedion
ddc9892999 Add operands to worklist when only used by deleted op (#86990)
I believe the existing check to determine if an operand should be added
is incorrect: `operand.use_empty() || operand.hasOneUse()`. This is
because these checks do not take into account the fact that the op is
being deleted. It hasn't been deleted yet, so `operand.use_empty()`
cannot be true, and `operand.hasOneUse()` may be true if the op being
deleted is the only user of the operand and it only uses it once, but it
will fail if the operand is used more than once (e.g. something like
`add %0, %0`).

Instead, check if the op being deleted is the only _user_ of the
operand. If so, add the operand to the worklist.

Fixes #86765
2024-03-29 21:38:41 +01:00
Sergio Afonso
d84252e064 [MLIR][OpenMP] NFC: Uniformize OpenMP ops names (#85393)
This patch proposes the renaming of certain OpenMP dialect operations with the
goal of improving readability and following a uniform naming convention for
MLIR operations and associated classes. In particular, the following operations
are renamed:

- `omp.map_info` -> `omp.map.info`
- `omp.target_update_data` -> `omp.target_update`
- `omp.ordered_region` -> `omp.ordered.region`
- `omp.cancellationpoint` -> `omp.cancellation_point`
- `omp.bounds` -> `omp.map.bounds`
- `omp.reduction.declare` -> `omp.declare_reduction`

Also, the following MLIR operation classes have been renamed:

- `omp::TaskLoopOp` -> `omp::TaskloopOp`
- `omp::TaskGroupOp` -> `omp::TaskgroupOp`
- `omp::DataBoundsOp` -> `omp::MapBoundsOp`
- `omp::DataOp` -> `omp::TargetDataOp`
- `omp::EnterDataOp` -> `omp::TargetEnterDataOp`
- `omp::ExitDataOp` -> `omp::TargetExitDataOp`
- `omp::UpdateDataOp` -> `omp::TargetUpdateOp`
- `omp::ReductionDeclareOp` -> `omp::DeclareReductionOp`
- `omp::WsLoopOp` -> `omp::WsloopOp`
2024-03-20 11:19:38 +00:00
Tom Eccles
1f1e0948f2 [flang] run CFG conversion on omp reduction declare ops (#84953)
Most FIR passes only look for FIR operations inside of functions (either
because they run only on func.func or they run on the module but iterate
over functions internally). But there can also be FIR operations inside
of fir.global, some OpenMP and OpenACC container operations.

This has worked so far for fir.global and OpenMP reductions because they
only contained very simple FIR code which doesn't need most passes to be
lowered into LLVM IR. I am not sure how OpenACC works.

In the long run, I hope to see a more systematic approach to making sure
that every pass runs on all of these container operations. I will write
an RFC for this soon.

In the meantime, this pass duplicates the CFG conversion pass to also
run on omp reduction operations. This is similar to how the
AbstractResult pass is already duplicated for fir.global operations.

OpenMP array reductions 2/6
Previous PR: https://github.com/llvm/llvm-project/pull/84952
Next PR: https://github.com/llvm/llvm-project/pull/84954

---------

Co-authored-by: Mats Petersson <mats.petersson@arm.com>
2024-03-20 09:47:49 +00:00
David Green
2a95fe481d [Flang] Allow Intrinsic simpification with min/maxloc dim and scalar result (#81619)
This makes an adjustment to the existing fir minloc/maxloc generation
code to handle functions with a dim=1 that produce a scalar result. This
should allow us to get the same benefits as the existing generated
minmax reductions.

This is a recommit of #76194 with an extra alteration to the end of
genRuntimeMinMaxlocBody to make sure we convert the output array to the
correct type (a `box<heap<i32>>`, not `box<heap<array<1xi32>>>`) to
prevent writing the wrong type of box into it. This still allocates the
data as a `array<1xi32>`, converting it into a i32 assuming that is
safe. An alternative would be to allocate the data as a i32 and change
more of the accesses to it throughout genRuntimeMinMaxlocBody.
2024-03-02 14:39:59 +00:00
David Green
7242896233 [Flang] Attempt to fix Nan handling in Minloc/Maxloc intrinsic simplification (#82313)
In certain case "extreme" values like Nan, Inf and 0xffffffff could lead
to generating different code via the inline-generated intrinsics vs the
versions in the runtimes (and other compilers like gfortran). There are
some examples I was using for testing in
https://godbolt.org/z/x4EfqEss5.

This changes the generation for the intrinsics to be more like the
runtimes, using a condition that is similar to:
  isFirst || (prev != prev && elem == elem) || elem < prev
The middle part is only used for floating point operations, and checks
if the values are Nan. This should then hopefully make the logic closer
to - return the first element with the lowest value, with Nans ignored
unless there are only Nans. The initial limit value for floats are also
changed from the largest float to Inf, to make sure it is handled
correctly.

The integer reductions are also changed to use a similar scheme to make
sure they work with masked values. This means that the preamble after
the loop can be removed.
2024-02-21 09:31:29 +00:00
agozillon
95fe47ca7e [Flang][OpenMP] Initial mapping of Fortran pointers and allocatables for target devices (#71766)
This patch seeks to add an initial lowering for pointers and allocatable variables 
captured by implicit and explicit map in Flang OpenMP for Target operations that 
take map clauses e.g. Target, Target Update. Target Exit/Enter etc.

Currently this is done by treating the type that lowers to a descriptor 
(allocatable/pointer/assumed shape) as a map of a record type (e.g. a structure) as that's
effectively what descriptor types lower to in LLVM-IR and what they're represented as
in the Fortran runtime (written in C/C++). The descriptor effectively lowers to a structure
containing scalar and array elements that represent various aspects of the underlying
data being mapped (lower bound, upper bound, extent being the main ones of interest
in most cases) and a pointer to the allocated data. In this current iteration of the mapping
we map the structure in it's entirety and then attach the underlying data pointer and map
the data to the device, this allows most of the required data to be resident on the device
for use. Currently we do not support the addendum (another block of pointer data), but
it shouldn't be too difficult to extend this to support it.

The MapInfoOp generation for descriptor types is primarily handled in an optimization
pass, where it expands BoxType (descriptor types) map captures into two maps, one for
the structure (scalar elements) and the other for the pointer data (base address) and
links them in a Parent <-> Child relationship. The later lowering processes will then treat
them as a conjoined structure with a pointer member map.
2024-02-05 18:45:07 +01:00
Slava Zakharin
3e47e75feb [flang] Use DataLayout for computing type size in LoopVersioning. (#79778)
The existing type size computation in LoopVersioning does not work
for REAL*10, because the compute element size is 10 bytes,
which violates the power-of-two assertion.
We'd better use the DataLayout for computing the storage size
of each element of an array of the given type.
2024-01-29 09:14:47 -08:00
David Green
223d3dabc8 [Flang] Minloc elemental intrinsic lowering (#74828)
Currently the lowering of a minloc intrinsic with a mask will look something
like:
  %e = hlfir.elemental %shape ({
    ...
  })
  %m = hlfir.minloc %array mask %e
  hlfir.assign %m to %result
  hlfir.destroy %m
The elemental will be expanded into a temporary+loop, the minloc into a
FortranAMinloc call (which hopefully gets simplified to a specialized call that
can be inlined at the call site), and the assign might get expanded to a
FortranAAssign. It would be better to generate the entire construct as single
loop if we can - one that performs the minloc calculation with the mask
elemental computed inline.

This patch attempt to do that, adding a hlfir version of the expansion code
from SimplifyIntrinsics that turns an minloc+elemental into a single combined
loop nest. It attempts to reuse the methods in genMinlocReductionLoop for
constructing the loop with a modified loop body. The declaration for the
function is currently in Optimizer/Support/Utils.h, but there might be a better
place for it.

It is added as part of the OptimizedBufferizationPass, like the
similar count/any/all that have been added recently.
2024-01-25 12:17:12 +00:00
David Green
49212d1601 [Flang] Fix for replacing loop uses in LoopVersioning pass (#77899)
The added test case has a loop that is versioned, which has a use of the
loop in an if block after the loop. The current code replaces all uses
of the loop with the new version If, but only if the parent blocks
match. As far as I can see it should be safe to replace all the uses,
then construct the result for the If with op.op.
2024-01-20 22:16:05 +00:00
Christian Ulmann
fa5255eee2 [MLIR][LLVM] Enable export of DISubprograms on function declarations (#78026)
This commit changes the MLIR to LLVMIR export to also attach subprogram
debug attachements to function declarations.
This commit additonally fixes the two passes that produce subprograms to
not attach the "Definition" flag to function declarations. This
otherwise results in invalid LLVM IR.
2024-01-15 07:34:13 +01:00
Christian Ulmann
bae1fdea71 [MLIR][LLVM] Add distinct identifier to the DISubprogram attribute (#77093)
This commit adds an optional distinct attribute parameter to the
DISubprogramAttr. This enables modeling of distinct subprograms, as
required for LLVM IR. This change is required to avoid accidential
uniquing of subprograms on functions that would lead to invalid LLVM IR
post export.
2024-01-08 08:25:30 +01:00
Christian Ulmann
b3037ae1fc [MLIR][LLVM] Add distinct identifier to DICompileUnit attribute (#77070)
This commit adds a distinct attribute parameter to the DICompileUnit to
enable the modeling of distinctness. LLVM requires DICompileUnits to be
distinct and there are cases where one gets two equivalent compilation
units but LLVM still requires differentiates them. We observed such
cases for combinations of LTO and inline functions.

This patch also changes the DIScopeForLLVMFuncOp pass to a module pass,
to ensure that only one distinct DICompileUnit is created, instead of
one for each function.
2024-01-08 07:42:33 +01:00
Pete Steinfeld
4f59a38821 Revert #76194 (#76987)
[Flang] Revert "Allow Intrinsic simpification with min/maxloc dim
and…scalar result (#76194)"

This reverts commit 9b7cf5bfb0.

See merge request #76194.

This change was causing several failures in our internal tests. I'm
reverting now and will work on creating a test that David Green can use
to reproduce the problem.
2024-01-04 10:19:50 -08:00
David Green
9b7cf5bfb0 [Flang] Allow Intrinsic simpification with min/maxloc dim and scalar result (#76194)
This makes an adjustment to the existing fir minloc/maxloc generation
code to handle functions with a dim=1 that produce a scalar result. This
should allow us to get the same benefits as the existing generated
minmax reductions.

This is a recommit of #75820 with the typename added to the generated
function.
2024-01-02 11:09:18 +00:00
Pete Steinfeld
0cf3af0c51 Revert "[Flang] Allow Intrinsic simpification with min/maxloc dim and… (#76184)
… scalar result. (#75820)"

This reverts commit 701f647905.

The commit breaks some uses of the 'maxloc' intrinsic.

See PR #75820
2023-12-21 13:14:05 -08:00
David Green
701f647905 [Flang] Allow Intrinsic simpification with min/maxloc dim and scalar result. (#75820)
This makes an adjustment to the existing fir minloc/maxloc generation
code to handle functions with a dim=1 that produce a scalar result. This
should allow us to get the same benefits as the existing generated
minmax reductions.
2023-12-20 12:12:12 +00:00
David Green
9bb47f7f8b [Flang] Add Maxloc to fir simplify intrinsics pass (#75463)
This takes the code from D144103 and extends it to maxloc, to allow the
simplifyMinMaxlocReduction method to work with both min and max
intrinsics by switching condition and limit/initial value.
2023-12-18 07:59:51 +00:00
Tom Eccles
fcd06d774d [mlir][flang] add fast math attribute to fcmp (#74315)
`llvm.fcmp` does support fast math attributes therefore so should
`arith.cmpf`.

The heavy churn in flang tests are because flang sets
`fastmath<contract>` by default on all operations that support the fast
math interface. Downstream users of MLIR should not be so effected.

This was requested in https://github.com/llvm/llvm-project/issues/74263
2023-12-06 10:19:48 +00:00
Akash Banerjee
8701b178e0 [MLIR][OpenMP] Changes to function-filtering pass (#71850)
Currently, when deleting the device functions in the second stage of filtering during MLIR to LLVM translation we can end up with invalid calls to these functions. This is because of the removal of the EarlyOutliningPass which would have otherwise gotten rid of any such calls.

This patch aims to alter the function filtering pass in the following way:
	- Any host function is completely removed.
	- Call to the host function are also removed and their uses replaced with Undef values.
	- Any host function with target region code is marked to be removed during the the second stage.
	- Calls to such functions are still removed and their uses replaced with Undef values.

Co-authored-by: Sergio Afonso <sergio.afonsofumero@amd.com>
2023-11-14 12:43:31 +00:00
jeanPerier
f35f863a88 [flang][NFC] Use hlfir=false and flang-deprecated-no-hlfir in legacy tests (#71957)
Patch 2/3 of the transition step 1 described in

https://discourse.llvm.org/t/rfc-enabling-the-hlfir-lowering-by-default/72778/7.

All the modified tests are still here since coverage for the direct
lowering to FIR was still needed while it was default. Some already have
an HLFIR version, some have not and will need to be ported in step 2
described in the RFC.

Note that another 147 lit tests use -emit-fir/-emit-llvm outputs but do
not need a flag since the HLFIR/no HLFIR output is the same for what is
being tested.
2023-11-13 09:14:05 +01:00
Fabian Mora
fd389f46de [flang] Change uniqueCGIdent separator from . to X (#71338)
Change the separator in the `uniqueCGIdent` method to `X`. This change
is required to enable OpenMP offloading for the NVPTX target, as dots
are not valid identifiers in PTX and `uniqueCGIdent` is used to mangle
some literals. Follow up patches will change the remainder of `.`
appearances in names to `X` and add support for the NVPTX target.
2023-11-08 15:04:00 -05:00
Tom Eccles
6242c8ca18 [flang] add TBAA tags to global and direct variables
These turn out to be useful for spec2017/fotonik3d and safe so long as
they are not used along side TBAA tags for local allocations. LLVM may
be able to figure out local allocations by itself anyway.

PR #68727
2023-10-25 10:47:51 +00:00
Mats Petersson
8dcee5800c [flang]Check for dominance in loop versioning (#68797)
This avoids trying to version loops that can't be versioned, and thus
avoids hitting an assert.

Co-authored with Slava Zakharin (who provided the test-code).
2023-10-12 13:07:16 +01:00
Tom Eccles
df5c27869c [flang][FIR] add FIR TBAA pass
See RFC at
https://discourse.llvm.org/t/rfc-propagate-fir-alias-analysis-information-using-tbaa/73755

This pass adds TBAA tags to all accesses to non-pointer/target dummy
arguments. These TBAA tags tell LLVM that these accesses cannot alias:
allowing better dead code elimination, hoisting out of loops, and
vectorization.

Each function has its own TBAA tree so that accesses between funtions
MayAlias after inlining.

I also included code for adding tags for local allocations and for
global variables. Enabling all three kinds of tag is known to produce a
miscompile and so these are disabled by default. But it isn't much
code and I thought it could be interesting to play with these later if
one is looking at a benchmark which looks like it would benefit from
more alias information. I'm open to removing this code too.

TBAA tags are also added separately by TBAABuilder during CodeGen.
TBAABuilder has to run during CodeGen because it adds tags to box
accesses, many of which are implicit in FIR. This pass cannot (easily)
run in CodeGen because fir::AliasAnalysis has difficulty tracing values
between blocks, and by the time CodeGen runs, structured control flow
has already been lowered.

Coming in follow up patches
  - Change CodeGen/TBAABuilder to use TBAAForest to add tags within the
    same per-function trees as are used here (delayed to a later patch
    to make it easier to revert)
  - Command line argument processing to actually enable the pass
2023-10-11 14:29:47 +00:00
Slava Zakharin
7beb65ae2d [flang] Fixed LoopVersioning for array slices. (#65703)
The first test case added in the LIT test demonstrates the problem.
Even though we did not consider the inner loop as a candidate for
the transformation due to the array_coor with a slice, we decided to
version the outer loop for the same function argument.
During the cloning of the outer loop we dropped the slicing completely
producing invalid code.

I restructured the code so that we record all arg uses that cannot be
transformed (regardless of the reason), and then fixup the usage
information across the loop nests. I also noticed that we may generate
redundant contiguity checks for the inner loops, so I fixed it
since it was easy with the new way of keeping the usage data.
2023-09-08 09:01:10 -07:00
Tom Eccles
ad9af7de90 [flang][LoopVersioning] support fir.array_coor
This is the last piece required for the loop versioning patch to work on
code lowered via HLFIR. With this patch, HLFIR performance on spec2017
roms is now similar to the FIR lowering.

Adding support for fir.array_coor means that many more loops will be
versioned, even in the FIR lowering. So far as I have seen, these do not
seem to have an impact on performance for the benchmarks I tried, but I
expect it would speed up some programs, if the loop being versioned
happened to be the hot code.

The main difference between fir.array_coor and fir.coordinate_of is
that fir.coordinate_of uses zero-based indices, whereas fir.array_coor
uses the indices as specified in the Fortran program (starting from 1 by
default, but also supporting non default lower bounds). I opted to
transform fir.array_coor operations into fir.coordinate_of operations
because this allows both to share the same offset calculation logic.

The tricky bit of this patch is getting the correct lower bounds for the
array operand to subtract from the fir.array_coor indices to get a
zero-based indices. So far as I can tell, the FIR lowering will always
provide lower bounds (shift) information in the shape operand to the
fir.array_coor when non-default lower bounds are used. If none is given,
I originally tried falling back to reading lower bounds from the box,
but this led to misscompilation in SPEC2017 cam4. Therefore the pass
instead assumes that if it can't already find an SSA value for the shift
information, the default lower bound (1) should be used.

A suspect the incorrect lower bounds in the box for the FIR lowering was
already a known issue (see https://reviews.llvm.org/D158119).

Differential Revision: https://reviews.llvm.org/D158597
2023-09-04 10:40:40 +00:00
Slava Zakharin
cccf4d6e4a [flang] Skip OPTIONAL arguments in LoopVersioning.
This patch fixes multiple tests failing with segfault due to accessing
absent argument box before the loop versioning check.
The absent arguments might be treated as contiguous for the purpose
of loop versioning, but this is not done in this patch.

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D158800
2023-08-25 08:33:49 -07:00
Tom Eccles
8d24b7322e [flang][LoopVersioning] support reboxed operands
Since https://reviews.llvm.org/D158119, many boxes lowered via HLFIR are
reboxed with better lower bounds information after they are declared.

For the loop versioning pass to support FIR lowered via HLFIR, it needs
to dereference fir.rebox operations to figure out that the variable was
a function argument.

I decided to modify the existing dereferencing of fir.declare so that
the declared/reboxed value is used in the versioned loop instead of the
function argument. This makes it easier for the improved lower bounds
information to be accessed. In doing this, I changed ArgInfo to store
ArgInfo::arg by value instead of by pointer because mlir::Value has
value-type semantics.

Differential Revision: https://reviews.llvm.org/D158408
2023-08-23 09:53:05 +00:00
Slava Zakharin
89b98c13e0 [flang] Fixed simplification for FP maxval.
On x86, a simplified F128 maxval ends up calling fmaxl that does not
work properly for F128 arguments. It is probably an LLVM issue, but
we also should not use arith.maxf if NaN or -0.0 operands are possible.
The change is to use cmpf and select. Unfortunately, these arith ops
do not support FastMathFlags currently, so I will have to fix this
sooner or later (depending on how this affects performance).

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D158200
2023-08-21 19:33:56 -07:00
Tom Eccles
05011024fd [flang][LoopVersioning] support fir.declare
When FIR comes from HLFIR, there will be a fir.declare operation between
the source and the usage of each source variable (and some temporary
allocations). This pass needs to be able to follow these so that it can
still transform loops when HLFIR is used, otherwise it mistakenly
assumes these values are not function arguments.

More work is needed after this patch to fully support HLFIR, because the
generated code tends to use fir.array_coor instead of fir.coordinate_of.

Differential Revision: https://reviews.llvm.org/D157964
2023-08-18 09:51:22 +00:00
Sergio Afonso
f20b67a81c [Flang][MLIR][OpenMP] Improve device-only function filtering
This patch improves the implementation of a recent function filtering
workaround to address problems uncovered by D154247.

In particular, the problem was related to the removal of functions called from
within target regions. Since target regions have to remain until LLVM IR is
generated, removing these functions from MLIR results in undefined references
any time there are calls to them in a target region. This patch modifies the
MLIR function filtering pass to make these functions "external" rather than
removing them. This way, the processing and lowering of MLIR functions that
will eventually be discarded is still prevented, but no calls to undefined
functions remain either.

Additionally, the approach of just filtering host-only functions during device
compilation, and not filtering device-only functions during host compilation,
is maintained. This is because code generation for device-only functions is
required for host fallback to work.

Depends on D156988

Differential Revision: https://reviews.llvm.org/D155827
2023-08-10 11:29:45 +01:00
Matt Arsenault
be7b385f36 flang: Update stacksave/stackrestore intrinsic uses
Suboptimal fix after 25bc999d1f. Ideally
this would go through a builder and use the proper alloca type instead
of using hardcoded mangled names.
2023-08-09 19:35:06 -04:00
Sergio Afonso
debdfc0ae2 [Flang][OpenMP][MLIR] Filter emitted code depending on declare target and device
This patch adds support for selecting which functions are lowered to LLVM IR
from MLIR depending on declare target information and whether host or device
code is being generated.

The approach proposed by this patch is to perform the filtering in two stages:
  - An MLIR transformation pass, which is added to the Flang translation flow
    after the `OMPEarlyOutliningPass`. The functions that are kept are those
    that match the OpenMP processor (host or device) the compiler invocation
    is targeting, according to the presence of the `-fopenmp-is-target-device`
    compiler option and declare target information. All functions contaning an
    `omp.target` are also kept, regardless of the declare target information of
    the function, due to the need for keeping target regions visible for both
    host and device compilation.
  - A filtering step during translation to LLVM IR, which is peformed for those
    functions that were kept because of the presence of a target region inside.
    If the targeted OpenMP processor does not match the declare target
    information of the function, then it is removed from the LLVM IR after its
    contents have been processed and translated. Since they should only contain
    an omp.target operation which, in turn, should have been outlined into
    another LLVM IR function, the wrapper can be deleted at that point.

Depends on D150328 and D150329.

Differential Revision: https://reviews.llvm.org/D147641
2023-07-17 09:07:54 +01:00
Kiran Chandramohan
10e3ed9919 [Flang][Debug] NFC: Correct the REQUIRES line to use system-linux
Reviewed By: kkwli0

Differential Revision: https://reviews.llvm.org/D153126
2023-06-21 12:11:03 +01:00
Tom Eccles
775de6754a [flang] convert stack arrays allocation to match old type
The old fir.allocmem operation returned a !fir.heap<.> type. The new
fir.alloca operation returns a !fir.ref<.> type. This patch inserts a
fir.convert so that the old type is preserved. This prevents verifier
failures when types returned from fir.if statements don't match the
expected type.

Differential Revision: https://reviews.llvm.org/D151921
2023-06-05 09:57:57 +00:00
Mats Petersson
b812932b35 [FLANG] Change loop versioning to use shift instead of divide
Despite me being convinced that the use of divide didn't produce any
divide instructions, it does in fact add more instructions than using
a plain shift operation.

This patch simply changes the divide to a shift right, with an
assert to check that the "divisor" is a power of two.

Reviewed By: kiranchandramohan, tblah

Differential Revision: https://reviews.llvm.org/D151880
2023-06-01 19:29:57 +01:00
Tom Eccles
408f4196ba [flang] use greedy mlir driver for stack arrays pass
In upstream mlir, the dialect conversion infrastructure is used for
lowering from one dialect to another: the passes are of the form
XToYPass. Whereas, transformations within the same dialect tend to use
applyPatternsAndFoldGreedily.

In this case, the full complexity of applyPatternsAndFoldGreedily isn't
needed so we can get away with the simpler applyOpPatternsAndFold.

This change was suggested by @jeanPerier

The old differential revision for this patch was
https://reviews.llvm.org/D150853

Re-applying here fixing the issue which led to the patch being reverted. The
issue was from erasing uses of the allocation operation while still iterating
over those uses (leading to a use-after-free). I have added a regression
test which catches this bug for -fsanitize=address builds, but it is
hard to reliably cause a crash from the use-after-free in normal builds.

Differential Revision: https://reviews.llvm.org/D151728
2023-05-31 14:06:57 +00:00
Mats Petersson
b75f9ce3fe [FLANG] Support all arrays for LoopVersioning
This patch makes more than 2D arrays work, with a fix for the way that
loop index is calculated. Removing the restriction of number of
dimensions.

This also changes the way that the actual index is calculated, such that
the stride is used rather than the extent of the previous dimension. Some
tests failed without fixing this - this was likely a latent bug in the
2D version too, but found in a test using 3D arrays, so wouldn't
have been found with 2D only. This introduces a division on the index
calculation - however it should be a nice and constant value allowing
a shift to be used to actually divide - or otherwise removed by using
other methods to calculate the result. In analysing code generated with
optimisation at -O3, there are no divides produced.

Some minor refactoring to avoid repeatedly asking for the "rank" of the
array being worked on.

This improves some of the SPEC-2017 ROMS code, in the same way as the
limited 2D array improvements - less overhead spent calculating array
indices in the inner-most loop and better use of vector-instructions.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D151140
2023-05-30 18:54:40 +01:00
Valentin Clement
5e983942d5 [mlir][openacc] Cleanup acc.parallel from old data clause operands
Remove old clause operands from acc.parallel operation since
the new dataOperands is now in place.
private, firstprivate and reductions will receive some redesign but are
not part of the new dataOperands.

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D150207
2023-05-09 14:57:50 -07:00
Valentin Clement
46e1b095c9 [mlir][openacc] Cleanup acc.data from old data clause operands
Since the new data operand operations have been added in D148389 and
adopted on acc.data in D149673, the old clause operands are no longer
needed.

The LegalizeDataOpForLLVMTranslation will become obsolete when all
operations will be cleaned. For the time being only the appropriate
part are being removed.

processOperands will also receive some updates once all the operands
will be coming from an acc data operand operation.

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D150155
2023-05-09 13:21:37 -07:00
Valentin Clement
15a480c05e [mlir][openacc] Cleanup acc.exit_data from old data clause operands
Since the new data operand operations have been added in D148389 and
adopted on acc.exit_data in D149601, the old clause operands are no longer
needed.

The LegalizeDataOpForLLVMTranslation will become obsolete when all
operations will be cleaned. For the time being only the appropriate
part are being removed.

processOperands will also receive some updates once all the operands
will be coming from an acc data operand operation.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D150145
2023-05-09 11:36:48 -07:00
Valentin Clement
9dec07f44a [mlir][openacc] Cleanup acc.enter_data from old data clause operands
Since the new data operand operations have been added in D148389 and
adopted on acc.enter_data in D148721, the old clause operands are no longer
needed.

The LegalizeDataOpForLLVMTranslation will become obsolete when all
operations will be cleaned. For the time being only the appropriate
part are being removed.

processOperands will also receive some updates once all the operands
will be coming from an acc data operand operation.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D150132
2023-05-09 09:01:30 -07:00
Valentin Clement
689afa88ae [mlir][openacc] Cleanup acc.update from old data clause operands
Since the new data operand operations have been added in D148389 and
adopted on acc.update in D149909, the old clause operands are no longer
needed. This is a first patch to start cleaning the OpenACC operations
with data clause operands.

The `LegalizeDataOpForLLVMTranslation` will become obsolete when all
operations will be cleaned. For the time being only the appropriate
part are being removed.

`processOperands` will also receive some updates once all the operands
will be coming from an acc data operand operation.

Reviewed By: razvanlupusoru, jeanPerier

Differential Revision: https://reviews.llvm.org/D150053
2023-05-08 10:03:28 -07:00
Mats Petersson
8cbb945165 [flang]Add test for 2D loop versioning test
Another test based on review comments added late in the review.

This one confirms that the multiplication and addition of the outer
index to the inner index and thus form the 2D index.

Reviewed By: tblah

Differential Revision: https://reviews.llvm.org/D149265
2023-05-04 11:18:29 +01:00
Mats Petersson
0446c4996f [flang]Add more tests for loop versioning
These two tests were created from little snippets added late
in the review of the loop versioning work. The code was fixed
to cope with the situation and correctly compile these samples.

This adds tests to avoid regressions in this area.

Reviewed By: tblah

Differential Revision: https://reviews.llvm.org/D148649
2023-04-26 12:10:05 +01:00
Mats Petersson
a716ace13d Add loop-versioning pass to improve unit-stride
Introduce conditional code to identify stride of "one element", and simplify the array accesses for that case.

This allows better loop performance in various benchmarks.

Reviewed By: tblah, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D141306
2023-04-18 09:53:07 +01:00
Valentin Clement
3d683cb9f1 [mlir][openacc][NFC] Use assembly format for acc.parallel
Remove the custoom parser and printer for the acc.parallel
operation and use the assembly format directly.

Reviewed By: PeteSteinfeld, razvanlupusoru

Differential Revision: https://reviews.llvm.org/D148183
2023-04-13 10:15:08 -07:00
Valentin Clement
49a813b4a6 [flang][openacc] Keep region when applying data operand conversion
Similar to D148039 but for the FIR to LLVM IR
conversion pass.

The inner part of the acc.loop has been removed since the rest of the
pipeline is not ready and would raise an error here. This was passing
until now because the acc.loop was discarded completely.

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D148057
2023-04-12 08:20:34 -07:00
Valentin Clement
cd9cdc6837 [flang][openacc] Add missing piece to translate to LLVM IR dialect
Add missing pieces to translate handle OpenACC dialect in the translation.

Depends on D147825

Reviewed By: PeteSteinfeld, razvanlupusoru

Differential Revision: https://reviews.llvm.org/D147828
2023-04-10 14:30:25 -07:00
Valentin Clement
42598ec745 [flang][openacc] Add data operands conversion from FIR
This patch revive an old PR attempt [1] to perform the
data operands conversion needed for translation to LLVMIR.

This is currently not supporting box/class type since they will
normally not reach this pass when the proposed change in this RFC [2]
are implemented.

[1] https://github.com/flang-compiler/f18-llvm-project/pull/915
[2] https://discourse.llvm.org/t/rfc-openacc-dialect-data-operation-improvements/69825/2

Depends on D147824

Reviewed By: PeteSteinfeld, razvanlupusoru

Differential Revision: https://reviews.llvm.org/D147825
2023-04-10 13:34:57 -07:00