200 Commits

Author SHA1 Message Date
Pete Steinfeld
0cf3af0c51 Revert "[Flang] Allow Intrinsic simpification with min/maxloc dim and scalar result. (#75820)" (#76184)

This reverts commit 701f647905.

The commit breaks some uses of the 'maxloc' intrinsic.

See PR #75820
2023-12-21 13:14:05 -08:00
Kazu Hirata
c50de57feb [flang] Fix a warning
This patch fixes:

  flang/lib/Optimizer/Transforms/StackArrays.cpp:452:7: error:
  ignoring return value of function declared with 'nodiscard'
  attribute [-Werror,-Wunused-result]
2023-12-21 10:30:36 -08:00
David Green
701f647905 [Flang] Allow Intrinsic simpification with min/maxloc dim and scalar result. (#75820)
This makes an adjustment to the existing fir minloc/maxloc generation
code to handle functions with a dim=1 that produce a scalar result. This
should allow us to get the same benefits as the existing generated
minmax reductions.
2023-12-20 12:12:12 +00:00
David Green
9bb47f7f8b [Flang] Add Maxloc to fir simplify intrinsics pass (#75463)
This takes the code from D144103 and extends it to maxloc, to allow the
simplifyMinMaxlocReduction method to work with both min and max
intrinsics by switching condition and limit/initial value.
2023-12-18 07:59:51 +00:00
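The idea in 9bb47f7f8b of sharing one reduction between minloc and maxloc "by switching condition and limit/initial value" can be sketched in plain C++. This is only an illustration under stated assumptions: `minMaxLoc` is a hypothetical helper, not the `simplifyMinMaxlocReduction` code, and it handles only a rank-1 integer array with no mask.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not flang code. Returns the 1-based position of the
// first minimum (IsMax == false) or maximum (IsMax == true) element, or 0
// for an empty array.
template <bool IsMax>
std::size_t minMaxLoc(const std::vector<int> &a) {
  // Initial value: the limit that any element beats or ties.
  int best = IsMax ? std::numeric_limits<int>::lowest()
                   : std::numeric_limits<int>::max();
  std::size_t loc = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Condition: the only other thing that differs between min and max.
    const bool better = IsMax ? a[i] > best : a[i] < best;
    // loc == 0 makes the first element win even if it equals the limit.
    if (better || loc == 0) {
      best = a[i];
      loc = i + 1;
    }
  }
  return loc;
}
```

Calling `minMaxLoc<true>` and `minMaxLoc<false>` on the same array exercises both variants of the single loop body.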
Kazu Hirata
11efccea8f [flang] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 23:48:53 -08:00
Tom Eccles
ba3d0241e2 [flang] Record the original name of a function during ExternalNameCoversion (#74065)
We pass TBAA alias information with separate TBAA trees per function (to
prevent incorrect alias information after inlining). These TBAA trees
are identified by a unique string per function. Naturally, we use the
mangled name of the function.

TBAA tags are added in two places: during a dedicated pass relatively
early (structured control flow makes fir::AliasAnalysis more accurate),
then again during CodeGen (when implied box loads and stores become
visible). In between these two passes, the ExternalNameConversion pass
changes the name of some functions.

These functions with changed names previously ended up with separate
TBAA trees from the TBAA tags pass and from CodeGen - leading LLVM to
think that all data accesses alias with all descriptor accesses.

This patch solves this by storing the original name of a function in an
attribute during the ExternalNameConversion pass, and using the name
from that attribute when creating TBAA trees during CodeGen.
2023-12-03 20:37:10 +00:00
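The fix described in ba3d0241e2 can be pictured with a small model: a function keeps its first name in an attribute when it is renamed, and anything that keys a per-function TBAA tree uses that attribute. The `Func` struct, `renameExternal` and `tbaaTreeKey` below are hypothetical names for this sketch, and the example names `_QPfoo` and `foo_` are only illustrative.

```cpp
#include <string>

// Hypothetical stand-in for a function operation and its attributes.
struct Func {
  std::string name;
  std::string originalName; // empty until the function is renamed
};

// Models the renaming pass: remember the first name before overwriting it.
void renameExternal(Func &f, const std::string &newName) {
  if (f.originalName.empty())
    f.originalName = f.name;
  f.name = newName;
}

// Key used to look up the per-function TBAA tree. Because it prefers the
// recorded original name, it is the same before and after the rename.
const std::string &tbaaTreeKey(const Func &f) {
  return f.originalName.empty() ? f.name : f.originalName;
}
```

With this arrangement, a tag created before the rename and a tag created after it land in the same tree, which is the property the commit message is after.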
Valentin Clement
208a4510d4 [flang][NFC] Fix typo 2023-11-17 10:54:45 -08:00
Akash Banerjee
8701b178e0 [MLIR][OpenMP] Changes to function-filtering pass (#71850)
Currently, when deleting the device functions in the second stage of filtering during MLIR to LLVM translation we can end up with invalid calls to these functions. This is because of the removal of the EarlyOutliningPass which would have otherwise gotten rid of any such calls.

This patch aims to alter the function filtering pass in the following way:
	- Any host function is completely removed.
	- Calls to the host function are also removed and their uses replaced with Undef values.
	- Any host function with target region code is marked to be removed during the second stage.
	- Calls to such functions are still removed and their uses replaced with Undef values.

Co-authored-by: Sergio Afonso <sergio.afonsofumero@amd.com>
2023-11-14 12:43:31 +00:00
Akash Banerjee
63752399f8 [OpenMP][MLIR]OMPEarlyOutliningPass removal
This patch removes the OMPEarlyOutliningPass as it is no longer required. The implicit map operand capture has now been moved to the PFT lowering stage.

Depends on #67318.
2023-11-06 13:24:02 +00:00
Tom Eccles
e215324185 [flang][StackArrays] skip analysis of very large functions (#71047)
The stack arrays pass uses data flow analysis to determine whether heap
allocations are freed on all paths out of the function.

`interp_domain_em_part2` in spec2017 wrf generates over 120k operations,
including almost 5k fir.if operations and over 200 fir.do_loop
operations, all in the same function. The MLIR data flow analysis
framework cannot provide reasonable performance for such cases because
there is a combinatorial explosion in the number of control flow paths
through the function, all of which must be checked to determine if the
heap allocations will be freed.

This patch skips the stack arrays pass for ridiculously large functions
(defined as having more than 1000 fir.allocmem operations). This
threshold is configurable at runtime with a command line argument.

With this patch, compiling this file is more than 80% faster.
2023-11-03 10:29:33 +00:00
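The size guard from e215324185 amounts to counting allocations before starting the expensive analysis. A minimal sketch, assuming a function body can be modelled as a list of operation names; `OpList`, `shouldAnalyse` and `kDefaultMaxAllocs` are hypothetical names, and only the threshold of 1000 `fir.allocmem` operations comes from the commit message.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a function body: just a list of operation names.
using OpList = std::vector<std::string>;

// Threshold quoted in the commit message: more than 1000 fir.allocmem
// operations means the function is skipped.
constexpr std::size_t kDefaultMaxAllocs = 1000;

// Returns false as soon as the allocation count exceeds the threshold, so
// the costly data flow analysis is never started for such functions.
bool shouldAnalyse(const OpList &ops,
                   std::size_t maxAllocs = kDefaultMaxAllocs) {
  std::size_t allocs = 0;
  for (const std::string &op : ops)
    if (op == "fir.allocmem" && ++allocs > maxAllocs)
      return false;
  return true;
}
```

The second parameter stands in for the command line argument mentioned in the message, whose actual name is not given there.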
Tom Eccles
6242c8ca18 [flang] add TBAA tags to global and direct variables
These turn out to be useful for spec2017/fotonik3d and safe so long as
they are not used alongside TBAA tags for local allocations. LLVM may
be able to figure out local allocations by itself anyway.

PR #68727
2023-10-25 10:47:51 +00:00
Sergio Afonso
4b15c0ed0a [Flang][HLFIR][OpenMP] Fix offloading tests broken by HLFIR (#69457)
This patch makes changes to the early outlining pass to avoid compiler
crashes due to not handling `hlfir.declare` operations correctly. That
pass is intended to eventually be removed (#67319), but in the meantime
this fixes some issues arising in different parts of the OpenMP
offloading compilation process.

The main changes included in this patch are the following:
- Added support for mapped values defined by an `hlfir.declare`
operation. These operations are now kept in outlined target functions,
so that both of their outputs (base and original base) are available to
the corresponding `omp.target`'s map arguments and region.
- Added a fix by @agozillon to prevent unused map clauses from producing
a compiler crash. All these unused mapped variables are added to the
outlined function's inputs.
- Added a fix to the OpenMP translation to MLIR to support integer
arguments to these outlined functions. This enables successfully
compiling and running the tests in
openmp/libomptarget/test/offloading/fortran using HLFIR.

Co-authored-by: agozillon <Andrew.Gozillon@amd.com>
2023-10-23 17:40:55 +02:00
Mats Petersson
8dcee5800c [flang]Check for dominance in loop versioning (#68797)
This avoids trying to version loops that can't be versioned, and thus
avoids hitting an assert.

Co-authored with Slava Zakharin (who provided the test-code).
2023-10-12 13:07:16 +01:00
Tom Eccles
c0f453c023 [flang] add missing dependency FIRTransforms -> FIRAnalysis 2023-10-11 15:36:47 +00:00
Tom Eccles
df5c27869c [flang][FIR] add FIR TBAA pass
See RFC at
https://discourse.llvm.org/t/rfc-propagate-fir-alias-analysis-information-using-tbaa/73755

This pass adds TBAA tags to all accesses to non-pointer/target dummy
arguments. These TBAA tags tell LLVM that these accesses cannot alias:
allowing better dead code elimination, hoisting out of loops, and
vectorization.

Each function has its own TBAA tree so that accesses between functions
MayAlias after inlining.

I also included code for adding tags for local allocations and for
global variables. Enabling all three kinds of tag is known to produce a
miscompile and so these are disabled by default. But it isn't much
code and I thought it could be interesting to play with these later if
one is looking at a benchmark which looks like it would benefit from
more alias information. I'm open to removing this code too.

TBAA tags are also added separately by TBAABuilder during CodeGen.
TBAABuilder has to run during CodeGen because it adds tags to box
accesses, many of which are implicit in FIR. This pass cannot (easily)
run in CodeGen because fir::AliasAnalysis has difficulty tracing values
between blocks, and by the time CodeGen runs, structured control flow
has already been lowered.

Coming in follow up patches
  - Change CodeGen/TBAABuilder to use TBAAForest to add tags within the
    same per-function trees as are used here (delayed to a later patch
    to make it easier to revert)
  - Command line argument processing to actually enable the pass
2023-10-11 14:29:47 +00:00
jeanPerier
4ccd57ddb1 [flang][nfc] replace fir.dispatch_table with more generic fir.type_info (#68309)
The goal is to progressively propagate all the derived type info that is
currently in the runtime type info globals into a FIR operation that can
be easily queried and used by FIR/HLFIR passes.

When this is complete, the last step will be to stop generating the
runtime info global in lowering, and instead do that later, in or just
before codegen, to keep the FIR files readable (on the added type-info.f90
tests, the lowered runtime info globals take a whopping 2.6 million
characters on 1600 lines of the FIR textual output. The fir.type_info that
contains all the info required to generate those globals for such
"trivial" types takes 1721 characters on 9 lines).

So far this patch simply replaces the fir.dispatch_table
operation with the fir.type_info operation and adds the noinit/
nofinal/nodestroy flags to it. These flags will soon be used in HLFIR to
better rewrite hlfir.assign with derived types.
2023-10-06 09:29:57 +02:00
Mats Petersson
6180964a01 [flang]Pass to add vscale range attribute (#68103)
Add vscale range attribute for the Scalable Vector Extension (SVE) if
provided on the command-line (options in a previous commit)

If no command-line option is provided, and the target-feature of SVE is
specified and the architecture is AArch64, it defaults to 128-2048, in
other words a vscale-min of 1 and a vscale-max of 16.

A pass is used to add the attribute to all functions. The vectorizer will
use this attribute to generate the SVE instruction to match the range
specified. The attribute is harmless if there are no vectorizable
operations in the function.
2023-10-05 11:06:00 +01:00
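The default in 6180964a01 follows from SVE vector lengths being multiples of 128 bits, with vscale counting those 128-bit units. A small sketch of the arithmetic; `VScaleRange` and `vscaleRangeFromBits` are hypothetical names used only here.

```cpp
// Bits per unit of vscale for SVE: vector lengths are multiples of 128 bits.
constexpr unsigned kSveGranuleBits = 128;

struct VScaleRange {
  unsigned min;
  unsigned max;
};

// Hypothetical helper: converts a vector-length range in bits into the
// corresponding vscale-min / vscale-max pair.
constexpr VScaleRange vscaleRangeFromBits(unsigned minBits, unsigned maxBits) {
  return VScaleRange{minBits / kSveGranuleBits, maxBits / kSveGranuleBits};
}
```

For the default range, `vscaleRangeFromBits(128, 2048)` gives a minimum of 1 and a maximum of 16, matching the values in the commit message.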
Andrew Gozillon
171d8c4028 [Flang][OpenMP][MLIR] Fix memory leak caused by D149368 causing sanitizer error and fix iterator invalidation error
This patch fixes two issues introduced by the D149368 patch, one is
a memory leak from using the removeFromParent rather
than eraseFromParent (the erase also had to be moved to not create
use after deletes).

And the other is a possible iterator invalidation bug, better to be safe
than sorry.
2023-09-20 22:28:11 -05:00
Andrew Gozillon
76916669b9 [MLIR][OpenMP] Initial Lowering of Declare Target for Data
This patch adds initial lowering for DeclareTargetAttr on
GlobalOp's utilising registerTargetGlobalVariable
and getAddrOfDeclareTargetVar from the
OMPIRBuilder.

It also adds initial processing of declare target map
operands, populating the combinedInfo that the
OMPIRBuilder requires to generate kernels and
its kernel argument structure.

The combination of these additions allows simple mapping
of declare target globals to Target regions, as such a simple
runtime test showcasing this and testing it has been added.

The patch currently does not factor in filtering
based on device_type clauses (e.g. no emission of
globals for device if host specified); this will come in
a future iteration. For the moment it has only been
tested with 1-D arrays and basic Fortran data types;
more complex types (such as user defined derived
types from Fortran, allocatables or Fortran pointers)
may need further work.

reviewers: kiranchandramohan, skatrak

Differential Revision: https://reviews.llvm.org/D149368
2023-09-20 13:31:15 -05:00
jeanPerier
1062c140f8 [flang] Prevent IR name clashes between BIND(C) and external procedures (#66777)
Defining a procedure with a BIND(C, NAME="...") where the binding label
matches the assembly name of a non BIND(C) external procedure in the
same file causes a failure when generating the LLVM IR because of the
assembly symbol name clash.

Prevent this crash with a clearer semantic error.
2023-09-20 10:00:28 +02:00
Andrew Gozillon
eaa0d281b6 [Flang][MLIR][OpenMP] Update OMPEarlyOutlining to support Bounds, MapEntry and declare target globals
This patch is a required change for the device side IR to
maintain appropriate links for declare target variables to
their global variables for later lowering.

It is also a requirement to clone over map bounds and
entry operations to maintain the correct information for
later lowering of the IR.

It simply tries to clone over the relevant information
maintaining the appropriate links they would have
maintained prior to the pass, rather than redirecting
them to new function arguments which causes a
loss of information in the case of Declare Target
and map information.

Depends on D158734

reviewers: TIFitis, razvanlupusoru

Differential Revision: https://reviews.llvm.org/D158735
2023-09-19 08:26:46 -05:00
Slava Zakharin
7beb65ae2d [flang] Fixed LoopVersioning for array slices. (#65703)
The first test case added in the LIT test demonstrates the problem.
Even though we did not consider the inner loop as a candidate for
the transformation due to the array_coor with a slice, we decided to
version the outer loop for the same function argument.
During the cloning of the outer loop we dropped the slicing completely
producing invalid code.

I restructured the code so that we record all arg uses that cannot be
transformed (regardless of the reason), and then fixup the usage
information across the loop nests. I also noticed that we may generate
redundant contiguity checks for the inner loops, so I fixed it
since it was easy with the new way of keeping the usage data.
2023-09-08 09:01:10 -07:00
jeanPerier
6ffea74f7c [flang] Use BIND name, if any, when consolidating common blocks (#65613)
This patch changes how common blocks are aggregated and named in
lowering in order to:

* fix one obvious issue where BIND(C) and non BIND(C) with the same
Fortran name were "merged"

* go further and deal with a derivative where the BIND(C) C name matches
the assembly name of a Fortran common block. This is a bit unspecified
IMHO, but gfortran, ifort, and nvfortran "merge" the common block
without complaints as a linker would have done. This required getting
rid of all the common block mangling early in FIR (\_QC) instead of
leaving that to the phase that emits LLVM from FIR because BIND(C)
common blocks did not have mangled names. Care has to be taken to deal
with the underscoring option of flang-new.

See added flang/test/Lower/HLFIR/common-block-bindc-conflicts.f90 for an
illustration.
2023-09-08 10:43:55 +02:00
Tom Eccles
ad9af7de90 [flang][LoopVersioning] support fir.array_coor
This is the last piece required for the loop versioning patch to work on
code lowered via HLFIR. With this patch, HLFIR performance on spec2017
roms is now similar to the FIR lowering.

Adding support for fir.array_coor means that many more loops will be
versioned, even in the FIR lowering. So far as I have seen, these do not
seem to have an impact on performance for the benchmarks I tried, but I
expect it would speed up some programs, if the loop being versioned
happened to be the hot code.

The main difference between fir.array_coor and fir.coordinate_of is
that fir.coordinate_of uses zero-based indices, whereas fir.array_coor
uses the indices as specified in the Fortran program (starting from 1 by
default, but also supporting non default lower bounds). I opted to
transform fir.array_coor operations into fir.coordinate_of operations
because this allows both to share the same offset calculation logic.

The tricky bit of this patch is getting the correct lower bounds for the
array operand to subtract from the fir.array_coor indices to get
zero-based indices. So far as I can tell, the FIR lowering will always
provide lower bounds (shift) information in the shape operand to the
fir.array_coor when non-default lower bounds are used. If none is given,
I originally tried falling back to reading lower bounds from the box,
but this led to miscompilation in SPEC2017 cam4. Therefore the pass
instead assumes that if it can't already find an SSA value for the shift
information, the default lower bound (1) should be used.

I suspect the incorrect lower bounds in the box for the FIR lowering was
already a known issue (see https://reviews.llvm.org/D158119).

Differential Revision: https://reviews.llvm.org/D158597
2023-09-04 10:40:40 +00:00
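The index conversion described in ad9af7de90 is a per-dimension subtraction of the lower bound, with 1 assumed when no shift information is available. A minimal sketch in plain C++; `toZeroBased` is a hypothetical helper and the vectors stand in for SSA values, so this shows the arithmetic only and not the pass itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper. `lowerBounds` plays the role of the shift information;
// an empty vector means none was found, so the default lower bound 1 is used.
// Precondition: lowerBounds is empty or has the same size as indices.
std::vector<std::int64_t>
toZeroBased(const std::vector<std::int64_t> &indices,
            const std::vector<std::int64_t> &lowerBounds) {
  std::vector<std::int64_t> result;
  result.reserve(indices.size());
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const std::int64_t lb = lowerBounds.empty() ? 1 : lowerBounds[i];
    result.push_back(indices[i] - lb);
  }
  return result;
}
```

Once the indices are zero-based, they can feed the same offset calculation that is used for fir.coordinate_of, which is the sharing the commit message describes.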
Slava Zakharin
cccf4d6e4a [flang] Skip OPTIONAL arguments in LoopVersioning.
This patch fixes multiple tests failing with segfault due to accessing
absent argument box before the loop versioning check.
The absent arguments might be treated as contiguous for the purpose
of loop versioning, but this is not done in this patch.

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D158800
2023-08-25 08:33:49 -07:00
Tom Eccles
8d24b7322e [flang][LoopVersioning] support reboxed operands
Since https://reviews.llvm.org/D158119, many boxes lowered via HLFIR are
reboxed with better lower bounds information after they are declared.

For the loop versioning pass to support FIR lowered via HLFIR, it needs
to dereference fir.rebox operations to figure out that the variable was
a function argument.

I decided to modify the existing dereferencing of fir.declare so that
the declared/reboxed value is used in the versioned loop instead of the
function argument. This makes it easier for the improved lower bounds
information to be accessed. In doing this, I changed ArgInfo to store
ArgInfo::arg by value instead of by pointer because mlir::Value has
value-type semantics.

Differential Revision: https://reviews.llvm.org/D158408
2023-08-23 09:53:05 +00:00
Slava Zakharin
668f261bfa [flang] Make ISO_Fortran_binding.h a standalone header again.
This implements the proposal from
https://discourse.llvm.org/t/adding-flang-specific-header-files-to-clang/72442/6
Since ISO_Fortran_binding.h is supposed to be included from users'
C/C++ code, it is better for it to have no dependencies on other header
files.

Reviewed By: PeteSteinfeld

Differential Revision: https://reviews.llvm.org/D158549
2023-08-22 18:56:27 -07:00
Slava Zakharin
89b98c13e0 [flang] Fixed simplification for FP maxval.
On x86, a simplified F128 maxval ends up calling fmaxl that does not
work properly for F128 arguments. It is probably an LLVM issue, but
we also should not use arith.maxf if NaN or -0.0 operands are possible.
The change is to use cmpf and select. Unfortunately, these arith ops
do not support FastMathFlags currently, so I will have to fix this
sooner or later (depending on how this affects performance).

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D158200
2023-08-21 19:33:56 -07:00
Mark Danial
bfe390cf9a [Flang] funderscoring intermittent failure fix
There is an intermittent failure in the tests for the funderscoring driver option reported in (https://lab.llvm.org/buildbot/#/builders/21/builds/78228) that is caused by an uninitialized member variable.

Reviewed By: kkwli0

Differential Revision: https://reviews.llvm.org/D158187
2023-08-21 14:42:33 -04:00
Tom Eccles
05011024fd [flang][LoopVersioning] support fir.declare
When FIR comes from HLFIR, there will be a fir.declare operation between
the source and the usage of each source variable (and some temporary
allocations). This pass needs to be able to follow these so that it can
still transform loops when HLFIR is used, otherwise it mistakenly
assumes these values are not function arguments.

More work is needed after this patch to fully support HLFIR, because the
generated code tends to use fir.array_coor instead of fir.coordinate_of.

Differential Revision: https://reviews.llvm.org/D157964
2023-08-18 09:51:22 +00:00
Sergio Afonso
f20b67a81c [Flang][MLIR][OpenMP] Improve device-only function filtering
This patch improves the implementation of a recent function filtering
workaround to address problems uncovered by D154247.

In particular, the problem was related to the removal of functions called from
within target regions. Since target regions have to remain until LLVM IR is
generated, removing these functions from MLIR results in undefined references
any time there are calls to them in a target region. This patch modifies the
MLIR function filtering pass to make these functions "external" rather than
removing them. This way, the processing and lowering of MLIR functions that
will eventually be discarded is still prevented, but no calls to undefined
functions remain either.

Additionally, the approach of just filtering host-only functions during device
compilation, and not filtering device-only functions during host compilation,
is maintained. This is because code generation for device-only functions is
required for host fallback to work.

Depends on D156988

Differential Revision: https://reviews.llvm.org/D155827
2023-08-10 11:29:45 +01:00
Valentin Clement
103907bc5f [flang] Add missing dependency on tablegen files
This issue was raised on https://github.com/llvm/llvm-project/issues/64268.

`flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp` includes
`flang/Optimizer/HLFIR/HLFIRDialect.h` and might fail if the HLFIR related
tablegen files have not been generated.

Reviewed By: vzakhari

Differential Revision: https://reviews.llvm.org/D156751
2023-08-01 09:48:07 -07:00
Alex Zinenko
b2b7efb96d [mlir] NFC: rename XDataFlowAnalysis to XForwardDataFlowAnalysis
This makes naming consistent with XBackwardDataFlowAnalysis.

Reviewed By: Mogball, phisiart

Differential Revision: https://reviews.llvm.org/D155930
2023-07-27 11:11:40 +00:00
Andrew Gozillon
062fce6f4d [Flang][OpenMP][MLIR] An mlir transformation pass for marking FuncOp's implicitly called from TargetOp's and declare target marked FuncOp's as implicitly declare target
This pass will mark functions called from TargetOp's
and declare target functions as implicitly declare
target by adding the MLIR declare target attribute
directly to the function.

This pass executes after the initial lowering of Fortran's PFT
to MLIR (FIR/OMP+Arith etc.) and is one of a series of passes
that aim to clean up the MLIR for offloading (separate passes
in different patches, one for early outlining, another for declare
target function filtering).

Reviewers: jsjodin, skatrak, kiaranchandramohan

Differential Revision: https://reviews.llvm.org/D154247
2023-07-17 08:32:26 -05:00
Sergio Afonso
debdfc0ae2 [Flang][OpenMP][MLIR] Filter emitted code depending on declare target and device
This patch adds support for selecting which functions are lowered to LLVM IR
from MLIR depending on declare target information and whether host or device
code is being generated.

The approach proposed by this patch is to perform the filtering in two stages:
  - An MLIR transformation pass, which is added to the Flang translation flow
    after the `OMPEarlyOutliningPass`. The functions that are kept are those
    that match the OpenMP processor (host or device) the compiler invocation
    is targeting, according to the presence of the `-fopenmp-is-target-device`
    compiler option and declare target information. All functions containing an
    `omp.target` are also kept, regardless of the declare target information of
    the function, due to the need for keeping target regions visible for both
    host and device compilation.
  - A filtering step during translation to LLVM IR, which is performed for those
    functions that were kept because of the presence of a target region inside.
    If the targeted OpenMP processor does not match the declare target
    information of the function, then it is removed from the LLVM IR after its
    contents have been processed and translated. Since they should only contain
    an omp.target operation which, in turn, should have been outlined into
    another LLVM IR function, the wrapper can be deleted at that point.

Depends on D150328 and D150329.

Differential Revision: https://reviews.llvm.org/D147641
2023-07-17 09:07:54 +01:00
Jan Sjodin
22a167779a [flang] Fix OMPEarlyOutlining erasing declare target functions
The early outlining pass was erasing target functions that need to be
kept. It should only erase functions that contain target ops.
2023-07-13 13:00:23 -04:00
Mark Danial
d85b94bf00 [Flang] -funderscoring bug fix
There was a bug with the -funderscoring / -fno-underscoring options from (https://reviews.llvm.org/D140795) that prevented the driver option from controlling the underscoring behaviour; the behaviour could only be controlled by the pass option. The driver test case did not catch the bug and also needed to be updated.

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D155042
2023-07-13 11:30:35 -04:00
Jan Sjodin
45a9604417 [Flang][OpenMP][MLIR] Add early outlining pass for omp.target operations to flang
This patch implements an early outlining transform of omp.target operations in
flang. The pass is needed because optimizations may cross target op region
boundaries, but with the outlining the resulting functions only contain a
single omp.target op plus a func.return, so there should not be any opportunity
to optimize across region boundaries.

The patch also adds an interface to be able to store and retrieve the parent
function name of the original target operation. This is needed to be able to
create correct kernel function names when lowering to LLVM-IR.

Reviewed By: kiranchandramohan, domada

Differential Revision: https://reviews.llvm.org/D154879
2023-07-13 09:14:42 -04:00
David Truby
f52c64b115 [flang] Add fastmath flags to localBuilder in IntrinsicCall
Currently the local builder used in IntrinsicCall doesn't have the
fastmath flags passed to it. This results in the fastmath attribute
not being added to certain runtime calls. This patch simply forwards
the fastmath flags from the parent builder.

Differential Revision: https://reviews.llvm.org/D154611
2023-07-11 18:53:31 +01:00
Tom Eccles
76c3c5bca0 [flang] [stack-arrays] fix unused variable warning 2023-06-05 15:36:02 +00:00
Tom Eccles
53cc33b00b [flang] Store KindMapping by value in FirOpBuilder
Previously only a constant reference was stored in the FirOpBuilder.
However, a lot of code was merged using

FirOpBuilder builder{rewriter, getKindMapping(mod)};

This is incorrect because the KindMapping returned will go out of scope
as soon as FirOpBuilder's constructor has run. This led to an infinite
loop running some tests using HLFIR (because the stack space containing
the kind mapping was re-used and corrupted).

One solution would have just been to fix the incorrect call sites,
however, as a large number of these had already made it past review, I
decided to instead change FirOpBuilder to store its own copy of the
KindMapping. This is not costly because nearly every time we construct a
KindMapping, it is exclusively to construct a FirOpBuilder. To make this
common pattern simpler, I added a new constructor to FirOpBuilder which
calls getKindMapping().

Differential Revision: https://reviews.llvm.org/D151881
2023-06-05 09:57:57 +00:00
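The lifetime problem fixed in 53cc33b00b comes from a class keeping a reference to a temporary that dies once the constructor call is finished. A minimal sketch of the by-value alternative; the `KindMapping` struct, `getKindMapping()` and `BuilderByValue` below are simplified stand-ins invented for this example, not the real flang classes or signatures.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for the real KindMapping.
struct KindMapping {
  std::string defaults;
};

// Hypothetical accessor that returns a temporary, like the call in the
// commit message.
KindMapping getKindMapping() { return KindMapping{"a1c8i4l4r4"}; }

// Storing the mapping by value keeps it alive for as long as the builder.
// Had the member been `const KindMapping &kindMap;`, it would dangle after
// `BuilderByValue b{getKindMapping()};` because the temporary is destroyed
// at the end of that statement.
class BuilderByValue {
public:
  explicit BuilderByValue(KindMapping km) : kindMap(std::move(km)) {}
  const KindMapping &getKindMap() const { return kindMap; }

private:
  KindMapping kindMap; // owned copy, not a reference
};
```

The move in the constructor keeps the extra copy cheap, which fits the commit's observation that the mapping is almost always created just to build the builder.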
Tom Eccles
775de6754a [flang] convert stack arrays allocation to match old type
The old fir.allocmem operation returned a !fir.heap<.> type. The new
fir.alloca operation returns a !fir.ref<.> type. This patch inserts a
fir.convert so that the old type is preserved. This prevents verifier
failures when types returned from fir.if statements don't match the
expected type.

Differential Revision: https://reviews.llvm.org/D151921
2023-06-05 09:57:57 +00:00
Mats Petersson
b812932b35 [FLANG] Change loop versioning to use shift instead of divide
Despite me being convinced that the use of divide didn't produce any
divide instructions, it does in fact add more instructions than using
a plain shift operation.

This patch simply changes the divide to a shift right, with an
assert to check that the "divisor" is a power of two.

Reviewed By: kiranchandramohan, tblah

Differential Revision: https://reviews.llvm.org/D151880
2023-06-01 19:29:57 +01:00
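The change in b812932b35, replacing a divide by a right shift guarded by a power-of-two assertion, looks roughly like this in plain C++. The helper names are hypothetical, and unsigned values are used here to sidestep questions about shifting negative numbers, which the real pass may treat differently.

```cpp
#include <cassert>
#include <cstdint>

// True when exactly one bit is set.
inline bool isPowerOfTwo(std::uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Shift amount equivalent to dividing by `divisor`; asserts that the
// "divisor" really is a power of two.
inline unsigned log2Exact(std::uint64_t divisor) {
  assert(isPowerOfTwo(divisor) && "divisor must be a power of two");
  unsigned shift = 0;
  while ((divisor >> shift) != 1)
    ++shift;
  return shift;
}

// value / divisor, computed as a shift right.
inline std::uint64_t divideByPow2(std::uint64_t value, std::uint64_t divisor) {
  return value >> log2Exact(divisor);
}
```

For example, `divideByPow2(64, 8)` shifts right by 3 and yields 8.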
Tom Eccles
408f4196ba [flang] use greedy mlir driver for stack arrays pass
In upstream mlir, the dialect conversion infrastructure is used for
lowering from one dialect to another: the passes are of the form
XToYPass. Whereas, transformations within the same dialect tend to use
applyPatternsAndFoldGreedily.

In this case, the full complexity of applyPatternsAndFoldGreedily isn't
needed so we can get away with the simpler applyOpPatternsAndFold.

This change was suggested by @jeanPerier

The old differential revision for this patch was
https://reviews.llvm.org/D150853

Re-applying here fixing the issue which led to the patch being reverted. The
issue was from erasing uses of the allocation operation while still iterating
over those uses (leading to a use-after-free). I have added a regression
test which catches this bug for -fsanitize=address builds, but it is
hard to reliably cause a crash from the use-after-free in normal builds.

Differential Revision: https://reviews.llvm.org/D151728
2023-05-31 14:06:57 +00:00
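The bug mentioned in 408f4196ba, erasing uses while still iterating over them, has a familiar shape in any container code. The sketch below is an analogy using standard containers, not MLIR: `Value`, `eraseUser` and `eraseAllUsers` are hypothetical, and taking a snapshot first is one common way to avoid the problem, not necessarily what the patch does.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a value and the ids of the operations using it.
struct Value {
  std::vector<int> users;

  // Erasing a user mutates `users`, which invalidates iterators into it.
  void eraseUser(int id) {
    users.erase(std::remove(users.begin(), users.end(), id), users.end());
  }
};

// Copy the user list before erasing, so the loop never walks a container
// that is being modified underneath it. Returns the number of users erased.
std::size_t eraseAllUsers(Value &v) {
  const std::vector<int> snapshot(v.users.begin(), v.users.end());
  for (int id : snapshot)
    v.eraseUser(id);
  return snapshot.size();
}
```

Looping directly over `v.users` while calling `eraseUser` would be the unsafe variant, and as the commit notes, such bugs often only show up reliably under -fsanitize=address.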
Mats Petersson
b75f9ce3fe [FLANG] Support all arrays for LoopVersioning
This patch makes arrays with more than two dimensions work by fixing the
way that the loop index is calculated, removing the restriction on the
number of dimensions.

This also changes the way that the actual index is calculated, such that
the stride is used rather than the extent of the previous dimension. Some
tests failed without fixing this - this was likely a latent bug in the
2D version too, but found in a test using 3D arrays, so wouldn't
have been found with 2D only. This introduces a division on the index
calculation - however it should be a nice and constant value allowing
a shift to be used to actually divide - or otherwise removed by using
other methods to calculate the result. In analysing code generated with
optimisation at -O3, there are no divides produced.

Some minor refactoring to avoid repeatedly asking for the "rank" of the
array being worked on.

This improves some of the SPEC-2017 ROMS code, in the same way as the
limited 2D array improvements - less overhead spent calculating array
indices in the inner-most loop and better use of vector-instructions.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D151140
2023-05-30 18:54:40 +01:00
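The stride-based index calculation in b75f9ce3fe can be sketched as follows, assuming the strides are byte strides as stored in an array descriptor and that the division mentioned in the message is the division by the element size. `elementOffset` is a hypothetical helper; the real pass works on FIR values, not vectors.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: element offset of a zero-based multi-dimensional
// index, given per-dimension strides in bytes and the element size in bytes.
// Precondition: index and byteStrides have the same size.
std::int64_t elementOffset(const std::vector<std::int64_t> &index,
                           const std::vector<std::int64_t> &byteStrides,
                           std::int64_t elemSize) {
  std::int64_t offset = 0;
  for (std::size_t d = 0; d < index.size(); ++d)
    // The divisor is a constant element size, so it can become a shift.
    offset += index[d] * (byteStrides[d] / elemSize);
  return offset;
}
```

For a 2x3x4 array of 4-byte elements the byte strides are 4, 8 and 24, so index (1, 2, 3) maps to 1 + 2*2 + 3*6 = 23. With padded strides 4, 16 and 64 the same index maps to 57, which a calculation based on the extents 2 and 3 alone would get wrong.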
Tom Eccles
2dfaec7781 Revert "[flang] use greedy mlir driver for stack arrays pass"
This reverts commit 74c2ec50f3.

This caused a regression building spec2017 with -Ofast.
2023-05-24 16:15:52 +00:00
Tom Eccles
74c2ec50f3 [flang] use greedy mlir driver for stack arrays pass
In upstream mlir, the dialect conversion infrastructure is used for
lowering from one dialect to another: the passes are of the form
XToYPass. Whereas, transformations within the same dialect tend to use
applyPatternsAndFoldGreedily.

In this case, the full complexity of applyPatternsAndFoldGreedily isn't
needed so we can get away with the simpler applyOpPatternsAndFold.

This change was suggested by @jeanPerier

Differential Revision: https://reviews.llvm.org/D150853
2023-05-23 14:51:42 +00:00
Valentin Clement
677f7cc55a [mlir][flang][openacc] Remove obsolete operand legalization passes
The information needed for translation is now encoded in the dialect
operations and does not require a dedicated pass to be extracted.
Remove the obsolete passes that were performing operand legalization.

Reviewed By: jeanPerier

Differential Revision: https://reviews.llvm.org/D150248
2023-05-11 10:33:00 -07:00
Valentin Clement
5e983942d5 [mlir][openacc] Cleanup acc.parallel from old data clause operands
Remove old clause operands from acc.parallel operation since
the new dataOperands is now in place.
private, firstprivate and reductions will receive some redesign but are
not part of the new dataOperands.

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D150207
2023-05-09 14:57:50 -07:00
Valentin Clement
46e1b095c9 [mlir][openacc] Cleanup acc.data from old data clause operands
Since the new data operand operations have been added in D148389 and
adopted on acc.data in D149673, the old clause operands are no longer
needed.

The LegalizeDataOpForLLVMTranslation will become obsolete when all
operations have been cleaned up. For the time being, only the appropriate
parts are being removed.

processOperands will also receive some updates once all the operands
come from an acc data operand operation.

Reviewed By: razvanlupusoru

Differential Revision: https://reviews.llvm.org/D150155
2023-05-09 13:21:37 -07:00