Whenever lowering is checking if a function or global already exists in
the mlir::Module, it was doing module->lookup.
On big programs (~5000 globals and functions), this causes important
slowdowns because these lookups are linear. Use mlir::SymbolTable to
speed-up these lookups. The SymbolTable has to be created from the
ModuleOp and maintained in sync. It is therefore placed in the
converter, and FirOPBuilders can take a pointer to it to speed-up the
lookups.
This patch does not bring mlir::SymbolTable to FIR/HLFIR passes, but
some passes creating a lot of runtime calls could benefit from it too.
More analysis will be needed.
As an example of the speed-ups, this patch speeds-up compilation of
Whizard compare_amplitude_UFO.F90 from 5 mins to 2 mins on my machine
(there is still room for speed-ups).
This patch introduces a new operation to represent the CUDA Fortran
kernel loop directive. This operation is modeled as a LoopLikeOp
operation in a similar way to acc.loop.
The CUFKernelDoConstruct parse tree node is also placed correctly in the
PFTBuilder to be available in PFT evaluations.
Lowering from the flang parse-tree to MLIR is also done.
These hardcoded attribute name are a leftover from the upstreaming
period when there was no way to get the attribute name without an
instance of the operation. It is since possible to do without them and
they should be removed to avoid duplication.
This PR cleanup the fir.dt_entry and fir.dispatch ops of these hardcoded
attribute name and use their generated getters. Some other PRs will
follow to cleanup other operations.
These hardcoded attribute name are a leftover from the upstreaming
period when there was no way to get the attribute name without an
instance of the operation. It is since possible to do without them and
they should be removed to avoid duplication.
This PR cleanup the fir.global op of these hardcoded attribute name and
use their generated getters. Some other PRs will follow to cleanup other
operations.
The custom printer for `fir.global` was eluding all the attributes
present on the op when printing the attribute dictionary. So any
attribute that is not part of the pretty printing was therefore
discarded.
This patch fix the printer and also make use of the getters for the
attribute names when they are hardcoded.
hlfir.declare introduce some boxes that can be later optimized away. The
OpenACC lowering is currently setting some attributes on FIR operations
to track declare variables. When the boxes are optimized away these
attributes are lost. This patch propagate OpenACC attributes from
box_addr op to the defining op of the folding result.
Using `LoopLikeOpInterface` as the basis for the implementation unifies
all the tiling logic for both `scf.for` and `scf.forall`. The only
difference is the actual loop generation. This is a follow up to
https://github.com/llvm/llvm-project/pull/72178
Instead of many entry points for each loop type, the loop type is now
passed as part of the options passed to the tiling method.
This is a breaking change with the following changes
1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF`
2) The `scf::tileUsingSCFForallOp` is deprecated. The same
functionality is obtained by using `scf::tileUsingSCF` and setting
the loop type in `scf::SCFTilingOptions` passed into this method to
`scf::SCFTilingOptions::LoopType::ForallOp` (using the
`setLoopType` method).
3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is
renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of
the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing
any strategy with the default callback implemeting the greedy fusion.
4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use
`SmallVector<LoopLikeOpInterface>`.
5) To make `scf::ForallOp` implement the parts of
`LoopLikeOpInterface` needed, the `getOutputBlockArguments()`
method is replaced with `getRegionIterArgs()`
These changes now bring the tiling and fusion capabilities using
`scf.forall` on par with what was already supported by `scf.for`
This commit adds extra assertions to `OperationFolder` and `OpBuilder`
to ensure that the types of the folded SSA values match with the result
types of the op. There used to be checks that discard the folded results
if the types do not match. This commit makes these checks stricter and
turns them into assertions.
Discarding folded results with the wrong type (without failing
explicitly) can hide bugs in op folders. Two such bugs became apparent
in MLIR (and some more in downstream projects) and are fixed with this
change.
Note: The existing type checks were introduced in
https://reviews.llvm.org/D95991.
Migration guide: If you see failing assertions (`folder produced value
of incorrect type`; make sure to run with assertions enabled!), run with
`-debug` or dump the operation right before the failing assertion. This
will point you to the op that has the broken folder. A common mistake is
a mismatch between static/dynamic dimensions (e.g., input has a static
dimension but folded result has a dynamic dimension).
This operation allows computing the address of descriptor fields. It is
needed to help attaching descriptors in OpenMP/OpenACC target region.
The pointers inside the descriptor structure must be mapped too, but the
fir.box is abstract, so these fields cannot be computed with
fir.coordinate_of.
To preserve the abstraction of the descriptor layout in FIR, introduce
an operation specifically to !fir.ref<fir.box<>> address fields based on
field names (base_addr or derived_type).
Expose a `MutableArrayRef<OpOperand>` instead of
`ValueRange`/`OperandRange`. This allows users of this interface to
change the yielded values and the init values. The names of the
interface methods are the same as the auto-generated op accessor names
(`get...()` returns `OperandRange`, `get...Mutable()` returns
`MutableOperandRange`).
Note: The interface methods return a `MutableArrayRef` instead of a
`MutableOperandRange` because a loop op may not implement
`getYieldedValuesMutable` etc. and there is no safe way to return an
"empty" range with a `MutableOperandRange`.
Add a new interface method that returns the yielded values.
Also add a verifier that checks the number of inits/iter_args/yielded
values. Most of the checked invariants (but not all of them) are already
covered by the `RegionBranchOpInterface`, but the `LoopLikeOpInterface`
now provides (additional) error messages that are easier to read.
This interface allows (HL)FIR passes to add TBAA information to fir.load
and fir.store. If present, these TBAA tags take precedence over those
added during CodeGen.
We can't reuse mlir::LLVMIR::AliasAnalysisOpInterface because that uses
the mlir::LLVMIR namespace so it tries to define methods for fir
operations in the wrong namespace. But I did re-use the tbaa tag type to
minimise boilerplate code.
The new builders are to preserve the old interface without the tbaa tag.
The goal is to progressively propagate all the derived type info that is
currently in the runtime type info globals into a FIR operation that can
be easily queried and used by FIR/HLFIR passes.
When this will be complete, the last step will be to stop generating the
runtime info global in lowering, but to do that later in or just before
codegen to keep the FIR files readable (on the added type-info.f90
tests, the lowered runtime info globals takes a whooping 2.6 millions
characters on 1600 lines of the FIR textual output. The fir.type_info that
contains all the info required to generate those globals for such
"trivial" types takes 1721 characters on 9 lines).
So far this patch simply starts by replacing the fir.dispatch_table
operation by the fir.type_info operation and to add the noinit/
nofinal/nodestroy flags to it. These flags will soon be used in HLFIR to
better rewrite hlfir.assign with derived types.
This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.
`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.
Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false"
for some values that are defined outside of the loop (in a nested op, in
such a way that the value does not dominate the loop). This interface is
currently only used for LICM and there is no way to trigger this bug, so
no test is added.
The `RegionBranchOpInterface` had a few fundamental issues caused by the API design of `getSuccessorRegions`.
It always required passing values for the `operands` parameter. This is problematic as the operands parameter actually changes meaning depending on which predecessor `index` is referring to. If coming from a region, you'd have to find a `RegionBranchTerminatorOpInterface` in that region, get its operand count, and then create a `SmallVector` of that size.
This is not only inconvenient, but also error-prone, which has lead to a bug in the implementation of a previously existing `getSuccessorRegions` overload.
Additionally, this made the method dual-use, trying to serve two different use-cases: 1) Trying to determine possible control flow edges between regions and 2) Trying to determine the region being branched to based on constant operands.
This patch fixes these issues by changing the interface methods and adding new ones:
* The `operands` argument of `getSuccessorRegions` has been removed. The method is now only responsible for returning possible control flow edges between regions.
* An optional `getEntrySuccessorRegions` method has been added. This is used to determine which regions are branched to from the parent op based on constant operands of the parent op. By default, it calls `getSuccessorRegions`. This is analogous to `getSuccessorForOperands` from `BranchOpInterface`.
* Add `getSuccessorRegions` to `RegionBranchTerminatorOpInterface`. This is used to get the possible successors of the terminator based on constant operands. By default, it calls the containing `RegionBranchOpInterface`s `getSuccessorRegions` method.
* `getSuccessorEntryOperands` was renamed to `getEntrySuccessorOperands` for consistency.
Differential Revision: https://reviews.llvm.org/D157506
This renaming started with the native ODS support for properties, this is completing it.
A mass automated textual rename seems safe for most codebases.
Drop also the ods prefix to keep the accessors the same as they were before
this change:
properties.odsOperandSegmentSizes
reverts back to:
properties.operandSegementSizes
The ODS prefix was creating divergence between all the places and make it harder to
be consistent.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D157173
This patch includes the a subset of MMA intrinsics that are included in
the mma intrinsic module:
mma_assemble_acc
mma_assemble_pair
mma_build_acc
mma_disassemble_acc
mma_disassemble_pair
Submit on behalf of Daniel Chen <cdchen@ca.ibm.com>
Differential Revision: https://reviews.llvm.org/D155725
/data/llvm-project/flang/lib/Optimizer/Dialect/FIROps.cpp:971:46: error: no member named 'getNumScalableDims' in 'mlir::VectorType'
if (mlir::dyn_cast<mlir::VectorType>(ty).getNumScalableDims() == 0)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
1 error generated.
Codegen only supports conversions between logicals and integers. The
verifier should reflect this.
Differential Revision: https://reviews.llvm.org/D152935
Reboxing of the actual argument according to the type of the dummy
argument has to be aware of the potential rank mismatch, when
IGNORE_TKR(R) is used. This change only adds support for the mismatching
rank when the dummy argument has unlimited polymorphic type.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D151016
This reverts commit aa6b47cdaf.
And adds a fix (adding missing libraries
to CMakeLists.txt for the OpenMPDialect)
that allows failing builds to succeed.
This attribute represents the OpenMP declare target directive, it marks a function
or global as declare target by being present but also contains information on
the device_type and capture clause (link or to). It being an attribute allows it to
mark existing constructs and be converted trivially on lowering from the OpenMP
dialect to MLIR using amendOperation.
An interface has been made for the declare target attribute, with several helper
methods for managing the attribute, this interface can be applied to MLIR
operations that are allowed to be marked as declare target (as an example, it
is by default applied to func.func, LLVMFunc, fir.GlobalOps and LLVMGlobalOps).
Reviewers: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D150328
Fir.GlobalOp's currently do not respect attributes that
are applied to them, this change will do two things:
- Allow lowering of arbitrary attributes applied to
Fir.GlobalOp's to LLVMGlobalOp's during CodeGen
- Allow printing and parsing of arbitrarily applied attributes
This allows applying other dialects attributes (or other
fir attributes) to fir.GlobalOps on the fly and have them
exist in the resulting LLVM dialect IR or FIR IR.
Reviewer: jeanPerier
Differential Revision: https://reviews.llvm.org/D148352
fir.if currently isn't treated as a 'proper' conditional, so passes are unable to determine which regions are executed at times.
This patch gives fir.if this interface, which shouldn't do too much on its own but should allow future changes to take advantage
for various purposes
Reviewed By: vzakhari
Differential Revision: https://reviews.llvm.org/D145165
There is a lot of Fortran code that takes advantage of F77 implicit
interface to pass arguments with a different type than those from
the subprogram definition (which is well defined if the storage
and passing convention are the same or compatible).
When the definition and calls are in different files, there is nothing
special to do: the actual arguments are already used to compute the
call interface.
The trouble for lowering comes when the definition is in the same
compilation unit (Semantics raises warning). Then, lowering will
be provided with the interface from the definition to prepare the
argument, and this leads to many ad-hoc handling (see
builder.convertWithSemantics) in the current lowering to cope
with the dummy/actual mismatches on a case by case basis. The
current lowering to FIR is not even complete for all mismatch cases that
can be found in the wild (see https://github.com/llvm/llvm-project/issues/60550),
it is crashing or hitting asserts for many of the added tests.
For HLFIR, instead of coping on a case by case basis, the call
interface will be recomputed according to the actual arguments when
calling an external procedure that can be called with an explicit
interface.
One extra case still has to be handled manually because it may happen
in calls with explicit interfaces: passing a character procedure
designator to a non character procedure dummy (and vice-versa) is widely
accepted even with explicit interfaces (and flang semantic accepts it).
Yet, this "mismatch" cannot be dealt with a simple fir.convert because
character dummy procedure are passed with a different passing
convention: an extra argument is hoisted for the result length (in FIR,
there is no extra argument yet, but the MLIR func argument is a
tuple<fir.boxproc, len>).
Differential Revision: https://reviews.llvm.org/D143636
In lowering to HLFIR, deal with user calls involving a mix of:
- dummy with VALUE
- Polymorphism
- contiguous dummy
- assumed shape dummy
- OPTIONAL arguments
- NULL() passed to OPTIONAL arguments.
- elemental calls
Does not deal with assumed ranked dummy arguments.
This patch unifies the preparation of all arguments that must be passed
in memory and are not passed as allocatable/pointers.
For optionals, the same argument preparation is done, except the utility
that generates the IR for the argument preparation is called inside a
fir.if.
The addressing of array arguments in elemental calls is delayed so that
it can also happen during this argument preparation, and be placed in
the fir.if when the array may be absent.
Structure helpers are added to convey a prepared dummy argument and the
data that may be needed to do the clean-up after the call (temporary
storage deallocation or copy-out). And a utility is added to wrap
the preparation code inside a fir.if and convey these values through
the fir.if.
Certain aspects of this patch brings the HLFIR lowering support beyond
what the current lowering to FIR supports (e.g. handling of NULL(), handling
of optional in elemental calls, handling of copy-in/copy-out involving
polymorphic entities).
Differential Revision: https://reviews.llvm.org/D142695
Makes use of fir.type_desc in order to delay the type desc address
resolution. The lowering inserts fir.type_desc operation instead of fir.addr_of
operation pointing to the fir.global type descriptor. The fir.type_desc
operation is then lowered in code gen to an address of operation in the LLVM
dialect. At this stage, the type descriptor is generated in all cases.
Reviewed By: vdonaldson
Differential Revision: https://reviews.llvm.org/D142920
Assume no conflict between pointer arrays and arrays without the target
attribute, if the fact of an array not having the target attribute
can be reliably computed.
This change speeds up SPEC CPU2017/527.cam from 2.5k seconds to 880 seconds
on Icelake, and makes further performance investigation easier.
Differential Revision: https://reviews.llvm.org/D142273
Pointer association to unlimited polymorphic target is allowed for
unlimited polymorphic pointer and non-extensible derived-type.
This is checked by the semantic and this patch allows it in the
fir.rebox operation.
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D142104
Adds support for:
- referencing a whole allocatable/pointer symbol
- passing allocatable/pointer in a call
This required update in HLFIRTools.cpp helpers so that the
raw address, extents, lower bounds, and type parameters of a
fir.box/fir.class can be extracted.
This is required because in hlfir lowering, dereferencing a
pointer/alloc is only doing the fir.load fir.box part, and the
helpers have to be able to reason about that fir.box without the
help of a "fir::FortranVariableOpInterface".
Missing:
- referencing part of allocatable/pointer (will need to update
Designator lowering to dereference the pointer/alloc). Same
for whole allocatable and pointer components.
- allocate/deallocate/pointer assignment statements.
- Whole allocatable assignment.
- Lower inquires.
Differential Revision: https://reviews.llvm.org/D142043
Until now, only the address of the type descriptor was hold in
a PolymorphicValue. In some cases, the element size and the
type code are also needed when creating new polymorphic
descriptors from an element of a polymorphic entity.
This patch updates PolymorphicValue to carry the source
descriptor from which the element is extracted. The source
descriptor is then used when emboxing the element to a new
polymorphic descriptor.
This simplify the code done in D141274 and will be used
when creating polymorphic temporary as well.
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D141609
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
In SELECT TYPE, within the block following TYPE IS, the associating entity is not polymorphic.
It has the type named in the type guard and other properties taken from the
selector. Within the block following a CLASS IS type guard statement, the
associating entity is polymorphic and has the declared type named in the type
guard statement.
This patch makes sure the associating entity matches the selector if it is
an array, a pointer or an allocatable.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D140017
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
In D139019 the assumption was made that the rhs was also the MutableBox
but this is not a constraint. Use genExprBox instead. Also the allowed
conversion in D139019 was not correct. Remoed it since it is not needed anymore.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D139081
Pointer association with an unlimited polymorphic pointer on the lhs
requires more than just updating the base_addr. Delegate the association to
the runtime function `PointerAssociation`.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D139019
This patch relaxes the verifier for the fir.rebox operation
to allow reboxing to unlimited polymoprhic box.
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D138695
Couple of operation are expecting BoxType but can totally handle
ClassType as well. This patch updates couple of locations to support
BaseBoxType instead of BoxType only.
Reviewed By: jeanPerier, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D138422
Convert fir.select_type operation to an if-then-else ladder.
The type guards are sorted before the conversion so it follows the
execution of SELECT TYPE construct as mentioned in 11.1.11.2 point 4
of the Fortran standard.
Depends on D138279
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D138280