C_FUNLOC was not handling procedure pointer argument correctly, the
issue lied in `hlfir::convertToBox` that did not handle procedure
pointers.
I modified the interface of `hlfir::convertToXXX` to take values on the
way because hlfir::Entity are fundamentally an mlir::Value with type
guarantees, so they should be dealt with by value as mlir::Value are
(they are very small).
This is a lot less complex than the data case where the shape has to be
accounted for, so the implementation is done inline.
One corner case will not be supported correctly for now: the case where
POINTER and TARGET points to the same internal procedure may return
false because lowering is creating fir.embox_proc each time the address
of an internal procedure is taken, so different thunk for the same
internal procedure/host link may be created and compare to false. This
will be fixed in a later patch that moves creating of internal procedure
fir.embox_proc in the host so that the addresses are the same when the
host link is the same. This change is required to properly support the
required lifetime of internal procedure addresses anyway (should be the
always be the lifetime of the host, even when the address is taken in an
internal procedure).
This patch fixes:
flang/lib/Optimizer/Transforms/StackArrays.cpp:452:7: error:
ignoring return value of function declared with 'nodiscard'
attribute [-Werror,-Wunused-result]
This makes an adjustment to the existing fir minloc/maxloc generation
code to handle functions with a dim=1 that produce a scalar result. This
should allow us to get the same benefits as the existing generated
minmax reductions.
This commit adds extra assertions to `OperationFolder` and `OpBuilder`
to ensure that the types of the folded SSA values match with the result
types of the op. There used to be checks that discard the folded results
if the types do not match. This commit makes these checks stricter and
turns them into assertions.
Discarding folded results with the wrong type (without failing
explicitly) can hide bugs in op folders. Two such bugs became apparent
in MLIR (and some more in downstream projects) and are fixed with this
change.
Note: The existing type checks were introduced in
https://reviews.llvm.org/D95991.
Migration guide: If you see failing assertions (`folder produced value
of incorrect type`; make sure to run with assertions enabled!), run with
`-debug` or dump the operation right before the failing assertion. This
will point you to the op that has the broken folder. A common mistake is
a mismatch between static/dynamic dimensions (e.g., input has a static
dimension but folded result has a dynamic dimension).
Lower procedure pointer components, except in the context of structure
constructor (left TODO).
Procedure pointer components lowering share most of the lowering logic
of procedure poionters with the following particularities:
- They are components, so an hlfir.designate must be generated to
retrieve the procedure pointer address from its derived type base.
- They may have a PASS argument. While there is no dispatching as with
type bound procedure, special care must be taken to retrieve the derived
type component base in this case since semantics placed it in the
argument list and not in the evaluate::ProcedureDesignator.
These components also bring a new level of recursive MLIR types since a
fir.type may now contain a component with an MLIR function type where
one of the argument is the fir.type itself. This required moving the
"derived type in construction" stackto the converter so that the object
and function type lowering utilities share the same state (currently the
function type utilty would end-up creating a new stack when lowering its
arguments, leading to infinite loops). The BoxedProcedurePass also
needed an update to deal with this recursive aspect.
Implement the C struct passing ABI on X86-64 for the trivial case where
the structs have one element. This is required to cover some cases of
BIND(C) derived type pass with the VALUE attribute.
This takes the code from D144103 and extends it to maxloc, to allow the
simplifyMinMaxlocReduction method to work with both min and max
intrinsics by switching condition and limit/initial value.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
The adds a hlfir minloc intrinsic, similar to the minval intrinsic
already added, to help in the lowering of minloc. The idea is to later
add maxloc too, and from there add a simplification for producing minloc
with inlined elemental and hopefully less temporaries.
In the context of C/Fortran interoperability (BIND(C)), it is possible
to give the VALUE attribute to a BIND(C) derived type dummy, which
according to Fortran 2018 18.3.6 - 2. (4) implies that it must be passed
like the equivalent C structure value. The way C structure value are
passed is ABI dependent.
LLVM does not implement the C struct ABI passing for LLVM aggregate type
arguments. It is up to the front-end, like clang is doing, to split the
struct into registers or pass the struct on the stack (llvm "byval") as
required by the target ABI.
So the logic for C struct passing sits in clang. Using it from flang
requires setting up a lot of clang context and to bridge FIR/MLIR
representation to clang AST representation for function signatures (in
both directions). It is a non trivial task.
See
https://stackoverflow.com/questions/39438033/passing-structs-by-value-in-llvm-ir/75002581#75002581.
Since BIND(C) struct are rather limited as opposed to generic C struct
(e.g. no bit fields). It is easier to provide a limited implementation
of it for the case that matter to Fortran.
This patch:
- Updates the generic target rewrite pass to keep track of both the new
argument type and attributes. The motivation for this is to be able to
tell if a previously marshalled argument is passed in memory (it is a C
pointer), or if it is being passed on the stack (has the byval llvm
attributes).
- Adds an entry point in the target specific codegen to marshal struct
arguments, and use it in the generic target rewrite pass.
- Implements limited support for the X86-64 case. So far, the support
allows telling if a struct must be passed in register or on the stack,
and to deal with the stack case. The register case is left TODO in this
patch.
The X86-64 ABI implemented is the System V ABI for AMD64 version 1.0
When assigning to a whole allocatable, lowering is dealing with the
implicit conversion to preserve the RHS lower bounds. In case of
character KIND mismatch, the code was setting the new RHS length to the
one from the LHS, which is wrong for two reasons:
- no padding/truncation was actually done in the conversion
- the RHS length should anyway not be touched since the one from the
allocatable LHS may change to become the one of the RHS.
Update the code to preserve the RHS type length when materializing the
implicit character KIND conversion.
`nsw` is a flag for LLVM arithmetic operations meaning "no signed wrap".
If this keyword is present, the result of the operation is a poison
value if overflow occurs. Adding this keyword permits LLVM to re-order
integer arithmetic more aggressively.
In
https://discourse.llvm.org/t/rfc-changes-to-fircg-xarray-coor-codegen-to-allow-better-hoisting/75257/16
@vzakhari observed that adding nsw is useful to enable hoisting of
address calculations after some loops (or is at least a step in that
direction).
Classic flang also adds nsw to address calculations.
Preliminary patch to change lowering/code generation to use
llvm::DataLayout information instead of generating "sizeof" GEP (see
https://github.com/llvm/llvm-project/issues/71507).
Fortran Semantic analysis needs to know about the target type size and
alignment to deal with common blocks, and intrinsics like
C_SIZEOF/TRANSFER. This information should be obtained from the
llvm::DataLayout so that it is consistent during the whole compilation
flow.
This change is changing flang-new and bbc drivers to:
1. Create the llvm::TargetMachine so that the data layout of the target
can be obtained before semantics.
2. Sharing bbc/flang-new set-up of the
SemanticConstext.targetCharateristics from the llvm::TargetMachine. For
now, the actual part that set-up the Fortran type size and alignment
from the llvm::DataLayout is left TODO so that this change is mostly an
NFC impacting the drivers.
3. Let the lowering bridge set-up the mlir::Module datalayout attributes
since it is doing it for the target attribute, and that allows the llvm
data layout information to be available during lowering.
For flang-new, the changes are code shuffling: the `llvm::TargetMachine`
instance is moved to `CompilerInvocation` class so that it can be used
to set-up the semantic contexts. `setMLIRDataLayout` is moved to
`flang/Optimizer/Support/DataLayout.h` (it will need to be used from
codegen pass for fir-opt target independent testing.)), and the code
setting-up semantics targetCharacteristics is moved to
`Tools/TargetSetup.h` so that it can be shared with bbc.
As a consequence, LLVM targets must be registered when running
semantics, and it is not possible to run semantics for a target that is
not registered with the -triple option (hence the power pc specific
modules can only be built if the PowerPC target is available.
`llvm.fcmp` does support fast math attributes therefore so should
`arith.cmpf`.
The heavy churn in flang tests are because flang sets
`fastmath<contract>` by default on all operations that support the fast
math interface. Downstream users of MLIR should not be so effected.
This was requested in https://github.com/llvm/llvm-project/issues/74263
COMPLEX(10) passing by value and returning follows C complex
passing/returning ABI.
Cover the COMPLEX(10) case (X87 / __Complex long double on X86-64).
Implements System V ABI for AMD64 version 1.0.
The LLVM signatures match the one generated by clang for the __Complex
long double case.
Note that a FIXME is added for the COMPLEX(8) case that is incorrect in
a corner case. This will be fixed when dealing with passing derived type
by value in BIND(C) context.
We pass TBAA alias information with separate TBAA trees per function (to
prevent incorrect alias information after inlining). These TBAA trees
are identified by a unique string per function. Naturally, we use the
mangled name of the function.
TBAA tags are added in two places: during a dedicated pass relatively
early (structured control flow makes fir::AliasAnalysis more accurate),
then again during CodeGen (when implied box loads and stores become
visible). In between these two passes, the ExternalNameConversion pass
changes the name of some functions.
These functions with changed names previously ended up with separate
TBAA trees from the TBAA tags pass and from CodeGen - leading LLVM to
think that all data accesses alias with all descriptor accesses.
This patch solves this by storing the original name of a function in an
attribute during the ExternalNameConversion pass, and using the name
from that attribute when creating TBAA trees during CodeGen.
…ortran_binding.h
... and into the ISO_Fortran_binding_wrapper.h header, through which the
compiler and runtime access its contents. This change ensures that user
code that #includes ISO_Fortran_binding.h within 'extern "C" {' doesn't
encounter mysterious namespace errors.
This update makes the user visible messages relating to features that
are not yet implemented be more consistent. I also cleaned up some of
the code.
For NYI messages that refer to intrinsics, I made sure the the message
begins with "not yet implemented: intrinsic:" to make them easier to
recognize.
I created some utility functions for NYI reporting that I put into
.../include/Optimizer/Support/Utils.h. These mainly convert MLIR types
to their Fortran equivalents.
I converted the NYI code to use the newly created utility functions.
This got "lost" in the HLFIR transformation. This patch applies the old
attribute to the AssociateOp that needs it, and forwards it to the
AllocaOp that is generated when lowering to FIR.
LOC is a GNU extension, so gfortran is the reference for it, and it
accepts absent OPTIONAL and returns zero. Support this use case in flang
too.
Update the LOC test to use HLFIR while touching it.
Fixes https://github.com/llvm/llvm-project/issues/72823.
This operation allows computing the address of descriptor fields. It is
needed to help attaching descriptors in OpenMP/OpenACC target region.
The pointers inside the descriptor structure must be mapped too, but the
fir.box is abstract, so these fields cannot be computed with
fir.coordinate_of.
To preserve the abstraction of the descriptor layout in FIR, introduce
an operation specifically to !fir.ref<fir.box<>> address fields based on
field names (base_addr or derived_type).
**Scope of the PR:**
1. Lowering global and local procedure pointer declaration statement
with explicit or implicit interface. The explicit interface can be from
an interface block, a module procedure or an internal procedure.
2. Lowering procedure pointer assignment, where the target procedure
could be external, module or internal procedures.
3. Lowering reference to procedure pointers so that it works end to end.
**PR notes:**
1. The first commit of the PR does not include testing. I would like to
collect some comments first, which may alter the output. Once I confirm
the implementation, I will add some testing as a follow up commit to
this PR.
2. No special handling of the host-associated entities when an internal
procedure is the target of a procedure pointer assignment in this PR.
**Implementation notes:**
1. The implementation is using the HLFIR path.
2. Flang currently uses `getUntypedBoxProcType` to get the
`fir::BoxProcType` for `ProcedureDesignator` when getting the address of
a procedure in order to pass it as an actual argument. This PR inherits
the same design decision for procedure pointer as the `fir::StoreOp`
requires the same memory type.
Note: this commit is actually resubmitting the original commit from
PR #70461 that was reverted. See PR #73221.
**Scope of the PR:**
1. Lowering global and local procedure pointer declaration statement
with explicit or implicit interface. The explicit interface can be from
an interface block, a module procedure or an internal procedure.
2. Lowering procedure pointer assignment, where the target procedure
could be external, module or internal procedures.
3. Lowering reference to procedure pointers so that it works end to end.
**PR notes:**
1. The first commit of the PR does not include testing. I would like to
collect some comments first, which may alter the output. Once I confirm
the implementation, I will add some testing as a follow up commit to
this PR.
2. No special handling of the host-associated entities when an internal
procedure is the target of a procedure pointer assignment in this PR.
**Implementation notes:**
1. The implementation is using the HLFIR path.
2. Flang currently uses `getUntypedBoxProcType` to get the
`fir::BoxProcType` for `ProcedureDesignator` when getting the address of
a procedure in order to pass it as an actual argument. This PR inherits
the same design decision for procedure pointer as the `fir::StoreOp`
requires the same memory type.
MLIR can't really be const-correct (it would need a `ConstValue` class
alongside the `Value` class really, like `ArrayRef` and
`MutableArrayRef`). This is however making is more consistent: method
that are directly modifying the Value shouldn't be marked const.
Currently, when deleting the device functions in the second stage of filtering during MLIR to LLVM translation we can end up with invalid calls to these functions. This is because of the removal of the EarlyOutliningPass which would have otherwise gotten rid of any such calls.
This patch aims to alter the function filtering pass in the following way:
- Any host function is completely removed.
- Call to the host function are also removed and their uses replaced with Undef values.
- Any host function with target region code is marked to be removed during the the second stage.
- Calls to such functions are still removed and their uses replaced with Undef values.
Co-authored-by: Sergio Afonso <sergio.afonsofumero@amd.com>
Change the separator in the `uniqueCGIdent` method to `X`. This change
is required to enable OpenMP offloading for the NVPTX target, as dots
are not valid identifiers in PTX and `uniqueCGIdent` is used to mangle
some literals. Follow up patches will change the remainder of `.`
appearances in names to `X` and add support for the NVPTX target.
This patch removes the OMPEarlyOutliningPass as it is no longer required. The implicit map operand capture has now been moved to the PFT lowering stage.
Depends on #67318.
The stack arrays pass uses data flow analysis to determine whether heap
allocations are freed on all paths out of the function.
`interp_domain_em_part2` in spec2017 wrf generates over 120k operations,
including almost 5k fir.if operations and over 200 fir.do_loop
operations, all in the same function. The MLIR data flow analysis
framework cannot provide reasonable performance for such cases because
there is a combinatorial explosion in the number of control flow paths
through the function, all of which must be checked to determine if the
heap allocations will be freed.
This patch skips the stack arrays pass for ridiculously large functions
(defined as having more than 1000 fir.allocmem operations). This
threshold is configurable at runtime with a command line argument.
With this patch, compiling this file is more than 80% faster.
Propagate fast math flags through complex number lowering (when lowering
fir.*c directly to llvm floating point operations).
The lowering path through the MLIR complex dialect is unchanged.
This leads to a small improvement in spec2017 fotonik3d_r.
The front-end is making implicit conversions explicit in assignment and
structure constructors.
While this generally helps and is needed by semantics to fold structure
constructors correctly, this is incorrect when the LHS or component is
an allocatable. The RHS may have non default lower bounds that should be
propagated to the LHS, and making the conversion explicit changes the
semantics. In the structure constructor, the situation is even worse
since Fortran 2018 7.5.10 point 7 allows the value to be a reference to
an unallocated allocatable, and adding an explicit conversion in
semantics will cause a segfault.
This patch removes the explicit convert in semantics when the
LHS/component is a whole allocatable, and update lowering to deal with
the conversion insertion, dealing with preserving the lower bounds and
the tricky structure constructor case.
These turn out to be useful for spec2017/fotonik3d and safe so long as
they are not used along side TBAA tags for local allocations. LLVM may
be able to figure out local allocations by itself anyway.
PR #68727
!llvm.ptr<T> typed pointers are depreciated in MLIR LLVM dialects. Flang
codegen still generated them and relied on mlir.llvm codegen to LLVM to
turn them into opaque pointers.
This patch update FIR codegen to directly emit and work with LLVM opaque
pointers.
Addresses https://github.com/llvm/llvm-project/issues/69303
- All places generating GEPs need to add an extra type argument with the
base type (the T that was previously in the llvm.ptr<T> of the base).
- llvm.alloca must also be provided the object type. In the process, I
doscovered that we were shamelessly copying all the attribute from
fir.alloca to the llvm.alloca, which makes no sense for the operand
segments. The updated code that cannot take an attribute dictionnary in
the llvm.alloca builder with opaque pointers only propagate the "pinned"
and "bindc_name" attributes to help debugging the generated IR.
- Updating all the places that rely on getting the llvm object type from
lowered llvm.ptr<T> arguments to get it from a type conversion of the
original fir types.
- Updating all the places that were generating llvm.ptr<T> types to
generate the opaque llvm.ptr type.
- Updating all the codegen tests checking generated MLIR llvm dialect.
Many tests are testing directly LLVM IR, and this change is a no-op for
those (which is expected).
Expose a `MutableArrayRef<OpOperand>` instead of
`ValueRange`/`OperandRange`. This allows users of this interface to
change the yielded values and the init values. The names of the
interface methods are the same as the auto-generated op accessor names
(`get...()` returns `OperandRange`, `get...Mutable()` returns
`MutableOperandRange`).
Note: The interface methods return a `MutableArrayRef` instead of a
`MutableOperandRange` because a loop op may not implement
`getYieldedValuesMutable` etc. and there is no safe way to return an
"empty" range with a `MutableOperandRange`.