This patch encapsulate array function call lowering into
hlfir.eval_in_mem and allows directly evaluating the call into the LHS
when possible.
The conditions are: LHS is contiguous, not accessed inside the function,
it is not a whole allocatable, and the function results needs not to be
finalized. All these conditions are tested in the previous hlfir.eval_in_mem
optimization (#118069) that is leveraging the extension of getModRef to
handle function calls(#117164).
This yields a 25% speed-up on polyhedron channel2 benchmark (from 1min
to 45s measured on an X86-64 Zen 2).
This alternative loop nest generation is used to generate an OpenMP loop nest instead of fir loops to facilitate parallelizing statements in an OpenMP `workshare` construct.
Currently, it is not possible to distinguish between BIND(C) from
non-BIND(C) type bound procedure call at the FIR level.
This will be a problem when dealing with derived type BIND(C) function
where the ABI differ between BIND(C)/non-BIND(C) but the FIR signature
looks like the same at the FIR level.
Fix this by adding the Fortran procedure attributes to fir.distpatch,
and propagating it until the related fir.call is generated in
fir.dispatch codegen.
Follow-up from a previous patch that turned bind_c into an enum for
procedure attribute.
This patch carries the elemental and pure Fortran attribute into FIR so
that the optimizer can leverage that info in the future (I think debug
info may also need to know these aspects since DWARF has DW_AT_elemental
and DW_AT_pure nodes).
SIMPLE from F2023 will be translated once it is handled in the
front-end.
NON_RECURSIVE is only meaningful on func.func since we are not
guaranteed to know that aspect on the caller side (it is not part of
Fortran characteristics). There is a DW_AT_recursive DWARF node. I will
do it while dealing with func.func attributes.
stack save/restore emitted for character elemental function result
allocation inside hlfir.elemental in lowering created memory bugs
because result memory is actually still used after the stack restore
when lowering the elemental into a loop where the result element is
copied into the array result storage.
Instead of adding special handling for stack save/restore in lowering,
just avoid emitting those since the stack reclaim pass is able to emit
them in the generated loop. Not having those stack save/restore will
also help optimizations that want to elide the temporary allocation for
the element result when that is possible.
The test code with ignore_tkr(tk) on character dummy passed by
fir.boxchar<> was crashing the compiler in [an
assert](2afe678f0a/flang/lib/Optimizer/Dialect/FIRType.cpp (L632))
in `changeElementType`.
It makes little sense to call changeElementType on a fir.boxchar since
this type is lossy (the shape is not part of it). Just skip it in the
code dealing with ignore(tk) when hitting this case
The new LLVM stack save/restore intrinsic operations are more convenient
than function calls because they do not add function declarations to the
module and therefore do not block the parallelisation of passes.
Furthermore they could be much more easily marked with memory effects
than function calls if that ever proved useful.
This builds on top of #107879.
Resolves#108016
First patch to fix a BIND(C) ABI issue
(https://github.com/llvm/llvm-project/issues/102113). I need to keep
track of BIND(C) in more locations (fir.dispatch and func.func
operations), and I need to fix a few passes that are dropping the
attribute on the floor. Since I expect more procedure attributes that
cannot be reflected in mlir::FunctionType will be needed for ABI,
optimizations, or debug info, this NFC patch adds a new enum attribute
to keep track of procedure attributes in the IR.
This patch is not updating lowering to lower more attributes, this will
be done in a separate patch to keep the test changes low here.
Adding the attribute on fir.dispatch and func.func will also be done in
separate patches.
When passing a polymorphic actual array argument to an non polymorphic
explicit or assumed shape argument, copy-in/copy-out may be required and
should be made according to the dummy dynamic type.
The code that was creating the descriptor to drive this copy-in/out was
not handling properly the case where the dummy and actual rank do not
match (possible according to sequence association rules), it tried to
make the copy-in/out according to the dummy argument shape (which we may
not even know if the dummy is assumed-size). Fix this by using the
actual shape when creating this new descriptor with the dummy argument
dynamic type.
Until genSecond, all intrinsic `genXXX` returning scalar intrinsic
(except NULL) were returning them as value.
The code calling genIntrinsicCall is using that assumption when
generation the asExprOp because hflir.expr<> of scalar are badly
supported in tools (I should likely just forbid them all together), the
type is meant for "non trivial" values: arrays, character, and derived
type. For instance, the added tests crashed with error: `'arith.subf' op
operand #0 must be floating-point-like, but got '!hlfir.expr<f32>'`
Load the result in genSecond and add an assert after genIntrinsicCall to
better enforce this.
Currently, all procedures from intrinsic modules that are not BIND(C)
are expected to be intercepted by the compiler in lowering and to have a
handler in IntrinsicCall.cpp.
As more "intrinsic" modules are being added (OpenMP, OpenACC, CUF, ...),
this requirement is preventing seamless implementation of intrinsic
modules in Fortran. Procedures from intrinsic modules are different from
generic intrinsics defined in section 16 of the standard. They are
declared in Fortran file seating in the intrinsic module directory and
inside the compiler they look like regular user call except for the
INTRINSIC attribute set on their module. So an easy implementation is
just to have the implementation done in Fortran and linked to the
runtime without any need for the compiler to necessarily understand and
handle these calls in special ways.
This patch splits the lookup and generation part of IntrinsicCall.cpp so
that it can be allowed to only intercept calls to procedure from
intrinsic module if they have a handler. Otherwise, the assumption is
that they should be implemented in Fortran.
Add explicit TODOs handler for the IEEE procedure that are known to not
yet been implemented and won't be implemented via Fortran code so that
this patch is an NFC for what is currently supported.
This patch also prevents doing two lookups in the intrinsic table (There
was one to get argument lowering rules, and another one to generate the
code).
Just remove the TODO and add a test.
There is nothing special to do to deal with assumed-rank copy-in/out
after the previous copy-in/out API change in
https://github.com/llvm/llvm-project/pull/95822.
The runtime API for copy-in copy-out currently only has an entry only
for the copy-out. This entry has a "skipInit" boolean that is never set
to false by lowering and it does not deal with the deallocation of the
temporary.
The generated code was a mix of inline code and runtime calls This is not a big deal,
but this is unneeded compiler and generated code complexity.
With assumed-rank, it is also more cumbersome to establish a
temporary descriptor.
Instead, this patch:
- Adds a CopyInAssignment API that deals with establishing the temporary
descriptor and does the copy.
- Removes unused arg to CopyOutAssign, and pushes
destruction/deallocation responsibility inside it.
Note that this runtime API are still not responsible for deciding the
need of copying-in and out. This is kept as a separate runtime call to
IsContiguous, which is easier to inline/replace by inline code with the
hope of removing the copy-in/out calls after user function inlining.
@vzakhari has already shown that always inlining all the copy part
increase Fortran compilation time due to loop optimization attempts for
loops that are known to have little optimization profitability (the
variable being copied from and to is not contiguous).
Deal with the cases where lower bounds, or attribute, or dynamic type
must be updated when passing an assumed-rank actual argument to an
assumed-rank dummy argument.
copy-in/copy-out and passing target assumed-rank to intent(in) pointers
will be handled in separate patch.
Prepare the argument and map them to their corresponding dummy symbol in
order to lower the specification expression of the function result.
Extract the preparation of arguments according to the interface to its
own function to be reused.
It seems there is no need to conditionally compute the length on the
input since all the information comes from the CharBoxValue or the
descriptor for cases where the number of element could be 0.
The BIND(C) attribute attached to a function can be lost when we do
indirect call. This information might be useful for codegen that have
specific ABI. This patch carry over the BIND(C) information to the
fir.call operation.
Handle lowering of non optional inquired argument in custom lowering.
Also fix an issue in the lowering of associated optional argument where
a box was emboxed again which led to weird result.
The number of operations dedicated to CUF grew and where all still in
FIR. In order to have a better organization, the CUF operations,
attributes and code is moved into their specific dialect and files. CUF
dialect is tightly coupled with HLFIR/FIR and their types.
The CUF attributes are bundled into their own library since some
HLFIR/FIR operations depend on them and the CUF dialect depends on the
FIR types. Without having the attributes into a separate library there
would be a dependency cycle.
The HLFIR pass lowering WHERE (hlfir.where op) was too aggressive in its
hoisting of scalar sub-expressions from LHS/RHS/MASKS outside of the
loops generated for the WHERE construct.
This violated F'2023 10.2.3.2 point 10 that stipulated that elemental
operations must be evaluated only for elements corresponding to true
values, because scalar operations are still elemental, and hoisting them
is invalid if they could have side effects (e.g, division by zero) and
if the MASK is always false (i.e., the loop body is never evaluated).
The difficulty is that 10.2.3.2 point 9 mandates that nonelemental
function must be evaluated before the loops. So it is not possible to
simply stop hoisting non hlfir.elemental operations.
Marking calls with an elemental/nonelemental attribute would not allow
the pass to be correct if inlining is run before and drops this
information, beside, extracting the argument tree that may have been
CSE-ed with the rest of the expression evaluation would be a bit
combursome.
Instead, lower nonelemental calls into a new hlfir.exactly_once
operation that will allow retaining the information that the operations
contained inside its region must be hoisted. This allows inlining to
operate before if desired in order to improve alias analysis.
The LowerHLFIROrderedAssignments pass is updated to only hoist the
operations contained inside hlfir.exactly_once bodies.
This PR is to fix issue #88889 .
Because the type of an actual argument of an array expression of
character has type of `hlfir::ExprType`, we need to transform it to
`fir::SequenceType` before calling `getBoxTypeWithNewShape`.
Calling `hlfir::ExprType::getShape` inside of `getBoxTypeWithNewShape`
will introduce a circular dependency on FIRDialect and HLFIRDialect
libraries.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc75.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
Fortran mandates "CHARACTER(1), VALUE" be passed as a C "char" in calls
to BIND(C) procedures (F'2023 18.3.7 (4)). Lowering passed them by
memory instead. Update call interface lowering code to pass them by
register. Fix related test and update it to use HLFIR.
The current lowering did not handle sequence associated argument passed
by descriptor. This case is special because sequence association implies
that the actual and dummy argument need to to agree in rank and shape.
Usually, arguments that can be sequence associated are passed by raw
address, and the shape mistmatch is transparent. But there are three
cases of explicit and assumed-size arrays passed by descriptors:
- polymorphic arguments
- BIND(C) assumed-length arguments (F'2023 18.3.7 (5)).
- length parametrized derived types (TBD)
The callee side is expecting a descriptor containing the dummy rank and
shape. This was not the case. This patch fix that by evaluating the
dummy shape on the caller side using the interface (that has to be
available when arguments are passed by descriptors).
Passing `TYPE(*)`actual to `TYPE(*)` dummy was left TODO. Implement it.
The difference with other actual arguments is that `TYPE(*)` are not
represented as Fortran::evaluate::Expr<T>, so inquiries on
evaluate::Expr<T> must be updated to use evaluate::ActualArgument or
also handle semantics::Symbol case (except in portion of the code where
`TYPE(*)` is impossible, where asserts are added).
This patch introduces a new `fir.cuda_kernel_launch` operation to
represents the call to CUDA kernels with the chervon notation. The
chevrons values in the parse tree can be scalar integer expr or dim3
derived type. The operation describes the grid/block values explicitly
as i32 values.
It lowers the parse-tree call to this new operation.
Flang crashes with the following case. The problem is we missed the case
when passing a reference to a function that returns a procedure pointer
as actual that corresponds to a procedure dummy. This PR is to fix that.
```
PROGRAM main
IMPLICIT NONE
INTERFACE
FUNCTION IntF(Arg)
integer :: Arg, IntF
END FUNCTION
END INTERFACE
INTERFACE
FUNCTION RetPtr(Arg)
IMPORT
PROCEDURE(IntF) :: Arg
PROCEDURE(IntF), POINTER :: RetPtr
END FUNCTION
END INTERFACE
CALL ModSub(RetPtr(IntF))
contains
SUBROUTINE ModSub(Fun1)
PROCEDURE(IntF) :: Fun1
END SUBROUTINE
END
```
Fortran requires finalizing function results when the result type have
final procedures.
Lowering was unconditionally "moving" function results into values
"hlfir.expr". This is not correct when the results are finalized because
it means the function result storage will be used after the hlfir.expr.
Only move function results that are not finalized.
Procedure pointer lowering used `prepareUserCallActualArgument` because
it was convenient, but this helper was not meant for POINTERs when
originally written and it did not handled passing NULL to an OPTIONAL
procedure pointer correctly.
The resulting argument should be a disassociated pointer, not an absent
pointer (Fortran 15.5.2.12 point 1.).
Move the logic for procedure pointer argument "cooking" in its own
helper to avoid triggering the logic that created an absent argument in
this case.
This PR adds lowering the reference to a function that returns a
procedure pointer. It also fixed intrinsic ASSOCIATED to take such
argument.
---------
Co-authored-by: jeanPerier <jperier@nvidia.com>
Start implementing assumed-rank support as described in
https://github.com/llvm/llvm-project/blob/main/flang/docs/AssumedRank.md
This commit holds the minimal support for lowering calls to procedure
with assumed-rank arguments where the procedure implementation is done
in C.
The case for passing assumed-size to assumed-rank is left TODO since it
will be done a change in assumed-size lowering that is better done in
another patch.
Care is taken to set the lower bounds to zero when passing non allocatable no pointer as descriptor
to a BIND(C) procedure as required per 18.5.3 point 3. This was not done before while the requirements also applies to non assumed-rank descriptors. This change required special attention with IGNORE_TKR(t) to avoid emitting invalid fir.rebox operations (the actual argument type must be used in this case as the output type).
Implementation of Fortran procedure with assumed-rank arguments is still
TODO.
This is a lot less complex than the data case where the shape has to be
accounted for, so the implementation is done inline.
One corner case will not be supported correctly for now: the case where
POINTER and TARGET points to the same internal procedure may return
false because lowering is creating fir.embox_proc each time the address
of an internal procedure is taken, so different thunk for the same
internal procedure/host link may be created and compare to false. This
will be fixed in a later patch that moves creating of internal procedure
fir.embox_proc in the host so that the addresses are the same when the
host link is the same. This change is required to properly support the
required lifetime of internal procedure addresses anyway (should be the
always be the lifetime of the host, even when the address is taken in an
internal procedure).
Lower procedure pointer components, except in the context of structure
constructor (left TODO).
Procedure pointer components lowering share most of the lowering logic
of procedure poionters with the following particularities:
- They are components, so an hlfir.designate must be generated to
retrieve the procedure pointer address from its derived type base.
- They may have a PASS argument. While there is no dispatching as with
type bound procedure, special care must be taken to retrieve the derived
type component base in this case since semantics placed it in the
argument list and not in the evaluate::ProcedureDesignator.
These components also bring a new level of recursive MLIR types since a
fir.type may now contain a component with an MLIR function type where
one of the argument is the fir.type itself. This required moving the
"derived type in construction" stackto the converter so that the object
and function type lowering utilities share the same state (currently the
function type utilty would end-up creating a new stack when lowering its
arguments, leading to infinite loops). The BoxedProcedurePass also
needed an update to deal with this recursive aspect.
VALUE derived type are passed by reference outside of BIND(C) interface.
The ABI is much simpler and it is possible for these arguments to have
the OPTIONAL attribute.
In the BIND(C) context, these arguments must follow the C ABI for
struct, which may lead the data to be passed in register. OPTIONAL is
also forbidden for those arguments, so it is safe to directly use the
fir.type<T> type for the func.func argument.
Codegen is in charge of later applying the C passing ABI according to
the target (https://github.com/llvm/llvm-project/pull/74829).
This patch is fixing two issue relative to the dynamic dispatch for
polymorphic entities.
1. Fix the `requireDispatchCall` function. It was checking for the first
symbol of the component but this is not the one to be checked. Instead
the last symbol of the base of the component object is the one to check
to know if it is polymorphic object with a dispatch call or not. This is
demonstrated in the new added test in `flang/test/Lower/dispatch.f90`
where the first symbol would point to `q` which is monomorphic and would
result in a simple `fir.call`
2. Fix the pass object in a no pass situation. In a no pass situation
the pass object is lowered anyway to be able to do the lookup in the
binding table. It was previously lowered wrongly an lead to unresolved
lookup. The base of the component is the passed object and should be
lowered. To achieve this, the `gen(DataRef)` entry point is exposed form
`ConvertExprToHLFIR` through a `convertDataRefToValue` function. The
same test added in `flang/test/Lower/dispatch.f90` is checking for the
correct passed object.
In addition couple of tests were updated to HLFIR since the lowering
used only works with it.
This got "lost" in the HLFIR transformation. This patch applies the old
attribute to the AssociateOp that needs it, and forwards it to the
AllocaOp that is generated when lowering to FIR.
**Scope of the PR:**
1. Lowering global and local procedure pointer declaration statement
with explicit or implicit interface. The explicit interface can be from
an interface block, a module procedure or an internal procedure.
2. Lowering procedure pointer assignment, where the target procedure
could be external, module or internal procedures.
3. Lowering reference to procedure pointers so that it works end to end.
**PR notes:**
1. The first commit of the PR does not include testing. I would like to
collect some comments first, which may alter the output. Once I confirm
the implementation, I will add some testing as a follow up commit to
this PR.
2. No special handling of the host-associated entities when an internal
procedure is the target of a procedure pointer assignment in this PR.
**Implementation notes:**
1. The implementation is using the HLFIR path.
2. Flang currently uses `getUntypedBoxProcType` to get the
`fir::BoxProcType` for `ProcedureDesignator` when getting the address of
a procedure in order to pass it as an actual argument. This PR inherits
the same design decision for procedure pointer as the `fir::StoreOp`
requires the same memory type.
Note: this commit is actually resubmitting the original commit from
PR #70461 that was reverted. See PR #73221.
**Scope of the PR:**
1. Lowering global and local procedure pointer declaration statement
with explicit or implicit interface. The explicit interface can be from
an interface block, a module procedure or an internal procedure.
2. Lowering procedure pointer assignment, where the target procedure
could be external, module or internal procedures.
3. Lowering reference to procedure pointers so that it works end to end.
**PR notes:**
1. The first commit of the PR does not include testing. I would like to
collect some comments first, which may alter the output. Once I confirm
the implementation, I will add some testing as a follow up commit to
this PR.
2. No special handling of the host-associated entities when an internal
procedure is the target of a procedure pointer assignment in this PR.
**Implementation notes:**
1. The implementation is using the HLFIR path.
2. Flang currently uses `getUntypedBoxProcType` to get the
`fir::BoxProcType` for `ProcedureDesignator` when getting the address of
a procedure in order to pass it as an actual argument. This PR inherits
the same design decision for procedure pointer as the `fir::StoreOp`
requires the same memory type.
When attempting to lower an intrinsic module procedure call, take into
account bind(c). Such procedures cannot be lowered as intrinsics since
their implementation is external to the module.
With this solution, the hardcoded "omp_lib" string can be removed when
checking if intrinsic module procedure since all procedure interfaces in
that module use bind(c).
In case of small interface mismatches between a function on the caller
and callee side, lowering insert converts. These are very often no-ops
at runtime (casting a descriptor to a descriptor), but they matter in
the strongly type IR.
The IR type of an object argument of a fir.dispatch must be the one of
the object, not the one of the callee side dummy, which may differ in
case of mismatches. Otherwise, the codgeneration of fir.dispatch cannot
succeed (it will not access the right binding tables).
I missed that vector subscripted arguments must still be passed by
address in an elemental call where the dummy argument does not have
the VALUE attribute.
Update PreparedActualArgument to hold an hlfir::Entity or an
hlfir::ElementalOp and to inline the elementalOp body in `getActual`.