The HLFIR pass lowering WHERE (hlfir.where op) was too aggressive in its
hoisting of scalar sub-expressions from LHS/RHS/MASKS outside of the
loops generated for the WHERE construct.
This violated F'2023 10.2.3.2 point 10 that stipulated that elemental
operations must be evaluated only for elements corresponding to true
values, because scalar operations are still elemental, and hoisting them
is invalid if they could have side effects (e.g, division by zero) and
if the MASK is always false (i.e., the loop body is never evaluated).
The difficulty is that 10.2.3.2 point 9 mandates that nonelemental
function must be evaluated before the loops. So it is not possible to
simply stop hoisting non hlfir.elemental operations.
Marking calls with an elemental/nonelemental attribute would not allow
the pass to be correct if inlining is run before and drops this
information, beside, extracting the argument tree that may have been
CSE-ed with the rest of the expression evaluation would be a bit
combursome.
Instead, lower nonelemental calls into a new hlfir.exactly_once
operation that will allow retaining the information that the operations
contained inside its region must be hoisted. This allows inlining to
operate before if desired in order to improve alias analysis.
The LowerHLFIROrderedAssignments pass is updated to only hoist the
operations contained inside hlfir.exactly_once bodies.
When generating block terminators in `genFIR(Evaluation)`, treat
`Directives` with nested evaluations the same way as `Constructs` to
determine the successor block.
This fixes https://github.com/llvm/llvm-project/issues/91526
The lowering produces fir.dummy_scope operation if the current
function has dummy arguments. Each hlfir.declare generated
for a dummy argument is then using the result of fir.dummy_scope
as its dummy_scope operand. This is only done for HLFIR.
I was not able to find a reliable way to identify dummy symbols
in `genDeclareSymbol`, so I added a set of registered dummy symbols
that is alive during the variables instantiation for the current
function. The set is initialized during the mapping of the dummy
argument symbols to their MLIR values. It is reset right after
all variables are instantiated - this is done to avoid generating
hlfir.declare operations with dummy_scope for the clones of
the dummy symbols (e.g. this happens with OpenMP privatization).
If this can be done in a cleaner way, please advise.
`loopEval` was declared inside the for loop to iterate over the nested
loops so the same loop control was redeclared for each level of the loop
nest. Make sure we are iterating over all the loops by putting
`loopEval` declaration ouside of the for loop.
Make source code locations for IF statements and IF construct component
statements more accurate. Make similar changes to ASSOCIATE, BLOCK, and
SELECT TYPE construct component statements.
Current implementation of default clause privatization incorrectly fails
to privatize in presence of non-OpenMP constructs (i.e. nested
constructs with regions whose symbols need to be privatized in the scope
of the parent OpenMP construct). This patch fixes the same by
considering non-OpenMP constructs separately by collecting symbols of a
nested region if it is a non-OpenMP construct with a region, and
privatizing it in the scope of the parent OpenMP construct.
Fixes https://github.com/llvm/llvm-project/issues/71914 and
https://github.com/llvm/llvm-project/issues/71915
A double pointer was being passed to the call to FortranStart rather than just a pointer to the EnvironmentDefaults.list. This now passes `null` directly when there's no EnvironmentDefaults.list and passes the list directly when there is, removing the original global variable which was a pointer to a pointer containing null or the EnvironmentDefaults.list global.
Fixes#90537
This patch changes the behaviour for flang to only create and link to a
`main` entry point when the Fortran code has a program statement in it.
This means that flang-new can be used to link even when the program is
a mixed C/Fortran code with `main` present in C and no entry point
present in Fortran.
This also removes the `-fno-fortran-main` flag as this no longer has any
functionality.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc75.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
When creating temporaries for implicit transfer, the newly create
hlfir.declare operation was missing some information like the shape and
the verifier was throwing an error. Fix it by making sure we have an
ExtendedValue when calling addSymbol to register the temp.
```
error: loc("cuda-data-transfer.cuf":67:22): 'hlfir.declare' op of array entity
with a raw address base must have a shape operand that is a shape or shapeshift
```
Thanks @jeanPerier for the advice!
FYI @ImanHosseini
When the characteristics of a procedure depend on a procedure that
hasn't yet been defined, the compiler currently emits an unconditional
error message. This includes the case of a procedure whose
characteristics depend, perhaps indirectly, on itself. However, in the
case where the characteristics of a procedure are needed to resolve a
generic, we should not emit an error for a hitherto undefined procedure
-- either the call will resolve to another specific procedure, in which
case the error is spurious, or it won't, and then an error will issue
anyway.
Fixes https://github.com/llvm/llvm-project/issues/88677.
llvm-project/flang/lib/Lower/Bridge.cpp:3775:14:
error: variable 'nbDeviceResidentObject' set but not used [-Werror,-Wunused-but-set-variable]
unsigned nbDeviceResidentObject = 0;
^
1 error generated.
Add more support for CUDA data transfer in assignment. This patch adds
device to device and device to host support. If device symbols are
present on the rhs, some implicit data transfer are initiated. A
temporary is created and the data are transferred to the host. The
expression is evaluated on the host and the assignment is done.
Whenever lowering is checking if a function or global already exists in
the mlir::Module, it was doing module->lookup.
On big programs (~5000 globals and functions), this causes important
slowdowns because these lookups are linear. Use mlir::SymbolTable to
speed-up these lookups. The SymbolTable has to be created from the
ModuleOp and maintained in sync. It is therefore placed in the
converter, and FirOPBuilders can take a pointer to it to speed-up the
lookups.
This patch does not bring mlir::SymbolTable to FIR/HLFIR passes, but
some passes creating a lot of runtime calls could benefit from it too.
More analysis will be needed.
As an example of the speed-ups, this patch speeds-up compilation of
Whizard compare_amplitude_UFO.F90 from 5 mins to 2 mins on my machine
(there is still room for speed-ups).
This PR fixes `not yet implemented: procedure pointer component in
structure constructor` as shown in the following test case.
```
MODULE M
TYPE :: DT
PROCEDURE(Fun), POINTER, NOPASS :: pp1
END TYPE
CONTAINS
INTEGER FUNCTION Fun(Arg)
INTEGER :: Arg
Fun = Arg
END FUNCTION
END MODULE
PROGRAM MAIN
USE M
IMPLICIT NONE
TYPE (DT) :: v2
PROCEDURE(FUN), POINTER :: pp2
v2 = DT(pp2)
v2 = DT(bar())
CONTAINS
FUNCTION BAR() RESULT(res)
PROCEDURE(FUN), POINTER :: res
END
END
```
In CUDA Fortran data transfer can be done via assignment statements
between host and device variables.
This patch introduces a `fir.cuda_data_transfer` operation that
materialized the data transfer between two memory references.
Simple transfer not involving descriptors from host to device are also
lowered in this patch. When the rhs is an expression that required an
evaluation, a temporary is created. The evaluation is done on the host
and then the transfer is initiated.
Implicit transfer when device symbol are present on the rhs is not part
of this patch. Transfer from device to host is not part of this patch.
Cray pointee symbols can be host associated from a module or host
procedure while the related cray pointer is not explicitly associated.
This caused the "not yet implemented: lowering symbol to HLFIR" to fire
when lowering a reference to the cray pointee and fetching the cray
pointer.
This patch:
- Ensures cray pointers are always instantiated when instantiating a
cray pointee.
- Fix internal procedure lowering to deal with cray pointee host
association like it does for pointers (the lowering strategy for cray
pointee is to create a pointer that is updated with the cray pointer
value before being fetched).
This should fix the bug reported in
https://github.com/llvm/llvm-project/issues/85420.
Parsing of the cuf kernel loop directive has been updated to handle
variants with the * syntax. This patch updates the lowering to make use
of them.
- If the grid or block syntax uses only stars then the operation
variadic operand remains empty.
- If there is values and stars, then stars are represented as a zero
constant value.
Modifies the privatization logic so that the emitted code only used the
HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before
that, that emitted privatization logic was a mix of using `#0` and `#1`
which leads to some difficulties trying to move to delayed privatization
(see the discussion on #84033).
This patch seeks to create a process that happens on module finalization
for OpenMP, in which a list of operations that had declare target
directives applied to them and were not generated at the time of
processing the original declare target directive are re-checked to apply
the appropriate declare target semantics.
This works by maintaining a vector of declare target related data inside
of the FIR converter, in this case the symbol and the two relevant
unsigned integers representing the enumerators. This vector is added to
via a new function called from Bridge.cpp, insertDeferredDeclareTargets,
which happens prior to the processing of the directive (similarly to
getDeclareTargetFunctionDevice currently for requires), it effectively
checks if the Operation the declare target directive is applied to
currently exists, if it doesn't it appends to the vector. This is a
seperate function to the processing of the declare target via the
overloaded genOMP as we unfortunately do not have access to the list
without passing it through every call, as the AbstractConverter we pass
will not allow access to it (I've seen no other cases of casting it to a
FirConverter, so I opted to not do that).
The list is then processed at the end of the module in the
finalizeOpenMPLowering function in Bridge by calling a new function
markDelayedDeclareTargetFunctions which marks the latently generated
operations. In certain cases, some still will not be generated, e.g. if
an interface is defined, marked as declare target, but has no definition
or usage in the module then it will not be emitted to the module, so due
to these cases we must silently ignore when an operation has not been
found via it's symbol.
The main use-case for this (although, I imagine there is others) is for
processing interfaces that have been declared in a module with a declare
target directive but do not have their implementation defined in the
same module. For example, inside of a seperate C++ module that will be
linked in. In cases where the interface is called inside of a target
region it'll be marked as used on device appropriately (although,
realistically a user should explicitly mark it to match the
corresponding definition), however, in cases where it's used in a
non-clear manner through something like a function pointer passed to an
external call we require this explicit marking, which this patch adds
support for (currently will cause the compiler to crash).
This patch also adds documentation on the declare target process and
mechanisms within the compiler currently.
Internal procedures cannot be called directly from outside the host
procedure, so there is no point giving them external linkage. The only
reason flang did is because it is the default in MLIR.
Giving external linkage to them:
- prevents deleting them when not used/inlined by LLVM
- causes bugs with shared libraries (at least on linux x86-64) because
the call to the internal function could lead to a dynamic loader call
that would overwrite r10 register (the static chain pointer) due to
system calls and did not restore (it seems it does not expect r10 to be
used for PLT calls).
This patch gives internal linkage to internal procedures:
Note: the llvm.linkage attribute name cannot be obtained via a
getLinkageAttrName since it is not the same name as the one used in the
LLVM dialect. It is just a placeholder defined in
mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp until the func dialect
gets a real linkage model. So simply avoid hard coding it too many times
in lowering.
Adds basic support for emitting delayed privatizers from flang. So far,
only types of symbols are supported (i.e. scalars), support for more
complicated types will be added later. This also makes sure that
reduction and delayed privatization work properly together by merging
the
body-gen callbacks for both in case both clauses are present on the
parallel construct.
This patch introduces a new operation to represent the CUDA Fortran
kernel loop directive. This operation is modeled as a LoopLikeOp
operation in a similar way to acc.loop.
The CUFKernelDoConstruct parse tree node is also placed correctly in the
PFTBuilder to be available in PFT evaluations.
Lowering from the flang parse-tree to MLIR is also done.
Add initial handling of OpenMP copyprivate clause in Flang.
When lowering copyprivate, Flang generates the copy function
needed by each variable and builds the appropriate
omp.single's CopyPrivateVarList.
This is patch 3 of 4, to add support for COPYPRIVATE in Flang.
Original PR: https://github.com/llvm/llvm-project/pull/73128
When doing a pointer assignment with an RHS that is an array section,
the code fell in the legacy lowering code even with HLFIR enabled.
Escape this old code when HLFIR is on.
Should fix#80884.
… lower
The convention is to pass it after "symTable" if present, otherwise
after "converter":
- converter, symTable, semaCtx
- converter, semaCtx
This makes the interfaces more uniform---some of these functions were
already taking the semantics context, while others were not.
The context will be used in future patches.
The current code that handles NULLIFY statement for procedure pointer
returns after the 1st object.
This PR is to remove the `return` so it can nullify multiple procedure
pointer objects.
Runtime globals are compiler generated globals injected in user scopes.
They are never referred to directly in lowering code, we only need th
fur.global for them. Yet lowering was creating hlfir.declare for them in
module procedures. In modern fortran apps, this blows up the generated
IR for nothing (Types with dozens of components, type bound procedures
and parents can create in the order of 10 000 runtime info globals to
describe them, if there is a 100 module procedure, that is that is a few
million operations generated and processed in each pass for nothing).
This patch forwards the target CPU and features information from the
Flang frontend to MLIR func.func operation attributes, which are later
used to populate the target_cpu and target_features llvm.func
attributes.
This is achieved in two stages:
1. Introduce the `fir.target_cpu` and `fir.target_features` module
attributes with information from the target machine immediately after
the initial creation of the MLIR module in the lowering bridge.
2. Update the target rewrite flang pass to get this information from the
module and pass it along to all func.func MLIR operations, respectively
as attributes named `target_cpu` and `target_features`. These attributes
will be automatically picked up during Func to LLVM dialect lowering and
used to initialize the corresponding llvm.func named attributes.
The target rewrite and FIR to LLVM lowering passes are updated with the
ability to override these module attributes, and the `CodeGenSpecifics`
optimizer class is augmented to make this information available to
target-specific MLIR transformations.
This completes a full flow by which target CPU and features make it all
the way from compiler options to LLVM IR function attributes.
Compiler was rewriting SIZE(PACK(x, MASK)) to COUNT(MASK). It was
wrapping the COUNT call without a KIND argument (leading to INTEGER(4)
result in the characteristics) in an Expr<ExtentType> (implying
INTEGER(8) result), this lead to inconsistencies that later hit verifier
errors in lowering.
Set the KIND argument to the KIND of ExtentType to ensure the built
expression is consistent.
This requires giving access to some safe place where the "kind" name can
be saved and turned into a CharBlock (count has a DIM argument that
require using the KIND keyword here). For the FoldingContext that belong
to SemanticsContext, this is the same string set as the one used by
SemanticsContext for similar purposes.
acc.loop was redesigned in https://reviews.llvm.org/D159229. This patch
updates the lowering to match the new op.
DO CONCURRENT construct will be added in a follow up patch.
Note that the pre-commit ci will fail until D159229 is merged.
Depends on #67355
Introduce `genNestedEvaluations` that will lower all evaluations nested
in the given, accouting for a potential COLLAPSE directive.
Recursive lowering [2/5]
Add `notify-type` to `iso_fortran_env` module. Add `notify-wait-stmt` to
the parser and add checks for constraints on the statement, `C1177` and
`C1178`, from the Fortran 2023 standard. Add three semantics tests for
`notify-wait-stmt`.
Lower procedure pointer components, except in the context of structure
constructor (left TODO).
Procedure pointer components lowering share most of the lowering logic
of procedure poionters with the following particularities:
- They are components, so an hlfir.designate must be generated to
retrieve the procedure pointer address from its derived type base.
- They may have a PASS argument. While there is no dispatching as with
type bound procedure, special care must be taken to retrieve the derived
type component base in this case since semantics placed it in the
argument list and not in the evaluate::ProcedureDesignator.
These components also bring a new level of recursive MLIR types since a
fir.type may now contain a component with an MLIR function type where
one of the argument is the fir.type itself. This required moving the
"derived type in construction" stackto the converter so that the object
and function type lowering utilities share the same state (currently the
function type utilty would end-up creating a new stack when lowering its
arguments, leading to infinite loops). The BoxedProcedurePass also
needed an update to deal with this recursive aspect.
The function `instantiateVariable` in Bridge.cpp has the following code:
```
if (var.getSymbol().test(
Fortran::semantics::Symbol::Flag::OmpThreadprivate))
Fortran::lower::genThreadprivateOp(*this, var);
if (var.getSymbol().test(
Fortran::semantics::Symbol::Flag::OmpDeclareTarget))
Fortran::lower::genDeclareTargetIntGlobal(*this, var);
```
Implement `handleOpenMPSymbolProperties` in OpenMP.cpp, move the above
code there, and have `instantiateVariable` call this function instead.
This would further separate OpenMP-related details into OpenMP.cpp.
There was some discussion on discourse[1] about allowing call to FIR
generation functions from other part of lowering belonging to OpenMP.
This solution exposes a simple `genEval` member function on the
`AbstractConverter` so that IR generation for PFT Evaluation objects can
be called from lowering outside of the FirConverter but not exposing it.
[1] https://discourse.llvm.org/t/openmp-lowering-from-pft-to-fir/75263
Preliminary patch to change lowering/code generation to use
llvm::DataLayout information instead of generating "sizeof" GEP (see
https://github.com/llvm/llvm-project/issues/71507).
Fortran Semantic analysis needs to know about the target type size and
alignment to deal with common blocks, and intrinsics like
C_SIZEOF/TRANSFER. This information should be obtained from the
llvm::DataLayout so that it is consistent during the whole compilation
flow.
This change is changing flang-new and bbc drivers to:
1. Create the llvm::TargetMachine so that the data layout of the target
can be obtained before semantics.
2. Sharing bbc/flang-new set-up of the
SemanticConstext.targetCharateristics from the llvm::TargetMachine. For
now, the actual part that set-up the Fortran type size and alignment
from the llvm::DataLayout is left TODO so that this change is mostly an
NFC impacting the drivers.
3. Let the lowering bridge set-up the mlir::Module datalayout attributes
since it is doing it for the target attribute, and that allows the llvm
data layout information to be available during lowering.
For flang-new, the changes are code shuffling: the `llvm::TargetMachine`
instance is moved to `CompilerInvocation` class so that it can be used
to set-up the semantic contexts. `setMLIRDataLayout` is moved to
`flang/Optimizer/Support/DataLayout.h` (it will need to be used from
codegen pass for fir-opt target independent testing.)), and the code
setting-up semantics targetCharacteristics is moved to
`Tools/TargetSetup.h` so that it can be shared with bbc.
As a consequence, LLVM targets must be registered when running
semantics, and it is not possible to run semantics for a target that is
not registered with the -triple option (hence the power pc specific
modules can only be built if the PowerPC target is available.