The number of operations dedicated to CUF grew and they were all still in
FIR. To improve the organization, the CUF operations,
attributes, and code are moved into their own dialect and files. The CUF
dialect is tightly coupled with HLFIR/FIR and their types.
The CUF attributes are bundled into their own library since some
HLFIR/FIR operations depend on them while the CUF dialect depends on the
FIR types. Without putting the attributes in a separate library there
would be a dependency cycle.
This PR is an implementation of the changes proposed in
https://discourse.llvm.org/t/rfc-distinguish-between-data-and-non-data-in-fir-alias-analysis/78759
Tests were updated where the query was made on the wrong reference, so
it is my hope that this will clear up ambiguity about the nature of the
queries from here on.
Some TODOs were also addressed.
It also partly implements what
https://github.com/llvm/llvm-project/pull/87723 is attempting to
accomplish: at least for a point-to-point query between references, the
distinction is made. Applying it to TBAA would be another PR.
Note that the changes to the TBAA code were kept minimal to retain the
current results.
This is the same as #90905 with an added fix. The issue was that we
generated variable info even when the user asked for line-tables-only.
This caused the LLVM DWARF generation code to fail an assertion, as it
expected an empty variable list.
This is fixed by not generating debug info for variables when the user
wants only the line table. I also updated a test check for this case.
This patch adds support for the GNU intrinsic extension ETIME
(https://github.com/llvm/llvm-project/issues/84205). Usage information
and an example have been added to `flang/docs/Intrinsics.md`. The patch
contains both the lowering and the runtime code and works on both
Windows and Linux.
| System  | Implementation  |
|---------|-----------------|
| Windows | GetProcessTimes |
| Linux   | times           |
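For reference, a minimal usage sketch (the documented interface is in `flang/docs/Intrinsics.md`; the names below are illustrative):

```fortran
program etime_demo
  implicit none
  ! GNU-style ETIME: values(1) receives user time, values(2) system time,
  ! and the run time since the start of execution (their sum) is returned
  ! in total, all in seconds.
  real(4) :: values(2), total
  call etime(values, total)
  print *, 'user:', values(1), 'system:', values(2), 'total:', total
end program etime_demo
```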
We need the information in the `DeclareOp` to generate debug information
for variables. Currently, cg-rewrite removes the `DeclareOp`. As
`AddDebugInfo` runs after that, it cannot process the `DeclareOp`. My
initial plan was to make the `AddDebugInfo` pass run before cg-rewrite,
but that has a few issues.
1. Initially I was thinking of using the memref op to carry the variable
attr, but as @tblah suggested in #86939, it makes more sense to
carry that information on the `DeclareOp`. It also makes it easier to
handle in codegen and no special handling is needed for arguments. For
this reason, we need to preserve the `DeclareOp` until codegen.
2. Running earlier, we would miss the changes made by passes that run
between cg-rewrite and codegen.
But not removing the `DeclareOp` in cg-rewrite has the issue that the
`ShapeOp` remains and causes errors during codegen. To solve this
problem, I convert the `DeclareOp` to an `XDeclareOp` in cg-rewrite
instead of removing it. This was mentioned as a possible solution by
@jeanPerier in https://reviews.llvm.org/D136254.
The conversion follows logic similar to that used for other operations in
that file. The FortranAttr and CudaAttr are currently not converted but
left as a TODO for when the need arises.
Now the `AddDebugInfo` pass can extract information about local variables
from the `XDeclareOp` and create `DILocalVariableAttr`. These are attached
to the `XDeclareOp` using the `FusedLoc` approach. Codegen can use them to
create `DbgDeclareOp`. I have added tests that check the debug
information both in the MLIR form and in the LLVM IR.
Currently we handle only very limited types; the rest are given a
placeholder type. The previous placeholder was a basic type with
`DW_ATE_address` encoding. Once variables were added, it started
causing assertions in the LLVM debug info generation logic for some
types. It has been changed to an integer type to prevent these issues
until we handle those types properly.
Arguments to OpenMP regions should not be tagged as dummy arguments.
This is particularly unsafe because these OpenMP blocks will eventually
be inlined into the calling function, where they will trivially alias
with other values inside the calling function.
This is probably a theoretical issue because the calls to OpenMP runtime
functions would act as barriers, preventing optimizations that are
too aggressive, but a lot more thought would need to go into a bet like
that.
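As a sketch of the concern (illustrative code, not taken from the patch), consider a reduction variable that enters the OpenMP region as a block argument:

```fortran
subroutine accumulate(a, n)
  integer :: n, i
  real :: a(n)
  real :: r
  r = 0.0
  ! r enters the omp region as a block argument referring to the same
  ! storage as the local r, so once the region body is inlined into the
  ! calling function it trivially aliases with it and must not be treated
  ! as a dummy argument by FIR alias analysis.
  !$omp parallel do reduction(+:r)
  do i = 1, n
    r = r + a(i)
  end do
  print *, r
end subroutine accumulate
```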
This came out of discussion on
https://github.com/llvm/llvm-project/pull/92036
The HLFIR pass lowering WHERE (the hlfir.where op) was too aggressive in
its hoisting of scalar sub-expressions from the LHS/RHS/MASKs outside of
the loops generated for the WHERE construct.
This violated F'2023 10.2.3.2 point 10, which stipulates that elemental
operations must be evaluated only for elements corresponding to true
values: scalar operations are still elemental, and hoisting them
is invalid if they could have side effects (e.g., division by zero) and
the MASK is always false (i.e., the loop body is never evaluated).
The difficulty is that 10.2.3.2 point 9 mandates that nonelemental
functions must be evaluated before the loops, so it is not possible to
simply stop hoisting non-hlfir.elemental operations.
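To make the two constraints concrete, here is an illustrative sketch (not taken from the patch):

```fortran
subroutine demo(a, y, mask, x)
  real :: a(:), y(:), x
  logical :: mask(:)
  ! The scalar division 1.0/x is elemental: it must only be evaluated for
  ! elements where mask is true (x may be zero when mask is all .false.),
  ! so it cannot be hoisted above the WHERE loops.
  ! The transformational sum(y) is nonelemental: point 9 requires it to
  ! be evaluated before the loops, so it must still be hoisted.
  where (mask) a = a + 1.0/x + sum(y)
end subroutine demo
```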
Marking calls with an elemental/nonelemental attribute would not keep
the pass correct if inlining is run before it and drops this
information; besides, extracting the argument tree that may have been
CSE-ed with the rest of the expression evaluation would be a bit
cumbersome.
Instead, lower nonelemental calls into a new hlfir.exactly_once
operation that retains the information that the operations
contained inside its region must be hoisted. This allows inlining to
run beforehand if desired in order to improve alias analysis.
The LowerHLFIROrderedAssignments pass is updated to hoist only the
operations contained inside hlfir.exactly_once bodies.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
276 under llvm-project/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
This patch is one of a series of four patches that slightly refactor
and extend the current record type map support that was
put in place for Fortran's descriptor types, so that it handles explicit
member mapping for record types at a single level of depth.
For example, the below case where two members of a Fortran
derived type are mapped explicitly:
```fortran
type :: scalar_and_array
real(4) :: real
integer(4) :: array(10)
integer(4) :: int
end type scalar_and_array
type(scalar_and_array) :: scalar_arr
!$omp target map(tofrom: scalar_arr%int, scalar_arr%real)
```
Current cases of derived type mapping left for future work are:
- explicit member mapping of nested members (e.g. two layers of
record types where we explicitly map a member from the internal
record type)
- Fortran's automagical mapping of all elements and nested elements
of a derived type
- explicit member mapping of a derived type and then its constituent members
(redundant in Fortran due to the former case but still legal as far as I am aware)
- explicit member mapping of a record type (may be handled reasonably, just
not fully tested in this iteration)
- explicit member mapping for Fortran allocatable types (a variation of nested
record types)
This patch supports this by extending the flang-new OpenMP lowering to
generate this newly required information, creating the necessary
parent-to-member map_info links, calculating the member indices, and
recording whether it is a partial map.
The OMPDescriptorMapInfoGen pass has also been generalized into a map
finalization phase, now named OMPMapInfoFinalization. This pass was extended
to support the insertion of member maps into the BlockArgs and MapOperands of
the relevant map-carrying operations, similar to the way descriptor types
are expanded and their constituent members inserted.
Pull Request: https://github.com/llvm/llvm-project/pull/82853
The code for preparing cmdstat was generating an i2 constant with value
0, casting it, and then storing it into i16 storage. Just generate the
i16 constant directly.
The lowering produces a fir.dummy_scope operation if the current
function has dummy arguments. Each hlfir.declare generated
for a dummy argument then uses the result of fir.dummy_scope
as its dummy_scope operand. This is only done for HLFIR.
I was not able to find a reliable way to identify dummy symbols
in `genDeclareSymbol`, so I added a set of registered dummy symbols
that is alive during the instantiation of variables for the current
function. The set is initialized during the mapping of the dummy
argument symbols to their MLIR values. It is reset right after
all variables are instantiated; this is done to avoid generating
hlfir.declare operations with dummy_scope for the clones of
the dummy symbols (e.g. this happens with OpenMP privatization).
If this can be done in a cleaner way, please advise.
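A sketch of the clone situation mentioned above (illustrative only):

```fortran
subroutine s(d)
  integer :: d
  ! The hlfir.declare for the dummy argument d gets a dummy_scope operand.
  ! The privatized copy of d created for the parallel region is a clone of
  ! the same symbol; resetting the registered-dummy-symbols set ensures
  ! its hlfir.declare is generated without a dummy_scope.
  !$omp parallel firstprivate(d)
  d = d + 1
  !$omp end parallel
end subroutine s
```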
Lower local allocations of CUDA device, managed, and unified variables to
fir.cuda_alloc. Add fir.cuda_free in the function context finalization.
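For example (a sketch; attribute spellings as in CUDA Fortran):

```fortran
subroutine work()
  ! Local device-side data: the allocation is lowered to fir.cuda_alloc on
  ! entry and a matching fir.cuda_free is added when the function context
  ! is finalized.
  real, device  :: d(1024)
  real, managed :: m(1024)
  m = 0.0
end subroutine work
```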
@vzakhari For some reason PR #90526 was closed when I merged PR
#90525. This just reopens it.
It was pointed out in post commit review of
https://github.com/llvm/llvm-project/pull/90597 that the pass should
never have been run in parallel over all functions (and now other top
level operations) in the first place. The mutex used in the pass was
ineffective at preventing races since each instance of the pass would
have a different mutex.
The new operation is just an abstract attribute that is attached to
the [hl]fir.declare operations of a subroutine's dummy arguments.
Dummy arguments of the same subroutine refer to the same
fir.dummy_scope, so they can be recognized as such during FIR AliasAnalysis.
Note that the fir.dummy_scope must be specific to the runtime
instantiation of a subroutine, so any MLIR inlining/cloning should duplicate and
unique it rather than using the same fir.dummy_scope for different runtime
instantiations. This is why I made it an operation rather than an attribute.
The new operation uses a write effect on DebuggingResource, the same as
[hl]fir.declare, to avoid it being optimized away.
A double pointer was being passed to the call to FortranStart rather than just a pointer to the EnvironmentDefaults.list. This now passes `null` directly when there's no EnvironmentDefaults.list and passes the list directly when there is, removing the original global variable which was a pointer to a pointer containing null or the EnvironmentDefaults.list global.
Fixes #90537
We might use polymorphic ops in top-level operations other than
functions some time in the future. We need to ensure that these
operations can be lowered.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes come from moving the declaration and definition of the
constructor function into tablegen (as requested in code review when
altering another pass).
The original PR #90083 had to be reverted in PR #90444 as it caused one
of the gfortran tests to fail. The issue was using `isIntOrIndex` to
check for an integer type: it allowed the index type, which later caused
an assertion when calling `getIntOrFloatBitWidth`. I have now replaced it
with `isInteger`, which should fix this regression.
This patch changes flang's behaviour to only create and link a
`main` entry point when the Fortran code has a program statement in it.
This means that flang-new can be used to link even when the program is
mixed C/Fortran code with `main` present in C and no entry point
present in Fortran.
This also removes the `-fno-fortran-main` flag, as it no longer has any
functionality.
This PR improves the debug information for functions in the following
ways:
1. Get line number information from the FuncOp and remove hard-coded line
numbers.
2. Use a proper type for the function signature. I have added a type
converter; currently it is very limited but will be enhanced with time.
3. Use the de-constructed function name.
…rivate`
Adds support for CFG conversion and conversion to LLVM IR for
`omp.private` ops. This bridges a gap between FIR and LLVM to provide
more support for lowering `omp.private` ops for things like
allocatables.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc75.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
This is another one that runs on functions but isn't appropriate to also
run on other top level operations. It needs to find all paths that
return from the function to free heap allocated memory. There isn't a
generic concept for general top level operations which is equivalent to
looking for function returns.
I removed the manual definition of the options structure because there
is already an identical definition in tablegen and the options are
documented in Passes.td.
The stack arrays pass needs to keep running only on func.func because it
needs to know which block terminators can end the function (rather than
just branching between blocks of unstructured control flow). A similar
concept does not exist at the more abstract level of "any top level mlir
operation".
For example, it currently looks for func::ReturnOp and
fir::UnreachableOp as the points where execution can end. If this were to
be run on omp.declare_reduction, it would also need to understand
omp.YieldOp (perhaps only when omp.declare_reduction is the parent).
There isn't a generic concept in MLIR for this.
The pass is currently defined as only considering function arguments as
candidates for the optimization. I would prefer to generalise the pass
for other top level operations only when there is a concrete use case
before making too many assumptions about the current set of top level
operations. Therefore I have not adapted this pass to run on all top
level operations.
The pass assumed that all func.func symbol usages could be safely
replaced by undef; that is not true after #87796, which added a back link
from internal procedures to their parent procedure. This caused the
internal procedures to be erased and then processed (segfault).
Also set the visibility of such internal procedures so that MLIR does not
remove them before the target function is generated for the target region.
Semantics usually folds SHAPE into an array constructor, but sometimes it
cannot (e.g. when the source is a function result that cannot be
duplicated in expression analysis). Add lowering handling for SHAPE.
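A sketch of the kind of case that now needs lowering support (illustrative code only):

```fortran
program p
  interface
    function f()
      real, allocatable :: f(:, :)
    end function f
  end interface
  ! The result of f() cannot be duplicated by expression analysis, so
  ! SHAPE is not folded here and has to be handled in lowering.
  print *, shape(f())
end program p
```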
This PR adds the following options to the AddDebugInfo pass:
1. IsOptimized flag.
2. Level of debug info to generate.
3. Name of the source file.
This enables us to remove the hard-coded values from the code. It also
allows us to test the pass with different options. The tests have been
modified to take advantage of that.
The calling convention flag and producer name have also been improved.
Both arrays and trivial scalars are supported. Both cases must use
by-ref reductions because both are boxed.
My understanding of the standards is that OpenMP says this should
follow the rules of the intrinsic reduction operators in Fortran, and
Fortran says that unallocated allocatable variables may only be
referenced to allocate them or to test whether they are allocated.
Therefore we do not need a null-pointer check in the combiner region.
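For example, a sketch of the supported usage:

```fortran
subroutine reduce(n)
  integer :: n, i
  integer, allocatable :: r
  allocate(r)
  r = 0
  ! r is boxed, so a by-ref reduction is used; per the reasoning above the
  ! combiner region does not need to check for an unallocated r.
  !$omp parallel do reduction(+:r)
  do i = 1, n
    r = r + i
  end do
  print *, r
end subroutine reduce
```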
Fix parsing of cuda_kernel: it was missing an mlir::succeeded check and
was not setting up the `types`, causing a mismatch between the values and
types of the grid/block (CUFKernelValues). @clementval
---------
Co-authored-by: Iman Hosseini <imanh@nvidia.com>
Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>