We need the information in the `DeclareOp` to generate debug information
for variables. Currently, cg-rewrite removes the `DeclareOp`. As
`AddDebugInfo` runs after that, it cannot process the `DeclareOp`. My
initial plan was to make the `AddDebugInfo` pass run before the cg-rewrite
but that has few issues.
1. Initially I was thinking to use the memref op to carry the variable
attr. But as @tblah suggested in the #86939, it makes more sense to
carry that information on `DeclareOp`. It also makes it easy to handle
it in codegen and there is no special handling needed for arguments. For
this reason, we need to preserve the `DeclareOp` till the codegen.
2. Running earlier, we will miss the changes in passes that run between
cg-rewrite and codegen.
But not removing the DeclareOp in cg-rewrite has the issue that ShapeOp
remains and it causes errors during codegen. To solve this problem, I
convert DeclareOp to XDeclareOp in cg-rewrite instead of removing
it. This was mentioned as possible solution by @jeanPerier in
https://reviews.llvm.org/D136254
The conversion follows similar logic as used for other operators in that
file. The FortranAttr and CudaAttr are currently not converted but left
as TODO when the need arise.
Now `AddDebugInfo` pass can extracts information about local variables
from `XDeclareOp` and creates `DILocalVariableAttr`. These are attached
to `XDeclareOp` using `FusedLoc` approach. Codegen can use them to
create `DbgDeclareOp`. I have added tests that checks the debug
information in mlir from and also in llvm ir.
Currently we only handle very limited types. Rest are given a place
holder type. The previous placeholder type was basic type with
`DW_ATE_address` encoding. When variables are added, it started
causing assertions in the llvm debug info generation logic for some
types. It has been changed to an interger type to prevent these issues
until we handle those types properly.
Arguments to openmp regions should not be tagged as dummy arguments.
This is particularly unsafe because these openmp blocks will eventually
be inlined into the calling function, where they will trivially alias
with other values inside of the calling function.
This is probably a theoretical issue because the calls to openmp runtime
function calls would act as barriers, preventing optimizations that are
too aggressive. But a lot more thought would need to go into a bet like
that.
This came out of discussion on
https://github.com/llvm/llvm-project/pull/92036
This patch is one in a series of four patches that seeks to refactor
slightly and extend the current record type map support that was
put in place for Fortran's descriptor types to handle explicit
member mapping for record types at a single level of depth.
For example, the below case where two members of a Fortran
derived type are mapped explicitly:
''''
type :: scalar_and_array
real(4) :: real
integer(4) :: array(10)
integer(4) :: int
end type scalar_and_array
type(scalar_and_array) :: scalar_arr
!$omp target map(tofrom: scalar_arr%int, scalar_arr%real)
''''
Current cases of derived type mapping left for future work are:
> explicit member mapping of nested members (e.g. two layers of
record types where we explicitly map a member from the internal
record type)
> Fortran's automagical mapping of all elements and nested elements
of a derived type
> explicit member mapping of a derived type and then constituient members
(redundant in Fortran due to former case but still legal as far as I am aware)
> explicit member mapping of a record type (may be handled reasonably, just
not fully tested in this iteration)
> explicit member mapping for Fortran allocatable types (a variation of nested
record types)
This patch seeks to support this by extending the Flang-new OpenMP lowering to
support generation of this newly required information, creating the neccessary
parent <-to-> member map_info links, calculating the member indices and
setting if it's a partial map.
The OMPDescriptorMapInfoGen pass has also been generalized into a map
finalization phase, now named OMPMapInfoFinalization. This pass was extended
to support the insertion of member maps into the BlockArg and MapOperands of
relevant map carrying operations. Similar to the method in which descriptor types
are expanded and constituient members inserted.
Pull Request: https://github.com/llvm/llvm-project/pull/82853
It was pointed out in post commit review of
https://github.com/llvm/llvm-project/pull/90597 that the pass should
never have been run in parallel over all functions (and now other top
level operations) in the first place. The mutex used in the pass was
ineffective at preventing races since each instance of the pass would
have a different mutex.
We might use polymorphic ops in top-level operations other than
functions some time in the future. We need to ensure that these
operations can be lowered.
See RFC:
https://discourse.llvm.org/t/rfc-add-an-interface-for-top-level-container-operations
Some of the changes are from moving declaration and definition of the
constructor function into tablegen (as requested in code review when
altering another pass).
The original PR #90083 had to be reverted in PR #90444 as it caused one
of the gfortran tests to fail. The issue was using `isIntOrIndex` for
checking for integer type. It allowed index type which later caused
assertion when calling `getIntOrFloatBitWidth`. I have now replaced it
with `isInteger` which should fix this regression.
This PR improves the debug information for functions in the following
ways:
1. Get line number information from FuncOp and remove hard-coded line
numbers.
2. Use proper type for function signature. I have a added a type
converter. Currently, it is very limited but will be enhanced with time.
3. Use de-constructed function name.
…ted. (#89998)" (#90250)
This partially reverts commit 7aedd7dc75.
This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
This is another one that runs on functions but isn't appropriate to also
run on other top level operations. It needs to find all paths that
return from the function to free heap allocated memory. There isn't a
generic concept for general top level operations which is equivalent to
looking for function returns.
I removed the manual definition of the options structure because there
is already an identical definition in tablegen and the options are
documented in Passes.td.
Stack arrays needs to stay running only on func.func because it needs to
know which block terminators can end the function (rather than just
branch between unstructured control flow). A similar concept does not
exist at the more abstract level of "any top level mlir operation".
For example, it currently looks for func::ReturnOp and
fir::UnreachableOp as points when execution can end. If this were to be
run on omp.declare_reduction, it would also need to understand
omp.YieldOp (perhaps only when omp.declare_reduction is the parent).
There isn't a generic concept in MLIR for this.
The pass is currently defined as only considering function arguments as
candidates for the optimization. I would prefer to generalise the pass
for other top level operations only when there is a concrete use case
before making too many assumptions about the current set of top level
operations. Therefore I have not adapted this pass to run on all top
level operations.
The pass assumed that all fun.func symbol usages could be safely
replaced by undef, that is not true after #87796 that added a back link
from internal procedure back to the parent procedure. This caused the
internal procedures to be erased and then processed (segfault).
Also set visibility of such internal procedures so that MLIR do not remove them before
the target function is generated for the target region.
This PR adds following options to the AddDebugInfo pass.
1. IsOptimized flag.
2. Level of debug info to generate.
3. Name of the source file
This enables us to remove the hard coded values from the code. It also
allows us to test the pass with different options. The tests have been
modified to take advantage of that.
The calling convention flag and producer name have also been improved.
Currently, it is not possible to find back which fun.func is the host
procedure of some internal procedure because the mangling of the
internal procedure does not contain info about the BIND(C) name of the
host.
This info may be useful to ensure dwarf DW_TAG_subprogram of internal
procedures are nested under DW_TAG_subprogram of host procedures for
instance.
Whenever lowering is checking if a function or global already exists in
the mlir::Module, it was doing module->lookup.
On big programs (~5000 globals and functions), this causes important
slowdowns because these lookups are linear. Use mlir::SymbolTable to
speed-up these lookups. The SymbolTable has to be created from the
ModuleOp and maintained in sync. It is therefore placed in the
converter, and FirOPBuilders can take a pointer to it to speed-up the
lookups.
This patch does not bring mlir::SymbolTable to FIR/HLFIR passes, but
some passes creating a lot of runtime calls could benefit from it too.
More analysis will be needed.
As an example of the speed-ups, this patch speeds-up compilation of
Whizard compare_amplitude_UFO.F90 from 5 mins to 2 mins on my machine
(there is still room for speed-ups).
The ExternalNameConversion pass can be surprisingly slow on big
programs. On an example with a 50kloc Fortran file with about 10000
calls to external procedures, the pass alone took 25s on my machine.
This patch reduces this to 0.16s.
The root cause is that using `replaceAllSymbolUses` on each modified
FuncOp is very expensive: it is walking all operations and attribute
every time.
An alternative would be to use mlir::SymbolUserMap to avoid walking the
module again and again, but this is still much more expensive than what
is needed because it is essentially caching all symbol uses of the
module, and there is no need to such caching here.
Instead:
- Do a shallow walk of the module (only top level operation) to detect
FuncOp/GlobalOp that needs to be updated. Update them and place the name
remapping in a DenseMap.
- If any remapping were done, do a single deep walk of the module
operation, and update any SymbolRefAttr that matches a name that was
remapped.
Most FIR passes only look for FIR operations inside of functions (either
because they run only on func.func or they run on the module but iterate
over functions internally). But there can also be FIR operations inside
of fir.global, some OpenMP and OpenACC container operations.
This has worked so far for fir.global and OpenMP reductions because they
only contained very simple FIR code which doesn't need most passes to be
lowered into LLVM IR. I am not sure how OpenACC works.
In the long run, I hope to see a more systematic approach to making sure
that every pass runs on all of these container operations. I will write
an RFC for this soon.
In the meantime, this pass duplicates the CFG conversion pass to also
run on omp reduction operations. This is similar to how the
AbstractResult pass is already duplicated for fir.global operations.
OpenMP array reductions 2/6
Previous PR: https://github.com/llvm/llvm-project/pull/84952
Next PR: https://github.com/llvm/llvm-project/pull/84954
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
When walking over functions (in pre-order), if the function being
visited needs to be erased, skip visiting its regions.
This was detected by address sanitizer.
This makes an adjustment to the existing fir minloc/maxloc generation
code to handle functions with a dim=1 that produce a scalar result. This
should allow us to get the same benefits as the existing generated
minmax reductions.
This is a recommit of #76194 with an extra alteration to the end of
genRuntimeMinMaxlocBody to make sure we convert the output array to the
correct type (a `box<heap<i32>>`, not `box<heap<array<1xi32>>>`) to
prevent writing the wrong type of box into it. This still allocates the
data as a `array<1xi32>`, converting it into a i32 assuming that is
safe. An alternative would be to allocate the data as a i32 and change
more of the accesses to it throughout genRuntimeMinMaxlocBody.
In certain case "extreme" values like Nan, Inf and 0xffffffff could lead
to generating different code via the inline-generated intrinsics vs the
versions in the runtimes (and other compilers like gfortran). There are
some examples I was using for testing in
https://godbolt.org/z/x4EfqEss5.
This changes the generation for the intrinsics to be more like the
runtimes, using a condition that is similar to:
isFirst || (prev != prev && elem == elem) || elem < prev
The middle part is only used for floating point operations, and checks
if the values are Nan. This should then hopefully make the logic closer
to - return the first element with the lowest value, with Nans ignored
unless there are only Nans. The initial limit value for floats are also
changed from the largest float to Inf, to make sure it is handled
correctly.
The integer reductions are also changed to use a similar scheme to make
sure they work with masked values. This means that the preamble after
the loop can be removed.
This is one option for attempting to move genMinMaxlocReductionLoop to a
better location. It moves it into Transforms and makes HLFIRTranforms
depend upon FIRTransforms.
It passes a build locally, both with and without -DBUILD_SHARED_LIBS,
and does OK on the windows CI.
These hardcoded attribute name are a leftover from the upstreaming
period when there was no way to get the attribute name without an
instance of the operation. It is since possible to do without them and
they should be removed to avoid duplication.
This PR cleanup the fir.global op of these hardcoded attribute name and
use their generated getters. Some other PRs will follow to cleanup other
operations.
The implemented logic matches the logic used for Clang in emitting these
attributes. Although it's hoped that function attributes won't be needed
in the future (vs using fast math flags in individual IR instructions),
there are codegen differences currently with/without these attributes,
as can be seen in issues like #79257 or by hacking Clang to avoid
producing these attributes and observing codegen changes.
This patch seeks to add an initial lowering for pointers and allocatable variables
captured by implicit and explicit map in Flang OpenMP for Target operations that
take map clauses e.g. Target, Target Update. Target Exit/Enter etc.
Currently this is done by treating the type that lowers to a descriptor
(allocatable/pointer/assumed shape) as a map of a record type (e.g. a structure) as that's
effectively what descriptor types lower to in LLVM-IR and what they're represented as
in the Fortran runtime (written in C/C++). The descriptor effectively lowers to a structure
containing scalar and array elements that represent various aspects of the underlying
data being mapped (lower bound, upper bound, extent being the main ones of interest
in most cases) and a pointer to the allocated data. In this current iteration of the mapping
we map the structure in it's entirety and then attach the underlying data pointer and map
the data to the device, this allows most of the required data to be resident on the device
for use. Currently we do not support the addendum (another block of pointer data), but
it shouldn't be too difficult to extend this to support it.
The MapInfoOp generation for descriptor types is primarily handled in an optimization
pass, where it expands BoxType (descriptor types) map captures into two maps, one for
the structure (scalar elements) and the other for the pointer data (base address) and
links them in a Parent <-> Child relationship. The later lowering processes will then treat
them as a conjoined structure with a pointer member map.
The existing type size computation in LoopVersioning does not work
for REAL*10, because the compute element size is 10 bytes,
which violates the power-of-two assertion.
We'd better use the DataLayout for computing the storage size
of each element of an array of the given type.
The shared library build doesn't like references of genMinMaxlocReductionLoop,
in Optimizer/Transforms, from HLFIR/Optimizer/Transforms. For the moment I've
moved the code to the header file where it can be shared, like other methods in
Utils.h
Currently the lowering of a minloc intrinsic with a mask will look something
like:
%e = hlfir.elemental %shape ({
...
})
%m = hlfir.minloc %array mask %e
hlfir.assign %m to %result
hlfir.destroy %m
The elemental will be expanded into a temporary+loop, the minloc into a
FortranAMinloc call (which hopefully gets simplified to a specialized call that
can be inlined at the call site), and the assign might get expanded to a
FortranAAssign. It would be better to generate the entire construct as single
loop if we can - one that performs the minloc calculation with the mask
elemental computed inline.
This patch attempt to do that, adding a hlfir version of the expansion code
from SimplifyIntrinsics that turns an minloc+elemental into a single combined
loop nest. It attempts to reuse the methods in genMinlocReductionLoop for
constructing the loop with a modified loop body. The declaration for the
function is currently in Optimizer/Support/Utils.h, but there might be a better
place for it.
It is added as part of the OptimizedBufferizationPass, like the
similar count/any/all that have been added recently.
Currently lowering sets the extents of assumed-size array to "undef"
which was OK as long as the value was not expected to be read.
But when interfacing with the runtime and when passing assumed-size to
assumed-rank, this last extent may be read and must be -1 as specified
in the BIND(C) case in 18.5.3 point 5.
Set this value to -1, and update all the lowering code that was looking
for an undef defining op to identify assumed-size: much safer to
propagate and use semantic info here, the previous check actually did
not work if the array was used in an internal procedure (defining op not
visible anymore).
@clementval and @agozillon, I left assumed-size extent to zero in the
acc/omp bounds op as it was, please double check that is what you want
(I can imagine -1 may create troubles here, and 0 makes some sense as it
would lead to no data transfer).
This also allows removing special cases in UBOUND/LBOUND lowering.
Also disable allocation of cray pointee. This was never intended and
would now lead to crashes with the -1 value for assumed-size cray
pointee.
The added test case has a loop that is versioned, which has a use of the
loop in an if block after the loop. The current code replaces all uses
of the loop with the new version If, but only if the parent blocks
match. As far as I can see it should be safe to replace all the uses,
then construct the result for the If with op.op.
After the removal of the OpenMP early outlining MLIR pass in #67319, the
`EarlyOutliningInterface` stopped doing any useful work. It used to be
necessary to tie the name of the function from which a target region was
outlined to that new function, so it would be used when translating to
LLVM IR in place of the outlined function's name.
This is not necessary anymore, so this patch removes all references to
this interface and uses of the `omp.outline_parent_name` discardable
attribute in tests.
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
This commit changes the MLIR to LLVMIR export to also attach subprogram
debug attachements to function declarations.
This commit additonally fixes the two passes that produce subprograms to
not attach the "Definition" flag to function declarations. This
otherwise results in invalid LLVM IR.
This commit adds an optional distinct attribute parameter to the
DISubprogramAttr. This enables modeling of distinct subprograms, as
required for LLVM IR. This change is required to avoid accidential
uniquing of subprograms on functions that would lead to invalid LLVM IR
post export.