getElementType() was missing from Sequence and Vector types. Did a
replace of the obvious places getEleTy() was used for these two types
and updated to use this name instead.
Co-authored-by: Scott Manley <scmanley@nvidia.com>
Fix#112593 by adding support in lowering to concatenation with an
absent optional _assumed length_ dummy argument because:
1. Most compilers seem to support it (most likely by accident).
2. This actually makes the compiler codegen simpler. Codegen was going
out of its way to poke the LLVM optimizer bear by producing an undef
argument for the length.
I insist on the fact that no compiler support this with _explicit
length_ optional arguments and the executable will segfault and I would
discourage users from using that "feature" because runtime checks for
bad optional dereference will kick when used (For instance, "nagfor
-C=present" will produce an executable that abort with an error message
. Flang does not have such runtime check option so far).
Hence, I am not updating the Extensions.md document because this is not
something I think we should advertise.
Derived type results of BIND(C) function should be returned according
the the C ABI for returning the related C struct type.
This currently did not happen since the abstract-result pass was forcing
the Fortran ABI for all derived type results.
use the bind_c attribute that was added on call/func/dispatch in FIR to
prevent such rewrite in the abstract result pass, and update the
target-rewrite pass to deal with the struct return ABI.
So far, the target specific part of the target-rewrite is only
implemented for X86-64 according to the "System V Application Binary
Interface AMD64 v1", the other targets will hit a TODO, just like for
BIND(C), VALUE derived type arguments.
This intends to deal with #102113.
This is a re-land of #111678 with an extra commit to keep rewriting `type(c_ptr)`
results to `!fir.ref<none>` in the abstract result pass regardless of the ABIs.
Currently, we allow only one DIGlobalVariableExpressionAttr per global.
It is especially evident in import where we pick the first from the list
and ignore the rest. In contrast, LLVM allows multiple
DIGlobalVariableExpression to be attached to the global. They are needed
for correct working of things like DICommonBlock. This PR removes this
restriction in mlir. Changes are mostly mechanical. One thing on which I
went a bit back and forth was the representation inside GlobalOp. I
would be happy to change if there are better ways to do this.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Derived type results of BIND(C) function should be returned according
the the C ABI for returning the related C struct type.
This currently did not happen since the abstract-result pass was forcing
the Fortran ABI for all derived type results.
use the bind_c attribute that was added on call/func/dispatch in FIR to
prevent such rewrite in the abstract result pass, and update the
target-rewrite pass to deal with the struct return ABI.
So far, the target specific part of the target-rewrite is only
implemented for X86-64 according to the "System V Application Binary
Interface AMD64 v1", the other targets will hit a TODO, just like for
BIND(C), VALUE derived type arguments.
This intends to deal with
https://github.com/llvm/llvm-project/issues/102113.
With some restrictions, BIND(C) derived types can be converted to
compatible BIND(C) derived types.
Semantics already support this, but ConvertOp was missing the
conversion of such types.
Fixes https://github.com/llvm/llvm-project/issues/107783
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.
Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate pattern now have
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.
Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
Currently, it is not possible to distinguish between BIND(C) from
non-BIND(C) type bound procedure call at the FIR level.
This will be a problem when dealing with derived type BIND(C) function
where the ABI differ between BIND(C)/non-BIND(C) but the FIR signature
looks like the same at the FIR level.
Fix this by adding the Fortran procedure attributes to fir.distpatch,
and propagating it until the related fir.call is generated in
fir.dispatch codegen.
This PR adds LLVM [operand
bundle](https://llvm.org/docs/LangRef.html#operand-bundles) support to
MLIR LLVM dialect. It affects these 3 operations related to making
function calls: `llvm.call`, `llvm.invoke`, and `llvm.call_intrinsic`.
This PR adds two new parameters to each of the 3 operations. The first
parameter is a variadic operand `op_bundle_operands` that contains the
SSA values for operand bundles. The second parameter is a property
`op_bundle_tags` which holds an array of strings that represent the tags
of each operand bundle.
The new LLVM stack save/restore intrinsic operations are more convenient
than function calls because they do not add function declarations to the
module and therefore do not block the parallelisation of passes.
Furthermore they could be much more easily marked with memory effects
than function calls if that ever proved useful.
This builds on top of #107879.
Resolves#108016
Mostly NFC, I was bothered by the declaration that were always made even
if unsued, and I think using LLVM Ops is nicer anyway with regards to
side effects here.
```
func.func private @llvm.stacksave.p0() -> !fir.ref<i8>
func.func private @llvm.stackrestore.p0(!fir.ref<i8>)
```
There are other places in lowering that are using the calls instead of
the LLVM intrinsics, but I will deal with them another time (the issue
there is mostly to get the proper address space for the llvm.ptr type).
We're providing this as a negative signed value, so set the flag.
Currently doesn't make a difference, but will assert in the future.
Split out of https://github.com/llvm/llvm-project/pull/80309.
While experimenting with some more recent C++ features, I ran into
trouble with warnings from GCC 12.3.0 and 14.2.0. These warnings looked
legitimate, so I've tweaked the code to avoid them.
This PR adds initial debug support for derived type. It handles
`RecordType` and generates appropriate `DICompositeTypeAttr`. The
`TypeInfoOp` is used to get information about the parent and location of
the derived type.
We use `getTypeSizeAndAlignment` to get the size and alignment of the
components of the derived types. This function needed a few changes to
be suitable to be used here:
1. The `getTypeSizeAndAlignment` errored out on unsupported type which
would not work with incremental way we are building debug support. A new
variant of this function has been that returns an std::optional. The original
function has been renamed to `getTypeSizeAndAlignmentOrCrash` as it
will call `TODO()` for unsupported types.
2. The Character type was returning size of just element and not the
whole string which has been fixed.
The testcase checks for offsets of the components which had to be
hardcoded in the test. So the testcase is currently enabled on x86_64.
With this PR in place, this is how the debugging of derived types look
like:
```
type :: t_date
integer :: year, month, day
end type
type :: t_address
integer :: house_number
end type
type, extends(t_address) :: t_person
character(len=20) name
end type
type, extends(t_person) :: t_employee
type(t_date) :: hired_date
real :: monthly_salary
end type
type(t_employee) :: employee
(gdb) p employee
$1 = ( t_person = ( t_address = ( house_number = 1 ), name = 'John', ' ' <repeats 16 times> ), hired_date = ( year = 2020, month = 1, day = 20 ), monthly_salary = 3.1400001 )
```
This change addresses more "issues" as the one resolved in #71338.
Some targets (e.g. NVPTX) do not accept global names containing
`.`. In particular, the global variables created to represent
the runtime information of derived types use `.` in their names.
A derived type's descriptor object may be used in the device code,
e.g. to initialize a descriptor of a variable of this type.
Thus, the runtime type info objects may need to be compiled
for the device.
Moreover, at least the derived types' descriptor objects
may need to be registered (think of `omp declare target`)
for the host-device association so that the addendum pointer
can be properly mapped to the device for descriptors using
a derived type's descriptor as their addendum pointer.
The registration implies knowing the name of the global variable
in the device image so that proper host code can be created.
So it is better to name the globals the same way for the host
and the device.
CompilerGeneratedNamesConversion pass renames all uniqued globals
such that the special symbols (currently `.`) are replaced
with `X`. The pass is supposed to be run for the host and the device.
An option is added to FIR-to-LLVM conversion pass to indicate
whether the new pass has been run before or not. This setting
affects how the codegen computes the names of the derived types'
descriptors for FIR derived types.
fir::NameUniquer now allows `X` to be part of a name, because
the name deconstruction may be applied to the mangled names
after CompilerGeneratedNamesConversion pass.
Updated version of #102686. The issue was that in some rebox case the
addendum presence flag should be updated and not always taken from the
"from" box. This is the case when reboxing a fir.class to a fir.box that
doesn't require an addendum for example.
Open a new review since there is a bit of additional code in the CodeGen
part.
The extra field in the descriptor carries multiple information and
cannot be deducted anymore when doing a reboxing. This patch updates the
codegen to retrieve the extra field value from the inboc and set it in
the new box.
Currently, `%17 = fir.box_elesize %16 :
(!fir.class<!fir.ptr<!fir.type<_QFTt{a:i32,b:i32}>>>) -> i32`
is translated to
```
%4 = getelementptr { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, ptr %1, i32 0, i32 1
%5 = load i32, ptr %4, align 4
```
The type of the element size is `i64`. The load essentially truncates
the value and yields incorrect result in the big endian environment. The
problem occurs in the `storage_size` intrinsic on a polymorphic
variable.
#100690 introduces allocator registry with the ability to store
allocator index in the descriptor. This patch adds an attribute to
fir.embox and fircg.ext_embox to be able to set the allocator index
while populating the descriptor fields.
This patch enhances the descriptor with the ability to have specialized
allocator. The allocators are registered in a dedicated registry and the
index of the desired allocator is stored in the descriptor. The default
allocator, std::malloc, is registered at index 0.
In order to have this allocator index in the descriptor, the f18Addendum
field is repurposed to be able to hold the presence flag for the
addendum (lsb) and the allocator index.
Since this is a change in the semantic and name of the 7th field of the
descriptor, the CFI_VERSION is bumped to the date of the initial change.
This patch only adds the ability to have this features as part of the
descriptor but does not add specific allocator yet. CUDA fortran will be
the first user of this feature to allocate descriptor data in the
different type of device memory base on the CUDA attribute.
---------
Co-authored-by: Slava Zakharin <szakharin@nvidia.com>
fircg operations have xxxOffset members to give the operand index of
operand xxx. This is a bit weird when looking at usage (e.g.
`arrayCoor.shiftOffset` reads like it is shifting some offset). Rename
them to getXxxOperandIndex.
cg-rewrite runs regionDCE to get rid of the unused fir.shape/shift/slice
before codegen since those operations have no codegen.
I came across an issue where unreachable code would cause the pass to
fail with `error: loc(...): null operand found`.
It turns out `mlir::RegionDCE` does not work properly in presence of
unreachable code because it delete operations in reachable code that are
unused in reachable code, but still used in unreachable code (like the
constant in the added test case). It seems `mlir::RegionDCE` is always
run after `mlir::eraseUnreachableBlock` outside of this pass.
A solution could be to run `mlir::eraseUnreachableBlock` here or to try
modifying `mlir::RegionDCE`. But the current behavior may be
intentional, and both of these calls are actually quite expensive. For
instance, RegionDCE will does liveness analysis, and removes unused
block arguments, which is way more than what is needed here. I am not
very found of having this rather heavy transformation inside this pass
(they should be run after or before if they matter in the overall
pipeline).
Do a naïve backward deletion of the trivially dead operations instead.
It is cheaper, and works with unreachable code.
This change is in preparation of #97903, which adds extra checks for
materializations: it is now enforced that they produce an SSA value of
the correct type, so the current workaround no longer works.
The original workaround avoided target materializations by directly
returning the to-be-converted SSA value from the materialization
callback. This can be avoided by initializing the lowering patterns that
insert the materializations without a type converter. For
`cg::XEmboxOp`, the existing workaround that skips
`unrealized_conversion_cast` ops is still in place.
Also remove the lowering pattern for `unrealized_conversion_cast`. This
pattern has no effect because `unrealized_conversion_cast` ops that are
inserted by the dialect conversion framework are never matched by the
pattern driver.
This PR adds -mtune as a valid flang flag and passes the information
through to LLVM IR as an attribute on all functions. No specific
architecture optimizations are added at this time.
This change fixes the issue
https://github.com/llvm/llvm-project/issues/95977 due to commit
c0cba51981 inserting allocas after the
terminator op in the insertion block in the case where the block had
only a single operation, its terminator, in it. With this change, the
hoisted constant-sized allocas are placed at the front of the insertion
block, rather than right after the first operation in it.
This PR generates dwarf to extract the information about the arrays from
descriptor. The DWARF needs the offset of the fields like `lower_bound`
and `extent`. The getComponentOffset has been added to calculate
them which pushes the issue of host and target data size into
getDescFieldTypeModel.
As we use data layout now, some tests needed to be adjusted to have a
dummy data layout to avoid failure.
With this change in place, GDB is able show the assumed shape arrays
correctly.
subroutine ff(n, m, arr)
integer n, m
integer :: arr(:, :)
print *, arr
do i = 1, n
do j = 1, m
arr(j, i) = (i * 5) + j + 10
end do
end do
print *, arr
end subroutine ff
Breakpoint 1, ff (n=4, m=3, arr=...) at test1.f90:13
13 print *, arr
(gdb) p arr
$1 = ((6, 7, 8, 9) (11, 12, 13, 14) (16, 17, 18, 19))
(gdb) ptype arr
type = integer (4,3)
(gdb) c
Continuing.
6 7 8 9 11 12 13 14 16 17 18 19
Tablegen can automatically generate the pass constructor. Tablegen will
create a constructor for all of the pass options (not only the subset in
the old constructor), but the pass options seem unused anyway.
This pass does not require any modification to support alternative
top-level ops. It walks all operations in the module. Functions have
special handling (adding attributes, converting signatures) but this
wouldn't make sense for top level operations in general.
The pass constructor can be generated automatically by tablegen.
This pass is module-level and runs on all instances of target operations
inside of it and so does not need any modification to support
alternative top-level operations.
The frontend computes the necessary alignment for COMMON blocks but this
information is never carried over to the code generation and can lead to
segfault for COMMON block that requires a non default alignment.
This patch add an optional attribute on fir.global and carries over the
information.
With MLIR inlining (e.g. `flang-new -mmlir -inline-all=true`)
the current TBAA tags attachment is suboptimal, because
we may lose information about the callee's dummy arguments
(by bypassing fir.declare in AliasAnalysis::getSource).
This is a conservative first step to improve the situation.
This patch makes AddAliasTagsPass to account for fir.dummy_scope
hierarchy after MLIR inlining and use it to place the TBAA tags
into TBAA trees corresponding to different function scopes.
The pass uses special mode of AliasAnalysis to find the instantiation
point of a Fortran variable (a [hl]fir.decalre) when searching
for the source of a memory reference. In this mode, AliasAnalysis
will always stop at fir.declare operations that have dummy_scope
operands - there should not be a reason to past throught it
for the purpose of TBAA tags attachment.
The pass constructor can be generated automatically by tablegen.
The pass is module-level and iterates over every operation within the
module so it should not need any changes to support alternative top
level operations.
First commit is reviewed in
https://github.com/llvm/llvm-project/pull/93682.
Lower RANK using fir.box_rank. This patches updates fir.box_rank to
accept box reference, this avoids the need of generating an assumed-rank
fir.load just for the sake of reading ALLOCATABLE/POINTER rank. The
fir.load would generate a "dynamic" memcpy that is hard to optimize
without further knowledge. A read effect is conditionally given to the
operation.
- Update LLVM type conversion of assumed-rank fir.box/class to generate
the type of the maximum ranked descriptor. That way, alloca for assumed
rank descriptor copies are always big enough. This is needed in the
fir.load case that generates a new storage for the value
- Add a "computeBoxSize" helper to compute the dynamic size of a
descriptor.
- Use that size to generate an llvm.memcpy intrinsic to copy the input
descriptor into the new storage.
Looking at https://reviews.llvm.org/D108221?id=404635, it seems valid to
add the TBAA node on the memcpy, which I did.
In a further patch, I think we should likely always use a memcpy since
LLVM seems to have a better time optimizing it than fir.load/fir.store
patterns.
fir.box_rank codegen was invalid, it was assuming the rank field in the
descriptor was an i32. This is not correct. Do not hard code the type,
use the named position to find the type, and convert as needed in the
patterns.
`SelectCaseOp::getCompareOperands` may return an empty range for the
"default" case. Do not dereference the range until it is expected to be
non-empty.
This was detected by address-sanitizer.
This PR add debug info for module variables. The module variables are
added as global variables but their scope is set to module instead of
compile unit. The scope of function declared inside a module is also set
accordingly.
After this patch, a module variable could be evaluated in the GDB as `p
helper::gli` where helper is name of the module and gli is the name of
the variable. A future patch will add the import module functionality
which will remove the need to prefix the name with helper::.
The line number where is module is declared is a best guess at the
moment as this information is not part of the GlobalOp.
This is same as #90905 with an added fix. The issue was that we
generated variable info even when user asked for line-tables-only. This
caused llvm dwarf generation code to fail an assertion as it expected an
empty variable list.
Fixed by not generating debug info for variables when user wants only
line table. I also updated a test check for this case.