Commit Graph

1549 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
466b58ba38 [flang] Avoid generating duplicate symbol in comdat (#114472)
In case where a fir.global might be duplicated in an inner module
(gpu.module), the conversion pattern will be applied on the module and
the gpu module version of the global and try to generate multiple comdat
with the same symbol name. This is what we have in the implementation of
CUDA Fortran.

Just check for the presence of the `ComdatSelectorOp` before creating a
new one.
2024-10-31 18:59:04 -07:00
Valentin Clement (バレンタイン クレメン)
067ce5ca18 [flang][cuda] Use getOrCreateGPUModule in CUFDeviceGlobal pass (#114468)
Make the pass functional if gpu module was not created yet.
2024-10-31 18:58:43 -07:00
Valentin Clement (バレンタイン クレメン)
e4e9fea71e [flang][cuda] Pass descriptor by reference for CUFMemsetDescriptor (#114338) 2024-10-31 09:02:59 -07:00
Renaud Kauffmann
423f35410a [flang][cuda] Adding support for registration of boxes (#114323)
Needed to take into account that `fir::getTypeSizeAndAlignmentOrCrash`
does not work with box types but requires the `fir::LLVMTypeConverter`
2024-10-31 08:39:08 -07:00
Renaud Kauffmann
bfe486fe76 Passing descriptors by reference to CUDA runtime calls (#114288)
Passing a descriptor as a `const Descriptor &` or a `const Descriptor *`
generates a FIR signature where the box is passed by value.
This is an issue, as it requires a load of the box to be passed. But
since, ultimately, all boxes are passed by reference a temporary is
generated in LLVM and the reference to the temporary is passed.

The boxes addresses are registered with the CUDA runtime but the
temporaries are not, thus preventing the runtime to properly map a host
side address to its device side counterpart.

To address this issue, this PR changes the signatures to the transfer
functions to pass a descriptor as a `Descriptor *`, which will in turn
generate a FIR signature with that takes a box reference as an argument.
2024-10-30 13:24:47 -07:00
Asher Mancinelli
0c9a02355a [flang][fir] always use memcpy for fir.box (#113949)
@jeanPerier explained the importance of converting box loads and stores
into `memcpy`s instead of aggregate loads and stores, and I'll do my
best to explain it here.

* [(godbolt link) Example comparing opt transformations on memcpys vs
aggregate load/stores](https://godbolt.org/z/be7xM83cG)
* LLVM can more effectively reason about memcpys compared to aggregate
load/stores.
* This came up when others were discussing array descriptors for
assumed-rank arrays passed to `bind(c)` subroutines, with the
implication that the array descriptors are known to have lower bounds of
1 and that they are not pointer/allocatable types.
* [(godbolt link) Clang also uses memcpys so we should probably follow
them, assuming the clang developers are generatign what they know Opt
will handle more effectively.](https://godbolt.org/z/YT4x7387W)
* This currently may not help much without the `nocapture` attribute
being propagated to function calls, but [it looks like someone may do
this soon (discourse
link)](https://discourse.llvm.org/t/applying-the-nocapture-attribute-to-reference-passed-arguments-in-fortran-subroutines/81401/23)
or I can do this in a follow-up patch.

Note on test `flang/test/Fir/embox-char.fir`: it looks like the original
test was auto-generated. I wasn't too sure which parts were especially
important to test, so I regenerated the test. If we want the updated
version to look more like the old version, I'll make those changes.
2024-10-30 09:50:27 -07:00
vdonaldson
8d406d882d [flang] IEEE_REAL (#113948)
IEEE_REAL converts an integer or real argument to a real of a given
kind.
2024-10-30 09:56:42 -04:00
Abid Qadeer
652988b658 [flang][debug] Support TupleType. (#113917)
Handling is similar to RecordType with following differences:

1. No check for cyclic references
2. No extra processing for lower bounds of array members.
3. No line information as TupleType is a lowering artefact and does not
really represent an entity in the code.
2024-10-30 09:52:56 +00:00
Valentin Clement (バレンタイン クレメン)
0d94c7b5ce [flang][cuda][NFC] Make pattern names homogenous (#114156)
Dialect name is uppercase. Make all the patterns prefix homogenous.
2024-10-29 20:39:17 -07:00
Valentin Clement (バレンタイン クレメン)
0fa2fb3ed0 [flang][cuda] Add conversion pattern for cuf.kernel_launch op (#114129) 2024-10-29 17:00:41 -07:00
Renaud Kauffmann
b9978f8c77 [flang][cuda] Adding variable registration in constructor (#113976)
1) Adding variable registration in constructor
2) Applying feedback from PR
https://github.com/llvm/llvm-project/pull/112989
2024-10-29 11:48:48 -07:00
Valentin Clement (バレンタイン クレメン)
b05fec97d5 [flang][cuda] Convert gpu.launch_func to CUFLaunchClusterKernel when cluster dims are present (#113959)
Kernel launch in CUF are converted to `gpu.launch_func`. When the kernel
has `cluster_dims` specified these get carried over to the
`gpu.launch_func` operation. This patch updates the special conversion
of `gpu.launch_func` when cluster dims are present to the newly added
entry point.
2024-10-29 10:02:08 -07:00
Abid Qadeer
8239ea3918 [flang][debug] Support IndexType. (#113921) 2024-10-29 12:22:43 +00:00
Renaud Kauffmann
0eb5c9d2ef [flang][cuda] Copying device globals in the gpu module (#113955) 2024-10-28 15:34:27 -07:00
Yusuke MINATO
bd6ab32e6e Revert "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#113901)
Reverts llvm/llvm-project#110063 due to the performance regression on
503.bwaves_r in SPEC2017.
2024-10-28 14:19:20 +00:00
jeanPerier
64d7e45c40 Revert "[flang][debug] Support mlir::NoneType." (#113769)
Reverts llvm/llvm-project#113550

It turns out this causes compiler crashes with assumed-type arrays and -g.
See https://github.com/llvm/llvm-project/pull/113769 for a reproducer.
2024-10-26 21:38:54 +02:00
Renaud Kauffmann
3acf856b50 Adding CUFCommon.{h,cpp} for CUF utilities (#113740) 2024-10-25 16:08:45 -07:00
Abid Qadeer
85af1926f7 [flang][debug] Support mlir::NoneType. (#113550) 2024-10-25 11:43:25 +01:00
Yusuke MINATO
96bb375f5c [flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv (#110063)
nsw is now added to do-variable increment when -fno-wrapv is enabled as
GFortran seems to do.
That means the option introduced by #91579 isn't necessary any more.

Note that the feature of -flang-experimental-integer-overflow is enabled
by default.
2024-10-25 15:20:23 +09:00
Abid Qadeer
37832d5de2 [flang][debug] Support fir.vector type. (#112951)
This PR converts the `fir.vector<>` to
`DICompositeTypeAttr(DW_TAG_array_type)` with `vector` flag set.
2024-10-24 13:37:32 +01:00
Abid Qadeer
47c1abf4af [flang][debug] Fix array lower bounds in derived type members. (#113183)
The lower bound information for the array members of a derived type
can't be obtained from the `DeclareOp`. It has to be extracted from the
`TypeInfoOp`. That was left as FIXME in the code. This PR adds the
missing functionality to fix the issue.

I tried the following approaches before settling on the current one that
is to generate `DITypeAttr` for array members right where the components
are being processed.

1. Generate a temp XDeclareOp with the shift information obtained from
the `TypeInfoOp`. This caused a few issues mostly related to
`unrealized_conversion_cast`.

2. Change the shift operands in the `declOp` that was passed in the
function before calling `convertType`. The code can be seen in the
abcf031a8e5a02f0081e7f293858302e7bf47bec. It essentially looked like the
following. It works correctly but I was not sure if temporarily changing
the `declOp` is the safe thing to do.

```
mlir::OperandRange originalShift = declOp.getShift();
mlir::MutableOperandRange mutableOpRange = declOp.getShiftMutable();
mutableOpRange.assign(shiftOpers);
elemTy = convertType(fieldTy, fileAttr, scope, declOp);
mutableOpRange.assign(originalShift);
```

Fixes #113178.
2024-10-24 13:22:28 +01:00
Abid Qadeer
c07abf7272 [flang][debug] Support fir::ReferenceType. (#113480) 2024-10-24 11:38:17 +01:00
Valentin Clement (バレンタイン クレメン)
4e40b71c51 [flang][cuda] Add specialized gpu.launch_func conversion (#113493) 2024-10-23 15:28:51 -07:00
Valentin Clement (バレンタイン クレメン)
60105ac6ba [flang][cuda] Fix kernel registration (#113372)
The registration needs the fct pointer and the name. This patch updates
the entry point with an extra arg and the translation as well.
2024-10-23 11:25:58 -07:00
jeanPerier
a59f712434 [flang][hlfir] do not consider local temps as conflicting in assignment (#113330)
Last patch required to avoid creating a temporary for the LHS when
dealing with `x([a,b]) = y`.

The code dealing with "ordered assignments" (where, forall, user and
vector subscripted assignments) is saving the evaluated RHS/LHS and
masks if they have write effects because this write effects should not
be evaluated when they affect entities that may be written to in other
contexts after the evaluation and before the re-evaluation.

But when dealing with write to storage allocated in the region for the
expression being evluated, there is no problem to re-evaluate the write:
it has no effect outside of the expression evaluation that owns the
allocation.

In the case of `x([a,b]) = y`, the temporary is created for the vector
subscript. Raising the HLFIR abstraction for simple array constructors
may be a good idea, but local temps are created in other contexts, so
this fix is more generic.
2024-10-23 12:34:13 +02:00
jeanPerier
d89c1dbaf5 [flang][hlfir] refine hlfir.assign side effects (#113319)
hlfir.assign currently has the `MemoryEffects<[MemWrite]` which makes it
look like it can write to anything. This is good for some cases where
the assign effect cannot be precisely described through the MLIR side
effect API (e.g., when the LHS is a descriptor and it is not possible to
get an OpOperand describing the data address, or when derived type are
involved and finalization could be called, or user defined assignment
for some components). For the most common case of hlfir.assign on
intrinsic types without whole allocatable LHS, this is pessimistic.

This patch implements a finer description of the side effects when
possible, and also adds the proper read/allocate/free effects when
relevant.

The ultimate goal is to suppress the generation of temporary for the LHS
address when dealing with an assignment to a vector subscripted LHS
where the vector subscript is an array constructor that does not refer
to the LHS (as in `x([a,b]) = y`).

Two more patches will follow to enable this.
2024-10-23 12:33:14 +02:00
Renaud Kauffmann
f1e59dcb45 Renaming Cuf passes to CUF (#113351)
For consistency with other dialects and other CUF passes and files, this
patch renames passes CufOpConversion to CUFOpConversion,
CufImplicitDeviceGlobal to CUFDeviceGlobal.
It also renames the file.
2024-10-22 12:50:31 -07:00
Abid Qadeer
95b4128c6a [flang][debug] Don't generate debug for compiler-generated variables (#112423)
Flang generates many globals to handle derived types. There was a check
in debug info to filter them based on the information that their names
start with a period. This changed since PR#104859 where 'X' is being
used instead of '.'.

This PR fixes this issue by also adding 'X' in that list. As user
variables gets lower cased by the NameUniquer, there is no risk that
those will be filtered out. I added a test for that to be sure.
2024-10-21 11:27:34 +01:00
Pranav Bhandarkar
11dad2fa51 [flang][OpenMP] - Add MapInfoOp instances for target private variables when needed (#109862)
This PR adds an OpenMP dialect related pass for FIR/HLFIR which creates
`MapInfoOp` instances for certain privatized symbols. For example, if an
allocatable variable is used in a private clause attached to a
`omp.target` op, then the allocatable variable's descriptor will be
needed on the device (e.g. GPU). This descriptor needs to be separately
mapped onto the device. This pass creates the necessary `omp.map.info`
ops for this.
2024-10-20 01:01:39 -05:00
Valentin Clement (バレンタイン クレメン)
d37bc32a65 [flang][cuda] Translate cuf.register_kernel and cuf.register_module (#112972)
Add LLVM IR Translation for `cuf.register_module` and
`cuf.register_kernel`. These are lowered to function call to the CUF
runtime entries.
2024-10-18 21:31:47 -07:00
Valentin Clement (バレンタイン クレメン)
5406834cda [flang][cuda] Add cuf.register_module operation (#112971)
Add a new operation to register the fatbin and pass it to
`cuf.register_kernel`
2024-10-18 21:30:38 -07:00
Renaud Kauffmann
864902e9b4 [flang][cuda] Call CUFGetDeviceAddress to get global device address from host address (#112989) 2024-10-18 17:35:38 -07:00
Jay Foad
922992a22f Fix typo "instrinsic" (#112899) 2024-10-18 15:58:33 +01:00
Scott Manley
e6a4346b5a [flang] add getElementType() to fir::SquenceType and fir::VectorType (#112770)
getElementType() was missing from Sequence and Vector types. Did a
replace of the obvious places getEleTy() was used for these two types
and updated to use this name instead.

Co-authored-by: Scott Manley <scmanley@nvidia.com>
2024-10-18 09:29:25 +02:00
Valentin Clement (バレンタイン クレメン)
834d001e10 [flang][cuda] Relax the verifier for cuf.register_kernel op (#112585)
Relax the verifier since the `gpu.func` might be converted to
`llvm.func` before `cuf.register_kernel` is converted.
2024-10-17 08:30:13 -07:00
jeanPerier
2f0b4f43fc [flang][extension] support concatenation with absent optional (#112678)
Fix #112593 by adding support in lowering to concatenation with an
absent optional _assumed length_ dummy argument because:
1. Most compilers seem to support it (most likely by accident).
2. This actually makes the compiler codegen simpler. Codegen was going
out of its way to poke the LLVM optimizer bear by producing an undef
argument for the length.

I insist on the fact that no compiler support this with _explicit
length_ optional arguments and the executable will segfault and I would
discourage users from using that "feature" because runtime checks for
bad optional dereference will kick when used (For instance, "nagfor
-C=present" will produce an executable that abort with an error message
. Flang does not have such runtime check option so far).

Hence, I am not updating the Extensions.md document because this is not
something I think we should advertise.
2024-10-17 13:25:09 +02:00
Valentin Clement (バレンタイン クレメン)
85880140be [flang][cuda] Add kernel registration in CUF constructor (#112416)
Update the CUF constructor with the cuf.register_kernel operations.
2024-10-15 14:18:37 -07:00
Valentin Clement (バレンタイン クレメン)
7e72e5ba86 Reland '[flang][cuda] Add cuf.register_kernel operation' (#112389)
The operation will be used in the CUF constructor to register the kernel
functions. This allow to delay this until codegen when the gpu.binary
will be available.

Reland of #112268 with correct shared library build support.
2024-10-15 11:12:03 -07:00
Joel E. Denny
732353303e [flang] AliasAnalysis: Fix pointer component logic (#94242)
This PR applies the changes discussed in [[RFC] Rationale for Flang
AliasAnalysis pointer component
logic](https://discourse.llvm.org/t/rfc-rationale-for-flang-aliasanalysis-pointer-component-logic/79252).

In summary, this PR replaces the existing pointer component logic in
Flang's AliasAnalysis implementation. That logic focuses on aliasing
between pointers and non-pointer, non-target composites that have
pointer components. However, it is more conservative than necessary, and
some existing tests expect its current results when less conservative
results seem reasonable.

This PR splits the logic into two cases:

1. Source values are the same: Return MayAlias when one value is the
address of a composite, and the other value is statically the address of
a pointer component of that composite.
2. Source values are different: Return MayAlias when one value is the
address of a composite (actual argument), and the other value is the
address of a pointer (dummy arg) that might dynamically be a component
of that composite.

In both cases, the actual implementation is still more conservative than
described above, but it can be improved further later. Details appear in
the comments.

Additionally, this PR revises the logic that reports MayAlias for a
pointer/target vs. another pointer/target. It constrains the existing
logic to handle only isData cases, and it adds less conservative
handling of !isData cases elsewhere. First, it extends case 2 listed
above to cover the case where the actual argument is the address of a
pointer rather than a composite. Second, it adds a third case: where
target attributes enable aliasing with a dummy argument.
2024-10-15 10:07:17 -04:00
Valentin Clement (バレンタイン クレメン)
2a68f82989 Revert "[flang][cuda] Add cuf.register_kernel operation" (#112306)
Reverts llvm/llvm-project#112268
2024-10-14 21:06:59 -07:00
Valentin Clement (バレンタイン クレメン)
cbe76a2ac3 [flang][cuda] Add cuf.register_kernel operation (#112268)
The operation will be used in the CUF constructor to register the kernel
functions. This allow to delay this until codegen when the gpu.binary
will be available.
2024-10-14 20:57:21 -07:00
Tarun Prabhu
839344f025 [clang][flang][mlir] Reapply "Support -frecord-command-line option (#102975)"
The underlying issue was caused by a file included in two different
places which resulted in duplicate definition errors when linking
individual shared libraries. This was fixed in c3201ddaea
[#109874].
2024-10-14 08:44:24 -06:00
jeanPerier
367c3c968e [flang] correctly deal with bind(c) derived type result ABI (#111969)
Derived type results of BIND(C) function should be returned according
the the C ABI for returning the related C struct type.

This currently did not happen since the abstract-result pass was forcing
the Fortran ABI for all derived type results.
use the bind_c attribute that was added on call/func/dispatch in FIR to
prevent such rewrite in the abstract result pass, and update the
target-rewrite pass to deal with the struct return ABI.

So far, the target specific part of the target-rewrite is only
implemented for X86-64 according to the "System V Application Binary
Interface AMD64 v1", the other targets will hit a TODO, just like for
BIND(C), VALUE derived type arguments.

This intends to deal with #102113.

This is a re-land of #111678 with an extra commit to keep rewriting `type(c_ptr)`
results to `!fir.ref<none>` in the abstract result pass regardless of the ABIs.
2024-10-14 09:35:29 +02:00
Dominik Adamski
102f76b2d7 [Flang][AliasAnalysis] Alias analysis for tmp arrays (#111972)
This patch extends the alias analysis for temporary arrays in Flang.
With this extension, Flang can now determine that the temporary array
[a, b, c] does not alias with arrayD in Fortran code:
```
  integer :: a, b, c
  integer :: arrayD(3)
  arrayD = [ a, b, c ]
```
2024-10-14 09:25:14 +02:00
Abid Qadeer
cd12ffb622 [mlir][debug] Allow multiple DIGlobalVariableExpression on globals. (#111981)
Currently, we allow only one DIGlobalVariableExpressionAttr per global.
It is especially evident in import where we pick the first from the list
and ignore the rest. In contrast, LLVM allows multiple
DIGlobalVariableExpression to be attached to the global. They are needed
for correct working of things like DICommonBlock. This PR removes this
restriction in mlir. Changes are mostly mechanical. One thing on which I
went a bit back and forth was the representation inside GlobalOp. I
would be happy to change if there are better ways to do this.

---------

Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
2024-10-13 23:36:00 +01:00
donald chen
4b3f251bad [mlir] [dataflow] unify semantics of program point (#110344)
The concept of a 'program point' in the original data flow framework is
ambiguous. It can refer to either an operation or a block itself. This
representation has different interpretations in forward and backward
data-flow analysis. In forward data-flow analysis, the program point of
an operation represents the state after the operation, while in backward
data flow analysis, it represents the state before the operation. When
using forward or backward data-flow analysis, it is crucial to carefully
handle this distinction to ensure correctness.

This patch refactors the definition of program point, unifying the
interpretation of program points in both forward and backward data-flow
analysis.

How to integrate this patch?

For dense forward data-flow analysis and other analysis (except dense
backward data-flow analysis), the program point corresponding to the
original operation can be obtained by `getProgramPointAfter(op)`, and
the program point corresponding to the original block can be obtained by
`getProgramPointBefore(block)`.

For dense backward data-flow analysis, the program point corresponding
to the original operation can be obtained by
`getProgramPointBefore(op)`, and the program point corresponding to the
original block can be obtained by `getProgramPointAfter(block)`.

NOTE: If you need to get the lattice of other data-flow analyses in
dense backward data-flow analysis, you should still use the dense
forward data-flow approach. For example, to get the Executable state of
a block in dense backward data-flow analysis and add the dependency of
the current operation, you should write:

``getOrCreateFor<Executable>(getProgramPointBefore(op),
getProgramPointBefore(block))``

In case above, we use getProgramPointBefore(op) because the analysis we
rely on is dense backward data-flow, and we use
getProgramPointBefore(block) because the lattice we query is the result
of a non-dense backward data flow computation.

related dsscussion:
https://discourse.llvm.org/t/rfc-unify-the-semantics-of-program-points/80671/8
corresponding PSA:
https://discourse.llvm.org/t/psa-program-point-semantics-change/81479
2024-10-11 21:59:05 +08:00
Dominik Adamski
73ad416ebf [OpenMP][Flang] Enable alias analysis inside omp target region (#111670)
At present, alias analysis does not work for operations inside OMP
target regions because the FIR declare operations within OMP target do
not offer sufficient information for alias analysis. Consequently, it is
necessary to examine the FIR code outside the OMP target region.
2024-10-11 11:53:28 +02:00
jeanPerier
4ddc756bcc Revert "[flang] correctly deal with bind(c) derived type result ABI" (#111858)
Reverts llvm/llvm-project#111678

Causes ARM failure in test suite. TYPE(C_PTR) result should not regress
even if struct ABI no implemented for the target.
https://lab.llvm.org/buildbot/#/builders/143/builds/2731
I need to revisit this.
2024-10-10 17:25:57 +02:00
jeanPerier
480e7f0667 [flang] correctly deal with bind(c) derived type result ABI (#111678)
Derived type results of BIND(C) function should be returned according
the the C ABI for returning the related C struct type.

This currently did not happen since the abstract-result pass was forcing
the Fortran ABI for all derived type results.
use the bind_c attribute that was added on call/func/dispatch in FIR to
prevent such rewrite in the abstract result pass, and update the
target-rewrite pass to deal with the struct return ABI.

So far, the target specific part of the target-rewrite is only
implemented for X86-64 according to the "System V Application Binary
Interface AMD64 v1", the other targets will hit a TODO, just like for
BIND(C), VALUE derived type arguments.

This intends to deal with
https://github.com/llvm/llvm-project/issues/102113.
2024-10-10 15:37:19 +02:00
Tarun Prabhu
4605ba0437 [flang] Link libflangPasses against correct libraries
libflangPasses.so was not linked against the correct libraries which
caused a build failure with -DBUILD_SHARED_LIBS=On. Fixes #110425
2024-10-09 13:09:17 -06:00