Commit Graph

368 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
60105ac6ba [flang][cuda] Fix kernel registration (#113372)
The registration needs the fct pointer and the name. This patch updates
the entry point with an extra arg and the translation as well.
2024-10-23 11:25:58 -07:00
Valentin Clement (バレンタイン クレメン)
d37bc32a65 [flang][cuda] Translate cuf.register_kernel and cuf.register_module (#112972)
Add LLVM IR Translation for `cuf.register_module` and
`cuf.register_kernel`. These are lowered to function call to the CUF
runtime entries.
2024-10-18 21:31:47 -07:00
Scott Manley
e6a4346b5a [flang] add getElementType() to fir::SquenceType and fir::VectorType (#112770)
getElementType() was missing from Sequence and Vector types. Did a
replace of the obvious places getEleTy() was used for these two types
and updated to use this name instead.

Co-authored-by: Scott Manley <scmanley@nvidia.com>
2024-10-18 09:29:25 +02:00
Valentin Clement (バレンタイン クレメン)
834d001e10 [flang][cuda] Relax the verifier for cuf.register_kernel op (#112585)
Relax the verifier since the `gpu.func` might be converted to
`llvm.func` before `cuf.register_kernel` is converted.
2024-10-17 08:30:13 -07:00
Valentin Clement (バレンタイン クレメン)
7e72e5ba86 Reland '[flang][cuda] Add cuf.register_kernel operation' (#112389)
The operation will be used in the CUF constructor to register the kernel
functions. This allow to delay this until codegen when the gpu.binary
will be available.

Reland of #112268 with correct shared library build support.
2024-10-15 11:12:03 -07:00
Valentin Clement (バレンタイン クレメン)
2a68f82989 Revert "[flang][cuda] Add cuf.register_kernel operation" (#112306)
Reverts llvm/llvm-project#112268
2024-10-14 21:06:59 -07:00
Valentin Clement (バレンタイン クレメン)
cbe76a2ac3 [flang][cuda] Add cuf.register_kernel operation (#112268)
The operation will be used in the CUF constructor to register the kernel
functions. This allow to delay this until codegen when the gpu.binary
will be available.
2024-10-14 20:57:21 -07:00
Tarun Prabhu
839344f025 [clang][flang][mlir] Reapply "Support -frecord-command-line option (#102975)"
The underlying issue was caused by a file included in two different
places which resulted in duplicate definition errors when linking
individual shared libraries. This was fixed in c3201ddaea
[#109874].
2024-10-14 08:44:24 -06:00
Leandro Lupori
390943f25b [flang] Implement conversion of compatible derived types (#111165)
With some restrictions, BIND(C) derived types can be converted to
compatible BIND(C) derived types.
Semantics already support this, but ConvertOp was missing the
conversion of such types.

Fixes https://github.com/llvm/llvm-project/issues/107783
2024-10-09 10:37:46 -03:00
jeanPerier
1753de2d95 [flang][FIR] remove fir.complex type and its fir.real element type (#111025)
Final patch of
https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292

Since fir.real was only still used as fir.complex element type, this
patch removes it at the same time.
2024-10-04 09:57:03 +02:00
jeanPerier
c4204c0b29 [flang] replace fir.complex usages with mlir complex (#110850)
Core patch of
https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292.
After that, the last step is to remove fir.complex from FIR types.
2024-10-03 17:10:57 +02:00
jeanPerier
c2601f1769 [flang][NFC] remove unused fir.constc operation (#110821)
As part of [RFC to replace fir.complex usages by mlir.complex
type](https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292).

fir.constc is unused so instead of porting it, just remove it.
Complex constants are currently created with inserts in lowering
already. When using mlir complex, we may just want to start using
[complex.constant](4f6ad17adc/mlir/include/mlir/Dialect/Complex/IR/ComplexOps.td (L131C5-L131C16)).
2024-10-02 16:16:57 +02:00
David Spickett
737c414e1d Revert "[clang][flang][mlir] Support -frecord-command-line option (#102975)"
This reverts commit b3533a156d.

It caused test failures in shared library builds:
https://lab.llvm.org/buildbot/#/builders/80/builds/3854
2024-09-20 11:30:50 +00:00
Tarun Prabhu
b3533a156d [clang][flang][mlir] Support -frecord-command-line option (#102975)
Add support for the -frecord-command-line option that will produce the
llvm.commandline metadata which will eventually be saved in the object
file. This behavior is also supported in clang. Some refactoring of the
code in flang to handle these command line options was carried out. The
corresponding -grecord-command-line option which saves the command line
in the debug information has not yet been enabled for flang.
2024-09-19 18:28:50 -06:00
Youngsuk Kim
84d7f294c4 [flang] Tidy uses of raw_string_ostream (NFC)
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly

( 65b13610a5 for further reference )

Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
2024-09-18 13:26:29 -05:00
Valentin Clement (バレンタイン クレメン)
bc54e5636f [flang][cuda] Add new entry points function for data transfer (#108244)
Add new entry points for more complex data transfer involving
descriptors. These functions will be called when converting
`cuf.data_transfer` operations.
2024-09-16 09:45:44 -07:00
jeanPerier
b65fc7e91a [flang][fir] allow fir.convert from and to !llvm.ptr type (#106590)
Allow some interaction between LLVM and FIR dialect by allowing
conversion between FIR memory types and llvm.ptr type.
This is meant to help experimentation where FIR and LLVM dialect
coexists, and is useful to deal with cases where LLVM type makes it
early into the MLIR produced by flang, like when inserting LLVM stack
intrinsic here:
0a00d32c5f/flang/lib/Optimizer/Transforms/StackReclaim.cpp (L57)
2024-08-30 08:20:17 +02:00
Valentin Clement (バレンタイン クレメン)
900cd62758 [flang][cuda] Simplify data transfer when possible (#106120)
When possible, avoid using descriptors and use the reference and the
shape for data_transfer.
2024-08-27 10:03:15 -07:00
Abid Qadeer
d07dc73bcf [flang][debug] Support derived types. (#99476)
This PR adds initial debug support for derived type. It handles
`RecordType` and generates appropriate `DICompositeTypeAttr`. The
`TypeInfoOp` is used to get information about the parent and location of
the derived type.

We use `getTypeSizeAndAlignment` to get the size and alignment of the
components of the derived types. This function needed a few changes to
be suitable to be used here:

1. The `getTypeSizeAndAlignment` errored out on unsupported type which
would not work with incremental way we are building debug support. A new
variant of this function has been that returns an std::optional. The original
function has been renamed to `getTypeSizeAndAlignmentOrCrash` as it
will call `TODO()` for unsupported types.

2. The Character type was returning size of just element and not the
whole string which has been fixed.

The testcase checks for offsets of the components which had to be
hardcoded in the test. So the testcase is currently enabled on x86_64.

With this PR in place, this is how the debugging of derived types look
like:

```
type :: t_date
    integer :: year, month, day
  end type

  type :: t_address
    integer :: house_number
  end type
  type, extends(t_address) :: t_person
    character(len=20) name
  end type
  type, extends(t_person)  :: t_employee
    type(t_date) :: hired_date
    real :: monthly_salary
  end type
  type(t_employee) :: employee

(gdb) p employee
$1 = ( t_person = ( t_address = ( house_number = 1 ), name = 'John', ' ' <repeats 16 times> ), hired_date = ( year = 2020, month = 1, day = 20 ), monthly_salary = 3.1400001 )
```
2024-08-27 10:30:49 +01:00
Valentin Clement (バレンタイン クレメン)
7af61d5cf4 [flang][cuda] Add shape to cuf.data_transfer operation (#104631)
When doing data transfer with dynamic sized array, we are currently
generating a data transfer between two descriptors. If the shape values
can be provided, we can keep the data transfer between two references.
This patch adds the shape operands to the operation.

This will be exploited in lowering in a follow up patch.
2024-08-26 09:50:17 -07:00
jeanPerier
2051a7bcd3 [flang][NFC] turn fir.call is_bind_c into enum for procedure flags (#105691)
First patch to fix a BIND(C) ABI issue
(https://github.com/llvm/llvm-project/issues/102113). I need to keep
track of BIND(C) in more locations (fir.dispatch and func.func
operations), and I need to fix a few passes that are dropping the
attribute on the floor. Since I expect more procedure attributes that
cannot be reflected in mlir::FunctionType will be needed for ABI,
optimizations, or debug info, this NFC patch adds a new enum attribute
to keep track of procedure attributes in the IR.

This patch is not updating lowering to lower more attributes, this will
be done in a separate patch to keep the test changes low here.

Adding the attribute on fir.dispatch and func.func will also be done in
separate patches.
2024-08-23 14:32:43 +02:00
Tarun Prabhu
90aac06c7f [flang][mlir] Add llvm.ident metadata when compiling with flang
This brings the behavior of flang in line with clang which also adds
this metadata unconditionally.

Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
2024-08-12 11:56:19 -06:00
Valentin Clement (バレンタイン クレメン)
0ee0eeb4bb [flang] Enhance location information (#95862)
Add inclusion location information by using FusedLocation with
attribute.

More context here:
https://discourse.llvm.org/t/rfc-enhancing-location-information/79650
2024-07-23 09:49:17 -07:00
Valentin Clement (バレンタイン クレメン)
33cb29cc3e [flang][cuda] Use cuf.alloc/cuf.free for local descriptor (#98518)
Local descriptor for cuda allocatable need to be handled on host and
device. One solution is to duplicate the descriptor (one on the host and
one on the device) and keep them in sync or have the descriptor in
managed/unified memory so we don't to take care of any sync.
The second solution is probably the one we will implement. In order to
have more flexibility on how descriptor representing cuda allocatable
are allocated, this patch updates the lowering to use the cuf operations
alloc and free to managed them.
2024-07-17 13:52:36 -07:00
jeanPerier
31087c5e4c [flang] handle alloca outside of entry blocks in MemoryAllocation (#98457)
This patch generalizes the MemoryAllocation pass (alloca -> heap) to
handle fir.alloca regardless of their postion in the IR. Currently, it
only dealt with fir.alloca in function entry blocks. The logic is placed
in a utility that can be used to replace alloca in an operation on
demand to whatever kind of allocation the utility user wants via
callbacks (allocmem, or custom runtime calls to instrument the code...).

To do so, a concept of ownership, that was already implied a bit and
used in passes like stack-reclaim, is formalized. Any operation with the
LoopLikeInterface, AutomaticAllocationScope, or IsolatedFromAbove owns
the alloca directly nested inside its regions, and they must not be used
after the operation.

The pass then looks for the exit points of region with such interface,
and use that to insert deallocation. If dominance is not proved, the
pass fallbacks to storing the new address into a C pointer variable
created in the entry of the owning region which allows inserting
deallocation as needed, included near the alloca itself to avoid leaks
when the alloca is executed multiple times due to block CFGs loops.

This should fix https://github.com/llvm/llvm-project/issues/88344.

In a next step, I will try to refactor lowering a bit to introduce
lifetime operation for alloca so that the deallocation points can be
inserted as soon as possible.
2024-07-17 09:15:47 +02:00
Alexis Perry-Holby
f1d3fe7aae Add basic -mtune support (#98517)
Initial implementation for the -mtune flag in Flang.

This PR is a clean version of PR #96688, which is a re-land of PR #95043
2024-07-16 16:48:24 +01:00
jeanPerier
66d5ca2a3d Reland "[flang] add extra component information in fir.type_info" (#97404)
Reland #96746 with the proper Support/CMakelist.txt change.

fir.type does not contain all Fortran level information about
components. For instance, component lower bounds and default initial
value are lost. For correctness purpose, this does not matter because
this information is "applied" in lowering (e.g., when addressing the
components, the lower bounds are reflected in the hlfir.designate).

However, this "loss" of information will prevent the generation of
correct debug info for the type (needs to know about lower bounds). The
initial value could help building some optimization pass to get rid of
initialization runtime calls.

This patch adds lower bound and initial value information into
fir.type_info via a new fir.dt_component operation. This operation is
generated only for component that needs it, which helps keeping the IR
small for "boring" types.

In general, adding Fortran level info in fir.type_info will allow
delaying the generation of "type descriptors" gobals that are very
verbose in FIR and make it hard to work with FIR dumps from applications
with many derived types.
2024-07-02 15:19:49 +02:00
Ramkumar Ramachandra
db791b278a mlir/LogicalResult: move into llvm (#97309)
This patch is part of a project to move the Presburger library into
LLVM.
2024-07-02 10:42:33 +01:00
jeanPerier
6a66b8224d Revert "[flang] add extra component information in fir.type_info" (#96937)
Reverts llvm/llvm-project#96746
Breaking shared library buillds:
https://lab.llvm.org/buildbot/#/builders/89/builds/931
2024-06-27 19:22:48 +02:00
jeanPerier
1448ed2000 [flang] add extra component information in fir.type_info (#96746)
fir.type does not contain all Fortran level information about
components. For instance, component lower bounds and default initial
value are lost. For correctness purpose, this does not matter because
this information is "applied" in lowering (e.g., when addressing the
components, the lower bounds are reflected in the hlfir.designate).

However, this "loss" of information will prevent the generation of
correct debug info for the type (needs to know about lower bounds). The
initial value could help building some optimization pass to get rid of
initialization runtime calls.

This patch adds lower bound and initial value information into
fir.type_info via a new fir.dt_component operation. This operation is
generated only for component that needs it, which helps keeping the IR
small for "boring" types.

In general, adding Fortran level info in fir.type_info will allow
delaying the generation of "type descriptors" gobals that are very
verbose in FIR and make it hard to work with FIR dumps from applications
with many derived types.
2024-06-27 18:59:03 +02:00
Tarun Prabhu
8dd9494056 Revert "[flang] Add basic -mtune support" (#96678)
Reverts llvm/llvm-project#95043
2024-06-25 13:25:39 -06:00
Alexis Perry-Holby
a790279bf2 [flang] Add basic -mtune support (#95043)
This PR adds -mtune as a valid flang flag and passes the information
through to LLVM IR as an attribute on all functions. No specific
architecture optimizations are added at this time.
2024-06-25 18:39:35 +01:00
donald chen
2c1ae801e1 [mlir][side effect] refactor(*): Include more precise side effects (#94213)
This patch adds more precise side effects to the current ops with memory
effects, allowing us to determine which OpOperand/OpResult/BlockArgument
the
operation reads or writes, rather than just recording the reading and
writing
of values. This allows for convenient use of precise side effects to
achieve
analysis and optimization.

Related discussions:
https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243
2024-06-19 22:10:34 +08:00
jeanPerier
a786919256 [flang] allow assumed-rank box in fir.store (#95980)
Codegen is done with a memcpy using the rank from the "value" descriptor
like for the fir.load case.
Rational described in
https://github.com/llvm/llvm-project/blob/main/flang/docs/AssumedRank.md.
2024-06-19 10:12:19 +02:00
jeanPerier
bacbf26b4c [flang] allow assumed-rank box in fir.alloca (#95947)
The alloca can be maximized with the maximum number or ranks, which is
reasonable (15 currently as per the standard). Introducing a rank based
dynamic allocation would complexify alloca hoisting and stack size
analysis (this can be revisited if the standard changes to allow more
ranks).

No change is needed since this is already reflected in how the fir.box
type is translated to LLVM.
2024-06-19 09:56:36 +02:00
Valentin Clement (バレンタイン クレメン)
5e20785edc [flang][cuda] Relax cuf.data_transfer verifier (#95974)
Allow data transfer between array reference and array described by a
descriptor.
2024-06-18 13:09:37 -07:00
jeanPerier
9f44d5d9d0 [flang] Simplify copy-in copy-out runtime API (#95822)
The runtime API for copy-in copy-out currently only has an entry only
for the copy-out. This entry has a "skipInit" boolean that is never set
to false by lowering and it does not deal with the deallocation of the
temporary.

The generated code was a mix of inline code and runtime calls This is not a big deal,
but this is unneeded compiler and generated code complexity.
With assumed-rank, it is also more cumbersome to establish a
temporary descriptor.

Instead, this patch:
- Adds a CopyInAssignment API that deals with establishing the temporary
descriptor and does the copy.
- Removes unused arg to CopyOutAssign, and pushes
destruction/deallocation responsibility inside it.

Note that this runtime API are still not responsible for deciding the
need of copying-in and out. This is kept as a separate runtime call to
IsContiguous, which is easier to inline/replace by inline code with the
hope of removing the copy-in/out calls after user function inlining.
@vzakhari has already shown that always inlining all the copy part
increase Fortran compilation time due to loop optimization attempts for
loops that are known to have little optimization profitability (the
variable being copied from and to is not contiguous).
2024-06-18 12:04:04 +02:00
Iman Hosseini
7665d3d90d [flang] Add reductions for CUF Kernels: Lowering (#95184)
* Add reductionOperands and reductionAttrs to cuf's KernelOp. 
* Parsing is already working and the tree has the info: here I make the
Bridge emit the updated KernelOp with reduction information added.
* Check |reductionAttrs| = |reductionOperands| in verifier
* Add a test
@clementval @vzakhari

---------

Co-authored-by: Iman Hosseini <imanh@nvidia.com>
Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
2024-06-12 19:18:41 +01:00
Valentin Clement (バレンタイン クレメン)
0babff9675 [flang] Lower REDUCE intrinsic with no DIM argument and rank 1 (#94652)
This patch lowers the `REDUCE` intrinsic call to the runtime equivalent
for scalar results. Call with array result will follow.
2024-06-10 14:12:57 -07:00
khaki3
88cdd99055 [flang] Add reduction semantics to fir.do_loop (#93934)
Derived from #92480. This PR introduces reduction semantics into loops
for DO CONCURRENT REDUCE. The `fir.do_loop` operation now invisibly has
the `operandSegmentsizes` attribute and takes variable-length reduction
operands with their operations given as `fir.reduce_attr`. For the sake
of compatibility, `fir.do_loop`'s builder has additional arguments at
the end. The `iter_args` operand should be placed in front of the
declaration of result types, so the new operand for reduction variables
(`reduce`) is put in the middle of arguments.
2024-06-06 11:16:40 -07:00
Slava Zakharin
e42864ecfb [flang] Fixed buildbots: removed std::move preventing copy elision. 2024-06-04 15:31:39 -07:00
Slava Zakharin
ae4f300133 [flang] Canonicalize fir.array_coor by pulling in embox/rebox. (#92858)
In a simple case like this:
```
program test
  integer :: u(120, 2)
  u(1:120,1:2) = u(1:120,1:2) + 2
end program
```
Flang is creating a copy loop with fir.array_coor using
a result of fir.embox inserted before the loop. This results in split
address computations before and inside the loop, which can be seen
as many more arithmetic operations than required after converting
FIR to LLVM dialect. Even though LLVM SROA/mem2reg are able
to optimize the temporary descriptor, and then LICM is able to hoist
the invariant computations, we seem to get better mix of LLVM dialect
operations after FIR-to-LLVM codegen. This may also slightly reduce
the compilation time taken by LLVM to optimize the generate LLVM IR.
This may also slightly reduce the time spent by FIR AliasAnalysis
to reach the memory reference source.
2024-06-04 15:21:19 -07:00
Slava Zakharin
6cd86d0fae [flang] Use fir.declare/fir.dummy_scope for TBAA tags attachments. (#92472)
With MLIR inlining (e.g. `flang-new -mmlir -inline-all=true`)
the current TBAA tags attachment is suboptimal, because
we may lose information about the callee's dummy arguments
(by bypassing fir.declare in AliasAnalysis::getSource).
This is a conservative first step to improve the situation.
This patch makes AddAliasTagsPass to account for fir.dummy_scope
hierarchy after MLIR inlining and use it to place the TBAA tags
into TBAA trees corresponding to different function scopes.
The pass uses special mode of AliasAnalysis to find the instantiation
point of a Fortran variable (a [hl]fir.decalre) when searching
for the source of a memory reference. In this mode, AliasAnalysis
will always stop at fir.declare operations that have dummy_scope
operands - there should not be a reason to past throught it
for the purpose of TBAA tags attachment.
2024-06-04 08:33:40 -07:00
Kareem Ergawy
5bfc444524 [flang] Emit argNo debug info only for func block args (#93921)
Fixes a bug uncovered by
[pr43337.f90](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr43337.f90)
in the test suite.

In particular, this emits `argNo` debug info only if the parent op of a
block is a `func.func` op. This avoids DI conflicts when a function
contains a nested OpenMP region that itself has block arguments with DI
attached to them; for example, `omp.parallel` with delayed privatization
enabled.
2024-06-03 11:33:00 +02:00
jeanPerier
fd8b2d2046 [flang] lower RANK intrinsic (#93694)
First commit is reviewed in
https://github.com/llvm/llvm-project/pull/93682.

Lower RANK using fir.box_rank. This patches updates fir.box_rank to
accept box reference, this avoids the need of generating an assumed-rank
fir.load just for the sake of reading ALLOCATABLE/POINTER rank. The
fir.load would generate a "dynamic" memcpy that is hard to optimize
without further knowledge. A read effect is conditionally given to the
operation.
2024-05-30 11:02:09 +02:00
jeanPerier
f1d13bbd66 [flang] add FIR to FIR pass to lower assumed-rank operations (#93344)
Add pass to lower assumed-rank operations. The current patch adds
codegen for fir.rebox_assumed_rank. It will be the pass lowering
fir.select_rank.
    
fir.rebox_assumed_rank is lowered to a call to CopyAndUpdateDescriptor
runtime API.
    
Note that the lowering ends-up allocating two new descriptors at the
LLVM level (one alloca created by the pass for the CopyAndUpdateDescriptor
result descriptor argument, the second one is created by the fir.load
of the result descriptor in codegen).
LLVM is currently unable to properly optimize and merge those allocas.
The "nocapture" attribute added to CopyAndUpdateDescriptor arguments
gives part of the information to LLVM, but the fir.load codegen of
descriptors must be updated to use llvm.memcpy instead of
llvm.load+store to allow LLVM to optimize it. This will be done in later patch.
2024-05-27 11:45:39 +02:00
jeanPerier
b0b3596404 [flang] add fir.rebox_assumed_rank operation (#93334)
As described in https://github.com/llvm/llvm-project/blob/main/flang/docs/AssumedRank.md,
add an operation to make copies of assumed-rank descriptors where lower
bounds, attributes, or dynamic type may have been changed.
2024-05-27 10:53:31 +02:00
Valentin Clement (バレンタイン クレメン)
0bc710f7c1 [flang][cuda] Accept constant as src for cuf.data_tranfer (#92951)
Assignment of a constant (host) to a device variable is a special case
that can be further lowered to `cudaMemset` or similar functions. This
patch update the lowering to avoid the creation of a temporary when we
assign a constant to a device variable.
2024-05-21 12:42:30 -07:00
Valentin Clement
7847b1ca00 [flang][cuda][NFC] Silence warning triggered in buildbot 2024-05-21 11:35:51 -07:00
Valentin Clement (バレンタイン クレメン)
1fc3ce1cdb [flang][cuda] Enable data transfer for descriptors (#92804)
Remove the TODO when data transfer is done with descriptor variables.
2024-05-21 11:23:55 -07:00