9165 Commits

Author SHA1 Message Date
Valentin Clement
308c00749d [flang][cuda][NFC] Fix format 2024-11-01 12:42:06 -07:00
Valentin Clement (バレンタイン クレメン)
32473864cb [flang][cuda] Data transfer with descriptor (#114598)
Reopen PR #114302 as it was automatically closed. 

Review in #114302
2024-11-01 12:35:48 -07:00
Valentin Clement (バレンタイン クレメン)
7792dbe29a Reland '[flang][runtime] Allow different memmove function in assign' (#114587)
Reland #114301
2024-11-01 11:26:39 -07:00
Valentin Clement (バレンタイン クレメン)
c5a254cdd7 Revert "[flang][runtime][NFC] Allow different memmove function in assign" (#114581)
Reverts llvm/llvm-project#114301
2024-11-01 10:40:10 -07:00
Valentin Clement (バレンタイン クレメン)
b278fe3297 [flang][runtime][NFC] Allow different memmove function in assign (#114301)
- Add a parameter to the `Assign` function to be able to use a different
`memmove` function. This is preparatory work to be able to use the
`Assign` function between host and device data.
- Expose the `Assign` function so it can be used from different files. 

- The new `memmoveFct` is not used in `BlankPadCharacterAssignment` yet
since it is not clear if there is a need. It will be updated in case it
is needed.
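
A minimal sketch of the idea, not the actual runtime code: a copy routine is passed as a parameter and defaults to `std::memmove`. The names `MemmoveFct`, `assignBytes` and `countingMemmove` are hypothetical and only illustrate how a caller could substitute a different routine (for example one that copies between host and device).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Signature shared by std::memmove and any replacement (hypothetical alias).
using MemmoveFct = void *(*)(void *, const void *, std::size_t);

// Counts calls so that swapping the copy routine is observable.
static int customCopyCalls = 0;

// Stand-in for a host/device-aware copy; here it only forwards to std::memmove.
inline void *countingMemmove(void *dst, const void *src, std::size_t n) {
  ++customCopyCalls;
  return std::memmove(dst, src, n);
}

// Hypothetical, heavily simplified "assign": copies `bytes` bytes with the
// supplied routine, which defaults to std::memmove.
inline void assignBytes(void *dst, const void *src, std::size_t bytes,
                        MemmoveFct memmoveFct = &std::memmove) {
  assert(dst != nullptr && src != nullptr);
  memmoveFct(dst, src, bytes);
}
```

Callers that do not care keep the default; only the host/device path has to pass its own function.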
2024-11-01 10:34:03 -07:00
Valentin Clement (バレンタイン クレメン)
466b58ba38 [flang] Avoid generating duplicate symbol in comdat (#114472)
When a fir.global is duplicated in an inner module (gpu.module), the
conversion pattern is applied to both the module-level global and its
gpu.module copy, and tries to generate multiple comdats with the same
symbol name. This is the situation in the implementation of CUDA
Fortran.

Just check for the presence of the `ComdatSelectorOp` before creating a
new one.
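
The check can be sketched as a "create only if absent" lookup. This is an illustration under assumptions, not the MLIR code: `ComdatTable` and `getOrCreateSelector` are hypothetical names, and a set of strings stands in for the selector ops of a module.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the comdat selector symbols of one module.
struct ComdatTable {
  std::unordered_set<std::string> selectors;

  // Returns true if a new selector was created, false if one with the same
  // name already existed (in which case nothing is added).
  bool getOrCreateSelector(const std::string &symbol) {
    assert(!symbol.empty());
    if (selectors.count(symbol) != 0)
      return false;
    selectors.insert(symbol);
    return true;
  }
};
```

Converting the same global twice then leaves a single selector instead of two with the same name.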
2024-10-31 18:59:04 -07:00
Valentin Clement (バレンタイン クレメン)
067ce5ca18 [flang][cuda] Use getOrCreateGPUModule in CUFDeviceGlobal pass (#114468)
Make the pass functional if gpu module was not created yet.
2024-10-31 18:58:43 -07:00
Rolf Morel
5c1752e368 [MLIR][DLTI] Pretty parsing and printing for DLTI attrs (#113365)
Unifies parsing and printing for DLTI attributes. Introduces a format of
`#dlti.attr<key1 = val1, ..., keyN = valN>` syntax for all queryable
DLTI attributes similar to that of the DictionaryAttr, while retaining
support for specifying key-value pairs with `#dlti.dl_entry` (whether to
retain this is TBD).

As the new format does away with most of the boilerplate, it is much easier
to parse for humans. This makes an especially big difference for nested
attributes.

Updates the DLTI-using tests and includes fixes for misc error checking/
error messages.
2024-10-31 19:18:24 +00:00
Sergio Afonso
6c28530ed0 [Flang][OpenMP] Properly bind arguments of composite operations (#113682)
When composite constructs are lowered, clauses for each leaf construct
are lowered before creating the set of loop wrapper operations, using
these outside values to populate their operand lists. Then, when the
loop nest associated to that composite construct is lowered, the binding
of Fortran symbols to the entry block arguments defined by these loop
wrappers is performed, resulting in the creation of `hlfir.declare`
operations in the entry block of the `omp.loop_nest`.

This approach prevents `hlfir.declare` operations related to the binding
and other operations resulting from the evaluation of the clauses from
being inserted between loop wrapper operations, which would be an
illegal MLIR representation. However, this introduces the problem of
entry block arguments defined by a wrapper that then should be used by
one of its nested wrappers, because the corresponding Fortran symbol
would still be mapped to an outside value at the time of gathering the
list of operands for the nested wrapper.

This patch adds operand re-mapping logic to update wrappers without
changing when clauses are evaluated or where the `hlfir.declare`
creation is performed.
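
One way to picture the re-mapping, as a sketch under assumptions rather than the patch itself: walk the wrappers from outermost to innermost, use the current binding of each symbol as the operand, then rebind the symbol to the block argument that wrapper defines. `Wrapper`, `remapOperands` and the string-based "values" are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of a loop wrapper: values are plain strings.
struct Wrapper {
  std::string name;                   // e.g. "distribute"
  std::vector<std::string> symbols;   // Fortran symbols that get block arguments
  std::vector<std::string> operands;  // values passed into the wrapper
  std::vector<std::string> blockArgs; // values defined by its entry block
};

// `binding` maps each symbol to the value visible outside the nest. It is
// taken by value so the caller's map is left untouched.
inline void remapOperands(std::vector<Wrapper> &wrappers,
                          std::unordered_map<std::string, std::string> binding) {
  for (Wrapper &w : wrappers) { // outermost first
    w.operands.clear();
    w.blockArgs.clear();
    for (const std::string &sym : w.symbols) {
      // Use whatever the symbol is bound to at this nesting level.
      w.operands.push_back(binding.at(sym));
      // Nested wrappers must see this wrapper's block argument instead.
      const std::string arg = "%" + w.name + "." + sym;
      w.blockArgs.push_back(arg);
      binding[sym] = arg;
    }
  }
}
```

With two wrappers that both use `x`, the outer one receives the outside value and the inner one receives the outer wrapper's block argument.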
2024-10-31 16:39:53 +00:00
Valentin Clement (バレンタイン クレメン)
e4e9fea71e [flang][cuda] Pass descriptor by reference for CUFMemsetDescriptor (#114338) 2024-10-31 09:02:59 -07:00
Renaud Kauffmann
423f35410a [flang][cuda] Adding support for registration of boxes (#114323)
Needed to take into account that `fir::getTypeSizeAndAlignmentOrCrash`
does not work with box types but requires the `fir::LLVMTypeConverter`
2024-10-31 08:39:08 -07:00
Kareem Ergawy
0698482506 [flang][MLIR] Hoist do concurrent nest bounds/steps outside the nest (#114020)
If you have the following multi-range `do concurrent` loop:

```fortran
  do concurrent(i=1:n, j=1:bar(n*m, n/m))
    a(i) = n
  end do
```

Currently, flang generates the following IR:

```mlir
    fir.do_loop %arg1 = %42 to %44 step %c1 unordered {
      ...
      %53:3 = hlfir.associate %49 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
      %54:3 = hlfir.associate %52 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
      %55 = fir.call @_QFPbar(%53#1, %54#1) fastmath<contract> : (!fir.ref<i32>, !fir.ref<i32>) -> i32
      hlfir.end_associate %53#1, %53#2 : !fir.ref<i32>, i1
      hlfir.end_associate %54#1, %54#2 : !fir.ref<i32>, i1
      %56 = fir.convert %55 : (i32) -> index
      ...
      fir.do_loop %arg2 = %46 to %56 step %c1_4 unordered {
        ...
      }
    }
```

However, if `bar` is impure, then we have a direct violation of the
standard:

```
C1143 A reference to an impure procedure shall not appear within a DO CONCURRENT construct.
```

Moreover, the standard describes the execution of `do concurrent`
construct in multiple stages:

```
11.1.7.4 Execution of a DO construct
...
11.1.7.4.2 DO CONCURRENT loop control
The concurrent-limit and concurrent-step expressions in the concurrent-control-list are evaluated. ...

11.1.7.4.3 The execution cycle
...
The block of a DO CONCURRENT construct is executed for every active combination of the index-name values.
Each execution of the block is an iteration. The executions may occur in any order.
```

From the above two points, it seems to me that execution is divided into
multiple consecutive stages: 11.1.7.4.2 is the stage where we evaluate
all control expressions, including the step, and 11.1.7.4.3 is the
stage that executes the block of the concurrent loop itself using the
combinations of possible iteration values.
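
The difference between the two shapes can be sketched in plain C++ (an analogy only, not what flang emits). `nestWithBoundInside` and `nestWithBoundHoisted` are hypothetical names; `upper` plays the role of the `bar(n*m, n/m)` call.

```cpp
#include <cassert>
#include <functional>

// Inner bound evaluated inside the outer loop: `upper` runs once per i.
inline void nestWithBoundInside(int n, const std::function<int()> &upper,
                                const std::function<void(int, int)> &body) {
  assert(n >= 0);
  for (int i = 1; i <= n; ++i) {
    const int jEnd = upper(); // evaluated n times
    for (int j = 1; j <= jEnd; ++j)
      body(i, j);
  }
}

// All control expressions evaluated first, then the nest is executed,
// mirroring the two stages quoted from the standard.
inline void nestWithBoundHoisted(int n, const std::function<int()> &upper,
                                 const std::function<void(int, int)> &body) {
  assert(n >= 0);
  const int jEnd = upper(); // evaluated exactly once
  for (int i = 1; i <= n; ++i)
    for (int j = 1; j <= jEnd; ++j)
      body(i, j);
}
```

Both versions visit the same (i, j) pairs when `upper` is pure; only the number of calls to `upper` differs.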
2024-10-31 09:19:18 +01:00
Renaud Kauffmann
bfe486fe76 Passing descriptors by reference to CUDA runtime calls (#114288)
Passing a descriptor as a `const Descriptor &` or a `const Descriptor *`
generates a FIR signature where the box is passed by value.
This is an issue, as it requires a load of the box to be passed. But
since, ultimately, all boxes are passed by reference, a temporary is
generated in LLVM and the reference to the temporary is passed.

The box addresses are registered with the CUDA runtime but the
temporaries are not, which prevents the runtime from properly mapping a
host-side address to its device-side counterpart.

To address this issue, this PR changes the signatures of the transfer
functions to pass a descriptor as a `Descriptor *`, which will in turn
generate a FIR signature that takes a box reference as an argument.
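
The address problem can be modeled in a few lines of C++. This is an analogy under assumptions: `Descriptor`, `Registry` and the lookup helpers are hypothetical, and the by-value parameter stands in for the temporary that LLVM ends up creating.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-in for a runtime descriptor.
struct Descriptor {
  void *baseAddr = nullptr;
};

// Hypothetical registry of host-side descriptor addresses.
struct Registry {
  std::unordered_set<const void *> registered;
  void registerAddress(const Descriptor *d) { registered.insert(d); }
  bool isRegistered(const Descriptor *d) const {
    return registered.count(d) != 0;
  }
};

// Receives a copy: the address seen here belongs to a temporary that was
// never registered.
inline bool lookupByValue(const Registry &r, Descriptor copy) {
  return r.isRegistered(&copy);
}

// Receives the original address, which is the registered one.
inline bool lookupByPointer(const Registry &r, Descriptor *d) {
  assert(d != nullptr);
  return r.isRegistered(d);
}
```

Only the pointer version finds the registered entry, which is why the transfer functions need the original box address.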
2024-10-30 13:24:47 -07:00
Asher Mancinelli
0c9a02355a [flang][fir] always use memcpy for fir.box (#113949)
@jeanPerier explained the importance of converting box loads and stores
into `memcpy`s instead of aggregate loads and stores, and I'll do my
best to explain it here.

* [(godbolt link) Example comparing opt transformations on memcpys vs
aggregate load/stores](https://godbolt.org/z/be7xM83cG)
* LLVM can more effectively reason about memcpys compared to aggregate
load/stores.
* This came up when others were discussing array descriptors for
assumed-rank arrays passed to `bind(c)` subroutines, with the
implication that the array descriptors are known to have lower bounds of
1 and that they are not pointer/allocatable types.
* [(godbolt link) Clang also uses memcpys so we should probably follow
them, assuming the clang developers are generating what they know Opt
will handle more effectively.](https://godbolt.org/z/YT4x7387W)
* This currently may not help much without the `nocapture` attribute
being propagated to function calls, but [it looks like someone may do
this soon (discourse
link)](https://discourse.llvm.org/t/applying-the-nocapture-attribute-to-reference-passed-arguments-in-fortran-subroutines/81401/23)
or I can do this in a follow-up patch.
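
At the source level the two forms look like this. It is a sketch with a hypothetical `BoxLike` struct, not a real descriptor layout; the optimization argument above concerns the LLVM IR each form lowers to, which this snippet does not demonstrate.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical, simplified descriptor; the real layout is different.
struct BoxLike {
  void *baseAddr;
  std::int64_t elemLen;
  std::int64_t extent;
  std::int64_t lowerBound;
};
static_assert(std::is_trivially_copyable<BoxLike>::value,
              "memcpy requires a trivially copyable type");

// Copy expressed as a single memcpy of the whole descriptor.
inline void copyBoxWithMemcpy(BoxLike *dst, const BoxLike *src) {
  assert(dst != nullptr && src != nullptr);
  std::memcpy(dst, src, sizeof(BoxLike));
}

// Copy expressed as an aggregate load followed by an aggregate store.
inline void copyBoxByValue(BoxLike *dst, const BoxLike *src) {
  assert(dst != nullptr && src != nullptr);
  *dst = *src;
}
```

Both functions leave `dst` with the same contents; the commit is about which representation the optimizer reasons about more effectively.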

Note on test `flang/test/Fir/embox-char.fir`: it looks like the original
test was auto-generated. I wasn't too sure which parts were especially
important to test, so I regenerated the test. If we want the updated
version to look more like the old version, I'll make those changes.
2024-10-30 09:50:27 -07:00
David Truby
dda20ea73d [flang] Add fir-lsp-server (#114059)
This patch adds a fir-lsp-server tool for editor support for editing fir
files, using the existing MLIR lsp server support.

See https://mlir.llvm.org/docs/Tools/MLIRLSP/ for more information.
2024-10-30 15:05:18 +00:00
vdonaldson
8d406d882d [flang] IEEE_REAL (#113948)
IEEE_REAL converts an integer or real argument to a real of a given
kind.
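
As a rough C++ analogy (not the flang implementation), the conversion behaves like a cast from any integer or real type to a chosen floating-point type. `ieeeRealLike` is a hypothetical name, and the template parameter `To` plays the role of the kind argument.

```cpp
#include <cassert>
#include <type_traits>

// Converts an integer or real argument to the requested real type.
template <typename To, typename From>
To ieeeRealLike(From a) {
  static_assert(std::is_floating_point<To>::value,
                "result must be a real type");
  static_assert(std::is_arithmetic<From>::value,
                "argument must be an integer or a real");
  return static_cast<To>(a);
}
```

For example, `ieeeRealLike<float>(3)` yields `3.0f` and `ieeeRealLike<double>(2.5f)` yields `2.5`.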
2024-10-30 09:56:42 -04:00
Krzysztof Parzyszek
c478aab684 [flang][OpenMP] Parser support for DEPOBJ plus DEPEND, DESTROY, UPDATE (#114074)
Parse the DEPOBJ construct and the associated clauses, perform basic
semantic checks.
2024-10-30 08:36:08 -05:00
Sergio Afonso
55e4e3ff65 [Flang][OpenMP] Access full list of entry block syms and vars (NFC) (#113681)
This patch adds methods to `EntryBlockArgs` to access the full list of
entry block argument-related symbols and variables, in their standard
order. This helps centralize this logic in as few places as possible
to avoid future inconsistencies.
2024-10-30 12:07:47 +00:00
Kiran Chandramohan
092a819e94 [Flang][OpenMP] Add frontend support for directives involving master (#113893)
Issue a deprecation warning for these directives.
Lowering currently supports parallel master; for all other combined or
composite directives involving master, issue TODO errors.

Note: The first commit changes the formatting and generalizes the
deprecation message emission for reuse in the second commit. I can pull
it out into a separate commit if required.
2024-10-30 10:58:26 +00:00
Abid Qadeer
652988b658 [flang][debug] Support TupleType. (#113917)
Handling is similar to RecordType with the following differences:

1. No check for cyclic references
2. No extra processing for lower bounds of array members.
3. No line information as TupleType is a lowering artefact and does not
really represent an entity in the code.
2024-10-30 09:52:56 +00:00
Valentin Clement (バレンタイン クレメン)
0d94c7b5ce [flang][cuda][NFC] Make pattern names homogenous (#114156)
The dialect name is uppercase. Make all the pattern prefixes homogeneous.
2024-10-29 20:39:17 -07:00
Valentin Clement (バレンタイン クレメン)
0fa2fb3ed0 [flang][cuda] Add conversion pattern for cuf.kernel_launch op (#114129) 2024-10-29 17:00:41 -07:00
Renaud Kauffmann
b9978f8c77 [flang][cuda] Adding variable registration in constructor (#113976)
1) Adding variable registration in constructor
2) Applying feedback from PR
https://github.com/llvm/llvm-project/pull/112989
2024-10-29 11:48:48 -07:00
Kelvin Li
8e14c6c172 [flang] Support -mabi=vec-extabi and -mabi=vec-default on AIX (#113215)
This option is to enable the AIX extended and default vector ABIs.
2024-10-29 14:20:11 -04:00
Valentin Clement (バレンタイン クレメン)
b05fec97d5 [flang][cuda] Convert gpu.launch_func to CUFLaunchClusterKernel when cluster dims are present (#113959)
Kernel launches in CUF are converted to `gpu.launch_func`. When the
kernel has `cluster_dims` specified, these get carried over to the
`gpu.launch_func` operation. This patch updates the special conversion
of `gpu.launch_func` when cluster dims are present to use the newly
added entry point.
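
The selection can be sketched as a small dispatch on whether cluster dimensions are present. `LaunchOp`, `Dim3` and `selectEntryPoint` are hypothetical; `CUFLaunchClusterKernel` comes from the commit title, while `CUFLaunchKernel` is an assumed name for the regular entry point.

```cpp
#include <cassert>
#include <string>

struct Dim3 {
  int x, y, z;
};

// Hypothetical, simplified view of a launch operation.
struct LaunchOp {
  std::string kernel;
  bool hasClusterDims;
  Dim3 clusterDims; // only meaningful when hasClusterDims is true
};

// Picks the runtime entry point by name.
inline std::string selectEntryPoint(const LaunchOp &op) {
  assert(!op.kernel.empty());
  return op.hasClusterDims ? "CUFLaunchClusterKernel" // from the commit title
                           : "CUFLaunchKernel";       // assumed name
}
```

A launch without cluster dims keeps the regular path, so existing kernels are unaffected.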
2024-10-29 10:02:08 -07:00
Valentin Clement (バレンタイン クレメン)
0b700f2333 [flang][cuda] Add entry point to launch global function with cluster_dims (#113958) 2024-10-29 10:01:49 -07:00
Krzysztof Parzyszek
d48c849ea9 [flang][OpenMP] Parsing support for iterator in DEPEND clause (#113622)
Warn about use of iterators in OpenMP versions that didn't have them
(support added in 5.0). Emit a TODO error in lowering.
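
The version check amounts to a comparison against 5.0. In this sketch, `iteratorDiagnostic` and the message wording are hypothetical, and versions are assumed to be encoded as major*10 + minor (45, 50, 52, ...).

```cpp
#include <cassert>
#include <string>

// Assumption: OpenMP versions are encoded as major*10 + minor.
constexpr unsigned kIteratorSinceVersion = 50;

// Returns a warning text for versions that predate iterators, or an empty
// string when no warning is needed. The wording is illustrative only.
inline std::string iteratorDiagnostic(unsigned ompVersion) {
  assert(ompVersion > 0);
  if (ompVersion >= kIteratorSinceVersion)
    return std::string();
  return "iterator modifier requires OpenMP 5.0 or later";
}
```

With this encoding, version 45 produces the warning while 50 and 52 do not.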
2024-10-29 08:00:44 -05:00
Abid Qadeer
8239ea3918 [flang][debug] Support IndexType. (#113921) 2024-10-29 12:22:43 +00:00
Krzysztof Parzyszek
46944d1f95 [flang][OpenMP] Extract OMP version hint into helper functions, NFC (#113621) 2024-10-29 06:43:40 -05:00
Renaud Kauffmann
0eb5c9d2ef [flang][cuda] Copying device globals in the gpu module (#113955) 2024-10-28 15:34:27 -07:00
Krzysztof Parzyszek
09a4bcf1a5 [flang][OpenMP] Update handling of DEPEND clause (#113620)
Parse the locator list in OmpDependClause as an OmpObjectList (instead
of a list of Designators). When a common block appears in the locator
list, show an informative message.
Implement resolving symbols in DependSinkVec in a dedicated visitor
instead of having a visitor for OmpDependClause.
Resolve unresolved names of common blocks in OmpObjectList.

Minor changes to the code organization:
- rename OmpDependenceType to OmpTaskDependenceType (to follow 5.2
terminology),
- rename Depend::WithLocators to Depend::DepType,
- add comments with more detailed spec references to parse-tree.h.

---------

Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
2024-10-28 16:06:22 -05:00
Renaud Kauffmann
70d61f6de7 [flang][cuda] Adding runtime call to CUFRegisterVariable (#113952) 2024-10-28 13:34:37 -07:00
Yusuke MINATO
bd6ab32e6e Revert "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#113901)
Reverts llvm/llvm-project#110063 due to the performance regression on
503.bwaves_r in SPEC2017.
2024-10-28 14:19:20 +00:00
Kiran Chandramohan
5621929f7f [Flang][OpenMP] Add parser support for grainsize and num_tasks clause (#113136)
These clauses are applicable only for the taskloop directive. Since the
directive has a TODO error, skipping the addition of TODOs for these
clauses.
2024-10-27 20:16:24 +00:00
Kiran Chandramohan
eef3766ae5 Assumed-size arrays are shared and cannot be privatized (#112963)
Do not error out if default(none) is specified and the region has an
assumed-size array.

Fixes #110442
2024-10-27 18:58:47 +00:00
jeanPerier
64d7e45c40 Revert "[flang][debug] Support mlir::NoneType." (#113769)
Reverts llvm/llvm-project#113550

It turns out this causes compiler crashes with assumed-type arrays and -g.
See https://github.com/llvm/llvm-project/pull/113769 for a reproducer.
2024-10-26 21:38:54 +02:00
Renaud Kauffmann
3acf856b50 Adding CUFCommon.{h,cpp} for CUF utilities (#113740) 2024-10-25 16:08:45 -07:00
Kiran Chandramohan
843c2fbe7f Add parser+semantics support for scope construct (#113700)
Test parsing, semantics and a couple of basic semantic checks for
block/worksharing constructs.
Add TODO message in lowering.
2024-10-25 18:57:01 +01:00
Abid Qadeer
85af1926f7 [flang][debug] Support mlir::NoneType. (#113550) 2024-10-25 11:43:25 +01:00
Yusuke MINATO
96bb375f5c [flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv (#110063)
`nsw` is now added to the do-variable increment when -fno-wrapv is
enabled, as GFortran seems to do.
That means the option introduced by #91579 is no longer necessary.

Note that the feature of -flang-experimental-integer-overflow is enabled
by default.
2024-10-25 15:20:23 +09:00
Krzysztof Parzyszek
e2e7d565bf [flang][OpenMP] Make Symbol::OmpFlagToClauseName static (#113586)
It doesn't need the Symbol object for anything.
2024-10-24 12:10:18 -05:00
Krzysztof Parzyszek
5d37415a58 Unsupport flang/test/Driver/embed.f90 on Windows
The test fails due to Windows' line-endings, and it's blocking
pre-checkin tests.
2024-10-24 11:45:27 -05:00
Abid Qadeer
37832d5de2 [flang][debug] Support fir.vector type. (#112951)
This PR converts the `fir.vector<>` to
`DICompositeTypeAttr(DW_TAG_array_type)` with `vector` flag set.
2024-10-24 13:37:32 +01:00
Abid Qadeer
47c1abf4af [flang][debug] Fix array lower bounds in derived type members. (#113183)
The lower bound information for the array members of a derived type
can't be obtained from the `DeclareOp`. It has to be extracted from the
`TypeInfoOp`. That was left as FIXME in the code. This PR adds the
missing functionality to fix the issue.

I tried the following approaches before settling on the current one,
which is to generate `DITypeAttr` for array members right where the
components are being processed.

1. Generate a temp XDeclareOp with the shift information obtained from
the `TypeInfoOp`. This caused a few issues mostly related to
`unrealized_conversion_cast`.

2. Change the shift operands in the `declOp` that was passed in the
function before calling `convertType`. The code can be seen in commit
abcf031a8e5a02f0081e7f293858302e7bf47bec. It essentially looked like the
following. It works correctly but I was not sure if temporarily changing
the `declOp` is a safe thing to do.

```
mlir::OperandRange originalShift = declOp.getShift();
mlir::MutableOperandRange mutableOpRange = declOp.getShiftMutable();
mutableOpRange.assign(shiftOpers);
elemTy = convertType(fieldTy, fileAttr, scope, declOp);
mutableOpRange.assign(originalShift);
```
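
The save/assign/restore sequence in the second approach can also be written as a scope guard, so the original value comes back on every exit path. This is a generic C++ sketch of that pattern, not code from the PR; `ScopedOverride` is a hypothetical helper and a `std::vector<int>` stands in for the shift operands.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Replaces `target` with a temporary value and restores the saved original
// when the guard goes out of scope.
template <typename T>
class ScopedOverride {
public:
  ScopedOverride(T &target, T temporary)
      : target_(target), saved_(target) {
    target_ = std::move(temporary);
  }
  ~ScopedOverride() { target_ = std::move(saved_); }

  ScopedOverride(const ScopedOverride &) = delete;
  ScopedOverride &operator=(const ScopedOverride &) = delete;

private:
  T &target_;
  T saved_;
};
```

A guard makes the restore automatic, but it does not answer the underlying question of whether mutating the op temporarily is safe.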

Fixes #113178.
2024-10-24 13:22:28 +01:00
Krzysztof Parzyszek
ea3534b385 [flang][OpenMP] Parse AFFINITY clause, lowering not supported yet (#113485)
Implement parsing of the AFFINITY clause on TASK construct, conversion
from the parser class to omp::Clause.
Lowering to HLFIR is unsupported, a TODO message is displayed.
2024-10-24 05:54:35 -05:00
Abid Qadeer
c07abf7272 [flang][debug] Support fir::ReferenceType. (#113480) 2024-10-24 11:38:17 +01:00
Valentin Clement (バレンタイン クレメン)
4e40b71c51 [flang][cuda] Add specialized gpu.launch_func conversion (#113493) 2024-10-23 15:28:51 -07:00
Valentin Clement (バレンタイン クレメン)
e2766b2bce [flang][cuda] Add entry point to launch cuda fortran kernel (#113490) 2024-10-23 13:44:02 -07:00
Krzysztof Parzyszek
c99f3950f4 [flang][OpenMP] Order clause AST nodes alphabetically, NFC (#113469)
This makes it easier to navigate the parse-tree.h file.
2024-10-23 13:33:36 -05:00
Valentin Clement (バレンタイン クレメン)
60105ac6ba [flang][cuda] Fix kernel registration (#113372)
The registration needs the function pointer and the name. This patch updates
the entry point with an extra arg and the translation as well.
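
A registration that carries both pieces of information could look like the sketch below. `KernelRegistry`, `KernelFn` and `demoKernel` are hypothetical names, and the real entry point and its argument list are not shown in the source.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using KernelFn = void (*)();

// Placeholder kernel used only to obtain a function pointer.
inline void demoKernel() {}

// Hypothetical registry keyed by function pointer, storing the kernel name.
class KernelRegistry {
public:
  void registerFunction(KernelFn fctPtr, const std::string &name) {
    assert(fctPtr != nullptr && !name.empty());
    entries_.emplace_back(fctPtr, name);
  }

  // Returns the registered name, or nullptr if the pointer is unknown.
  const std::string *lookup(KernelFn fctPtr) const {
    for (const auto &entry : entries_)
      if (entry.first == fctPtr)
        return &entry.second;
    return nullptr;
  }

private:
  std::vector<std::pair<KernelFn, std::string> > entries_;
};
```

Keeping the pointer and the name together lets a later launch be resolved from the pointer alone.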
2024-10-23 11:25:58 -07:00