Commit Graph

4920 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
466b58ba38 [flang] Avoid generating duplicate symbol in comdat (#114472)
In case where a fir.global might be duplicated in an inner module
(gpu.module), the conversion pattern will be applied on the module and
the gpu module version of the global and try to generate multiple comdat
with the same symbol name. This is what we have in the implementation of
CUDA Fortran.

Just check for the presence of the `ComdatSelectorOp` before creating a
new one.
2024-10-31 18:59:04 -07:00
Valentin Clement (バレンタイン クレメン)
067ce5ca18 [flang][cuda] Use getOrCreateGPUModule in CUFDeviceGlobal pass (#114468)
Make the pass functional if gpu module was not created yet.
2024-10-31 18:58:43 -07:00
Rolf Morel
5c1752e368 [MLIR][DLTI] Pretty parsing and printing for DLTI attrs (#113365)
Unifies parsing and printing for DLTI attributes. Introduces a format of
`#dlti.attr<key1 = val1, ..., keyN = valN>` syntax for all queryable
DLTI attributes similar to that of the DictionaryAttr, while retaining
support for specifying key-value pairs with `#dlti.dl_entry` (whether to
retain this is TBD).

As the new format does away with most of the boilerplate, it is much easier
to parse for humans. This makes an especially big difference for nested
attributes.

Updates the DLTI-using tests and includes fixes for misc error checking/
error messages.
2024-10-31 19:18:24 +00:00
Valentin Clement (バレンタイン クレメン)
e4e9fea71e [flang][cuda] Pass descriptor by reference for CUFMemsetDescriptor (#114338) 2024-10-31 09:02:59 -07:00
Renaud Kauffmann
423f35410a [flang][cuda] Adding support for registration of boxes (#114323)
Needed to take into account that `fir::getTypeSizeAndAlignmentOrCrash`
does not work with box types but requires the `fir::LLVMTypeConverter`
2024-10-31 08:39:08 -07:00
Kareem Ergawy
0698482506 [flang][MLIR] Hoist do concurrent nest bounds/steps outside the nest (#114020)
If you have the following multi-range `do concurrent` loop:

```fortran
  do concurrent(i=1:n, j=1:bar(n*m, n/m))
    a(i) = n
  end do
```

Currently, flang generates the following IR:

```mlir
    fir.do_loop %arg1 = %42 to %44 step %c1 unordered {
      ...
      %53:3 = hlfir.associate %49 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
      %54:3 = hlfir.associate %52 {adapt.valuebyref} : (i32) -> (!fir.ref<i32>, !fir.ref<i32>, i1)
      %55 = fir.call @_QFPbar(%53#1, %54#1) fastmath<contract> : (!fir.ref<i32>, !fir.ref<i32>) -> i32
      hlfir.end_associate %53#1, %53#2 : !fir.ref<i32>, i1
      hlfir.end_associate %54#1, %54#2 : !fir.ref<i32>, i1
      %56 = fir.convert %55 : (i32) -> index
      ...
      fir.do_loop %arg2 = %46 to %56 step %c1_4 unordered {
        ...
      }
    }
```

However, if `bar` is impure, then we have a direct violation of the
standard:

```
C1143 A reference to an impure procedure shall not appear within a DO CONCURRENT construct.
```

Moreover, the standard describes the execution of `do concurrent`
construct in multiple stages:

```
11.1.7.4 Execution of a DO construct
...
11.1.7.4.2 DO CONCURRENT loop control
The concurrent-limit and concurrent-step expressions in the concurrent-control-list are evaluated. ...

11.1.7.4.3 The execution cycle
...
The block of a DO CONCURRENT construct is executed for every active combination of the index-name values.
Each execution of the block is an iteration. The executions may occur in any order.
```

From the above 2 points, it seems to me that execution is divided in
multiple consecutive stages: 11.1.7.4.2 is the stage where we evaluate
all control expressions including the step and then 11.1.7.4.3 is the
stage to execute the block of the concurrent loop itself using the
combination of possible iteration values.
2024-10-31 09:19:18 +01:00
Renaud Kauffmann
bfe486fe76 Passing descriptors by reference to CUDA runtime calls (#114288)
Passing a descriptor as a `const Descriptor &` or a `const Descriptor *`
generates a FIR signature where the box is passed by value.
This is an issue, as it requires a load of the box to be passed. But
since, ultimately, all boxes are passed by reference a temporary is
generated in LLVM and the reference to the temporary is passed.

The boxes addresses are registered with the CUDA runtime but the
temporaries are not, thus preventing the runtime to properly map a host
side address to its device side counterpart.

To address this issue, this PR changes the signatures to the transfer
functions to pass a descriptor as a `Descriptor *`, which will in turn
generate a FIR signature with that takes a box reference as an argument.
2024-10-30 13:24:47 -07:00
Asher Mancinelli
0c9a02355a [flang][fir] always use memcpy for fir.box (#113949)
@jeanPerier explained the importance of converting box loads and stores
into `memcpy`s instead of aggregate loads and stores, and I'll do my
best to explain it here.

* [(godbolt link) Example comparing opt transformations on memcpys vs
aggregate load/stores](https://godbolt.org/z/be7xM83cG)
* LLVM can more effectively reason about memcpys compared to aggregate
load/stores.
* This came up when others were discussing array descriptors for
assumed-rank arrays passed to `bind(c)` subroutines, with the
implication that the array descriptors are known to have lower bounds of
1 and that they are not pointer/allocatable types.
* [(godbolt link) Clang also uses memcpys so we should probably follow
them, assuming the clang developers are generatign what they know Opt
will handle more effectively.](https://godbolt.org/z/YT4x7387W)
* This currently may not help much without the `nocapture` attribute
being propagated to function calls, but [it looks like someone may do
this soon (discourse
link)](https://discourse.llvm.org/t/applying-the-nocapture-attribute-to-reference-passed-arguments-in-fortran-subroutines/81401/23)
or I can do this in a follow-up patch.

Note on test `flang/test/Fir/embox-char.fir`: it looks like the original
test was auto-generated. I wasn't too sure which parts were especially
important to test, so I regenerated the test. If we want the updated
version to look more like the old version, I'll make those changes.
2024-10-30 09:50:27 -07:00
vdonaldson
8d406d882d [flang] IEEE_REAL (#113948)
IEEE_REAL converts an integer or real argument to a real of a given
kind.
2024-10-30 09:56:42 -04:00
Krzysztof Parzyszek
c478aab684 [flang][OpenMP] Parser support for DEPOBJ plus DEPEND, DESTROY, UPDATE (#114074)
Parse the DEPOBJ construct and the associated clauses, perform basic
semantic checks.
2024-10-30 08:36:08 -05:00
Kiran Chandramohan
092a819e94 [Flang][OpenMP] Add frontend support for directives involving master (#113893)
Issue deprecation warning for these directives.
Lowering currently supports parallel master, for all other combined or
composite directives involving master, issue TODO errors.

Note: The first commit changes the formatting and generalizes the
deprecation message emission for reuse in the second commit. I can pull
it out into a separate commit if required.
2024-10-30 10:58:26 +00:00
Abid Qadeer
652988b658 [flang][debug] Support TupleType. (#113917)
Handling is similar to RecordType with following differences:

1. No check for cyclic references
2. No extra processing for lower bounds of array members.
3. No line information as TupleType is a lowering artefact and does not
really represent an entity in the code.
2024-10-30 09:52:56 +00:00
Valentin Clement (バレンタイン クレメン)
0fa2fb3ed0 [flang][cuda] Add conversion pattern for cuf.kernel_launch op (#114129) 2024-10-29 17:00:41 -07:00
Renaud Kauffmann
b9978f8c77 [flang][cuda] Adding variable registration in constructor (#113976)
1) Adding variable registration in constructor
2) Applying feedback from PR
https://github.com/llvm/llvm-project/pull/112989
2024-10-29 11:48:48 -07:00
Kelvin Li
8e14c6c172 [flang] Support -mabi=vec-extabi and -mabi=vec-default on AIX (#113215)
This option is to enable the AIX extended and default vector ABIs.
2024-10-29 14:20:11 -04:00
Valentin Clement (バレンタイン クレメン)
b05fec97d5 [flang][cuda] Convert gpu.launch_func to CUFLaunchClusterKernel when cluster dims are present (#113959)
Kernel launch in CUF are converted to `gpu.launch_func`. When the kernel
has `cluster_dims` specified these get carried over to the
`gpu.launch_func` operation. This patch updates the special conversion
of `gpu.launch_func` when cluster dims are present to the newly added
entry point.
2024-10-29 10:02:08 -07:00
Krzysztof Parzyszek
d48c849ea9 [flang][OpenMP] Parsing support for iterator in DEPEND clause (#113622)
Warn about use of iterators OpenMP versions that didn't have them
(support added in 5.0). Emit a TODO error in lowering.
2024-10-29 08:00:44 -05:00
Abid Qadeer
8239ea3918 [flang][debug] Support IndexType. (#113921) 2024-10-29 12:22:43 +00:00
Renaud Kauffmann
0eb5c9d2ef [flang][cuda] Copying device globals in the gpu module (#113955) 2024-10-28 15:34:27 -07:00
Krzysztof Parzyszek
09a4bcf1a5 [flang][OpenMP] Update handling of DEPEND clause (#113620)
Parse the locator list in OmpDependClause as an OmpObjectList (instead
of a list of Designators). When a common block appears in the locator
list, show an informative message.
Implement resolving symbols in DependSinkVec in a dedicated visitor
instead of having a visitor for OmpDependClause.
Resolve unresolved names common blocks in OmpObjectList.

Minor changes to the code organization:
- rename OmpDependenceType to OmpTaskDependenceType (to follow 5.2
terminology),
- rename Depend::WithLocators to Depend::DepType,
- add comments with more detailed spec references to parse-tree.h.

---------

Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
2024-10-28 16:06:22 -05:00
Yusuke MINATO
bd6ab32e6e Revert "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#113901)
Reverts llvm/llvm-project#110063 due to the performance regression on
503.bwaves_r in SPEC2017.
2024-10-28 14:19:20 +00:00
Kiran Chandramohan
5621929f7f [Flang][OpenMP] Add parser support for grainsize and num_tasks clause (#113136)
These clauses are applicable only for the taskloop directive. Since the
directive has a TODO error, skipping the addition of TODOs for these
clauses.
2024-10-27 20:16:24 +00:00
Kiran Chandramohan
eef3766ae5 Assumed-size arrays are shared and cannot be privatized (#112963)
Do not error out if default(none) is specified and the region has an
assumed-size array.

Fixes #110442
2024-10-27 18:58:47 +00:00
jeanPerier
64d7e45c40 Revert "[flang][debug] Support mlir::NoneType." (#113769)
Reverts llvm/llvm-project#113550

It turns out this causes compiler crashes with assumed-type arrays and -g.
See https://github.com/llvm/llvm-project/pull/113769 for a reproducer.
2024-10-26 21:38:54 +02:00
Kiran Chandramohan
843c2fbe7f Add parser+semantics support for scope construct (#113700)
Test parsing, semantics and a couple of basic semantic checks for
block/worksharing constructs.
Add TODO message in lowering.
2024-10-25 18:57:01 +01:00
Abid Qadeer
85af1926f7 [flang][debug] Support mlir::NoneType. (#113550) 2024-10-25 11:43:25 +01:00
Yusuke MINATO
96bb375f5c [flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv (#110063)
nsw is now added to do-variable increment when -fno-wrapv is enabled as
GFortran seems to do.
That means the option introduced by #91579 isn't necessary any more.

Note that the feature of -flang-experimental-integer-overflow is enabled
by default.
2024-10-25 15:20:23 +09:00
Krzysztof Parzyszek
5d37415a58 Unsupport flang/test/Driver/embed.f90 on Windows
The test fails due to Windows' line-endings, and it's blocking
pre-checkin tests.
2024-10-24 11:45:27 -05:00
Abid Qadeer
37832d5de2 [flang][debug] Support fir.vector type. (#112951)
This PR converts the `fir.vector<>` to
`DICompositeTypeAttr(DW_TAG_array_type)` with `vector` flag set.
2024-10-24 13:37:32 +01:00
Abid Qadeer
47c1abf4af [flang][debug] Fix array lower bounds in derived type members. (#113183)
The lower bound information for the array members of a derived type
can't be obtained from the `DeclareOp`. It has to be extracted from the
`TypeInfoOp`. That was left as FIXME in the code. This PR adds the
missing functionality to fix the issue.

I tried the following approaches before settling on the current one that
is to generate `DITypeAttr` for array members right where the components
are being processed.

1. Generate a temp XDeclareOp with the shift information obtained from
the `TypeInfoOp`. This caused a few issues mostly related to
`unrealized_conversion_cast`.

2. Change the shift operands in the `declOp` that was passed in the
function before calling `convertType`. The code can be seen in the
abcf031a8e5a02f0081e7f293858302e7bf47bec. It essentially looked like the
following. It works correctly but I was not sure if temporarily changing
the `declOp` is the safe thing to do.

```
mlir::OperandRange originalShift = declOp.getShift();
mlir::MutableOperandRange mutableOpRange = declOp.getShiftMutable();
mutableOpRange.assign(shiftOpers);
elemTy = convertType(fieldTy, fileAttr, scope, declOp);
mutableOpRange.assign(originalShift);
```

Fixes #113178.
2024-10-24 13:22:28 +01:00
Krzysztof Parzyszek
ea3534b385 [flang][OpenMP] Parse AFFINITY clause, lowering not supported yet (#113485)
Implement parsing of the AFFINITY clause on TASK construct, conversion
from the parser class to omp::Clause.
Lowering to HLFIR is unsupported, a TODO message is displayed.
2024-10-24 05:54:35 -05:00
Abid Qadeer
c07abf7272 [flang][debug] Support fir::ReferenceType. (#113480) 2024-10-24 11:38:17 +01:00
Valentin Clement (バレンタイン クレメン)
4e40b71c51 [flang][cuda] Add specialized gpu.launch_func conversion (#113493) 2024-10-23 15:28:51 -07:00
Krzysztof Parzyszek
973fa983af [flang][OpenMP] Parse iterators, add to MAP clause, TODO for lowering (#113167)
Define `OmpIteratorSpecifier` and `OmpIteratorModifier` parser classes,
and add parsing for them. Those are reusable between any clauses that
use iterator modifiers.

Add support for iterator modifiers to the MAP clause up to lowering,
where a TODO message is emitted.
2024-10-23 08:31:53 -05:00
jeanPerier
a59f712434 [flang][hlfir] do not consider local temps as conflicting in assignment (#113330)
Last patch required to avoid creating a temporary for the LHS when
dealing with `x([a,b]) = y`.

The code dealing with "ordered assignments" (where, forall, user and
vector subscripted assignments) is saving the evaluated RHS/LHS and
masks if they have write effects because this write effects should not
be evaluated when they affect entities that may be written to in other
contexts after the evaluation and before the re-evaluation.

But when dealing with write to storage allocated in the region for the
expression being evluated, there is no problem to re-evaluate the write:
it has no effect outside of the expression evaluation that owns the
allocation.

In the case of `x([a,b]) = y`, the temporary is created for the vector
subscript. Raising the HLFIR abstraction for simple array constructors
may be a good idea, but local temps are created in other contexts, so
this fix is more generic.
2024-10-23 12:34:13 +02:00
jeanPerier
d89c1dbaf5 [flang][hlfir] refine hlfir.assign side effects (#113319)
hlfir.assign currently has the `MemoryEffects<[MemWrite]` which makes it
look like it can write to anything. This is good for some cases where
the assign effect cannot be precisely described through the MLIR side
effect API (e.g., when the LHS is a descriptor and it is not possible to
get an OpOperand describing the data address, or when derived type are
involved and finalization could be called, or user defined assignment
for some components). For the most common case of hlfir.assign on
intrinsic types without whole allocatable LHS, this is pessimistic.

This patch implements a finer description of the side effects when
possible, and also adds the proper read/allocate/free effects when
relevant.

The ultimate goal is to suppress the generation of temporary for the LHS
address when dealing with an assignment to a vector subscripted LHS
where the vector subscript is an array constructor that does not refer
to the LHS (as in `x([a,b]) = y`).

Two more patches will follow to enable this.
2024-10-23 12:33:14 +02:00
Nimish Mishra
b39760c4ce [flang][NFC] Fix failing atomic tests
Fix ordering of checks in atomic02.f90.
2024-10-23 08:57:31 +05:30
NimishMishra
1cbc015551 [flang][OpenMP] Error out when CHARACTER type is used in atomic constructs (#113045)
According to OpenMPv5.2 1.2.6, "For Fortran, a scalar variable with
intrinsic type, as defined by the base language, excluding character
type.". Likewise, section 4.3.1.3 states that atomic operations are on
"scalar variables of intrinsic type". This PR hence introduces a check
to error out when CHARACTER type is used in atomic operations.

Fixes https://github.com/llvm/llvm-project/issues/112918
2024-10-22 19:29:21 -07:00
Krzysztof Parzyszek
a8d506b320 [flang][OpenMP] Rename enum OmpxHold to Ompx_Hold in parser (#113366)
The convention is to use enum names that match the source spelling (up
to upper/lower case), including names with underscores.

Remove the special case from unparser, update tests.
2024-10-22 16:24:17 -05:00
Renaud Kauffmann
f1e59dcb45 Renaming Cuf passes to CUF (#113351)
For consistency with other dialects and other CUF passes and files, this
patch renames passes CufOpConversion to CUFOpConversion,
CufImplicitDeviceGlobal to CUFDeviceGlobal.
It also renames the file.
2024-10-22 12:50:31 -07:00
Razvan Lupusoru
ac9ee61857 [acc] Improve LegalizeDataValues pass to handle data constructs (#112990)
Renames LegalizeData to LegalizeDataValues since this pass fixes up SSA
values. LegalizeData suggested that it fixed data mapping.

This change also adds support to fix up ssa values for data clause
operations. Effectively, compute regions within a data region use the
ssa values from data operations also. The ssa values within data regions
but not within compute regions are not updated.

This change is to support the requirement in the OpenACC spec which
notes that a visible data clause is not just one on the current compute
construct but on the lexically containing data construct or visible
declare directive.
2024-10-21 09:49:58 -07:00
Abid Qadeer
95b4128c6a [flang][debug] Don't generate debug for compiler-generated variables (#112423)
Flang generates many globals to handle derived types. There was a check
in debug info to filter them based on the information that their names
start with a period. This changed since PR#104859 where 'X' is being
used instead of '.'.

This PR fixes this issue by also adding 'X' in that list. As user
variables gets lower cased by the NameUniquer, there is no risk that
those will be filtered out. I added a test for that to be sure.
2024-10-21 11:27:34 +01:00
Thirumalai Shaktivel
9b49392d6e [Flang] Handle the source (scopes) for some OpenMP constructs (#109097)
Fixes: https://github.com/llvm/llvm-project/issues/82943
Fixes: https://github.com/llvm/llvm-project/issues/82942
Fixes: https://github.com/llvm/llvm-project/issues/85593
2024-10-21 13:07:48 +05:30
Pranav Bhandarkar
11dad2fa51 [flang][OpenMP] - Add MapInfoOp instances for target private variables when needed (#109862)
This PR adds an OpenMP dialect related pass for FIR/HLFIR which creates
`MapInfoOp` instances for certain privatized symbols. For example, if an
allocatable variable is used in a private clause attached to a
`omp.target` op, then the allocatable variable's descriptor will be
needed on the device (e.g. GPU). This descriptor needs to be separately
mapped onto the device. This pass creates the necessary `omp.map.info`
ops for this.
2024-10-20 01:01:39 -05:00
Valentin Clement (バレンタイン クレメン)
5406834cda [flang][cuda] Add cuf.register_module operation (#112971)
Add a new operation to register the fatbin and pass it to
`cuf.register_kernel`
2024-10-18 21:30:38 -07:00
Renaud Kauffmann
864902e9b4 [flang][cuda] Call CUFGetDeviceAddress to get global device address from host address (#112989) 2024-10-18 17:35:38 -07:00
Luke Drummond
b55c52c047 Revert "Renormalize line endings whitespace only after dccebddb3b80"
This reverts commit 9d98acb196.
2024-10-18 21:16:50 +01:00
Tom Eccles
6ce4b6dd07 [flang][OpenMP][test] re-add complex atomic capture regression test (#112736)
This was reverted in https://github.com/llvm/llvm-project/pull/110969
due to a failure on aarch64.

Weirdly aarch64 (but apparently not x86?) has a spurious phi
instruction. flang -fc1 -emit-llvm will run midle-end optimization
passes. Presumably one of those is behaving differently on different
targets. I have adapted the test to work correctly on aarch64.

The difference is in the RUN lines and the atomic exit block.
2024-10-18 11:00:55 +01:00
Yusuke MINATO
9698e57548 [flang][Driver] Add support for -f[no-]wrapv and -f[no]-strict-overflow in the frontend (#110061)
This patch introduces the options for integer overflow flags into Flang.
The behavior is similar to that of Clang.
2024-10-18 16:30:23 +09:00
Thirumalai Shaktivel
e6321d94de [Flang][Semantics] Add a semantic check for simd construct (#109089)
Add missing semantic check for the SAFELEN clause in the SIMD Order
construct
2024-10-18 10:13:49 +05:30