Commit Graph

2359 Commits

Author SHA1 Message Date
Kazu Hirata
4435b7d8d3 [flang] Migrate away from PointerUnion::{is,get} (NFC) (#122585)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>
2025-01-11 02:06:47 -08:00
Krzysztof Parzyszek
70e96dc3fb [flang][OpenMP] Parsing context selectors for METADIRECTIVE (#121815)
This is just adding parsers for context selectors. There are no tests
because there is no way to execute these parsers yet.
2025-01-10 11:05:23 -06:00
Peter Klausler
3a8a52f4a5 [flang] Make IsCoarray() more accurate (#121415)
A designator without cosubscripts can have subscripts, component
references, substrings, &c. and still have corank. The current
IsCoarray() predicate only seems to work for whole variable/component
references. This was breaking some cases of THIS_IMAGE().
2025-01-08 13:16:56 -08:00
Peter Klausler
eb77f442b3 [flang] Accept L0 (#121998)
Accept a zero field width for formatted logical output (L0),
interpreting it as if it had been L1.
2025-01-08 13:15:51 -08:00
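The L0 handling described in eb77f442b3 can be modeled in a few lines. This is a minimal Python sketch, not the flang runtime code: `format_logical` is a hypothetical helper, and it assumes the usual Lw rule that the value is right-justified in a field of width w.

```python
def format_logical(value: bool, width: int) -> str:
    """Model Fortran Lw output editing: 'T' or 'F' right-justified in `width` columns.

    A width of 0 is treated as 1, mirroring the behavior the commit describes.
    """
    if width < 0:
        raise ValueError("field width must be non-negative")
    effective_width = 1 if width == 0 else width  # L0 is read as L1
    return ("T" if value else "F").rjust(effective_width)
```

With this model, `format_logical(True, 0)` and `format_logical(True, 1)` produce the same single-character field.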
Peter Klausler
9496391901 [flang] Fold LCOBOUND & UCOBOUND (#121411)
Implement constant folding for LCOBOUND and UCOBOUND intrinsic
functions. Moves some error detection code from intrinsics.cpp to
fold-integer.cpp so that erroneous calls get properly flagged and
converted into known errors.
2025-01-08 13:13:30 -08:00
Valentin Clement (バレンタイン クレメン)
878a57468b [flang][cuda] Add c_devloc as intrinsic and inline it during lowering (#120648)
Add `c_devloc` as intrinsic and inline it during lowering. `c_devloc` is
used in CUDA Fortran to get the address of device variables.

For the moment, we borrow almost all semantic checks from `c_loc` except
for the pointer or target restriction. The specifications of `c_devloc`
are pretty vague, and we will relax or enforce the restrictions based on
library and application usage, comparing against the reference compiler.
2025-01-08 11:23:05 -08:00
Slava Zakharin
2e637dbbb8 [flang] Canonicalize redundant pointer converts. (#121864)
This patch adds a canonicalization pattern for optimizing redundant
"pointer" fir.converts. Such converts prevent the StackArrays pass
from recognizing fir.freemem for the corresponding fir.allocmem, e.g.:
```
    %69 = fir.allocmem !fir.array<2xi32>
    %71:2 = hlfir.declare %69(%70) {uniq_name = ".tmp.arrayctor"} :
        (!fir.heap<!fir.array<2xi32>>, !fir.shape<1>) ->
        (!fir.heap<!fir.array<2xi32>>, !fir.heap<!fir.array<2xi32>>)
    %95 = fir.convert %71#1 :
        (!fir.heap<!fir.array<2xi32>>) -> !fir.ref<!fir.array<2xi32>>
    %100 = fir.convert %95 :
        (!fir.ref<!fir.array<2xi32>>) -> !fir.heap<!fir.array<2xi32>>
    fir.freemem %100 : !fir.heap<!fir.array<2xi32>>
```
I found this in `tonto`, but the change does not affect performance at all.
Anyway, it looks like a reasonable thing to do, and it makes it easier
to compare the performance profiles with other compilers'.
2025-01-07 08:35:43 -08:00
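The idea behind the canonicalization in 2e637dbbb8 can be sketched outside of MLIR. The following is a toy Python model, not the actual FIR pattern: `Value`, `Convert`, and `fold_redundant_converts` are hypothetical names, types are plain strings, and any restriction the real pattern places on which types count as "pointer" types is ignored.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Value:
    name: str
    type: str


@dataclass(frozen=True)
class Convert:
    operand: "Node"
    type: str  # result type of the conversion


Node = Union[Value, Convert]


def fold_redundant_converts(node: Node) -> Node:
    """Remove convert pairs that return to the type they started from."""
    if isinstance(node, Value):
        return node
    operand = fold_redundant_converts(node.operand)
    # convert(convert(x : A -> B) : B -> A) folds back to x
    if isinstance(operand, Convert) and operand.operand.type == node.type:
        return operand.operand
    return Convert(operand, node.type)
```

Applied to the heap -> ref -> heap chain from the commit message, the model returns the original heap value, which is the form a pass would need in order to pair the free with its allocation.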
Mats Petersson
4df366cd80 [FLANG][OpenMP]Add support for ALIGN clause on OMP ALLOCATE (#120791)
This adds trivial support for the ALIGN clause, which the existing
ALLOCATE directive allows.

The ALLOCATE directive is currently not implemented, so this is just
adding the necessary parser parts to allow the compiler to not say
"Huh? I don't get this" [or "Expected OpenMP construct"] when it
encounters the ALIGN clause.

Some parser tests are updated, and a new TODO test is added in case the
ALIGN clause is not covered by the initial support for ALLOCATE.
2025-01-06 11:02:31 +00:00
Valentin Clement (バレンタイン クレメン)
9165848c82 [flang][cuda] Sync global descriptor when nullifying pointer (#121595) 2025-01-03 14:37:14 -08:00
Slava Zakharin
3c700d131a [flang] Extract hlfir.assign inlining from opt-bufferization. (#121544)
Optimized bufferization can transform hlfir.assign into a loop
nest doing element-by-element assignment, but it avoids
doing so when the RHS is an hlfir.expr. This is done to let the
ElementalAssignBufferization pattern try to do a better job.

This patch moves the hlfir.assign inlining after opt-bufferization,
and enables it for hlfir.expr RHS.

The hlfir.expr RHS cases are present in tonto, and this patch
results in some nice improvements. Note that those cases
are handled by other compilers also using array temporaries,
so this patch seems to just get rid of the Assign runtime
overhead/inefficiency.
2025-01-03 08:33:14 -08:00
Krzysztof Parzyszek
adeff9f63a [flang][OpenMP] Allow utility constructs in specification part (#121509)
Allow utility constructs (error and nothing) to appear in the
specification part as well as the execution part. The exception is
"ERROR AT(EXECUTION)" which should only be in the execution part.
In case of ambiguity (the boundary between the specification and the
execution part), utility constructs will be parsed as belonging to the
specification part. In such cases move them to the execution part in the
OpenMP canonicalization code.
2025-01-03 09:21:36 -06:00
Krzysztof Parzyszek
df859f90aa [flang][OpenMP] Frontend support for NOTHING directive (#120606)
Create OpenMPUtilityConstruct and put the two utility directives in it
(error and nothing). Rename OpenMPErrorConstruct to OmpErrorDirective.
2025-01-03 08:36:34 -06:00
Valentin Clement (バレンタイン クレメン)
6dcd2b035d [flang][cuda] Convert cuf.sync_descriptor to runtime call (#121524)
Convert the op to a new entry point in the runtime
`CUFSyncGlobalDescriptor`
2025-01-02 17:02:59 -08:00
Valentin Clement (バレンタイン クレメン)
4b17a8b10e [flang][cuda] Add operation to sync global descriptor (#121520)
Introduce cuf.sync_descriptor to be used to sync device global
descriptor after pointer association.

Also move CUFCommon so it can be used in FIRBuilder lib as well.
2025-01-02 17:02:45 -08:00
Matthias Springer
c870632ef6 [flang] Fix some memory leaks (#121050)
This commit fixes some but not all memory leaks in Flang. There are
still 91 tests that fail with ASAN.

- Use `mlir::OwningOpRef` instead of `std::unique_ptr`. The latter does
not free allocations of nested blocks.
- Pass `ModuleOp` as value instead of reference.
- Add a few missing deallocations in test cases and other places.
2024-12-25 09:42:03 +01:00
Ivan Aksamentov
2d3d62d77e [flang] fix: split ifndef for CHECK and CHECK_MSG (#114707)
Resolves https://github.com/llvm/llvm-project/issues/114703

I think it's best practice for each macro to have its own `ifndef`
check, and this way the build issue is resolved for me.

I also find the names of these macros a bit too generic, an easy recipe
for conflicts. In my case, the error was likely caused by something else
defining `CHECK` but not `CHECK_MSG`, so likely these `CHECK` and
`CHECK_MSG` weren't actually working at all because the result of
`ifndef` is always false.

As a definitive fix, perhaps it makes sense to rename them to something
more specific, e.g. `FLANG_CHECK` and `FLANG_CHECK_MSG`.
2024-12-25 07:47:30 +00:00
Valentin Clement (バレンタイン クレメン)
4cb2a519db Revert "Reland '[flang] Allow to pass an async id to allocate the descriptor (#118713)' and #118733" (#121029)
This still causes issues for the device runtime build.
2024-12-23 21:27:34 -08:00
Valentin Clement (バレンタイン クレメン)
5b74fb75d9 Reland '[flang] Allow to pass an async id to allocate the descriptor (#118713)' and #118733 (#120997)
The device runtime build has been fixed. Attempt to re-land these patches
that were approved before.

https://github.com/llvm/llvm-project/pull/118713
https://github.com/llvm/llvm-project/pull/118733
2024-12-23 12:13:56 -08:00
vdonaldson
c28a7c1efd [flang] Modifications to ieee_support_halting (#120747)
The F23 standard requires that a call to intrinsic module procedure
ieee_support_halting be foldable to a constant at compile time in some
contexts. See for example F23 Clause 10.1.11 [Specification expression]
list item (13), Clause 10.1.12 [Constant expression] list item (11), and
references to specification and constant expressions elsewhere, such as
constraints C1012, C853, and C704.

Some Arm processors allow a user to control processor behavior when an
arithmetic exception is signaled, and some Arm processors do not have
this capability. An Arm executable will run on either type of processor,
so it is effectively unknown at compile time whether or not this support
will be available at runtime. This is in conflict with the standard
requirement.

This patch addresses this conflict by implementing ieee_support_halting
calls on Arm processors to check if this capability is present at
runtime. A call to ieee_support_halting in a constant context, such as
in the specification part of a program unit, will generate a compile
time "cannot be computed as a constant value" error. The expectation is
that such calls are unlikely to appear in production code.

Code generation for other processors will continue to generate a compile
time constant result for ieee_support_halting calls.
2024-12-23 09:30:45 -05:00
Valentin Clement (バレンタイン クレメン)
415cfaf339 [flang][cuda][NFC] Fix type in CUFFreeDescriptor (#120799) 2024-12-20 14:43:12 -08:00
Valentin Clement (バレンタイン クレメン)
e650ac1654 [flang][cuda][NFC] Fix typo in CUFAllocDescriptor (#120797)
Missing `r` in the function name.
2024-12-20 13:57:47 -08:00
Leandro Lupori
1fcb6a9754 [flang][OpenMP] Initialize allocatable members of derived types (#120295)
Allocatable members of privatized derived types must be allocated,
with the same bounds as the original object, whenever that member
is also allocated in it, but Flang was not performing such
initialization.

The `Initialize` runtime function can't perform this task unless
its signature is changed to receive an additional parameter, the
original object, that is needed to find out which allocatable
members, with their bounds, must also be allocated in the clone.
As `Initialize` is used not only for privatization, sometimes this
other object won't even exist, so this new parameter would need
to be optional.
Because of this, it seemed better to add a new runtime function:
`InitializeClone`.
To avoid unnecessary calls, lowering inserts a call to it only for
privatized items that are derived types with allocatable members.

Fixes https://github.com/llvm/llvm-project/issues/114888
Fixes https://github.com/llvm/llvm-project/issues/114889
2024-12-19 17:26:50 -03:00
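The difference between plain initialization and clone initialization described in 1fcb6a9754 can be illustrated with a small model. This is a Python sketch under simplifying assumptions, not the runtime's `Initialize` or `InitializeClone`: `DerivedObject`, `initialize`, and `initialize_clone` are hypothetical, and each allocatable member is reduced to a single pair of rank-1 bounds.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple

Bounds = Tuple[int, int]  # (lower, upper) bound of a rank-1 allocatable member


@dataclass
class DerivedObject:
    # Member name -> bounds when allocated, None when unallocated.
    members: Dict[str, Optional[Bounds]] = field(default_factory=dict)


def initialize(member_names: Iterable[str]) -> DerivedObject:
    """Model of plain initialization: every allocatable member starts unallocated."""
    return DerivedObject({name: None for name in member_names})


def initialize_clone(original: DerivedObject) -> DerivedObject:
    """Model of clone initialization: allocate what the original has allocated."""
    clone = initialize(original.members)
    for name, bounds in original.members.items():
        if bounds is not None:
            clone.members[name] = bounds  # same bounds as in the original
    return clone
```

The second function needs the original object as an input, which is the reason the commit message gives for adding a separate entry point rather than extending the existing one.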
Renaud Kauffmann
cb0effc0e6 [flang][cuda] Using nvvm intrinsics for the syncthread and threadfence families of calls (#120020) 2024-12-18 11:44:30 -08:00
Peter Klausler
fc97d2e68b [flang] Add UNSIGNED (#113504)
Implement the UNSIGNED extension type and operations under control of a
language feature flag (-funsigned).

This is nearly identical to the UNSIGNED feature that has been available
in Sun Fortran for years, and now implemented in GNU Fortran for
gfortran 15, and proposed for ISO standardization in J3/24-116.txt.

See the new documentation for details; but in short, this is C's
unsigned type, with guaranteed modular arithmetic for +, -, and *, and
the related transformational intrinsic functions SUM & al.
2024-12-18 07:02:37 -08:00
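The "guaranteed modular arithmetic" mentioned in fc97d2e68b can be shown with a short model. This is a Python sketch, not flang code: `unsigned_op` is a hypothetical helper, and the 32-bit default width is an assumption made only for illustration.

```python
def unsigned_op(op: str, a: int, b: int, bits: int = 32) -> int:
    """Evaluate a +, - or * on unsigned operands, wrapping modulo 2**bits."""
    mask = (1 << bits) - 1
    if op == "+":
        result = a + b
    elif op == "-":
        result = a - b
    elif op == "*":
        result = a * b
    else:
        raise ValueError(f"unsupported operator: {op!r}")
    return result & mask  # keep only the low `bits` bits
```

Under this model, subtracting 1 from 0 wraps to the largest representable value instead of producing a negative number.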
Kareem Ergawy
e532241b02 Re-apply (#117867): [flang][OpenMP] Implicitly map allocatable record fields (#120374)
This re-applies #117867 with a small fix that hopefully prevents build
bot failures. The fix is avoiding `dyn_cast` for the result of
`getOperation()`. Instead we can assign the result to `mlir::ModuleOp`
directly since the type of the operation is known statically (`OpT` in
`OperationPass`).
2024-12-18 09:19:45 +01:00
Kareem Ergawy
dc936f3c19 Revert "[flang][OpenMP] Implicitly map allocatable record fields (#117867)" (#120360) 2024-12-18 06:52:24 +01:00
Kareem Ergawy
db09014a07 [flang][OpenMP] Implicitly map allocatable record fields (#117867)
This is a starting PR to implicitly map allocatable record fields.

This PR contains the following changes:
1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that
   these utils work on the `mlir::Value` level rather than the
   `semantics::Symbol` level. This takes one step towards enabling
   MLIR passes to more easily do some lowering themselves (e.g. creating
   `omp.map.bounds` ops for implicitly captured data like this PR
   does).
2. Adds support for implicitly capturing and mapping allocatable fields
   in record types.

There is still quite some distance to cover to have full support for
this. I added a number of todos to guide further development.

Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
2024-12-18 05:37:58 +01:00
Peter Klausler
a957cedea9 [flang] Handle substring in data statement constant (#120130)
The case of a constant substring wasn't handled in the parser for data
statement constants.

Fixes https://github.com/llvm/llvm-project/issues/119005.
2024-12-17 12:10:50 -08:00
Slava Zakharin
9d33874936 [flang] Support -f[no-]realloc-lhs. (#120165)
-frealloc-lhs is the default.
If -fno-realloc-lhs is specified, then an allocatable on the left
side of an intrinsic assignment is not implicitly (re)allocated
to conform with the right hand side. Fortran runtime will issue
an error if there is a mismatch in shape/type/allocation-status.
2024-12-17 09:06:05 -08:00
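The two behaviors selected by -f[no-]realloc-lhs in 9d33874936 can be modeled as follows. This is a Python sketch of the semantics only, not of flang's lowering or runtime: `assign_allocatable` and `ShapeMismatchError` are hypothetical, arrays are rank-1 lists, an unallocated LHS is `None`, and type mismatches are not modeled.

```python
from typing import List, Optional


class ShapeMismatchError(RuntimeError):
    """Stands in for the runtime error issued when the LHS does not conform."""


def assign_allocatable(
    lhs: Optional[List[int]], rhs: List[int], realloc_lhs: bool = True
) -> List[int]:
    """Return the LHS contents after the intrinsic assignment `lhs = rhs`."""
    conforms = lhs is not None and len(lhs) == len(rhs)
    if not conforms:
        if not realloc_lhs:
            # -fno-realloc-lhs: no implicit (re)allocation, so report the mismatch
            raise ShapeMismatchError("LHS is unallocated or has a different shape")
        return list(rhs)  # -frealloc-lhs: (re)allocate to match the RHS
    lhs[:] = rhs  # shapes already conform: copy in place
    return lhs
```

With the default setting a mismatched or unallocated LHS is silently reallocated; with `realloc_lhs=False` the same assignment raises instead.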
Slava Zakharin
a00946fc94 [flang] Simplify hlfir.sum total reductions. (#119482)
I am trying to switch to keeping the reduction value in a temporary
scalar location so that I can use hlfir::genLoopNest easily.
This also allows using omp.loop_nest with worksharing for OpenMP.
2024-12-13 13:08:28 -08:00
Mats Petersson
75e6d0eb4d [flang][OpenMP]Add support for OpenMP ERROR directive (#119582)
Lowering leads to a TODO, with a test to confirm.

Also testing unparse.

---------

Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
2024-12-13 14:05:48 +00:00
Slava Zakharin
139e69b7bc [flang] Simple folding for hlfir.shape_of. (#119649)
This folding makes sure there are no hlfir.shape_of users
of hlfir.elemental - this may enable more InlineElementals matches,
because it is looking for exactly two uses of an hlfir.elemental.
2024-12-12 10:38:34 -08:00
Krzysztof Parzyszek
03cbe42627 [flang][OpenMP] Rework LINEAR clause (#119278)
The OmpLinearClause class was a variant of two classes, one for when the
linear modifier was present, and one for when it was absent. These two
classes did not follow the conventions for parse tree nodes (i.e.
tuple/wrapper/union formats), which necessitated specialization of the
parse tree visitor.

The new form of OmpLinearClause is the standard tuple with a list of
modifiers and an object list. The specialization of parse tree visitor
for it has been removed.
Parsing and unparsing of the new form bears additional complexity due to
syntactical differences between OpenMP 5.2 and prior versions: in OpenMP
5.2 the argument list is post-modified, while in the prior versions, the
step modifier was a post-modifier while the linear modifier had an
unusual syntax of `modifier(list)`.

With this change the LINEAR clause is no different from any other
clauses in terms of its structure and use of modifiers. Modifier
validation and all other checks work the same as with other clauses.
2024-12-12 12:19:35 -06:00
Krzysztof Parzyszek
58f9c4fc00 [flang][OpenMP] Semantic checks for IN_REDUCTION and TASK_REDUCTION (#118841)
Update parsing of these two clauses and add semantic checks for them.
Simplify some code in IsReductionAllowedForType and
CheckReductionOperator.
2024-12-12 12:19:12 -06:00
Valentin Clement (バレンタイン クレメン)
151901c762 [flang][rt][device] Use enum-set.h as Fortran.h (#119611) 2024-12-11 15:38:38 -08:00
Mats Petersson
00e1cc4c9d [flang][OpenMP]Add support for fail clause (#118683)
Support the atomic compare option of a fail(memory-order) clause.

Additional tests are introduced to check that parsing and semantic checks
for the new clause are handled.

Lowering for atomic compare is still unsupported and will end in a TODO
(aka "Not yet implemented"). A test for this case with the fail clause
is also present.
2024-12-11 16:29:02 +00:00
执着
e8baa792e7 Backtrace support for flang (#118179)
Fixed build failures in old PRs due to missing files
2024-12-10 10:31:48 +00:00
Yusuke MINATO
a88677edc0 Reland "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#118933)
This relands #110063.
The performance issue on 503.bwaves_r is found not to be related to the
patch, and is resolved by fbd89bcc when LTO is enabled.
2024-12-10 16:26:53 +09:00
Slava Zakharin
1ca392764a [flang] Added definition of hlfir.cshift operation. (#118732)
The CSHIFT intrinsic will be lowered to this operation, which
can then be optimized as an inline sequence or lowered into
a runtime call.
2024-12-09 07:55:22 -08:00
Valentin Clement (バレンタイン クレメン)
16c2a1016e Revert "[flang] Allow to pass an async id to allocate the descriptor (#118713)" (#119109)
This reverts commit 7d1c661381.

This commit breaks some device runtime builds. Need time to investigate.
2024-12-07 19:55:12 -08:00
Michael Kruse
c91ba04328 [Flang][NFC] Split runtime headers in preparation for cross-compilation. (#112188)
Split some headers into headers for public and private declarations in
preparation for #110217. Moving the runtime-private headers in
runtime-private include directory will occur in #110298.

* Do not use `sizeof(Descriptor)` in the compiler. The size of the
descriptor is target-dependent while `sizeof(Descriptor)` is the size of
the Descriptor for the host platform which might be too small when
cross-compiling to a different platform. Another problem is that the
emitted assembly ((cross-)compiling to the same target) is not identical
between Flang's running on different systems. Moving the declaration of
`class Descriptor` out of the included header will also reduce the
amount of #included sources.

* Do not use `sizeof(ArrayConstructorVector)` and
`alignof(ArrayConstructorVector)` in the compiler. Same reason as with
`Descriptor`.

* Compute the descriptor's extra flags without instantiating a
Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime
source, but not the compiler source.

* Move `InquiryKeywordHashDecode` into runtime-private header. The
function is defined in the runtime sources and trying to call it in the
compiler would lead to a link-error.

* Move allocator-kind magic numbers into common header. They are the
only declarations out of `allocator-registry.h` in the compiler as well.
 
This does not make Flang cross-compile ready yet; the main goal is to
avoid transitive header dependencies from Flang to clang-rt. There are
more assumptions that the host platform is the same as the target platform.
2024-12-06 15:29:00 +01:00
Renaud Kauffmann
27e458c8cb [flang][cuda] Distinguish constant fir.global from globals with a #cuf.cuda<constant> attribute (#118912)
1. In `CufOpConversion`, `isDeviceGlobal` was renamed to
`isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal`
excludes constant `fir.global` operations from registration. This is to
avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not
have any symbols in the runtime. This was done for
`_FortranACUFRegisterVariable` in #118582, but also needs to be done
here after #118591.
2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute
to the constant global. As discussed in #118582 a module variable with
the #cuf.cuda<constant> attribute is not a compile time constant. Yet,
the compile time constant also needs to be copied into the GPU module.
The candidates for copy to the GPU modules are
- the globals needing registration regardless of their uses in device
code (they can be referred to in host code as well)
- the compile time constant when used in device code

3. The registration of "constant" module device variables (
#cuf.cuda<constant>) can be restored in `CufAddConstructor`
2024-12-05 18:36:48 -08:00
Valentin Clement (バレンタイン クレメン)
83ccaad473 [flang][cuda] Use async id for device stream allocation (#118733)
When a stream is specified, use cudaMallocAsync with the specified stream.
2024-12-05 08:57:10 -08:00
jeanPerier
ff78cd5f3d [flang] fix private pointers and default initialized variables (#118494)
Both OpenMP privatization and DO CONCURRENT LOCAL lowering were incorrect
for pointers and derived types with default initialization.

For pointers, the descriptor was not established with the rank/type
code/element size, leading to undefined behavior if any inquiry was made
to it prior to a pointer assignment (and if/when using the runtime for
pointer assignments, the descriptor must have been established).

For derived types with default initialization, the copies were not
default initialized.
2024-12-05 14:09:48 +01:00
Michael Kruse
0cda970ecc [Flang][NFC] Split common headers to reduce dependencies. (#110244)
Fortran.h and target.h define symbols, some of which are used by both the Fortran runtime (Flang-RT) and the Fortran compiler (Flang), while others are used by Flang only. With the upcoming refactoring of the Fortran runtime into its own subproject (#110217), move the declarations that are used by both into new headers to minimize the amount of code that will need to be shared by Flang-RT and Flang.

Details:

 * `Fortran.h`: Flang-RT only uses some enum definitions out of this file, but not `AsFortran`, which is defined in `Fortran.cpp`. Moving the enums into `Fortran-consts.h` allows keeping `Fortran.cpp` within Flang.

 * `target.h`: Contains some floating-point definitions that are used by the non-GTest unittests in `fp-testing.h`. Flang-RT also uses some non-GTest unittests. Moving those definitions avoids the dependence on the entire FortranEvaluate library.
2024-12-05 11:29:32 +01:00
Valentin Clement (バレンタイン クレメン)
7d1c661381 [flang] Allow to pass an async id to allocate the descriptor (#118713)
This is a patch in preparation for supporting a stream-ordered memory
allocator in CUDA Fortran.

This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.

A follow up patch will implement that asynchronous allocator for CUDA
Fortran.
2024-12-04 18:24:40 -08:00
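The plumbing described in 7d1c661381, where an asynchronous id is passed down to whichever allocator is registered and may be ignored, can be sketched like this. It is a Python model only: `allocate`, `REGISTRY`, and both allocators are hypothetical names and do not reflect the signatures of `AllocatableAllocate` or `Descriptor::Allocate`.

```python
from typing import Callable, Dict, List, Optional

# An allocator takes a size in bytes and an optional asynchronous id.
Allocator = Callable[[int, Optional[int]], bytearray]

seen_async_ids: List[Optional[int]] = []  # records what the stream allocator received


def default_allocator(size: int, async_id: Optional[int] = None) -> bytearray:
    """Ignores the asynchronous id entirely."""
    return bytearray(size)


def stream_allocator(size: int, async_id: Optional[int] = None) -> bytearray:
    """Stand-in for a stream-ordered allocator: it only records the id."""
    seen_async_ids.append(async_id)
    return bytearray(size)


# Hypothetical registry keyed by name; the keys are arbitrary.
REGISTRY: Dict[str, Allocator] = {
    "default": default_allocator,
    "stream": stream_allocator,
}


def allocate(
    size: int, allocator: str = "default", async_id: Optional[int] = None
) -> bytearray:
    """Forward the asynchronous id unchanged to the registered allocator."""
    return REGISTRY[allocator](size, async_id)
```

The caller's interface is the same for both allocators; only the registered allocator decides whether the id has any effect.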
Valentin Clement (バレンタイン クレメン)
7efd6139f2 [flang][cuda] Get device address in fir.declare (#118591)
Add a pattern that updates the fir.declare memref when it comes from a device
global and is not a descriptor. In that case, we recover the device
address that needs to be used in ops like `fir.array_coor` and so on.
2024-12-04 13:36:58 -08:00
vdonaldson
6003be7ef1 [flang] IEEE_GET_UNDERFLOW_MODE, IEEE_SET_UNDERFLOW_MODE (#118551)
Implement IEEE_GET_UNDERFLOW_MODE and IEEE_SET_UNDERFLOW_MODE. Update
IEEE_SUPPORT_UNDERFLOW_CONTROL to enable support for individual REAL
kinds.
2024-12-04 16:21:11 -05:00
Valentin Clement (バレンタイン クレメン)
5522d2462e [flang][cuda] Allow AbstractResult to run in gpu.module (#118529)
In CUDA Fortran, device functions are converted to `gpu.func` inside the
`gpu.module` operation. Update the AbstractResult pass to be able to run
on `func.func` and `gpu.func` operations inside the `gpu.module`.
2024-12-03 14:04:49 -08:00
jeanPerier
cd7e65398f [flang] optimize array function calls using hlfir.eval_in_mem (#118070)
This patch encapsulates array function call lowering into
hlfir.eval_in_mem and allows directly evaluating the call into the LHS
when possible.

The conditions are: the LHS is contiguous, it is not accessed inside the
function, it is not a whole allocatable, and the function result does not
need to be finalized. All these conditions are tested in the previous
hlfir.eval_in_mem optimization (#118069), which leverages the extension of
getModRef to handle function calls (#117164).

This yields a 25% reduction in run time on the polyhedron channel2
benchmark (from 1 min to 45 s, measured on an x86-64 Zen 2).
2024-12-03 10:04:52 +01:00
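The four conditions listed in cd7e65398f amount to a single predicate, and the quoted timing can be checked with one line of arithmetic. The following Python sketch is illustrative only: `CallAssignment`, `can_evaluate_into_lhs`, and `runtime_reduction` are hypothetical names, and the boolean fields stand in for analyses that the compiler performs on the IR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallAssignment:
    lhs_is_contiguous: bool
    lhs_accessed_in_callee: bool
    lhs_is_whole_allocatable: bool
    result_needs_finalization: bool


def can_evaluate_into_lhs(a: CallAssignment) -> bool:
    """True when the call result may be written straight into the LHS storage."""
    return (
        a.lhs_is_contiguous
        and not a.lhs_accessed_in_callee
        and not a.lhs_is_whole_allocatable
        and not a.result_needs_finalization
    )


def runtime_reduction(before_s: float, after_s: float) -> float:
    """Fraction of the original run time that was saved."""
    return (before_s - after_s) / before_s
```

Going from 60 s to 45 s gives `runtime_reduction(60, 45) == 0.25`, so the 25% figure quoted for channel2 corresponds to the reduction in run time.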