Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
A designator without cosubscripts can have subscripts, component
references, substrings, &c. and still have corank. The current
IsCoarray() predicate only seems to work for whole variable/component
references. This was breaking some cases of THIS_IMAGE().
Implement constant folding for LCOBOUND and UCOBOUND intrinsic
functions. Moves some error detection code from intrinsics.cpp to
fold-integer.cpp so that erroneous calls get properly flagged and
converted into known errors.
Add `c_devloc` as intrinsic and inline it during lowering. `c_devloc` is
used in CUDA Fortran to get the address of device variables.
For the moment, we borrow almost all semantic checks from `c_loc` except
for the pointer or target restriction. The specifications of `c_devloc`
are are pretty vague and we will relax/enforce the restrictions based on
library and apps usage comparing them to the reference compiler.
This patch adds a canonicalization pattern for optimizing redundant
"pointer" fir.converts. Such converts prevent the StackArrays pass
to recognize fir.freemem for the corresponding fir.allocmem, e.g.:
```
%69 = fir.allocmem !fir.array<2xi32>
%71:2 = hlfir.declare %69(%70) {uniq_name = ".tmp.arrayctor"} :
(!fir.heap<!fir.array<2xi32>>, !fir.shape<1>) ->
(!fir.heap<!fir.array<2xi32>>, !fir.heap<!fir.array<2xi32>>)
%95 = fir.convert %71#1 :
(!fir.heap<!fir.array<2xi32>>) -> !fir.ref<!fir.array<2xi32>>
%100 = fir.convert %95 :
(!fir.ref<!fir.array<2xi32>>) -> !fir.heap<!fir.array<2xi32>>
fir.freemem %100 : !fir.heap<!fir.array<2xi32>>
```
I found this in `tonto`, but the change does not affect performance at all.
Anyway, it looks like a reasonable thing to do, and it makes easier
to compare the performance profiles with other compilers'.
This is trivially additional support for the existing ALLOCATE
directive, which allows an ALIGN clause.
The ALLOCATE directive is currently not implemented, so this is just
addding the necessary parser parts to allow the compiler to not say
"Huh? I don't get this" [or "Expected OpenMP construct"] when it
encounters the ALIGN clause.
Some parser testing is updated and a new todo test, just in case the
feature of align clause is not supported by the initial support for
ALLOCATE.
Optimized bufferization can transform hlfir.assign into a loop
nest doing element per element assignment, but it avoids
doing so for RHS that is hlfir.expr. This is done to let
ElementalAssignBufferization pattern to try to do a better job.
This patch moves the hlfir.assign inlining after opt-bufferization,
and enables it for hlfir.expr RHS.
The hlfir.expr RHS cases are present in tonto, and this patch
results in some nice improvements. Note that those cases
are handled by other compilers also using array temporaries,
so this patch seems to just get rid of the Assign runtime
overhead/inefficiency.
Allow utility constructs (error and nothing) to appear in the
specification part as well as the execution part. The exception is
"ERROR AT(EXECUTION)" which should only be in the execution part.
In case of ambiguity (the boundary between the specification and the
execution part), utility constructs will be parsed as belonging to the
specification part. In such cases move them to the execution part in the
OpenMP canonicalization code.
Introduce cuf.sync_descriptor to be used to sync device global
descriptor after pointer association.
Also move CUFCommon so it can be used in FIRBuilder lib as well.
This commit fixes some but not all memory leaks in Flang. There are
still 91 tests that fail with ASAN.
- Use `mlir::OwningOpRef` instead of `std::unique_ptr`. The latter does
not free allocations of nested blocks.
- Pass `ModuleOp` as value instead of reference.
- Add few missing deallocations in test cases and other places.
Resolves https://github.com/llvm/llvm-project/issues/114703
I think it's the best practice that each macro has it's own `ifndef`
check and this way the build issue is resolved for me.
I also find the names of these macro a bit too generic - an easy recipe
for conflicts. In my case, the error was likely caused by something else
defining `CHECK` but not `CHECK_MSG`, so likely these `CHECK` and
`CHECK_MSG` weren't actually working at all because the result of
`ifndef` is always false.
As a definitive fix, perhaps it makes sense to rename them to something
more specific, e.g. `FLANG_CHECK` and `FLANG_CHECK_MSG`.
The F23 standard requires that a call to intrinsic module procedure
ieee_support_halting be foldable to a constant at compile time in some
contexts. See for example F23 Clause 10.1.11 [Specification expression]
list item (13), Clause 1.1.12 [Constant expression] list item (11), and
references to specification and constant expressions elsewhere, such as
constraints C1012, C853, and C704.
Some Arm processors allow a user to control processor behavior when an
arithmetic exception is signaled, and some Arm processors do not have
this capability. An Arm executable will run on either type of processor,
so it is effectively unknown at compile time whether or not this support
will be available at runtime. This in conflict with the standard
requirement.
This patch addresses this conflict by implementing ieee_support_halting
calls on Arm processors to check if this capability is present at
runtime. A call to ieee_support_halting in a constant context, such as
in the specification part of a program unit, will generate a compile
time "cannot be computed as a constant value" error. The expectation is
that such calls are unlikely to appear in production code.
Code generation for other processors will continue to generate a compile
time constant result for ieee_support_halting calls.
Allocatable members of privatized derived types must be allocated,
with the same bounds as the original object, whenever that member
is also allocated in it, but Flang was not performing such
initialization.
The `Initialize` runtime function can't perform this task unless
its signature is changed to receive an additional parameter, the
original object, that is needed to find out which allocatable
members, with their bounds, must also be allocated in the clone.
As `Initialize` is used not only for privatization, sometimes this
other object won't even exist, so this new parameter would need
to be optional.
Because of this, it seemed better to add a new runtime function:
`InitializeClone`.
To avoid unnecessary calls, lowering inserts a call to it only for
privatized items that are derived types with allocatable members.
Fixes https://github.com/llvm/llvm-project/issues/114888
Fixes https://github.com/llvm/llvm-project/issues/114889
Implement the UNSIGNED extension type and operations under control of a
language feature flag (-funsigned).
This is nearly identical to the UNSIGNED feature that has been available
in Sun Fortran for years, and now implemented in GNU Fortran for
gfortran 15, and proposed for ISO standardization in J3/24-116.txt.
See the new documentation for details; but in short, this is C's
unsigned type, with guaranteed modular arithmetic for +, -, and *, and
the related transformational intrinsic functions SUM & al.
This re-applies #117867 with a small fix that hopefully prevents build
bot failures. The fix is avoiding `dyn_cast` for the result of
`getOperation()`. Instead we can assign the result to `mlir::ModuleOp`
directly since the type of the operation is known statically (`OpT` in
`OperationPass`).
This is a starting PR to implicitly map allocatable record fields.
This PR contains the following changes:
1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that
these utils work on the `mlir::Value` level rather than the
`semantics::Symbol` level. This takes one step towards to enabling
MLIR passes to more easily do some lowering themselves (e.g. creating
`omp.map.bounds` ops for implicitely caputured data like this PR
does).
2. Adds support for implicitely capturing and mapping allocatable fields
in record types.
There is quite some distant to still cover to have full support for
this. I added a number of todos to guide further development.
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
-frealloc-lhs is the default.
If -fno-realloc-lhs is specified, then an allocatable on the left
side of an intrinsic assignment is not implicitly (re)allocated
to conform with the right hand side. Fortran runtime will issue
an error if there is a mismatch in shape/type/allocation-status.
I am trying to switch to keeping the reduction value in a temporary
scalar location so that I can use hlfir::genLoopNest easily.
This also allows using omp.loop_nest with worksharing for OpenMP.
This folding makes sure there are no hlfir.shape_of users
of hlfir.elemental - this may enable more InlineElementals matches,
because it is looking for exactly two uses of an hlfir.elemental.
The OmpLinearClause class was a variant of two classes, one for when the
linear modifier was present, and one for when it was absent. These two
classes did not follow the conventions for parse tree nodes, (i.e.
tuple/wrapper/union formats), which necessitated specialization of the
parse tree visitor.
The new form of OmpLinearClause is the standard tuple with a list of
modifiers and an object list. The specialization of parse tree visitor
for it has been removed.
Parsing and unparsing of the new form bears additional complexity due to
syntactical differences between OpenMP 5.2 and prior versions: in OpenMP
5.2 the argument list is post-modified, while in the prior versions, the
step modifier was a post-modifier while the linear modifier had an
unusual syntax of `modifier(list)`.
With this change the LINEAR clause is no different from any other
clauses in terms of its structure and use of modifiers. Modifier
validation and all other checks work the same as with other clauses.
Support the atomic compare option of a fail(memory-order) clauses.
Additional tests introduced to check that parsing and semantics checks
for the new clause is handled.
Lowering for atomic compare is still unsupported and wil end in a TOOD
(aka "Not yet implemented"). A test for this case with the fail clause
is also present.
Split some headers into headers for public and private declarations in
preparation for #110217. Moving the runtime-private headers in
runtime-private include directory will occur in #110298.
* Do not use `sizeof(Descriptor)` in the compiler. The size of the
descriptor is target-dependent while `sizeof(Descriptor)` is the size of
the Descriptor for the host platform which might be too small when
cross-compiling to a different platform. Another problem is that the
emitted assembly ((cross-)compiling to the same target) is not identical
between Flang's running on different systems. Moving the declaration of
`class Descriptor` out of the included header will also reduce the
amount of #included sources.
* Do not use `sizeof(ArrayConstructorVector)` and
`alignof(ArrayConstructorVector)` in the compiler. Same reason as with
`Descriptor`.
* Compute the descriptor's extra flags without instantiating a
Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime
source, but not the compiler source.
* Move `InquiryKeywordHashDecode` into runtime-private header. The
function is defined in the runtime sources and trying to call it in the
compiler would lead to a link-error.
* Move allocator-kind magic numbers into common header. They are the
only declarations out of `allocator-registry.h` in the compiler as well.
This does not make Flang cross-compile ready yet, the main goal is to
avoid transitive header dependencies from Flang to clang-rt. There are
more assumptions that host platform is the same as the target platform.
1. In `CufOpConversion` `isDeviceGlobal` was renamed
`isRegisteredGlobal` and moved to the common file. `isRegisteredGlobal`
excludes constant `fir.global` operation from registration. This is to
avoid calls to `_FortranACUFGetDeviceAddress` on globals which do not
have any symbols in the runtime. This was done for
`_FortranACUFRegisterVariable` in #118582, but also needs to be done
here after #118591
2. `CufDeviceGlobal` no longer adds the `#cuf.cuda<constant>` attribute
to the constant global. As discussed in #118582 a module variable with
the #cuf.cuda<constant> attribute is not a compile time constant. Yet,
the compile time constant also needs to be copied into the GPU module.
The candidates for copy to the GPU modules are
- the globals needing regsitrations regardless of their uses in device
code (they can be referred to in host code as well)
- the compile time constant when used in device code
3. The registration of "constant" module device variables (
#cuf.cuda<constant>) can be restored in `CufAddConstructor`
Both OpenMP privatization and DO CONCURRENT LOCAL lowering was incorrect
for pointers and derived type with default initialization.
For pointers, the descriptor was not established with the rank/type
code/element size, leading to undefined behavior if any inquiry was made
to it prior to a pointer assignment (and if/when using the runtime for
pointer assignments, the descriptor must have been established).
For derived type with default initialization, the copies were not
default initialized.
Fortran.h and target.h are defining symbols where some are used by both, the Fortran runtime (Flang-RT) and Fortran compiler (Flang), and others are used by Flang only. With the upcoming refactoring of the Fortran runtime into its own subproject (#110217), move the declarations that are used by both into new headers to minimize the amount of code that will need to be shared by Flang-RT and Flang.
Details:
* `Fortran.h`: Flang-RT only uses some enum definitions out of this file, but not `AsFortran` which is defined in `Fortran.cpp`. Moving the enums into `Fortran-consts.h` allows keeping `Fortran.cpp` within Flang.
* `target.h`: Contains some floating-point definitions that is used by the non-GTest unittests in `fp-testing.h`. Flang-RT also uses some non-GTest as well. Moving those definitions avoids the dependence on the entire FortranEvaluate library.
This is a patch in preparation for the support stream ordered memory
allocator in CUDA Fortran.
This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.
A follow up patch will implement that asynchronous allocator for CUDA
Fortran.
Add pattern that update fir.declare memref when it comes from a device
global and is not a descriptor. In that case, we recover the device
address that needs to be used in ops like `fir.array_coor` and so on.
in CUDA Fortran, device function are converted to `gpu.func` inside the
`gpu.module` operation. Update the AbstractResult pass to be able to run
on `func.func` and `gpu.func` operations inside the `gpu.module`.
This patch encapsulate array function call lowering into
hlfir.eval_in_mem and allows directly evaluating the call into the LHS
when possible.
The conditions are: LHS is contiguous, not accessed inside the function,
it is not a whole allocatable, and the function results needs not to be
finalized. All these conditions are tested in the previous hlfir.eval_in_mem
optimization (#118069) that is leveraging the extension of getModRef to
handle function calls(#117164).
This yields a 25% speed-up on polyhedron channel2 benchmark (from 1min
to 45s measured on an X86-64 Zen 2).