Added lowering support for IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses for
OMP TARGET directive and added related tests for these changes.
IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses apply to OMP TARGET directive
OpenMP spec states
`The **is_device_ptr** clause indicates that its list items are device
pointers.`
`The **has_device_addr** clause indicates that its list items already
have device addresses and therefore they may be directly accessed from a
target device.`
Whereas USE_DEVICE_PTR and USE_DEVICE_ADDR clauses apply to OMP TARGET
DATA directive and OpenMP spec for them states
`Each list item in the **use_device_ptr** clause results in a new list
item that is a device pointer that refers to a device address`
`Each list item in a **use_device_addr** clause that is present in the
device data environment is treated as if it is implicitly mapped by a
map clause on the construct with a map-type of alloc`
Allocate statement for variable with CUDA attributes need to allocate
memory on the device and not the host. Add a proper TODO so we keep
track of work to be done for it.
This adds support for complex type to the OpenMP reductions.
Note that some more work would be needed to give decent error messages when complex
is used in ways that need client supplied functions (e.g. MAX or MIN). It does fail these with
a not so user friendly message at present.
Flang supports source allocation to allocatable or pointers with a non
deferred length that do not match the source length. This documented at:
9708d09003/flang/docs/Extensions.md (L312)
The current lowering code was bugged when such explicit length allocate
object appeared after a deferred length object in the source allocation
list:
Since "lenParams" had been computed when generating allocation of the
deferred length object, the call to genSetDeferredLengthParameters was
not a no-op on when lowering the explicit length allocation, and the
explicit length was overridden with the source length.
The output of the program added in test was:
```
ZZheZZ
ZZhelloZZ
ZZhelloZZ
```
Instead of:
```
ZZheZZ
ZZhelloZZ
ZZhello ZZ
```
Skip genSetDeferredLengthParameters when the allocate object has non
deferred length.
The all one masks was not properly created for i128 types because
builder.createIntegerConstant ended-up truncating -1 to something
positive.
Add a builder.createAllOnesInteger/createMinusOneInteger helpers and use
them where createIntegerConstant(..., -1) was used.
Add an assert in createIntegerConstant to catch negative numbers for
i128 type.
llvm-project/flang/lib/Lower/Bridge.cpp:3775:14:
error: variable 'nbDeviceResidentObject' set but not used [-Werror,-Wunused-but-set-variable]
unsigned nbDeviceResidentObject = 0;
^
1 error generated.
Add more support for CUDA data transfer in assignment. This patch adds
device to device and device to host support. If device symbols are
present on the rhs, some implicit data transfer are initiated. A
temporary is created and the data are transferred to the host. The
expression is evaluated on the host and the assignment is done.
See F'2023 section 11.4: "If the stop-code in an ERROR STOP statement is
of type character or does not appear, it is recommended that a
processor-dependent nonzero value be supplied as the process exit
status"
Fixes https://github.com/llvm/llvm-project/issues/66581.
This PR is to address `TODO(loc, "procedure pointer component default
initialization");`.
It handles default init for procedure pointer components in a derived
type that is 32 bytes or larger (Default init for smaller size type has
already been handled).
```
interface
subroutine sub()
end
end interface
type dt
real :: r1 = 5.0
procedure(real), pointer, nopass :: pp1 => null()
real, pointer :: rp1 => null()
procedure(), pointer, nopass :: pp2 => sub
end type
type(dt) :: dd1
end
```
Whenever lowering is checking if a function or global already exists in
the mlir::Module, it was doing module->lookup.
On big programs (~5000 globals and functions), this causes important
slowdowns because these lookups are linear. Use mlir::SymbolTable to
speed-up these lookups. The SymbolTable has to be created from the
ModuleOp and maintained in sync. It is therefore placed in the
converter, and FirOPBuilders can take a pointer to it to speed-up the
lookups.
This patch does not bring mlir::SymbolTable to FIR/HLFIR passes, but
some passes creating a lot of runtime calls could benefit from it too.
More analysis will be needed.
As an example of the speed-ups, this patch speeds-up compilation of
Whizard compare_amplitude_UFO.F90 from 5 mins to 2 mins on my machine
(there is still room for speed-ups).
There were several functions, mostly reduction-related, that were only
called from OpenMP.cpp. Remove them from OpenMP.h, and make them local
in OpenMP.cpp:
- genOpenMPReduction
- findReductionChain
- getConvertFromReductionOp
- updateReduction
- removeStoreOp
Also, move the function bodies out of the "public" section.
The clause templates defined in ClauseT.h were originally based on
flang's parse tree nodes. Since those representations are going to be
reused for clang (together with the clause splitting code), it makes
sense to separate them from flang, and instead have them based on the
actual OpenMP spec (v5.2).
The member names in the templates follow the naming presented in the
spec, and the representation (e.g. members) is derived from the clause
definitions as described in the spec.
Since the representations of some clauses has changed (while preserving
the information), the current code using the clauses (especially the
code converting parser::OmpClause to omp::Clause) needs to be adjusted.
This patch does not make any functional changes.
This PR fixes `not yet implemented: procedure pointer component in
structure constructor` as shown in the following test case.
```
MODULE M
TYPE :: DT
PROCEDURE(Fun), POINTER, NOPASS :: pp1
END TYPE
CONTAINS
INTEGER FUNCTION Fun(Arg)
INTEGER :: Arg
Fun = Arg
END FUNCTION
END MODULE
PROGRAM MAIN
USE M
IMPLICIT NONE
TYPE (DT) :: v2
PROCEDURE(FUN), POINTER :: pp2
v2 = DT(pp2)
v2 = DT(bar())
CONTAINS
FUNCTION BAR() RESULT(res)
PROCEDURE(FUN), POINTER :: res
END
END
```
In CUDA Fortran data transfer can be done via assignment statements
between host and device variables.
This patch introduces a `fir.cuda_data_transfer` operation that
materialized the data transfer between two memory references.
Simple transfer not involving descriptors from host to device are also
lowered in this patch. When the rhs is an expression that required an
evaluation, a temporary is created. The evaluation is done on the host
and then the transfer is initiated.
Implicit transfer when device symbol are present on the rhs is not part
of this patch. Transfer from device to host is not part of this patch.
Put all of the genOMP functions together, organize them in two groups:
for declarative constructs and for other (executable) constructs.
Replace visit functions for OpenMPDeclarativeConstruct and
OpenMPConstruct from listing individual visitors for each variant
alternative to using a single generic visitor. Essentially, going from
```
std::visit(
[](foo x) { genOMP(foo); }
[](bar x) { TODO }
[](baz x) { genOMP(baz); }
)
```
to
```
void genOMP(bar x) { // Separate visitor for an unhandled case
TODO
}
[...]
std::visit([&](auto &&s) { genOMP(s); }) // generic
```
This doesn't change any functionality, just reorganizes the functions a
bit. The intent here is to improve the readability of this file.
Cray pointee symbols can be host associated from a module or host
procedure while the related cray pointer is not explicitly associated.
This caused the "not yet implemented: lowering symbol to HLFIR" to fire
when lowering a reference to the cray pointee and fetching the cray
pointer.
This patch:
- Ensures cray pointers are always instantiated when instantiating a
cray pointee.
- Fix internal procedure lowering to deal with cray pointee host
association like it does for pointers (the lowering strategy for cray
pointee is to create a pointer that is updated with the cray pointer
value before being fetched).
This should fix the bug reported in
https://github.com/llvm/llvm-project/issues/85420.
This PR fixes a subset of procedure pointer component initialization in
structure constructor.
It covers
1. NULL()
2. procedure
For example:
```
MODULE M
TYPE :: DT
!PROCEDURE(Fun), POINTER, NOPASS :: pp1
PROCEDURE(Fun), POINTER :: pp1
END TYPE
CONTAINS
INTEGER FUNCTION Fun(Arg)
class(dt) :: arg
END FUNCTION
END MODULE
PROGRAM MAIN
USE M
IMPLICIT NONE
TYPE (DT), PARAMETER :: v1 = DT(NULL())
TYPE (DT) :: v2
v2 = DT(fun)
END
```
Passing a procedure pointer itself or reference to a function that
returns a procedure pointer is TODO.
This patch contains slight modifications to the reverted PR #85258 to
avoid issues with constructs containing multiple reduction clauses,
uncovered by a test on the gfortran testsuite.
This reverts commit 9f80444c2e.
The related functions are `gatherDataOperandAddrAndBounds` and
`genBoundsOps`. The former is used in OpenACC as well, and it was
updated to pass evaluate::Expr instead of parser objects.
The difference in the test case comes from unfolded conversions of index
expressions, which are explicitly of type integer(kind=8).
Delete now unused `findRepeatableClause2` and `findClause2`.
Add `AsGenericExpr` that takes std::optional. It already returns
optional Expr. Making it accept an optional Expr as input would reduce
the number of necessary checks when handling frequent optional values in
evaluator.
[Clause representation 4/6]
Re-use fir::getTypeAsString instead of creating something new here. This
spells integer names like i32 instead of i_32 so there is a lot of test
churn.
This has been tested with arrays with compile-time constant bounds.
Allocatable arrays and arrays with non-constant bounds are not yet
supported. User-defined reduction functions are also not yet supported.
The design is intended to work for arrays with non-constant bounds too
without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I
haven't fixed yet).
We need some way to get these runtime bounds into the reduction init and
combiner regions. To keep things simple for now I opted to always box
the array arguments so the box can be passed as one argument and the
lower bounds and extents read from the box. This has the disadvantage of
resulting in fir.box_dim operations inside of the critical section. If
these prove to be a performance issue, we could follow OpenACC reading
box lower bounds and extents before the reduction and passing them as
block arguments to the reduction init and combiner regions. I would
prefer to keep things simple for now.
Note: this implementation only works when the HLFIR lowering is used. I
don't think it is worth supporting FIR-only lowering because the plan is
for that to be removed soon.
OpenMP array reductions 6/6
Previous PR: https://github.com/llvm/llvm-project/pull/84957
This patch moves some code in PFT to MLIR OpenMP lowering to the
`ClauseProcessor` class. This is so that some behavior that is related
to certain clauses stays within the `ClauseProcessor` and it's not the
caller the one responsible for always doing this when the clause is
present.
In this patch some uses of `llvm::SmallVector` in Flang's lowering to
MLIR are replaced by other types (i.e. `llvm::ArrayRef` and
`llvm::SmallVectorImpl`) which are intended for these uses. This
generally prevents relying on always passing small vectors with a
particular number of elements in the stack.
Polymorphic entity lowering status is good. The main remaining TODO is
to allow lowering of vector subscripted polymorphic entity, but this
does not deserve blocking all application using polymorphism.
Remove experimental option and enable lowering of polymorphic entity by
default.
The current lowering did not handle sequence associated argument passed
by descriptor. This case is special because sequence association implies
that the actual and dummy argument need to to agree in rank and shape.
Usually, arguments that can be sequence associated are passed by raw
address, and the shape mistmatch is transparent. But there are three
cases of explicit and assumed-size arrays passed by descriptors:
- polymorphic arguments
- BIND(C) assumed-length arguments (F'2023 18.3.7 (5)).
- length parametrized derived types (TBD)
The callee side is expecting a descriptor containing the dummy rank and
shape. This was not the case. This patch fix that by evaluating the
dummy shape on the caller side using the interface (that has to be
available when arguments are passed by descriptors).
Parsing of the cuf kernel loop directive has been updated to handle
variants with the * syntax. This patch updates the lowering to make use
of them.
- If the grid or block syntax uses only stars then the operation
variadic operand remains empty.
- If there is values and stars, then stars are represented as a zero
constant value.
…essor
Rename `findRepeatableClause` to `findRepeatableClause2`, and make the
new `findRepeatableClause` operate on new `omp::Clause` objects.
Leave `Map` unchanged, because it will require more changes for it to
work.
[Clause representation 3/6]
Temporarily rename old clause list to `clauses2`, old clause iterator to
`ClauseIterator2`.
Change `findUniqueClause` to iterate over `omp::Clause` objects, modify
all handlers to operate on 'omp::clause::xyz` equivalents.
[Clause representation 2/6]
In file included from ../llvm-project/flang/lib/Lower/OpenMP/Clauses.cpp:9:
../llvm-project/flang/lib/Lower/OpenMP/Clauses.h:195:17: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
195 | return Clause{id, specific, source};
| ^~~~~~~~~~~~
| { }
The new set of classes representing OpenMP classes mimics the contents
of parser::OmpClause, but differs in a few aspects:
- it can be easily created, copied, etc.
- is based on semantics::SomeExpr instead of parser objects.
This set of classes is geared towards a language-agnostic representation
of OpenMP clauses. While the class members are still based on flang's
`parser::OmpClause`, the classes themselves are actually C++ templates
parameterized with types essential to represent necessary information.
The two parameters are
- `IdType`: the type that can represent object's identity (for flang it
will be `semantics::Symbol *`),
- `ExprType`: the type that can represent expressions, arithmetic and
object references (`semantics::MaybeExpr` in flang).
The templates are defined in a namespace `tomp` (to distinguish it from
`omp`).
This patch introduces file "Clauses.cpp", which contains instantiations
of all of the templates for flang. The instantiations are members of
namespace `omp`, and include an all-encompassing variant class
`omp::Clause`, which is the analog of `parser::OmpClause`.
The class `OmpObject` is represented by `omp::Object`, which contains
the symbol associated with the object, and `semantics::MaybeExpr`
representing the designator for the symbol reference. For each specific
clause in the variant `parser::OmpClause`, there exists a `make`
function that will generate the corresponding `omp::Clause` from it. The
intent was to use the make functions as variant visitors. The creation
of a clause instance would then follow the pattern:
```
omp::Clause clause = std::visit([](auto &&s) { return make(s, semaCtx); }, parserClause.u);
```
If a new clause `foo` is added in the future, then:
- a new template `tomp::FooT` representing the clause needs to be added
to ClauseT.h,
- a new instance needs to be created in flang, this can be as simple as
`using Foo = tomp::FooT<...>`,
- a new make function needs to be implemented to create object of class
Foo from `parser::OmpClause::Foo`.
This patch only introduces the new classes, they are not yet used
anywhere.
[Clause representation 1/6]