Added lowering support for IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses for
OMP TARGET directive and added related tests for these changes.
IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses apply to OMP TARGET directive
OpenMP spec states
`The **is_device_ptr** clause indicates that its list items are device
pointers.`
`The **has_device_addr** clause indicates that its list items already
have device addresses and therefore they may be directly accessed from a
target device.`
Whereas USE_DEVICE_PTR and USE_DEVICE_ADDR clauses apply to OMP TARGET
DATA directive and OpenMP spec for them states
`Each list item in the **use_device_ptr** clause results in a new list
item that is a device pointer that refers to a device address`
`Each list item in a **use_device_addr** clause that is present in the
device data environment is treated as if it is implicitly mapped by a
map clause on the construct with a map-type of alloc`
This adds support for complex type to the OpenMP reductions.
Note that some more work would be needed to give decent error messages when complex
is used in ways that need client supplied functions (e.g. MAX or MIN). It does fail these with
a not so user friendly message at present.
Flang supports source allocation to allocatable or pointers with a non
deferred length that do not match the source length. This documented at:
9708d09003/flang/docs/Extensions.md (L312)
The current lowering code was bugged when such explicit length allocate
object appeared after a deferred length object in the source allocation
list:
Since "lenParams" had been computed when generating allocation of the
deferred length object, the call to genSetDeferredLengthParameters was
not a no-op on when lowering the explicit length allocation, and the
explicit length was overridden with the source length.
The output of the program added in test was:
```
ZZheZZ
ZZhelloZZ
ZZhelloZZ
```
Instead of:
```
ZZheZZ
ZZhelloZZ
ZZhello ZZ
```
Skip genSetDeferredLengthParameters when the allocate object has non
deferred length.
The all one masks was not properly created for i128 types because
builder.createIntegerConstant ended-up truncating -1 to something
positive.
Add a builder.createAllOnesInteger/createMinusOneInteger helpers and use
them where createIntegerConstant(..., -1) was used.
Add an assert in createIntegerConstant to catch negative numbers for
i128 type.
Add more support for CUDA data transfer in assignment. This patch adds
device to device and device to host support. If device symbols are
present on the rhs, some implicit data transfer are initiated. A
temporary is created and the data are transferred to the host. The
expression is evaluated on the host and the assignment is done.
See F'2023 section 11.4: "If the stop-code in an ERROR STOP statement is
of type character or does not appear, it is recommended that a
processor-dependent nonzero value be supplied as the process exit
status"
Fixes https://github.com/llvm/llvm-project/issues/66581.
Straightforward computation of `A − FLOOR (A / P) * P` should
produce NaN, when P is infinity. The -menable-no-infs lowering
can still use the relaxed operations sequence.
This PR is to address `TODO(loc, "procedure pointer component default
initialization");`.
It handles default init for procedure pointer components in a derived
type that is 32 bytes or larger (Default init for smaller size type has
already been handled).
```
interface
subroutine sub()
end
end interface
type dt
real :: r1 = 5.0
procedure(real), pointer, nopass :: pp1 => null()
real, pointer :: rp1 => null()
procedure(), pointer, nopass :: pp2 => sub
end type
type(dt) :: dd1
end
```
This PR fixes `not yet implemented: procedure pointer component in
structure constructor` as shown in the following test case.
```
MODULE M
TYPE :: DT
PROCEDURE(Fun), POINTER, NOPASS :: pp1
END TYPE
CONTAINS
INTEGER FUNCTION Fun(Arg)
INTEGER :: Arg
Fun = Arg
END FUNCTION
END MODULE
PROGRAM MAIN
USE M
IMPLICIT NONE
TYPE (DT) :: v2
PROCEDURE(FUN), POINTER :: pp2
v2 = DT(pp2)
v2 = DT(bar())
CONTAINS
FUNCTION BAR() RESULT(res)
PROCEDURE(FUN), POINTER :: res
END
END
```
In CUDA Fortran data transfer can be done via assignment statements
between host and device variables.
This patch introduces a `fir.cuda_data_transfer` operation that
materialized the data transfer between two memory references.
Simple transfer not involving descriptors from host to device are also
lowered in this patch. When the rhs is an expression that required an
evaluation, a temporary is created. The evaluation is done on the host
and then the transfer is initiated.
Implicit transfer when device symbol are present on the rhs is not part
of this patch. Transfer from device to host is not part of this patch.
Cray pointee symbols can be host associated from a module or host
procedure while the related cray pointer is not explicitly associated.
This caused the "not yet implemented: lowering symbol to HLFIR" to fire
when lowering a reference to the cray pointee and fetching the cray
pointer.
This patch:
- Ensures cray pointers are always instantiated when instantiating a
cray pointee.
- Fix internal procedure lowering to deal with cray pointee host
association like it does for pointers (the lowering strategy for cray
pointee is to create a pointer that is updated with the cray pointer
value before being fetched).
This should fix the bug reported in
https://github.com/llvm/llvm-project/issues/85420.
This PR fixes a subset of procedure pointer component initialization in
structure constructor.
It covers
1. NULL()
2. procedure
For example:
```
MODULE M
TYPE :: DT
!PROCEDURE(Fun), POINTER, NOPASS :: pp1
PROCEDURE(Fun), POINTER :: pp1
END TYPE
CONTAINS
INTEGER FUNCTION Fun(Arg)
class(dt) :: arg
END FUNCTION
END MODULE
PROGRAM MAIN
USE M
IMPLICIT NONE
TYPE (DT), PARAMETER :: v1 = DT(NULL())
TYPE (DT) :: v2
v2 = DT(fun)
END
```
Passing a procedure pointer itself or reference to a function that
returns a procedure pointer is TODO.
This patch contains slight modifications to the reverted PR #85258 to
avoid issues with constructs containing multiple reduction clauses,
uncovered by a test on the gfortran testsuite.
This reverts commit 9f80444c2e.
The related functions are `gatherDataOperandAddrAndBounds` and
`genBoundsOps`. The former is used in OpenACC as well, and it was
updated to pass evaluate::Expr instead of parser objects.
The difference in the test case comes from unfolded conversions of index
expressions, which are explicitly of type integer(kind=8).
Delete now unused `findRepeatableClause2` and `findClause2`.
Add `AsGenericExpr` that takes std::optional. It already returns
optional Expr. Making it accept an optional Expr as input would reduce
the number of necessary checks when handling frequent optional values in
evaluator.
[Clause representation 4/6]
When the kernel launch has no arguments, the generated parser was
expecting at least a type to be present. Make the last part of the
assemble format optional.
Add a run line to round-trip the output through fir-opt so we make sure
the IR can be parsed and printed correctly.
Re-use fir::getTypeAsString instead of creating something new here. This
spells integer names like i32 instead of i_32 so there is a lot of test
churn.
This PR changes the build system to use use the sources for the module
`omp_lib` and the `omp_lib.h` include file from the `openmp` runtime
project and not from a separate copy of these files. This will greatly
reduce potential for inconsistencies when adding features to the OpenMP
runtime implementation.
When the OpenMP subproject is not configured, this PR also disables the
corresponding LIT tests with a "REQUIRES" directive at the beginning of
the OpenMP test files.
---------
Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
This has been tested with arrays with compile-time constant bounds.
Allocatable arrays and arrays with non-constant bounds are not yet
supported. User-defined reduction functions are also not yet supported.
The design is intended to work for arrays with non-constant bounds too
without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I
haven't fixed yet).
We need some way to get these runtime bounds into the reduction init and
combiner regions. To keep things simple for now I opted to always box
the array arguments so the box can be passed as one argument and the
lower bounds and extents read from the box. This has the disadvantage of
resulting in fir.box_dim operations inside of the critical section. If
these prove to be a performance issue, we could follow OpenACC reading
box lower bounds and extents before the reduction and passing them as
block arguments to the reduction init and combiner regions. I would
prefer to keep things simple for now.
Note: this implementation only works when the HLFIR lowering is used. I
don't think it is worth supporting FIR-only lowering because the plan is
for that to be removed soon.
OpenMP array reductions 6/6
Previous PR: https://github.com/llvm/llvm-project/pull/84957
Most FIR passes only look for FIR operations inside of functions (either
because they run only on func.func or they run on the module but iterate
over functions internally). But there can also be FIR operations inside
of fir.global, some OpenMP and OpenACC container operations.
This has worked so far for fir.global and OpenMP reductions because they
only contained very simple FIR code which doesn't need most passes to be
lowered into LLVM IR. I am not sure how OpenACC works.
In the long run, I hope to see a more systematic approach to making sure
that every pass runs on all of these container operations. I will write
an RFC for this soon.
In the meantime, this pass duplicates the CFG conversion pass to also
run on omp reduction operations. This is similar to how the
AbstractResult pass is already duplicated for fir.global operations.
OpenMP array reductions 2/6
Previous PR: https://github.com/llvm/llvm-project/pull/84952
Next PR: https://github.com/llvm/llvm-project/pull/84954
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
`fir.cuda_kernel_launch` represents a call to a cuda kernel with the
chervon syntax. Its assembly format is meant to match `fir.call`. This
patch updates the format to match the syntax closer for args and their
types.
Polymorphic entity lowering status is good. The main remaining TODO is
to allow lowering of vector subscripted polymorphic entity, but this
does not deserve blocking all application using polymorphism.
Remove experimental option and enable lowering of polymorphic entity by
default.
The current lowering did not handle sequence associated argument passed
by descriptor. This case is special because sequence association implies
that the actual and dummy argument need to to agree in rank and shape.
Usually, arguments that can be sequence associated are passed by raw
address, and the shape mistmatch is transparent. But there are three
cases of explicit and assumed-size arrays passed by descriptors:
- polymorphic arguments
- BIND(C) assumed-length arguments (F'2023 18.3.7 (5)).
- length parametrized derived types (TBD)
The callee side is expecting a descriptor containing the dummy rank and
shape. This was not the case. This patch fix that by evaluating the
dummy shape on the caller side using the interface (that has to be
available when arguments are passed by descriptors).
Parsing of the cuf kernel loop directive has been updated to handle
variants with the * syntax. This patch updates the lowering to make use
of them.
- If the grid or block syntax uses only stars then the operation
variadic operand remains empty.
- If there is values and stars, then stars are represented as a zero
constant value.
Reductions such as min are intrinsic procedures. This distinguishes them
from user defined reductions. Previously, the intrinsic attribute was
not set when visiting reduction clauses causing them to be missed.
wsloop-reduction-min.f90 (the other min reduction test) worked because
it contained "min" used as an intrinsic inside of the body of the
reduction. This allowed ResolveNamesVisitor::HandleProcedureName to set
the correct attribute on that Symbol.
Comparing c_ptr type for equality or inequality is raising an error.
```
not yet implemented: intrinsic module procedure: c_ptr_eq
```
or this one for inequality
```
not yet implemented: intrinsic module procedure: c_ptr_ne
```
This patch adds a lowering for them and fix the `__fortran_builtins.f90` module for inequality.
Reland after fix has landed for circular modules #85309
Comparing c_ptr type for equality or inequality is raising an error.
```
not yet implemented: intrinsic module procedure: c_ptr_eq
```
or this one for inequality
```
not yet implemented: intrinsic module procedure: c_ptr_ne
```
This patch adds a lowering for them and fix the `__fortran_builtins.f90`
module for inequality.
This effectively implements some now deprecated OpenMP functionality
that some applications (most notably at the moment GenASiS)
unfortunately depend on (deprecated in specification version 5.2):
"If a list item in a use_device_ptr clause is not of type C_PTR, the
behavior is as if the list item appeared in a use_device_addr clause.
Support for such list items in a use_device_ptr clause is deprecated."
This PR downgrades the hard-error to a deprecated warning and "promotes"
the above cases by simply moving the offending operands from the
use_device_ptr value list to the back of the use_device_addr list (and
moves the related symbols, locs and types that form the BlockArgs
correspondingly) and then the generation of the target data construct
proceeds as normal.
Previously reduction variables were always passed by value into and out
of the initialization and combiner regions of the OpenMP reduction
declare operation.
This worked well for reductions of primitive types (and might perform
better than passing by reference). But passing by reference will be
useful for array and derived type reductions (e.g. to move allocation
inside of the init region).
Passing reductions by reference requires different LLVM-IR generation
when lowering from MLIR because some of the loads/stores/allocations
will now be moved inside of the init and combiner regions. This
alternate code generation is requested using a new attribute to
omp.wsloop and omp.parallel.
Existing lowerings from mlir are unaffected (these will continue to use
the by-value argument passing.
Flang will continue to pass by-value argument passing for trivial types
unless a (hidden) command line argument is supplied. Non-trivial types
will always use the by-ref lowering.
Array reductions are not ready yet (but are coming very soon). In the
meantime, this is tested by forcing existing reductions to use by-ref.
Commit series for by-ref OpenMP reductions 3/3
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
A mold argument need to be added to the hlfir.element_addr and set in
lowering so that when the hlfir.element_addr need to be turned into an
hlfir.elemental operation because the designator must be turned into a
value, the mold can be set on the hlfir.elemental to later allocate the
temporary according the the dynamic type.
This situation happens whenever the vector subscripted polymorphic
designator does not appear as an assignment left-hand side, or as an
IO-input item.
I initially thought retrieving the mold would be tricky if the dynamic
type of the designator was set by a part-ref of the right of the vector
subscripts ("array(vector)%polymorphic_comp"), but this turned out to be
impossible because:
1. A derived type component can be polymorphic only if it has the
POINTER or ALLOCATABLE attribute (F2023 C708).
2. Vector-subscripted part are ranked and F2023 C919 prohibits any
part-ref on the right of the rank part to have the POINTER or
ALLOCATABLE attribute.
=> If a vector subscripted designator is polymorphic, the vector
subscripted part is the rightmost part, and the mold is the base of the
vector subscripted part. This makes the retrieval of the mold easy in
lowering. The mold argument is always set to be the base of the vector
subscripted part when lowering the vector subscripted part, and it is
removed at the end of the designator lowering if the designator is not
polymorphic. This way there is no need to find back the mold from the
inside of the hlfir.element_addr body.
For non polymorphic entities, semantics knows the type size and rewrite
sizeof to `"cst element size" * size(x)`.
Lowering has to deal with the polymorphic case where the type size must
be retrieved from the descriptor (note that the lowering implementation
would work with any entity, polymorphic on not, it is just not used for
the non polymorphic cases).
Modifies the privatization logic so that the emitted code only used the
HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before
that, that emitted privatization logic was a mix of using `#0` and `#1`
which leads to some difficulties trying to move to delayed privatization
(see the discussion on #84033).