Added lowering support for IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses for
OMP TARGET directive and added related tests for these changes.
IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses apply to OMP TARGET directive
OpenMP spec states
`The **is_device_ptr** clause indicates that its list items are device
pointers.`
`The **has_device_addr** clause indicates that its list items already
have device addresses and therefore they may be directly accessed from a
target device.`
Whereas USE_DEVICE_PTR and USE_DEVICE_ADDR clauses apply to OMP TARGET
DATA directive and OpenMP spec for them states
`Each list item in the **use_device_ptr** clause results in a new list
item that is a device pointer that refers to a device address`
`Each list item in a **use_device_addr** clause that is present in the
device data environment is treated as if it is implicitly mapped by a
map clause on the construct with a map-type of alloc`
This adds support for complex type to the OpenMP reductions.
Note that some more work would be needed to give decent error messages when complex
is used in ways that need client supplied functions (e.g. MAX or MIN). It does fail these with
a not so user friendly message at present.
This patch contains slight modifications to the reverted PR #85258 to
avoid issues with constructs containing multiple reduction clauses,
uncovered by a test on the gfortran testsuite.
This reverts commit 9f80444c2e.
Re-use fir::getTypeAsString instead of creating something new here. This
spells integer names like i32 instead of i_32 so there is a lot of test
churn.
This PR changes the build system to use use the sources for the module
`omp_lib` and the `omp_lib.h` include file from the `openmp` runtime
project and not from a separate copy of these files. This will greatly
reduce potential for inconsistencies when adding features to the OpenMP
runtime implementation.
When the OpenMP subproject is not configured, this PR also disables the
corresponding LIT tests with a "REQUIRES" directive at the beginning of
the OpenMP test files.
---------
Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
This has been tested with arrays with compile-time constant bounds.
Allocatable arrays and arrays with non-constant bounds are not yet
supported. User-defined reduction functions are also not yet supported.
The design is intended to work for arrays with non-constant bounds too
without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I
haven't fixed yet).
We need some way to get these runtime bounds into the reduction init and
combiner regions. To keep things simple for now I opted to always box
the array arguments so the box can be passed as one argument and the
lower bounds and extents read from the box. This has the disadvantage of
resulting in fir.box_dim operations inside of the critical section. If
these prove to be a performance issue, we could follow OpenACC reading
box lower bounds and extents before the reduction and passing them as
block arguments to the reduction init and combiner regions. I would
prefer to keep things simple for now.
Note: this implementation only works when the HLFIR lowering is used. I
don't think it is worth supporting FIR-only lowering because the plan is
for that to be removed soon.
OpenMP array reductions 6/6
Previous PR: https://github.com/llvm/llvm-project/pull/84957
Most FIR passes only look for FIR operations inside of functions (either
because they run only on func.func or they run on the module but iterate
over functions internally). But there can also be FIR operations inside
of fir.global, some OpenMP and OpenACC container operations.
This has worked so far for fir.global and OpenMP reductions because they
only contained very simple FIR code which doesn't need most passes to be
lowered into LLVM IR. I am not sure how OpenACC works.
In the long run, I hope to see a more systematic approach to making sure
that every pass runs on all of these container operations. I will write
an RFC for this soon.
In the meantime, this pass duplicates the CFG conversion pass to also
run on omp reduction operations. This is similar to how the
AbstractResult pass is already duplicated for fir.global operations.
OpenMP array reductions 2/6
Previous PR: https://github.com/llvm/llvm-project/pull/84952
Next PR: https://github.com/llvm/llvm-project/pull/84954
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
Reductions such as min are intrinsic procedures. This distinguishes them
from user defined reductions. Previously, the intrinsic attribute was
not set when visiting reduction clauses causing them to be missed.
wsloop-reduction-min.f90 (the other min reduction test) worked because
it contained "min" used as an intrinsic inside of the body of the
reduction. This allowed ResolveNamesVisitor::HandleProcedureName to set
the correct attribute on that Symbol.
This effectively implements some now deprecated OpenMP functionality
that some applications (most notably at the moment GenASiS)
unfortunately depend on (deprecated in specification version 5.2):
"If a list item in a use_device_ptr clause is not of type C_PTR, the
behavior is as if the list item appeared in a use_device_addr clause.
Support for such list items in a use_device_ptr clause is deprecated."
This PR downgrades the hard-error to a deprecated warning and "promotes"
the above cases by simply moving the offending operands from the
use_device_ptr value list to the back of the use_device_addr list (and
moves the related symbols, locs and types that form the BlockArgs
correspondingly) and then the generation of the target data construct
proceeds as normal.
Previously reduction variables were always passed by value into and out
of the initialization and combiner regions of the OpenMP reduction
declare operation.
This worked well for reductions of primitive types (and might perform
better than passing by reference). But passing by reference will be
useful for array and derived type reductions (e.g. to move allocation
inside of the init region).
Passing reductions by reference requires different LLVM-IR generation
when lowering from MLIR because some of the loads/stores/allocations
will now be moved inside of the init and combiner regions. This
alternate code generation is requested using a new attribute to
omp.wsloop and omp.parallel.
Existing lowerings from mlir are unaffected (these will continue to use
the by-value argument passing.
Flang will continue to pass by-value argument passing for trivial types
unless a (hidden) command line argument is supplied. Non-trivial types
will always use the by-ref lowering.
Array reductions are not ready yet (but are coming very soon). In the
meantime, this is tested by forcing existing reductions to use by-ref.
Commit series for by-ref OpenMP reductions 3/3
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
Modifies the privatization logic so that the emitted code only used the
HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before
that, that emitted privatization logic was a mix of using `#0` and `#1`
which leads to some difficulties trying to move to delayed privatization
(see the discussion on #84033).
As per the OpenMP standard, "If a variable appears in a link clause on a
declare target directive that does not have a device_type clause with
the nohost device-type-description then it is treated as if it had
appeared in a map clause with a map-type of tofrom" is an implicit
mapping rule. Before this change, such variables were mapped as to by
default.
I did not know how `-mmlir` flag works and was deferring the addition of
`--openm-enabled-delayed-privatization` until later because I thought
some work needs to be done to do that. This commit just adds some extra
`RUN` lines to delayed privatization tests to run them from `flang` as
well.
This patch seeks to create a process that happens on module finalization
for OpenMP, in which a list of operations that had declare target
directives applied to them and were not generated at the time of
processing the original declare target directive are re-checked to apply
the appropriate declare target semantics.
This works by maintaining a vector of declare target related data inside
of the FIR converter, in this case the symbol and the two relevant
unsigned integers representing the enumerators. This vector is added to
via a new function called from Bridge.cpp, insertDeferredDeclareTargets,
which happens prior to the processing of the directive (similarly to
getDeclareTargetFunctionDevice currently for requires), it effectively
checks if the Operation the declare target directive is applied to
currently exists, if it doesn't it appends to the vector. This is a
seperate function to the processing of the declare target via the
overloaded genOMP as we unfortunately do not have access to the list
without passing it through every call, as the AbstractConverter we pass
will not allow access to it (I've seen no other cases of casting it to a
FirConverter, so I opted to not do that).
The list is then processed at the end of the module in the
finalizeOpenMPLowering function in Bridge by calling a new function
markDelayedDeclareTargetFunctions which marks the latently generated
operations. In certain cases, some still will not be generated, e.g. if
an interface is defined, marked as declare target, but has no definition
or usage in the module then it will not be emitted to the module, so due
to these cases we must silently ignore when an operation has not been
found via it's symbol.
The main use-case for this (although, I imagine there is others) is for
processing interfaces that have been declared in a module with a declare
target directive but do not have their implementation defined in the
same module. For example, inside of a seperate C++ module that will be
linked in. In cases where the interface is called inside of a target
region it'll be marked as used on device appropriately (although,
realistically a user should explicitly mark it to match the
corresponding definition), however, in cases where it's used in a
non-clear manner through something like a function pointer passed to an
external call we require this explicit marking, which this patch adds
support for (currently will cause the compiler to crash).
This patch also adds documentation on the declare target process and
mechanisms within the compiler currently.
Internal procedures cannot be called directly from outside the host
procedure, so there is no point giving them external linkage. The only
reason flang did is because it is the default in MLIR.
Giving external linkage to them:
- prevents deleting them when not used/inlined by LLVM
- causes bugs with shared libraries (at least on linux x86-64) because
the call to the internal function could lead to a dynamic loader call
that would overwrite r10 register (the static chain pointer) due to
system calls and did not restore (it seems it does not expect r10 to be
used for PLT calls).
This patch gives internal linkage to internal procedures:
Note: the llvm.linkage attribute name cannot be obtained via a
getLinkageAttrName since it is not the same name as the one used in the
LLVM dialect. It is just a placeholder defined in
mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp until the func dialect
gets a real linkage model. So simply avoid hard coding it too many times
in lowering.
Adds basic support for emitting delayed privatizers from flang. So far,
only types of symbols are supported (i.e. scalars), support for more
complicated types will be added later. This also makes sure that
reduction and delayed privatization work properly together by merging
the
body-gen callbacks for both in case both clauses are present on the
parallel construct.
Add initial handling of OpenMP copyprivate clause in Flang.
When lowering copyprivate, Flang generates the copy function
needed by each variable and builds the appropriate
omp.single's CopyPrivateVarList.
This is patch 3 of 4, to add support for COPYPRIVATE in Flang.
Original PR: https://github.com/llvm/llvm-project/pull/73128
This patch adds support in flang for the depend clause in target and
target enter/update/exit constructs. Previously, the following line in a
fortran program would have resulted in the error shown below it.
!$omp target map(to:a) depend(in:a)
"not yet implemented: Unhandled clause DEPEND in TARGET construct"
This patch reworks the way that wsloop reduction operations function to
better match the expected semantics from the OpenMP specification,
following the rework of parallel reductions.
The new semantics create a private reduction variable as a block
argument which should be used normally for all operations on that
variable in the region; this private variable is then combined with the
others into the shared variable. This way no special omp.reduction
operations are needed inside the region. These block arguments follow
the loop control block arguments.
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
This patch reworks the way that parallel reduction operations function
to better match the expected semantics from the OpenMP specification.
Previously specific omp.reduction operations were used inside the
region, meaning that the reduction only applied when the correct
operation was used, whereas the specification states that any change to
the variable inside the region should be taken into account for the
reduction.
The new semantics create a private reduction variable as a block
argument which should be used normally for all operations on that
variable in the region; this private variable is then combined with the
others into the shared variable. This way no special omp.reduction
operations are needed inside the region.
This patch only makes the change for the `parallel` operation, the
change for the `wsloop` operation will be in a separate patch.
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
In some cases, when privatizing a threadprivate common block, the
original symbol will correspond to the common block, instead of
its threadprivate version. This can happen, for instance, with a
common block, declared in a separate module, used by a parent
procedure and privatized in its child procedure. In this case,
symbol lookup won't find a symbol in the parent procedure, but
only in the module where the common block was defined.
Fixes https://github.com/llvm/llvm-project/issues/65028
This patch seeks to add an initial lowering for pointers and allocatable variables
captured by implicit and explicit map in Flang OpenMP for Target operations that
take map clauses e.g. Target, Target Update. Target Exit/Enter etc.
Currently this is done by treating the type that lowers to a descriptor
(allocatable/pointer/assumed shape) as a map of a record type (e.g. a structure) as that's
effectively what descriptor types lower to in LLVM-IR and what they're represented as
in the Fortran runtime (written in C/C++). The descriptor effectively lowers to a structure
containing scalar and array elements that represent various aspects of the underlying
data being mapped (lower bound, upper bound, extent being the main ones of interest
in most cases) and a pointer to the allocated data. In this current iteration of the mapping
we map the structure in it's entirety and then attach the underlying data pointer and map
the data to the device, this allows most of the required data to be resident on the device
for use. Currently we do not support the addendum (another block of pointer data), but
it shouldn't be too difficult to extend this to support it.
The MapInfoOp generation for descriptor types is primarily handled in an optimization
pass, where it expands BoxType (descriptor types) map captures into two maps, one for
the structure (scalar elements) and the other for the pointer data (base address) and
links them in a Parent <-> Child relationship. The later lowering processes will then treat
them as a conjoined structure with a pointer member map.
This patch removes the omp.target module attribute, since the
information it held on the target CPU and features is available through
the fir.target_cpu and fir.target_features module attributes. Target
outlining during the MLIR to LLVM IR translation stage is updated, so
that these attributes, at that point available as llvm.func attributes,
are passed along to the newly created function.
`getDataOperandBaseAddr` retrieve the address of a value when we need to
generate bound operations. When switching to HLFIR, we did not really
handle the fact that this value was then pointing to the result of a
hlfir.declare. Because of that the `#1` value was being used. `#0` value
is carrying the correct information about lowerbounds and should be
used. This patch updates the `getDataOperandBaseAddr` function to use
the correct result value from hlfir.declare.
OpenMP standard differentiates between omp simd (2.9.3.1) and omp do/for
simd (2.9.3.2 for OpenMP 5.0 standard) pragmas. The first one describes
the loop which needs to be vectorized. The second pragma describes the
loop which needs to be workshared between existing threads. Each thread
can use SIMD instructions to execute its chunk of the loop.
That's why we need to model
```
!$omp simd
do-loop
```
as `omp.simdloop` operation and add compiler hints for vectorization.
The worksharing loop:
!$omp do simd
do-loop
should be represented as worksharing loop (`omp.wsloop`).
Currently Flang denotes both types of OpenMP pragmas by `omp.simdloop`
operation. In consequence we cannot differentiate between:
```
!$omp parallel simd
do-loop
```
and
```
!$omp parallel do simd
do-loop
```
The second loop should be workshared between multiple threads. The first one describes the loop which needs to be redundantly executed by multiple threads. Current Flang implementation does not perform worksharing for `!$omp do simd` pragma and generates valid code only for the first case.
The `map` clause in OpenMP allows structure components to be specified
(unlike other clauses). Structure components do get their own symbols,
but these are not meant to be instantiated.
When a component reference is passed as an argument to the omp.target
op, it gets a corresponding parameter in the target op's entry block.
The original symbols are then bound to the same kind of an extended
value as before, but the value is now based on the parameters.
To handle structure components more gracefully, put their symbols on the
list of mapped objects, but skip them when creating extended values.
Fixes https://github.com/llvm/llvm-project/issues/79478.
When a default(none) clause exists and a threadprivate variable is used
inside the construct, the variable does not inherit threadprivate
behavior and throws the below error.
> error: The DEFAULT(NONE) clause requires that 'a' must be listed in a
data-sharing attribute clause
Added a condition to skip the error if it is a threadprivate variable.
Fixes: https://github.com/llvm/llvm-project/issues/49545
This brings `createBodyOfOp` to its final intended form. First, input
privatization is performed, then the recursive lowering takes place, and
finally the output privatization (lastprivate) is done.
This enables fixing a known issue with infinite loops inside of an
OpenMP region, and the fix is included in this patch.
Fixes https://github.com/llvm/llvm-project/issues/74348.
Recursive lowering [5/5]
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
If -nogpulib option is passed by the user, then the OpenMP device
runtime is not used and we should not emit globals to configure
debugging at compile-time for the device runtime.
Link to -nogpulib flag implementation for Clang:
https://reviews.llvm.org/D125314
The I/O runtime's API allows -1 to be passed for a unit number in a
READ, WRITE, or PRINT statement, where it gets replaced by 5 or 6 as
appropriate. This turns out to have been a bad idea, as it prevents the
I/O runtime from detecting and reporting a program's invalid attempt to
use -1 as an I/O unit number. So just pass 5 or 6 as appropriate.
Emits MLIR op corresponding to `!$omp target update` directive. So far,
only motion types: `to` and `from` are supported. Motion modifiers:
`present`, `mapper`, and `iterator` are not supported yet.
This is a follow up to #75047 & #75159, only the last commit is relevant
to this PR.
This patch changes the bounds generation code shared between OpenMP and OpenACC to always attach an extent to the bounds generation. This is
currently required for OpenMP descriptor lowering but may not
necessarily be required in the case of OpenACC.
This patch removes the val field from the `MapInfoOp`.
Previously when lowering `TargetOp`, the bounds information for the
`BoxValues` were also being mapped. Instead these ops are now cloned
inside the target region to prevent mapping of non reference typed
values.
Initialization of reduction variable for min-reduction is set to largest
negative value. As such, in presence of non-negative operands, min
reduction gives incorrect output. This patch initialises min-reduction
to use the maximum positive value instead, so that it can produce
correct output for the entire range of real valued operands.
Fixes https://github.com/llvm/llvm-project/issues/73101
This patch adds the following check from OpenMP 5.2.
```
If the directive has a clause, it must contain at least one enter clause
or at least one link clause.
```
Also added a warning for the deprication of `TO` clause on `DECLARE
TARGET` construct.
```
The clause-name to may be used as a synonym for the clause-name enter.
This use has been deprecated.
```
Based on the tests for to clause, the tests for enter clause are added.
This patch does not add tests where both to and enter clause are used together.
Hoist non-atomic expressions in atomic intrinsics. Use a list to collect
the non-atomic expressions since the max and min intrinsics can have
more than two operands.
Hoisting makes the lowering to LLVMIR of iand,ior,ieor intrinsics
trivial. For max and min this still results in multiple instructions in
the atomic region but the loads are removed, which should help improve
performance.