Emits MLIR op corresponding to `!$omp target update` directive. So far,
only motion types: `to` and `from` are supported. Motion modifiers:
`present`, `mapper`, and `iterator` are not supported yet.
This is a follow up to #75047 & #75159, only the last commit is relevant
to this PR.
C_FUNLOC was not handling procedure pointer argument correctly, the
issue lied in `hlfir::convertToBox` that did not handle procedure
pointers.
I modified the interface of `hlfir::convertToXXX` to take values on the
way because hlfir::Entity are fundamentally an mlir::Value with type
guarantees, so they should be dealt with by value as mlir::Value are
(they are very small).
This is a lot less complex than the data case where the shape has to be
accounted for, so the implementation is done inline.
One corner case will not be supported correctly for now: the case where
POINTER and TARGET points to the same internal procedure may return
false because lowering is creating fir.embox_proc each time the address
of an internal procedure is taken, so different thunk for the same
internal procedure/host link may be created and compare to false. This
will be fixed in a later patch that moves creating of internal procedure
fir.embox_proc in the host so that the addresses are the same when the
host link is the same. This change is required to properly support the
required lifetime of internal procedure addresses anyway (should be the
always be the lifetime of the host, even when the address is taken in an
internal procedure).
This patch fixes:
flang/lib/Optimizer/Transforms/StackArrays.cpp:452:7: error:
ignoring return value of function declared with 'nodiscard'
attribute [-Werror,-Wunused-result]
Re-land PR after being reverted because of buildbot failures.
This patch adds representation for `device_type` clause information on
compute construct (parallel, kernels, serial).
The `device_type` clause on compute construct impacts clauses that
appear after it. The values impacted by `device_type` are now tied with
an attribute array that represent the device_type associated with them.
`DeviceType::None` is used to represent the value produced by a clause
before any `device_type`. The operands and the attribute information are
parser/printed together.
This is an example with `vector_length` clause. The first value (64) is
not impacted by `device_type` so it will be represented with
DeviceType::None. None is not printed. The second value (128) is tied
with the `device_type(multicore)` clause.
```
!$acc parallel vector_length(64) device_type(multicore) vector_length(256)
```
```
acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) {
}
```
When multiple values can be produced for a single clause like
`num_gangs` and `wait`, an extra attribute describe the number of values
belonging to each `device_type`. Values and attributes are
parsed/printed together.
```
acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>])
```
While preparing this patch I noticed that the wait devnum is not part of
the operations and is not lowered. It will be added in a follow up
patch.
This patch adds representation for `device_type` clause information on
compute construct (parallel, kernels, serial).
The `device_type` clause on compute construct impacts clauses that
appear after it. The values impacted by `device_type` are now tied with
an attribute array that represent the device_type associated with them.
`DeviceType::None` is used to represent the value produced by a clause
before any `device_type`. The operands and the attribute information are
parser/printed together.
This is an example with `vector_length` clause. The first value (64) is
not impacted by `device_type` so it will be represented with
DeviceType::None. None is not printed. The second value (128) is tied
with the `device_type(multicore)` clause.
```
!$acc parallel vector_length(64) device_type(multicore) vector_length(256)
```
```
acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) {
}
```
When multiple values can be produced for a single clause like
`num_gangs` and `wait`, an extra attribute describe the number of values
belonging to each `device_type`. Values and attributes are
parsed/printed together.
```
acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>])
```
While preparing this patch I noticed that the wait devnum is not part of
the operations and is not lowered. It will be added in a follow up
patch.
The `acc` dialect operations now implement MemoryEffects interfaces in
the following ways:
- Data entry operations which may read host memory via `varPtr` are now
marked as so. The majority of them do NOT actually read the host memory.
For example, `acc.present` works on the basis of presence of pointer and
not necessarily what the data points to - so they are not marked as
reading the host memory. They still use `varPtr` though but this
dependency is reflected through ssa.
- Data clause operations which may mutate the data pointed to by
`accPtr` are marked as doing so.
- Data clause operations which update required structured or dynamic
runtime counters are marked as reading and writing the newly defined
`RuntimeCounters` resource. Some operations, like `acc.getdeviceptr` do
not actually use the runtime counters - but are marked as reading them
since the address obtained depends on the mapping operations which do
update the runtime counters. Namely, `acc.getdeviceptr` cannot be moved
across other mapping operations.
- Constructs are marked as writing to the `ConstructResource`. This may
be too strict but is needed for the following reasons: 1) Structured
constructs may not use `accPtr` and instead use `varPtr` - when this is
the case, data actions may be removed even when used. 2) Unstructured
constructs are currently used to aggregate multiple data actions. We do
not want such constructs removed or moved for now.
- Terminators are marked as `Pure` as in other dialects.
The current approach has the following limitations which may require
further improvements:
- Subsequent `acc.copyin` operations on same data do not actually read
host memory pointed to by `varPtr` but are still marked as so.
- Two `acc.delete` operations on same data may not mutate `accPtr` until
the runtime counters are zero (but are still marked as mutating).
- The `varPtrPtr` argument, when present, points to the address of
location of `varPtr`. When mapping to target device, an `accPtrPtr`
needs computed and this memory is mutated. This effect is not captured
since the current operations do not produce `accPtrPtr`.
- Runtime counter effects are imprecise since two operations with
differing `varPtr` increment/decrement different counters. Additionally,
operations with `varPtrPtr` mutate attachment counters.
- The `ConstructResource` is too strict and likely can be relaxed with
better modeling.
This makes an adjustment to the existing fir minloc/maxloc generation
code to handle functions with a dim=1 that produce a scalar result. This
should allow us to get the same benefits as the existing generated
minmax reductions.
This commit adds extra assertions to `OperationFolder` and `OpBuilder`
to ensure that the types of the folded SSA values match with the result
types of the op. There used to be checks that discard the folded results
if the types do not match. This commit makes these checks stricter and
turns them into assertions.
Discarding folded results with the wrong type (without failing
explicitly) can hide bugs in op folders. Two such bugs became apparent
in MLIR (and some more in downstream projects) and are fixed with this
change.
Note: The existing type checks were introduced in
https://reviews.llvm.org/D95991.
Migration guide: If you see failing assertions (`folder produced value
of incorrect type`; make sure to run with assertions enabled!), run with
`-debug` or dump the operation right before the failing assertion. This
will point you to the op that has the broken folder. A common mistake is
a mismatch between static/dynamic dimensions (e.g., input has a static
dimension but folded result has a dynamic dimension).
Lower procedure pointer components, except in the context of structure
constructor (left TODO).
Procedure pointer components lowering share most of the lowering logic
of procedure poionters with the following particularities:
- They are components, so an hlfir.designate must be generated to
retrieve the procedure pointer address from its derived type base.
- They may have a PASS argument. While there is no dispatching as with
type bound procedure, special care must be taken to retrieve the derived
type component base in this case since semantics placed it in the
argument list and not in the evaluate::ProcedureDesignator.
These components also bring a new level of recursive MLIR types since a
fir.type may now contain a component with an MLIR function type where
one of the argument is the fir.type itself. This required moving the
"derived type in construction" stackto the converter so that the object
and function type lowering utilities share the same state (currently the
function type utilty would end-up creating a new stack when lowering its
arguments, leading to infinite loops). The BoxedProcedurePass also
needed an update to deal with this recursive aspect.
Lowering was instantiating component symbols (but the last) in initial
target designator as if they were whole objects, leading to collisions
and bugs.
Fixes https://github.com/llvm/llvm-project/issues/75728
Implement the C struct passing ABI on X86-64 for the trivial case where
the structs have one element. This is required to cover some cases of
BIND(C) derived type pass with the VALUE attribute.
This takes the code from D144103 and extends it to maxloc, to allow the
simplifyMinMaxlocReduction method to work with both min and max
intrinsics by switching condition and limit/initial value.
Make sure we only load box and read its bounds when it is present.
- Add `AddrAndBoundInfo` struct to be able to carry around the `addr`
and `isPresent` values. This is likely to grow so we can make all the
access in a single `fir.if` operation.
Branching to an endif statement from outside of the if is nonconformant:
subroutine jump(n)
goto 6
if (n == 3) then
goto 7
6 end if
print *, 'pass'
return
7 print *, 'fail'
end
However, this branch was permitted up to f90. Account for this usage
when rewriting if constructs and if statements by suppressing rewriting
if the end statement is labeled.
The function `instantiateVariable` in Bridge.cpp has the following code:
```
if (var.getSymbol().test(
Fortran::semantics::Symbol::Flag::OmpThreadprivate))
Fortran::lower::genThreadprivateOp(*this, var);
if (var.getSymbol().test(
Fortran::semantics::Symbol::Flag::OmpDeclareTarget))
Fortran::lower::genDeclareTargetIntGlobal(*this, var);
```
Implement `handleOpenMPSymbolProperties` in OpenMP.cpp, move the above
code there, and have `instantiateVariable` call this function instead.
This would further separate OpenMP-related details into OpenMP.cpp.
There was some discussion on discourse[1] about allowing call to FIR
generation functions from other part of lowering belonging to OpenMP.
This solution exposes a simple `genEval` member function on the
`AbstractConverter` so that IR generation for PFT Evaluation objects can
be called from lowering outside of the FirConverter but not exposing it.
[1] https://discourse.llvm.org/t/openmp-lowering-from-pft-to-fir/75263
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
The adds a hlfir minloc intrinsic, similar to the minval intrinsic
already added, to help in the lowering of minloc. The idea is to later
add maxloc too, and from there add a simplification for producing minloc
with inlined elemental and hopefully less temporaries.
VALUE derived type are passed by reference outside of BIND(C) interface.
The ABI is much simpler and it is possible for these arguments to have
the OPTIONAL attribute.
In the BIND(C) context, these arguments must follow the C ABI for
struct, which may lead the data to be passed in register. OPTIONAL is
also forbidden for those arguments, so it is safe to directly use the
fir.type<T> type for the func.func argument.
Codegen is in charge of later applying the C passing ABI according to
the target (https://github.com/llvm/llvm-project/pull/74829).
In the context of C/Fortran interoperability (BIND(C)), it is possible
to give the VALUE attribute to a BIND(C) derived type dummy, which
according to Fortran 2018 18.3.6 - 2. (4) implies that it must be passed
like the equivalent C structure value. The way C structure value are
passed is ABI dependent.
LLVM does not implement the C struct ABI passing for LLVM aggregate type
arguments. It is up to the front-end, like clang is doing, to split the
struct into registers or pass the struct on the stack (llvm "byval") as
required by the target ABI.
So the logic for C struct passing sits in clang. Using it from flang
requires setting up a lot of clang context and to bridge FIR/MLIR
representation to clang AST representation for function signatures (in
both directions). It is a non trivial task.
See
https://stackoverflow.com/questions/39438033/passing-structs-by-value-in-llvm-ir/75002581#75002581.
Since BIND(C) struct are rather limited as opposed to generic C struct
(e.g. no bit fields). It is easier to provide a limited implementation
of it for the case that matter to Fortran.
This patch:
- Updates the generic target rewrite pass to keep track of both the new
argument type and attributes. The motivation for this is to be able to
tell if a previously marshalled argument is passed in memory (it is a C
pointer), or if it is being passed on the stack (has the byval llvm
attributes).
- Adds an entry point in the target specific codegen to marshal struct
arguments, and use it in the generic target rewrite pass.
- Implements limited support for the X86-64 case. So far, the support
allows telling if a struct must be passed in register or on the stack,
and to deal with the stack case. The register case is left TODO in this
patch.
The X86-64 ABI implemented is the System V ABI for AMD64 version 1.0
For a character literal that is split over more than one source line
with free form line continuation using '&'
at the end of one line but missing the standard-required '&' on the
continuation line, also handle the case of spaces at the beginning of
the continuation line.
For example,
PRINT *, 'don'&
't poke the bear'
now prints "don't poke the bear", like nearly all other Fortran
compilers do.
This is not strictly standard conforming behavior, and the compiler
emits a portability warning with -pedantic.
Fixes llvm-test-suite/Fortran/gfortran/regression/continuation_1.f90,
.../continuation_12.f90, and .../continuation_13.f90.
An INTENT(IN) attribute on a pointer dummy argument prevents
modification of the pointer itself only, not modification of any
component of its target. Fix this case without breaking definability
checking for pointer components of non-pointer INTENT(IN) dummy
arguments.
We have other targets with scalable vectors (e.g.RISC-V), and there
doesn't seem to be any particular reason these options can't be used on
those targets.
Types of AMDGPU address space were defined not only in Clang-specific class
but also in LLVM header.
If we unify the AMD GPU address space enumeration, then we can reuse it in
Clang, Flang and LLVM.
Changed semantic check from giving error to giving a warning about
deprecation from OpenMP 5.2 and later about checks for dummy argument
list-items present on IS_DEVICE_PTR clause.
This P is blocker for
https://github.com/llvm/llvm-project/pull/71255
When assigning to a whole allocatable, lowering is dealing with the
implicit conversion to preserve the RHS lower bounds. In case of
character KIND mismatch, the code was setting the new RHS length to the
one from the LHS, which is wrong for two reasons:
- no padding/truncation was actually done in the conversion
- the RHS length should anyway not be touched since the one from the
allocatable LHS may change to become the one of the RHS.
Update the code to preserve the RHS type length when materializing the
implicit character KIND conversion.
`nsw` is a flag for LLVM arithmetic operations meaning "no signed wrap".
If this keyword is present, the result of the operation is a poison
value if overflow occurs. Adding this keyword permits LLVM to re-order
integer arithmetic more aggressively.
In
https://discourse.llvm.org/t/rfc-changes-to-fircg-xarray-coor-codegen-to-allow-better-hoisting/75257/16
@vzakhari observed that adding nsw is useful to enable hoisting of
address calculations after some loops (or is at least a step in that
direction).
Classic flang also adds nsw to address calculations.
The specific terminator operation depends on what operation it is inside
of. The function `genOpenMPTerminator` performs these checks and selects
the appropriate type of terminator.
Remove partial duplication of that code, and replace it with a function
call. This makes `genOpenMPTerminator` be the sole source of OpenMP
terminators.
Replace explicit calls to
```
op = builder.create<SectionOp>(...)
createBodyOfOp<SectionOp>(op, ...)
```
with a single call to
```
createOpWithBody<SectionOp>(...)
```
This is NFC, that's what the `createOpWithBody` function does.
The function `genCommonBlockMember` is not specific to OpenMP, and it
could very well be a common utility. Move it to ConvertVariable.cpp
where it logically belongs.
Preliminary patch to change lowering/code generation to use
llvm::DataLayout information instead of generating "sizeof" GEP (see
https://github.com/llvm/llvm-project/issues/71507).
Fortran Semantic analysis needs to know about the target type size and
alignment to deal with common blocks, and intrinsics like
C_SIZEOF/TRANSFER. This information should be obtained from the
llvm::DataLayout so that it is consistent during the whole compilation
flow.
This change is changing flang-new and bbc drivers to:
1. Create the llvm::TargetMachine so that the data layout of the target
can be obtained before semantics.
2. Sharing bbc/flang-new set-up of the
SemanticConstext.targetCharateristics from the llvm::TargetMachine. For
now, the actual part that set-up the Fortran type size and alignment
from the llvm::DataLayout is left TODO so that this change is mostly an
NFC impacting the drivers.
3. Let the lowering bridge set-up the mlir::Module datalayout attributes
since it is doing it for the target attribute, and that allows the llvm
data layout information to be available during lowering.
For flang-new, the changes are code shuffling: the `llvm::TargetMachine`
instance is moved to `CompilerInvocation` class so that it can be used
to set-up the semantic contexts. `setMLIRDataLayout` is moved to
`flang/Optimizer/Support/DataLayout.h` (it will need to be used from
codegen pass for fir-opt target independent testing.)), and the code
setting-up semantics targetCharacteristics is moved to
`Tools/TargetSetup.h` so that it can be shared with bbc.
As a consequence, LLVM targets must be registered when running
semantics, and it is not possible to run semantics for a target that is
not registered with the -triple option (hence the power pc specific
modules can only be built if the PowerPC target is available.