This pass marks functions called from TargetOps, and from
declare target functions, as implicitly declare target
by adding the MLIR declare target attribute
directly to the function.
This pass executes after the initial lowering of Fortran's PFT
to MLIR (FIR/OMP+Arith etc.) and is one of a series of passes
that aim to clean up the MLIR for offloading (separate passes
in different patches, one for early outlining, another for declare
target function filtering).
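The marking step can be pictured as a reachability walk. The sketch below is not the pass itself: it models the call graph as a plain map (the `CallGraph` alias and `markImplicitDeclareTarget` are hypothetical names), and it assumes that marking propagates transitively through callees, which the description above does not state explicitly.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns every function that would carry the declare target attribute:
// the roots (explicit declare target functions and functions called from a
// target region) plus, by assumption, everything reachable from them.
std::set<std::string>
markImplicitDeclareTarget(const CallGraph &calls,
                          const std::vector<std::string> &roots) {
  std::set<std::string> marked;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string fn = worklist.back();
    worklist.pop_back();
    if (!marked.insert(fn).second)
      continue; // already marked, do not revisit
    auto it = calls.find(fn);
    if (it == calls.end())
      continue; // no known callees
    for (const std::string &callee : it->second)
      worklist.push_back(callee);
  }
  return marked;
}
```

Functions that are never reached from a root stay unmarked, so a later filtering pass can still tell them apart.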
Reviewers: jsjodin, skatrak, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D154247
This patch adds support for selecting which functions are lowered to LLVM IR
from MLIR depending on declare target information and whether host or device
code is being generated.
The approach proposed by this patch is to perform the filtering in two stages:
- An MLIR transformation pass, which is added to the Flang translation flow
after the `OMPEarlyOutliningPass`. The functions that are kept are those
that match the OpenMP processor (host or device) the compiler invocation
is targeting, according to the presence of the `-fopenmp-is-target-device`
compiler option and declare target information. All functions containing an
`omp.target` are also kept, regardless of the declare target information of
the function, due to the need for keeping target regions visible for both
host and device compilation.
- A filtering step during translation to LLVM IR, which is performed for those
functions that were kept because of the presence of a target region inside.
If the targeted OpenMP processor does not match the declare target
information of the function, then it is removed from the LLVM IR after its
contents have been processed and translated. Since they should only contain
an omp.target operation which, in turn, should have been outlined into
another LLVM IR function, the wrapper can be deleted at that point.
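The two stages can be summarized as a pair of predicates. This is only a sketch under stated assumptions: `DeviceType`, `FuncInfo` and the three functions are hypothetical names, and the matching rule (functions without declare target information belong to the host only) is an assumption made for illustration rather than something taken from the patch.

```cpp
#include <cassert>

// Hypothetical model of a function's declare target information.
enum class DeviceType { None, Host, NoHost, Any };

struct FuncInfo {
  DeviceType declareTarget; // None: no declare target information
  bool hasTargetRegion;     // the function contains an omp.target
};

// True if the function belongs on the processor being compiled for.
// Assumption: functions without declare target information are host-only.
bool matchesProcessor(const FuncInfo &f, bool isTargetDevice) {
  if (isTargetDevice)
    return f.declareTarget == DeviceType::NoHost ||
           f.declareTarget == DeviceType::Any;
  return f.declareTarget != DeviceType::NoHost;
}

// Stage 1 (MLIR pass): keep matching functions, plus every function that
// contains a target region.
bool keepInMLIR(const FuncInfo &f, bool isTargetDevice) {
  return f.hasTargetRegion || matchesProcessor(f, isTargetDevice);
}

// Stage 2 (during translation to LLVM IR): a function kept only because of
// its target region is removed once its contents have been translated.
bool removeAfterTranslation(const FuncInfo &f, bool isTargetDevice) {
  bool remove =
      keepInMLIR(f, isTargetDevice) && !matchesProcessor(f, isTargetDevice);
  assert(!remove || f.hasTargetRegion);
  return remove;
}
```

For a host function with a target region compiled for the device, `keepInMLIR` and `removeAfterTranslation` both return true: the wrapper survives the MLIR pass and is deleted after its target region has been outlined.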
Depends on D150328 and D150329.
Differential Revision: https://reviews.llvm.org/D147641
This is an attempt at mimicking the method in which
threadprivate handles the following type of variables:
  program main
    integer :: i
    !$omp declare target to(i)
  end
Which essentially generates a GlobalOp for the variable (which
would normally only be an alloca) when it's instantiated. The
main difference is there is no operation generated within the
function, instead the declare target attribute is appended
later within handleDeclareTarget.
Reviewers: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D152037
The problem appeared as a segfault for a case like this:
```
type t
character(11), allocatable :: c
end type
character(12), allocatable :: x
type(t) y
y = t(x)
```
The frontend represents `y = t(x)` as `y=t(c=%SET_LENGTH(x,11_8))`.
When 'x' is unallocated the hlfir.set_length lowering results in
a segfault. It could probably be handled in hlfir.set_length lowering
by using NULL base for the hlfir.declare depending on the allocation
status of 'x', but I am not sure if !hlfir.expr, in general, is supposed
to represent an expression created from unallocated allocatable.
I believe in Fortran that would mean referencing an unallocated
allocatable, which is not allowed.
I decided to special-case `SET_LENGTH` in the structure constructor,
so that we use its 'x' operand as the RHS for the assign operation,
which implies the isAllocatable check for cases when 'x' is allocatable.
This requires setting the keep_lhs_length_if_realloc flag for the assign
operation. Note that when the component being initialized has
deferred length, the frontend does not produce `SET_LENGTH`.
Differential Revision: https://reviews.llvm.org/D155151
There was a bug with the -funderscoring / -fno-underscoring options (from https://reviews.llvm.org/D140795) that prevented the driver option from controlling the underscoring behaviour: it could only be controlled through the pass option. The driver test case did not catch the bug and also needed to be updated.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D155042
This patch implements an early outlining transform of omp.target operations in
flang. The pass is needed because optimizations may cross target op region
boundaries, but with the outlining the resulting functions only contain a
single omp.target op plus a func.return, so there should not be any opportunity
to optimize across region boundaries.
The patch also adds an interface to be able to store and retrieve the parent
function name of the original target operation. This is needed to be able to
create correct kernel function names when lowering to LLVM-IR.
Reviewed By: kiranchandramohan, domada
Differential Revision: https://reviews.llvm.org/D154879
Currently the local builder used in IntrinsicCall doesn't have the
fastmath flags passed to it. This results in the fastmath attribute
not being added to certain runtime calls. This patch simply forwards
the fastmath flags from the parent builder.
Differential Revision: https://reviews.llvm.org/D154611
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
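The relationship between the two renamed flags can be sketched as follows. `ToyOffloadConfig`, `isGPUTriple` and `makeConfig` are hypothetical stand-ins, not the `OpenMPIRBuilderConfig` API, and the prefix check on the triple string is a simplification assumed for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two renamed flags.
struct ToyOffloadConfig {
  bool isTargetDevice = false; // set by -fopenmp-is-target-device
  bool isGPU = false;          // only meaningful when isTargetDevice is set
};

// Assumption: a GPU triple starts with "amdgcn" or "nvptx".
bool isGPUTriple(const std::string &triple) {
  return triple.rfind("amdgcn", 0) == 0 || triple.rfind("nvptx", 0) == 0;
}

ToyOffloadConfig makeConfig(const std::string &triple,
                            bool isTargetDeviceFlag) {
  ToyOffloadConfig config;
  config.isTargetDevice = isTargetDeviceFlag;
  // isGPU is never set for a host invocation, even with a GPU triple.
  config.isGPU = isTargetDeviceFlag && isGPUTriple(triple);
  assert(!config.isGPU || config.isTargetDevice);
  return config;
}
```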
Differential Revision: https://reviews.llvm.org/D154591
We will use hlfir.get_length to lower inquiries of char length
applied to hlfir.expr character values.
Reviewed By: tblah, jeanPerier
Differential Revision: https://reviews.llvm.org/D154560
The wrong error was reported, stating that the common block was in
more than one data sharing clause.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D154393
The old code had overgrown itself and become difficult to read and
modify. I've rewritten it and moved it into its own translation unit.
I moved PreparedActualArgument to the header file for the
transformational intrinsic lowering. Logically, it belongs in
ConvertCall.h, but putting it there would create a circular dependency
between HlfirIntrinsics and ConvertCall.
Differential Revision: https://reviews.llvm.org/D154235
This patch lowers allocatables and pointers named in the "private" OpenMP clause.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D148570
The soon-to-be-published next revision of the ISO Fortran language standard
contains a couple of breaking changes to previous specifications that may cause
existing programs to silently change their behavior.
For the change that introduces automatic reallocation of deferred length
allocatable character scalar variables when they appear as the targets
of internal WRITE statements, as IOMSG=/ERRMSG= variables, as outputs
of INQUIRE specifiers, or as INTENT(OUT) arguments to intrinsic
procedures, this patch adds an optional portability warning.
Differential Revision: https://reviews.llvm.org/D154242
When the analysis of hlfir.region_assign determined that the LHS region
evaluation may be impacted by the assignment effects, all LHS must be
fully evaluated and saved before any assignment is done.
This patch adds TemporaryStorage variants to save addresses, including
vector subscripted entity addresses whose shape must be saved.
It uses the DescriptorStack runtime to deal with complex cases inside
forall. For the sake of simplicity, this is also used for vector
subscripted LHS outside of foralls: each element address is saved as
a descriptor on this stack. This is a bit suboptimal, but it is a safe
start that will work with all kinds of types (polymorphic, PDTs...)
without further work. Another approach would be to save only the
values that are conflicting in the LHS computation, but this would
require a much more complex analysis of the LHS region DAG.
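To see why the LHS addresses have to be saved first, consider an assignment of the form x(x(i)) = i, where computing the LHS address reads the array being assigned. The sketch below uses plain C++ vectors with 0-based indices; the function names are hypothetical and it only illustrates the ordering issue, not the TemporaryStorage or DescriptorStack implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleaves LHS evaluation and assignment: later iterations read indices
// that earlier iterations have already overwritten.
std::vector<int> assignNaive(std::vector<int> x) {
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert(x[i] >= 0 && static_cast<std::size_t>(x[i]) < x.size());
    x[x[i]] = static_cast<int>(i);
  }
  return x;
}

// Evaluates and saves every LHS address before performing any assignment.
std::vector<int> assignWithSavedLhs(std::vector<int> x) {
  std::vector<int *> saved;
  saved.reserve(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert(x[i] >= 0 && static_cast<std::size_t>(x[i]) < x.size());
    saved.push_back(&x[x[i]]);
  }
  for (std::size_t i = 0; i < saved.size(); ++i)
    *saved[i] = static_cast<int>(i);
  return x;
}
```

Starting from {1, 2, 0}, the saved-address version yields {2, 0, 1}, while the interleaved version yields {2, 0, 0} because the index read at the second and third iterations was already modified.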
Differential Revision: https://reviews.llvm.org/D154057
This is an assisting patch that addresses a review comment from https://reviews.llvm.org/D142722 asking to switch std::list<Name> to OmpObjectList.
It also adds a semantic check for https://github.com/llvm/llvm-project/issues/61161: the OpenMP 5.2 standard states that only pointer variables (C_PTR, Cray pointers, POINTER or ALLOCATABLE items) can appear in a SIMD aligned clause (section 5.11). Common block names are also not allowed on an ALIGNED clause.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D152637
This patch adds 'unordered' attribute handling to the HLFIR elemental
builders and fixes the attribute handling in lowering and transformations.
Depends on D154031, D154032
Reviewed By: jeanPerier, tblah
Differential Revision: https://reviews.llvm.org/D154035
By default, `hlfir.elemental` and `hlfir.elemental_addr` must process
the elements in order. The `unordered` attribute may be set,
if it is safe to process the elements out of order.
This patch just adds parsing support for the new attribute.
Reviewed By: jeanPerier, tblah
Differential Revision: https://reviews.llvm.org/D154032
This patch sets `unordered` `fir.do_loop` attribute during lowering
of elemental subroutine calls to HLFIR, when it is safe to do so.
Proper handling of `hlfir.elemental` will be done in a separate patch.
Reviewed By: jeanPerier, tblah
Differential Revision: https://reviews.llvm.org/D154031
Extend the SourceFile class to take account of #line directives
when computing source file positions for error messages.
Adjust the #line directives emitted in -E output so that they
reflect any #line directives that were in the input.
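The position computation can be sketched as below. This is not the SourceFile implementation: `LineDirective`, `Position` and `logicalPosition` are hypothetical names, and the sketch assumes the usual convention that `#line N "file"` makes the line following the directive line N of `file`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one "#line N "file"" directive in the input.
struct LineDirective {
  int physicalLine; // 1-based input line on which the directive appears
  int newLine;      // number given to the line that follows the directive
  std::string file; // file name given by the directive
};

struct Position {
  std::string file;
  int line;
};

// Maps a physical input line to the position an error message should show.
// `directives` must be sorted by physicalLine.
Position logicalPosition(const std::string &inputFile, int physicalLine,
                         const std::vector<LineDirective> &directives) {
  assert(physicalLine >= 1);
  Position pos{inputFile, physicalLine};
  for (const LineDirective &d : directives) {
    if (d.physicalLine >= physicalLine)
      break; // this directive and all later ones come after the line
    pos.file = d.file;
    pos.line = d.newLine + (physicalLine - d.physicalLine - 1);
  }
  return pos;
}
```

With a single directive `#line 100 "gen.f90"` on physical line 3 of `a.f90`, physical line 2 still maps to `a.f90:2`, while physical lines 4 and 6 map to `gen.f90:100` and `gen.f90:102`.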
Differential Revision: https://reviews.llvm.org/D153910
Some symbols were not resolved in the device, host and self clause
resulting in an `Internal: no symbol found` error.
This patch adds symbol resolution for these clauses.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D153919
In WHERE and masked FORALL assignment, both the mask and the
RHS may need to be saved in some temporary storage before evaluating
the assignment.
The code was trying to "optimize" that case when evaluating the RHS
by not fetching the mask temporary that was just created, but in simple
cases of WHERE construct where the evaluated mask is an hlfir.expr,
this caused the hlfir.expr to be both used in an hlfir.associate and
later in an hlfir.apply to create the fir.if to mask the RHS evaluation.
This double usage prevents codegen from inlining the hlfir.expr at the
hlfir.apply, and from "moving" the hlfir.expr storage into the temp
during hlfir.associate bufferization. So this is pessimizing the code:
it would lead to creating two mask array temporary storages.
This was caught by the unexpectedly high number of "not yet implemented:
hlfir.associate of hlfir.expr with more than one use" that were firing.
Use the mask temporary instead (the hlfir.associate result) when possible.
Some temporaries (the "inlined stack") do not support fetching and pushing
in the same run (a single counter is used to keep track of the fetching
and pushing position). Add a canBeFetchedAfterPush() for safety,
but this limitation is anyway not relevant for hlfir.expr since the
inlined stack is only used to save "trivial" scalars.
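The single-counter limitation can be illustrated with a toy container. `SingleCounterStack` is a hypothetical stand-in for the "inlined stack", not the actual lowering code; only the `canBeFetchedAfterPush()` name comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical temporary with preallocated slots and ONE counter shared by
// pushes and fetches.
class SingleCounterStack {
public:
  explicit SingleCounterStack(std::size_t capacity) : slots(capacity, 0) {}

  void push(int value) {
    assert(counter < slots.size());
    slots[counter++] = value;
  }

  int fetch() {
    assert(counter < slots.size());
    return slots[counter++];
  }

  // Must be called between the pushing run and the fetching run.
  void resetFetchPosition() { counter = 0; }

  // A fetch issued right after a push would read the NEXT slot, not the
  // value that was just pushed.
  static bool canBeFetchedAfterPush() { return false; }

private:
  std::vector<int> slots;
  std::size_t counter = 0;
};
```

Pushing all values, resetting, and then fetching returns them in order. Fetching immediately after a push advances past the value that was just stored, which is the situation the new query lets callers avoid.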
Also update the temporary storage name to only indicate "forall" if
the top level construct is a FORALL. This is not a very precise name,
but it should at least give a correct context to indicate in the IR
why some temporary array storage was created.
Differential Revision: https://reviews.llvm.org/D153880
These are initial changes to experiment with building the Fortran runtime
as a CUDA or OpenMP target offload library.
The initial patch defines a set of macros that have to be used consistently
in Flang runtime source code so that it can be built for different
offload devices using different programming models (CUDA, HIP, OpenMP target
offload). Currently supported modes are:
* CUDA: Flang runtime may be built as a fatlib for the host and a set
of CUDA architectures specified during the build. The packaging
of the device code is done by the CUDA toolchain and may differ
from toolchain to toolchain.
* OpenMP offload:
- host_device mode: Flang runtime may be built as a fatlib for the host
and a set of OpenMP offload architectures. The packaging
of the device code is done by the OpenMP offload compiler and may differ
from compiler to compiler.
OpenMP offload 'nohost' mode is a TODO to match the build setup
of libomptarget/DeviceRTL. Flang runtime will be built as an LLVM bitcode
library using the Clang/LLVM toolchain. The host part of the library
will be "empty", so there will be two distributable objects: the host
Flang runtime and a dummy host library with device Flang runtime pieces
packaged using clang-offload-packager and clang.
In all supported modes, enabling parts of Flang runtime for the device
compilation can be done iteratively to make the patches observable.
Note that at any point in time the resulting library may have unresolved
references to not yet enabled parts of Flang runtime.
Example cmake/make commands for building with Clang for NVPTX target:
  cmake \
    -DFLANG_EXPERIMENTAL_CUDA_RUNTIME=ON \
    -DCMAKE_CUDA_ARCHITECTURES=80 \
    -DCMAKE_C_COMPILER=/clang_nvptx/bin/clang \
    -DCMAKE_CXX_COMPILER=/clang_nvptx/bin/clang++ \
    -DCMAKE_CUDA_COMPILER=/clang_nvptx/bin/clang \
    /llvm-project/flang/runtime/
  make -j FortranRuntime
Example cmake/make commands for building with Clang OpenMP offload:
  cmake \
    -DFLANG_EXPERIMENTAL_OMP_OFFLOAD_BUILD="host_device" \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DFLANG_OMP_DEVICE_ARCHITECTURES="sm_80" \
    ../flang/runtime/
  make -j FortranRuntime
Differential Revision: https://reviews.llvm.org/D151173
This patch adds support for vector subscripted assignment left-hand
side. It does not yet add support for the cases where the LHS must be
saved because its evaluation could be impacted by the assignment.
The implementation adds an hlfir::ElementalOpInterface to share the
elemental inlining utility and some other tools between
hlfir::ElementalOp and hlfir::ElementalAddrOp.
It adds generateYieldedLHS() to allow retrieving the LHS value
in lowering, whether or not it is vector subscripted. If it is vector
subscripted, this utility creates a loop nest iterating over the
elements and returns the address of an element.
Differential Revision: https://reviews.llvm.org/D153759
When `AssignOp` is used with an LHS that is a compiler-generated temporary,
special care must be taken to initialize the temporary and avoid
finalizations of its components. This change-set adds an optional
`temporary_lhs` attribute for `AssignOp` to convey this information
to the HLFIR-to-FIR conversion pass. Currently, this results in
calling the `AssignTemporary` runtime for doing the assignment.
Reviewed By: jeanPerier, tblah
Differential Revision: https://reviews.llvm.org/D152482
This patch adds an hlfir operation called `char_extremum`, which performs a
lexicographic comparison over a variadic number (minimum of 2) of character
arguments.
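The intended semantics can be sketched on plain strings. `charExtremum` and `padTo` are hypothetical helpers, not the operation's lowering, and the sketch assumes Fortran-style rules for character MIN/MAX: shorter operands compare as if padded with blanks, and the result has the length of the longest argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Pads `s` with trailing blanks up to `len` characters.
static std::string padTo(std::string s, std::size_t len) {
  if (s.size() < len)
    s.append(len - s.size(), ' ');
  return s;
}

// Returns the lexicographic minimum (wantMin) or maximum of the arguments.
// Assumption: operands are blank-padded to the longest length before the
// comparison, and the result keeps that padded length.
std::string charExtremum(bool wantMin, const std::vector<std::string> &args) {
  assert(args.size() >= 2 && "char_extremum needs at least two arguments");
  std::size_t len = 0;
  for (const std::string &a : args)
    len = std::max(len, a.size());
  std::string result = padTo(args[0], len);
  for (std::size_t i = 1; i < args.size(); ++i) {
    std::string candidate = padTo(args[i], len);
    if (wantMin ? candidate < result : candidate > result)
      result = candidate;
  }
  return result;
}
```

For the arguments "pear", "apple" and "fig", the minimum is "apple" and the maximum is "pear" followed by one blank, since every operand is padded to five characters.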
Discussion for this work can be found in the draft revision
[here](https://reviews.llvm.org/D143326). The reason I'm not promoting that draft to
a true patch for review is that I needed to separate out the op
definition/codegen and lowering as two separate patches, as preferred by
@jeanPerier.
Differential Revision: https://reviews.llvm.org/D152474
Lower user defined assignment inside the hlfir.region_assign
"userDefinedAssignment" mlir region.
This is done by adding an entry point to ConvertCall.h in order
to call genUserCall with the region block arguments as arguments.
The codegen for hlfir.region_assign with user defined assignment
will be added in a later patch.
Differential Revision: https://reviews.llvm.org/D153404
When a LEN type parameter of one PDT is being used as the value
of a LEN type parameter in another PDT, expression rewriting can
loop infinitely due to an incorrect assumption that the same PDT's
parameters are being referenced.
Fixes LLVM bug https://github.com/llvm/llvm-project/issues/63198
Differential Revision: https://reviews.llvm.org/D153465
Move the `DenseMapInfo<std::variant<Ts...>>` specialization out of
`llvm/ADT/DenseMapInfo.h` into a separate header,
`llvm/ADT/DenseMapInfoVariant.h`.
This allows us to remove the `<variant>` include, which is being
transitively and unnecessarily included in all translation units that
include `llvm/ADT/DenseMap.h`.
There have been similar changes to move out specializations for
* `APInt.h` fd7e309e02 and
* `StringRef.h`/`ArrayRef.h` 983565a6fe
to reduce the compilation time. As we are unable to move the
specialization into `<variant>`, we create a separate
`DenseMapInfoVariant.h` header that can be used by anyone who needs this
specialization.
This reduces the total number of preprocessing tokens across the LLVM
source files in lib from (roughly) 1,964,876,961 to 1,936,551,496 - a
reduction of ~1.44%. This should result in a small improvement in
compilation time.
Differential Revision: https://reviews.llvm.org/D150997
It seems just replacing the operation was not replacing all of the uses
when the types of the expression before and after this pass differ (due
to differing shape information). Now the shape information is always
kept the same.
This fixes https://github.com/llvm/llvm-project/issues/63399
Differential Revision: https://reviews.llvm.org/D153333
Adds a new HLFIR operation for the COUNT intrinsic according to
the design set out in flang/docs/HighLevel.md. This patch includes all
the necessary changes to create a new HLFIR operation and lower it into
the fir runtime call.
Author was @jacob-crawley. Minor adjustments by @tblah
Differential Revision: https://reviews.llvm.org/D152521
Codegen only supports conversions between logicals and integers. The
verifier should reflect this.
Differential Revision: https://reviews.llvm.org/D152935
This patch moves PPC intrinsic generator code to PPCIntrinsicCall.cpp. In order to move PowerPC intrinsic code out of IntrinsicCall.cpp, we also need to move some declarations to IntrinsicCall.h. handlers[] and mathOperations[] were also chosen to be moved to the IntrinsicCall header. Similarly, ppcHandlers[] and ppcMathOperations[] were moved to the PPCIntrinsicCall header. Upcoming patches will introduce many new PPC intrinsics; these will now be defined in PPCIntrinsicCall.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D152460
[flang][OpenMP][OpenACC] Support stop statement in OpenMP/OpenACC region
This supports lowering of stop statement in OpenMP/OpenACC region.
* OpenMP/OpenACC: Emit `fir.unreachable` only if the block is not
terminated by any terminator. This avoids knocking off an existing
OpenMP/OpenACC terminator.
* OpenMP: Emit the OpenMP terminator instead of `fir.unreachable` since
OpenMP regions can only be terminated by OpenMP terminators. This is
currently skipped for OpenACC since unstructured code is not yet
handled specially in OpenACC lowering.
Fixes #60737
Fixes #61877
Co-authored-by: Kiran Chandramohan <kiranchandramohan@gmail.com>
Co-authored-by: Val Donaldson <vdonaldson@nvidia.com>
Reviewed By: vdonaldson, peixin
Differential Revision: https://reviews.llvm.org/D129969
Add lowering support for the min operator
in reduction clause.
Depends on D151565
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D151671
Add parsing support for dim in gang clause
Depends on D151971
Reviewed By: razvanlupusoru, jeanPerier
Differential Revision: https://reviews.llvm.org/D151972
Use the new firstprivate representation on the compute construct.
Reviewed By: razvanlupusoru, jeanPerier
Differential Revision: https://reviews.llvm.org/D151975
This patch adds parser support for the force modifier on the collapse clause
introduced in OpenACC 3.3.
Lowering will currently hit a TODO as the MLIR representation of the acc.loop
might need some update.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D151974
Promised interfaces allow for a dialect to "promise" the implementation of an interface, i.e.
declare that it supports an interface, but have the interface defined in an extension in a library
separate from the dialect itself. A promised interface is powerful in that it alerts the user when
the interface is attempted to be used (e.g. via cast/dyn_cast/etc.) and the implementation has
not yet been provided. This makes the system much more robust against misconfiguration,
and ensures that we do not lose the benefit we currently have of defining the interface in
the dialect library.
Differential Revision: https://reviews.llvm.org/D120368
In a future patch we plan on introducing a large set of PowerPC-specific intrinsics. During our prototyping we found that the number of function type generators we were defining, plus those already defined, was reaching an unreasonable number. This patch introduces a generic function type generator that can be used for almost all cases. The generator supports creating function types with up to 4 arguments, with argument and return types of void, integer, real, and complex. The intention is for a future patch, which introduces a set of PowerPC-specific vector intrinsics, to also introduce support in the generator for integer vector, unsigned vector, and real vector types.
Reviewed By: luporl
Differential Revision: https://reviews.llvm.org/D151812