`fir.local` ops are not supposed to have any uses at this point (i.e.
during lowering to LLVM). In case of serialization, the
`fir.do_concurrent` users are expected to have been lowered to
`fir.do_loop` nests. In case of parallelization, the `fir.do_concurrent`
users are expected to have been lowered to the target parallel model
(e.g. OpenMP).
This hopefully resolved a build issue introduced by
https://github.com/llvm/llvm-project/pull/142567 (see for example:
https://lab.llvm.org/buildbot/#/builders/199/builds/4009).
AAPCS64 reserves any of X9-X15 for a compiler to choose to use for this
purpose, and says not to use X16 or X18 like GCC (and the previous
implementation) chose to use. The X18 register may need to get used by
the kernel in some circumstances, as specified by the platform ABI, so
it is generally an unwise choice. Simply choosing a different register
fixes the problem of this being broken on any platform that actually
follows the platform ABI (which is all of them except EABI, if I am
reading this linux kernel bug correctly
https://lkml2.uits.iu.edu/hypermail/linux/kernel/2001.2/01502.html). As
a side benefit, also generate slightly better code and avoids needing
the compiler-rt to be present. I did that by following the XCore
implementation instead of PPC (although in hindsight, following the
RISCV might have been slightly more readable). That X18 is wrong to use
for this purpose has been known for many years (e.g.
https://www.mail-archive.com/gcc@gcc.gnu.org/msg76934.html) and also
known that fixing this to use one of the correct registers is not an ABI
break, since this only appears inside of a translation unit. Some of the
other temporary registers (e.g. X9) are already reserved inside llvm for
internal use as a generic temporary register in the prologue before
saving registers, while X15 was already used in rare cases as a scratch
register in the prologue as well, so I felt that seemed the most logical
choice to choose here.
When emitting the modules on which a module depends under the
-fhermetic-module-files options, eliminate duplicates by name rather
than by symbol addresses. This way, when a dependent module is in the
symbol table more than once due to the use of a nested hermetic module,
it doesn't get emitted multiple times to the new module file.
The global name semantic check was firing in a bogus way when BIND(C)
variables are in hermetic module.
Do not raise the error if one of the symbol with the conflicting global
name is an "hermetic variant" of the other.
The default relocation model for clang depends on the cmake flag
CLANG_DEFAULT_PIE_ON_LINUX. By default it is set to ON, but when it's
OFF, the default relocation model will be "static".
The outcome of the test running clang without any PIC/PIE flags will
depend on the cmake flag, so make sure it only runs when the flag is ON.
The parser will accept a wide variety of illegal attempts at forming an
ATOMIC construct, leaving it to the semantic analysis to diagnose any
issues. This consolidates the analysis into one place and allows us to
produce more informative diagnostics.
The parser's outcome will be parser::OpenMPAtomicConstruct object
holding the directive, parser::Body, and an optional end-directive. The
prior variety of OmpAtomicXyz classes, as well as OmpAtomicClause have
been removed. READ, WRITE, etc. are now proper clauses.
The semantic analysis consistently operates on "evaluation"
representations, mainly evaluate::Expr (as SomeExpr) and
evaluate::Assignment. The results of the semantic analysis are stored in
a mutable member of the OpenMPAtomicConstruct node. This follows a
precedent of having `typedExpr` member in parser::Expr, for example.
This allows the lowering code to avoid duplicated handling of AST nodes.
Using a BLOCK construct containing multiple statements for an ATOMIC
construct that requires multiple statements is now allowed. In fact, any
nesting of such BLOCK constructs is allowed.
This implementation will parse, and perform semantic checks for both
conditional-update and conditional-update-capture, although no MLIR will
be generated for those. Instead, a TODO error will be issues prior to
lowering.
The allowed forms of the ATOMIC construct were based on the OpenMP 6.0
spec.
This PR updates the flang lowering to explicitly implement the OpenACC
rules:
- As per OpenACC 3.3 standard section 2.9.6 independent clause: A loop
construct with no auto or seq clause is treated as if it has the
independent clause when it is an orphaned loop construct or its parent
compute construct is a parallel construct.
- As per OpenACC 3.3 standard section 2.9.7 auto clause: When the parent
compute construct is a kernels construct, a loop construct with no
independent or seq clause is treated as if it has the auto clause.
- Loops in serial regions are `seq` if they have no other parallelism
marking such as gang, worker, vector.
For now the `acc.loop` verifier has not yet been updated to enforce
this.
Symbols that have a pre-existing DSA set in the enclosing context should
not be made shared based on them being static duration variables.
Suggested-by: Leandro Lupori <leandro.lupori@linaro.org>
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
This fixes an issue with `TransferWriteOp`'s implementation of the
`MemoryEffectOpInterface` where the write effect was attached to the
stored value rather than the base.
This had the effect that when asking for the memory effects for the
input memref buffer using `getEffectsOnValue(...)`, the function would
return no-effects (as the effect would have been attached to the stored
value rather than the input buffer).
So far as I can tell this option is driver-only so we can just re-use
what already exists for clang. I've added a unit test based on clang's
unit test to demonstrate that the option is handled.
Still TODO is to ensure that flang-rt is built with the same macos
minimum version as compiler-rt. At the moment, setting the flang minimum
version to older than the macos version on which flang was built will
lead to link warnings because flangrt is built for version of macos on
which flang was built rather than the oldest supported version (as
compiler-rt is).
Starts the effort to map `do concurrent` locality specifiers to OpenMP
clauses. This PR adds support for basic specifiers (no `init` or `copy`
regions yet).
When a derived type declares a generic procedure binding of interest to
the runtime library, such as for ASSIGNMENT(=), it overrides any binding
that might have been present for the parent type.
Fixes https://github.com/llvm/llvm-project/issues/142414.
Two recently-added functions in Semantics/tools.h need some cleaning up
to conform to the coding style of the project. One of them should
actually be in Parser/tools.{h,cpp}, the other doesn't need to be
defined in the header.
Recursion, both direct and indirect, prevents accurate stack size
calculation at link time for GPU device code. Restructure these
recursive (often mutually so) routines in the Fortran runtime with new
implementations based on an iterative work queue with
suspendable/resumable work tickets: Assign, Initialize, initializeClone,
Finalize, and Destroy.
Default derived type I/O is also recursive, but already disabled. It can
be added to this new framework later if the overall approach succeeds.
Note that derived type FINAL subroutine calls, defined assignments, and
defined I/O procedures all perform callbacks into user code, which may
well reenter the runtime library. This kind of recursion is not handled
by this change, although it may be possible to do so in the future using
thread-local work queues.
The effects of this restructuring on CPU performance are yet to be
measured.
This PR adds functionality to the `MapInfoFinalization` pass wherein the
underlying data pointer of a `fir.boxchar` is mapped as a member of the
parent boxchar.
This PR simply reverts a few lines in
bf60aa1c55 to their state in
bcba39a56f so that they are constant for
some of the build tests that require it. This should fix the breakage
caused by #142022.
Previously, a bug in the MemCptOpt LLVM IR pass caused issues with
adding alias tags for locally allocated objects for Fortran code.
However, the bug has now been fixed (https://github.com/llvm/llvm-project/pull/129537 ),
and we can safely enable alias tags for these objects. This change should
improve the accuracy of the alias analysis.
More accurate alias analysis assumes that Cray pointers do not alias
with other variables. This assumption is common among other compilers.
If the code violates this assumption, it can lead to incorrect results
(see: https://github.com/llvm/llvm-project/issues/141928)
This patch adds support for the -mrecip command line option. The parsing
of this options is equivalent to Clang's and it is implemented by
setting the "reciprocal-estimates" function attribute.
Also move the ParseMRecip(...) function to CommonArgs, so that Flang is
able to make use of it as well.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
This change allows the flang CLI to accept `-W[no-]<feature>` flags matching the clang syntax and enable and disable usage and language feature warnings.
Reference types were being constructed from openmp private clauses without propagating volatility.
Fix this by checking the volatility of the original variable and add a test.
Prevent hlfir.declare output to be fir.box/class values with the
heap/pointer attribute to ensure the runtime descriptor attributes are
in line with the Fortran attributes for the entities being declared
(only fir.ref<box/class> can be ALLOCATABLE/POINTERS).
This fixes a bug where an associated entity inside a SELECT TYPE was being
unexpectedly reallocated inside assign runtime because the selector was allocatable
and this attribute was not properly removed when creating the descriptor
for the associated entity (that does not inherit the ALLOCATABLE/POINTER
attribute according to Fortran 2023 section 11.1.3.3).
Start removing debug intrinsics support -- starting with the flag that
controls production of their replacement, debug records. This patch
removes the command-line-flag and with it the ability to switch back to
intrinsics. The module / function / block level "IsNewDbgInfoFormat"
flags get hardcoded to true, I'll to incrementally remove things that
depend on those flags.
According to the OpenMP standard, variables with static storage duration
are predetermined as shared.
Add a check when creating implicit symbols for OpenMP to fix them
erroneously getting set to firstprivate.
Fixes llvm#140732.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
This adds another puzzle piece for the support of OpenMP DECLARE
REDUCTION functionality.
This adds support for operators with derived types, as well as declaring
multiple different types with the same name or operator.
A new detail class for UserReductionDetials is introduced to hold the
list of types supported for a given reduction declaration.
Tests for parsing and symbol generation added.
Declare reduction is still not supported to lowering, it will generate a
"Not yet implemented" fatal error.
Fixes#141306Fixes#97241Fixes#92832Fixes#66453
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.
We already see code in the Clang/Flang Drivers that is parsing and
verifying its incoming options. E.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is no/low maintenance burden on keeping all these parsing
functions in sync.
Along those lines, the Frontend Drivers will now have a useful mechanism
to verify their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Language Drivers may even accept junk with no error
at all. E.g.:
`clang -cc1 -mprefer-vector-width=junk test.c'
With this patch, we'll now be able to tighten up incomming options to
the Frontend drivers in a lightweight way.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>
During `hlfir.assign` inlining and `ElementalAssignBufferization`
we can deduce the optimal shape from `lhs` and `rhs` shapes.
It is probably better be done in a separate pass that propagates
constant shapes, but I have not seen any benchmarks that would
benefit from this yet. So consider this as a workaround for a bigger
TODO issue.
The `ElementalAssignBufferization` case is from 465.tonto,
but I do not have performance results yet (I do not expect much).
If there is a read-effect operation inside `hlfir.elemental`,
there is no reason to block moving it to the assignment point
unless there are write-effect operations between the elemental
and the assignment. The previous code was disallowing the optimization
even if there were only read-effect operations in between.
This case is from 465.tonto, though this change does not improve
performance at all.
To map `fir.boxchar` types reliably onto an offload target, such as a
GPU, the `omp.map.info` operation is used to map the underlying data
pointer (`fir.ref<fir.char<k, ?>>`) wrapped by the `fir.boxchar` MLIR
value. The `omp.map.info` operation needs a pointer to the underlying
data pointer.
Given a reference to a descriptor (`fir.box`), the `fir.box_offset` is
used to obtain the address of the underlying data pointer. This PR
extends `fir.box_offset` to provide the same functionality for
`fir.boxchar` as well.
hlfir.copy_in implements copying non-contiguous array slices for
functions that take in arrays required to be contiguous through
flang-rt.
For large arrays of trivial types, this can incur overhead compared to a
plain, inlined copy loop.
To address that, add a new InlineHLFIRCopyIn optimisation pass to inline
hlfir.copy_in operations for trivial types.
For the time being, the pattern is only applied in cases where the
copy-in does not require a corresponding copy-out, such as when the
function being called declares the array parameter as intent(in).
Applying this optimisation reduces the runtime of thornado-mini's
DeleptonizationProblem by about 10%.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Polymorphic temporary are currently propagated as
fir.ref<fir.class<fir.heap<>>> because their allocation may be delayed
to the hlfir.assign copy (using realloc).
This patch moves away from this and directly allocate the temp and
propagate it as a fir.class.
The polymorphic temporaries creating is also simplified by avoiding the
need to call the runtime to setup the descriptor altogether (the runtime
is still call for the allocation currently because alloca/allocmem do
not support polymorphism).
If a "TASK DEPEND" clause is not given a valid task dependece type
modifier, the semantic checks for the clause will result in an ICE
because they assume that such modifiers will be present. Check whether
the modifiers are present and show an appropriate error instead of
crashing the compiler if they are not.
Fixes llvm#133678.
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Extends support for locality specifiers in `do concurrent` by supporting
data types that need `init` regions.
This further unifies the paths taken by the compiler for OpenMP
privatization clauses and `do concurrent` locality specifiers.
This patch refactors the Flang documentation CMake and Sphinx
configuration to address build issues.
**CMake changes**:
- Moves the `gen_rst_file_from_td()` call out of the HTML-only block so
that both `docs-flang-html` and `docs-flang-man` builds depend on the
generated `FlangCommandLineReference.rst` file.
**conf.py changes**:
- Introduces `myst_parser` dependency as a required Markdown parser for
both HTML and man builds.
- Introduces the correct source_suffix mapping for both .rst and .md
files.
- Populates the man_pages configuration so the main index page generates
a ` flang(1) `man page.
Fixes#141757
---------
Authored-by: Samarth Narang <samanara@qti.qualcomm.com>
Inaccessible procedure bindings can't be overridden, but DEFERRED
bindings must be in a non-abstract extension. We presently emit an error
for an attempt to override an inaccessible binding in this case. But
some compilers accept this usage, and since it seems safe enough, I'll
allow it with an optional warning. Codes can avoid this warning and
conform to the standard by changing the deferred bindings to be public.
For componentwise assignment in derived type intrinsic assignment, the
runtime type information's special binding table is currently populated
only with type-bound ASSIGNMENT(=) procedures that have the same derived
type for both arguments. This restriction excludes all defined
assignments for cases that cannot arise in this context, like defined
assignments from intrinsic types or incompatible derived types.
However, this restriction also excludes defined assignments from
distinct but compatible derived types, i.e. ancestors. Loosen it a
little to allow them.
Fixes https://github.com/llvm/llvm-project/issues/142151.
When a generic ASSIGNMENT(=) has elemental and non-elemental specific
procedures that match the actual arguments, the non-elemental procedure
must take precedence. We get this right for generics defined with
interface blocks, but the type-bound case fails if the non-elemental
specific takes a non-default PASS argument.
Fixes https://github.com/llvm/llvm-project/issues/141807.
DeclareSimdConstruct (and other declarative constructs) can currently
implicitly declare variables regardless of whether the source code
contains "implicit none" or not. This causes semantic analysis issues if
the implicit type does not match the declared type. To solve it, skip
implicit typing for OpenMPDeclarativeConstruct. Fixes issue #140754.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>