Straightforward computation of `A − FLOOR (A / P) * P` should
produce NaN when P is infinity. The -menable-no-infs lowering
can still use the relaxed operations sequence.
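
As an illustration of why the formula yields NaN, here is a minimal C++ sketch. The helper name `NaiveModulo` is hypothetical and is not the runtime's entry point. With an infinite P, `A / P` is zero, and zero times infinity is NaN under IEEE-754 arithmetic.

```cpp
#include <cmath>

// Hypothetical helper: the straightforward formula A - FLOOR(A / P) * P.
// For infinite p, a / p is 0, floor(0) is 0, and 0 * infinity is NaN,
// so the subtraction also yields NaN.
inline double NaiveModulo(double a, double p) {
  return a - std::floor(a / p) * p;
}
```

For finite operands the same helper gives the usual result, e.g. `NaiveModulo(7.0, 3.0)` is `1.0`.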
Recursive cycles in the call graph (even if they are not executed)
prevent computing the minimal stack size required for a kernel
execution. This change disables some functionality of F18 IO
to avoid recursive calls. A couple of functions are rewritten
to work without using recursion.
Added the `FLANG_LIBCUDACXX_PATH` CMake variable to specify the
installation path of the header-only libcudacxx library.
If it is specified, `<cuda/std/variant>` is used to provide
the implementation of `std::variant`.
Supports the REDUCE() transformational intrinsic function of Fortran
(see F'2023 16.9.173) in a manner similar to the existing support for
SUM(), PRODUCT(), &c. There are APIs for total reductions to scalar
results, and APIs for partial reductions that reduce the rank of the
argument by one.
This implementation requires more functions than other reductions
because the various possible types of the user-supplied OPERATION=
function need to be elaborated.
Once the basic API in reduce.h has been approved, later patches will
implement lowering.
REDUCE() is primarily for completeness, not portability; only one other
Fortran compiler implements this F'2018 feature today, and only some
types work correctly with it.
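
The distinction between total and partial reductions can be sketched in plain C++. This is only an illustration of the semantics, not the API in reduce.h: the names `ReduceTotal` and `ReduceDim1` are hypothetical, the array is assumed non-empty and stored in column-major order, and the IDENTITY= and MASK= arguments are left out.

```cpp
#include <cstddef>
#include <vector>

// Total reduction: fold every element with the user-supplied operation,
// producing a scalar. Assumes x is non-empty.
template <typename T, typename Op>
T ReduceTotal(const std::vector<T> &x, Op op) {
  T result{x.at(0)};
  for (std::size_t j{1}; j < x.size(); ++j) {
    result = op(result, x[j]);
  }
  return result;
}

// Partial reduction along the first dimension of a rows-by-cols matrix
// stored in column-major order. The rank-2 argument yields a rank-1
// result with one element per column. Assumes rows > 0.
template <typename T, typename Op>
std::vector<T> ReduceDim1(
    const std::vector<T> &x, std::size_t rows, std::size_t cols, Op op) {
  std::vector<T> result;
  result.reserve(cols);
  for (std::size_t c{0}; c < cols; ++c) {
    T acc{x.at(c * rows)};
    for (std::size_t r{1}; r < rows; ++r) {
      acc = op(acc, x.at(c * rows + r));
    }
    result.push_back(acc);
  }
  return result;
}
```

Taking the operation as a template parameter sidesteps the need for one function per operand type in this sketch; a C-callable runtime API cannot do that, which is consistent with the note above that many entry points are required.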
This commit adds required files into the offload build closure,
which means adding RT_API_ATTRS and other markers.
The implementation does not work for CUDA yet, because of
std::variant, std::swap, and std::reverse usage. These issues will be resolved
separately (e.g. by using libcudacxx header files).
A file unit is emulated via a temporary buffer that accumulates
the output, which is printed out via std::printf at the end
of the IO statement. This implementation will be used for the offload
devices.
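
The buffering scheme described above might look roughly like the following host-side sketch. The class name `BufferedOutputUnit` and its members are hypothetical, and `std::string` is used only for brevity; device code would presumably need a different buffer type.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical stand-in for an emulated file unit: output accumulates
// in a temporary buffer and is printed once, when the statement ends.
class BufferedOutputUnit {
public:
  // Append formatted output to the pending buffer.
  void Emit(std::string_view text) { buffer_.append(text); }

  // Called at the end of the I/O statement: print the accumulated
  // output with std::printf, clear the buffer, and return the number
  // of characters that were pending.
  std::size_t EndStatement() {
    std::size_t n{buffer_.size()};
    std::printf("%.*s", static_cast<int>(n), buffer_.data());
    buffer_.clear();
    return n;
  }

  const std::string &pending() const { return buffer_; }

private:
  std::string buffer_;
};
```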
This is a simplified implementation of std::reference_wrapper that can be used
in the offload builds for the device code. The methods are properly
marked with RT_API_ATTRS so that the device compilation succeeds.
Reviewers: jeanPerier, klausler
Reviewed By: jeanPerier
Pull Request: https://github.com/llvm/llvm-project/pull/85178
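
A simplified wrapper of this kind could be sketched as follows. This is not the code from the pull request: the class name `SimpleReferenceWrapper` is hypothetical, and `RT_API_ATTRS` is defined here as an empty stand-in so that the sketch compiles in a host-only build.

```cpp
#include <memory>

// Assumption: empty stand-in for the runtime's attribute macro, which
// would carry the device markers in an offload build.
#ifndef RT_API_ATTRS
#define RT_API_ATTRS
#endif

// Hypothetical minimal counterpart of std::reference_wrapper: it stores
// a pointer, so it is copyable and rebindable, unlike a plain reference.
template <typename T> class SimpleReferenceWrapper {
public:
  RT_API_ATTRS explicit SimpleReferenceWrapper(T &ref)
      : ptr_{std::addressof(ref)} {}
  RT_API_ATTRS operator T &() const { return *ptr_; }
  RT_API_ATTRS T &get() const { return *ptr_; }

private:
  T *ptr_;
};
```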
This is a simplified implementation of std::optional that can be used
in the offload builds for the device code. The methods are properly
marked with RT_API_ATTRS so that the device compilation succeeds.
Reviewers: klausler, jeanPerier
Reviewed By: jeanPerier
Pull Request: https://github.com/llvm/llvm-project/pull/85177
The maximum number of significant hexadecimal digits in EX0.0 REAL
output editing is 29, not 28. Fix by computing it at build time from the
precision of REAL(16).
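
One way to derive such a limit at build time is sketched below. The formula is an assumption about the digit layout (one leading hexadecimal digit for the most significant bit, then four bits per digit); the runtime's actual expression may differ, and `MaxHexDigits` is a hypothetical name.

```cpp
// Assumption: the leading hexadecimal digit carries the most significant
// bit, and the remaining (precision - 1) bits are packed four per digit,
// rounding up.
constexpr int MaxHexDigits(int binaryPrecision) {
  return 1 + (binaryPrecision - 1 + 3) / 4;
}

// REAL(16) as IEEE binary128 has 113 bits of precision.
static_assert(MaxHexDigits(113) == 29, "REAL(16) needs 29 hex digits");
// REAL(8) as IEEE binary64 has 53 bits of precision.
static_assert(MaxHexDigits(53) == 14, "REAL(8) needs 14 hex digits");
```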
Avoid referencing executionEnvironment in the device code, since
environment.cpp is not part of the CUDA build yet.
This is a temporary fix before #85182 is merged.
At the end of an internal output statement, be sure to finish any
following control edit descriptors in the format (if any), and (for
output) advance to the next record. Return the right I/O error status
code if output overruns the buffer.
…check
Add an environment variable by which a user can disable the pointer
validation check in DEALLOCATE statement handling. This is not safe, but
it can help a program run when it allocates a pointer with an extended
derived type, associates its target with a pointer to one of its
ancestor types, and then deallocates that pointer.
Certain functions in glibc have "nonnull" attributes on pointer
parameters (even in cases where passing a null pointer should be handled
correctly). There are a few cases of such calls in flang: memcmp and
memcpy with the length parameter set to 0.
Avoid passing a null pointer to these functions, since the conflict with
the nonnull attribute could cause undefined behavior.
This was detected by the undefined behavior sanitizer.
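
The guard amounts to skipping the libc call when the length is zero. A minimal sketch, with hypothetical wrapper names that are not taken from the runtime:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: only call std::memcpy when there is something
// to copy, so a null pointer is never passed with a zero length.
inline void *MemcpyIfNonEmpty(void *to, const void *from, std::size_t bytes) {
  if (bytes > 0) {
    std::memcpy(to, from, bytes);
  }
  return to;
}

// Hypothetical wrapper: two empty ranges compare equal without calling
// std::memcmp at all.
inline int MemcmpIfNonEmpty(const void *a, const void *b, std::size_t bytes) {
  return bytes > 0 ? std::memcmp(a, b, bytes) : 0;
}
```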
For `LDBL_MANT_DIG == 113` targets the FortranFloat128Math library
is just an interface library that provides sources and compilation
options to be used for building FortranRuntime - there are no extra
dependencies on other libraries, so it can be a part of FortranRuntime,
which helps to avoid extra linking steps in the compiler driver.
Targets with __float128 support in libc will also use this path.
For other targets, where the math support comes from
FLANG_RUNTIME_F128_MATH_LIB,
FortranFloat128Math is built as a standalone static library,
and the compiler driver needs to conduct the linking.
Flang APIs for COMPLEX(16) are just thin C wrappers around
the C math functions. Flang uses C _Complex ABI for passing/returning
COMPLEX values, so the runtime is aligned to this.
For `LDBL_MANT_DIG == 113` targets the REAL(16) versions of F18
runtime APIs can and should stay in FortranRuntime.
This way, no additional linking actions are required, because
glibc provides all that is needed.
I thought I would isolate all REAL(16) implementations (both
via `__float128` and `long double`) into Float128Math library,
but that was a bad idea.
This should fix aarch64 buildbots failing gfortran tests.
Changed the lowering to call Norm2DimReal16 for REAL(16).
Added the corresponding entry point to FortranFloat128Math,
which required some restructuring in the related templates.
The reductions implementations rely on trivial operations that
are supported by the build compiler runtime, so they can be enabled
whenever the build compiler provides 128-bit float support.
std::conj used by DOT_PRODUCT is a template implementation
in most environments, so it should not introduce a dependency
on any 128-bit float support library. I am not going to
test it in all the build environments before merging.
If it fails for someone, I will deal with it.
We can use 'long double' variants of the math functions in this case.
I used the callees from the `std` namespace, except for the Bessel
functions.
The new code can be enabled with -DFLANG_RUNTIME_F128_MATH_LIB=libm.
Support for complex data types is pending.
This PR does not include support for COMPLEX(16) intrinsics.
Note that (fp ** int) operations do not require Float128Math library,
as they are implemented via basic F128 operations,
which are supported by the build compilers' runtimes.
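
To show why no math library is needed, here is a sketch of exponentiation by squaring, which uses only multiplication and division. It is a generic illustration, not the runtime's code; `IntegerPower` is a hypothetical name, and any floating-point type with basic arithmetic can be substituted for `Real`.

```cpp
#include <cstdint>

// Hypothetical helper: base ** exponent for an integer exponent, using
// only multiplication and one final division for negative exponents.
template <typename Real> Real IntegerPower(Real base, std::int64_t exponent) {
  bool negative{exponent < 0};
  // Work with the unsigned magnitude so the most negative exponent does
  // not overflow on negation.
  std::uint64_t n{negative ? 0 - static_cast<std::uint64_t>(exponent)
                           : static_cast<std::uint64_t>(exponent)};
  Real result{1};
  while (n != 0) {
    if (n & 1) {
      result *= base; // this bit of the exponent contributes a factor
    }
    n >>= 1;
    if (n != 0) {
      base *= base; // square for the next bit
    }
  }
  return negative ? Real{1} / result : result;
}
```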
An implied ENDFILE record, which truncates an external file, should be
written to a sequential unit whenever the file is repositioned for a
BACKSPACE or REWIND statement if a WRITE statement has executed since
the last OPEN/BACKSPACE/REWIND.
But the REC= and POS= positioning specifiers don't apply to sequential
units (they're for direct and stream units, resp.), so don't truncate
the file when they're used.
Implemented a few entry points for REAL(16) math in FortranF128Math
static library. It is a thin wrapper around GNU libquadmath.
Flang driver can always link it, and the dependencies will
be brought in as needed.
The final Fortran program/library that uses any of the entry points
will depend on the underlying third-party library - this dependency
has to be resolved somehow. I added FLANG_RUNTIME_F128_MATH_LIB
CMake control so that the compiler driver and the runtime library
can be built using the same third-party library: this way the linker
knows which dependency to link in (under --as-needed).
The compiler distribution should specify which third-party library
is required for linking/running the apps that use REAL(16).
The compiler package may provide a version of the third-party library
or at least a stub library that can be used for linking, but
the final program execution will still require the actual library.
Spread, reshape, pack, and other transformational intrinsic runtimes are
using `CopyElement` utility to copy elements. This utility was dealing
with deep copies, but only when the allocatable components were
"immediate" components of the type being copied. If the allocatable
components were nested inside a nonpointer/nonallocatable component,
they were not deep copied, leading to bugs later when manipulating the
value (or double free when applying #81117).
Visit data components with allocatable components (using the
noDestructionNeeded flag to avoid expensive and useless type visit when
there are no such components).
The runtime was only deallocating the direct allocatable
components, which caused leaks when there are allocatable components
nested in the direct components.
Update Destroy to recursively destroy components.
Also call Destroy from Assign to deallocate nested allocatable
components before doing the assignment as required by F2018 9.7.3.2
point 7.
This lack of deallocation was visible if the nested components had user
defined assignment "observing" the allocation state.
When testing the arguments to see whether they are integers, check first
that they are within the maximum range of a 64-bit integer; otherwise, a
value of larger magnitude will set an invalid operand exception flag.
Finish plugging-in ASYNCHRONOUS IO in lowering (GetAsynchronousId was
not used yet).
Add a runtime implementation for GetAsynchronousId (only the signature
was defined). Always return zero since flang runtime "fakes"
asynchronous IO (data transfers are always complete, see
flang/docs/IORuntimeInternals.md).
Update all runtime integer arguments and results for IDs to use the
AsynchronousId int alias for consistency.
In lowering, asynchronous attribute is added on the hlfir.declare of
ASYNCHRONOUS variable, but nothing else is done. This is OK given the
synchronous aspects of flang IO, but it would be safer to treat these
variables as volatile (prevent code motion of related store/loads) since
the asynchronous data change can also be done by a C-defined user
procedure (see 18.10.4 Asynchronous communication). Flang lowering
anyway does not give enough info for LLVM to do such code motions (the
variables that are passed in a call are not given the noescape
attribute, so LLVM will assume any later opaque call may modify the
related data and would not move load/stores of such variables
before/after calls even if it could from a pure Fortran point of view
without ASYNCHRONOUS).
When a real-valued reference to the MOD/MODULO intrinsic functions has
operands that are exact integers, use the fast exact integer algorithm
rather than calling std::fmod.
The intrinsic is defined as a GNU extension here:
https://gcc.gnu.org/onlinedocs/gfortran/SIGNAL.html
And as an IBM extension here:
https://www.ibm.com/docs/en/xffbg/121.141?topic=procedures-signali-proc-extension
The IBM version provides a compatible subset of the functionality
offered by the GNU version. This patch supports most of the GNU
features, but not calling SIGNAL as a function. We don't currently
support intrinsics being both subroutines AND functions and this change
seemed too large to be justified by a non-standard intrinsic.
I cannot point to open-source Fortran code using this intrinsic. This is
needed for a proprietary code base.
The code that parses repeat counts, field widths, &c. from FORMAT
strings has an incorrect overflow check, so the maximum integer value is
not accepted. Fix.
Fixes https://github.com/llvm/llvm-project/issues/79255.
The new accurate algorithm for real MOD and MODULO in the runtime is not
as fast as std::fmod(), which is also accurate. So use std::fmod() for
those floating-point types that it supports.
Fixes https://github.com/llvm/llvm-project/issues/78641.
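
For reference, MOD and MODULO can both be expressed in terms of std::fmod, as in the sketch below. The function names are hypothetical and the sign adjustment in `RealModulo` is a simple illustration; the runtime may adjust the result differently, and the final addition can round for some operands.

```cpp
#include <cmath>

// MOD: the result has the sign of a, which is what std::fmod returns.
inline double RealMod(double a, double p) { return std::fmod(a, p); }

// MODULO: the result has the sign of p, so shift a nonzero remainder
// by p when the signs of the remainder and p differ.
inline double RealModulo(double a, double p) {
  double r{std::fmod(a, p)};
  if (r != 0 && (r < 0) != (p < 0)) {
    r += p;
  }
  return r;
}
```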