This is a patch in preparation for the support stream ordered memory
allocator in CUDA Fortran.
This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.
A follow up patch will implement that asynchronous allocator for CUDA
Fortran.
#100690 introduces allocator registry with the ability to store
allocator index in the descriptor. This patch adds an attribute to
fir.embox and fircg.ext_embox to be able to set the allocator index
while populating the descriptor fields.
This patch enhances the descriptor with the ability to have specialized
allocator. The allocators are registered in a dedicated registry and the
index of the desired allocator is stored in the descriptor. The default
allocator, std::malloc, is registered at index 0.
In order to have this allocator index in the descriptor, the f18Addendum
field is repurposed to be able to hold the presence flag for the
addendum (lsb) and the allocator index.
Since this is a change in the semantic and name of the 7th field of the
descriptor, the CFI_VERSION is bumped to the date of the initial change.
This patch only adds the ability to have this features as part of the
descriptor but does not add specific allocator yet. CUDA fortran will be
the first user of this feature to allocate descriptor data in the
different type of device memory base on the CUDA attribute.
---------
Co-authored-by: Slava Zakharin <szakharin@nvidia.com>
Extend the runtime validation of deallocated pointers so that it also
works when pointers are allocated &/or deallocated outside Fortran.
Previously, bogus runtime errors would be reported for pointers
allocated via CFI_allocate() and deallocated in Fortran, and
CFI_deallocate() did not check that it was deallocating a whole
contiguous pointer that was allocated as such.
The standard requires a compiler to diagnose an incorrect use of a
pointer in a DEALLOCATE statement. The pointer must be associated with
an entire object that was allocated as a pointer (not allocatable) by an
ALLOCATE statement.
Implement by appending a validation footer to pointer allocations. This
is an extra allocated word that encodes the base address of the
allocation. If it is not found after the data payload when the pointer
is deallocated, signal an error. There is a chance of a false positive
result, but that should be vanishingly unlikely.
This change requires all pointer allocations (not allocatables) to take
place in the runtime in PointerAllocate(), which might be slower in
cases that could otherwise be handled with a native memory allocation
operation. I believe that memory allocation of pointers is less common
than with allocatables, which are not affected. If this turns out to
become a performance problem, we can inline the creation and
initialization of the footer word.
Fixes https://github.com/llvm/llvm-project/issues/78391.
Ensure that the f18Addendum flag is preserved in AllocatableApplyMold(),
that raw().type is reinitialized in AllocatableDeallocatePolymorphic(),
and that the implementations of SameTypeAs() and ExtendsTypeOf() handle
unallocated unlimited polymorphic arguments correctly.
Example:
```
module types
type t
real,allocatable :: c
end type t
contains
function h(x)
class(t),allocatable :: h
...
end function h
subroutine test
type(t),allocatable :: b(:)
allocate(b(2),source=h(2.5))
end subroutine test7
end module type
```
`DoFromSourceAssign` creates two descriptors for initializing
`b(1)` and `b(2)` from the result of `h`. This Create call
creates a descriptor without properly initialized addendum,
so the Assign just does shallow copies of the descriptor
representing result of `h` into `b(1)` and `b(2)`.
I modified Create code to properly establish the descriptor
for derived type case.
I had to keep the `addendum` argument to keep the testing
in `flang/unittests/Runtime/TemporaryStack.cpp`.
I extended the "closure" of the device code containing the initial
transformational.cpp. The device side of the library should not be
complete at least for some APIs. For example, I tested with C OpenMP
code calling BesselJnX0 with a nullptr descriptor that failed with
a runtime error when executing on a GPU.
I added `--expt-relaxed-constexpr` for NVCC compiler to avoid multiple
warnings about missing `__attribute__((device))` on constexpr methods
coming from C++ header files.
Source allocation with a zero sized array is legal, and the resulting
allocatable/pointer should be allocated/associated.
The current code skipped the actual allocation, leading the allocatable
or pointer to look unallocated/disassociated.
When a FINAL subroutine is being invoked for a discontiguous array, which can
happen for INTENT(OUT) dummy arguments and for some left-hand side variables
in intrinsic assignment statements, it may be the case that the subroutine
being called was defined with a dummy argument that requires contiguous data.
Extend the derived type descriptions used by the runtime to signify when
a special procedure binding requires contiguity; set the flags accordingly;
check them in the runtime support library, and, when necessary, use a
temporary shallow copy of the finalized array data in the call to the
final subroutine.
Differential Revision: https://reviews.llvm.org/D156760
BytesFor() used to return KIND for the size, which is not always
correct, so I changed it to return the size of the actual CppType
corresponding to the given category and kind.
MinElemLen() used to calculate size incorrectly (e.g. CFI_type_extended_double
was sized 10, whereas it may occupy more bytes on a target), so I changed it
to call BytesFor().
Additional changes were needed to resolve new failures for transformational
intrinsics. These intrinsics used to work for not fully supported
data types (e.g. REAL(3)), but now stopped working because CppType
cannot be computed for those categories/kinds. The solution is to use
known element size from the source argument(s) for establishing
the destination descriptor - the element size is all that is needed
for transformational intrinsics to keep working.
Note that this does not help cases, where runtime still has to
compute the element size, e.g. when it creates descriptors for
components of derived types. If the component has unsupported
data type, BytesFor() will still fail. So these cases require
adding support for the missing types.
New regression unit test in Runtime/Transformational.cpp
demonstrates the case that will start working properly with
this commit.
PointerDeallocate was silently doing nothing because it relied on
Destroy that doe not do anything for Pointers. Add an option to Destroy
in order to destroy pointers.
Add a unit test for PointerDeallocate.
Differential Revision: https://reviews.llvm.org/D122492
Profiling a basic internal real input read benchmark shows some
hot spots in the code used to prepare input for decimal-to-binary
conversion, which is of course where the time should be spent.
The library that implements decimal to/from binary conversions has
been optimized, but not the code in the Fortran runtime that calls it,
and there are some obvious light changes worth making here.
Move some member functions from *.cpp files into the class definitions
of Descriptor and IoStatementState to enable inlining and specialization.
Make GetNextInputBytes() the new basic input API within the
runtime, replacing GetCurrentChar() -- which is rewritten in terms of
GetNextInputBytes -- so that input routines can have the
ability to acquire more than one input character at a time
and amortize overhead.
These changes speed up the time to read 1M random reals
using internal I/O from a character array from 1.29s to 0.54s
on my machine, which on par with Intel Fortran and much faster than
GNU Fortran.
Differential Revision: https://reviews.llvm.org/D113697
If the source has an addendum, the descriptor that is being established
to describe a section over the source needs to copy the addendum so that
derived type information is correctly set in the descriptor being
established.
This allows namelist IO with derived type to work correctly.
Differential Revision: https://reviews.llvm.org/D113258
Move the closure of the subset of flang/runtime/*.h header files that
are referenced by source files outside flang/runtime (apart from unit tests)
into a new directory (flang/include/flang/Runtime) so that relative
include paths into ../runtime need not be used.
flang/runtime/pgmath.h.inc is moved to flang/include/flang/Evaluate;
it's not used by the runtime.
Differential Revision: https://reviews.llvm.org/D109107
Use derived type information tables to drive default component
initialization (when needed), component destruction, and calls to
final subroutines. Perform these operations automatically for
ALLOCATE()/DEALLOCATE() APIs for allocatables, automatics, and
pointers. Add APIs for use in lowering to perform these operations
for non-allocatable/automatic non-pointer variables.
Data pointer component initialization supports arbitrary constant
designators, a F'2008 feature, which may be a first for Fortran
implementations.
Differential Revision: https://reviews.llvm.org/D106297
This is *not* user-defined derived type I/O, but rather Fortran's
built-in capabilities for using derived type data in I/O lists
and NAMELIST groups.
This feature depends on having the derived type description tables
that are created by Semantics available, passed through compilation
as initialized static objects to which pointers can be targeted
in the descriptors of I/O list items and NAMELIST groups.
NAMELIST processing now handles component references on input
(e.g., "&GROUP x%component = 123 /").
The C++ perspectives of the derived type information records
were transformed into proper classes when it was necessary to add
member functions to them.
The code in Semantics that generates derived type information
was changed to emit derived type components in component order,
not alphabetic order.
Differential Revision: https://reviews.llvm.org/D104485
When chasing down another unrelated bug, I noticed that the
implementations of various character intrinsic functions assume
that the lower bounds of (some of) their arguments were 1.
This isn't necessarily the case, so I've cleaned them up, tweaked
the unit tests to exercise the fix, and regularized the allocation
pattern used for results to use SetBounds() before Allocate() rather
than the old original Descriptor::Allocate() wrapper around
CFI_allocate().
Since there were few other remaining uses of the old original
Descriptor::Allocate() wrapper, I also converted them to the
new one and deleted the old one.
Differential Revision: https://reviews.llvm.org/D104325
Add InputNamelist and OutputNamelist as I/O data transfer APIs
to be used with internal & external list-directed I/O; delete the
needless original namelist-specific Begin... calls.
Implement NAMELIST output and input; add basic tests.
Differential Revision: https://reviews.llvm.org/D101931
Add runtime APIs, implementations, and tests for ALL, ANY, COUNT,
MAXLOC, MAXVAL, MINLOC, MINVAL, PRODUCT, and SUM reduction
transformantional intrinsic functions for all relevant argument
and result types and kinds, both without DIM= arguments
(total reductions) and with (partial reductions).
Complex-valued reductions have their APIs in C so that
C's _Complex types can be used for their results.
Some infrastructure work was also necessary or noticed:
* Usage of "long double" in the compiler was cleaned up a
bit, and host dependences on x86 / MSVC have been isolated
in a new Common/long-double header.
* Character comparison has been exposed via an extern template
so that reductions could use it.
* Mappings from Fortran type category/kind to host C++ types
and vice versa have been isolated into runtime/cpp-type.h and
then used throughout the runtime as appropriate.
* The portable 128-bit integer package in Common/uint128.h
was generalized to support signed comparisons.
* Bugs in descriptor indexing code were fixed.
Differential Revision: https://reviews.llvm.org/D99666
The standard interoperability routine CFI_establish() does not
accept a zero-length CHARACTER type. Since these can be valid
results of intrinsic function references, work around the design
of CFI_establish() in the wrapper routine that calls it.
Differential Revision: https://reviews.llvm.org/D99296
Define Fortran derived types that describe the characteristics
of derived types, and instantiations of parameterized derived
types, that are of relevance to the runtime language support
library. Define a suite of corresponding C++ structure types
for the runtime library to use to interpret instances of the
descriptions.
Create instances of these description types in Semantics as
static initializers for compiler-created objects in the scopes
that define or instantiate user derived types.
Delete obsolete code from earlier attempts to package runtime
type information.
Differential Revision: https://reviews.llvm.org/D92802
Add error reporting infrastructure and support for ALLOCATE
and DEALLOCATE statements of intrinsic types without SOURCE=
or MOLD=.
Differential revision: https://reviews.llvm.org/D91215
Scan FORMAT strings locally to avoid C++ binary runtime dependence when computing deepest parenthesis nesting
Remove a dependency on ostream from runtime
Remove remaining direct external references from runtime to C++ library binaries
Remove runtime dependences on lib/common
SetPos() and SetRec()
Instantiate templates for input
Begin input; rearrange locking, deal with CLOSE races
View()
Update error message in test to agree with compiler change
First cut at real input
More robust I/O runtime error handling
Debugging of REAL input
Add iostat.{h,cpp}
Rename runtime/numeric-* to runtime/edit-*
Move templates around, templatize integer output editing
Move LOGICAL and CHARACTER output from io-api.cpp to edit-output.cpp
Change pointer argument to reference
More list-directed input
Complex list-directed input
Use enum class Direction rather than bool for templates
Catch up with changes to master
Undo reformatting of Lower code
Use record number instead of subscripts for internal unit
Unformatted sequential backspace
Testing and debugging
Dodge bogus GCC warning
Add <cstddef> for std::size_t to fix CI build
Address review comments
Original-commit: flang-compiler/f18@50406b3496
Reviewed-on: https://github.com/flang-compiler/f18/pull/1053
Use internal units for internal I/O state
Replace use of virtual functions
reference_wrapper
Internal formatted output to array descriptor
Delete dead code
Begin list-directed internal output
Refactorings and renamings for clarity
List-directed external I/O (character)
COMPLEX list-directed output
Control list items
First cut at unformatted I/O
More OPEN statement work; rename class to ExternalFileUnit
Complete OPEN (exc. for POSITION=), add CLOSE()
OPEN(POSITION=)
Flush buffers on crash and for terminal output; clean up
Documentation
Fix backquote in documentation
Fix typo in comment
Begin implementation of input
Refactor binary floating-point properties to a new header, simplify numeric output editing
Dodge spurious GCC 7.2 build warning
Address review comments
Original-commit: flang-compiler/f18@9c4bba11cf
Reviewed-on: https://github.com/flang-compiler/f18/pull/982
Updated CMake files accordingly, using better regex
Updated license headers to match new extension and fit within 80 columns
Updated other comments within files that referred to the old extension
Original-commit: flang-compiler/f18@ae7721e611
Reviewed-on: https://github.com/flang-compiler/f18/pull/958