Commit Graph

105 Commits

Author SHA1 Message Date
Peter Klausler
65b06cd983 [flang][runtime] Check SOURCE= conformability on ALLOCATE (#144113)
The SOURCE= expression of an ALLOCATE statement, when present and not
scalar, must conform to the shape of the allocated objects. Check this
at runtime, and return a recoverable error, or crash, when appropriate.

Fixes https://github.com/llvm/llvm-project/issues/143900.
2025-06-16 14:36:35 -07:00
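A minimal sketch of the kind of conformability check this message describes, not the flang-rt code itself. `Shape`, `AllocateStatus`, and `CheckSourceShape` are hypothetical names, and the rule that a `STAT=` specifier turns the failure into a recoverable error is an assumption about what "when appropriate" means here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the extents held in a runtime descriptor.
// An empty vector represents a scalar (rank 0).
using Shape = std::vector<std::int64_t>;

enum class AllocateStatus { Ok, RecoverableError, Crash };

// A scalar SOURCE= conforms to any shape; otherwise the rank and every
// extent must match the allocated object. Assumption: the failure is
// recoverable only when the statement carries STAT=.
AllocateStatus CheckSourceShape(
    const Shape &allocated, const Shape &source, bool hasStat) {
  bool conforms{source.empty() || source == allocated};
  if (conforms) {
    return AllocateStatus::Ok;
  }
  return hasStat ? AllocateStatus::RecoverableError : AllocateStatus::Crash;
}
```

For example, `CheckSourceShape({2, 3}, {3, 2}, true)` reports a recoverable error in this sketch, while the same call with `hasStat` false reports a crash.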
Valentin Clement (バレンタイン クレメン)
9992668404 [flang][cuda] Add runtime check for passing device arrays (#144003) 2025-06-12 20:47:58 -07:00
Peter Klausler
10f512f7bb Revert runtime work queue patch, it breaks some tests that need investigation (#143713)
Revert "[flang][runtime] Another try to fix build failure"

This reverts commit 13869cac2b5051e453aa96ad71220d9d33404620.

Revert "[flang][runtime] Fix build bot flang-runtime-cuda-gcc errors
(#143650)"

This reverts commit d75e28477a.

Revert "[flang][runtime] Replace recursion with iterative work queue
(#137727)"

This reverts commit 163c67ad3d.
2025-06-11 07:55:06 -07:00
Peter Klausler
b512077c37 [flang][runtime] Another try to fix build failure (#143702)
Tweak accessibility to try to get code past whatever gcc is being used
by the flang-runtime-cuda-gcc build bot.
2025-06-11 06:34:46 -07:00
Peter Klausler
d75e28477a [flang][runtime] Fix build bot flang-runtime-cuda-gcc errors (#143650)
Adjust default parent class accessibility to attempt to work around what
appears to be an old GCC's interpretation.
2025-06-10 20:36:52 -07:00
Peter Klausler
163c67ad3d [flang][runtime] Replace recursion with iterative work queue (#137727)
Recursion, both direct and indirect, prevents accurate stack size
calculation at link time for GPU device code. Restructure these
recursive (often mutually so) routines in the Fortran runtime with new
implementations based on an iterative work queue with
suspendable/resumable work tickets: Assign, Initialize, initializeClone,
Finalize, and Destroy.

Default derived type I/O is also recursive, but already disabled. It can
be added to this new framework later if the overall approach succeeds.

Note that derived type FINAL subroutine calls, defined assignments, and
defined I/O procedures all perform callbacks into user code, which may
well reenter the runtime library. This kind of recursion is not handled
by this change, although it may be possible to do so in the future using
thread-local work queues.

The effects of this restructuring on CPU performance are yet to be
measured.
2025-06-10 14:44:19 -07:00
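The general idea of replacing recursion with a queue of suspendable tickets can be sketched as follows. This is an illustration only: `Node`, `Ticket`, and `SumIteratively` are hypothetical names, and the real runtime operates on descriptors and derived type components rather than on this toy tree.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a derived type value with nested components.
struct Node {
  int value{0};
  std::vector<Node> components;
};

// A work ticket records which node it is processing and how far it has
// progressed, so it can be suspended while a child ticket runs and then
// resumed where it left off.
struct Ticket {
  const Node *node;
  std::size_t nextComponent{0};
};

// Visits every node without recursion and returns the sum of all values.
int SumIteratively(const Node &root) {
  int sum{0};
  std::vector<Ticket> queue;
  queue.push_back(Ticket{&root});
  while (!queue.empty()) {
    Ticket &ticket{queue.back()};
    if (ticket.nextComponent == 0) {
      sum += ticket.node->value; // first visit: do this node's own work
    }
    if (ticket.nextComponent < ticket.node->components.size()) {
      // Suspend this ticket and enqueue work for its next component.
      const Node *child{&ticket.node->components[ticket.nextComponent++]};
      queue.push_back(Ticket{child}); // invalidates the `ticket` reference
    } else {
      queue.pop_back(); // every component is done: the ticket is complete
    }
  }
  return sum;
}
```

Because the pending work lives in an explicit container instead of on the call stack, the stack depth of `SumIteratively` is constant regardless of how deeply the data is nested, which is the property the message says matters for GPU device code.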
Valentin Clement (バレンタイン クレメン)
9c54512c3e [flang][cuda] Allocate the dst descriptor in data transfer (#143437)
In a test like: 

```
integer, allocatable, device :: da(:)
integer, allocatable :: a(:)
allocate(a(200))
a = 2
da = a ! da is not allocated before data transfer is initiated. Allocate it with a
```

The reference compiler will allocate the data for the `da` descriptor so
the data transfer can be done properly.
2025-06-10 09:43:30 -07:00
Peter Klausler
7b9518ae27 [flang][runtime] Accommodate change of type in assignment to allocatable (#141988)
When an assignment to a derived type allocatable requires
(re)allocation, its type may change to that of the right-hand side. The
code didn't update its derived type pointer, leading to the wrong type
being put into the descriptors created for elemental defined assignment
subroutine calls.

Fixes https://github.com/llvm/llvm-project/issues/141835.
2025-06-04 09:22:01 -07:00
Peter Klausler
4c6b60a639 [flang] Extension: allow char string edit descriptors in input formats (#140624)
FORMAT("J=",I3) is accepted by a few other Fortran compilers as a valid
format for input as well as for output. The character string edit
descriptor "J=" is interpreted as if it had been 2X on input, causing
two characters to be skipped over. The skipped characters don't have to
match the characters in the literal string. An optional warning is
emitted under control of the -pedantic option.
2025-05-28 13:58:22 -07:00
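A rough sketch of the input behavior described, assuming a much simplified reader: the literal in the format only determines how many characters to skip, and its text is never compared with the record. `ReadIntegerAfterLiteral` is a hypothetical helper, not a flang-rt function, and its integer parsing is deliberately naive.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Reads an integer field of `width` characters from `record` after skipping
// as many characters as `literal` contains, the way FORMAT("J=",I3) behaves
// on input under this extension (the literal acts like 2X).
int ReadIntegerAfterLiteral(const std::string &record,
    const std::string &literal, std::size_t width) {
  // Skip literal.size() positions; the skipped text is not checked.
  std::size_t start = std::min(literal.size(), record.size());
  std::string field = record.substr(start, width);
  int value{0};
  bool negative{false};
  for (char ch : field) {
    if (ch == '-') {
      negative = true;
    } else if (ch >= '0' && ch <= '9') {
      value = 10 * value + (ch - '0');
    } // blanks and anything else are ignored in this sketch
  }
  return negative ? -value : value;
}
```

With this helper, the records `J= 42` and `XY 42` both yield 42 for the literal `J=` and a width of 3, since the first two characters are skipped without being matched.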
Valentin Clement (バレンタイン クレメン)
fc9ce037ef [flang][rt] Enable Count and CountDim for device build (#141684) 2025-05-28 09:55:49 -07:00
Kajetan Puchalski
09a70b1e10 [flang-rt] Explicitly define the default ShallowCopy* templates (#141619)
Not explicitly defining the default case for ShallowCopy* functions does
not meet the requirements for gcc to actually instantiate the templates,
leading to build errors that show up with gcc but not with clang.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-05-27 16:38:48 +01:00
Kajetan Puchalski
0d464009fe [flang-rt] Fix usage of kNoAsyncId in assign.cpp (#141077)
Fix a leftover old variable name causing build bot errors.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-05-22 15:49:03 +01:00
Kajetan Puchalski
c2892b0bdf [flang-rt] Optimise ShallowCopy and use it in CopyInAssign (#140569)
Using Descriptor.Element<>() when iterating through a rank-1 array is
currently inefficient, because the generic implementation suitable for
arrays of any rank makes the compiler unable to perform optimisations
that would make the rank-1 case considerably faster.

This is currently done inside ShallowCopy, as well as by CopyInAssign,
where the implementation of elemental copies (inside Assign) is
equivalent to ShallowCopyDiscontiguousToDiscontiguous.

To address that, add a DescriptorIterator abstraction specialised for
arrays of various ranks, and use that throughout ShallowCopy to iterate
over the arrays.

Furthermore, depending on the pointer type passed to memcpy, the
optimiser can remove the memcpy calls from ShallowCopy altogether which
can result in substantial performance improvements on its own.
Specialise ShallowCopy for various element pointer types to make these
optimisations possible.

Finally, replace the call to Assign inside CopyInAssign with a call to
newly optimised ShallowCopy.

For the thornado-mini application, this reduces the runtime by 27.7%.

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-05-22 15:11:46 +01:00
Valentin Clement (バレンタイン クレメン)
c17ae161fd [flang][cuda] Use nullptr for comparison (#140767)
Comparison without explicit nullptr seems to bring false positives. Use
explicit nullptr.
2025-05-20 11:04:06 -07:00
Valentin Clement (バレンタイン クレメン)
f5609aa1b0 [flang][cuda] Use a reference for asyncObject (#140614)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.

New attempt with some fixes. The previous one was reverted some time ago.

Reviewed in #138010
2025-05-19 15:02:53 -07:00
Kazu Hirata
56aa935bec [flang-rt] Fix warnings
This patch fixes:

  flang-rt/include/flang-rt/runtime/emit-encoded.h:67:27: error:
  implicit conversion from 'const char16_t' to 'char32_t' may change
  the meaning of the represented code unit
  [-Werror,-Wcharacter-conversion]

  flang-rt/lib/runtime/edit-input.cpp:1114:18: error: implicit
  conversion from 'char32_t' to 'char16_t' may lose precision and
  change the meaning of the represented code unit
  [-Werror,-Wcharacter-conversion]

  flang-rt/lib/runtime/edit-input.cpp:1133:18: error: implicit
  conversion from 'char32_t' to 'char16_t' may lose precision and
  change the meaning of the represented code unit
  [-Werror,-Wcharacter-conversion]

  flang-rt/lib/runtime/edit-input.cpp:1033:14: error: implicit
  conversion from 'char32_t' to 'char16_t' may lose precision and
  change the meaning of the represented code unit
  [-Werror,-Wcharacter-conversion]

  flang-rt/lib/runtime/edit-input.cpp:986:14: error: implicit
  conversion from 'char32_t' to 'char16_t' may lose precision and
  change the meaning of the represented code unit
  [-Werror,-Wcharacter-conversion]
2025-05-15 17:46:00 -07:00
Peter Klausler
36ccfe29be [flang] Clear obsolete type from reallocated allocatable (#139788)
When an assignment to a polymorphic allocatable changes its type to an
intrinsic type, be sure to reset its descriptor's derived type pointer
to null.

Fixes https://github.com/llvm/llvm-project/issues/136522.
2025-05-15 11:25:44 -07:00
Aaron Ballman
7548cec16f [www][docs] Remove last mentions of IRC (#139076)
It's the end of an era. The IRC channel was previously where the
community gathered to discuss technical topics but is now a ghost town
where the primary activity is moderators (me) kickbanning the same
individual dozens of times a day for CoC violations and the secondary
activity is telling the occasional person to come to Discord for help.
The number of people engaging on IRC for the community's intended
purposes seems to be roughly one person a month.

So this removes all remaining mentions of IRC from our documentation so
that it no longer appears to be an "official" channel for communicating
with the community. It also removes IRC handles from the various
maintainers lists, since those would stand out as confusing
anachronisms.

The IRC channel topic already recommends people come to the Discord
server. There is no way to "shut down" an IRC channel such that it no
longer exists, so the channel will continue to exist on OFTC, but will
be unmoderated.

(This was previously discussed in https://discourse.llvm.org/c/llvm/5
but some mentions persisted.)
2025-05-08 09:40:33 -04:00
Valentin Clement (バレンタイン クレメン)
9b6b144438 Revert "[flang][cuda] Use a reference for asyncObject" (#138221)
Reverts llvm/llvm-project#138186
2025-05-01 17:41:44 -07:00
Valentin Clement (バレンタイン クレメン)
7f922f1400 [flang][cuda] Use a reference for asyncObject (#138186)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.

New attempt with some fixes. The previous one was reverted yesterday.
2025-05-01 17:04:12 -07:00
Valentin Clement (バレンタイン クレメン)
01a18809ee Revert "[flang][cuda] Use a reference for asyncObject (#138010)" (#138082)
This reverts commit 9b0eaf71e6.
2025-04-30 22:03:26 -07:00
Valentin Clement (バレンタイン クレメン)
16f01b3777 [flang][cuda] Fix signatures after argument change (#138081) 2025-04-30 21:40:12 -07:00
Valentin Clement (バレンタイン クレメン)
ba3a46c1ea [flang][cuda] Fix type of kNoAsyncObject (#138029) 2025-04-30 14:59:02 -07:00
Valentin Clement (バレンタイン クレメン)
9b0eaf71e6 [flang][cuda] Use a reference for asyncObject (#138010)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.
2025-04-30 14:02:29 -07:00
Slava Zakharin
a8607063f3 [flang-rt] Simplify INDEX with len-1 SUBSTRING. (#137889)
The len-1 case is noticeably slower than gfortran's straightforward
implementation
075611b646/libgfortran/intrinsics/string_intrinsics_inc.c (L253)
This change speeds up a simple microkernel by 37% on icelake.
2025-04-30 08:25:06 -07:00
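The special case amounts to a plain character scan. The sketch below shows one way to write it; `IndexOfChar` is a hypothetical name, the `back` flag models `BACK=.TRUE.`, and nothing here is taken from the flang-rt or libgfortran sources.

```cpp
#include <cstddef>

// Returns the 1-based position of `ch` within the first `length` characters
// of `string`, or 0 when it does not occur. With `back` set, the position of
// the last occurrence is returned instead of the first.
std::size_t IndexOfChar(
    const char *string, std::size_t length, char ch, bool back) {
  if (back) {
    for (std::size_t at{length}; at > 0; --at) {
      if (string[at - 1] == ch) {
        return at;
      }
    }
  } else {
    for (std::size_t at{0}; at < length; ++at) {
      if (string[at] == ch) {
        return at + 1;
      }
    }
  }
  return 0;
}
```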
Michael Kruse
77581e2751 Reapply "[Flang] Remove FLANG_INCLUDE_RUNTIME (#124126)"
This reverts commit 27539c3f90. Retry
with new buildbot configuration after master restart.

Original message:

Remove the FLANG_INCLUDE_RUNTIME option which was replaced by
LLVM_ENABLE_RUNTIMES=flang-rt.

The FLANG_INCLUDE_RUNTIME option was added in #122336 which disables the
non-runtimes build instructions for the Flang runtime so they do not
conflict with the LLVM_ENABLE_RUNTIMES=flang-rt option added in #110217.
In order to not maintain multiple build instructions for the same thing,
this PR completely removes the old build instructions (effectively
forcing FLANG_INCLUDE_RUNTIME=OFF).

As per discussion in
https://discourse.llvm.org/t/buildbot-changes-with-llvm-enable-runtimes-flang-rt/83571/2
we now implicitly add LLVM_ENABLE_RUNTIMES=flang-rt whenever Flang is
compiled in a bootstrapping (non-standalone) build. Because it is
possible to build Flang-RT separately, this behavior can be disabled
using `-DFLANG_ENABLE_FLANG_RT=OFF`. Also see the discussion on
implicitly adding runtimes/projects in #123964.
2025-04-30 12:32:49 +02:00
Valentin Clement (バレンタイン クレメン)
565a075909 [flang][cuda][rt] Track asynchronous allocation stream for deallocation (#137073)
When an asynchronous allocation is made, we call `cudaMallocAsync` with
a stream. For deallocation, we need to call `cudaFreeAsync` with the
same stream. In order to achieve that, we need to track the allocations
and their respective streams.

This patch adds a simple sorted array of asynchronous allocations. A
binary search is performed to retrieve the allocation when deallocation
is needed.
2025-04-24 10:01:47 -07:00
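A sorted array with binary-search lookup can be sketched as below. This is host-only illustration code that makes no CUDA calls; `AsyncAlloc`, `AsyncAllocTable`, and the use of `std::int64_t` for the stream are assumptions, not the runtime's actual types.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record pairing an allocated address with its stream.
struct AsyncAlloc {
  std::uintptr_t address;
  std::int64_t stream;
};

class AsyncAllocTable {
public:
  // Inserts the record at the position that keeps entries_ sorted by address.
  void Insert(std::uintptr_t address, std::int64_t stream) {
    auto at{std::lower_bound(
        entries_.begin(), entries_.end(), address, AddressLess)};
    entries_.insert(at, AsyncAlloc{address, stream});
  }

  // Binary-searches for `address`. On success, stores its stream in `stream`,
  // removes the record, and returns true; otherwise returns false.
  bool Remove(std::uintptr_t address, std::int64_t &stream) {
    auto at{std::lower_bound(
        entries_.begin(), entries_.end(), address, AddressLess)};
    if (at == entries_.end() || at->address != address) {
      return false;
    }
    stream = at->stream;
    entries_.erase(at);
    return true;
  }

private:
  static bool AddressLess(const AsyncAlloc &entry, std::uintptr_t address) {
    return entry.address < address;
  }
  std::vector<AsyncAlloc> entries_;
};
```

In such a design the deallocation path would call `Remove` with the pointer being freed and, when it returns true, pass the recovered stream to the asynchronous free; a false result would mean the pointer was not allocated asynchronously.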
Joseph Huber
a5cdbef5f0 Revert "[LLVM] Replace use of LLVM_RUNTIMES_TARGET with LLVM_DEFAULT_TARGET_TRIPLE (#136208)"
This reverts commit 2e145f11c0.

Somehow causes some static assertions to fail?
2025-04-22 08:08:51 -05:00
Joseph Huber
2e145f11c0 [LLVM] Replace use of LLVM_RUNTIMES_TARGET with LLVM_DEFAULT_TARGET_TRIPLE (#136208)
Summary:
For purposes of determining the triple, it's more correct to use
`LLVM_DEFAULT_TARGET_TRIPLE`.
2025-04-22 07:59:54 -05:00
Peter Klausler
03b3620538 [flang] Tweak integer output under width-free I/G editing (#136316)
A recent patch fixed Fujitsu test case 0561_0168 by emitting a leading
space for "bare" (no width 'w') I and G output editing of integer
values. This fix has broken another Fujitsu test case (0561_0168), since
the leading space should not be produced at the first column of the
output record. Adjust.
2025-04-18 12:52:39 -07:00
Peter Klausler
32145a5e18 [flang][runtime] Better handling for integer input into null address (#135987)
The original descriptor-only path for I/O checks for null data addresses
and crashes with a readable message, but there's no such check on the
new fast path for formatted integer input, and so a READ into (say) a
deallocated allocatable will crash with a segfault. Put a null data
address check on the new fast path.
2025-04-18 12:51:18 -07:00
Peter Klausler
21a406c92c [flang] Improve runtime SAME_TYPE_AS() (#135670)
The present implementation of the intrinsic function SAME_TYPE_AS()
yields false positive .TRUE. results for distinct derived types that
happen to have the same name.

Replace with an implementation that can now depend on derived type
information records being the same type if and only if they are at the
same location, or are PDT instantiations of the same uninstantiated
derived type. And ensure that the derived type information includes
references from instantiated PDTs to their original types. (The derived
type information format supports these references already, but they were
not being set, perhaps because the current faulty SAME_TYPE_AS
implementation didn't need them, and nothing else does.)

Fixes https://github.com/llvm/llvm-project/issues/135580.
2025-04-18 12:48:33 -07:00
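The comparison rule stated above can be sketched in a few lines. `DerivedTypeInfo` here is a hypothetical two-field record, far simpler than the runtime's real derived type information, and `SameTypeAs` is an illustrative name.

```cpp
// Hypothetical, heavily reduced type information record.
struct DerivedTypeInfo {
  const char *name;
  // Points at the original type for a PDT instantiation; null otherwise.
  const DerivedTypeInfo *uninstantiated;
};

// Two records denote the same type if and only if they are the same object,
// or both are instantiations of the same uninstantiated PDT. Names are
// deliberately not compared, so distinct types that share a name differ.
bool SameTypeAs(const DerivedTypeInfo &x, const DerivedTypeInfo &y) {
  if (&x == &y) {
    return true;
  }
  return x.uninstantiated != nullptr && x.uninstantiated == y.uninstantiated;
}
```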
Valentin Clement (バレンタイン クレメン)
d79bb93278 [flang][cuda] Carry over the stream information to kernel launch (#136217)
In CUDA Fortran the stream is encoded in an INTEGER(cuda_stream_kind)
variable.

This information is carried over the GPU dialect through the
`cuf.stream_cast` and the token in the GPU ops.

When converting the `gpu.launch_func` to runtime call, the
`cuf.stream_cast` becomes a no-op and the reference to the stream is
passed to the runtime.

The runtime is adapted to take integer references instead of values for
the stream.
2025-04-18 10:44:18 -07:00
Slava Zakharin
273aecdb20 [flang-rt] Use runtime::memchr instead of std::memchr. (#135298) 2025-04-18 08:45:52 -07:00
Eugene Epshteyn
3428cc94c8 [flang] Implement external routine usage of hostnm() (#134900)
Previously, the `hostnm` extended intrinsic was implemented as a proper
intrinsic. Since then we found out that some applications use `hostnm`
as an external routine via `external hostnm`. This prevents `hostnm` from
being recognized as an intrinsic. This PR implements `hostnm` as an
external routine.
2025-04-15 19:04:59 -04:00
Peter Klausler
72144d119a [flang][runtime] Fix recently broken big-endian formatted integer input (#135417)
My recent change to speed up formatted integer input has a bug on
big-endian targets that has shown up on ppc64 AIX build bots. Fix.
2025-04-11 12:52:23 -07:00
Slava Zakharin
f4203ca2b7 [flang-rt] Declare DeviceTrap static inline. (#135286) 2025-04-10 17:38:04 -07:00
Valentin Clement (バレンタイン クレメン)
1d8966e246 [flang][cuda] Use the provided stream in kernel launch (#135267) 2025-04-10 17:15:23 -07:00
Valentin Clement (バレンタイン クレメン)
49f8ccd1eb [flang][cuda] Pass stream information to kernel launch functions (#135246) 2025-04-10 13:50:50 -07:00
Slava Zakharin
755016a3a8 [flang-rt] Fixed warnings and miscompilations in CUDA build. (#134470)
* DescribeIEEESignaledExceptions() is unused on the device - warning.
* StopStatementText() could return while marked noreturn - warning.
* Including cuda/std/complex only in the device compilation
  may cause nvcc to try to register variables in `cuda` namespace,
  while they are not defined in the host compilation - error.
  I decided to include cuda/std/complex always under RT_USE_LIBCUDACXX.
2025-04-10 11:27:03 -07:00
Peter Klausler
cd56666d7b [flang][runtime] Fix CUDA flang-rt build breakage (#135220)
I used "std::nullopt" instead of the correct "Fortran::common::nullopt"
in a recent patch, and you can get away with that only for CPU builds.
Fix.
2025-04-10 10:39:27 -07:00
Peter Klausler
18fe0124e7 [flang][runtime] Formatted input optimizations (#134715)
Make some minor tweaks (inlining, caching) to the formatted input path
to improve integer input in a SPEC code. (None of the I/O library has
been tuned yet for performance, and there are some easy optimizations
for common cases.) Input integer values are now calculated with native
C/C++ 128-bit integers.

A benchmark that only reads about 5M lines of three integer values each
speeds up from over 8 seconds to under 3 in my environment with these
changes.

If this works out, the code here can be used to optimize the formatted
input paths for real and character data, too.

Fixes https://github.com/llvm/llvm-project/issues/134026.
2025-04-10 09:56:46 -07:00
Valentin Clement (バレンタイン クレメン)
56b792322a [flang][cuda] Use the asyncId in device allocation (#135099)
Use `cudaMallocAsync` in the `CUFAllocDevice` allocator when asyncId is
provided.

More work is needed to be able to call `cudaFreeAsync` since the
allocated address and stream need to be tracked.
2025-04-09 17:34:48 -07:00
Peter Klausler
e0950ebb9c [flang][runtime] Tweak width-free I/G formatted I&O (#135047)
For Fujitsu test case 0561/0561_0168.f90, adjust both input and output
sides of the extension I (and G) edit descriptors with no width (as
distinct from I0/G0). On input, be sure to halt on a separator character
rather than complaining about an invalid character; on output, be sure
to emit a leading space.
2025-04-09 12:31:36 -07:00
Valentin Clement (バレンタイン クレメン)
f4d87c42a6 [flang][cuda] Add asyncId to allocate entry point (#134947) 2025-04-09 10:52:02 -07:00
Valentin Clement (バレンタイン クレメン)
5ebe22a35d [flang][cuda] Add async id to allocators (#134724)
Add async id to allocators in preparation for stream allocation.
2025-04-08 10:16:59 -07:00
Eugene Epshteyn
61af05fe82 [flang] Add runtime and lowering implementation for extended intrinsic PUTENV (#134412)
Implement extended intrinsic PUTENV, both function and subroutine forms.
Add PUTENV documentation to flang/docs/Intrinsics.md. Add functional and
semantic unit tests.
2025-04-04 16:26:08 -04:00
Peter Klausler
ade9d1f810 [flang][runtime] Remove bad runtime assertion (#134176)
The RUNTIME_CHECK in question doesn't allow for the possibility that an
allocatable or pointer component could be processed by defined I/O.
Remove it in favor of a dynamic allocation check.
2025-04-04 08:43:02 -07:00
Peter Klausler
262b3f7615 [flang] Remove runtime dependence on C++ support for types (#134164)
Fortran::runtime::Descriptor::BytesFor() only works for Fortran
intrinsic types for which a C++ type counterpart exists, so it crashes
on some types that are legitimate Fortran types like REAL(2). Move some
logic from Evaluate into a new header in flang/Common, then use it to
avoid this needless dependence on C++.
2025-04-04 08:42:38 -07:00
Peter Klausler
c8bde44cfc [flang] Implement FSEEK and FTELL (#133003)
Add function and subroutine forms of FSEEK and FTELL as intrinsic
procedures. Accept common aliases from legacy compilers as well.
    
A separate patch to llvm-test-suite will enable tests for these
procedures once this patch has merged.
    
Depends on https://github.com/llvm/llvm-project/pull/132423; CI builds
will likely fail until that patch is merged and this PR is rebased.
2025-04-04 08:40:51 -07:00