Commit Graph

85 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
01a18809ee Revert "[flang][cuda] Use a reference for asyncObject (#138010)" (#138082)
This reverts commit 9b0eaf71e6.
2025-04-30 22:03:26 -07:00
Valentin Clement (バレンタイン クレメン)
16f01b3777 [flang][cuda] Fix signatures after argument change (#138081) 2025-04-30 21:40:12 -07:00
Valentin Clement (バレンタイン クレメン)
ba3a46c1ea [flang][cuda] Fix type of kNoAsyncObject (#138029) 2025-04-30 14:59:02 -07:00
Valentin Clement (バレンタイン クレメン)
9b0eaf71e6 [flang][cuda] Use a reference for asyncObject (#138010)
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.
2025-04-30 14:02:29 -07:00
Slava Zakharin
a8607063f3 [flang-rt] Simplify INDEX with len-1 SUBSTRING. (#137889)
The len-1 case is noticeably slower than gfortran's straightforward
implementation
075611b646/libgfortran/intrinsics/string_intrinsics_inc.c (L253)
This change speeds up a simple microkernel by 37% on icelake.
2025-04-30 08:25:06 -07:00
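A minimal sketch of the idea behind a8607063f3, assuming a forward search (no `BACK=`) and a default-kind character: when the substring has length 1, the search reduces to a single `memchr` call. The helper name `IndexLen1` is hypothetical and is not the flang-rt entry point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not the flang-rt API): returns the 1-based position of
// the first occurrence of `want` in str[0..len), or 0 when it is absent,
// following the result convention of Fortran's INDEX.
std::size_t IndexLen1(const char *str, std::size_t len, char want) {
  if (len == 0) {
    return 0; // an empty string never contains the character
  }
  assert(str != nullptr);
  // memchr compares bytes as unsigned char, so convert explicitly.
  const void *hit = std::memchr(str, static_cast<unsigned char>(want), len);
  if (hit == nullptr) {
    return 0;
  }
  return static_cast<std::size_t>(static_cast<const char *>(hit) - str) + 1;
}
```

Calling `IndexLen1("hello", 5, 'l')` yields 3, the position Fortran would report for `INDEX("hello", "l")`.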
Michael Kruse
77581e2751 Reapply "[Flang] Remove FLANG_INCLUDE_RUNTIME (#124126)"
This reverts commit 27539c3f90. Retry
with new buildbot configuration after master restart.

Original message:

Remove the FLANG_INCLUDE_RUNTIME option which was replaced by
LLVM_ENABLE_RUNTIMES=flang-rt.

The FLANG_INCLUDE_RUNTIME option was added in #122336 which disables the
non-runtimes build instructions for the Flang runtime so they do not
conflict with the LLVM_ENABLE_RUNTIMES=flang-rt option added in #110217.
In order to not maintain multiple build instructions for the same thing,
this PR completely removes the old build instructions (effectively
forcing FLANG_INCLUDE_RUNTIME=OFF).

As per discussion in
https://discourse.llvm.org/t/buildbot-changes-with-llvm-enable-runtimes-flang-rt/83571/2
we now implicitly add LLVM_ENABLE_RUNTIMES=flang-rt whenever Flang is
compiled in a bootstrapping (non-standalone) build. Because it is
possible to build Flang-RT separately, this behavior can be disabled
using `-DFLANG_ENABLE_FLANG_RT=OFF`. Also see the discussion on
implicitly adding runtimes/projects in #123964.
2025-04-30 12:32:49 +02:00
Valentin Clement (バレンタイン クレメン)
565a075909 [flang][cuda][rt] Track asynchronous allocation stream for deallocation (#137073)
When an asynchronous allocation is made, we call `cudaMallocAsync` with
a stream. For deallocation, we need to call `cudaFreeAsync` with the
same stream. To achieve that, we need to track the allocations
and their respective streams.

This patch adds a simple sorted array of asynchronous allocations. A
binary search is performed to retrieve the allocation when deallocation
is needed.
2025-04-24 10:01:47 -07:00
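The tracking scheme described in 565a075909 (a sorted array plus a binary search) might look roughly like the sketch below. Every name here (`AsyncAllocTracker`, `Insert`, `Take`) is hypothetical, `Stream` is a plain pointer standing in for `cudaStream_t`, and `std::vector` is used for brevity even though the actual runtime may use a different container.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using Stream = void *; // assumption: stand-in for cudaStream_t

struct AsyncAlloc {
  void *ptr;     // address returned by the asynchronous allocation
  Stream stream; // stream the allocation was made on
};

// Hypothetical tracker: keeps entries sorted by address so that lookup at
// deallocation time is a binary search.
class AsyncAllocTracker {
public:
  void Insert(void *ptr, Stream stream) {
    assert(ptr != nullptr);
    allocs_.insert(LowerBound(ptr), AsyncAlloc{ptr, stream});
  }

  // If `ptr` is tracked, stores its stream in `stream`, forgets the entry,
  // and returns true; otherwise returns false and leaves `stream` untouched.
  bool Take(void *ptr, Stream &stream) {
    auto it = LowerBound(ptr);
    if (it == allocs_.end() || it->ptr != ptr) {
      return false;
    }
    stream = it->stream;
    allocs_.erase(it);
    return true;
  }

private:
  std::vector<AsyncAlloc>::iterator LowerBound(const void *ptr) {
    // std::less gives a total order even for unrelated pointers.
    return std::lower_bound(allocs_.begin(), allocs_.end(), ptr,
        [](const AsyncAlloc &a, const void *p) {
          return std::less<const void *>{}(a.ptr, p);
        });
  }

  std::vector<AsyncAlloc> allocs_;
};
```

A deallocation path would call `Take` first and use the asynchronous free with the returned stream only when it succeeds, falling back to the synchronous free otherwise.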
Joseph Huber
a5cdbef5f0 Revert "[LLVM] Replace use of LLVM_RUNTIMES_TARGET with LLVM_DEFAULT_TARGET_TRIPLE (#136208)"
This reverts commit 2e145f11c0.

Somehow causes some static assertions to fail?
2025-04-22 08:08:51 -05:00
Joseph Huber
2e145f11c0 [LLVM] Replace use of LLVM_RUNTIMES_TARGET with LLVM_DEFAULT_TARGET_TRIPLE (#136208)
Summary:
For purposes of determining the triple, it's more correct to use
`LLVM_DEFAULT_TARGET_TRIPLE`.
2025-04-22 07:59:54 -05:00
Peter Klausler
03b3620538 [flang] Tweak integer output under width-free I/G editing (#136316)
A recent patch fixed Fujitsu test case 0561_0168 by emitting a leading
space for "bare" (no width 'w') I and G output editing of integer
values. This fix has broken another Fujitsu test case (0561_0168), since
the leading space should not be produced at the first column of the
output record. Adjust.
2025-04-18 12:52:39 -07:00
Peter Klausler
32145a5e18 [flang][runtime] Better handling for integer input into null address (#135987)
The original descriptor-only path for I/O checks for null data addresses
and crashes with a readable message, but there's no such check on the
new fast path for formatted integer input, and so a READ into (say) a
deallocated allocatable will crash with a segfault. Put a null data
address check on the new fast path.
2025-04-18 12:51:18 -07:00
Peter Klausler
21a406c92c [flang] Improve runtime SAME_TYPE_AS() (#135670)
The present implementation of the intrinsic function SAME_TYPE_AS()
yields false positive .TRUE. results for distinct derived types that
happen to have the same name.

Replace with an implementation that can now depend on derived type
information records being the same type if and only if they are at the
same location, or are PDT instantiations of the same uninstantiated
derived type. And ensure that the derived type information includes
references from instantiated PDTs to their original types. (The derived
type information format supports these references already, but they were
not being set, perhaps because the current faulty SAME_TYPE_AS
implementation didn't need them, and nothing else does.)

Fixes https://github.com/llvm/llvm-project/issues/135580.
2025-04-18 12:48:33 -07:00
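The comparison rule described in 21a406c92c can be illustrated with a simplified sketch. The `TypeInfo` struct and `SameType` function below are hypothetical stand-ins for the runtime's derived type information records; only the rule itself (same record address, or PDT instantiations sharing one uninstantiated type) comes from the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified type information record.
struct TypeInfo {
  const char *name;               // kept only to show that names are ignored
  const TypeInfo *uninstantiated; // original type for a PDT instantiation, else null
};

// Two records denote the same type if and only if they are the same object,
// or both are instantiations of the same uninstantiated PDT.
bool SameType(const TypeInfo *a, const TypeInfo *b) {
  assert(a != nullptr && b != nullptr);
  if (a == b) {
    return true;
  }
  return a->uninstantiated != nullptr &&
      a->uninstantiated == b->uninstantiated;
}
```

Under this rule, two distinct records that merely share a name compare unequal, which is the false positive the commit sets out to remove.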
Valentin Clement (バレンタイン クレメン)
d79bb93278 [flang][cuda] Carry over the stream information to kernel launch (#136217)
In CUDA Fortran the stream is encoded in an INTEGER(cuda_stream_kind)
variable.

This information is carried over the GPU dialect through the
`cuf.stream_cast` and the token in the GPU ops.

When converting the `gpu.launch_func` to runtime call, the
`cuf.stream_cast` becomes a no-op and the reference to the stream is
passed to the runtime.

The runtime is adapted to take integer references instead of values for
the stream.
2025-04-18 10:44:18 -07:00
Slava Zakharin
273aecdb20 [flang-rt] Use runtime::memchr instead of std::memchr. (#135298) 2025-04-18 08:45:52 -07:00
Eugene Epshteyn
3428cc94c8 [flang] Implement external routine usage of hostnm() (#134900)
Previously, `hostnm` extended intrinsic was implemented as proper
intrinsic. Since then we found out that some applications use `hostnm`
as external routine via `external hostnm`. This prevents `hostnm` from
being recognized as an intrinsic. This PR implements `hostnm` as
external routine.
2025-04-15 19:04:59 -04:00
Peter Klausler
72144d119a [flang][runtime] Fix recently broken big-endian formatted integer input (#135417)
My recent change to speed up formatted integer input has a bug on
big-endian targets that has shown up on ppc64 AIX build bots. Fix.
2025-04-11 12:52:23 -07:00
Slava Zakharin
f4203ca2b7 [flang-rt] Declare DeviceTrap static inline. (#135286) 2025-04-10 17:38:04 -07:00
Valentin Clement (バレンタイン クレメン)
1d8966e246 [flang][cuda] Use the provided stream in kernel launch (#135267) 2025-04-10 17:15:23 -07:00
Valentin Clement (バレンタイン クレメン)
49f8ccd1eb [flang][cuda] Pass stream information to kernel launch functions (#135246) 2025-04-10 13:50:50 -07:00
Slava Zakharin
755016a3a8 [flang-rt] Fixed warnings and miscompilations in CUDA build. (#134470)
* DescribeIEEESignaledExceptions() is unused on the device - warning.
* StopStatementText() could return while marked noreturn - warning.
* Including cuda/std/complex only in the device compilation
  may cause nvcc to try to register variables in `cuda` namespace,
  while they are not defined in the host compilation - error.
  I decided to include cuda/std/complex always under RT_USE_LIBCUDACXX.
2025-04-10 11:27:03 -07:00
Peter Klausler
cd56666d7b [flang][runtime] Fix CUDA flang-rt build breakage (#135220)
I used "std::nullopt" instead of the correct "Fortran::common::nullopt"
in a recent patch, and you can get away with that only for CPU builds.
Fix.
2025-04-10 10:39:27 -07:00
Peter Klausler
18fe0124e7 [flang][runtime] Formatted input optimizations (#134715)
Make some minor tweaks (inlining, caching) to the formatted input path
to improve integer input in a SPEC code. (None of the I/O library has
been tuned yet for performance, and there are some easy optimizations
for common cases.) Input integer values are now calculated with native
C/C++ 128-bit integers.

A benchmark that only reads about 5M lines of three integer values each
speeds up from over 8 seconds to under 3 in my environment with these
changes.

If this works out, the code here can be used to optimize the formatted
input paths for real and character data, too.

Fixes https://github.com/llvm/llvm-project/issues/134026.
2025-04-10 09:56:46 -07:00
Valentin Clement (バレンタイン クレメン)
56b792322a [flang][cuda] Use the asyncId in device allocation (#135099)
Use `cudaMallocAsync` in the `CUFAllocDevice` allocator when asyncId is
provided.

More work is needed to be able to call `cudaFreeAsync` since the
allocated address and stream need to be tracked.
2025-04-09 17:34:48 -07:00
Peter Klausler
e0950ebb9c [flang][runtime] Tweak width-free I/G formatted I&O (#135047)
For Fujitsu test case 0561/0561_0168.f90, adjust both input and output
sides of the extension I (and G) edit descriptors with no width (as
distinct from I0/G0). On input, be sure to halt on a separator character
rather than complaining about an invalid character; on output, be sure
to emit a leading space.
2025-04-09 12:31:36 -07:00
Valentin Clement (バレンタイン クレメン)
f4d87c42a6 [flang][cuda] Add asyncId to allocate entry point (#134947) 2025-04-09 10:52:02 -07:00
Valentin Clement (バレンタイン クレメン)
5ebe22a35d [flang][cuda] Add async id to allocators (#134724)
Add async id to allocators in preparation for stream allocation.
2025-04-08 10:16:59 -07:00
Eugene Epshteyn
61af05fe82 [flang] Add runtime and lowering implementation for extended intrinsic PUTENV (#134412)
Implement extended intrinsic PUTENV, both function and subroutine forms.
Add PUTENV documentation to flang/docs/Intrinsics.md. Add functional and
semantic unit tests.
2025-04-04 16:26:08 -04:00
Peter Klausler
ade9d1f810 [flang][runtime] Remove bad runtime assertion (#134176)
The RUNTIME_CHECK in question doesn't allow for the possibility that an
allocatable or pointer component could be processed by defined I/O.
Remove it in favor of a dynamic allocation check.
2025-04-04 08:43:02 -07:00
Peter Klausler
262b3f7615 [flang] Remove runtime dependence on C++ support for types (#134164)
Fortran::runtime::Descriptor::BytesFor() only works for Fortran
intrinsic types for which a C++ type counterpart exists, so it crashes
on some types that are legitimate Fortran types like REAL(2). Move some
logic from Evaluate into a new header in flang/Common, then use it to
avoid this needless dependence on C++.
2025-04-04 08:42:38 -07:00
Peter Klausler
c8bde44cfc [flang] Implement FSEEK and FTELL (#133003)
Add function and subroutine forms of FSEEK and FTELL as intrinsic
procedures. Accept common aliases from legacy compilers as well.
    
A separate patch to llvm-test-suite will enable tests for these
procedures once this patch has merged.
    
Depends on https://github.com/llvm/llvm-project/pull/132423; CI builds
will likely fail until that patch is merged and this PR is rebased.
2025-04-04 08:40:51 -07:00
Andre Kuhlenschmidt
b11eece1bb [flang][intrinsics] Implement the time intrinsic (#133823)
This PR implements the nonstandard intrinsic time.

In addition to running the unit tests, I also double checked that the
example code works by manually compiling and running it.
2025-04-03 15:33:40 -07:00
Andre Kuhlenschmidt
85fdab33b0 [flang][intrinsic] add nonstandard intrinsic unlink (#134162)
This PR adds the intrinsic `unlink` to flang. 

## Test plan
- Added two codegen unit tests and ensured flang-check continues to
pass.
- Manually compiled and ran the example from the documentation.
2025-04-03 14:33:53 -07:00
Daniel Chen
2080334574 [flang-rt] Pass the whole path of libflang_rt.runtime.a to linker on AIX and LoP (#131041)
This PR improves the driver code that builds the `flang-rt` path by
reusing the logic and code of `compiler-rt`.

1. Moved `addFortranRuntimeLibraryPath` and `addFortranRuntimeLibs` to
`ToolChain.h` and made them virtual so that they can be overridden if
customization is needed. The current implementation of those two
procedures is moved to `ToolChain.cpp` as the base implementation to
default to.

2. Both AIX and PPCLinux now override `addFortranRuntimeLibs`. 
The overriding function of `addFortranRuntimeLibs` for both AIX and
PPCLinux calls `getCompilerRTArgString` => `getCompilerRT` =>
`buildCompilerRTBasename` to get the path to `flang-rt`. This code
handles `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR` setting. As shown in
`PPCLinux.cpp`, `FT_static` is the default. If not found, it will search
and build for `FT_shared`. To differentiate `flang-rt` from `clang-rt`,
a boolean flag `IsFortran` is passed to the chain of functions in order
to reach `buildCompilerRTBasename`.
2025-04-03 11:21:19 -04:00
Valentin Clement (バレンタイン クレメン)
bb179c483a [flang][rt] Allow ReportFatalUserError to be built on device (#133979) 2025-04-01 13:50:42 -07:00
Valentin Clement (バレンタイン クレメン)
afa32d3e0e [flang][cuda] Fix char argument
This would fail with `error: argument of type "char" is incompatible with parameter of type "const char *"`
2025-04-01 11:00:50 -07:00
Valentin Clement (バレンタイン クレメン)
01889de8e9 [flang][device] Enable Stop functions on device build (#133803)
Update `StopStatement` and `StopStatementText` to be built for the
device.
2025-04-01 10:06:45 -07:00
Slava Zakharin
1ab3a4f234 [flang-rt][NFC] Work around CTK12.8 compilation failure. (#133833)
It happened in https://lab.llvm.org/buildbot/#/builders/152/builds/1131
when the buildbot was switched from CTK12.3 to CTK12.8.
The logs are gone by now, so the above link is useless.

The error was:
error: ‘auto’ not permitted in template argument

This workaround helps, but I also reported the issue to NVCC devs.
2025-04-01 08:04:45 -07:00
Jean-Didier PAILLEUX
513a91a5f1 [flang/flang-rt] Implement PERROR intrinsic from GNU Extension (#132406)
Add the implementation of the `PERROR(STRING)` intrinsic from the GNU
extension, which prints to stderr a newline-terminated error message
corresponding to the last system error, prefixed by `STRING`.
(https://gcc.gnu.org/onlinedocs/gfortran/PERROR.html)
2025-04-01 15:47:54 +02:00
Valentin Clement (バレンタイン クレメン)
0b31f08537 [flang][cuda] Add support for NV_CUDAFOR_DEVICE_IS_MANAGED (#133778)
Add support for the environment variable `NV_CUDAFOR_DEVICE_IS_MANAGED`
as described in the documentation:
https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/index.html#controlling-device-data-is-managed.

This mainly switches device allocation to managed allocation.
2025-03-31 13:17:21 -07:00
Peter Klausler
4ea5aa09de [flang][NFC] Restore I/O runtime API header name (#132423)
flang/include/flang/Runtime/io-api.h was changed into io-api-consts.h,
then wrapped into a new io-api.h that includes io-api-consts.h, does
some redundant includes and declarations, and then declares the
prototype of one function, InquiryKeywordHashDecode.

Make that function static in io-stmt.cpp prior to its sole call site,
then undo the renaming, to reduce confusion and redundancy.
2025-03-26 12:09:16 -07:00
Michael Kruse
27539c3f90 Revert "[Flang] Remove FLANG_INCLUDE_RUNTIME (#124126)"
The production buildbot master apparently has not yet been restarted
since https://github.com/llvm/llvm-zorg/pull/393 landed.

This reverts commit 96d1baedef.
2025-03-26 19:02:13 +01:00
Michael Kruse
96d1baedef [Flang] Remove FLANG_INCLUDE_RUNTIME (#124126)
Remove the FLANG_INCLUDE_RUNTIME option which was replaced by
LLVM_ENABLE_RUNTIMES=flang-rt.

The FLANG_INCLUDE_RUNTIME option was added in #122336 which disables the
non-runtimes build instructions for the Flang runtime so they do not
conflict with the LLVM_ENABLE_RUNTIMES=flang-rt option added in #110217.
In order to not maintain multiple build instructions for the same thing,
this PR completely removes the old build instructions (effectively
forcing FLANG_INCLUDE_RUNTIME=OFF).

As per discussion in
https://discourse.llvm.org/t/buildbot-changes-with-llvm-enable-runtimes-flang-rt/83571/2
we now implicitly add LLVM_ENABLE_RUNTIMES=flang-rt whenever Flang is
compiled in a bootstrapping (non-standalone) build. Because it is
possible to build Flang-RT separately, this behavior can be disabled
using `-DFLANG_ENABLE_FLANG_RT=OFF`. Also see the discussion on
implicitly adding runtimes/projects in #123964.
2025-03-26 18:50:41 +01:00
Slava Zakharin
613a077b05 [flang] Generate quadmath_wrapper.h for Flang Evaluate. (#132817)
When building Flang with Clang, we need to do the same quadmath.h
wrapping as we do for flang-rt. I extracted the CMake code
into FlangCommon.cmake, and cleaned up the arguments passing
to execute_process (note that `-###` was treated as `-` in the original
code, because `#` starts a comment). I believe the Clang command
does not require the input source file, so I removed it as well.
2025-03-25 12:08:38 -07:00
Eugene Epshteyn
2c8e26081f [flang] Add HOSTNM runtime and lowering intrinsics implementation (#131910)
Implement GNU extension intrinsic HOSTNM, both function and subroutine
forms. Add HOSTNM documentation to `flang/docs/Intrinsics.md`. Add
lowering and semantic unit tests.

(This change is modeled after GETCWD implementation.)
2025-03-25 13:17:17 -04:00
vdonaldson
92e0560347 [flang] ieee_denorm (#132307)
Add support for the nonstandard ieee_denorm exception for real kinds 3,
4, 8 on x86 processors.
2025-03-25 13:02:43 -04:00
Joseph Huber
60eb89f9fa [flang-rt] Fix typo using static instead of shared
Summary:
I copied this from the static usage, replaced the shared on the
dependency but not on the target.
2025-03-25 10:24:32 -05:00
Michael Kruse
ea68d830d9 [flang-rt][NFC] Fix indention 2025-03-24 15:15:43 +01:00
Joseph Huber
85974a0537 [flang-rt] Add experimental support for GPU build (#131826)
Summary:
This patch adds initial support for compiling `flang-rt` directly for
the GPU. The method used here matches what's already done for `libc` and
`libc++` for the GPU and builds off of those projects.

Mainly this requires setting up some flags and setting the sources that
currently work. This will deposit the resulting library in the
appropriate directory. These files are then intended to be linked via
`-Xoffload-linker` support in the offloading driver.
```
lib/clang/21/lib/nvptx64-nvidia-cuda/libflang_rt.runtime.a
lib/clang/21/lib/amdgcn-amd-amdhsa/libflang_rt.runtime.a
```

This is obviously missing a lot of functions, mainly the `io` support.
Most of what we cannot support is due to using POSIX things that just
don't make sense on the GPU. Stuff like `pthreads` or `sema`.

Getting unit tests to run on this will also be a challenge. We could run
tests the same way we do with `libc`, but the problem there is that the
`libc` test suite is freestanding while `gtest` currently doesn't
compile on the GPU because it uses a lot of weird stuff. If the unit
tests were simply `int main` then it would work.

I don't understand the actual runtime code very well, I'd appreciate
some guidance on how to actually support Fortran IO from this interface.
As I understand it, Fortran IO requires a stack-like operation, which
conflicts with the SIMT model GPUs use. Worst case scenario we could
burn some LDS to keep a stack, or serialize it somehow since we can
always just iterate over all the active lanes.

Building this right now looks like this, which depends on the arguments
added in https://github.com/llvm/llvm-project/pull/131695.
```
    -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx;libcxxabi;flang-rt \
    -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=compiler-rt;libc;libcxx;libcxxabi;flang-rt \
    -DRUNTIMES_nvptx64-nvidia-cuda_FLANG_RT_LIBC_PROVIDER=llvm \
    -DRUNTIMES_nvptx64-nvidia-cuda_FLANG_RT_LIBCXX_PROVIDER=llvm \
    -DRUNTIMES_amdgcn-amd-amdhsa_FLANG_RT_LIBC_PROVIDER=llvm \
    -DRUNTIMES_amdgcn-amd-amdhsa_FLANG_RT_LIBCXX_PROVIDER=llvm
```
2025-03-24 08:31:42 -05:00
Joseph Huber
038cdd236f [flang-rt] Add support for using LLVM in-tree libc/libc++ (#131695)
Summary:
This patch adds an interface that uses an in-tree build of LLVM's libc
and libc++.

This is done using the `-DFLANG_RT_LIBC_PROVIDER=llvm` and
`-DFLANG_RT_LIBCXX_PROVIDER=llvm` options. Using `libc` works in terms
of CMake, but the LLVM libc is not yet complete enough to compile all
the files.
2025-03-24 06:05:24 -05:00
Valentin Clement (バレンタイン クレメン)
ecaef010f3 [flang][cuda] Support corner case of data transfer (#132451)
The flang runtime will complain when the number of elements in the two
descriptors involved in the data transfer does not match.

In some cases, we can still perform the data transfer to match the
behavior of the reference compiler.

When the RHS elements count is bigger than the LHS elements count and
both descriptors are contiguous, we can perform the data transfer with
the bare pointers and the number of bytes from the LHS.

We don't really have unit tests set up for data transfer, which is why I
didn't include one here.
2025-03-21 15:39:05 -07:00
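The corner case handled in ecaef010f3 can be sketched on the host as follows. `SimpleDesc` and `TransferTruncating` are hypothetical names, the descriptor is reduced to four fields, `std::memcpy` stands in for the CUDA copy, and the check that both sides have the same element size is an added assumption not stated in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, heavily simplified descriptor.
struct SimpleDesc {
  void *base;            // bare data pointer
  std::size_t elements;  // number of elements
  std::size_t elemBytes; // size of one element in bytes
  bool contiguous;       // true when the data is contiguous in memory
};

// Handles only the corner case: the RHS has more elements than the LHS and
// both sides are contiguous. Copies exactly the LHS byte count and returns
// true; returns false when the corner case does not apply.
bool TransferTruncating(const SimpleDesc &lhs, const SimpleDesc &rhs) {
  assert(lhs.base != nullptr && rhs.base != nullptr);
  if (rhs.elements <= lhs.elements || !lhs.contiguous || !rhs.contiguous) {
    return false;
  }
  if (lhs.elemBytes != rhs.elemBytes) {
    return false; // assumption: element sizes must agree
  }
  std::memcpy(lhs.base, rhs.base, lhs.elements * lhs.elemBytes);
  return true;
}
```

Sizing the copy by the LHS keeps the transfer within the destination's storage even though the source holds more elements.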