Commit Graph

70 Commits

Author SHA1 Message Date
Joseph Huber
a4f553fcde [libc] Fix using the libcgpu.a for NVPTX in non-LTO builds
CUDA requires a PTX feature to be compiled generally, because the
`libcgpu.a` archive contains LLVM-IR we need to have one present to
compile it. Currently, the wrapper fatbinary format we use to
incorporate these into single-source offloading languages has a special
option to provide this. Since this was not present in the builds, if the
user did not specify it via `-foffload-lto` it would not compile from
CUDA or OpenMP due to the missing PTX features. Fix this by passing it
to the packager invocation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D154864
2023-07-10 13:54:47 -05:00
Petr Hosek
36c15be20b [libc] Use LIBC_INCLUDE_DIR in CMake rules
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.

Differential Revision: https://reviews.llvm.org/D154278
2023-07-10 07:32:24 +00:00
Petr Hosek
bf171aaa7a Revert "[libc] Use LIBC_INCLUDE_DIR in CMake rules"
This reverts commit 6e821f0b3a since
it broke the libc-aarch64-ubuntu-fullbuild-dbg bot.
2023-07-07 20:52:54 +00:00
Petr Hosek
6e821f0b3a [libc] Use LIBC_INCLUDE_DIR in CMake rules
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.

Differential Revision: https://reviews.llvm.org/D154278
2023-07-07 20:42:25 +00:00
Petr Hosek
e1cb5924cb Revert "[libc] Use LIBC_INCLUDE_DIR in CMake rules"
This reverts commit 046deabd93 since
it broke libc-aarch64-ubuntu-fullbuild-dbg.
2023-07-05 17:20:11 +00:00
Petr Hosek
046deabd93 [libc] Use LIBC_INCLUDE_DIR in CMake rules
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.

Differential Revision: https://reviews.llvm.org/D154278
2023-07-05 17:16:19 +00:00
Tue Ly
f9753ef189 [libc][Obvious] Fix a typo in setting FMA control option for RISCV64. 2023-06-02 11:15:29 -04:00
Joseph Huber
e7735a57b9 [libc] Correctly pass 'CXX_STANDARD' to the packaged GPU build
We need to perform the GPU build separately. The `CXX_STANDARD` option
was not being passed properly.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D149225
2023-04-26 16:52:31 -05:00
Joseph Huber
1c968f7a1f [libc] Ignore unknown CUDA versions for libc targeting NVPTX
Summary:
We generally need a CUDA toolchain to build the tests for the GPU `libc`
targeting NVPTX. However, clang will commonly emit warnings on versions
that are too new. We can ignore these in `libc` since we are manually
specifying the `+ptx` version to use whenever we compile. So we do not
need to worry about unexpected changes and we do not depend on any newer
features. So this should not be problematic.
2023-04-21 13:27:45 -05:00
Joseph Huber
a73cd00d87 [libc] Bump up sm_60's CUDA feature to +ptx63
Summary:
The sm_60 GPU is the oldest model that's supported for using the RPC
features of the `libc` GPU runtime. This also requires at least `+ptx63`
to enable use of the active mask. So, this patch sets that as the
minimum.
2023-04-21 13:27:45 -05:00
Joseph Huber
8704c3a31f [libc] Set minimum CUDA PTX feature to +ptx60
Summary:
The `+ptx` features correspond to the related CUDA version. We require a
certain set of features from the `ptxas` assembler, which is tied to the
CUDA version. Some of the ones set here were insufficient, so I am
simply setting a cutoff to the CUDA 9.0 release as the minimum. This
roughly corresponds to what should be required for sm_60 to be compiled
with the source.
2023-04-20 18:01:01 -05:00
Joseph Huber
24214832fd [libc] Fix nvptx_options variable not being reset in CMake
Summary:
This variable was not being reset, which caused the options to be
compounded when building multiple architectures. This was very
problematic as the architectures are not compatible.
2023-04-19 15:28:26 -05:00
Joseph Huber
e2356fb07e [libc] Add special handling for CUDA PTX features
The NVIDIA compilation path requires some special options. This is
mostly because compilation is dependent on having a valid CUDA
toolchain. We don't actually need the CUDA toolchain to create the
exported `libcgpu.a` library because it's pure LLVM-IR. However, for
some language features we need the PTX version to be set. This is
normally set by checking the CUDA version, but without one installed it
will fail to build. We instead choose a minimum set of features on the
desired target, inferred from
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes
and the PTX refernece for functions like `nanosleep`.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D148532
2023-04-17 11:51:34 -05:00
Tue Ly
6edad0c8f0 [libc][RISCV] Let RISCV64 targets test implementations with and without FMA.
Let RISCV64 targets math implementations with and without FMA
automatically.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D146730
2023-04-06 09:23:48 -04:00
Joseph Huber
9f5c6dcf59 [libc] Search for the CUDA patch explicitly when testing
The packaged version of the `libc` library does not depend on the CUDA
installation because it only uses `clang` and emits LLVM-IR. However,
for testing we directly need the CUDA toolkit to emit and execute the
files. This patch explicitly passes `--cuda-path` to the relevant
compilations for NVPTX testing.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D147653
2023-04-05 15:14:47 -05:00
Joseph Huber
2dc60b4ea4 [libc] Use LTO for AMDGPU compilation and linking
Summary:
The AMDGPU ABI isn't stable or well defined. For that reson we prefer to
rely on LTO to ensure that multiple files get linked correctly.
Currently the internal targets used for testing mix LLVM-IR and
assembly. We should be consistent here.
2023-03-29 14:20:44 -05:00
Joseph Huber
dab75a4378 [libc] Remove leftover debug prints 2023-03-14 15:14:12 -05:00
Joseph Huber
ab107b3fac [libc] Fix CMake deduplication -Xclang arguments
Summary:
We use `-Xclang` to pass the GPU binary to be embedded. In the case of
multi-source objects this will be passed more than once, but CMake
implicitly deduplicates arguments. Use the special generator to prevent
this from happening.
2023-03-14 15:04:37 -05:00
Joseph Huber
597cef4486 [libc] Fix GPU fatbinary dependencies for multi-source object libraries
Summary:
Multi-source object libraries require some additional handling, this
logic wasn't correctly settending the dependency on each filename
individually and was instead using the last one. This meant that only
the last file was built for multi-object libraries.
2023-03-14 15:04:37 -05:00
Joseph Huber
c2a17bff24 [libc] Set the stub filename to the target name instead of the source
The GPU target requires some weird special case handling to create fat
binaries. CMake offers no way to set the name of an object library. The
only way to do this is to create a file with the desired name and use
that. Currently we name it after the source filename. However, this
breaks if there is more than a single source. This patch changes the
logic to instead look up the object target name and use that. E.g.
`src.__support.OSUtil.osutil` will be `osutil.cpp`.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D145912
2023-03-14 10:31:04 -05:00
Joseph Huber
a031f72187 [libc] Correctly pass the compile options to the internal GPU compilation
Summary:
We use an internal option to create the GPU binary used for testing.
This wasn't getting the proper flags passed to it due to a missing
variable name.
2023-03-14 08:19:13 -05:00
Joseph Huber
8a712bf7c4 [libc] Fix common compile options not getting passed to GPU object
Summary:
This variable was named incorrectly. We weren't getting needed flags
passed to object library builds.
2023-03-10 16:54:20 -06:00
Siva Chandra Reddy
772e37f893 [libc] Add ALIAS option to add_object_library rule.
This ALIAS option is now used with threads/callonce target.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D145409
2023-03-06 22:11:33 +00:00
Siva Chandra Reddy
bfeef8b794 [libc] Add a linting target named "libc-lint".
Lint targets for individual entrypoints have also been cleaned up. The
target "libc-lint" depends on the individual lint targets.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D144705
2023-02-24 19:55:43 +00:00
Joseph Huber
98697f4764 [libc] Fix LIBC_GPU_ARCHITECTURES not being used
Summary:
This variable is supposed to control the architectures to build for. At
some point this was changes out for testing and never fixed.
2023-02-21 13:00:16 -06:00
Joseph Huber
eb71ecfa10 [libc] Fix GPU include directories not being set properly
Summary:
For some reason, this variable was set after where it was used. Causing
weird behaviour with including the standard headers. Fix it.
2023-02-20 15:43:16 -06:00
Joseph Huber
4a872412d8 [libc] Fix dependencies for generating the GPU binary file
This patch adjusts the way dependencies are handled in the packaed
version of the GPU libc runtime. Previously the files were not getting
updated properly in the install when they changed.

Depends on D144214

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D144280
2023-02-20 09:37:19 -06:00
Joseph Huber
d51d2b5909 [libc] Support add_object_library for the GPU build
This patch unifies the handling of generating the GPU build targets
between the `add_entrypoint_library` and the `add_object_library`
functions. The `_build_gpu_objects` function will create two targets.
One contains a single object file with several GPU binaries embedded in
it, a so-called fatbinary. The other is a direct compile of the
supported target to be used internally only. This patch pulls out some
of the properties logic so that we can handle both more easily. This
patch also required adding an ovverride  `NO_GPU_BUILD` for cases when
we only want to build the source file as normal.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D144214
2023-02-20 09:37:18 -06:00
Guillaume Chatelet
c3228714cc [libc][NFC] Make tuning macros start with LIBC_COPT_
Rename preprocessor definitions that control tuning of llvm libc.

Differential Revision: https://reviews.llvm.org/D143913
2023-02-15 10:00:16 +00:00
Joseph Huber
5fde2d9951 [libc] Write stub files to a new directory to avoid conflicts
Summary:
This hack with stub files is used to make the final object archive
have human-understandable names. We currently output these into the
current binary directory, which sometimes interferes with the actual
source file. Put these in their own directory to be certain they don't
conflict.
2023-02-13 16:39:59 -06:00
Tue Ly
c1e252417e [libc] Add -mavx2 together with -mfma to allow clang pre-12 to generate fma
instructions.

For clang-11, having -mfma without -mavx2 does not generate fma
instructions, causing a build bot to fail on log10_test.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D143234
2023-02-03 15:12:09 -05:00
Joseph Huber
6d0e137358 [libc] Remove OpenMP and build the GPU libc directly
The current `libcgpu.a` is actually an archive of fatbinaries. The host
file contains nothing but a section called `LLVM_OFFLOADING` that
contains embedded device code. This used to be handled implicitly by
borrowing the OpenMP toolchain, which did this packaging internally.
Passing the OpenMP flags causes problems with trying to move to testing.
This patch pulls this logic out into the CMake and handles it manually.

This patch is a lot of noise, but it fundamentally comes down to the
following changes.
1. Build the source for every GPU architecture (GPU architectures are
   generally not backwards compatible)
2. Combine all of these files into a single binary blob
3. Embed that binary blob into a host file
4. Package these host files into a `.a` archive.
5. The device code will be extracted and managed by the offloading
   linker.

Another important point. Right now we are maintaining an important
distinction with the GPU build. That is, when we build the exported
library we will build for many GPU architectures. However, the internal
version will only be built for a single GPU architecture, one that was
found on the user's system. This is intended to be used for internal
testing, very similar to the current path where `libc` is compiled for a
single target triple.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D143089
2023-02-02 09:47:03 -06:00
Siva Chandra Reddy
23872aae12 [libc] Add an off-by-default option to silence "skipping" messages from CMake.
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D142802
2023-01-30 16:39:41 +00:00
Guillaume Chatelet
4e9ac30816 [reland][libc][NFC] Add -fno-lax-vector-conversions compilation flag
Now that a3d2c344773cc4fc95136fd67245880b34d8e335 has been submitted.
2022-12-27 10:32:41 +00:00
Guillaume Chatelet
d065472c9e Revert "[libc][NFC] Add -fno-lax-vector-conversions compilation flag"
This breaks aarch64 build.

This reverts commit 32f4c3f103.
2022-12-27 08:30:19 +00:00
Guillaume Chatelet
32f4c3f103 [libc][NFC] Add -fno-lax-vector-conversions compilation flag 2022-12-27 08:25:32 +00:00
Guillaume Chatelet
6d9d387f73 Use -Wstrict-prototypes with clang only 2022-12-18 15:54:21 +01:00
Joseph Huber
55151e138d [libc] Add initial support for a libc implementation for the GPU
This patch contains the initial support for building LLVM's libc as a
target for the GPU. Currently this only supports a handful of very basic
functions that can be implemented without an operating system. The GPU
code is build using the existing OpenMP toolchain. This allows us to
minimally change the existing codebase and get a functioning static
library. This patch allows users to create a static library called
`libcgpu.a` that contains fat binaries containing device IR.

Current limitations are the lack of test support and the fact that only
one target OS can be built at a time. That is, the user cannot get a
`libc` for Linux and one for the GPU simultaneously.

This introduces two new CMake variables to control the behavior
`LLVM_LIBC_TARET_OS` is exported so the user can now specify it to equal
`"gpu"`. `LLVM_LIBC_GPU_ARCHITECTURES` is also used to configure how
many targets to build for at once.

Depends on D138607

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D138608
2022-11-29 14:51:54 -06:00
Siva Chandra Reddy
2e4ef9b6ef [libc][NFC] Add a few compiler warning flags.
A bunch of cleanup to supress the new warnings is also done.

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D130723
2022-08-04 23:46:38 +00:00
Tue Ly
d883a4ad02 [libc] Implement sinf function that is correctly rounded to all rounding modes.
Implement sinf function that is correctly rounded to all rounding modes.

- We use a simple range reduction for `pi/16 < |x|` :
    Let `k = round(x / pi)` and `y = (x/pi) - k`.
    So `k` is an integer and `-0.5 <= y <= 0.5`.
Then
```
sin(x) = sin(y*pi + k*pi)
          = (-1)^(k & 1) * sin(y*pi)
          ~ (-1)^(k & 1) * y * P(y^2)
```
    where `y*P(y^2)` is a degree-15 minimax polynomial generated by Sollya with:
```
> P = fpminimax(sin(x*pi)/x, [|0, 2, 4, 6, 8, 10, 12, 14|], [|D...|], [0, 0.5]);
```

- Performance benchmark using perf tool from CORE-MATH project
(https://gitlab.inria.fr/core-math/core-math/-/tree/master) on Ryzen 1700:
Before this patch (not correctly rounded):
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf
CORE-MATH reciprocal throughput   : 17.892
System LIBC reciprocal throughput : 25.559
LIBC reciprocal throughput        : 29.381
```
After this patch (correctly rounded):
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinf
CORE-MATH reciprocal throughput   : 17.896
System LIBC reciprocal throughput : 25.740

LIBC reciprocal throughput        : 27.872
LIBC reciprocal throughput        : 20.012     (with `-msse4.2` flag)
LIBC reciprocal throughput        : 14.244     (with `-mfma` flag)
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D123154
2022-07-22 10:07:31 -04:00
Tue Ly
ed261e7106 [libc] Add float type and flag for nearest_integer to enable SSE4.2.
Add float type and flag for nearest integer to automatically test with
and without SSE4.2 flag.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D129916
2022-07-22 09:29:41 -04:00
Tue Ly
ae5c82502e [libc][Obvious] Do not add __NO_ to targets with FLAG__NO suffix. 2022-06-30 10:45:59 -04:00
Guillaume Chatelet
aeccc16497 Re-land [libc] Apply no-builtin everywhere, remove unnecessary flags
This is a reland of D126773 / b2a9ea4420.

The removal of `-mllvm -combiner-global-alias-analysis` has landed separately
in D128051 / 7b73f53790.

And the removal of `-mllvm --tail-merge-threshold=0` is scheduled for
removal in a subsequent patch.
2022-06-22 12:30:20 +00:00
Guillaume Chatelet
4a6929f811 Revert "[libc] Apply no-builtin everywhere, remove unnecessary flags"
This reverts commit b2a9ea4420.
2022-06-16 09:28:17 +00:00
Tue Ly
667863d8a8 [libc] Fix cmake compatibility issue with list(POP_FRONT).
list(POP_FRONT) is only added to cmake in 3.15, while our base line
version is 3.13

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D127129
2022-06-06 13:36:03 -04:00
Tue Ly
614567a7bf [libc] Automatically add -mfma flag for architectures supporting FMA.
Detect if the architecture supports FMA instructions and if
the targets depend on fma.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D123615
2022-06-03 01:21:20 -04:00
Guillaume Chatelet
b2a9ea4420 [libc] Apply no-builtin everywhere, remove unnecessary flags
Note, this is a re-submission of D125894 with `features = ["-header_modules"]`
added to the main BUILD.bazel file.

Some functions like `stpncpy` are implemented in terms of `memset` but are not
currently using `-fno-builtin-memset`. This is somewhat hidden by the fact that
we use `-ffreestanding` globally and that `-ffreestanding` implies
`-fno-builtin` for Clang.

This patch also removes `-mllvm -combiner-global-alias-analysis` that is Clang
specific and that does not bring substantial gains on modern processors.

Also we keep `-mllvm --tail-merge-threshold=0` for aarch64 in CMakeLists.txt
but we omit it in the Bazel config. This is because Bazel consumes the source
files directly and so it can use PGO to take optimal decisions locally.

Differential Revision: https://reviews.llvm.org/D126773
2022-06-01 13:34:36 +00:00
Tue Ly
800051487f [libc] Implement FLAGS option for generating all combinations for targets.
Add FLAGS option for add_header_library, add_object_library,
add_entrypoint_object, and add_libc_unittest.

In general, a flag is a string provided for supported functions under the
multi-valued option `FLAGS`.  It should be one of the following forms:
  FLAG_NAME
  FLAG_NAME__NO
  FLAG_NAME__ONLY
A target will inherit all the flags of its upstream dependency.

When we create a target `TARGET_NAME` with a flag using (add_header_library,
add_object_library, ...), its behavior will depend on the flag form as follow:
- FLAG_NAME: The following 2 targets will be generated:
    `TARGET_NAME` that has `FLAG_NAME` in its `FLAGS` property.
    `TARGET_NAME.__NO_FLAG_NAME` that depends on `DEP.__NO_FLAG_NAME` if
       `TARGET_NAME` depends on `DEP` and `DEP` has `FLAG_NAME` in its `FLAGS`
       property.
- FLAG_NAME__ONLY: Only generate 1 target `TARGET_NAME` that has `FLAG_NAME`
    in its `FLAGS` property.
- FLAG_NAME__NO: Only generate 1 target `TARGET_NAME.__NO_FLAG_NAME` that
    depends on `DEP.__NO_FLAG_NAME` if `DEP` is in its DEPENDS list and `DEP`
    has `FLAG_NAME` in its `FLAGS` property.

To show all the targets generated, pass SHOW_INTERMEDIATE_OBJECTS=ON to cmake.
To show all the targets' dependency and flags, pass
`SHOW_INTERMEDIATE_OBJECTS=DEPS` to cmake.

To completely disable a flag FLAG_NAME expansion, set the variable
`SKIP_FLAG_EXPANSION_FLAG_NAME=TRUE`.

Reviewed By: michaelrj, sivachandra

Differential Revision: https://reviews.llvm.org/D125174
2022-06-01 00:54:07 -04:00
Guillaume Chatelet
0443bfabe7 Revert "[libc] Apply no-builtin everywhere, remove unnecessary flags"
This reverts commit 94d6dd9057.
2022-05-20 14:37:17 +00:00
Guillaume Chatelet
94d6dd9057 [libc] Apply no-builtin everywhere, remove unnecessary flags
Some functions like `stpncpy` are implemented in terms of `memset` but are not
currently using `-fno-builtin-memset`. This is somewhat hidden by the fact that
we use `-ffreestanding` globally and that `-ffreestanding` implies
`-fno-builtin` for Clang.

This patch also removes `-mllvm -combiner-global-alias-analysis` that is Clang
specific and that does not bring substantial gains on modern processors.

Also we keep `-mllvm --tail-merge-threshold=0` for aarch64 in CMakeLists.txt
but we omit it in the Bazel config. This is because Bazel consumes the source
files directly and so it can use PGO to take optimal decisions locally.

Differential Revision: https://reviews.llvm.org/D125894
2022-05-19 09:08:42 +00:00