CUDA requires a PTX feature to be compiled generally, because the
`libcgpu.a` archive contains LLVM-IR we need to have one present to
compile it. Currently, the wrapper fatbinary format we use to
incorporate these into single-source offloading languages has a special
option to provide this. Since this was not present in the builds, if the
user did not specify it via `-foffload-lto` it would not compile from
CUDA or OpenMP due to the missing PTX features. Fix this by passing it
to the packager invocation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154864
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
We need to perform the GPU build separately. The `CXX_STANDARD` option
was not being passed properly.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D149225
Summary:
We generally need a CUDA toolchain to build the tests for the GPU `libc`
targeting NVPTX. However, clang will commonly emit warnings on versions
that are too new. We can ignore these in `libc` since we are manually
specifying the `+ptx` version to use whenever we compile. So we do not
need to worry about unexpected changes and we do not depend on any newer
features. So this should not be problematic.
Summary:
The sm_60 GPU is the oldest model that's supported for using the RPC
features of the `libc` GPU runtime. This also requires at least `+ptx63`
to enable use of the active mask. So, this patch sets that as the
minimum.
Summary:
The `+ptx` features correspond to the related CUDA version. We require a
certain set of features from the `ptxas` assembler, which is tied to the
CUDA version. Some of the ones set here were insufficient, so I am
simply setting a cutoff to the CUDA 9.0 release as the minimum. This
roughly corresponds to what should be required for sm_60 to be compiled
with the source.
Summary:
This variable was not being reset, which caused the options to be
compounded when building multiple architectures. This was very
problematic as the architectures are not compatible.
The NVIDIA compilation path requires some special options. This is
mostly because compilation is dependent on having a valid CUDA
toolchain. We don't actually need the CUDA toolchain to create the
exported `libcgpu.a` library because it's pure LLVM-IR. However, for
some language features we need the PTX version to be set. This is
normally set by checking the CUDA version, but without one installed it
will fail to build. We instead choose a minimum set of features on the
desired target, inferred from
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes
and the PTX refernece for functions like `nanosleep`.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D148532
Let RISCV64 targets math implementations with and without FMA
automatically.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D146730
The packaged version of the `libc` library does not depend on the CUDA
installation because it only uses `clang` and emits LLVM-IR. However,
for testing we directly need the CUDA toolkit to emit and execute the
files. This patch explicitly passes `--cuda-path` to the relevant
compilations for NVPTX testing.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D147653
Summary:
The AMDGPU ABI isn't stable or well defined. For that reson we prefer to
rely on LTO to ensure that multiple files get linked correctly.
Currently the internal targets used for testing mix LLVM-IR and
assembly. We should be consistent here.
Summary:
We use `-Xclang` to pass the GPU binary to be embedded. In the case of
multi-source objects this will be passed more than once, but CMake
implicitly deduplicates arguments. Use the special generator to prevent
this from happening.
Summary:
Multi-source object libraries require some additional handling, this
logic wasn't correctly settending the dependency on each filename
individually and was instead using the last one. This meant that only
the last file was built for multi-object libraries.
The GPU target requires some weird special case handling to create fat
binaries. CMake offers no way to set the name of an object library. The
only way to do this is to create a file with the desired name and use
that. Currently we name it after the source filename. However, this
breaks if there is more than a single source. This patch changes the
logic to instead look up the object target name and use that. E.g.
`src.__support.OSUtil.osutil` will be `osutil.cpp`.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D145912
Summary:
We use an internal option to create the GPU binary used for testing.
This wasn't getting the proper flags passed to it due to a missing
variable name.
Lint targets for individual entrypoints have also been cleaned up. The
target "libc-lint" depends on the individual lint targets.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D144705
This patch adjusts the way dependencies are handled in the packaed
version of the GPU libc runtime. Previously the files were not getting
updated properly in the install when they changed.
Depends on D144214
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D144280
This patch unifies the handling of generating the GPU build targets
between the `add_entrypoint_library` and the `add_object_library`
functions. The `_build_gpu_objects` function will create two targets.
One contains a single object file with several GPU binaries embedded in
it, a so-called fatbinary. The other is a direct compile of the
supported target to be used internally only. This patch pulls out some
of the properties logic so that we can handle both more easily. This
patch also required adding an ovverride `NO_GPU_BUILD` for cases when
we only want to build the source file as normal.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D144214
Summary:
This hack with stub files is used to make the final object archive
have human-understandable names. We currently output these into the
current binary directory, which sometimes interferes with the actual
source file. Put these in their own directory to be certain they don't
conflict.
instructions.
For clang-11, having -mfma without -mavx2 does not generate fma
instructions, causing a build bot to fail on log10_test.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D143234
The current `libcgpu.a` is actually an archive of fatbinaries. The host
file contains nothing but a section called `LLVM_OFFLOADING` that
contains embedded device code. This used to be handled implicitly by
borrowing the OpenMP toolchain, which did this packaging internally.
Passing the OpenMP flags causes problems with trying to move to testing.
This patch pulls this logic out into the CMake and handles it manually.
This patch is a lot of noise, but it fundamentally comes down to the
following changes.
1. Build the source for every GPU architecture (GPU architectures are
generally not backwards compatible)
2. Combine all of these files into a single binary blob
3. Embed that binary blob into a host file
4. Package these host files into a `.a` archive.
5. The device code will be extracted and managed by the offloading
linker.
Another important point. Right now we are maintaining an important
distinction with the GPU build. That is, when we build the exported
library we will build for many GPU architectures. However, the internal
version will only be built for a single GPU architecture, one that was
found on the user's system. This is intended to be used for internal
testing, very similar to the current path where `libc` is compiled for a
single target triple.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D143089
This patch contains the initial support for building LLVM's libc as a
target for the GPU. Currently this only supports a handful of very basic
functions that can be implemented without an operating system. The GPU
code is build using the existing OpenMP toolchain. This allows us to
minimally change the existing codebase and get a functioning static
library. This patch allows users to create a static library called
`libcgpu.a` that contains fat binaries containing device IR.
Current limitations are the lack of test support and the fact that only
one target OS can be built at a time. That is, the user cannot get a
`libc` for Linux and one for the GPU simultaneously.
This introduces two new CMake variables to control the behavior
`LLVM_LIBC_TARET_OS` is exported so the user can now specify it to equal
`"gpu"`. `LLVM_LIBC_GPU_ARCHITECTURES` is also used to configure how
many targets to build for at once.
Depends on D138607
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D138608
Add float type and flag for nearest integer to automatically test with
and without SSE4.2 flag.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D129916
This is a reland of D126773 / b2a9ea4420.
The removal of `-mllvm -combiner-global-alias-analysis` has landed separately
in D128051 / 7b73f53790.
And the removal of `-mllvm --tail-merge-threshold=0` is scheduled for
removal in a subsequent patch.
list(POP_FRONT) is only added to cmake in 3.15, while our base line
version is 3.13
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D127129
Detect if the architecture supports FMA instructions and if
the targets depend on fma.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D123615
Note, this is a re-submission of D125894 with `features = ["-header_modules"]`
added to the main BUILD.bazel file.
Some functions like `stpncpy` are implemented in terms of `memset` but are not
currently using `-fno-builtin-memset`. This is somewhat hidden by the fact that
we use `-ffreestanding` globally and that `-ffreestanding` implies
`-fno-builtin` for Clang.
This patch also removes `-mllvm -combiner-global-alias-analysis` that is Clang
specific and that does not bring substantial gains on modern processors.
Also we keep `-mllvm --tail-merge-threshold=0` for aarch64 in CMakeLists.txt
but we omit it in the Bazel config. This is because Bazel consumes the source
files directly and so it can use PGO to take optimal decisions locally.
Differential Revision: https://reviews.llvm.org/D126773
Add FLAGS option for add_header_library, add_object_library,
add_entrypoint_object, and add_libc_unittest.
In general, a flag is a string provided for supported functions under the
multi-valued option `FLAGS`. It should be one of the following forms:
FLAG_NAME
FLAG_NAME__NO
FLAG_NAME__ONLY
A target will inherit all the flags of its upstream dependency.
When we create a target `TARGET_NAME` with a flag using (add_header_library,
add_object_library, ...), its behavior will depend on the flag form as follow:
- FLAG_NAME: The following 2 targets will be generated:
`TARGET_NAME` that has `FLAG_NAME` in its `FLAGS` property.
`TARGET_NAME.__NO_FLAG_NAME` that depends on `DEP.__NO_FLAG_NAME` if
`TARGET_NAME` depends on `DEP` and `DEP` has `FLAG_NAME` in its `FLAGS`
property.
- FLAG_NAME__ONLY: Only generate 1 target `TARGET_NAME` that has `FLAG_NAME`
in its `FLAGS` property.
- FLAG_NAME__NO: Only generate 1 target `TARGET_NAME.__NO_FLAG_NAME` that
depends on `DEP.__NO_FLAG_NAME` if `DEP` is in its DEPENDS list and `DEP`
has `FLAG_NAME` in its `FLAGS` property.
To show all the targets generated, pass SHOW_INTERMEDIATE_OBJECTS=ON to cmake.
To show all the targets' dependency and flags, pass
`SHOW_INTERMEDIATE_OBJECTS=DEPS` to cmake.
To completely disable a flag FLAG_NAME expansion, set the variable
`SKIP_FLAG_EXPANSION_FLAG_NAME=TRUE`.
Reviewed By: michaelrj, sivachandra
Differential Revision: https://reviews.llvm.org/D125174
Some functions like `stpncpy` are implemented in terms of `memset` but are not
currently using `-fno-builtin-memset`. This is somewhat hidden by the fact that
we use `-ffreestanding` globally and that `-ffreestanding` implies
`-fno-builtin` for Clang.
This patch also removes `-mllvm -combiner-global-alias-analysis` that is Clang
specific and that does not bring substantial gains on modern processors.
Also we keep `-mllvm --tail-merge-threshold=0` for aarch64 in CMakeLists.txt
but we omit it in the Bazel config. This is because Bazel consumes the source
files directly and so it can use PGO to take optimal decisions locally.
Differential Revision: https://reviews.llvm.org/D125894