This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155181
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155174
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155099
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155076
This ensures that if someone calls the `rpc_shutdown` method multiple
times it will not segfault and gracefully continue. This was causing
problems in the OpenMP usage. This could point to other issues, but for
now this is a safe fix.
Differential Revision: https://reviews.llvm.org/D155005
Subnormal floating point numbers have a lower effective precision than
normal floating point numbers. This can cause issues for the fuzz test
since the MPFR floats have a constant precision regardless of the
exponent, and the precision must match exactly or else create rounding
errors. To solve this problem, the precision of the MPFR floats is
dynamically calculated.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D154909
CUDA requires a PTX feature to be compiled generally, because the
`libcgpu.a` archive contains LLVM-IR we need to have one present to
compile it. Currently, the wrapper fatbinary format we use to
incorporate these into single-source offloading languages has a special
option to provide this. Since this was not present in the builds, if the
user did not specify it via `-foffload-lto` it would not compile from
CUDA or OpenMP due to the missing PTX features. Fix this by passing it
to the packager invocation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154864
These functions have definitions differing between C and C++. GNU
respects the C++ definitions while the LLVM libc does not. This causes
many bugs and the current hack creates other issues. Rather than hack
around this I'd rather temporarily disable these than regress with the
integration into other offloading languages. We lose test support for
them but we should be able to re-enable these once the `libc` headers
provide these correctly.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154850
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
There will be subsequent patches to move things around and make the file layout more principled.
Differential Revision: https://reviews.llvm.org/D154770
This is an alternate approach to the patches proposed in D153897 and
D153794. Rather than exporting a single header that can be included on
the GPU in all circumstances, this patch chooses to instead generate a
separate set of headers that only provides the declarations. This can
then be used by external tooling to set up what's on the GPU. This
leaves room for header hacks for offloading languages without needing to
worry about the `libc` implementation.
Currently this generates a set of headers that only contain the
declarations. These will then be installed to a new clang resource
directory called `llvm_libc_wrappers/` which will house the shim code.
We can then automaticlaly include this from `clang` when offloading to
wrap around the headers while specifying what's on the GPU.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D154036
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
This patch adds the intial support for running an RPC server in
libomptarget to handle host services. We interface with the library
provided by the `libc` project to stand up a basic server. We introduce
a new type that is controlled by the plugin and has each device
intialize its interface. We then run a basic server to check the RPC
buffer.
This patch does not fully implement the interface. In the future each
plugin will want to define special handlers via the interface to support
things like malloc or H2D copies coming from RPC. We will also want to
allow the plugin to specify t he number of ports. This is currently
capped in the implementation but will be adjusted soon.
Right now running the server is handled by whatever thread ends up doing
the waiting. This is probably not a completely sound solution but I am
not overly familiar with the behaviour of OpenMP tasks and what would be
required here. This works okay with synchrnous regions, and somewhat
fine with `nowait` regions, but I've observed some weird behavior when
one of those regions calls `exit`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154312
AMDGPU supports aliases now, so we can drop this case and leave it only
for the NVPTX target. Unfortunately it's unlikely that NVPTX will be
able to support this in the future due to their PTX language being very
limited.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D154704
For machines with a lot of cores, hardware prefetchers can saturate the memory bus when utilization is high.
In this case it is desirable to turn off the hardware prefetcher completely.
This has a big impact on the performance of memory functions such as `memcpy` that rely on the fact that the next cache line will be readily available.
This patch adds the 'LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING' compile time option that generates a version of memcpy with software prefetching. While not fully restoring the original performances it mitigates the impact to an acceptable level.
Reviewed By: rtenneti
Differential Revision: https://reviews.llvm.org/D154494
This reverts commit a4a26374aa.
This was causing some problems with the CPU build and CUDA buildbot.
Revert until I can figure out what those issues are and fix them. I
believe it is just some CMake.
This is an alternate approach to the patches proposed in D153897 and
D153794. Rather than exporting a single header that can be included on
the GPU in all circumstances, this patch chooses to instead generate a
separate set of headers that only provides the declarations. This can
then be used by external tooling to set up what's on the GPU. This
leaves room for header hacks for offloading languages without needing to
worry about the `libc` implementation.
Currently this generates a set of headers that only contain the
declarations. These will then be installed to a new clang resource
directory called `llvm_libc_wrappers/` which will house the shim code.
We can then automaticlaly include this from `clang` when offloading to
wrap around the headers while specifying what's on the GPU.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D154036
This patch adds the necessary support for the fopen and fclose functions
to work on the GPU via RPC. I added a new test that enables testing this
with the minimal features we have on the GPU. I will update it once we
have `fread` and `fwrite` to actually check the outputted strings. For
now I just relied on checking manually via the outpuot temp file.
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D154519
Another low hanging fruit we can put on the GPU, this ports the tests
over to the hermetic framework so we can run them on the GPU.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D154540
Summary:
Reviewer requested that this routine not be a macro, however that means
that it was not being intitialized as the static initializer was done
before the memcpy from the device. Fix this so we can get timing
information.
This patch adds the necessary support to provide timing information in
`libc` tests. This is useful for determining which tests look what
amount of time. We also can use this as a test basis for providing more
fine-grained timing when implementing things on the GPU.
The main difficulty with this is the fact that the AMDGPU fixed
frequency clock operates at an unknown frequency. We need to read this
on a per-card basis from the driver and then copy it in. NVPTX on the
other hand has a fixed clock at a resolution of 1ns. I have also
increased the resolution of the print-outs as the majority of these are
below a millisecond for me.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154446
The accuracy for the MPFR numbers in the strtofloat fuzz test was set
too high, causing rounding issues when rounding to a smaller final
result.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D154150
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
When crt1 isn't available, which is typical on baremetal, hermetic tests
aren't created and the hermetic test target won't be available.
Differential Revision: https://reviews.llvm.org/D154279
Summary:
This function is intended to only be used on the GPU as a shorthand. The
static assert should only fire if it's called ,but it seems that its
precence can sometimes cause issues and other times not. Simply remove
it as it's causing build problems.
Fix a bunch more instances of incorrect use of the `static`
keyword and missing use of LIBC_INLINE and LIBC_INLINE_VAR
macros. Note that even forward declarations and generic template
declarations must follow the prescribed patterns for libc code so
that they match every definition, all template specializations.
Reviewed By: Caslyn
Differential Revision: https://reviews.llvm.org/D154260
Implicit narrowing conversions from int to uint16_t
get a compiler warning with the warning settings used
in the Fuchsia build.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D154256
This patch makes sure that we always build the RPC server. The proposed
used for this is to begin integrating this server implementation into
`libomptarget`. That requires that we build this server ahead of time
when using a `LLVM_ENABLE_PROJECTS` build. Make a few tweaks to ensure
that the GCC compiler which may be used for this build doesn't complain.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154105
This patch adds the other two methods to the server so the external
users can use the interface through the obfuscated interface.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154224
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector
operations when wide types are available. We also get rid of byte
reloads and serializing '__builtin_ctzll'.
I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in sub-sequent patches.
The code been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
This patch simply enables the `div`, `ldiv,` and, `lldiv` functions on
the GPU. This should be straightforward enough.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154143
The RPC calls all have delays associated with them. Currently the `exit`
function does an async send and immediately exits the GPU. This can have
the effect that the RPC server never sees the exit call and we continue.
This patch changes that to first sync with the server before continuing
to perform its exit. There is still a hazard here, where the kernel can
complete before the RPC call reads back its response, but this is simply
multi-threaded hazards. This change ensures that the server *will*
always exit some time after the GPU exits.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154112