Summary:
This patch adds a temporary implementation that uses a struct-based
interface in lieu of varargs support. Once varargs support exists we
will move this implementation to the "real" printf implementation.
Conceptually, this patch has the client copy over its format string and
arguments to the server. The server will then scan the format string
searching for any specifiers that are actually a string. If it is a
string then we will send the pointer back to the server to tell it to
copy it back. This copied value will then replace the pointer when the
final formatting is done.
This will require a built-in extension to the varargs support to get
access to the underlying struct. The varargs used on the GPU will simply
be a struct wrapped in a varargs ABI.
Summary:
The GPU uses a SIMT execution model. That means that each value actually
belongs to a group of 32 or 64 other lanes executing next to it. These
platforms offer some intrinsic fuctions to actually take elements from
neighboring lanes. With these we can do parallel scans or reductions.
These functions do not have an immediate user, but will be used in the
allocator interface that is in-progress and are generally good to have.
This patch is a precommit for these new utilitly functions.
Towards the goal of getting `ninja libc-lint` back to green, fix the numerous
instances of:
warning: header guard does not follow preferred style [llvm-header-guard]
This is because many of our header guards start with `__LLVM` rather than
`LLVM`.
To filter just these warnings:
$ ninja -k2000 libc-lint 2>&1 | grep llvm-header-guard
To automatically apply fixits:
$ find libc/src libc/include libc/test -name \*.h | \
xargs -n1 -I {} clang-tidy {} -p build/compile_commands.json \
-checks='-*,llvm-header-guard' --fix --quiet
Some manual cleanup is still necessary as headers that were missing header
guards outright will have them inserted before the license block (we prefer
them after).
Summary:
Recent changes added an include path in the float128 type that used the
internal `libc` path to find the macro. This doesn't work once it's
installed because we need to search from the root of the install dir.
This patch adds "include/" to the include path so that our inclusion
of installed headers always match the internal use.
Having libc_errno outside of the namespace causes versioning issues when
trying to link the tests against LLVM-libc. Most of this patch is just
moving libc_errno inside the namespace in tests. This isn't necessary in
the function implementations since those are already inside the
namespace.
This patch provides specific test macros to deal with `errno`.
This will help abstract away the differences between unit test and integration/hermetic tests in #79319.
In one case we use `libc_errno` which is a struct, in the other case we deal directly with `errno`.
Use a size smaller than the smallest supported page size so that we
don't
clobber over any guard pages, which may result in a segfault before
__stack_chk_fail can be called.
Also, move __stack_chk_fail outside of our namespace.
Summary:
We call the global constructors by function pointer. For whatever reason
the NVPTX architecture relies very specifically on the arguments to the
function pointer invocation matching what the function is implemented
as. This is problematic as most of these constructors are generated
with no arguments. This patch removes the extended arguments that GNU
and LLVM use for the constructors optionally so that it can support the
common case.
Summary:
The NVPTX backend is picky about the definitions of functions. Because
we call these functions with these arguments it can cause some problems
when it goes through the backend. This was observed in a different test
for `printf` that hasn't been landed yet. Also adjust the priority.
The test tries to set the guard_size and stack_size of a thread to
SIZE_MAX / 4, which is a huge value in 64-bit systems but 1GB in 32-bit
ones.
We increase the size to 3 * (SIZE_MAX / 4) so it can also fail in 32-bit
systems.
This patch does the noisy work of removing the test opcodes from the
exported interface to an interface that is only visible in `libc`. The
benefit of this is that we both test the exported RPC registration more
directly, and we do not need to give this interface to users.
I have decided to export any opcode that is not a "core" libc feature as
having its MSB set in the opcode. We can think of these as non-libc
"extensions".
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154848
This patch mostly renames files so it better reflects the function they declare.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D155607
The GPU port of the LLVM C library needs to export a few extensions to
the interface such that users can interface with it. This patch adds the
necessary logic to define a GPU extension. Currently, this only exports
a `rpc_reset_client` function. This allows us to use the server in
D147054 to set up the RPC interface outside of `libc`.
Depends on https://reviews.llvm.org/D147054
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D152283
We provide the `send_n` and `recv_n` utilities as a generic way to
stream data between both sides of the process. This was previously
tested and performed as expected when using a string of constant size.
However, when the size was allowed to diverge between the threads in the
warp or wavefront this could deadlock. This did not occur on NVPTX
because of the use of the explicit warp sync. However, on AMD one of the
work items in the wavefront could continue executing and hit the next
`recv` call before the other threads, then we would deadlock as we
violated the RPC invariants.
This patch replaces the for loop with a thread ballot. This will cause
every thread in the warp or wavefront to continue executing the loop
until all of them can exit. This acts as a more explicit wavefront sync.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150992
Also adjust pthread_create_test to accomodate large page sizes. Both
these changes should now fix the full build builders.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D151158
Only functional for stack growsdown (same as before), but custom
`stack`, `stacksize`, `guardsize`, and `detachstate` all should be
working.
Differential Revision: https://reviews.llvm.org/D148290
Currently we provide the `send_n` and `recv_n` functions. These were
somewhat divergent and not tested on the GPU. This patch changes the
support to be more common. We do this my making the CPU provide an array
equal the to at least the lane size while the GPU can rely on the
private memory address of its stack variables. This allows us to send
data back and forth generically.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150379
The interface exported by the RPC library allows users to simply send
and recieve fixed sized packets without worrying about the data motion
underneath. However, this was broken in the current implementation. We
can think of the send and recieve implementations in terms of waiting
for ownership of the buffer, using the buffer, and posting ownership to
the other side. Our implementation of `recv` was incorrect in the
following scenarios.
recv -> send // we still own the buffer and should give away ownership
recv -> close // The other side is not waiting for data, this will
result in multiple openings of the same port
This patch attempts to fix this with an admittedly hacky fix where we
track if the previous implementation was a recv and post conditionally.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150327
Currently the opcode is only valid if it is the same between all of the
ports. This is possible to violate if the opcode is places into a memory
location and then read in a non-uniform manner by the warp / wavefront.
Moving this to a compile time constant makes it impossible to break this
invariant.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150115
The execution model of the GPU expects that groups of threads will
execute in lock-step in SIMD fashion. It's both important for
performance and correctness that we treat this as the smallest possible
granularity for an RPC operation. Thus, we map multiple threads to a
single larger buffer and ship that across the wire.
This patch makes the necessary changes to support executing the RPC on
the GPU with multiple threads. This requires some workarounds to mimic
the model when handling the protocol from the CPU. I'm not completely
happy with some of the workarounds required, but I think it should work.
Uses some of the implementation details from D148191.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D148943
This patch adds the necessary hacks to support global constructors and
destructors. This is an incredibly hacky process caused by the primary
fact that Nvidia does not provide any binary tools and very little
linker support. We first had to emit references to these functions and
their priority in D149451. Then we dig them out of the module once it's
loaded to manually create the list that the linker should have made for
us. This patch also contains a few Nvidia specific hacks, but it passes
the test, albeit with a stack size warning from `ptxas` for the
callback. But this should be fine given the resource usage of a common
test.
This also adds a dependency on LLVM to the NVPTX loader, which hopefully doesn't
cause problems with our CUDA buildbot.
Depends on D149451
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D149527
Removes the redundant Ack/Data bit manipulation.
Represents the inbox/outbox state with one bit instead of two. This will
be useful if we change to a packed representation and otherwise cuts the
runtime state space from 16 to 4.
Further simplification is possible, this patch is intentionally minimal.
- can_{send,recv}_data are now in == out
- {client,server}::try_open can be factored into Process:try_open
This implements the state machine of D148191, modulo differences in atomic
ordering and fences.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D149788
This patch makes the necessary changes to support calling global
constructors and destructors on the GPU. The patch in D149340 allows the
`lld` linker to create the symbols pointing us to these globals. These
should be executed by a single thread, which is more difficult on the
GPU because all threads are active. I chose to use an atomic counter to
sync every thread on the GPU. This is very slow if you use more than a
few thousand threads, but for testing purposes it should be sufficient.
Depends on D149340 D149363
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D149398
This patch adds extra intrinsics for the GPU. Some of these are unused
for now but will be used later. We use these currently to update the
`RPC` handling. Currently, every thread can update the RPC client, which
isn't correct. This patch adds code neccesary to allow a single thread
to perfrom the write while the others wait.
Feedback is welcome for the naming of these functions. I'm copying the
OpenMP nomenclature where we call an AMD `wavefront` or NVIDIA `warp` a
`lane`.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D148810
Previously unconditionally stored to the return value. This is
incorrect, we should only return if user value is non-null.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D148293
The RPC interface can support multiple independent clients. This support
currently only supports many single-thread warps / workgroups
coordinating over a single lock. This patch uses the support added in
the previous patch to test the RPC interface with multiple blocks.
Note that this does not work with multiple threads currently because of
the effect of warps / workgroups executing in lockstep incorrectly. This
will be added later.
Depends on D148485
Reviewed By: lntue, sivachandra
Differential Revision: https://reviews.llvm.org/D148486
Currently, the RPC interface with the loader is only tested if the other
tests fail. This test adds a direct test that runs a simple integer
increment over the RPC handshake 10000 times.
Depends on https://reviews.llvm.org/D148288
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D148342
This patch performs the same operation to copy over the `argv` array to
the `envp` array. This allows the GPU tests to use environment
variables.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146322
This patch enables integration tests running on the GPU. This uses the
RPC interface implemented in D145913 to compile the necessary
dependencies for the integration test object. We can then use this to
compile the objects for the GPU directly and execute them using the AMD
HSA loader combined with its RPC server. For example, the compiler is
performing the following actions to execute the integration tests.
```
$ clang++ --target=amdgcn-amd-amdhsa -mcpu=gfx1030 -nostdlib -flto -ffreestanding \
crt1.o io.o quick_exit.o test.o rpc_client.o args_test.o -o image
$ ./amdhsa_loader image 1 2 5
args_test.cpp:24: Expected 'my_streq(argv[3], "3")' to be true, but is false
```
This currently only works with a single threaded client implementation
running on AMDGPU. Further work will implement multiple clients for AMD
and the ability to run on NVPTX as well.
Depends on D145913
Reviewed By: sivachandra, JonChesterfield
Differential Revision: https://reviews.llvm.org/D146256
This generalizes handling of the integration tests. We now implicitly
depend on the `libc.startup.${LIBC_TARGET_OS}.crt1` file rather than
passing it in manually. This simplifies the interface.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146237
This doesn't seem to be used anymore after recent changes that removed
the `--sysroot` method for the integration tests.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D146040
The target "check-libc" now runs all enabled tests which, depending on
the build mode, includes the unit tests, the integration tests and the api
test.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D144663
This part of the effort to make all test related pieces into the `test`
directory. This helps is excluding test related pieces in a straight
forward manner if LLVM_INCLUDE_TESTS is OFF. Future patches will also move
the MPFR wrapper and testutils into the 'test' directory.
This adds "-pthreads" which appears to be a clang only
alias for "-pthread" (all the drivers check for both).
Use "-pthread" instead to be compatible with gcc.
Otherwise you get:
FAILED: bin/libc-gwp-asan-uaf-should-crash
: && /usr/bin/g++-11 <...> -pthreads <...> projects/libc/test/integration/scudo/liblibc_for_scudo_integration_test.a && :
g++-11: error: unrecognized command-line option ‘-pthreads’; did you mean ‘-pthread’?
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D143258
Summary:
A previous change removed a transient inclusion of `stdlib.h` from the
`string_utils.h` file which this test depended on. Include it directly
here.