Summary:
This patch implements `getenv`. I was torn on how to implement this,
since realistically we only have access to this environment pointer in
the "loader" interface. An alternative would be to use an RPC call every
time, but I think that's overkill for what this will be used for. A
better solution is just to emit a common `DataEnvironment` that contains
all of the host visible resources to initialize. Right now this is the
`env_ptr`, `clock_freq`, and `rpc_client`.
I did this by making the `app.h` interface that Linux uses more general.
That could possibly move into a separate patch, but I figured it's
easier to see with the usage.
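A minimal sketch of the lookup this enables, assuming a `DataEnvironment`
layout that is not taken from the patch. The field names follow the
description above, but the types, the `data_env` object, and the
`sketch_getenv` name are hypothetical.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: field names follow the description, types are assumed.
struct DataEnvironment {
  char **env_ptr;      // NULL-terminated array of "NAME=value" strings.
  uint64_t clock_freq; // Clock frequency reported by the host.
  void *rpc_client;    // Opaque handle to the RPC client state.
};

// Stands in for the single object the loader would fill in before launch.
static DataEnvironment data_env = {nullptr, 0, nullptr};

// Looks up `name` by walking the environment array; no RPC call is needed.
const char *sketch_getenv(const char *name) {
  assert(name != nullptr);
  if (data_env.env_ptr == nullptr || *name == '\0')
    return nullptr;
  size_t len = std::strlen(name);
  for (char **entry = data_env.env_ptr; *entry != nullptr; ++entry) {
    // Match "NAME=" exactly so that "PAT" does not match "PATH=...".
    if (std::strncmp(*entry, name, len) == 0 && (*entry)[len] == '=')
      return *entry + len + 1;
  }
  return nullptr;
}
```

The point of the sketch is only that the lookup touches memory the loader
already initialized, so the cost is a linear scan rather than a round trip
to the host.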
According to discussions at the monthly meeting, we probably don't want
to cache `getpid` anymore. glibc removed its cache, and bionic is still
deciding whether to remove theirs. `getpid` is async-signal-safe, so we
must make sure it always works.
However, for `gettid`, we have more freedom. Moreover, we use `gettid`
to detect deadlocks, so the performance penalty is not negligible here.
Thus, this patch is split from the previous patch to provide only `tid`
caching. It is much simpler. Hopefully, the previous build issues can be
resolved easily.
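A rough sketch of what per-thread `tid` caching can look like. This is
not the patch's implementation: `raw_gettid` is a hypothetical stand-in
for the real system call so the example stays portable, and treating 0
as "not cached yet" is an assumption.

```c++
#include <cassert>

// Hypothetical stand-in for the `gettid` system call; it counts how often
// the "kernel" is entered.
static int syscall_count = 0;
static int raw_gettid() {
  ++syscall_count;
  return 4242; // Arbitrary thread id used for illustration.
}

// Per-thread cache; 0 is assumed to never be a valid thread id here.
static thread_local int cached_tid = 0;

int sketch_gettid() {
  if (cached_tid == 0)
    cached_tid = raw_gettid();
  return cached_tid;
}

// A child created by fork or clone would need to drop the inherited value.
void invalidate_tid_cache() { cached_tid = 0; }
```

Repeated calls on the same thread hit the cache, which is what matters
when `gettid` sits on a lock's fast path.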
The previous commit used the wrong clock id and forgot to release an
additional rdlock. cc @Eric977
Sorry for missing this in my initial review.
Fixes https://github.com/llvm/llvm-project/issues/100960.
Notice that the timestamp is created via
```c++
LIBC_NAMESPACE::clock_gettime(CLOCK_REALTIME, &ts);
ts.tv_nsec += 50'000;
if (ts.tv_nsec >= 1'000'000'000) {
ts.tv_nsec -= 1'000'000'000;
ts.tv_sec += 1;
}
```
Summary:
This patch implements the `printf` family of functions on the GPU using
the new variadic support. This patch adapts the old handling in the
`rpc_fprintf` placeholder, but adds an extra RPC call to get the size of
the buffer to copy. This prevents the GPU from needing to parse the
string. While it's theoretically possible for the pass to know the size
of the struct, it's prohibitively difficult to do while maintaining ABI
compatibility with NVIDIA's varargs.
Depends on https://github.com/llvm/llvm-project/pull/96015.
When scudo is built with LLVM-libc's headers, certain functions also
need to be linked from LLVM-libc. This patch adds those functions to the
list to be linked into the specific scudo test, which uses a minimal
subset of libc.
Fixes #92861 and #59453
Summary:
This patch adds a temporary implementation that uses a struct-based
interface in lieu of varargs support. Once varargs support exists we
will move this implementation to the "real" printf implementation.
Conceptually, this patch has the client copy over its format string and
arguments to the server. The server will then scan the format string
searching for any specifiers that are actually a string. If it is a
string then we will send the pointer back to the client to tell it to
copy the string over. This copied value will then replace the pointer
when the final formatting is done.
This will require a built-in extension to the varargs support to get
access to the underlying struct. The varargs used on the GPU will simply
be a struct wrapped in a varargs ABI.
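The scan for string specifiers can be sketched roughly as follows. This
is a simplified illustration, not the parser used by the patch:
`find_string_args` is a hypothetical helper, and it ignores `*` widths
and positional arguments.

```c++
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Returns the argument index of every `%s` conversion in `fmt`.
// Simplified: skips flags, width, precision and length modifiers, and
// treats `%%` as a literal that consumes no argument.
std::vector<size_t> find_string_args(const char *fmt) {
  assert(fmt != nullptr);
  std::vector<size_t> string_args;
  size_t arg_index = 0;
  for (const char *p = fmt; *p != '\0'; ++p) {
    if (*p != '%')
      continue;
    ++p;
    if (*p == '%')
      continue; // Literal percent sign.
    // Skip everything between '%' and the conversion character.
    while (*p != '\0' && std::strchr("-+ #0123456789.hlLjzt", *p))
      ++p;
    if (*p == '\0')
      break; // Truncated specifier at the end of the string.
    if (*p == 's')
      string_args.push_back(arg_index);
    ++arg_index;
  }
  return string_args;
}
```

Each returned index identifies an argument slot that holds a pointer
into the other side's memory and therefore needs an extra copy before
formatting.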
Summary:
The GPU uses a SIMT execution model. That means that each value actually
belongs to a group of 32 or 64 other lanes executing next to it. These
platforms offer some intrinsic functions to actually take elements from
neighboring lanes. With these we can do parallel scans or reductions.
These functions do not have an immediate user, but will be used in the
allocator interface that is in-progress and are generally good to have.
This patch is a precommit for these new utility functions.
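To illustrate how a reduction is built from lane exchanges, here is a
CPU-side model. It does not use the real intrinsics: `shuffle_down` and
`reduce_sum` are hypothetical names, and the 32-lane width is an
assumption.

```c++
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t LANES = 32; // Assumed warp size; some targets use 64.

// CPU model of a "shuffle down": every lane reads the value held by the
// lane `offset` positions above it, or keeps its own value if none exists.
std::array<uint32_t, LANES> shuffle_down(const std::array<uint32_t, LANES> &v,
                                         size_t offset) {
  std::array<uint32_t, LANES> out = v;
  for (size_t lane = 0; lane + offset < LANES; ++lane)
    out[lane] = v[lane + offset];
  return out;
}

// Tree reduction: after log2(LANES) steps lane 0 holds the sum of all lanes.
// Upper lanes end up with meaningless partial values, as on real hardware.
uint32_t reduce_sum(std::array<uint32_t, LANES> v) {
  for (size_t offset = LANES / 2; offset > 0; offset /= 2) {
    std::array<uint32_t, LANES> neighbor = shuffle_down(v, offset);
    for (size_t lane = 0; lane < LANES; ++lane)
      v[lane] += neighbor[lane];
  }
  return v[0];
}
```

The model copies the whole array per step to mimic lock-step execution;
on the GPU each lane would hold a single register and the exchange
would be one instruction.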
Towards the goal of getting `ninja libc-lint` back to green, fix the numerous
instances of:
warning: header guard does not follow preferred style [llvm-header-guard]
This is because many of our header guards start with `__LLVM` rather than
`LLVM`.
To filter just these warnings:
$ ninja -k2000 libc-lint 2>&1 | grep llvm-header-guard
To automatically apply fixits:
$ find libc/src libc/include libc/test -name \*.h | \
xargs -n1 -I {} clang-tidy {} -p build/compile_commands.json \
-checks='-*,llvm-header-guard' --fix --quiet
Some manual cleanup is still necessary as headers that were missing header
guards outright will have them inserted before the license block (we prefer
them after).
Summary:
Recent changes added an include path in the float128 type that used the
internal `libc` path to find the macro. This doesn't work once it's
installed because we need to search from the root of the install dir.
This patch adds "include/" to the include path so that our inclusion
of installed headers always matches the internal use.
Having libc_errno outside of the namespace causes versioning issues when
trying to link the tests against LLVM-libc. Most of this patch is just
moving libc_errno inside the namespace in tests. This isn't necessary in
the function implementations since those are already inside the
namespace.
This patch provides specific test macros to deal with `errno`.
This will help abstract away the differences between unit tests and integration/hermetic tests in #79319.
In one case we use `libc_errno` which is a struct, in the other case we deal directly with `errno`.
Use a size smaller than the smallest supported page size so that we
don't clobber any guard pages, which may result in a segfault before
__stack_chk_fail can be called.
Also, move __stack_chk_fail outside of our namespace.
Summary:
We call the global constructors by function pointer. For whatever reason
the NVPTX architecture relies very specifically on the arguments to the
function pointer invocation matching what the function is implemented
as. This is problematic as most of these constructors are generated
with no arguments. This patch removes the extended arguments that GNU
and LLVM use for the constructors optionally so that it can support the
common case.
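A minimal sketch of the calling convention this settles on, assuming
the constructors are collected into a plain array of function pointers.
`InitCallback` and `call_init_array` are hypothetical names.

```c++
#include <cassert>
#include <cstddef>

// Signature most compiler-generated constructors actually have.
using InitCallback = void();

// Calls each constructor through a pointer whose type matches its
// definition exactly, instead of passing (argc, argv, envp) as the GNU and
// LLVM conventions allow.
void call_init_array(InitCallback *const *begin, InitCallback *const *end) {
  for (InitCallback *const *it = begin; it != end; ++it)
    (*it)();
}
```

On most CPU targets passing extra arguments to a function that ignores
them happens to work; the sketch avoids relying on that.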
Summary:
The NVPTX backend is picky about the definitions of functions. Because
we call these functions with these arguments it can cause some problems
when it goes through the backend. This was observed in a different test
for `printf` that hasn't been landed yet. Also adjust the priority.
The test tries to set the guard_size and stack_size of a thread to
SIZE_MAX / 4, which is a huge value on 64-bit systems but only about
1GB on 32-bit ones.
We increase the size to 3 * (SIZE_MAX / 4) so that the request also
fails on 32-bit systems.
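The arithmetic can be checked on any host by modeling a 32-bit `size_t`
with `uint32_t`. The constant names below are made up for this sketch.

```c++
#include <cassert>
#include <cstdint>

// Model a 32-bit `size_t` with `uint32_t` so the numbers can be checked on
// any host.
constexpr uint32_t SIZE_MAX_32 = UINT32_MAX;
constexpr uint32_t OLD_SIZE_32 = SIZE_MAX_32 / 4;       // Just under 1 GiB.
constexpr uint32_t NEW_SIZE_32 = 3 * (SIZE_MAX_32 / 4); // Just under 3 GiB.

static_assert(OLD_SIZE_32 == 1073741823u, "SIZE_MAX / 4 on 32-bit");
static_assert(NEW_SIZE_32 == 3221225469u, "3 * (SIZE_MAX / 4) on 32-bit");
// The larger request still fits in a 32-bit size_t without wrapping.
static_assert(NEW_SIZE_32 / 3 == OLD_SIZE_32, "no overflow");
```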
This patch does the noisy work of moving the test opcodes from the
exported interface to an interface that is only visible in `libc`. The
benefit of this is that we both test the exported RPC registration more
directly, and we do not need to give this interface to users.
I have decided to export any opcode that is not a "core" libc feature as
having its MSB set in the opcode. We can think of these as non-libc
"extensions".
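A small sketch of the MSB convention, assuming a 16-bit opcode (the
width is not stated above). The opcode names and values are
hypothetical.

```c++
#include <cassert>
#include <cstdint>

// Assumption: opcodes are 16 bits wide.
using Opcode = uint16_t;
constexpr Opcode EXTENSION_BIT = Opcode(1u << 15);

constexpr bool is_extension(Opcode op) { return (op & EXTENSION_BIT) != 0; }
constexpr Opcode make_extension(Opcode id) {
  return Opcode(id | EXTENSION_BIT);
}

// Hypothetical opcodes used only for illustration.
constexpr Opcode CORE_WRITE = 1;
constexpr Opcode TEST_INCREMENT = make_extension(1);

static_assert(!is_extension(CORE_WRITE), "core opcodes keep the MSB clear");
static_assert(is_extension(TEST_INCREMENT), "extensions set the MSB");
```

Reserving the top bit keeps the two ranges disjoint, so a core opcode
and an extension can share the same low bits without colliding.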
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154848
This patch mostly renames files so that they better reflect the functions they declare.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D155607
The GPU port of the LLVM C library needs to export a few extensions to
the interface such that users can interface with it. This patch adds the
necessary logic to define a GPU extension. Currently, this only exports
a `rpc_reset_client` function. This allows us to use the server in
D147054 to set up the RPC interface outside of `libc`.
Depends on https://reviews.llvm.org/D147054
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D152283
We provide the `send_n` and `recv_n` utilities as a generic way to
stream data between both sides of the process. This was previously
tested and performed as expected when using a string of constant size.
However, when the size was allowed to diverge between the threads in the
warp or wavefront this could deadlock. This did not occur on NVPTX
because of the use of the explicit warp sync. However, on AMD one of the
work items in the wavefront could continue executing and hit the next
`recv` call before the other threads, then we would deadlock as we
violated the RPC invariants.
This patch replaces the for loop with a thread ballot. This will cause
every thread in the warp or wavefront to continue executing the loop
until all of them can exit. This acts as a more explicit wavefront sync.
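The effect of the ballot can be modeled on the CPU as below. This is an
illustration of the idea, not the RPC code: `ballot` and
`stream_rounds` are hypothetical, and the lane count and chunk size are
chosen arbitrarily.

```c++
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t LANES = 4; // Tiny "wavefront" to keep the model small.
constexpr size_t CHUNK = 8; // Bytes moved per send/recv round.

// Model of a ballot: one bit per lane whose predicate is true.
uint64_t ballot(const std::array<bool, LANES> &pred) {
  uint64_t mask = 0;
  for (size_t lane = 0; lane < LANES; ++lane)
    if (pred[lane])
      mask |= uint64_t(1) << lane;
  return mask;
}

// Every lane stays in the loop until no lane has data left, so all lanes
// perform the same number of rounds even when their sizes differ.
// Returns the number of rounds executed by the whole wavefront.
size_t stream_rounds(const std::array<size_t, LANES> &sizes) {
  std::array<size_t, LANES> sent{};
  size_t rounds = 0;
  for (;;) {
    std::array<bool, LANES> has_data{};
    for (size_t lane = 0; lane < LANES; ++lane)
      has_data[lane] = sent[lane] < sizes[lane];
    if (ballot(has_data) == 0)
      break; // Uniform exit: all lanes leave together.
    for (size_t lane = 0; lane < LANES; ++lane) {
      // Finished lanes still take part and move an empty chunk.
      size_t remaining = sizes[lane] - sent[lane];
      sent[lane] += remaining < CHUNK ? remaining : CHUNK;
    }
    ++rounds;
  }
  return rounds;
}
```

With a per-lane `for` loop the round count would differ between lanes;
here it is always the count needed by the largest transfer.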
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150992
Also adjust pthread_create_test to accommodate large page sizes. Both
these changes should now fix the full build builders.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D151158
Only functional for stacks that grow down (same as before), but custom
`stack`, `stacksize`, `guardsize`, and `detachstate` all should be
working.
Differential Revision: https://reviews.llvm.org/D148290
Currently we provide the `send_n` and `recv_n` functions. These were
somewhat divergent and not tested on the GPU. This patch changes the
support to be more common. We do this by making the CPU provide an array
at least as large as the lane size while the GPU can rely on the
private memory address of its stack variables. This allows us to send
data back and forth generically.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150379
The interface exported by the RPC library allows users to simply send
and receive fixed sized packets without worrying about the data motion
underneath. However, this was broken in the current implementation. We
can think of the send and receive implementations in terms of waiting
for ownership of the buffer, using the buffer, and posting ownership to
the other side. Our implementation of `recv` was incorrect in the
following scenarios.
recv -> send // we still own the buffer and should give away ownership
recv -> close // The other side is not waiting for data, this will
result in multiple openings of the same port
This patch attempts to fix this with an admittedly hacky fix where we
track if the previous operation was a recv and post conditionally.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150327
Currently the opcode is only valid if it is the same between all of the
ports. This is possible to violate if the opcode is placed into a memory
location and then read in a non-uniform manner by the warp / wavefront.
Moving this to a compile time constant makes it impossible to break this
invariant.
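One way to express this in C++ is to take the opcode as a template
argument. The sketch below is illustrative only: `Port`, `open_port`,
and the 16-bit opcode width are assumptions.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical port type; only the opcode handling is modeled.
struct Port {
  uint16_t opcode;
};

// Passing the opcode as a template argument forces it to be a constant
// expression, so every lane in the warp or wavefront sees the same value.
// A call such as open_port<runtime_value>() does not compile.
template <uint16_t Opcode> Port open_port() { return Port{Opcode}; }
```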
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150115
The execution model of the GPU expects that groups of threads will
execute in lock-step in SIMD fashion. It's both important for
performance and correctness that we treat this as the smallest possible
granularity for an RPC operation. Thus, we map multiple threads to a
single larger buffer and ship that across the wire.
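As a rough picture of the "single larger buffer", consider one slot per
lane plus a mask of the lanes that are active. The layout, the slot
size, and the names `Packet`, `Buffer`, and `for_each_active_lane` are
assumptions made for this sketch.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t LANE_SIZE = 32; // Assumed warp size; wavefronts may use 64.

// One fixed-size slot per lane (size chosen arbitrarily for this sketch).
struct Buffer {
  uint64_t data[8];
};

// Hypothetical packet: one slot per lane plus a mask of active lanes,
// shipped as a single unit.
struct Packet {
  uint64_t lane_mask;
  Buffer slots[LANE_SIZE];
};

// CPU side: apply `fn` to the slot of each active lane, mimicking what the
// GPU threads would do in lock-step.
template <typename F> void for_each_active_lane(Packet &packet, F fn) {
  for (size_t lane = 0; lane < LANE_SIZE; ++lane)
    if (packet.lane_mask & (uint64_t(1) << lane))
      fn(packet.slots[lane], lane);
}
```

On the GPU each thread would index `slots` with its own lane id, while
the CPU has to loop over the mask to reproduce the same work.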
This patch makes the necessary changes to support executing the RPC on
the GPU with multiple threads. This requires some workarounds to mimic
the model when handling the protocol from the CPU. I'm not completely
happy with some of the workarounds required, but I think it should work.
Uses some of the implementation details from D148191.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D148943
This patch adds the necessary hacks to support global constructors and
destructors. This is an incredibly hacky process caused by the primary
fact that Nvidia does not provide any binary tools and very little
linker support. We first had to emit references to these functions and
their priority in D149451. Then we dig them out of the module once it's
loaded to manually create the list that the linker should have made for
us. This patch also contains a few Nvidia specific hacks, but it passes
the test, albeit with a stack size warning from `ptxas` for the
callback. But this should be fine given the resource usage of a common
test.
This also adds a dependency on LLVM to the NVPTX loader, which hopefully doesn't
cause problems with our CUDA buildbot.
Depends on D149451
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D149527