This patch implements `hcreate(_r)/hsearch(_r)/hdestroy(_r)` as
specified in https://man7.org/linux/man-pages/man3/hsearch.3.html.
Notice that `neon/asimd` extension is not yet added in this patch.
- The implementation is largely simplified from rust's
[`hashbrown`](https://github.com/rust-lang/hashbrown/blob/master/src/raw/mod.rs)
as we only consider fix-sized insertion-only hashtables. Technical
details are provided in code comments.
- This patch also contains a portable string hash function, which is
derived from [`aHash`](https://github.com/tkaitchuck/aHash)'s fallback
routine. Not using any SIMD acceleration, it has a good enough quality
(passing all SMHasher tests) and is not too bad in speed.
- Some general functionalities are added, such as `memory_size`,
`offset_to`(alignment), `next_power_of_two`, `is_power_of_two`.
`ctz/clz` are extended to support shorter integers.
Summary:
The `fgets` function as implemented is not functional currently when
called with multiple threads. This is because we rely on reapeatedly
polling the character to detect EOF. This doesn't work when there are
multiple threads that may with to poll the characters. this patch pulls
out the logic into a standalone RPC call to handle this in a single
operation such that calling it from multiple threads functions as
expected. It also makes it less slow because we no longer make N RPC
calls for N characters.
Summary:
This function follows closely with the pattern of all the other
functions. That is, making a new opcode and forwarding the call to the
host. However, this also required modifying the test somewhat. It seems
that not all `libc` implementations follow the same error rules as are
tested here, and it is not explicit in the standard, so we simply
disable these EOF checks when targeting the GPU.
Summary:
This enum previously manually specified the value. This just made it
unnecessarily difficult to add new ones without changing everything.
This patch also makes it compatible with C by removing the `:`
annotation and instead using the `LAST` method.
Summary:
This patch adds the necessary entrypoints to handle the `fseek`,
`fflush`, and `ftell` functions. These are all very straightfoward, we
simply make RPC calls to the associated function on the other end.
Implementing it this way allows us to more or less borrow the state of
the stream from the server as we intentionally maintain no internal
state on the GPU device. However, this does not implement the `errno`
functinality so that must be ignored.
Summary:
Normally, the implementation of `puts` simply writes a second newline
charcter after printing the first string. However, because the GPU does
everything in batches of the SIMT group size, this will end up with very
poor output where you get the strings printed and then 1-64 newline
characters all in a row. Optimizations like to turn `printf` calls into
`puts` so it's a good idea to make this produce the expected output.
The least invasive way I could do this was to add a new opcode. It's a
little bloated, but it avoids an unneccessary and slow send operation to
configure this.
Summary:
Previously, the `fread` operation was wrong in cases when we read less
data than was requested. That is, if we tried to read N bytes while the
file was in EOF, it would still copy N bytes of garbage. This is fixed
by only copying over the sizes we got from locally opening it rather
than just using the provided size.
Additionally, this patch simplifies the interface. The output functions
have special variants for writing to stdout / stderr. This is primarily
an optimization for these common cases so we can avoid sending the
stream as an argument which has a high delay. Because for input, we
already need to start with a `send` to tell the server how much data to
read, it costs us nothing to send the file along with it so this is
redundant. Re-use the file encoding scheme from the other
implementations, the one that stores the stream type in the LSBs of the
FILE pointer.
This patch updates the siginfo_t struct definition to match the
definition from the kernel here:
https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/siginfo.h
In particular, there are two main changes:
1. swap position of si_code and si_errno: si_code show come after
si_errno in all systems except MIPS. Since we don't MIPS, the order is
fixed for now, but can be easily \#ifdef'd if MIPS support is
implemented in the future.
2. We add a union of structs that are filled depending on the signal
raised.
This change was required for the fork and spawn integration tests in
rv32, since they fork/clone the running process, call
wait/waitid/waitpid, and read the status, which was wrong in rv32
because wait/waitid/waitpid are implemented in rv32 using SYS_waitid.
SYS_waitid takes a pointer to a siginfo_t and fills the proper fields in
the struct. The previous siginfo_t definition was being incorrectly
filled due to not taking into account the signal raised.
Summary:
This patch implements the `fgets`, `getc`, `fgetc`, and `getchar`
functions on the GPU. Their implementations are straightforward enough.
One thing worth noting is that the implementation of `fgets` will be
extremely slow due to the high latency to read a single char. A faster
solution would be to make a new RPC call to call `fgets` (due to the
special rule that newline or null breaks the stream). But this is left
out because performance isn't the primary concern here.
This patch changes the size of time_t to be an int64_t. This still
follows the POSIX standard which only requires time_t to be an integer.
Making time_t a 64-bit integer also fixes two cases in 32 bits platforms
that use SYS_clock_nanosleep_time64 and SYS_clock_gettime64, as the name
of these calls implies, they require a 64-bit time_t. For instance, in rv32,
the 32-bit version of these syscalls is not available.
We also follow glibc here, where time_t is still a 32-bit integer in
arm32.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159125
libc uses SYS_prlimit64 (which takes a struct rlimit64) to implement
setrlimt and getrlimit (which take a struct rlimit). In 64-bit bits
systems this is not an issue since the members of struct rlimit64 and
struct rlimit are 64 bits long, however, in 32-bit systems the members
of struct rlimit are only 32 bits long, causing wrong values being
passed to SYS_prlimit64.
This patch changes rlim_t to be __UINT64_TYPE__ (which also changes
rlimit as a side-effect), fixing the problem of mismatching types in
the syscall.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159104
This function implements the `abort` function on the GPU. The
implementation here closely mirros the `exit` call where we first
synchornize with the RPC server to make sure it's listening and then we
exit on the GPU.
I was unsure if this should be a simple `__builtin_assert` on the GPU. I
elected to go with an RPC approach to make this a more "true" `abort`
call. That is, it should invoke some signal handlers and exit with the
proper code according to the implemented C library on the server.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159210
This patch adds support for `fread` on the GPU via the RPC mechanism.
Here we simply pass the size of the read to the server and then copy it
back to the client via the RPC channel. This should allow us to do the
basic operations on files now. This will obviously be slow for large
sizes due ot the number of RPC calls involved, this could be optimized
further by having a special RPC call that can initiate a memcpy between
the two pointers.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155121
This patch does the noisy work of removing the test opcodes from the
exported interface to an interface that is only visible in `libc`. The
benefit of this is that we both test the exported RPC registration more
directly, and we do not need to give this interface to users.
I have decided to export any opcode that is not a "core" libc feature as
having its MSB set in the opcode. We can think of these as non-libc
"extensions".
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154848
This patch adds the `rpc_host_call` function as a GPU extension. This is
exported from the `libc` project to use the RPC interface to call a
function pointer via RPC any copying the arguments by-value. The
interface can only support a single void pointer argument much like
pthreads. The function call here is the bare-bones version of what's
required for OpenMP reverse offloading. Full support will require
interfacing with the mapping table, nowait support, etc.
I decided to test this interface in `libomptarget` as that will be the
primary consumer and it would be more difficult to make a test in `libc`
due to the testing infrastructure not really having a concept of the
"host" as it runs directly on the GPU as if it were a CPU target.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D155003
This patch adds the necessary support for the fopen and fclose functions
to work on the GPU via RPC. I added a new test that enables testing this
with the minimal features we have on the GPU. I will update it once we
have `fread` and `fwrite` to actually check the outputted strings. For
now I just relied on checking manually via the outpuot temp file.
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D154519
The GPU port of the LLVM C library needs to export a few extensions to
the interface such that users can interface with it. This patch adds the
necessary logic to define a GPU extension. Currently, this only exports
a `rpc_reset_client` function. This allows us to use the server in
D147054 to set up the RPC interface outside of `libc`.
Depends on https://reviews.llvm.org/D147054
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D152283
This patch adds the reentrent qsort entrypoint, qsort_r. This is done by
extending the qsort functionality and moving it to a shared utility
header. For this reason the qsort_r tests focus mostly on the places
where it differs from qsort, since they share the same sorting code.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D152467
This patch updates the struct dirent to be on par with glibc (by adding
a missing d_type member) and update the readdir call to use SYS_getdents64
instead of SYS_getdents.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D147738
This patch adds the function "socket" from the header "sys/socket". It's
a simple syscall wrapper, and I plan on adding the related functions in
a followup patch.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D149622
This patch enables us to use the existing `libc` support for string
conversion functions on the GPU. This required setting the `fenv_t` and
long double configuration. As far as I am aware, long doubles are
converted to doubles on the GPU and the floating point environment is
just an `uint32_t`.
This code is still untested as we are still working out how to run the
unit tests on the GPU.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D149306
The stdio test failures were due to headers potentially not being built
in the correct order. This should set up the dependencies correctly.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D146551
This patch implements setjmp and longjmp in riscv using inline asm. The
following changes were required:
* Omit frame pointer: otherwise gcc won't allow us to use s0
* Use __attribute__((naked)): otherwise both gcc and clang will generate
function prologue and epilogue in both functions. This doesn't happen
in x86_64, so we guard it to only riscv
Furthermore, using __attribute__((naked)) causes two problems: we
can't use `return 0` (both gcc and clang) and the function arguments in
the function body (clang only), so we had to use a0 and a1 directly.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D145584
This patch adds the wchar header, as well as the functions to convert to
and from wide chars. The header also sets up the definitions for wint
and wchar.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D145995
The implementation currently ignores all spawn attributes. Support for
them will be added in future changes.
A simple allocator for integration tests has been added so that the
integration test for posix_spawn can use the
posix_spawn_file_actions_add* functions.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D135752
Previously the futex type was defined in terms of unsigned int, now it's
uint32, which is more portable.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D135408
They were disabled because we were including linux/signal.h from our
signal.h. Linux's signal.h is not designed to be included from user
programs as it causes a lot of non-standard name pollution. Also, it is
not self-contained. This change defines types and macros relevant for
signal related syscalls within libc's headers and removes inclusion of
Linux headers.
This patch enables the funtions only for x86_64. They will be enabled
for aarch64 also in a follow up patch after testing.
Reviewed By: abrachet, lntue
Differential Revision: https://reviews.llvm.org/D134567
The existing thrd_once function has been refactored so that the
implementation can be shared between thrd_once and pthread_once
functions.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D134716
Tested:
Limited unit test: This makes a call and checks that no error was
returned, but we currently don't have the ability to ensure that
time has elapsed as expected.
Co-authored-by: Jeff Bailey <jeffbailey@google.com>
Reviewed By: sivachandra, jeffbailey
Differential Revision: https://reviews.llvm.org/D134095