clang-p2996

Author	SHA1	Message	Date
Jinsong Ji	e85a9f5540	libc: Prefix RPC Status code to avoid conflict in windows build (#119991 ) Somehow conflict with define in wingdi.h. Fix build failures: [ 52%] Building CXX object projects/offload/plugins-nextgen/common/CMakeFiles/PluginCommon.dir/src/RPC.cpp.obj In file included from ...llvm\offload\plugins-nextgen\common\src\RPC.cpp:16: ...\llvm\libc\shared\rpc.h(48,3): error: expected identifier 48 \| ERROR = 0x1000, \| ^ c:\Program files (x86)\Windows Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from macro 'ERROR' 118 \| #define ERROR 0 \| ^ ...\llvm\offload\plugins-nextgen\common\src\RPC.cpp(75,17): error: expected unqualified-id 75 \| return rpc::ERROR; \| ^ c:\Program files (x86)\Windows Kits\10\include\10.0.22000.0\um\wingdi.h(118,29): note: expanded from macro 'ERROR' 118 \| #define ERROR 0 \| ^ 2 errors generated.	2024-12-15 09:35:44 -05:00
Joseph Huber	a6ef0debb1	[libc][NFC] Rename RPC opcodes to better reflect their usage Summary: RPC_ is a generic prefix here, use LIBC_ to indicate that these are opcodes used to implement the C library	2024-12-02 15:35:08 -06:00
Joseph Huber	1d810ece2b	[libc] Move libc server handlers to a shared header (#117908 ) Summary: We can simply include this header from the shared directory now and do not need to have this level of indirection. Simply stash it with the other libc opcode handlers. If we were able to move the printf handlers to the shared directory then this could just be a header as well, which would HEAVILY simplify the mess associated with building the RPC server first in the projects build, then copying it to the runtimes build.	2024-11-27 14:57:52 -06:00
Joseph Huber	d7c20a6f0c	[libc][NFC] Move RPC opcodes to the 'shared/' directory as well	2024-11-25 12:04:10 -06:00
Joseph Huber	b4d49fb52e	[libc] Remove RPC server API and use the header directly (#117075 ) Summary: This patch removes much of the `llvmlibc_rpc_server` interface. This pretty much deletes all of this code and just replaces it with including `rpc.h` directly. We still maintain the file to let `libc` handle the opcodes, since those depend on the `printf` implementation. This will need to be cleaned up more, but I don't want to put too much into a single patch.	2024-11-25 07:13:28 -06:00
Joseph Huber	89614ceb40	[libc] Move RPC interface to `libc/shared` to export it (#117034 ) Summary: Previous patches have made the `rpc.h` header independent of the `libc` internals. This allows us to include it directly rather than providing an indirect C API. This patch only does the work to move the header. A future patch will pull out the `rpc_server` interface and simply replace it with a single function that handles the opcodes.	2024-11-22 15:32:25 -06:00
Joseph Huber	27d25d1c12	[libc] Increase RPC opcode to 32-bit and use a class byte (#116905 ) Summary: Currently, the RPC interface uses a basic opcode to communicate with the server. This currently is 16 bits. There's no reason for this to be 16 bits, because on the GPU a 32-bit write is the same as a 16-bit write performance-wise. Additionally, I am now making all the `libc` based opcodes qualified with the 'c' type, mimicking how Linux handles `ioctls` all coming from the same driver. This will make it easier to extend the interface when it's exported directly.	2024-11-19 21:56:10 -06:00
Joseph Huber	be0c67c90e	[libc] Remove dependency on `cpp::function` in `rpc.h` (#112422 ) Summary: I'm going to attempt to move the `rpc.h` header to a separate folder that we can install and include outside of `libc`. Before doing this I'm going to try to trim up the file so there's not as many things I need to copy to make it work. This dependency on `cpp::functional` is a low hanging fruit. I only did it so that I could overload the argument of the work function so that passing the id was optional in the lambda, that's not a huge deal and it makes it more explicit I suppose.	2024-10-15 12:31:06 -07:00
Ivan Butygin	26ca8ef836	[libc] GPU RPC interface: add return value to `rpc_host_call` (#111288 )	2024-10-06 20:22:07 +03:00
Ivan Butygin	bbe79a803c	[libc] Use RAII alloc in gpu rpc printf impl (#110352 )	2024-09-28 15:44:01 +03:00
Ivan Butygin	ef390b36ca	[libc] Use RAII based alloc in gpu rpc_server instead of manual new/delete (#110341 ) Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-09-28 11:53:21 +03:00
Joseph Huber	b712a1445b	[libc] Fix memory leak and accidentally ignoring dimensions in loader Summary: The loader had a bug where we weren't setting the dimensions correctly, also I forgot to delete the paths for this RPC call.	2024-09-27 09:57:44 -05:00
Joseph Huber	fe6a3d46aa	[libc] Implement the 'rename' function on the GPU (#109814 ) Summary: Straightforward implementation like the other `stdio.h` functions.	2024-09-24 09:32:42 -07:00
Joseph Huber	16d11e26f3	[libc] Add GPU support for the 'system' function (#109687 ) Summary: This function can easily be implemented by forwarding it to the host process. This shows up in a few places that we might want to test the GPU so it should be provided. Also, I find the idea of the GPU offloading work to the CPU via `system` very funny.	2024-09-23 14:04:28 -07:00
Joseph Huber	f126bc984c	[libc] Fix conflict values from internal `limits.h` when used externally	2024-08-07 10:09:02 -05:00
Joseph Huber	c8e69fa4a0	[libc] Fix GPU 'printf' on strings with padding Summary: We get the `strlen` to know how much memory to allocate here, but it wasn't taking into account if the padding was larger than the string itself. This patch sets it to an empty string so we always add the minimum size. This implementation is slightly wasteful with memory, but I am not concerned with a few extra bytes here and there for some memory that gets immediately free'd.	2024-07-20 22:36:12 -05:00
Joseph Huber	40effc7af5	[libc] Implement (v\|f)printf on the GPU (#96369 ) Summary: This patch implements the `printf` family of functions on the GPU using the new variadic support. This patch adapts the old handling in the `rpc_fprintf` placeholder, but adds an extra RPC call to get the size of the buffer to copy. This prevents the GPU from needing to parse the string. While it's theoretically possible for the pass to know the size of the struct, it's prohibitively difficult to do while maintaining ABI compatibility with NVIDIA's varargs. Depends on https://github.com/llvm/llvm-project/pull/96015.	2024-07-12 19:36:13 -05:00
Joseph Huber	ec0e6ef09b	[libc] Implement the 'remove' function on the GPU (#97096 ) Summary: Straightforward RPC implementation of the `remove` function for the GPU. Copies over the string and calls `remove` on it, passing the result back. This is required for building some `libc++` functionality.	2024-07-01 06:29:48 -05:00
Joseph Huber	7327014b49	[libc] Implement temporary `printf` on the GPU (#85331 ) Summary: This patch adds a temporary implementation that uses a struct-based interface in lieu of varargs support. Once varargs support exists we will move this implementation to the "real" printf implementation. Conceptually, this patch has the client copy over its format string and arguments to the server. The server will then scan the format string searching for any specifiers that are actually a string. If it is a string then we will send the pointer back to the server to tell it to copy it back. This copied value will then replace the pointer when the final formatting is done. This will require a built-in extension to the varargs support to get access to the underlying struct. The varargs used on the GPU will simply be a struct wrapped in a varargs ABI.	2024-04-02 16:25:18 -05:00
Joseph Huber	a1a8bb1d3a	[libc] Change RPC interface to not use device ids (#87087 ) Summary: The current implementation of RPC tied everything to device IDs and forced us to do init / shutdown to manage some global state. This turned out to be a bad idea in situations where we want to track multiple heterogeneous devices that may report the same device ID in the same process. This patch changes the interface to instead create an opaque handle to the internal device and simply allocates it via `new`. The user will then take this device and store it to interface with the attached device. This interface puts the burden of tracking the device identifier to mapped devices onto the user, but in return heavily simplifies the implementation.	2024-03-29 12:49:16 -05:00
Marc Auberer	77118536b5	[libc] Remove obsolete LIBC_HAS_BUILTIN macro (#86554 ) Fixes #86546 and removes the macro `LIBC_HAS_BUILTIN`. This was necessary to support older compilers that did not support `__has_builtin`. All of the compilers we support already have this builtin. See: https://libc.llvm.org/compiler_support.html All uses now use `__has_builtin` directly cc @nickdesaulniers	2024-03-27 17:22:41 +01:00
Joseph Huber	8a79003307	[libc] Move RPC opcodes include out of the header Summary: This header isn't strictly necessary, and is currently broken because we install these to separate locations.	2024-03-10 14:07:47 -05:00
Joseph Huber	29762e3722	[libc][NFCI] Remove lane size template argument on RPC server (#84557 ) Summary: We previously changed the data layout for the RPC buffer to make it lane size agnostic. I put off changing the size for the server case to make the patch smaller. This patch simply reorganizes code by making the lane size an argument to the port rather than a template size. Heavily simplifies a lot of code, no more `std::variant`.	2024-03-08 15:02:19 -06:00
Joseph Huber	1a2ecbb398	[libc] Remove 'llvm-gpu-none' directory from build (#82816 ) Summary: This directory is leftover from when we handled both AMDGPU and NVPTX in the same build and merged them into a pseudo triple. Now the only thing it contains is the RPC server header. This gets rid of it, but now that it's in the base install directory we should make it clear that it's an LLVM libc header.	2024-02-23 14:11:31 -06:00
Joseph Huber	f879ac0385	[libc] Rework the RPC interface to accept runtime wave sizes (#80914 ) Summary: The RPC interface needs to handle an entire warp or wavefront at once. This is currently done by using a compile time constant indicating the size of the buffer, which right now defaults to some value on the client (GPU) side. However, there are currently attempts to move the `libc` library to a single IR build. This is problematic as the size of the wave fronts changes between ISAs on AMDGPU. The builtin `__builtin_amdgcn_wavefrontsize()` will return the appropriate value, but it is only known at runtime now. In order to support this, this patch restructures the packet. Now instead of having an array of arrays, we simply have a large array of buffers and slice it according to the runtime value if we don't know it ahead of time. This also somewhat has the advantage of making the buffer contiguous within a page now that the header has been moved out of it.	2024-02-13 10:45:43 -06:00
Joseph Huber	bf02c84cb8	[libc] Use file lock to join newline on RPC puts call (#73373 ) Summary: The puts call appends a newline. With multiple threads, this can be done out of order such that another thread puts something before we finish appending the newline. Add a flockfile and funlockfile to ensure that the whole string is printed before another string can appear.	2023-11-27 08:41:15 -06:00
Joseph Huber	a39215768b	[libc] Rework the 'fgets' implementation on the GPU (#69635 ) Summary: The `fgets` function as implemented is not functional currently when called with multiple threads. This is because we rely on repeatedly polling the character to detect EOF. This doesn't work when there are multiple threads that may wish to poll the characters. This patch pulls out the logic into a standalone RPC call to handle this in a single operation such that calling it from multiple threads functions as expected. It also makes it less slow because we no longer make N RPC calls for N characters.	2023-10-19 17:00:01 -04:00
Joseph Huber	ddc30ff802	[libc] Implement the 'ungetc' function on the GPU (#69248 ) Summary: This function follows closely with the pattern of all the other functions. That is, making a new opcode and forwarding the call to the host. However, this also required modifying the test somewhat. It seems that not all `libc` implementations follow the same error rules as are tested here, and it is not explicit in the standard, so we simply disable these EOF checks when targeting the GPU.	2023-10-17 13:02:31 -05:00
Joseph Huber	1a5d3b6cda	[libc] Scan the ports more fairly in the RPC server (#66680 ) Summary: Currently, we use the RPC server to respond to different ports which each contain a request from some client thread wishing to do work on the server. This scan starts at zero and continues until it's checked all ports at which point it resets. If we find an active port, we service it and then restart the search. This is bad for two reasons. First, it means that we will always bias the lower ports. If a thread grabs a high port it will be stuck for a very long time until all the other work is done. Second, it means that the `handle_server` function can technically run indefinitely as long as the client is always pushing new work. Because the OpenMP implementation uses the user thread to service the kernel, this means that it could be stalled with another asynchronous device's kernels. This patch addresses this by making the server restart at the next port over. This means we will always do a full scan of the ports before quitting.	2023-09-26 16:09:48 -05:00
Joseph Huber	7ac8e26fc7	[libc] Implement `fseek`, `fflush`, and `ftell` on the GPU (#67160 ) Summary: This patch adds the necessary entrypoints to handle the `fseek`, `fflush`, and `ftell` functions. These are all very straightforward, we simply make RPC calls to the associated function on the other end. Implementing it this way allows us to more or less borrow the state of the stream from the server as we intentionally maintain no internal state on the GPU device. However, this does not implement the `errno` functionality so that must be ignored.	2023-09-26 09:46:46 -05:00
Guillaume Chatelet	b6bc9d72f6	[libc] Mass replace enclosing namespace (#67032 ) This is step 4 of https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079	2023-09-26 11:45:04 +02:00
Joseph Huber	791b279924	[libc] Change the `puts` implementation on the GPU (#67189 ) Summary: Normally, the implementation of `puts` simply writes a second newline character after printing the first string. However, because the GPU does everything in batches of the SIMT group size, this will end up with very poor output where you get the strings printed and then 1-64 newline characters all in a row. Optimizations like to turn `printf` calls into `puts` so it's a good idea to make this produce the expected output. The least invasive way I could do this was to add a new opcode. It's a little bloated, but it avoids an unnecessary and slow send operation to configure this.	2023-09-25 11:17:22 -05:00
Joseph Huber	f548d19fc8	[libc] Fix and simplify the implementation of 'fread' on the GPU (#66948 ) Summary: Previously, the `fread` operation was wrong in cases when we read less data than was requested. That is, if we tried to read N bytes while the file was in EOF, it would still copy N bytes of garbage. This is fixed by only copying over the sizes we got from locally opening it rather than just using the provided size. Additionally, this patch simplifies the interface. The output functions have special variants for writing to stdout / stderr. This is primarily an optimization for these common cases so we can avoid sending the stream as an argument which has a high delay. Because for input, we already need to start with a `send` to tell the server how much data to read, it costs us nothing to send the file along with it so this is redundant. Re-use the file encoding scheme from the other implementations, the one that stores the stream type in the LSBs of the FILE pointer.	2023-09-21 14:28:06 -05:00
Joseph Huber	e2bc0f9266	[libc][NFC] Remove unused function from the RPC server Summary: I missed removing this now-unused function in the previous patch. Remove it to clean up the interface.	2023-09-21 11:56:48 -05:00
Joseph Huber	59896c168a	[libc] Remove the 'rpc_reset' routine from the RPC implementation (#66700 ) Summary: This patch removes the `rpc_reset` function. This was previously used to initialize the RPC client on the device by setting up the pointers to communicate with the server. The purpose of this was to make it easier to initialize the device for testing. However, this prevented us from enforcing an invariant that the buffers are all read-only from the client side. The expected way to initialize the server is now to copy it from the host runtime. This will allow us to maintain that the RPC client is in the constant address space on the GPU, potentially through inference, and improving caching behaviour.	2023-09-21 11:07:09 -05:00
Joseph Huber	a1be5d69df	[libc] Implement more input functions on the GPU (#66288 ) Summary: This patch implements the `fgets`, `getc`, `fgetc`, and `getchar` functions on the GPU. Their implementations are straightforward enough. One thing worth noting is that the implementation of `fgets` will be extremely slow due to the high latency to read a single char. A faster solution would be to make a new RPC call to call `fgets` (due to the special rule that newline or null breaks the stream). But this is left out because performance isn't the primary concern here.	2023-09-14 15:39:29 -05:00
Joseph Huber	07102a1194	[libc] Implement the 'abort' function on the GPU This function implements the `abort` function on the GPU. The implementation here closely mirrors the `exit` call where we first synchronize with the RPC server to make sure it's listening and then we exit on the GPU. I was unsure if this should be a simple `__builtin_assert` on the GPU. I elected to go with an RPC approach to make this a more "true" `abort` call. That is, it should invoke some signal handlers and exit with the proper code according to the implemented C library on the server. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159210	2023-08-31 08:40:15 -05:00
Joseph Huber	7fd9f0f4e0	[libc] Remove `MAX_LANE_SIZE` definition from the RPC server This `MAX_LANE_SIZE` was a hack from the days when we used a single instance of the server and had some GPU state handle it. Now that we have everything templated this really shouldn't be used. This patch removes its use and replaces it with template arguments. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D158633	2023-08-23 12:09:30 -05:00
Joseph Huber	334bbc0d67	[libc] Add support for the 'fread' function on the GPU This patch adds support for `fread` on the GPU via the RPC mechanism. Here we simply pass the size of the read to the server and then copy it back to the client via the RPC channel. This should allow us to do the basic operations on files now. This will obviously be slow for large sizes due to the number of RPC calls involved, this could be optimized further by having a special RPC call that can initiate a memcpy between the two pointers. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D155121	2023-07-26 13:51:35 -05:00
Joseph Huber	a42c1f8d97	[libc][Obvious] Fix use of `fwrite` in the RPC server Summary: The RPC server used the size field which meant we didn't get the correct return value for partial reads. We fix that here.	2023-07-26 11:13:38 -05:00
Joseph Huber	c381a94753	[libc] Remove test RPC opcodes from the exported header This patch does the noisy work of removing the test opcodes from the exported interface to an interface that is only visible in `libc`. The benefit of this is that we both test the exported RPC registration more directly, and we do not need to give this interface to users. I have decided to export any opcode that is not a "core" libc feature as having its MSB set in the opcode. We can think of these as non-libc "extensions". Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154848	2023-07-21 15:36:36 -05:00
Joseph Huber	d3aabeb7b5	[libc] Treat the locks array as a bitfield Currently we keep an internal buffer of device memory that is used to indicate ownership of a port. Since we only use this as a single bit we can simply turn this into a bitfield. I did this manually rather than having a separate type as we need very special handling of the masks used to interact with the locks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D155511	2023-07-21 10:49:11 -05:00
Joseph Huber	e537c83975	[libc] Add basic support for calling host functions from the GPU This patch adds the `rpc_host_call` function as a GPU extension. This is exported from the `libc` project to use the RPC interface to call a function pointer via RPC and copying the arguments by-value. The interface can only support a single void pointer argument much like pthreads. The function call here is the bare-bones version of what's required for OpenMP reverse offloading. Full support will require interfacing with the mapping table, nowait support, etc. I decided to test this interface in `libomptarget` as that will be the primary consumer and it would be more difficult to make a test in `libc` due to the testing infrastructure not really having a concept of the "host" as it runs directly on the GPU as if it were a CPU target. Reviewed By: jplehr Differential Revision: https://reviews.llvm.org/D155003	2023-07-19 10:11:46 -05:00
Joseph Huber	979fb95021	Revert "[libc] Treat the locks array as a bitfield" Summary: This caused test failures on the gfx90a buildbot. This works on my gfx1030 and the Nvidia buildbots, so we'll need to investigate what is going wrong here. For now revert it to get the bots green. This reverts commit `05abcc5792`.	2023-07-19 09:27:08 -05:00
Joseph Huber	05abcc5792	[libc] Treat the locks array as a bitfield Currently we keep an internal buffer of device memory that is used to indicate ownership of a port. Since we only use this as a single bit we can simply turn this into a bitfield. I did this manually rather than having a separate type as we need very special handling of the masks used to interact with the locks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D155511	2023-07-18 11:34:21 -05:00
Joseph Huber	a608076726	[libc][Obvious] Check if the state hasn't already been destroyed on shutdown This ensures that if someone calls the `rpc_shutdown` method multiple times it will not segfault and gracefully continue. This was causing problems in the OpenMP usage. This could point to other issues, but for now this is a safe fix. Differential Revision: https://reviews.llvm.org/D155005	2023-07-11 14:35:38 -05:00
Joseph Huber	c850ea1498	[libc] Support fopen / fclose on the GPU This patch adds the necessary support for the fopen and fclose functions to work on the GPU via RPC. I added a new test that enables testing this with the minimal features we have on the GPU. I will update it once we have `fread` and `fwrite` to actually check the outputted strings. For now I just relied on checking manually via the output temp file. Reviewed By: JonChesterfield, sivachandra Differential Revision: https://reviews.llvm.org/D154519	2023-07-05 18:31:58 -05:00
Joseph Huber	62f57bc9b0	[libc] Add other RPC callback methods to the RPC server This patch adds the other two methods to the server so the external users can use the interface through the obfuscated interface. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154224	2023-06-30 11:29:37 -05:00
Joseph Huber	667c10353e	[libc] Fix the implementation of exit on the GPU The RPC calls all have delays associated with them. Currently the `exit` function does an async send and immediately exits the GPU. This can have the effect that the RPC server never sees the exit call and we continue. This patch changes that to first sync with the server before continuing to perform its exit. There is still a hazard here, where the kernel can complete before the RPC call reads back its response, but this is simply multi-threaded hazards. This change ensures that the server will always exit some time after the GPU exits. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154112	2023-06-29 13:22:23 -05:00
Joseph Huber	31c154881c	[libc] Allow the RPC client to be initialized via a H2D memcpy The RPC client must be initialized to set a pointer to the underlying buffer. This is currently done with the `reset` method which may not be ideal for the use-case. We want runtimes to be able to initialize this without needing to call a kernel. Recent changes allowed the `Client` type to be trivially copyable. That means we can create a client on the server side and then copy it over. To that end we take the existing externally visible symbol and initialize it to the client's pointer. Therefore we can look up the symbol and copy it over once loaded. No test currently, I tested with a demo OpenMP application but couldn't think of how to put that in-tree. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D153633	2023-06-26 10:41:32 -05:00
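
Commit 1a5d3b6cda above describes the fairer port scan only in prose: each pass visits every port once, and the next pass resumes at the port after the last one serviced instead of at zero. The following is a minimal sketch of that idea, not the code from the patch. The `Port` struct, the `scan_ports` function, and the `pending` / `serviced` fields are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RPC port. `pending` means a client is waiting
// on this port; `serviced` counts how often the server handled it.
struct Port {
  bool pending = false;
  int serviced = 0;
};

// Visits every port exactly once, beginning at `start` and wrapping around.
// Returns the index the next call should start from: one past the last port
// that was serviced, or `start` unchanged if nothing was pending.
std::size_t scan_ports(std::vector<Port> &ports, std::size_t start) {
  const std::size_t n = ports.size();
  assert((n == 0 || start < n) && "start must be a valid port index");
  std::size_t next = start;
  for (std::size_t i = 0; i < n; ++i) {
    const std::size_t index = (start + i) % n;
    if (!ports[index].pending)
      continue;
    ports[index].pending = false; // "Service" the request.
    ++ports[index].serviced;
    next = (index + 1) % n; // Resume after this port on the next pass.
  }
  return next;
}
```

Because the loop is bounded by the number of ports, a single call cannot run indefinitely even if clients keep submitting work, and a high-numbered port is reached within one pass regardless of how busy the low-numbered ports are.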


51 Commits
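
Commit 27d25d1c12 widens the RPC opcode to 32 bits and qualifies the `libc` opcodes with a 'c' class byte, in the style of Linux `ioctl` numbers. The sketch below shows one way such an encoding could look. Placing the class byte in the top 8 bits, and the names `make_opcode`, `opcode_class`, and `opcode_number`, are assumptions for illustration; the commit message does not state the exact bit layout.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (not confirmed by the commit message): the class byte sits
// in the top 8 bits and the per-class opcode number in the low 24 bits.
constexpr std::uint32_t NUMBER_BITS = 24;
constexpr std::uint32_t NUMBER_MASK = (1u << NUMBER_BITS) - 1;

// Packs a class byte and an opcode number into one 32-bit opcode.
// For example, make_opcode('c', 1) == 0x63000001 because 'c' is 0x63.
constexpr std::uint32_t make_opcode(char cls, std::uint32_t number) {
  assert(number <= NUMBER_MASK && "opcode number must fit in 24 bits");
  return (static_cast<std::uint32_t>(static_cast<unsigned char>(cls))
          << NUMBER_BITS) |
         number;
}

// Extracts the class byte from a packed opcode.
constexpr char opcode_class(std::uint32_t opcode) {
  return static_cast<char>(opcode >> NUMBER_BITS);
}

// Extracts the per-class opcode number from a packed opcode.
constexpr std::uint32_t opcode_number(std::uint32_t opcode) {
  return opcode & NUMBER_MASK;
}
```

With a layout like this, a server can dispatch on `opcode_class` first and hand any class it does not recognize to a user-provided handler, which is the kind of extension the commit message anticipates.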