Summary:
This pointer has been causing issues. Allocating and reading from coarse-grained
memory on the CPU is not guaranteed to work and varies depending on the kernel
version and its support. Previously we attempted to pin the memory, but this
caused unexpected failures. This should be a legal operation and work
around the problem, as fine-grained memory should always be legal for both
sides to write to.
Summary:
It may be problematic to pin a stack pointer. Allocate it via the OS
allocator instead, as the documentation suggests.
For some reason, if you attempt to free this pointer after the memory
region has been unlocked, it is reported as an invalid pointer.
Summary:
Previously, we determined that coarse-grained memory cannot be used in
the general case. That removed the buffer used to transfer the memory;
however, we still had this lookup. Though we do not access the symbol
directly, it apparently still conflicts with the agents. Pin this as
well.
This resolves the problems @lntue was having with the `libc` GPU build.
Summary:
This portion of code handles mapping the RPC client memory over to the
device. HSA copies need to be between two slices of memory that HSA has
allocated. Previously we used coarse-grained memory to act as the host
source. However, support for this varies depending on the kernel
version and should not be relied upon. This patch changes that handling
to use the `hsa_amd_memory_lock` API to explicitly pin memory to a
location sufficient for a DMA transfer to the GPU.
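A minimal sketch of that pinning pattern, assuming the standard
`hsa_amd_memory_lock`/`hsa_amd_memory_unlock` APIs and a synchronous
`hsa_memory_copy`; agent discovery and the device allocation itself are
omitted, and the names are illustrative:
```c++
#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>

// Pin a host staging buffer, copy it to device memory, then unpin.
hsa_status_t copy_to_device(hsa_agent_t gpu_agent, void *host_src,
                            void *dev_dst, size_t size) {
  void *agent_ptr = nullptr;
  // Lock (pin) the host allocation so the GPU agent can DMA from it.
  hsa_status_t err =
      hsa_amd_memory_lock(host_src, size, &gpu_agent, /*num_agent=*/1,
                          &agent_ptr);
  if (err != HSA_STATUS_SUCCESS)
    return err;

  // Both endpoints of the copy are now memory HSA knows about.
  err = hsa_memory_copy(dev_dst, agent_ptr, size);

  // Unpin regardless of whether the copy succeeded.
  hsa_amd_memory_unlock(host_src);
  return err;
}
```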
Summary:
The puts call appends a newline. With multiple threads, this can be done
out of order such that another thread puts something before we finish
appending the newline. Add `flockfile` and `funlockfile` calls to ensure that
the whole string is printed before another string can appear.
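A minimal sketch of the locking idea on the host side, not the actual `libc`
code; the wrapper name is illustrative:
```c++
#include <stdio.h>

// Hold the stream lock across both writes so the string and its trailing
// newline cannot be interleaved with output from another thread.
int locked_puts(const char *s) {
  flockfile(stdout);
  int ret = fputs(s, stdout);
  if (ret != EOF)
    ret = fputc('\n', stdout);
  funlockfile(stdout);
  return ret == EOF ? EOF : 0;
}
```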
Summary:
This patch includes the necessary changes to make the `libc` tests
running on AMD GPUs run using the newer code object version. The 'code
object version' is AMD's internal ABI for making kernel calls. The move
from 4 to 5 changed how we handle arguments for builtins such as
obtaining the grid size or setting up the size of the private stack.
Fixes: https://github.com/llvm/llvm-project/issues/72517
If you build with dynamic_hsa, the symbol is known and compilation
succeeds. If you then run with a slightly older libhsa, this argument is
not recognised and an error is returned. I'd rather the program run with a
misleading omp wtime than refuse to run at all.
I think it follows from the HSA spec that a write to the first byte is
deemed significant to the GPU, in which case writing to the second short
and reading back from it later would be safe. However, the examples for
this all involve an atomic write to the first 32 bits, and it seems a
credible risk that the occasional CI errors about invalid packets have
as their root cause that the firmware notices the early write to
packet->setup and treats that as a sign that the packet is ready to go.
That was overly paranoid; however, in passing I noticed the code in libc is
genuinely invalid. The memset writes a zero to the header byte, changing
it from type_invalid (1) to type_vendor (0), at which point the GPU is
free to read the 64-byte packet and interpret it as a vendor packet,
which is probably why libc CI periodically errors about invalid packets.
Also a drive-by change to do the atomic store on a uint32_t
consistently. I'm not sure offhand what `__atomic_store_n` on a `uint16_t *`
and an `int` resolves to; it seems better to be unambiguous there.
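A sketch of the unambiguous form, assuming the packet starts with a 16-bit
header followed by the 16-bit setup field as in an AQL packet; the helper name
is illustrative:
```c++
#include <cstdint>

// Publish header and setup together as one 32-bit release store so the
// packet processor can never observe a half-initialized first word.
static inline void publish_packet_word(uint32_t *first_word, uint16_t header,
                                       uint16_t setup) {
  uint32_t word = static_cast<uint32_t>(header) |
                  (static_cast<uint32_t>(setup) << 16);
  __atomic_store_n(first_word, word, __ATOMIC_RELEASE);
}
```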
Summary:
The `fgets` function as currently implemented does not work when
called with multiple threads. This is because we rely on repeatedly
polling the character to detect EOF. This doesn't work when there are
multiple threads that may wish to poll the characters. This patch pulls
out the logic into a standalone RPC call to handle this in a single
operation such that calling it from multiple threads functions as
expected. It also makes it faster because we no longer make N RPC
calls for N characters.
Summary:
This function closely follows the pattern of all the other
functions. That is, making a new opcode and forwarding the call to the
host. However, this also required modifying the test somewhat. It seems
that not all `libc` implementations follow the same error rules as are
tested here, and it is not explicit in the standard, so we simply
disable these EOF checks when targeting the GPU.
Summary:
Currently, we use the RPC server to respond to different ports which
each contain a request from some client thread wishing to do work on the
server. This scan starts at zero and continues until it has checked all
ports, at which point it resets. If we find an active port, we service it
and then restart the search.
This is bad for two reasons. First, it means that we will always bias
toward the lower ports. If a thread grabs a high port it will be stuck for a
very long time until all the other work is done. Second, it means that
the `handle_server` function can technically run indefinitely as long as
the client is always pushing new work. Because the OpenMP implementation
uses the user thread to service the kernel, this means that it could be
stalled with another device's asynchronous kernels.
This patch addresses this by making the server restart at the next port
over. This means we will always do a full scan of the ports before
quitting.
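A minimal sketch of the round-robin idea, not the actual server loop;
`is_active` stands in for whatever check the real implementation performs on a
port:
```c++
#include <cstdint>
#include <optional>

// Scan every port exactly once, starting just past the last port served,
// so low ports are no longer favoured and each call does one full pass.
template <uint32_t NUM_PORTS, typename ActiveFn>
std::optional<uint32_t> scan_ports_once(uint32_t &start, ActiveFn is_active) {
  for (uint32_t i = 0; i < NUM_PORTS; ++i) {
    uint32_t port = (start + i) % NUM_PORTS;
    if (is_active(port)) {
      start = (port + 1) % NUM_PORTS; // Next scan resumes after this port.
      return port;
    }
  }
  return std::nullopt; // No pending work; the caller can return.
}
```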
Summary:
This variable needs a reserved name starting with `__`. It was
mistakenly changed with a mass replace. It happened to work because the
tests still picked up the associated symbol, but it just became a bad
name because it's not reserved anymore.
Summary:
This patch adds the necessary entrypoints to handle the `fseek`,
`fflush`, and `ftell` functions. These are all very straightforward; we
simply make RPC calls to the associated function on the other end.
Implementing it this way allows us to more or less borrow the state of
the stream from the server as we intentionally maintain no internal
state on the GPU device. However, this does not implement the `errno`
functionality, so that must be ignored.
Summary:
Normally, the implementation of `puts` simply performs a second write of a
newline character after printing the string. However, because the GPU does
everything in batches of the SIMT group size, this will end up with very
poor output where you get the strings printed and then 1-64 newline
characters all in a row. Optimizations like to turn `printf` calls into
`puts` so it's a good idea to make this produce the expected output.
The least invasive way I could do this was to add a new opcode. It's a
little bloated, but it avoids an unnecessary and slow send operation to
configure this.
Summary:
Previously, the `fread` operation was wrong in cases where we read less
data than was requested. That is, if we tried to read N bytes while the
file was at EOF, it would still copy N bytes of garbage. This is fixed
by only copying over the size we actually got back from the read on the
server rather than just using the requested size.
Additionally, this patch simplifies the interface. The output functions
have special variants for writing to stdout or stderr. This is primarily
an optimization for those common cases so we can avoid sending the
stream as an argument, which has a high delay. For input we already
need to start with a `send` to tell the server how much data to read,
so it costs us nothing to send the file along with it and such special
variants would be redundant. Instead, re-use the file encoding scheme
from the other implementations, the one that stores the stream type in
the LSBs of the FILE pointer.
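A sketch of that encoding scheme; the tag values are illustrative rather than
the exact ones used, and it relies on `FILE` pointers being aligned enough
that the low bits are free:
```c++
#include <cstdint>
#include <cstdio>

enum StreamTag : uintptr_t { TAG_HOST = 0, TAG_STDOUT = 1, TAG_STDERR = 2 };
constexpr uintptr_t TAG_MASK = 0x3;

// Pack a stream tag into the low bits of an (aligned) FILE pointer.
inline FILE *tag_stream(FILE *f, StreamTag tag) {
  return reinterpret_cast<FILE *>(reinterpret_cast<uintptr_t>(f) | tag);
}
inline uintptr_t get_tag(FILE *f) {
  return reinterpret_cast<uintptr_t>(f) & TAG_MASK;
}
inline FILE *get_host_file(FILE *f) {
  return reinterpret_cast<FILE *>(reinterpret_cast<uintptr_t>(f) & ~TAG_MASK);
}
```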
Summary:
This patch removes the `rpc_reset` function. This was previously used to
initialize the RPC client on the device by setting up the pointers to
communicate with the server. The purpose of this was to make it easier
to initialize the device for testing. However, this prevented us from
enforcing an invariant that the buffers are all read-only from the
client side.
The expected way to initialize the server is now to copy it from the
host runtime. This will allow us to maintain the invariant that the RPC
client is in the constant address space on the GPU, potentially through
address space inference, improving caching behaviour.
Summary:
This patch implements the `fgets`, `getc`, `fgetc`, and `getchar`
functions on the GPU. Their implementations are straightforward enough.
One thing worth noting is that the implementation of `fgets` will be
extremely slow due to the high latency to read a single char. A faster
solution would be to make a new RPC call to call `fgets` (due to the
special rule that newline or null breaks the stream). But this is left
out because performance isn't the primary concern here.
A recent patch required the implementation to define `LIBC_NAMESPACE`.
For GPU offloading we provide a static library whose internal
implementation relies on the `libc` headers. This is a separate library
that is constructed during the "bootstrap" phase. This patch moves the
definition of the `LIBC_NAMESPACE` CMake variable up so it's available
during bootstrapping and adds it to the definition of the RPC server.
Summary:
The HSA API explicitly states that the size is a count of uint32_t's not
a byte count. This was erroneously being used as a simple memcpy,
causing some weird behaviour. Fix this by correctly passing `1` to
initialize a single integer to zero.
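For reference, a minimal sketch assuming the call in question is
`hsa_amd_memory_fill`, whose last argument counts 32-bit words rather than
bytes:
```c++
#include <hsa/hsa_ext_amd.h>

// Zero a single 32-bit word of device memory: the count is 1, not
// sizeof(uint32_t), because the API measures uint32_t elements.
hsa_status_t zero_one_word(void *dev_ptr) {
  return hsa_amd_memory_fill(dev_ptr, 0u, /*count=*/1);
}
```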
This function implements the `abort` function on the GPU. The
implementation here closely mirrors the `exit` call where we first
synchronize with the RPC server to make sure it's listening and then we
exit on the GPU.
I was unsure if this should be a simple `__builtin_assert` on the GPU. I
elected to go with an RPC approach to make this a more "true" `abort`
call. That is, it should invoke some signal handlers and exit with the
proper code according to the implemented C library on the server.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159210
This `MAX_LANE_SIZE` was a hack from the days when we used a single
instance of the server and had some GPU state handle it. Now that we
have everything templated this really shouldn't be used. This patch
removes its use and replaces it with template arguments.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D158633
This patch adds support for `fread` on the GPU via the RPC mechanism.
Here we simply pass the size of the read to the server and then copy it
back to the client via the RPC channel. This should allow us to do the
basic operations on files now. This will obviously be slow for large
sizes due to the number of RPC calls involved; this could be optimized
further by having a special RPC call that can initiate a memcpy between
the two pointers.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155121
This patch does the noisy work of moving the test opcodes from the
exported interface to an interface that is only visible in `libc`. The
benefit of this is that we both test the exported RPC registration more
directly, and we do not need to give this interface to users.
I have decided to mark any opcode that is not a "core" libc feature by
setting its MSB. We can think of these as non-libc
"extensions".
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154848
Currently we keep an internal buffer of device memory that is used to
indicate ownership of a port. Since we only use this as a single bit we
can simply turn this into a bitfield. I did this manually rather than
having a separate type as we need very special handling of the masks
used to interact with the locks.
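A sketch of the single-bit ownership idea; the real code's special handling of
the lane masks is omitted and the helpers are illustrative:
```c++
#include <cstdint>

// One ownership bit per port, packed into 32-bit words, instead of a whole
// buffer entry per port. Returns true if the calling thread claimed the bit.
inline bool try_claim_port(uint32_t *bits, uint32_t port) {
  uint32_t mask = 1u << (port % 32);
  uint32_t old = __atomic_fetch_or(&bits[port / 32], mask, __ATOMIC_ACQUIRE);
  return (old & mask) == 0;
}

inline void release_port(uint32_t *bits, uint32_t port) {
  __atomic_fetch_and(&bits[port / 32], ~(1u << (port % 32)), __ATOMIC_RELEASE);
}
```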
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155511
There are some cases when testing where we want to override the logic
that skips building tests if the loader is not present. This allows users
to specify an external binary that fulfils the same duties, which will
force the tests to be built even without meeting the dependencies.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155837
If the clock_freq symbol isn't used and has been removed,
we don't need to abort the loader. We can instead just not set it.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D155832
HSA headers might or might not be under an hsa/ directory.
This scheme matches the one used by the openmp amdgpu plugin.
Reviewed By: jhuber6, jplehr
Differential Revision: https://reviews.llvm.org/D155812
This patch adds the `rpc_host_call` function as a GPU extension. This is
exported from the `libc` project and uses the RPC interface to call a
function pointer on the host, copying the arguments by value. The
interface can only support a single void pointer argument, much like
pthreads. The function call here is the bare-bones version of what's
required for OpenMP reverse offloading. Full support will require
interfacing with the mapping table, nowait support, etc.
I decided to test this interface in `libomptarget` as that will be the
primary consumer and it would be more difficult to make a test in `libc`
due to the testing infrastructure not really having a concept of the
"host" as it runs directly on the GPU as if it were a CPU target.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D155003
Summary:
This caused test failures on the gfx90a buildbot. This works on my
gfx1030 and the Nvidia buildbots, so we'll need to investigate what is
going wrong here. For now revert it to get the bots green.
This reverts commit 05abcc5792.
Currently we keep an internal buffer of device memory that is used to
indicate ownership of a port. Since we only use this as a single bit we
can simply turn this into a bitfield. I did this manually rather than
having a separate type as we need very special handling of the masks
used to interact with the locks.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155511
This ensures that if someone calls the `rpc_shutdown` method multiple
times it will not segfault and gracefully continue. This was causing
problems in the OpenMP usage. This could point to other issues, but for
now this is a safe fix.
Differential Revision: https://reviews.llvm.org/D155005
This patch adds the necessary support for the fopen and fclose functions
to work on the GPU via RPC. I added a new test that enables testing this
with the minimal features we have on the GPU. I will update it once we
have `fread` and `fwrite` to actually check the outputted strings. For
now I just relied on checking manually via the output temp file.
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D154519
This patch adds the necessary support to provide timing information in
`libc` tests. This is useful for determining which tests take what
amount of time. We can also use this as a basis for providing more
fine-grained timing when implementing things on the GPU.
The main difficulty with this is the fact that the AMDGPU fixed
frequency clock operates at an unknown frequency. We need to read this
on a per-card basis from the driver and then copy it in. NVPTX on the
other hand has a fixed clock at a resolution of 1ns. I have also
increased the resolution of the print-outs as the majority of these are
below a millisecond for me.
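A sketch of the conversion once the frequency has been read; `clock_freq_hz`
is an illustrative name for the per-card value copied in from the driver:
```c++
#include <cstdint>

// Convert fixed-frequency clock ticks to nanoseconds using the per-card
// frequency (in Hz) read from the driver.
inline uint64_t ticks_to_nanoseconds(uint64_t ticks, uint64_t clock_freq_hz) {
  return (ticks * 1000000000ull) / clock_freq_hz;
}
```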
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154446
This patch makes sure that we always build the RPC server. The proposed
use for this is to begin integrating this server implementation into
`libomptarget`. That requires that we build this server ahead of time
when using a `LLVM_ENABLE_PROJECTS` build. We also make a few tweaks to
ensure that GCC, which may be used for this build, doesn't complain.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154105
This patch adds the other two methods to the server so that external
users can access them through the obfuscated interface.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154224
The RPC calls all have delays associated with them. Currently the `exit`
function does an async send and immediately exits the GPU. This can have
the effect that the RPC server never sees the exit call and we continue.
This patch changes that to first sync with the server before continuing
to perform its exit. There is still a hazard here, where the kernel can
complete before the RPC call reads back its response, but this is simply
a multi-threading hazard. This change ensures that the server *will*
always exit some time after the GPU exits.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154112
The RPC client must be initialized to set a pointer to the underlying
buffer. This is currently done with the `reset` method which may not be
ideal for the use-case. We want runtimes to be able to initialize this
without needing to call a kernel. Recent changes allowed the `Client`
type to be trivially copyable. That means we can create a client on the
server side and then copy it over. To that end we take the existing
externally visible symbol and initialize it to the client's pointer.
Therefore we can look up the symbol and copy it over once loaded.
No test currently; I tested this with a demo OpenMP application but couldn't
think of how to put that in-tree.
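A minimal sketch of the host-side lookup, assuming the standard HSA
executable symbol APIs; the symbol name is a placeholder rather than the real
one:
```c++
#include <cstdint>
#include <hsa/hsa.h>

// Find the device address of the client variable in a loaded executable.
// "rpc_client_symbol_name" is a placeholder for whatever symbol `libc`
// actually exports; the host then copies a ready-made client object to the
// returned address.
hsa_status_t find_client_address(hsa_executable_t exec, hsa_agent_t gpu_agent,
                                 uint64_t *device_addr) {
  hsa_executable_symbol_t sym;
  hsa_status_t err = hsa_executable_get_symbol_by_name(
      exec, "rpc_client_symbol_name", &gpu_agent, &sym);
  if (err != HSA_STATUS_SUCCESS)
    return err;
  return hsa_executable_symbol_get_info(
      sym, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS, device_addr);
}
```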
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153633
This patch prepares the RPC interface to be installed. We place this in
the existing `llvm-gpu-none` directory as it will also give us access to
the generated `libc` headers for the opcodes.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153040
Currently, the implementation of the RPC interface requires a flexible
array member. This caused problems when compiling the RPC server with
GCC, which would be required if trying to export the RPC server
interface. This meant we either had to move to the `x[1]` workaround or
make the size a template parameter. While just using `x[1]` would be much
less noisy, it is technically undefined behavior. For this reason I
elected to use templates.
The downside to using templates is that the server code must now be able
to handle multiple different types at runtime. I was unable to find a
good solution that didn't rely on type erasure so I simply branch off of
the given value.
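An illustration of the general shape of the change, with simplified names and
fields rather than the real RPC types:
```c++
#include <cstdint>

// Before (C only): struct Buffer { uint64_t header; uint64_t data[]; };
// After: the trailing array's length is a template parameter instead.
template <uint32_t LANE_COUNT> struct Buffer {
  uint64_t header;
  uint64_t data[LANE_COUNT];
};

// The server no longer knows the lane count at compile time, so it branches
// on the value provided by the client at runtime.
void handle(void *raw, uint32_t lane_count) {
  if (lane_count == 32) {
    auto *buf = static_cast<Buffer<32> *>(raw);
    (void)buf; // ... service a wave32 client ...
  } else if (lane_count == 64) {
    auto *buf = static_cast<Buffer<64> *>(raw);
    (void)buf; // ... service a wave64 client ...
  }
}
```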
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153304
When switching to this interface, we neglected to actually write the output
from the malloc call to the RPC buffer. Fix this so the tests pass
again.
Differential Revision: https://reviews.llvm.org/D153069
The GPU port of the LLVM C library needs to export a few extensions to
the interface such that users can interface with it. This patch adds the
necessary logic to define a GPU extension. Currently, this only exports
a `rpc_reset_client` function. This allows us to use the server in
D147054 to set up the RPC interface outside of `libc`.
Depends on https://reviews.llvm.org/D147054
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D152283