clang-p2996

Author	SHA1	Message	Date
Joseph Huber	4213f4a9ae	[Libomptarget] Fix resizing the buffer of RPC handles Summary: The previous code would potentially make it smaller if a device with a lower ID touched it later. Also we should minimize changes to the state for multi threaded reasons. This just sets up an owned slot for each at initialization time.	2024-04-01 07:29:57 -05:00
Joseph Huber	a1a8bb1d3a	[libc] Change RPC interface to not use device ids (#87087 ) Summary: The current implementation of RPC tied everything to device IDs and forced us to do init / shutdown to manage some global state. This turned out to be a bad idea in situations where we want to track multiple heterogeneous devices that may report the same device ID in the same process. This patch changes the interface to instead create an opaque handle to the internal device and simply allocates it via `new`. The user will then take this device and store it to interface with the attached device. This interface puts the burden of tracking the device identifier to mapped devices onto the user, but in return heavily simplifies the implementation.	2024-03-29 12:49:16 -05:00
Joseph Huber	ed68aac9f2	[Libomptarget] Move API implementations into GenericPluginTy (#86683 ) Summary: The plan is to remove the entire plugin interface and simply use the `GenericPluginTy` inside of `libomptarget` by statically linking against it. This means that inside of `libomptarget` we will simply do `Plugin.data_alloc` without the dynamically loaded interface. To reduce the amount of code required, this patch simply moves all of the RTL implementation functions inside of the Generic device. Now the `__tgt_rtl_` interface is simply a shallow wrapper that will soon go away. There is some redundancy here, this will be improved later. For now what is important is minimizing the changes to the API.	2024-03-27 14:10:54 -05:00
Joseph Huber	4dc3225248	[Libomptarget] Replace global `PluginTy::get` interface with references (#86595 ) Summary: We have a plugin singleton that implements the Plugin interface. This then spawns separate device and kernels. Previously when these needed to reach into the global singleton they would use the `PluginTy::get` routine to get access to it. In the future we will move away from this as the lifetime of the plugin will be handled by `libomptarget` directly. This patch removes uses of this inside of the plugin implementations themselves by simply keeping a reference to the plugin inside of the device. The external `__tgt_rtl` functions still use the global method, but will be removed later.	2024-03-26 07:13:59 -05:00
Joseph Huber	2cad43c1ba	[Libomptarget] Factor functions out of 'Plugin' interface (#86528 ) Summary: This patch factors common functions out of the `Plugin` interface prior to its removal in a future patch. This simply temporarily renames it to `PluginTy` so that we could re-use `Plugin::check` internally as this needs to be defined statically per plugin now. We can refactor this later. The future patch will delete `PluginTy` and `PluginTy::get` entirely. This simply tries to minimize a few changes to make it easier to land.	2024-03-25 15:24:39 -05:00
Joseph Huber	9f0321ccf1	[Libomptarget] Make plugins depend explicitly on `intrinsics_gen` Summary: It's possible for the OpenMP offloading plugins to be built before tablegen is run despite the fact that we rely on it. Simply make it depend on it currently.	2024-03-24 15:24:35 -05:00
Joseph Huber	909ea28ac6	[Libomptarget] Specifically add LLVM include dirs in plugins	2024-03-24 14:29:39 -05:00
Joseph Huber	dcbddc2525	[Libomptarget] Unify and simplify plugin CMake (#86191 ) Summary: This patch reworks the CMake handling for building plugins. All this does is pull a lot of shared and common logic into a single helper function. This also simplifies the OMPT libraries from being built separately instead of just added.	2024-03-22 16:13:58 -05:00
dhruvachak	b5d02bbd0d	[OpenMP] Increment kernel args version, used by runtime for detecting dyn_ptr. (#85363 ) A kernel implicit parameter (dyn_ptr) was introduced some time back. This patch increments the kernel args version for a compiler supporting dyn_ptr. The version will be used by the runtime to determine whether the implicit parameter is generated by the compiler. The versioning is required to support use cases where code generated by an older compiler is linked with a newer runtime. If approved, this patch should be backported to release 18.	2024-03-19 16:40:22 -07:00
Joseph Huber	470040bd4d	[Libomptarget][NFC] Remove warning on return value const	2024-03-15 18:50:33 -05:00
Ulrich Weigand	2210c85a66	Reapply [libomptarget] Support BE ELF files in plugins-nextgen (#85246 ) Code in plugins-nextgen reading ELF files is currently hard-coded to assume a 64-bit little-endian ELF format. Unfortunately, this assumption is even embedded in the interface between GlobalHandler and Utils/ELF routines, which use ELF64LE types. To fix this, I've refactored the interface to use generic types, in particular by using (a unique_ptr to) ObjectFile instead of ELF64LEObjectFile, and ELFSymbolRef instead of ELF64LE::Sym. This allows properly templating over multiple ELF format variants inside Utils/ELF; specifically, this patch adds support for 64-bit big-endian ELF files in addition to 64-bit little-endian files.	2024-03-15 18:28:28 +01:00
Ulrich Weigand	4c8714efc5	Revert "[libomptarget] Support BE ELF files in plugins-nextgen (#85246 )" This reverts commit `611c62b30d`.	2024-03-14 18:38:13 +01:00
Ulrich Weigand	611c62b30d	[libomptarget] Support BE ELF files in plugins-nextgen (#85246 ) Code in plugins-nextgen reading ELF files is currently hard-coded to assume a 64-bit little-endian ELF format. Unfortunately, this assumption is even embedded in the interface between GlobalHandler and Utils/ELF routines, which use ELF64LE types. To fix this, I've refactored the interface to use generic types, in particular by using (a unique_ptr to) ObjectFile instead of ELF64LEObjectFile, and ELFSymbolRef instead of ELF64LE::Sym. This allows properly templating over multiple ELF format variants inside Utils/ELF; specifically, this patch adds support for 64-bit big-endian ELF files in addition to 64-bit little-endian files.	2024-03-14 18:19:12 +01:00
Joseph Huber	8a79003307	[libc] Move RPC opcodes include out of the header Summary: This header isn't strictly necessary, and is currently broken because we install these to separate locations.	2024-03-10 14:07:47 -05:00
Ulrich Weigand	fb7cc73975	Revert "[libomptarget] Support BE ELF files in plugins-nextgen (#83976 )" This reverts commit `15b7b3182c`.	2024-03-06 21:37:45 +01:00
Ulrich Weigand	15b7b3182c	[libomptarget] Support BE ELF files in plugins-nextgen (#83976 ) Code in plugins-nextgen reading ELF files is currently hard-coded to assume a 64-bit little-endian ELF format. Unfortunately, this assumption is even embedded in the interface between GlobalHandler and Utils/ELF routines, which use ELF64LE types. To fix this, I've refactored the interface to push all ELF specific types into Utils/ELF. Specifically, this patch removes both the getSymbol and getSymbolAddress routines and replaces them with a single findSymbolInImage, which gets a StringRef identifying the raw object file image as input, and returns a StringRef covering the data addressed by the symbol (address and size) if found, or std::nullopt otherwise. This allows properly templating over multiple ELF format variants inside Utils/ELF; specifically, this patch adds support for 64-bit big-endian ELF files in addition to 64-bit little-endian files.	2024-03-06 20:49:12 +01:00
Daniel Martinez	aa6ebf9be1	Replace some C headers with C++ ones (#82697 ) #81434 Replaced some C headers with C++ ones Co-authored-by: Daniel Martinez <danielmartinez@cock.li>	2024-03-04 01:21:31 -05:00
Joseph Huber	1a2ecbb398	[libc] Remove 'llvm-gpu-none' directory from build (#82816 ) Summary: This directory is leftover from when we handled both AMDGPU and NVPTX in the same build and merged them into a pseudo triple. Now the only thing it contains is the RPC server header. This gets rid of it, but now that it's in the base install directory we should make it clear that it's an LLVM libc header.	2024-02-23 14:11:31 -06:00
Joseph Huber	47b7c91abe	[libc] Rework the GPU build to be a regular target (#81921 ) Summary: This is a massive patch because it reworks the entire build and everything that depends on it. This is not split up because various bots would fail otherwise. I will attempt to describe the necessary changes here. This patch completely reworks how the GPU build is built and targeted. Previously, we used a standard runtimes build and handled both NVPTX and AMDGPU in a single build via multi-targeting. This added a lot of divergence in the build system and prevented us from doing various things like building for the CPU / GPU at the same time, or exporting the startup libraries or running tests without a full rebuild. The new approach is to handle the GPU builds as strict cross-compiling runtimes. The first step required https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC` target to build for the GPU without touching the other targets. This means that the GPU uses all the same handling as the other builds in `libc`. The new expected way to build the GPU libc is with `LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`. The second step was reworking how we generated the embedded GPU library by moving it into the library install step. Where we previously had one `libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This patch includes the necessary clang / OpenMP changes to make that not break the bots when this lands. We unfortunately still require that the NVPTX target has an `internal` target for tests. This is because the NVPTX target needs to do LTO for the provided version (The offloading toolchain can handle it) but cannot use it for the native toolchain which is used for making tests. This approach is vastly superior in every way, allowing us to treat the GPU as a standard cross-compiling target. We can now install the GPU utilities to do things like use the offload tests and other fun things. Some certain utilities need to be built with `--target=${LLVM_HOST_TRIPLE}` as well. I think this is a fine workaround as we will always assume that the GPU `libc` is a cross-build with a functioning host. Depends on https://github.com/llvm/llvm-project/pull/81557	2024-02-22 15:29:29 -06:00
Joseph Huber	0ac4438560	[Libomptarget] Remove unused 'SupportsEmptyImages' API function (#80316 ) Summary: This function is always false in the current implementation and is not even considered required. Just remove it and if someone needs it in the future they can add it back in. This is done to simplify the interface prior to other changes	2024-02-05 10:00:09 -06:00
Joseph Huber	621bafd5c1	[Libomptarget] Move target table handling out of the plugins (#77150 ) Summary: This patch removes the bulk of the handling of the `__tgt_offload_entries` out of the plugins itself. The reason for this is because the plugins themselves should not be handling this implementation detail of the OpenMP runtime. Instead, we expose two new plugin API functions to get the points to a device pointer for a global as well as a kernel type. This required introducing a new type to represent a binary image that has been loaded on a device. We can then use this to load the addresses as needed. The creation of the mapping table is then handled just in `libomptarget` where we simply look up each address individually. This should allow us to expose these operations more generically when we provide a separate API.	2024-01-22 11:06:47 -06:00
carlobertolli	ae99966a27	[OpenMP] Enable automatic unified shared memory on MI300A. (#77512 ) This patch enables applications that did not request OpenMP unified_shared_memory to run with the same zero-copy behavior, where mapped memory does not result in extra memory allocations and memory copies, but CPU-allocated memory is accessed from the device. The name for this behavior is "automatic zero-copy" and it relies on detecting: that the runtime is running on a MI300A, that the user did not select unified_shared_memory in their program, and that XNACK (unified memory support) is enabled in the current GPU configuration. If all these conditions are met, then automatic zero-copy is triggered. This patch also introduces an environment variable OMPX_APU_MAPS that, if set, triggers automatic zero-copy also on non APU GPUs (e.g., on discrete GPUs). This patch is still missing support for global variables, which will be provided in a subsequent patch. Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>	2024-01-22 10:30:22 -06:00
Joseph Huber	37c1a5e3f5	[Libomptarget] Fix GPU Dtors referencing possibly deallocated image (#77828 ) Summary: The constructors and destructors look up a symbol in the ELF quickly to determine if they need to be run on the GPU. This allows us to avoid the very slow actions required to do the slower lookup using the vendor API. One problem occurs with how we handle the lifetime of these images. Right now there is no invariant to specify the lifetime of the underlying binary image that is loaded. In the typical case, this comes from the binary itself in the `.llvm.offloading` section, meaning that the lifetime of the binary should match the executable itself. This would work fine, if it weren't for the fact that the plugin is loaded via `dlopen` and can have a teardown order out of sync with the main executable. This was likely what was occurring when this failed on some systems but not others. A potential solution would be to simply copy images into memory so the runtime does not rely on external references. Another would be to manually zero these out after initialization as to prevent this mistake from happening accidentally. The former has the benefit of making some checks easier, and allowing for constant initialization be done on the ELF itself (normally we can't do this because writing to a constant section, e.g. .llvm.offloading is a segfault.). The downside would be the extra time required to copy the image in bulk (Although we are likely doing this in the vendor runtimes as well). This patch went with a quick solution to simply set a boolean value at initialization time if we need to call destructors. Fixes: https://github.com/llvm/llvm-project/issues/77798	2024-01-11 15:00:53 -06:00
Joseph Huber	e203968e41	[Libomptarget] Do not abort on failed plugin init (#77623 ) Summary: The current code logic is supposed to skip plugins that aren't found or could not be loaded. However, the plugin contained a call to `abort` if it failed, which prevented us from continuing if initializing the plugin failed (such as if `dlopen` failed for the dynamic plugins).	2024-01-10 11:42:04 -06:00
Joseph Huber	d03b8c3a04	[Libomptarget][NFC] Format in-line comments consistently (#77530 ) Summary: The LLVM style uses /*Foo=*/ when indicating the name of a constant. See https://llvm.org/docs/CodingStandards.html#comment-formatting. This is useful for consistency, as well as because `clang-format` understands this syntax and formats it more cleanly. Do a bulk update of this syntax.	2024-01-10 10:10:08 -06:00
carlobertolli	ce4144406c	Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (#77371 ) Reverts llvm/llvm-project#75999 lit test is failing.	2024-01-08 14:38:29 -06:00
carlobertolli	22a73e7c46	[OpenMP][libomptarget] Enable automatic unified shared memory executi… (#75999 ) …on (zero-copy) on MI300A. This patch enables applications that did not request OpenMP unified_shared_memory to run with the same zero-copy behavior, where mapped memory does not result in extra memory allocations and memory copies, but CPU-allocated memory is accessed from the device. The name for this behavior is "automatic zero-copy" and it relies on detecting: that the runtime is running on a MI300A, that the user did not select unified_shared_memory in their program, and that XNACK (unified memory support) is enabled in the current GPU configuration. If all these conditions are met, then automatic zero-copy is triggered. This patch is still missing support for global variables, which will be provided in a subsequent patch. Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>	2024-01-08 14:17:28 -06:00
Joseph Huber	bda562519b	[Libomptarget][NFC] Fix unhandled allocator enum value	2024-01-08 10:17:05 -06:00
Joseph Huber	fb32977ac7	[Libomptarget] Fix RPC-based malloc on NVPTX (#72440 ) Summary: The device allocator on NVPTX architectures is enqueued to a stream that the kernel is potentially executing on. This can lead to deadlocks as the kernel will not proceed until the allocation is complete and the allocation will not proceed until the kernel is complete. CUDA 11.2 introduced async allocations that we can manually place on separate streams to combat this. This patch makes a new allocation type that's guaranteed to be non-blocking so it will actually make progress, only Nvidia needs to care about this as the others are not blocking in this way by default. I had originally tried to make the `alloc` and `free` methods take a `__tgt_async_info`. However, I observed that with the large volume of streams being created by a parallel test it quickly locked up the system as presumably too many streams were being created. This implementation now just creates a new stream and immediately destroys it. This obviously isn't very fast, but it at least gets the cases to stop deadlocking for now.	2024-01-02 16:53:53 -06:00
Joseph Huber	64f0681e97	[Libomptarget] Rework image checking further (#76120 ) Summary: In the future, we may have more checks for different kinds of inputs, e.g. SPIR-V. This patch simply reworks the handling to be more generic and do the magic detection up-front. The checks inside the routines are now asserts so we don't spend time checking this stuff over and over again. This patch also tweaked the bitcode check. I used a different function to get the Lazy-IR module now, as it returns the raw expected value rather than the SM diagnostic. No functionality change intended.	2023-12-29 15:14:39 -06:00
Joseph Huber	f324584ae3	[Libomptarget][NFCI] Remove caching of created ELF files (#76080 ) Summary: We currently keep a cache of created ELF files from the relevant images. This shouldn't be necessary as the entire ELF interface is generally trivially constructable and extremely cheap. The cost of constructing one of these objects is simply a size check and writing a pointer to the underlying data. Given that, keeping a cache of these images should not be necessary overall.	2023-12-20 17:13:41 -06:00
Joseph Huber	e4f4022b70	[Libomptarget][NFC] Fix linting warnings in the plugins Summary: Fix some linting warnings present in the plugins.	2023-12-20 10:07:34 -06:00
Joseph Huber	ac029e02a9	[Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720 ) Summary: This patch reorganizes a lot of the code used to check for compatibility with the current environment. The main bulk of this patch involves moving from using a separate `__tgt_image_info` struct (which just contains a string for the architecture) to instead simply checking this information from the ELF directly. Checking information in the ELF is very inexpensive as creating an ELF file is simply writing a base pointer. The main desire to do this was to reorganize everything into the ELF image. We can then do the majority of these checks without first initializing the plugin. A future patch will move the first ELF checks to happen without initializing the plugin so we no longer need to initialize any plugins that don't have needed images. This patch also adds a lot more sanity checks for whether or not the ELF is actually compatible, such as if the images have a valid ABI, 64-bit width, executable, etc.	2023-12-19 20:01:31 -06:00
Shilei Tian	3768039913	[OpenMP] Directly use user's grid and block size in kernel language mode (#70612 ) In kernel language mode, use user's grid and blocks size directly. No validity check, which means if user's values are too large, the launch will fail, similar to what CUDA and HIP are doing right now.	2023-12-18 12:26:18 -05:00
Joseph Huber	913622d012	[Libomptarget] Remove remaining global constructors in plugins (#75814 ) Summary: This patch fixes the remaining global constructor in the plugins after addressing the ones in the JIT interface. This struct was mistakenly using global constructors as not all the members were being initialized properly. This was almost certainly being optimized out because it's trivial, but would still be present in debug builds and prevented us from compiling with `-Werror=global-constructors`. We will want to do that once offloading is moved to a runtimes only build.	2023-12-18 11:01:02 -06:00
Joseph Huber	1580877555	[Libomptarget] Remove bitcode image map used for JIT processing (#75672 ) Summary: Libomptarget supports JIT by treating an LLVM-IR file as a regular input image. The handling here used a global map to keep track of triples once it was parsed. This was done to save time, however this created a global constructor as well as an extra mutex to handle it. This patch removes the use of this map. Instead, we simply use the file magic to perform a quick check if the input image is valid bitcode. If not, we then create a lazy module. This should be roughly equivalent to the old handling that created an IR symbol table. Here we can prevent the module from materializing everything but the single triple metadata we read in later.	2023-12-18 09:28:06 -06:00
Kazu Hirata	b8f89b84bc	Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-16 15:02:17 -08:00
Joseph Huber	0ab663d202	[Libomptarget] Move ELF symbol extraction to the ELF utility (#74717 ) Summary: We shouldn't have the format specific ELF handling in the generic plugin manager. This patch moves that out of the implementation and into the ELF utilities. This patch changes the SHT_NOBITS case to be a hard error, which should be correct as the existing use already seemed to return an error if the result was a null pointer. This also uses a `const_cast`, which is bad practice. However, rebuilding the `constness` of all of this would be a massive overhaul, and this matches the previous behaviour (We would take a pointer to the image that is most likely read-only in the ELF).	2023-12-14 11:04:13 -06:00
Johannes Doerfert	12cbccc312	[OpenMP] Add extra flags to libomptarget and plugin builds (#74520 )	2023-12-11 10:41:50 -08:00
Johannes Doerfert	0ace6ee73a	[OpenMP][FIX] Ensure we do not read outside the device image (#74669 ) Before we expected all symbols in the device image to be backed up with data that we could read. However, uninitialized values are not. We now check for this case and avoid reading random memory. This also replaces the correct readGlobalFromImage call with a isSymbolInImage check after https://github.com/llvm/llvm-project/pull/74550 picked the wrong one. Fixes: https://github.com/llvm/llvm-project/issues/74582	2023-12-06 14:57:57 -08:00
Joseph Huber	6f3bd3a2f6	[Libomptarget] Add a utility function for checking existence of symbols (#74550 ) Summary: There are now a few cases that check if a symbol is present before continuing, effectively making them optional features if present in the image. This was done in at least three locations and required an ugly operation to consume the error. This patch makes a utility function to handle that instead.	2023-12-06 07:41:27 -06:00
Johannes Doerfert	68db7aef74	[OpenMP] Reorganize the initialization of `PluginAdaptorTy` (#74397 ) This introduces checked errors into the creation and initialization of `PluginAdaptorTy`. We also allow the adaptor to "hide" devices from the user if the initialization failed. The new organization avoids the "initOnce" stuff but we still do not eagerly initialize the plugin devices (I think we should merge `PluginAdaptorTy::initDevices` into `PluginAdaptorTy::init`)	2023-12-05 16:04:01 -08:00
Johannes Doerfert	9f87509b19	[OpenMP][FIX] Ensure we allow shared libraries without kernels (#74532 ) This fixes two bugs and adds a test for them: - A shared library with declare target functions but without kernels should not error out due to missing globals. - Enabling LIBOMPTARGET_INFO=32 should not deadlock in the presence of indirect declare targets.	2023-12-05 15:25:10 -08:00
Johannes Doerfert	5fe741f08e	[OpenMP] Separate Requirements into a standalone header (#74126 ) This is not completely NFC since we now check all 4 requirements and the test is checking the good and the bad case for combining flags.	2023-12-01 14:47:00 -08:00
dhruvachak	ca2d79f9ca	[OpenMP] Add an INFO message for data transfer of kernel launch env. (#74030 )	2023-12-01 10:58:23 -08:00
Johannes Doerfert	148dec9fa4	[OpenMP][NFC] Separate Envar (environment variable) handling (#73994 )	2023-11-30 15:23:34 -08:00
Johannes Doerfert	2e7f47d4a8	[OpenMP][NFC] Move out plugin API and APITypes into standalone headers (#73868 )	2023-11-29 16:04:19 -08:00
Johannes Doerfert	fae233c63f	[OpenMP] Avoid initializing the KernelLaunchEnvironment if possible (#73864 ) If we don't have a team reduction we don't need a kernel launch environment (for now). In that case we can avoid the cost.	2023-11-29 14:49:13 -08:00
Johannes Doerfert	e2299e8d9d	[OpenMP][NFC] Move OMPT headers into OpenMP/OMPT (#73718 )	2023-11-29 08:29:41 -08:00
Johannes Doerfert	db96a9c3b7	[OpenMP][NFC] Flatten plugin-nextgen/common folder structure (#73725 ) For historic reasons we had it setup that there was ` plugin-nextgen/common/PluginInterface/<sources + headers>` which is not what we do anywhere else. Now it looks like the rest: ``` plugin-nextgen/common/include/<headers> plugin-nextgen/common/src/<sources> ``` As part of this, `dlwrap.h` was moved into common/include (as `DLWrap.h`) since it is exclusively used by the plugins.	2023-11-29 07:57:01 -08:00

174 Commits