Summary:
This patch removes the bulk of the handling of `__tgt_offload_entries`
from the plugins themselves. The reason for this is that the plugins
should not be handling this implementation detail of the OpenMP
runtime. Instead, we expose two new plugin API functions that look up
the device pointer for a global as well as the handle for a kernel.
This required introducing a new type to represent a binary image that
has been loaded on a device. We can then use this to load the addresses
as needed. The creation of the mapping table is then handled just in
`libomptarget` where we simply look up each address individually. This
should allow us to expose these operations more generically when we
provide a separate API.
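For illustration, a rough sketch of the shape these two entry points
might take (names and signatures here are approximate, not the
authoritative declarations): given a handle to an image that has
already been loaded on the device, `libomptarget` can query the device
address of a global or the launchable handle of a kernel by name.
```
#include <cstdint>

// Opaque handle for a binary image that has been loaded on a device.
struct __tgt_device_binary {
  uintptr_t handle;
};

// Look up the device pointer backing a global of the given size.
int32_t __tgt_rtl_get_global(__tgt_device_binary Binary, uint64_t Size,
                             const char *Name, void **DevicePtr);

// Look up the handle used to launch the kernel with the given name.
int32_t __tgt_rtl_get_function(__tgt_device_binary Binary, const char *Name,
                               void **KernelPtr);
```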
Summary:
The LLVM style uses /*Foo=*/ when indicating the name of a constant. See
https://llvm.org/docs/CodingStandards.html#comment-formatting. This is
useful for consistency, as well as because `clang-format` understands
this syntax and formats it more cleanly. Do a bulk update of this
syntax.
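For example (the function here is made up purely to illustrate the
convention):
```
#include <cstddef>

void copyMemory(void *Dst, const void *Src, size_t Size, bool Blocking);

void caller(void *Dst, const void *Src, size_t Size) {
  // Without the comment, the bare `true` would be meaningless at the call
  // site; clang-format keeps /*Blocking=*/ attached to the argument.
  copyMemory(Dst, Src, Size, /*Blocking=*/true);
}
```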
Summary:
The offloading entries right now are assumed to be baked into the binary
itself, and thus always valid whenever the library is executing. This
means that we don't need to copy them to additional storage and can
instead simply pass around references to them.
This is not likely to change in the expected operation of the OpenMP
library. Additionally, the indirection for the offload entry struct is
simply two pointers, so moving it by value is trivial.
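For reference, a sketch of roughly what these structures look like
(abbreviated; the table wrapper name below is hypothetical):
```
#include <cstddef>
#include <cstdint>

// One entry describing a kernel or global emitted into the binary.
struct __tgt_offload_entry {
  void *addr;       // Address of the symbol on the host.
  char *name;       // Name of the symbol.
  size_t size;      // Size of the global in bytes (0 for functions).
  int32_t flags;    // Entry flags.
  int32_t reserved; // Reserved, must be zero.
};

// The entries are baked into the binary as a contiguous [Begin, End) table,
// so passing the table by value only copies this pair of pointers.
struct OffloadEntryTableTy {
  __tgt_offload_entry *Begin;
  __tgt_offload_entry *End;
};
```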
…on (zero-copy) on MI300A.
This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting
that the runtime is running on an MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.
This patch also introduces an environment variable OMPX_APU_MAPS that,
if set, triggers automatic zero-copy also on non-APU GPUs (e.g., on
discrete GPUs).
This patch is still missing support for global variables, which will be
provided in a subsequent patch.
Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
Summary:
The device allocator on NVPTX architectures is enqueued to a stream that
the kernel is potentially executing on. This can lead to deadlocks as
the kernel will not proceed until the allocation is complete and the
allocation will not proceed until the kernel is complete. CUDA 11.2
introduced async allocations that we can manually place on separate
streams to combat this. This patch adds a new allocation type that is
guaranteed to be non-blocking so it will actually make progress. Only
the NVIDIA plugin needs to care about this, as the others do not block
in this way by default.
I had originally tried to make the `alloc` and `free` methods take a
`__tgt_async_info`. However, I observed that with the large volume of
streams being created by a parallel test it quickly locked up the system
as presumably too many streams were being created. This implementation
now just creates a new stream and immediately destroys it. This
obviously isn't very fast, but it at least gets the cases to stop
deadlocking for now.
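A minimal sketch of the workaround using the CUDA driver API (error
handling elided, not the actual plugin code): allocate on a freshly
created non-blocking stream so the allocation cannot be queued behind a
running kernel, then tear the stream down again.
```
#include <cstddef>
#include <cuda.h>

static void *allocNonBlocking(size_t Size) {
  CUstream Stream;
  CUdeviceptr DevicePtr = 0;
  cuStreamCreate(&Stream, CU_STREAM_NON_BLOCKING);
  cuMemAllocAsync(&DevicePtr, Size, Stream); // Requires CUDA 11.2+.
  cuStreamSynchronize(Stream);
  cuStreamDestroy(Stream);
  return reinterpret_cast<void *>(DevicePtr);
}

static void freeNonBlocking(void *Ptr) {
  CUstream Stream;
  cuStreamCreate(&Stream, CU_STREAM_NON_BLOCKING);
  cuMemFreeAsync(reinterpret_cast<CUdeviceptr>(Ptr), Stream);
  cuStreamSynchronize(Stream);
  cuStreamDestroy(Stream);
}
```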
Summary:
Adding information to the LIBOMPTARGET profiler for runtime, kernel,
and API calls.
Key changes:
* Adding information to runtime calls for a better understanding of how
  the application is executing, for example the number of teams
  requested by the user and the size of memory transfers (see the
  sketch below).
* The profile timer was changed from 'us' to 'ns', since 'us' was too
  coarse-grained to register some important details like key kernel
  durations.
* Removed non-API and non-runtime calls, to reduce the complexity of
  the profile for application developers.
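For illustration, a hypothetical example (not the actual libomptarget
code) of how extra detail can be attached to a profiler scope via
LLVM's time-trace support, which is what the LIBOMPTARGET_PROFILE
output is built on:
```
#include "llvm/Support/TimeProfiler.h"
#include <string>

static void launchKernelProfiled(const char *KernelName, int NumTeams,
                                 int ThreadLimit) {
  // The second argument shows up as the "detail" of the event in the trace.
  llvm::TimeTraceScope Scope("Runtime: Kernel Launch",
                             std::string(KernelName) +
                                 ";NumTeams=" + std::to_string(NumTeams) +
                                 ";ThreadLimit=" + std::to_string(ThreadLimit));
  // ... the actual kernel launch would happen here ...
}
```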
---------
Co-authored-by: Felipe Cabarcas <cabarcas@leia.crpl.cis.udel.edu>
Co-authored-by: fel-cab <fel-cab@github.com>
Summary:
This patch reorganizes a lot of the code used to check for compatibility
with the current environment. The main bulk of this patch involves
moving from using a separate `__tgt_image_info` struct (which just
contains a string for the architecture) to instead simply checking this
information from the ELF directly. Checking information in the ELF is
very inexpensive, as creating an ELF view is simply wrapping a base
pointer.
The main desire to do this was to reorganize everything into the ELF
image. We can then do the majority of these checks without first
initializing the plugin. A future patch will move the first ELF checks
to happen without initializing the plugin so we no longer need to
initialize plugins that don't have any needed images.
This patch also adds a lot more sanity checks for whether or not the
ELF is actually compatible, such as whether the image has a valid ABI,
is 64-bit wide, is an executable, etc.
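A minimal sketch (not the actual plugin code) of what such a check can
look like with LLVM's Object library, assuming a 64-bit little-endian
image:
```
#include "llvm/Object/ELF.h"

static bool isCompatibleElf(llvm::StringRef Buffer, uint16_t ExpectedMachine) {
  // Creating the ELF view is cheap: it only wraps the existing buffer.
  auto ElfOrErr = llvm::object::ELF64LEFile::create(Buffer);
  if (!ElfOrErr) {
    llvm::consumeError(ElfOrErr.takeError());
    return false;
  }
  const auto &Header = ElfOrErr->getHeader();
  // Reject images that are not 64-bit executables for the expected machine.
  return Header.e_ident[llvm::ELF::EI_CLASS] == llvm::ELF::ELFCLASS64 &&
         Header.e_type == llvm::ELF::ET_EXEC &&
         Header.e_machine == ExpectedMachine;
}
```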
DeviceTy provides an abstraction for "middle-level" operations that can
be done with an offload device. Mapping was tied into it but is not
strictly necessary. Other languages do not track mapping, and even
OpenMP can be used completely without mapping. This simply moves the
relevant code into OpenMP/Mapping.cpp as part of a new class,
MappingInfoTy. Each device still has one, but it no longer clutters
device.cpp.
Reverts llvm/llvm-project#74360
As I wrote in the analysis of #74360:
Since bc4e0c048a we will not add PluginAdaptors into the container of
all plugin adaptors before the plugin is ready. The error is thereby
gone. When an old HSA loads other libraries, they can call
register_image, but that will simply not register the image with the
plugin we are currently initializing. That seems like reasonable
behavior, though it is good to keep in mind if we ever want a kernel
library (@jhuber6 @mjklemm). We can still have a standalone kernel
library though, or load it late after all plugins are set up (which
seems reasonable).
I did not expect one of our tests to actually be doing exactly what
this no longer allows, at least when you use rocm <5.5.0. We need to
figure out if we want this behavior (for rocm <5.5.0).
This introduces checked errors into the creation and initialization of
`PluginAdaptorTy`. We also allow the adaptor to "hide" devices from the
user if the initialization failed. The new organization avoids the
"initOnce" stuff but we still do not eagerly initialize the plugin
devices (I think we should merge `PluginAdaptorTy::initDevices` into
`PluginAdaptorTy::init`).
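A rough sketch (names and shape hypothetical) of the checked-error
pattern this introduces: construction returns an `llvm::Expected`, so a
failing plugin can be reported to the caller instead of aborting or
being silently used.
```
#include "llvm/Support/Error.h"
#include <memory>
#include <string>

struct PluginAdaptorTy {
  static llvm::Expected<std::unique_ptr<PluginAdaptorTy>>
  create(const std::string &LibraryName) {
    auto Adaptor = std::make_unique<PluginAdaptorTy>();
    if (llvm::Error Err = Adaptor->init(LibraryName))
      return std::move(Err); // Propagate the failure to the caller.
    return std::move(Adaptor);
  }

  llvm::Error init(const std::string &LibraryName) {
    // The real code would dlopen the plugin and resolve its symbols here.
    if (LibraryName.empty())
      return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                     "no plugin library given");
    return llvm::Error::success();
  }
};
```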
This fixes two bugs and adds a test for them:
- A shared library with declare target functions but without kernels
should not error out due to missing globals.
- Enabling LIBOMPTARGET_INFO=32 should not deadlock in the presence of
indirect declare targets.
We accessed the `Devices` container most of the time while holding the
RTLsMtx, but not always. Sometimes we used the mutex for the size query,
but then accessed Devices again unguarded. From now on we properly
encapsulate the container in a ProtectedObj which ensures exclusive
accesses. We also hide the "isReady" part in the `getDevice` accessor
and use an `llvm::Expected` to allow returning errors.
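A minimal sketch of the idea behind such a ProtectedObj wrapper
(simplified, not the real implementation): the contained object can
only be reached through an accessor that holds the lock for its entire
lifetime.
```
#include <mutex>

template <typename Ty> class ProtectedObj {
  Ty Obj;
  std::mutex Mtx;

public:
  // RAII accessor that keeps the mutex held for as long as it is alive.
  class AccessorTy {
    std::lock_guard<std::mutex> Guard;
    Ty &ObjRef;

  public:
    AccessorTy(Ty &Obj, std::mutex &Mtx) : Guard(Mtx), ObjRef(Obj) {}
    Ty &operator*() { return ObjRef; }
    Ty *operator->() { return &ObjRef; }
  };

  AccessorTy getExclusiveAccessor() { return AccessorTy(Obj, Mtx); }
};
```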
This moves the offload entry logic into classes and provides convenient
accessors. No functional change intended but we can now print all
offload entries (and later look them up), tested via
`OMPTARGET_DUMP_OFFLOAD_ENTRIES=<device_no>`.
This basically moves code around again, but this time to provide cleaner
interfaces and remove duplication. PluginAdaptorManagerTy is almost all
gone after this.
For historic reasons we had it set up such that there was
`plugin-nextgen/common/PluginInterface/<sources + headers>`
which is not what we do anywhere else.
Now it looks like the rest:
```
plugin-nextgen/common/include/<headers>
plugin-nextgen/common/src/<sources>
```
As part of this, `dlwrap.h` was moved into common/include (as
`DLWrap.h`)
since it is exclusively used by the plugins.
Not everything in libomptarget (include) is "OpenMP", but some things
most certainly are. This commit moves some code around to start making
this distinction without the intention to change functionality.
Headers used throughout the different runtimes are different from the
internal headers. This is a first step to bring structure into the
include folder.
There is no point in keeping GenericDeviceTy objects alive longer than
the associated GenericPluginTy. Instead of the old API we now tear them
down with the plugin, avoiding ordering issues.
The order in which we deinit things, especially when shared libraries
are involved, is complicated. To simplify our lives the nextgen plugin
deinitializes the GenericPluginTy and subclasses automatically. The old
__tgt_rtl_deinit_plugin is not needed anymore.
When we record and replay kernels we should not error out early if there
is a chance the program might still run fine. This patch will:
1) Fall back to the allocation heuristic if the VAMap doesn't work.
2) Adjust the memory start to match the required address if possible.
3) Adjust the (guessed) pointer arguments if the memory start adjustment
is impossible. This will allow kernels without indirect accesses to
work while indirect accesses will most likely fail.
Before we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits the number
into two parts, the size of the reduction data (=all reduction
variables) and the (maximal) length of the buffer. This will allow us
to allocate less if we need less, e.g., if we have fewer teams than the
maximal length. It also allows us to move code from clang's codegen
into the runtime, as we now know how large the reduction data is.
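As an illustration of what the split enables (variable names here are
hypothetical), the launch-time allocation can now be computed from both
parts instead of a single opaque byte count:
```
#include <algorithm>
#include <cstdint>

static uint64_t getReductionBufferBytes(uint64_t ReductionDataSize,
                                        uint32_t MaxBufferLength,
                                        uint32_t NumTeams) {
  // Only as many slots as teams that actually run, capped by the maximal
  // length the buffer was sized for.
  return ReductionDataSize * std::min(MaxBufferLength, NumTeams);
}
```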
We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.
Fixes: https://github.com/llvm/llvm-project/issues/70249
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the runtime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More use cases will
follow, e.g., per-launch memory pools.
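A rough sketch of the distinction (field names illustrative, not the
exact runtime definitions): the kernel environment is produced by the
compiler once per kernel, while the launch environment is produced by
the runtime for every single launch.
```
#include <cstdint>

// Compile-time information about a kernel, emitted by the compiler.
struct KernelEnvironmentSketchTy {
  uint8_t ExecMode;           // E.g., SPMD vs. generic execution.
  int32_t MinThreads;         // Static bounds known to the compiler.
  int32_t MaxThreads;
  uint64_t ReductionDataSize; // Size of all reduction variables.
};

// Per-launch information, allocated and filled in by the runtime so that
// concurrent launches of the same kernel do not share state.
struct KernelLaunchEnvironmentSketchTy {
  void *ReductionBuffer; // Replaces the formerly global reduction storage.
};
```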
Fixes: https://github.com/llvm/llvm-project/issues/70249