clang-p2996

Author	SHA1	Message	Date
Joseph Huber	ea707baca2	[Libomptarget][NFCI] Move logic out of PluginAdaptorTy (#86971 ) Summary: This patch removes most of the special handling from the `PluginAdaptorTy` in preparation for changing this to be the `GenericPluginTy`. Doing this requires that the OpenMP specific handling of stuff like device offsets be contained within the OpenMP plugin manager. Generally this was uninvasive except for the change to tracking the offset and size of the used devices. The easiest way I could think to do this was to use some maps, which double as indicators for which plugins have devices active. This should not affect the logic.	2024-03-29 07:19:22 -05:00
nicebert	20f5bcfb1a	[OpenMP] Add OpenMP extension API to dump mapping tables (#85381 ) This adds an API call ompx_dump_mapping_tables. This allows users to debug the mapping tables and can be especially useful for unified shared memory applications to check if the code behaves in the way it should. The implementation reuses code already present to dump mapping tables (in a debug setting). --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-03-18 14:09:20 -05:00
Daniel Martinez	aa6ebf9be1	Replace some C headers with C++ ones (#82697 ) #81434 Replaced some C headers with C++ ones Co-authored-by: Daniel Martinez <danielmartinez@cock.li>	2024-03-04 01:21:31 -05:00
Michael Halkenhäuser	e521752c04	[OpenMP][OMPT] Add OMPT callback for device data exchange 'Device-to-Device' (#81991 ) Since there's no `ompt_target_data_transfer_tofrom_device` (within ompt_target_data_op_t enum) or something other that conveys the meaning of inter-device data exchange we decided to indicate a Device-to-Device transfer by using: optype == ompt_target_data_transfer_from_device (=3) Hence, a device transfer may be identified e.g. by checking for: (optype == 3) && (src_device_num < omp_get_num_devices()) && (dest_device_num < omp_get_num_devices()) Fixes: #66478	2024-02-26 11:16:25 +01:00
Joseph Huber	87b4108211	[Libomptarget][NFC] Remove concept of optional plugin functions (#82681 ) Summary: Ever since the introduction of the new plugins we haven't exercised the concept of "optional" plugin functions. This is done in preparation for making the plugins use a static interface as it will greatly simplify the implementation if we assert that every function has the entrypoints. Currently some unsupported functions will just return failure or some other default value, so this shouldn't change anything.	2024-02-22 16:49:21 -06:00
Daniel Martinez	45fe67dd61	Fix build on musl by including stdint.h (#81434 ) openmp fails to build on musl since it lacks the defines for int32_t Co-authored-by: Daniel Martinez <danielmartinez@cock.li>	2024-02-22 13:14:27 -08:00
Joseph Huber	ea174c0934	[Libomptarget] Remove global ctor and use reference counting (#80499 ) Summary: Currently we rely on global constructors to initialize and shut down the OpenMP runtime library and plugin manager. This causes some issues because we do not have a defined lifetime that we can rely on to release and allocate resources. This patch instead adds some simple reference counted initialization and deinitialization functions. A future patch will use the `deinit` interface to more intelligently handle plugin deinitialization. Right now we do nothing and rely on `atexit` inside of the plugins to tear them down. This isn't great because it limits our ability to control these things. Note that I made the `__tgt_register_lib` functions do the initialization instead of adding calls to the new runtime functions in the linker wrapper. The reason for this is because in the past it's been easier to not introduce a new function call, since sometimes the user's compiler will link against an older `libomptarget`. Maybe if we change the name with offloading in the future we can simplify this. Depends on https://github.com/llvm/llvm-project/pull/80460	2024-02-22 12:01:52 -06:00
Joseph Huber	cc374d8056	[OpenMP] Remove `register_requires` global constructor (#80460 ) Summary: Currently, OpenMP handles the `omp requires` clause by emitting a global constructor into the runtime for every translation unit that requires it. However, this is not a great solution because it prevents us from having a defined order in which the runtime is accessed and used. This patch changes the approach to no longer use global constructors, but to instead group the flag with the other offloading entries that we already handle. This has the effect of still registering each flag per requires TU, but now we have a single constructor that handles everything. This function removes support for the old `__tgt_register_requires` and replaces it with a warning message. We just had a recent release, and the OpenMP policy for the past four releases since we switched to LLVM is that we do not provide strict backwards compatibility between major LLVM releases now that the library is versioned. This means that a user will need to recompile if they have an old binary that relied on `register_requires` having the old behavior. It is important that we actively deprecate this, as otherwise it would not solve the problem of having no defined init and shutdown order for `libomptarget`. The problem of `libomptarget` not having a defined init and shutdown order cascades into a lot of other issues so I have a strong incentive to be rid of it. It is worth noting that the current `__tgt_offload_entry` only has space for a 32-bit integer here. I am planning to overhaul these at some point as well.	2024-02-21 11:33:32 -06:00
dhruvachak	0d7f232baf	[libomptarget] [OMPT] Fixed return address computation for OMPT events. (#80498 ) Currently, __builtin_return_address is used to generate the return address when the callback invoker is created. However, this may result in the return address pointing to an internal runtime function. This is not what a tool would typically want. A tool would want to know the corresponding user code from where the runtime entry point is invoked. This change adds a thread local variable that is assigned the return address at the OpenMP runtime entry points. An RAII is used to manage the modifications to the thread local variable. Whenever the return address is required for OMPT events, it is read from the thread local variable.	2024-02-07 17:29:08 -08:00
Joseph Huber	0ac4438560	[Libomptarget] Remove unused 'SupportsEmptyImages' API function (#80316 ) Summary: This function is always false in the current implementation and is not even considered required. Just remove it and if someone needs it in the future they can add it back in. This is done to simplify the interface prior to other changes	2024-02-05 10:00:09 -06:00
Joseph Huber	2333865546	[Libomptarget] Fix data mapping on dynamic loads (#80559 ) Summary: The current logic tries to map target mapping tables to the current device. Right now it assumes that data is only mapped a single time per device. This is only true if we have a single instance of the runtime running on a single program. However, in the case of dynamic library loads or shared libraries, this may happen multiple times. Given a case of a simple dynamic library load which has its own target kernel instruction, the current logic had only the first call to `__tgt_target_kernel` do the data mapping for that device. Then, when the next dynamic library load got called, it would see that the globals were already mapped for that device and skip registering its own entries, even though they were distinct. This resulted in none of the mappings being done and hitting an assertion. This patch simply gets rid of this per-device check. The check should instead be on the host offloading entries. We already have logic that calls `continue` if we already have entries for that pointer, so we can simply rely on that instead.	2024-02-03 15:28:20 -06:00
Joseph Huber	254287658f	[Libomptarget] Remove handling of old ctor / dtor entries (#80153 ) Summary: A previous patch removed creating these entries in clang in favor of the backend emitting a callable kernel and having the runtime call that if present. The support for the old style was kept around in LLVM 18.0 but now that we have forked to 19.0 we should remove the support. The effect of this would be that an application linking against a newer libomptarget that still had the old constructors will no longer be called. In that case, they can either recompile or use the `libomptarget.so.18` that comes with the previous release.	2024-01-31 11:48:07 -06:00
Joseph Huber	621bafd5c1	[Libomptarget] Move target table handling out of the plugins (#77150 ) Summary: This patch removes the bulk of the handling of the `__tgt_offload_entries` out of the plugins itself. The reason for this is because the plugins themselves should not be handling this implementation detail of the OpenMP runtime. Instead, we expose two new plugin API functions to get the points to a device pointer for a global as well as a kernel type. This required introducing a new type to represent a binary image that has been loaded on a device. We can then use this to load the addresses as needed. The creation of the mapping table is then handled just in `libomptarget` where we simply look up each address individually. This should allow us to expose these operations more generically when we provide a separate API.	2024-01-22 11:06:47 -06:00
carlobertolli	ae99966a27	[OpenMP] Enable automatic unified shared memory on MI300A. (#77512 ) This patch enables applications that did not request OpenMP unified_shared_memory to run with the same zero-copy behavior, where mapped memory does not result in extra memory allocations and memory copies, but CPU-allocated memory is accessed from the device. The name for this behavior is "automatic zero-copy" and it relies on detecting: that the runtime is running on a MI300A, that the user did not select unified_shared_memory in their program, and that XNACK (unified memory support) is enabled in the current GPU configuration. If all these conditions are met, then automatic zero-copy is triggered. This patch also introduces an environment variable OMPX_APU_MAPS that, if set, triggers automatic zero-copy also on non APU GPUs (e.g., on discrete GPUs). This patch is still missing support for global variables, which will be provided in a subsequent patch. Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>	2024-01-22 10:30:22 -06:00
Joseph Huber	d03b8c3a04	[Libomptarget][NFC] Format in-line comments consistently (#77530 ) Summary: The LLVM style uses `/*Foo=*/` when indicating the name of a constant. See https://llvm.org/docs/CodingStandards.html#comment-formatting. This is useful for consistency, as well as because `clang-format` understands this syntax and formats it more cleanly. Do a bulk update of this syntax.	2024-01-10 10:10:08 -06:00
Joseph Huber	0fe86f9c51	[Libomptarget] Remove extra cache for offloading entries (#77012 ) Summary: The offloading entries right now are assumed to be baked into the binary itself, and thus always valid whenever the library is executing. This means that we don't need to copy them to additional storage and can instead simply pass around references to it. This is not likely to change in the expected operation of the OpenMP library. Additionally, the indirection for the offload entry struct is simply two pointers, so moving it by value is trivial.	2024-01-08 16:49:33 -06:00
carlobertolli	ce4144406c	Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (#77371 ) Reverts llvm/llvm-project#75999 lit test is failing.	2024-01-08 14:38:29 -06:00
carlobertolli	22a73e7c46	[OpenMP][libomptarget] Enable automatic unified shared memory executi… (#75999 ) …on (zero-copy) on MI300A. This patch enables applications that did not request OpenMP unified_shared_memory to run with the same zero-copy behavior, where mapped memory does not result in extra memory allocations and memory copies, but CPU-allocated memory is accessed from the device. The name for this behavior is "automatic zero-copy" and it relies on detecting: that the runtime is running on a MI300A, that the user did not select unified_shared_memory in their program, and that XNACK (unified memory support) is enabled in the current GPU configuration. If all these conditions are met, then automatic zero-copy is triggered. This patch is still missing support for global variables, which will be provided in a subsequent patch. Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>	2024-01-08 14:17:28 -06:00
Joseph Huber	fb32977ac7	[Libomptarget] Fix RPC-based malloc on NVPTX (#72440 ) Summary: The device allocator on NVPTX architectures is enqueued to a stream that the kernel is potentially executing on. This can lead to deadlocks as the kernel will not proceed until the allocation is complete and the allocation will not proceed until the kernel is complete. CUDA 11.2 introduced async allocations that we can manually place on separate streams to combat this. This patch makes a new allocation type that's guaranteed to be non-blocking so it will actually make progress, only Nvidia needs to care about this as the others are not blocking in this way by default. I had originally tried to make the `alloc` and `free` methods take a `__tgt_async_info`. However, I observed that with the large volume of streams being created by a parallel test it quickly locked up the system as presumably too many streams were being created. This implementation just creates a new stream and immediately destroys it. This obviously isn't very fast, but it at least gets the cases to stop deadlocking for now.	2024-01-02 16:53:53 -06:00
Felipe Cabarcas	9b6ea5e8f8	[OpenMP] Improve omp offload profiler (#68016 ) Summary: Adding information to the LIBOMPTARGET profiler runtime kernel and API calls. Key changes: * Adding information to runtime calls for better understanding of how the application is executing. For example teams requested by the user, size of memory transfers. * Profile timer was changed from 'us' to 'ns', since 'us' was too coarse-grain to register some important details like key kernel duration * Removed non API or Runtime calls, to reduce complexity of profile for application developers. --------- Co-authored-by: Felipe Cabarcas <cabarcas@leia.crpl.cis.udel.edu> Co-authored-by: fel-cab <fel-cab@github.com>	2023-12-22 14:58:11 -05:00
Joseph Huber	ac029e02a9	[Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720 ) Summary: This patch reorganizes a lot of the code used to check for compatibility with the current environment. The main bulk of this patch involves moving from using a separate `__tgt_image_info` struct (which just contains a string for the architecture) to instead simply checking this information from the ELF directly. Checking information in the ELF is very inexpensive as creating an ELF file is simply writing a base pointer. The main desire to do this was to reorganize everything into the ELF image. We can then do the majority of these checks without first initializing the plugin. A future patch will move the first ELF checks to happen without initializing the plugin so we no longer need to initialize any plugins that don't have needed images. This patch also adds a lot more sanity checks for whether or not the ELF is actually compatible. Such as if the images have a valid ABI, 64-bit width, executable, etc.	2023-12-19 20:01:31 -06:00
dhruvachak	e4de6a602f	[OpenMP] [OMPT] A pointer to HostOpId should be passed in EMI callbacks. (#75574 ) With this change, TargetRegionOpId is no more used and hence deleted.	2023-12-15 12:07:42 -08:00
Johannes Doerfert	fe6f137e48	[OpenMP][NFC] Move mapping related code into OpenMP/Mapping.cpp (#75239 ) DeviceTy provides an abstraction for "middle-level" operations that can be done with an offload device. Mapping was tied into it but is not strictly necessary. Other languages do not track mapping, and even OpenMP can be used completely without mapping. This simply moves the relevant code into the OpenMP/Mapping.cpp as part of a new class MappingInfoTy. Each device still has one, but it does not clutter the device.cpp anymore.	2023-12-12 12:49:46 -08:00
Johannes Doerfert	5dd1fc7008	[OpenMP][NFC] Improve profiling for the offload runtime	2023-12-11 17:30:35 -08:00
Johannes Doerfert	2ada7bb68b	[OpenMP][NFCI] Remove effectively unused mutex The only use was already guarded by a different lock in the caller of loadBinary.	2023-12-11 17:30:35 -08:00
Johannes Doerfert	cee6918d87	[OpenMP][NFC] Move api.cpp to OpenMP/API.cpp	2023-12-11 17:30:34 -08:00
Johannes Doerfert	13b8826508	Revert " [OpenMP][NFC] Remove `DelayedBinDesc`" (#74679 ) Reverts llvm/llvm-project#74360 As I wrote in the analysis of #74360: Since `bc4e0c048a` we will not add PluginAdaptors into the container of all plugin adaptors before the plugin is ready. The error is thereby gone. When an old HSA loads other libraries they can call register_image but that will simply not register the image with the plugin we are currently initializing. That seems like reasonable behavior, though it is good to keep in mind if we ever want a kernel library (@jhuber6 @mjklemm). We can still have a standalone kernel library though or load it late after all plugins are setup (which seems reasonable). I did not expect one of our tests actually doing exactly what this will not allow anymore, at least when you use rocm <5.5.0. Need to figure out if we want this behavior (for rocm <5.5.0).	2023-12-06 16:04:23 -08:00
Johannes Doerfert	d552ce2638	[OpenMP][NFC] Remove `DelayedBinDesc` (#74360 ) Remove `DelayedBinDesc` as it is not necessary since `bc4e0c048a`. See https://github.com/llvm/llvm-project/pull/74360#issuecomment-1843603736 for details.	2023-12-06 14:48:23 -08:00
Johannes Doerfert	68db7aef74	[OpenMP] Reorganize the initialization of `PluginAdaptorTy` (#74397 ) This introduces checked errors into the creation and initialization of `PluginAdaptorTy`. We also allow the adaptor to "hide" devices from the user if the initialization failed. The new organization avoids the "initOnce" stuff but we still do not eagerly initialize the plugin devices (I think we should merge `PluginAdaptorTy::initDevices` into `PluginAdaptorTy::init`)	2023-12-05 16:04:01 -08:00
Johannes Doerfert	9f87509b19	[OpenMP][FIX] Ensure we allow shared libraries without kernels (#74532 ) This fixes two bugs and adds a test for them: - A shared library with declare target functions but without kernels should not error out due to missing globals. - Enabling LIBOMPTARGET_INFO=32 should not deadlock in the presence of indirect declare targets.	2023-12-05 15:25:10 -08:00
Johannes Doerfert	66784dcb3b	[OpenMP] Ensure `Devices` is accessed exclusively (#74374 ) We accessed the `Devices` container most of the time while holding the RTLsMtx, but not always. Sometimes we used the mutex for the size query, but then accessed Devices again unguarded. From now on we properly encapsulate the container in a ProtectedObj which ensures exclusive accesses. We also hide the "isReady" part in the `getDevice` accessor and use an `llvm::Expected` to allow to return errors.	2023-12-04 17:10:37 -08:00
Johannes Doerfert	27f17837bb	[OpenMP][NFC] Remove PluginAdaptorManagerTy	2023-12-01 15:23:17 -08:00
Johannes Doerfert	7169c45efa	[OpenMP][NFCI] Organize offload entry logic This moves the offload entry logic into classes and provides convenient accessors. No functional change intended but we can now print all offload entries (and later look them up), tested via `OMPTARGET_DUMP_OFFLOAD_ENTRIES=<device_no>`.	2023-12-01 15:10:52 -08:00
Johannes Doerfert	b091a887e0	[OpenMP][NFC] Extract device image handling into a class/header (#74129 )	2023-12-01 14:59:12 -08:00
Johannes Doerfert	5fe741f08e	[OpenMP] Separate Requirements into a standalone header (#74126 ) This is not completely NFC since we now check all 4 requirements and the test is checking the good and the bad case for combining flags.	2023-12-01 14:47:00 -08:00
Johannes Doerfert	3530428b8f	[OpenMP][NFC] Extract OffloadPolicy into a helper class (#74029 ) OpenMP allows 3 different offload policies, handling of which we want to encapsulate.	2023-12-01 10:55:18 -08:00
Johannes Doerfert	bc4e0c048a	[OpenMP][NFC] Modernize the plugin handling (#74034 ) This basically moves code around again, but this time to provide cleaner interfaces and remove duplication. PluginAdaptorManagerTy is almost all gone after this.	2023-12-01 10:36:59 -08:00
Johannes Doerfert	51fc8544c7	[OpenMP][NFC] Move mapping related logic into Mapping.h (#74009 )	2023-11-30 17:08:41 -08:00
Johannes Doerfert	1035cc7029	[OpenMP][NFC] Encapsulate Devices.size() (#74010 )	2023-11-30 16:44:47 -08:00
Johannes Doerfert	b8b2a279d0	[OpenMP][NFC] Encapsulate profiling logic (#74003 ) This simply puts the profiling logic into the `Profiler` class and allows non-RAII profiling via `beginSection` and `endSection`.	2023-11-30 15:52:02 -08:00
Johannes Doerfert	148dec9fa4	[OpenMP][NFC] Separate Envar (environment variable) handling (#73994 )	2023-11-30 15:23:34 -08:00
Johannes Doerfert	b80b5f180b	[OpenMP] Replace copy and paste code with instantiation (#73991 )	2023-11-30 14:16:34 -08:00
Johannes Doerfert	fce4c0acd6	[OpenMP] Start organizing PluginManager, PluginAdaptors (#73875 )	2023-11-30 13:47:47 -08:00
Johannes Doerfert	2e7f47d4a8	[OpenMP][NFC] Move out plugin API and APITypes into standalone headers (#73868 )	2023-11-29 16:04:19 -08:00
Johannes Doerfert	40422bf150	[OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code (#73817 ) While this does not really encapsulate the mapping code, it at least moves most of the declarations out of the way.	2023-11-29 10:29:54 -08:00
Johannes Doerfert	8391bb3f5c	[OpenMP][NFC] Move more declarations out of private.h (#73823 )	2023-11-29 09:22:03 -08:00
Johannes Doerfert	b465f94b7c	[OpenMP][NFC] Put ExponentialBackoff in a Utils header (#73816 ) "private.h" will go.	2023-11-29 09:10:29 -08:00
Johannes Doerfert	fd2d0bf90e	[OpenMP][NFC] Replace unnecessary typedefs (#73815 )	2023-11-29 08:40:41 -08:00
Johannes Doerfert	e2299e8d9d	[OpenMP][NFC] Move OMPT headers into OpenMP/OMPT (#73718 )	2023-11-29 08:29:41 -08:00
Johannes Doerfert	db96a9c3b7	[OpenMP][NFC] Flatten plugin-nextgen/common folder structure (#73725 ) For historic reasons we had it setup that there was `plugin-nextgen/common/PluginInterface/<sources + headers>` which is not what we do anywhere else. Now it looks like the rest: ``` plugin-nextgen/common/include/<headers> plugin-nextgen/common/src/<sources> ``` As part of this, `dlwrap.h` was moved into common/include (as `DLWrap.h`) since it is exclusively used by the plugins.	2023-11-29 07:57:01 -08:00
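
Commit ea174c0934 replaces the global constructors with simple reference-counted initialization and deinitialization. A minimal C++ sketch of that pattern, assuming a single mutex-guarded counter, is shown here; `runtimeInit`, `runtimeDeinit`, `RefCount`, and `Initialized` are hypothetical names, not libomptarget's actual interface.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical reference-counted runtime lifetime (not libomptarget's real API).
namespace sketch {
std::mutex InitMtx;       // serializes init/deinit across threads
int RefCount = 0;         // number of outstanding init calls
bool Initialized = false; // stands in for "runtime and plugin manager are alive"

// The first caller performs the real setup; later callers only bump the count.
void runtimeInit() {
  std::lock_guard<std::mutex> Lock(InitMtx);
  if (RefCount++ == 0)
    Initialized = true;
}

// Only the caller that drops the count to zero performs the real teardown.
void runtimeDeinit() {
  std::lock_guard<std::mutex> Lock(InitMtx);
  assert(RefCount > 0 && "deinit without a matching init");
  if (--RefCount == 0)
    Initialized = false;
}
} // namespace sketch
```

Each registering caller pairs one init with one deinit, so the state survives until its last user is gone, giving the defined lifetime the commit message asks for. Per that message, `__tgt_register_lib` triggers the initialization, so the linker wrapper does not need to emit a new call.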

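Commit e521752c04 describes how a tool can recognize a Device-to-Device transfer from the OMPT data-op callback arguments. The helper below is a hedged C++ sketch of that check; `isDeviceToDevice` and `kTransferFromDevice` are hypothetical names, and `NumDevices` stands in for the value of `omp_get_num_devices()` so the sketch compiles without the OpenMP headers.

```cpp
// Value of ompt_target_data_transfer_from_device as quoted in the commit
// message; a real tool would use the enumerator from omp-tools.h instead of
// this hypothetical constant.
constexpr int kTransferFromDevice = 3;

// Hypothetical helper following the commit message's suggested check:
// a "from device" op whose source and destination device numbers are both
// below the device count. NumDevices stands in for omp_get_num_devices().
constexpr bool isDeviceToDevice(int OpType, int SrcDeviceNum,
                                int DestDeviceNum, int NumDevices) {
  return OpType == kTransferFromDevice && SrcDeviceNum < NumDevices &&
         DestDeviceNum < NumDevices;
}

// With two devices, a "from device" op from device 0 to device 1 qualifies;
// one whose destination number is 2 (outside the device range here) does not.
static_assert(isDeviceToDevice(3, 0, 1, 2), "0 -> 1 is device-to-device");
static_assert(!isDeviceToDevice(3, 0, 2, 2), "destination outside device range");
```

Because the commit reuses `ompt_target_data_transfer_from_device` rather than adding a dedicated enumerator, a tool has to combine the op type with the device numbers in this way.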

213 Commits