clang-p2996

Author	SHA1	Message	Date
Felipe Cabarcas	9b6ea5e8f8	[OpenMP] Improve omp offload profiler (#68016 ) Summary: Adding information to the LIBOMPTARGET profiler runtime kernel and API calls. Key changes: * Adding information to runtime calls for better understanding of how the application is executing. For example teams requested by the user, size of memory transfers. * Profile timer was changed from 'us' to 'ns', since 'us' was too coarse-grain to register some important details like key kernel duration * Removed non API or Runtime calls, to reduce complexity of profile for application developers. --------- Co-authored-by: Felipe Cabarcas <cabarcas@leia.crpl.cis.udel.edu> Co-authored-by: fel-cab <fel-cab@github.com>	2023-12-22 14:58:11 -05:00
Joseph Huber	ac029e02a9	[Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720 ) Summary: This patch reorganizes a lot of the code used to check for compatibility with the current environment. The main bulk of this patch involves moving from using a separate `__tgt_image_info` struct (which just contains a string for the architecture) to instead simply checking this information from the ELF directly. Checking information in the ELF is very inexpensive as creating an ELF file is simply writing a base pointer. The main desire to do this was to reorganize everything into the ELF image. We can then do the majority of these checks without first initializing the plugin. A future patch will move the first ELF checks to happen without initializing the plugin so we no longer need to initialize any plugins that don't have needed images. This patch also adds a lot more sanity checks for whether or not the ELF is actually compatible, such as whether the images have a valid ABI, 64-bit width, are executable, etc.	2023-12-19 20:01:31 -06:00
dhruvachak	e4de6a602f	[OpenMP] [OMPT] A pointer to HostOpId should be passed in EMI callbacks. (#75574 ) With this change, TargetRegionOpId is no longer used and hence deleted.	2023-12-15 12:07:42 -08:00
Johannes Doerfert	fe6f137e48	[OpenMP][NFC] Move mapping related code into OpenMP/Mapping.cpp (#75239 ) DeviceTy provides an abstraction for "middle-level" operations that can be done with an offload device. Mapping was tied into it but is not strictly necessary. Other languages do not track mapping, and even OpenMP can be used completely without mapping. This simply moves the relevant code into the OpenMP/Mapping.cpp as part of a new class MappingInfoTy. Each device still has one, but it does not clutter the device.cpp anymore.	2023-12-12 12:49:46 -08:00
Johannes Doerfert	5dd1fc7008	[OpenMP][NFC] Improve profiling for the offload runtime	2023-12-11 17:30:35 -08:00
Johannes Doerfert	2ada7bb68b	[OpenMP][NFCI] Remove effectively unused mutex The only use was already guarded by a different lock in the caller of loadBinary.	2023-12-11 17:30:35 -08:00
Johannes Doerfert	cee6918d87	[OpenMP][NFC] Move api.cpp to OpenMP/API.cpp	2023-12-11 17:30:34 -08:00
Johannes Doerfert	13b8826508	Revert " [OpenMP][NFC] Remove `DelayedBinDesc`" (#74679 ) Reverts llvm/llvm-project#74360 As I wrote in the analysis of #74360: Since `bc4e0c048a` we will not add PluginAdaptors into the container of all plugin adaptors until the plugin is ready. The error is thereby gone. When an old HSA loads other libraries they can call register_image but that will simply not register the image with the plugin we are currently initializing. That seems like reasonable behavior, though it is good to keep in mind if we ever want a kernel library (@jhuber6 @mjklemm). We can still have a standalone kernel library though or load it late after all plugins are setup (which seems reasonable). I did not expect one of our tests actually doing exactly what this will not allow anymore, at least when you use rocm <5.5.0. Need to figure out if we want this behavior (for rocm <5.5.0).	2023-12-06 16:04:23 -08:00
Johannes Doerfert	d552ce2638	[OpenMP][NFC] Remove `DelayedBinDesc` (#74360 ) Remove `DelayedBinDesc` as it is not necessary since `bc4e0c048a`. See https://github.com/llvm/llvm-project/pull/74360#issuecomment-1843603736 for details.	2023-12-06 14:48:23 -08:00
Johannes Doerfert	68db7aef74	[OpenMP] Reorganize the initialization of `PluginAdaptorTy` (#74397 ) This introduces checked errors into the creation and initialization of `PluginAdaptorTy`. We also allow the adaptor to "hide" devices from the user if the initialization failed. The new organization avoids the "initOnce" stuff but we still do not eagerly initialize the plugin devices (I think we should merge `PluginAdaptorTy::initDevices` into `PluginAdaptorTy::init`)	2023-12-05 16:04:01 -08:00
Johannes Doerfert	9f87509b19	[OpenMP][FIX] Ensure we allow shared libraries without kernels (#74532 ) This fixes two bugs and adds a test for them: - A shared library with declare target functions but without kernels should not error out due to missing globals. - Enabling LIBOMPTARGET_INFO=32 should not deadlock in the presence of indirect declare targets.	2023-12-05 15:25:10 -08:00
Johannes Doerfert	66784dcb3b	[OpenMP] Ensure `Devices` is accessed exclusively (#74374 ) We accessed the `Devices` container most of the time while holding the RTLsMtx, but not always. Sometimes we used the mutex for the size query, but then accessed Devices again unguarded. From now on we properly encapsulate the container in a ProtectedObj which ensures exclusive accesses. We also hide the "isReady" part in the `getDevice` accessor and use an `llvm::Expected` to allow returning errors.	2023-12-04 17:10:37 -08:00
Johannes Doerfert	27f17837bb	[OpenMP][NFC] Remove PluginAdaptorManagerTy	2023-12-01 15:23:17 -08:00
Johannes Doerfert	7169c45efa	[OpenMP][NFCI] Organize offload entry logic This moves the offload entry logic into classes and provides convenient accessors. No functional change intended but we can now print all offload entries (and later look them up), tested via `OMPTARGET_DUMP_OFFLOAD_ENTRIES=<device_no>`.	2023-12-01 15:10:52 -08:00
Johannes Doerfert	b091a887e0	[OpenMP][NFC] Extract device image handling into a class/header (#74129 )	2023-12-01 14:59:12 -08:00
Johannes Doerfert	5fe741f08e	[OpenMP] Separate Requirements into a standalone header (#74126 ) This is not completely NFC since we now check all 4 requirements and the test is checking the good and the bad case for combining flags.	2023-12-01 14:47:00 -08:00
Johannes Doerfert	3530428b8f	[OpenMP][NFC] Extract OffloadPolicy into a helper class (#74029 ) OpenMP allows 3 different offload policies, handling of which we want to encapsulate.	2023-12-01 10:55:18 -08:00
Johannes Doerfert	bc4e0c048a	[OpenMP][NFC] Modernize the plugin handling (#74034 ) This basically moves code around again, but this time to provide cleaner interfaces and remove duplication. PluginAdaptorManagerTy is almost all gone after this.	2023-12-01 10:36:59 -08:00
Johannes Doerfert	51fc8544c7	[OpenMP][NFC] Move mapping related logic into Mapping.h (#74009 )	2023-11-30 17:08:41 -08:00
Johannes Doerfert	1035cc7029	[OpenMP][NFC] Encapsulate Devices.size() (#74010 )	2023-11-30 16:44:47 -08:00
Johannes Doerfert	b8b2a279d0	[OpenMP][NFC] Encapsulate profiling logic (#74003 ) This simply puts the profiling logic into the `Profiler` class and allows non-RAII profiling via `beginSection` and `endSection`.	2023-11-30 15:52:02 -08:00
Johannes Doerfert	148dec9fa4	[OpenMP][NFC] Separate Envar (environment variable) handling (#73994 )	2023-11-30 15:23:34 -08:00
Johannes Doerfert	b80b5f180b	[OpenMP] Replace copy and paste code with instantiation (#73991 )	2023-11-30 14:16:34 -08:00
Johannes Doerfert	fce4c0acd6	[OpenMP] Start organizing PluginManager, PluginAdaptors (#73875 )	2023-11-30 13:47:47 -08:00
Johannes Doerfert	2e7f47d4a8	[OpenMP][NFC] Move out plugin API and APITypes into standalone headers (#73868 )	2023-11-29 16:04:19 -08:00
Johannes Doerfert	40422bf150	[OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code (#73817 ) While this does not really encapsulate the mapping code, it at least moves most of the declarations out of the way.	2023-11-29 10:29:54 -08:00
Johannes Doerfert	8391bb3f5c	[OpenMP][NFC] Move more declarations out of private.h (#73823 )	2023-11-29 09:22:03 -08:00
Johannes Doerfert	b465f94b7c	[OpenMP][NFC] Put ExponentialBackoff in a Utils header (#73816 ) "private.h" will go.	2023-11-29 09:10:29 -08:00
Johannes Doerfert	fd2d0bf90e	[OpenMP][NFC] Replace unnecessary typedefs (#73815 )	2023-11-29 08:40:41 -08:00
Johannes Doerfert	e2299e8d9d	[OpenMP][NFC] Move OMPT headers into OpenMP/OMPT (#73718 )	2023-11-29 08:29:41 -08:00
Johannes Doerfert	db96a9c3b7	[OpenMP][NFC] Flatten plugin-nextgen/common folder structure (#73725 ) For historic reasons we had it setup that there was ` plugin-nextgen/common/PluginInterface/<sources + headers>` which is not what we do anywhere else. Now it looks like the rest: ``` plugin-nextgen/common/include/<headers> plugin-nextgen/common/src/<sources> ``` As part of this, `dlwrap.h` was moved into common/include (as `DLWrap.h`) since it is exclusively used by the plugins.	2023-11-29 07:57:01 -08:00
Johannes Doerfert	2cfe7b1b66	[OpenMP][NFC] Extract timescope profile support into its own header (#73727 )	2023-11-29 07:54:35 -08:00
Johannes Doerfert	d1057014a1	[OpenMP][NFC] Create an "OpenMP" folder in the include folder (#73713 ) Not everything in libomptarget (include) is "OpenMP", but some things most certainly are. This commit moves some code around to start making this distinction without the intention to change functionality.	2023-11-28 15:41:31 -08:00
Johannes Doerfert	7233e42dff	[OpenMP][NFC] Move Environment.h and SourceInfo.h into "Shared" folder (#73703 )	2023-11-28 15:10:06 -08:00
Johannes Doerfert	8327f4a851	[OpenMP][NFC] Move Utils.h and Debug.h into a "Shared" include folder (#73701 ) Headers used throughout the different runtimes are different from the internal headers. This is a first step to bring structure into the include folder.	2023-11-28 13:44:57 -08:00
Johannes Doerfert	7bfcce3e94	[OpenMP] Tear down GenericDeviceTy's with GenericPluginTy (#73557 ) There is no point in keeping GenericDeviceTy objects alive longer than the associated GenericPluginTy. Instead of the old API we now tear them down with the plugin, avoiding ordering issues.	2023-11-27 11:42:12 -08:00
Johannes Doerfert	f9436464a9	[OpenMP][NFC] Minor name and code simplification	2023-11-27 11:08:29 -08:00
Johannes Doerfert	2b2e711afc	[OpenMP][NFC] Remove no-op __tgt_rtl_deinit_plugin The order in which we deinit things, especially when shared libraries are involved, is complicated. To simplify our lives the nextgen plugin deinitializes the GenericPluginTy and subclasses automatically. The old __tgt_rtl_deinit_plugin is not needed anymore.	2023-11-27 11:07:57 -08:00
Johannes Doerfert	9c33bf62a7	[OpenMP][NFC] Remove unused (un)register_lib plugin API These APIs have not been hooked up for a while. No need to carry them.	2023-11-27 11:07:57 -08:00
Johannes Doerfert	f48c4d8aa1	[OpenMP] Be more forgiving during record and replay When we record and replay kernels we should not error out early if there is a chance the program might still run fine. This patch will: 1) Fallback to the allocation heuristic if the VAMap doesn't work. 2) Adjust the memory start to match the required address if possible. 3) Adjust the (guessed) pointer arguments if the memory start adjustment is impossible. This will allow kernels without indirect accesses to work while indirect accesses will most likely fail.	2023-11-20 17:15:34 -08:00
Johannes Doerfert	3de645efe3	[OpenMP][NFC] Split the reduction buffer size into two components Before we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts, the size of the reduction data (=all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have fewer teams than the maximal length. It also allows us to move code from clang's codegen into the runtime as we now know how large the reduction data is.	2023-11-06 11:50:41 -08:00
Johannes Doerfert	f9a89e6b9c	[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752 ) We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-11-01 11:11:48 -07:00
Johannes Doerfert	b8cbc5c02c	[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401 ) The KernelEnvironment is for compile time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information per kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per launch memory pools. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-10-31 19:38:43 -07:00
Konstantinos Parasyris	d6a3d6b96d	[openmp] Fixed Support for VA for record-replay. (#70396 ) The commit was discussed in phabricator (https://reviews.llvm.org/D157186). Record replay currently fails on AMD as it conflicts with the heap memory allocator introduced in #69806. The workaround is setting `LIBOMPTARGET_HEAP_SIZE=0` during both record and replay run.	2023-10-29 12:27:19 -07:00
Mehdi Amini	f390a76b7e	Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )"" This reverts commit `ddbaa11e9f`. Reapply the original commit, the broken test was repaired in `5e51363f38` in the meantime.	2023-10-26 17:30:01 -07:00
Mehdi Amini	ddbaa11e9f	Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )" This reverts commit `c2a1249a82`. The MLIR bots are broken with an omp test failure.	2023-10-26 17:25:20 -07:00
Johannes Doerfert	c2a1249a82	[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 ) The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.	2023-10-26 14:46:55 -07:00
Johannes Doerfert	d3921e4670	[OpenMP] Basic BumpAllocator for (AMD)GPUs (#69806 ) The patch contains a basic BumpAllocator for (AMD)GPUs to allow us to run more tests. The allocator implements `malloc`, both internally and externally, while we continue to default to the NVIDIA `malloc` when we target NVIDIA GPUs. Once we have smarter or customizable allocators we should consider this choice, for now, this allocator is better than none. It traps if it is out of memory, making it easy to debug. Heap size is configured via `LIBOMPTARGET_HEAP_SIZE` and defaults to 512MB. It allows to track allocation statistics via `LIBOMPTARGET_DEVICE_RTL_DEBUG=8` (together with `-fopenmp-target-debug=8`). Two tests were added, and one was enabled. This is the next step towards fixing https://github.com/llvm/llvm-project/issues/66708	2023-10-21 14:49:30 -07:00
Johannes Doerfert	1cea309b7e	[OpenMP][NFC] Move DebugKind to make it reusable from the host	2023-10-20 19:28:09 -07:00
Michael Klemm	f93a697e47	[libomptarget][OpenMP] Initial implementation of omp_target_memset() and omp_target_memset_async() (#68706 ) Implement a slow-path version of omp_target_memset*() There is a TODO to implement a fast path that uses an on-device kernel instead of the host-based memory fill operation. This may require some additional plumbing to have kernels in libomptarget.so	2023-10-19 15:29:36 +02:00

194 Commits