clang-p2996

Author	SHA1	Message	Date
Joseph Huber	ba192debb4	[Libomptarget][Obvious] Fix typo in attribute lookup Summary: These are keys into the AMDGPU target metadata. One of them had a typo which prevented it from being extracted.	2023-12-20 19:03:35 -06:00
Joseph Huber	e4f4022b70	[Libomptarget][NFC] Fix linting warnings in the plugins Summary: Fix some linting warnings present in the plugins.	2023-12-20 10:07:34 -06:00
Joseph Huber	ac029e02a9	[Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720 ) Summary: This patch reorganizes a lot of the code used to check for compatibility with the current environment. The main bulk of this patch involves moving from using a separate `__tgt_image_info` struct (which just contains a string for the architecture) to instead simply checking this information from the ELF directly. Checking information in the ELF is very inexpensive as creating an ELF file is simply writing a base pointer. The main desire to do this was to reorganize everything into the ELF image. We can then do the majority of these checks without first initializing the plugin. A future patch will move the first ELF checks to happen without initializing the plugin so we no longer need to initialize and plugins that don't have needed images. This patch also adds a lot more sanity checks for whether or not the ELF is actually compatible. Such as if the images have a valid ABI, 64-bit width, executable, etc.	2023-12-19 20:01:31 -06:00
Joseph Huber	6f3bd3a2f6	[Libomptarget] Add a utility function for checking existence of symbols (#74550 ) Summary: There are now a few cases that check if a symbol is present before continuing, effectively making them optional features if present in the image. This was done in at least three locations and required an ugly operation to consume the error. This patch makes a utility function to handle that instead.	2023-12-06 07:41:27 -06:00
Jon Chesterfield	f184147706	[amdgpu] Default to 1.0, instead of unspecified, for dynamic hsa (#74098 ) The plugin checks the values of HSA_AMD_INTERFACE_VERSION_* so we now set them to something safe in the header.	2023-12-01 16:37:49 +00:00
Johannes Doerfert	e2299e8d9d	[OpenMP][NFC] Move OMPT headers into OpenMP/OMPT (#73718 )	2023-11-29 08:29:41 -08:00
Johannes Doerfert	db96a9c3b7	[OpenMP][NFC] Flatten plugin-nextgen/common folder sturcture (#73725 ) For historic reasons we had it setup that there was ` plugin-nextgen/common/PluginInterface/<sources + headers>` which is not what we do anywhere else. Now it looks like the rest: ``` plugin-nextgen/common/include/<headers> plugin-nextgen/common/src/<sources> ``` As part of this, `dlwrap.h` was moved into common/include (as `DLWrap.h`) since it is exclusively used by the plugins.	2023-11-29 07:57:01 -08:00
Jan Patrick Lehr	3930a0b57a	[OpenMP][libomptarget] Use two SDMA engines (#73633 ) Limit the use to two SDMA engines which are optimized for such transfers.	2023-11-29 14:21:44 +01:00
Johannes Doerfert	7233e42dff	[OpenMP][NFC] Move Environment.h and SourceInfo.h into "Shared" folder (#73703 )	2023-11-28 15:10:06 -08:00
Johannes Doerfert	8327f4a851	[OpenMP][NFC] Move Utils.h and Debug.h into a "Shared" include folder (#73701 ) Headers used throughout the different runtimes are different from the internal headers. This is a first step to bring structure in into the include folder.	2023-11-28 13:44:57 -08:00
Johannes Doerfert	0783bf1cb3	[OpenMP][NFC] Merge MemoryManager into PluginInterface (#73678 ) Similar to #73677, there is no benefit from keeping MemoryManager seperate; it's tied into the current design. Except the move I also replaced the getenv call with our Env handling.	2023-11-28 10:17:51 -08:00
Johannes Doerfert	4667dd62ee	[OpenMP][NFC] Merge elf_common into PluginInterface (#73677 ) The overhead of a library and 4 files seems high without benefit. This simply tries to consolidate our structure.	2023-11-28 10:03:25 -08:00
Johannes Doerfert	6663df30c0	[OpenMP][NFC] Remove std::move to silence warnings	2023-11-20 17:15:33 -08:00
Joseph Huber	47a3ad5be1	[Libomptarget] Handle dynamic stack sizes for AMD COV5 (#72606 ) Summary: One of the changes in the AMD code-object version five was that kernels that use an unknown amount of private stack memory now no longer default to 16 KBs. Instead it emits a flag that indicates the runtime must provide a value. This patch checks if we must provide such a stack, and uses the existing handling of the stack environment variable to configure it.	2023-11-20 12:48:42 -06:00
Jan Patrick Lehr	5c22b907dc	Reland [OpenMP][libomptarget] Enable parallel copies via multiple SDM… (#72307 ) …A engines (#71801) This enables the AMDGPU plugin to use a new ROCm 5.7 interface to dispatch asynchronous data transfers across SDMA engines. The default functionality stays unchanged, meaning that all data transfers are enqueued into a H2D queue or an D2H queue, depending on transfer direction, via the HSA interface used previously. The new interface can be enabled via the environment variable `LIBOMPTARGET_AMDGPU_USE_MULTIPLE_SDMA_ENGINES=true` when libomptarget is built against a recent ROCm version (5.7 and later). As of now, requests are distributed in a round-robin fashion across available SDMA engines.	2023-11-14 21:30:04 +01:00
Joseph Huber	cc9e19ee59	Revert "[OpenMP][libomptarget] Enable parallel copies via multiple SDMA engines (#71801 )" This causes the tests to fail because the bots were not updated in time. Revert until we update the bots to a valid version. This reverts commit `e876250b63`.	2023-11-14 12:34:27 -06:00
Jan Patrick Lehr	e876250b63	[OpenMP][libomptarget] Enable parallel copies via multiple SDMA engines (#71801 ) This enables the AMDGPU plugin to use a new ROCm 5.7 interface to dispatch asynchronous data transfers across SDMA engines. The default functionality stays unchanged, meaning that all data transfers are enqueued into a H2D queue or an D2H queue, depending on transfer direction, via the HSA interface used previously. The new interface can be enabled via the environment variable `LIBOMPTARGET_AMDGPU_USE_MULTIPLE_SDMA_ENGINES=true` when libomptarget is built against a recent ROCm version (5.7 and later). As of now, requests are distributed in a round-robin fashion across available SDMA engines.	2023-11-14 19:16:39 +01:00
Joseph Huber	237adfca4e	[OpenMP] Rework handling of global ctor/dtors in OpenMP (#71739 ) Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors an destructors, removing a class of failures related to shared library destruction order, and allows targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything atuomatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs a information message regarding the deprecation if the old format is used. This will be completely removed in a later release. Depends on: https://github.com/llvm/llvm-project/pull/71549	2023-11-10 14:53:53 -06:00
Saiyedul Islam	21861991e7	[OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (#71234 ) Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses already available global variable "oclc_ABI_version" instead of "llvm.amdgcn.abi.verion". It also adds some minor fields in ImplicitArg structure.	2023-11-09 10:34:35 +05:30
Jon Chesterfield	f0e100a05a	[amdgpu][openmp] Treat missing TIMESTAMP_FREQUENCY as non-fatal (#70987 ) If you build with dynamic_hsa, the symbol is known and compilation succeeds. If you then run with a slightly older libhsa, this argument is not recognised and an error returned. I'd rather the program runs with a misleading omp wtime than refuses to run at all.	2023-11-01 22:43:34 +00:00
Jon Chesterfield	896749aa0d	[amdgpu][openmp] Avoiding writing to packet header twice (#70695 ) I think it follows from the HSA spec that a write to the first byte is deemed significant to the GPU in which case writing to the second short and reading back from it later would be safe. However, the examples for this all involve an atomic write to the first 32 bits and it seems a credible risk that the occasional CI errors abound invalid packets have as their root cause that the firmware notices the early write to packet->setup and treats that as a sign that the packet is ready to go. That was overly-paranoid, however in passing noticed the code in libc is genuinely invalid. The memset writes a zero to the header byte, changing it from type_invalid (1) to type_vendor (0), at which point the GPU is free to read the 64 byte packet and interpret it as a vendor packet, which is probably why libc CI periodically errors about invalid packets. Also a drive by change to do the atomic store on a uint32_t consistently. I'm not sure offhand what __atomic_store_n on a uint16_t* and an int resolves to, seems better to be unambiguous there.	2023-10-30 18:35:52 +00:00
Konstantinos Parasyris	d6a3d6b96d	[openmp] Fixed Support for VA for record-replay. (#70396 ) The commit was discussed in phabricator (https://reviews.llvm.org/D157186). Record replay currently fails on AMD as it conflicts with the heap memory allocator introduced in #69806. The workaround is setting `LIBOMPTARGET_HEAP_SIZE=0` during both record and replay run.	2023-10-29 12:27:19 -07:00
Johannes Doerfert	d346c82435	[OpenMP] Associate the KernelEnvironment with the GenericKernelTy (#70383 ) By associating the kernel environment with the generic kernel we can access middle-end information easily, including the launch bounds ranges that are acceptable. By constraining the number of threads accordingly, we now obey the user-provided bounds that were passed via attributes.	2023-10-29 11:35:34 -07:00
Jon Chesterfield	840d0b7e03	[amdgpu] D2D memcpy via streams and HSA (#69977 ) hsa_amd_memory_async_copy can handle device to device copies if passed the corresponding parameters. No functional change - currently D2D copy goes through a fallback in libomptarget that stages through a host malloc, after this it goes directly through HSA. Works under exactly the situations that HSA works. Verified locally on a performance benchmark. Hoping to attract further testing from internal developers after it lands.	2023-10-24 00:05:04 +01:00
Johannes Doerfert	d3921e4670	[OpenMP] Basic BumpAllocator for (AMD)GPUs (#69806 ) The patch contains a basic BumpAllocator for (AMD)GPUs to allow us to run more tests. The allocator implements `malloc`, both internally and externally, while we continue to default to the NVIDIA `malloc` when we target NVIDIA GPUs. Once we have smarter or customizable allocators we should consider this choice, for now, this allocator is better than none. It traps if it is out of memory, making it easy to debug. Heap size is configured via `LIBOMPTARGET_HEAP_SIZE` and defaults to 512MB. It allows to track allocation statistics via `LIBOMPTARGET_DEVICE_RTL_DEBUG=8` (together with `-fopenmp-target-debug=8`). Two tests were added, and one was enabled. This is the next step towards fixing https://github.com/llvm/llvm-project/issues/66708	2023-10-21 14:49:30 -07:00
Joseph Huber	34a3fb9f62	[Libomptarget][NFC] Remove use of VLA in the AMDGPU plugin (#69761 ) Summary: We should not rely on a VLA in C++ for the handling of this string. The size is a true runtime value so we cannot rely on constexpr handling. We simply use a small vector, whose default size is most likely large enough to handle whatever size gets output within the stack, but is safe in cases where it is not.	2023-10-20 16:02:51 -04:00
Ye Luo	8c2da6bb7f	[libomptarget] document ActionFunctions in the amdgpu plugin. (#66397 )	2023-09-14 12:18:49 -05:00
Ye Luo	6c8248e38b	[libomptarget] Rename AMDGPUSignalTy member Signal to HSASignal.	2023-09-11 22:42:34 -05:00
Ye Luo	08352b99a4	[libomptarget][NFC] update comments.	2023-09-11 22:30:25 -05:00
Joseph Huber	460840c09d	[OpenMP] Support 'omp_get_num_procs' on the device (#65501 ) Summary: The `omp_get_num_procs()` function should return the amount of parallelism availible. On the GPU, this was not defined. We have elected to define this function as the maximum amount of wavefronts / warps that can be simultaneously resident on the device. For AMDGPU this is the number of CUs multiplied byth CU's per wave. For NVPTX this is the maximum threads per SM divided by the warp size and multiplied by the number of SMs.	2023-09-06 13:45:05 -05:00
Saiyedul Islam	f616c3eeb4	[OpenMP][DeviceRTL][AMDGPU] Support code object version 5 Update DeviceRTL and the AMDGPU plugin to support code object version 5. Default is code object version 4. CodeGen for __builtin_amdgpu_workgroup_size generates code for cov4 as well as cov5 if -mcode-object-version=none is specified. DeviceRTL compilation passes this argument via Xclang option to generate abi-agnostic code. Generated code for the above builtin uses a clang control constant "llvm.amdgcn.abi.version" to branch on the abi version, which is available during linking of user's OpenMP code. Load of this constant gets eliminated during linking. AMDGPU plugin queries the ELF for code object version and then prepares various implicitargs accordingly. Differential Revision: https://reviews.llvm.org/D139730 Reviewed By: jhuber6, yaxunl	2023-08-29 06:35:44 -05:00
Joseph Huber	06adac8c4e	[Libomptarget] Configure the RPC port count from the plugin This patch allows us to configure the port count to what the specific card would desire for parallelism. For AMDGPU we need to use the maximum number of hardware parallelism to avoid deadlocks. For NVPTX we don't have this problem due to the friendlier scheduler, so we use the number of warps active on an SM times the number of SMs as a good guess. Note that the max ports currently is going to be smaller than these numbers. That will be improved in the future. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D155903	2023-08-11 12:54:47 -05:00
Michael Halkenhaeuser	7eba3e58d5	[OpenMP][AMDGPU] Add Envar for controlling HSA busy queue tracking If the Envar is set to true (default), busy HSA queues will be actively avoided when assigning a queue to a Stream. Otherwise, we will initialize a new HSA queue for each requested Stream, then default to round robin once the set maximum has been reached. Reviewed By: jdoerfert, kevinsala Differential Revision: https://reviews.llvm.org/D156996	2023-08-07 10:48:02 -04:00
Kevin Sala	b8e297d1af	[OpenMP][libomptarget] Improve kernel initialization in plugins This patch modifies the plugins so that the initialization of KernelTy objects is done in the init method. Part of the initialization was done in the constructKernelEntry method. Now this method is called constructKernel and only allocates and constructs a KernelTy object. This patch prepares the kernel class for the new implementation of device reductions. Differential Revision: https://reviews.llvm.org/D156917	2023-08-06 11:53:58 +02:00
Kevin Sala	4f46a48aaf	[OpenMP][libomptarget] Remove unused virtual functions in GenericKernelTy The virtual functions getDefaultNumBlocks and getDefaultNumThreads from the kernels are only forwarding the call to the generic device's ones. This patch removes those two functions from the kernels (and their derived ones). Now calls are made to the device's functions directly. Differential Revision: https://reviews.llvm.org/D156905	2023-08-02 17:18:50 +02:00
Michael Halkenhaeuser	5b19f42b63	[OpenMP][AMDGPU] Single eager resource init + HSA queue utilization tracking This patch lazily initializes queues/streams/events since their initialization might come at a cost even if we do not use them. To further benefit from this, AMDGPU/HSA queue management is moved into the AMDGPUStreamManager of an AMDGPUDevice. Streams may now use different HSA queues during their lifetime and identify busy queues. When a Stream is requested from the resource manager, it will search for and try to assign an idle queue. During the search for an idle queue the manager may initialize more queues, up to the set maximum (default: 4). When no idle queue could be found: resort to round robin selection. With contributions from Johannes Doerfert <johannes@jdoerfert.de> Depends on D156245 Reviewed By: kevinsala Differential Revision: https://reviews.llvm.org/D154523	2023-08-02 08:22:26 -04:00
Kevin Sala	523ac0fcdf	[OpenMP][libomptarget] Retrieve multiple resources from resource managers This patch extends the plugin resource managers to return more than one resource per call. The return function is not extended since we do not return more than one resource anywhere. Differential Revision: https://reviews.llvm.org/D155629	2023-07-28 00:37:08 +02:00
Kevin Sala	53e4c7c309	[OpenMP][libomptarget] Improving plugin resource managers This patch improves the resource managers in the plugins by properly handling the errors. Until now, errors when creating and destroying resources were not propagated and were directly handled inside the resource managers. Now, all errors are propagated as in the rest of the plugin infrastructure. The code is now ready to implement the request/return of multiple resources in a single getResource/returnResource call. Differential Revision: https://reviews.llvm.org/D155621	2023-07-28 00:37:08 +02:00
Shilei Tian	10068cd654	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-26 13:35:14 -04:00
Shilei Tian	6bd74fd65f	Revert commits for kernel environment This reverts commits for kernel environments as they causes issues in AMD BB.	2023-07-23 23:32:31 -04:00
Shilei Tian	c5c8040390	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-23 18:36:01 -04:00
Shilei Tian	763fdb1ffa	[OpenMP][Plugin] Update the global address calculation Current global address caculation doesn't work for AMDGPU in some cases (https://reviews.llvm.org/D142569#4506212). The root cause is the `sh_addr` is not substracted when caculating the address. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D155886	2023-07-23 17:41:06 -04:00
Joseph Huber	8a0763f19c	[Libomptarget] Remove RPCHandleTy indirection The 'RPCHandleTy' was intended to capture the intention that a specific device owns its slot in the RPC server. However, this required creating a temporary store to hold these pointers. This was causing really weird spurious failure due to undefined behaviour in the order of library teardown. For example, the x64 plugin would be torn down, set this to some invalid memory, and then the CUDA plugin would crash. Rather than spend the time to fully diagnose this problem I found it pertinent to simply remove the failure mode. This patch removes this indirection so now the usage of the RPC server must always be done with the intended device. This just requires some extra handling for the AMDGPU indirection where we need to store a reference to the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154971	2023-07-11 10:54:40 -05:00
Michael Halkenhaeuser	142faf56f5	[OpenMP] [OMPT] [amdgpu] [5/8] Implemented device init/fini/load callbacks Added support in the generic plugin to invoke registered callbacks. Depends on D124070 Patch from John Mellor-Crummey <johnmc@rice.edu> (With contributions from Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>) Differential Revision: https://reviews.llvm.org/D124652	2023-07-11 07:13:22 -04:00
Joseph Huber	691dc2d10d	[Libomptarget] Begin implementing support for RPC services This patch adds the intial support for running an RPC server in libomptarget to handle host services. We interface with the library provided by the `libc` project to stand up a basic server. We introduce a new type that is controlled by the plugin and has each device intialize its interface. We then run a basic server to check the RPC buffer. This patch does not fully implement the interface. In the future each plugin will want to define special handlers via the interface to support things like malloc or H2D copies coming from RPC. We will also want to allow the plugin to specify t he number of ports. This is currently capped in the implementation but will be adjusted soon. Right now running the server is handled by whatever thread ends up doing the waiting. This is probably not a completely sound solution but I am not overly familiar with the behaviour of OpenMP tasks and what would be required here. This works okay with synchrnous regions, and somewhat fine with `nowait` regions, but I've observed some weird behavior when one of those regions calls `exit`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D154312	2023-07-07 12:36:46 -05:00
Joseph Huber	071c8a41cc	[Libomptarget] Fix tests after deleting the next-gen plugins The next-gen plugins didn't correctly configure tests and were never actually being run. Since deleting the old plugin we stopped getting `libomptarget` tests. This patch fixes the issue and allows the targets to be built Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154619	2023-07-06 10:44:50 -05:00
Joseph Huber	e90ab9148b	[OpenMP] Delete old plugins It's time to remove the old plugins as the next-gen has already been set to default in LLVM 16. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D142820	2023-07-05 17:39:47 -05:00
Joseph Huber	6764301a6b	[Libomptarget] Correctly implement `getWTime` on AMDGPU AMDGPU provides a fixed frequency clock since some generations back. However, the frequency is variable by card and must be looked up at runtime. This patch adds a new device environment line for the clock frequency so that we can use it in the same way as NVPTX. This is the correct implementation and the version in ASO should be replaced. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D154456	2023-07-04 21:50:43 -05:00
Johannes Doerfert	2e3c6c3f80	[OpenMP][NFC] Eliminate warning	2023-05-18 13:27:43 -07:00
Joseph Huber	b09953a4a3	[Libomptarget] Fix AMDGPU Note handling after D150022 Summary: The changes in https://reviews.llvm.org/D150022 changed the API for this function that we query. Simply pass in the alignment from the associated header to fix.	2023-05-10 14:12:39 -05:00

1 2

88 Commits