clang-p2996

Author	SHA1	Message	Date
Joseph Huber	ba192debb4	[Libomptarget][Obvious] Fix typo in attribute lookup Summary: These are keys into the AMDGPU target metadata. One of them had a typo which prevented it from being extracted.	2023-12-20 19:03:35 -06:00
Joseph Huber	f324584ae3	[Libomptarget][NFCI] Remove caching of created ELF files (#76080 ) Summary: We currently keep a cache of created ELF files from the relevant images. This shouldn't be necessary as the entire ELF interface is generally trivially constructable and extremely cheap. The cost of constructing one of these objects is simply a size check and writing a pointer to the underlying data. Given that, keeping a cache of these images should not be necessary overall.	2023-12-20 17:13:41 -06:00
Joseph Huber	e4f4022b70	[Libomptarget][NFC] Fix linting warnings in the plugins Summary: Fix some linting warnings present in the plugins.	2023-12-20 10:07:34 -06:00
Joseph Huber	ac029e02a9	[Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720 ) Summary: This patch reorganizes a lot of the code used to check for compatibility with the current environment. The main bulk of this patch involves moving from using a separate `__tgt_image_info` struct (which just contains a string for the architecture) to instead simply checking this information from the ELF directly. Checking information in the ELF is very inexpensive as creating an ELF file is simply writing a base pointer. The main desire to do this was to reorganize everything into the ELF image. We can then do the majority of these checks without first initializing the plugin. A future patch will move the first ELF checks to happen without initializing the plugin so we no longer need to initialize any plugins that don't have needed images. This patch also adds a lot more sanity checks for whether or not the ELF is actually compatible, such as whether the images have a valid ABI, are 64-bit, are executable, etc.	2023-12-19 20:01:31 -06:00
Shilei Tian	3768039913	[OpenMP] Directly use user's grid and block size in kernel language mode (#70612 ) In kernel language mode, use user's grid and blocks size directly. No validity check, which means if user's values are too large, the launch will fail, similar to what CUDA and HIP are doing right now.	2023-12-18 12:26:18 -05:00
Joseph Huber	913622d012	[Libomptarget] Remove remaining global constructors in plugins (#75814 ) Summary: This patch fixes the remaining global constructor in the plugins after addressing the ones in the JIT interface. This struct was mistakenly using global constructors as not all the members were being initialized properly. This was almost certainly being optimized out because it's trivial, but would still be present in debug builds and prevented us from compiling with `-Werror=global-constructors`. We will want to do that once offloading is moved to a runtimes only build.	2023-12-18 11:01:02 -06:00
Joseph Huber	1580877555	[Libomptarget] Remove bitcode image map used for JIT processing (#75672 ) Summary: Libomptarget supports JIT by treating an LLVM-IR file as a regular input image. The handling here used a global map to keep track of triples once it was parsed. This was done to save time, however this created a global constructor as well as an extra mutex to handle it. This patch removes the use of this map. Instead, we simply use the file magic to perform a quick check if the input image is valid bitcode. If not, we then create a lazy module. This should be roughly equivalent to the old handling that created an IR symbol table. Here we can prevent the module from materializing everything but the single triple metadata we read in later.	2023-12-18 09:28:06 -06:00
Kazu Hirata	b8f89b84bc	Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-16 15:02:17 -08:00
Joseph Huber	0ab663d202	[Libomptarget] Move ELF symbol extraction to the ELF utility (#74717 ) Summary: We shouldn't have the format specific ELF handling in the generic plugin manager. This patch moves that out of the implementation and into the ELF utilities. This patch changes the SHT_NOBITS case to be a hard error, which should be correct as the existing use already seemed to return an error if the result was a null pointer. This also uses a `const_cast`, which is bad practice. However, rebuilding the `constness` of all of this would be a massive overhaul, and this matches the previous behaviour (We would take a pointer to the image that is most likely read-only in the ELF).	2023-12-14 11:04:13 -06:00
Johannes Doerfert	12cbccc312	[OpenMP] Add extra flags to libomptarget and plugin builds (#74520 )	2023-12-11 10:41:50 -08:00
Johannes Doerfert	0ace6ee73a	[OpenMP][FIX] Ensure we do not read outside the device image (#74669 ) Before we expected all symbols in the device image to be backed up with data that we could read. However, uninitialized values are not. We now check for this case and avoid reading random memory. This also replaces the correct readGlobalFromImage call with a isSymbolInImage check after https://github.com/llvm/llvm-project/pull/74550 picked the wrong one. Fixes: https://github.com/llvm/llvm-project/issues/74582	2023-12-06 14:57:57 -08:00
Joseph Huber	6f3bd3a2f6	[Libomptarget] Add a utility function for checking existence of symbols (#74550 ) Summary: There are now a few cases that check if a symbol is present before continuing, effectively making them optional features if present in the image. This was done in at least three locations and required an ugly operation to consume the error. This patch makes a utility function to handle that instead.	2023-12-06 07:41:27 -06:00
Johannes Doerfert	68db7aef74	[OpenMP] Reorganize the initialization of `PluginAdaptorTy` (#74397 ) This introduces checked errors into the creation and initialization of `PluginAdaptorTy`. We also allow the adaptor to "hide" devices from the user if the initialization failed. The new organization avoids the "initOnce" stuff but we still do not eagerly initialize the plugin devices (I think we should merge `PluginAdaptorTy::initDevices` into `PluginAdaptorTy::init`)	2023-12-05 16:04:01 -08:00
Johannes Doerfert	9f87509b19	[OpenMP][FIX] Ensure we allow shared libraries without kernels (#74532 ) This fixes two bugs and adds a test for them: - A shared library with declare target functions but without kernels should not error out due to missing globals. - Enabling LIBOMPTARGET_INFO=32 should not deadlock in the presence of indirect declare targets.	2023-12-05 15:25:10 -08:00
Johannes Doerfert	5fe741f08e	[OpenMP] Separate Requirements into a standalone header (#74126 ) This is not completely NFC since we now check all 4 requirements and the test is checking the good and the bad case for combining flags.	2023-12-01 14:47:00 -08:00
dhruvachak	ca2d79f9ca	[OpenMP] Add an INFO message for data transfer of kernel launch env. (#74030 )	2023-12-01 10:58:23 -08:00
Jon Chesterfield	f184147706	[amdgpu] Default to 1.0, instead of unspecified, for dynamic hsa (#74098 ) The plugin checks the values of HSA_AMD_INTERFACE_VERSION_* so we now set them to something safe in the header.	2023-12-01 16:37:49 +00:00
Johannes Doerfert	148dec9fa4	[OpenMP][NFC] Separate Envar (environment variable) handling (#73994 )	2023-11-30 15:23:34 -08:00
Johannes Doerfert	2e7f47d4a8	[OpenMP][NFC] Move out plugin API and APITypes into standalone headers (#73868 )	2023-11-29 16:04:19 -08:00
Johannes Doerfert	fae233c63f	[OpenMP] Avoid initializing the KernelLaunchEnvironment if possible (#73864 ) If we don't have a team reduction we don't need a kernel launch environment (for now). In that case we can avoid the cost.	2023-11-29 14:49:13 -08:00
Johannes Doerfert	e2299e8d9d	[OpenMP][NFC] Move OMPT headers into OpenMP/OMPT (#73718 )	2023-11-29 08:29:41 -08:00
Johannes Doerfert	db96a9c3b7	[OpenMP][NFC] Flatten plugin-nextgen/common folder structure (#73725 ) For historic reasons we had it set up so that there was ` plugin-nextgen/common/PluginInterface/<sources + headers>` which is not what we do anywhere else. Now it looks like the rest: ``` plugin-nextgen/common/include/<headers> plugin-nextgen/common/src/<sources> ``` As part of this, `dlwrap.h` was moved into common/include (as `DLWrap.h`) since it is exclusively used by the plugins.	2023-11-29 07:57:01 -08:00
Jan Patrick Lehr	3930a0b57a	[OpenMP][libomptarget] Use two SDMA engines (#73633 ) Limit the use to two SDMA engines which are optimized for such transfers.	2023-11-29 14:21:44 +01:00
Johannes Doerfert	7233e42dff	[OpenMP][NFC] Move Environment.h and SourceInfo.h into "Shared" folder (#73703 )	2023-11-28 15:10:06 -08:00
Johannes Doerfert	8327f4a851	[OpenMP][NFC] Move Utils.h and Debug.h into a "Shared" include folder (#73701 ) Headers used throughout the different runtimes are different from the internal headers. This is a first step to bring structure into the include folder.	2023-11-28 13:44:57 -08:00
Johannes Doerfert	0783bf1cb3	[OpenMP][NFC] Merge MemoryManager into PluginInterface (#73678 ) Similar to #73677, there is no benefit from keeping MemoryManager separate; it's tied into the current design. Apart from the move, I also replaced the getenv call with our Env handling.	2023-11-28 10:17:51 -08:00
Johannes Doerfert	4667dd62ee	[OpenMP][NFC] Merge elf_common into PluginInterface (#73677 ) The overhead of a library and 4 files seems high without benefit. This simply tries to consolidate our structure.	2023-11-28 10:03:25 -08:00
Joseph Huber	71e3082d85	[OpenMP] Enable position independent code for libomptarget Summary: This option used to be passed manually by the `-fPIC` option that was always enabled by the LLVM flags. Since we now do this manually we want to specify that these are supposed to use fPIC code.	2023-11-27 14:51:48 -06:00
Johannes Doerfert	7bfcce3e94	[OpenMP] Tear down GenericDeviceTy's with GenericPluginTy (#73557 ) There is no point in keeping GenericDeviceTy objects alive longer than the associated GenericPluginTy. Instead of the old API we now tear them down with the plugin, avoiding ordering issues.	2023-11-27 11:42:12 -08:00
Johannes Doerfert	2b2e711afc	[OpenMP][NFC] Remove no-op __tgt_rtl_deinit_plugin The order in which we deinit things, especially when shared libraries are involved, is complicated. To simplify our lives the nextgen plugin deinitializes the GenericPluginTy and subclasses automatically. The old __tgt_rtl_deinit_plugin is not needed anymore.	2023-11-27 11:07:57 -08:00
Johannes Doerfert	f48c4d8aa1	[OpenMP] Be more forgiving during record and replay When we record and replay kernels we should not error out early if there is a chance the program might still run fine. This patch will: 1) Fall back to the allocation heuristic if the VAMap doesn't work. 2) Adjust the memory start to match the required address if possible. 3) Adjust the (guessed) pointer arguments if the memory start adjustment is impossible. This will allow kernels without indirect accesses to work while indirect accesses will most likely fail.	2023-11-20 17:15:34 -08:00
Johannes Doerfert	41566fb852	[OpenMP][FIX] Ensure recording works properly w/ late allocations	2023-11-20 17:15:33 -08:00
Johannes Doerfert	6663df30c0	[OpenMP][NFC] Remove std::move to silence warnings	2023-11-20 17:15:33 -08:00
Joseph Huber	47a3ad5be1	[Libomptarget] Handle dynamic stack sizes for AMD COV5 (#72606 ) Summary: One of the changes in the AMD code-object version five was that kernels that use an unknown amount of private stack memory now no longer default to 16 KBs. Instead it emits a flag that indicates the runtime must provide a value. This patch checks if we must provide such a stack, and uses the existing handling of the stack environment variable to configure it.	2023-11-20 12:48:42 -06:00
Jan Patrick Lehr	5c22b907dc	Reland [OpenMP][libomptarget] Enable parallel copies via multiple SDM… (#72307 ) …A engines (#71801) This enables the AMDGPU plugin to use a new ROCm 5.7 interface to dispatch asynchronous data transfers across SDMA engines. The default functionality stays unchanged, meaning that all data transfers are enqueued into an H2D queue or a D2H queue, depending on transfer direction, via the HSA interface used previously. The new interface can be enabled via the environment variable `LIBOMPTARGET_AMDGPU_USE_MULTIPLE_SDMA_ENGINES=true` when libomptarget is built against a recent ROCm version (5.7 and later). As of now, requests are distributed in a round-robin fashion across available SDMA engines.	2023-11-14 21:30:04 +01:00
Joseph Huber	cc9e19ee59	Revert "[OpenMP][libomptarget] Enable parallel copies via multiple SDMA engines (#71801 )" This causes the tests to fail because the bots were not updated in time. Revert until we update the bots to a valid version. This reverts commit `e876250b63`.	2023-11-14 12:34:27 -06:00
Jan Patrick Lehr	e876250b63	[OpenMP][libomptarget] Enable parallel copies via multiple SDMA engines (#71801 ) This enables the AMDGPU plugin to use a new ROCm 5.7 interface to dispatch asynchronous data transfers across SDMA engines. The default functionality stays unchanged, meaning that all data transfers are enqueued into an H2D queue or a D2H queue, depending on transfer direction, via the HSA interface used previously. The new interface can be enabled via the environment variable `LIBOMPTARGET_AMDGPU_USE_MULTIPLE_SDMA_ENGINES=true` when libomptarget is built against a recent ROCm version (5.7 and later). As of now, requests are distributed in a round-robin fashion across available SDMA engines.	2023-11-14 19:16:39 +01:00
Joseph Huber	237adfca4e	[OpenMP] Rework handling of global ctor/dtors in OpenMP (#71739 ) Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors and destructors, removing a class of failures related to shared library destruction order, and allowing targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything automatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs an information message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549	2023-11-10 14:53:53 -06:00
Konstantinos Parasyris	b34d31d2e1	[OpenMP] Fix record-replay allocation order for kernel environment (#71863 )	2023-11-09 12:51:22 -08:00
Saiyedul Islam	21861991e7	[OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (#71234 ) Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses already available global variable "oclc_ABI_version" instead of "llvm.amdgcn.abi.version". It also adds some minor fields in ImplicitArg structure.	2023-11-09 10:34:35 +05:30
Johannes Doerfert	002f422410	[OpenMP] Replace CUDART_VERSION with CUDA_VERSION	2023-11-06 12:30:40 -08:00
Johannes Doerfert	726ee40f52	[OpenMP] Move the recording code to account for KernelLaunchEnvironment We need to record late to account for the kernel launch environment as well as the potential changes in block and thread count.	2023-11-06 12:30:40 -08:00
Johannes Doerfert	3de645efe3	[OpenMP][NFC] Split the reduction buffer size into two components Before we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts, the size of the reduction data (=all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have fewer teams than the maximal length. It also allows us to move code from clang's codegen into the runtime as we now know how large the reduction data is.	2023-11-06 11:50:41 -08:00
Konstantinos Parasyris	d301a28950	[OpenMP] Guard Virtual Memory Management API and Types (#70986 )	2023-11-03 16:24:18 -07:00
Neale Ferguson	1111ef0257	Add openmp support to System z (#66081 ) * openmp/README.rst - Add s390x to those platforms supported * openmp/libomptarget/plugins-nextgen/CMakeLists.txt - Add s390x subdirectory * openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt - Add s390x definitions * openmp/runtime/CMakeLists.txt - Add s390x to those platforms supported * openmp/runtime/cmake/LibompGetArchitecture.cmake - Define s390x ARCHITECTURE * openmp/runtime/cmake/LibompMicroTests.cmake - Add dependencies for System z (aka s390x) * openmp/runtime/cmake/LibompUtils.cmake - Add S390X to the mix * openmp/runtime/cmake/config-ix.cmake - Add s390x as a supported LIBOMP_ARCH * openmp/runtime/src/kmp_affinity.h - Define __NR_sched_[get|set]affinity for s390x * openmp/runtime/src/kmp_config.h.cmake - Define CACHE_LINE for s390x * openmp/runtime/src/kmp_os.h - Add KMP_ARCH_S390X to support checks * openmp/runtime/src/kmp_platform.h - Define KMP_ARCH_S390X * openmp/runtime/src/kmp_runtime.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/kmp_tasking.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h - Define ITT_ARCH_S390X * openmp/runtime/src/z_Linux_asm.S - Instantiate __kmp_invoke_microtask for s390x * openmp/runtime/src/z_Linux_util.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/test/ompt/callback.h - Define print_possible_return_addresses for s390x * openmp/runtime/tools/lib/Platform.pm - Return s390x as platform and host architecture * openmp/runtime/tools/lib/Uname.pm - Set hardware platform value for s390x	2023-11-03 12:42:55 +01:00
Jon Chesterfield	f0e100a05a	[amdgpu][openmp] Treat missing TIMESTAMP_FREQUENCY as non-fatal (#70987 ) If you build with dynamic_hsa, the symbol is known and compilation succeeds. If you then run with a slightly older libhsa, this argument is not recognised and an error returned. I'd rather the program runs with a misleading omp wtime than refuses to run at all.	2023-11-01 22:43:34 +00:00
Johannes Doerfert	a273d17d4a	[OpenMP][FIX] Do not add implicit argument to device Ctors and Dtors Constructors and destructors on the device do not take any arguments, also not the implicit dyn_ptr argument other kernels automatically take.	2023-11-01 11:18:11 -07:00
Johannes Doerfert	f9a89e6b9c	[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752 ) We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-11-01 11:11:48 -07:00
Johannes Doerfert	b8cbc5c02c	[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401 ) The KernelEnvironment is for compile time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information per kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per launch memory pools. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-10-31 19:38:43 -07:00
Jon Chesterfield	896749aa0d	[amdgpu][openmp] Avoiding writing to packet header twice (#70695 ) I think it follows from the HSA spec that a write to the first byte is deemed significant to the GPU in which case writing to the second short and reading back from it later would be safe. However, the examples for this all involve an atomic write to the first 32 bits and it seems a credible risk that the occasional CI errors about invalid packets have as their root cause that the firmware notices the early write to packet->setup and treats that as a sign that the packet is ready to go. That was overly-paranoid, however in passing I noticed the code in libc is genuinely invalid. The memset writes a zero to the header byte, changing it from type_invalid (1) to type_vendor (0), at which point the GPU is free to read the 64 byte packet and interpret it as a vendor packet, which is probably why libc CI periodically errors about invalid packets. Also a drive by change to do the atomic store on a uint32_t consistently. I'm not sure offhand what __atomic_store_n on a uint16_t* and an int resolves to, seems better to be unambiguous there.	2023-10-30 18:35:52 +00:00
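
The publish-last pattern described in commit 896749aa0d can be sketched as follows. This is a minimal illustration, not the plugin's code: the `Packet` struct, `pack`, and `publish` are hypothetical names, the layout is simplified, and it assumes the header occupies the low 16 bits of the first 32-bit word with the setup field in the high 16 bits. Only the values type_invalid (1) and type_vendor (0) come from the commit message.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Values taken from the commit message; every other name here is hypothetical.
constexpr uint16_t TypeVendor = 0;
constexpr uint16_t TypeInvalid = 1;

// Simplified 64-byte packet: one 32-bit word holding header (low 16 bits) and
// setup (high 16 bits), followed by the body. Assumed layout for illustration.
struct alignas(64) Packet {
  std::atomic<uint32_t> HeaderAndSetup{TypeInvalid};
  uint32_t Payload[15] = {};
};

// Combine header and setup so both can be written with a single store.
constexpr uint32_t pack(uint16_t Header, uint16_t Setup) {
  return static_cast<uint32_t>(Header) | (static_cast<uint32_t>(Setup) << 16);
}

// Fill the body first and never touch the first word (no memset over the
// whole packet, which would flip the header from invalid to vendor). Then
// publish header and setup together with one 32-bit release store, so a
// consumer that sees a valid header also sees the completed body.
void publish(Packet &P, const uint32_t (&Body)[15], uint16_t Header,
             uint16_t Setup) {
  std::memcpy(P.Payload, Body, sizeof(P.Payload));
  P.HeaderAndSetup.store(pack(Header, Setup), std::memory_order_release);
}
```

Writing the first word exactly once, and last, avoids both hazards the commit message mentions: an early partial write that could be mistaken for a ready packet, and a zeroing write that turns an invalid packet into a vendor packet.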


193 Commits