clang-p2996

Author	SHA1	Message	Date
Johannes Doerfert	7233e42dff	[OpenMP][NFC] Move Environment.h and SourceInfo.h into "Shared" folder (#73703 )	2023-11-28 15:10:06 -08:00
Johannes Doerfert	3de645efe3	[OpenMP][NFC] Split the reduction buffer size into two components Before, we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts, the size of the reduction data (=all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have fewer teams than the maximal length. It also allows us to move code from clang's codegen into the runtime as we now know how large the reduction data is.	2023-11-06 11:50:41 -08:00
Jan Patrick Lehr	07f5cf1992	[OpenMP][libomptarget] Fixes possible no-return warning (#70808 ) The UNREACHABLE macro resolves to message + trap, which may still warn, so we add call to __builtin_unreachable.	2023-11-06 16:45:03 +01:00
Johannes Doerfert	d3e7a48cbd	[OpenMP][NFC] Remove a no-op function	2023-11-03 10:28:36 -07:00
Johannes Doerfert	f9a89e6b9c	[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752 ) We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-11-01 11:11:48 -07:00
Johannes Doerfert	b8cbc5c02c	[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401 ) The KernelEnvironment is for compile time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information per kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per launch memory pools. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-10-31 19:38:43 -07:00
Johannes Doerfert	d3921e4670	[OpenMP] Basic BumpAllocator for (AMD)GPUs (#69806 ) The patch contains a basic BumpAllocator for (AMD)GPUs to allow us to run more tests. The allocator implements `malloc`, both internally and externally, while we continue to default to the NVIDIA `malloc` when we target NVIDIA GPUs. Once we have smarter or customizable allocators we should consider this choice, for now, this allocator is better than none. It traps if it is out of memory, making it easy to debug. Heap size is configured via `LIBOMPTARGET_HEAP_SIZE` and defaults to 512MB. It allows tracking allocation statistics via `LIBOMPTARGET_DEVICE_RTL_DEBUG=8` (together with `-fopenmp-target-debug=8`). Two tests were added, and one was enabled. This is the next step towards fixing https://github.com/llvm/llvm-project/issues/66708	2023-10-21 14:49:30 -07:00
Johannes Doerfert	d571af7f62	[OpenMP][FIX] Ensure thread states do not crash on the GPU The nested parallelism causes thread states which still do not properly work but at least don't crash anymore.	2023-10-21 14:43:09 -07:00
Johannes Doerfert	1cea309b7e	[OpenMP][NFC] Move DebugKind to make it reusable from the host	2023-10-20 19:28:09 -07:00
Joseph Huber	b69081e324	Attributes (#69358 ) - [Libomptarget] Make the references to 'malloc' and 'free' weak. - [Libomptarget][NFC] Use C++ style attributes instead	2023-10-18 12:52:43 -04:00
Joseph Huber	460840c09d	[OpenMP] Support 'omp_get_num_procs' on the device (#65501 ) Summary: The `omp_get_num_procs()` function should return the amount of parallelism available. On the GPU, this was not defined. We have elected to define this function as the maximum amount of wavefronts / warps that can be simultaneously resident on the device. For AMDGPU this is the number of CUs multiplied by the number of waves per CU. For NVPTX this is the maximum threads per SM divided by the warp size and multiplied by the number of SMs.	2023-09-06 13:45:05 -05:00
Joseph Huber	aa78e94b0b	[Libomptarget] Support mapping indirect host calls to device functions The changes in D157738 allowed for us to emit stub globals on the device in the offloading entry section. These globals contain the addresses of device functions and allow us to map host functions to their corresponding device equivalent. This patch provides the initial support required to build a table on the device to look up the associated value. This is done by finding these entries and creating a global table on the device that can be searched with a simple binary search. This requires an allocation, which supposedly should be automatically freed at plugin shutdown. This includes a basic test which looks up device pointers via a host pointer using the added function. This will need to be built upon to provide full support for these calls in the runtime. To support reverse offloading it would also be useful to provide a reverse table that allows us to get host functions from device stubs. Depends on D157738 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D157918	2023-08-25 18:51:56 -05:00
Johannes Doerfert	ed16143593	[OpenMP][FIX] Ensure __assert_fail is compatible with the host Fixes: https://github.com/llvm/llvm-project/issues/64360	2023-08-04 11:36:58 -07:00
Joseph Huber	46642cc83d	[Libomptarget] Remove debug RAII from libomptarget This feature was supposed to allow you to trace execution inside of Libomptarget. However, this never really worked properly. The printing was always reorganized, only worked for single threads, and pretty much only told you a handful of things about a runtime library that's an implementation detail to all users. Despite this, it contributed about 40% of the total filesize of the deviceRTL. This patch simply removes this functionality, which I think was past due. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D157001	2023-08-03 09:37:47 -05:00
Johannes Doerfert	1f3a28d4e5	[OpenMP][NFC] Reorganize the ompx::mapping layer in the GPU runtime This change makes the naming more consistent, I hope.	2023-07-31 13:44:51 -07:00
Shilei Tian	10068cd654	[OpenMP] Introduce kernel environment This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not on a per-kernel basis, preventing us from applying per-kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depends on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-26 13:35:14 -04:00
Shilei Tian	6bd74fd65f	Revert commits for kernel environment This reverts commits for kernel environments as they cause issues in AMD BB.	2023-07-23 23:32:31 -04:00
Shilei Tian	c5c8040390	[OpenMP] Introduce kernel environment This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not on a per-kernel basis, preventing us from applying per-kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depends on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-23 18:36:01 -04:00
Johannes Doerfert	f914208c43	[OpenMP][NFCI] Avoid storing non-constant values in ICV If we store a constant in an ICV it is easier for the optimizer to propagate it. Since we often use the full block for the thread limit and the parallel team size, we can instead replace that dynamic value with a constant that otherwise cannot occur, here 0.	2023-07-18 16:50:50 -07:00
Johannes Doerfert	88a68de14c	[OpenMP][NFCI] Split assertion message from assertion expression We ended up with `llvm.assume(icmp ne ptr as(4) null, as(4) @str)` because the string in address space 4 was not known to be non-null. There is no need to create these assumes.	2023-07-18 16:50:50 -07:00
Joseph Huber	6764301a6b	[Libomptarget] Correctly implement `getWTime` on AMDGPU AMDGPU provides a fixed frequency clock since some generations back. However, the frequency is variable by card and must be looked up at runtime. This patch adds a new device environment line for the clock frequency so that we can use it in the same way as NVPTX. This is the correct implementation and the version in ASO should be replaced. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D154456	2023-07-04 21:50:43 -05:00
Dhruva Chakrabarti	6a1d1f7eef	[OpenMP] Added memory scope to atomic::inc API and used the device scope in reduction. With https://reviews.llvm.org/D137524, memory scope and ordering attributes are being used to generate the required instructions for atomic inc/dec on AMDGPU. This patch adds the memory scope attribute to the atomic::inc API and uses the device scope in reduction. Without the device scope in atomic_inc, the default system scope leads to unnecessary L2 write-backs/invalidates. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154172	2023-06-30 15:05:01 -04:00
Doru Bercea	04609b09e9	Enable up to 64 arguments for outlined regions in OpenMP device code. Co-Author: Fabio Luporini <fabio@devitocodes.com> Review: https://reviews.llvm.org/D150134	2023-05-24 10:31:39 -04:00
Joseph Huber	47800a12dc	[OpenMP][NFC] clang-format the OpenMP device runtime These files aren't fully formatted. I'm guessing this was a holdover from when `clang-format` was totally broken for OpenMP offloading. Format the files to be more consistent. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D151226	2023-05-23 11:19:09 -05:00
Shilei Tian	d4ecd1241c	Revert "[OpenMP] Introduce kernel environment" This reverts commit `35cfadfbe2`. It makes a couple of buildbots unhappy because of the following test failures: - `Transforms/OpenMP/add_attributes.ll'` - `mapping/declare_mapper_target_data.cpp` on AMDGPU	2023-04-22 20:56:35 -04:00
Shilei Tian	35cfadfbe2	[OpenMP] Introduce kernel environment This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not on a per-kernel basis, preventing us from applying per-kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-04-22 20:46:38 -04:00
Johannes Doerfert	67fed132f3	[OpenMP] Ensure memory fences are created with barriers for AMDGPUs It turns out that the __builtin_amdgcn_s_barrier() alone does not emit a fence. We somehow got away with this and assumed it would work as it (hopefully) is correct on the NVIDIA path where we just emit a __syncthreads. After talking to @arsenm we now (mostly) align with the OpenCL barrier implementation [1] and emit explicit fences for AMDGPUs. It seems this was the underlying cause for #59759, but I am not 100% certain. There is a chance this simply hides the problem. Fixes: https://github.com/llvm/llvm-project/issues/59759 [1] `07b347366e/opencl/src/workgroup/wgbarrier.cl (L21)`	2023-04-17 15:27:17 -07:00
Rafael A. Herrera Guaitero	64549f0903	[OpenMP][5.1] Fix parallel masked is ignored #59939 Code generation support for 'parallel masked' directive. The `EmitOMPParallelMaskedDirective` was implemented. In addition, the appropriate device functions were added. Fix #59939. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D143527	2023-04-03 20:33:55 +00:00
Ye Luo	ead2d86ee9	Revert "[OpenMP] Ensure memory fences are created with barriers for AMDGPUs" This reverts commit `36d6217c4e`.	2023-03-24 21:10:03 -05:00
Ye Luo	36d6217c4e	[OpenMP] Ensure memory fences are created with barriers for AMDGPUs It turns out that the `__builtin_amdgcn_s_barrier()` alone does not emit a fence. We somehow got away with this and assumed it would work as it (hopefully) is correct on the NVIDIA path where we just emit a `__syncthreads`. After talking to @arsenm we now (mostly) align with the OpenCL barrier implementation [1] and emit explicit fences for AMDGPUs. It seems this was the underlying cause for #59759, but I am not 100% certain. There is a chance this simply hides the problem. Fixes: https://github.com/llvm/llvm-project/issues/59759 [1] `07b347366e/opencl/src/workgroup/wgbarrier.cl (L21)` Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D145290	2023-03-24 20:36:51 -05:00
Joseph Huber	cfd18167c8	Revert "[Libomptarget] Use freestanding stdint.h header for DeviceRTL" This patch breaks the handling of `printf` in the OpenMP library. Using `-ffreestanding` prevents clang from emitting LLVM builtins, which we use for OpenMP printing support. Shelve this until we have functioning `printf` in the GPU `libc` and we can remove that code. This reverts commit `a92eaa3ebe`.	2023-03-13 14:17:05 -05:00
Joseph Huber	a92eaa3ebe	[Libomptarget] Use freestanding stdint.h header for DeviceRTL The `stdint.h` header provides the standard types. Previously we used `-nostdinc` and defined these ourselves. This patch switches to a freestanding version which should work properly. Without `-ffreestanding` the `stdint.h` header will include other libraries. But in a freestanding environment it should work given the primitives. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D145963	2023-03-13 12:32:58 -05:00
Shilei Tian	693358d787	[OpenMP][DeviceRTL][NFC] Use `OMPTgtExecModeFlags` from `llvm/include/llvm/Frontend/OpenMP/OMPDeviceConstants.h` This patch makes preparation for a series that will enable per-kernel information used in both host and device runtime. Some variables/enums, such as `OMPTgtExecModeFlags`, have to be shared by both of them. A new header `OMPDeviceConstants.h` is added, containing code that will be shared by them. We will introduce more variables soon. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142320	2023-01-22 19:10:54 -05:00
Shilei Tian	18959be84d	[OpenMP][DeviceRTL] Fix the support for tasking on the device This patch fixes the support for tasking on the device. Note: AMDGPU doesn't support it yet because of no support for `malloc` and `free`. Fix #59946. ``` ➜ ./test_parallel_master_device [OMPVV_RESULT: test_parallel_master_device.c] Test passed on the device. ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D141562	2023-01-11 23:50:35 -05:00
Johannes Doerfert	2b5a99b3d9	[OpenMP] Rename the `_OMP` namespace in the device runtime to `ompx` Differential Revision: https://reviews.llvm.org/D140334	2022-12-19 14:43:59 -08:00
Johannes Doerfert	90609fb68f	[OpenMP][NFCI] Remove effectively dead code in clang and the runtime Differential Revision: https://reviews.llvm.org/D136903	2022-12-13 18:44:19 -08:00
Johannes Doerfert	f9c29878b0	Revert "[OpenMP][NFCI] Remove effectively dead code in clang and the runtime" This reverts commit `c1c8cbbf5f`. One of the tests seems to be flaky/non-deterministic.	2022-12-12 22:08:28 -08:00
Johannes Doerfert	c1c8cbbf5f	[OpenMP][NFCI] Remove effectively dead code in clang and the runtime	2022-12-12 20:55:36 -08:00
Joseph Huber	9223315903	[DeviceRTL] Allow IsSPMDMode to be optimized out in LTO mode A previous patch merged the static and bitcode versions of the deviceRTL. We previously used the static library's separate compilation to set a special flag that prevented `IsSPMDMode` from being put in the used list and preventing it from being optimized out. When they were merged we could no longer do this separate compilation that allowed users of LTO to get more optimal code. This patch rearranges the code. The `IsSPMDMode` global is now transitively used by its inclusion in the changed `__keep_alive` function. This allows us to then manually delete the `__keep_alive` function from the module when building the static library via `llvm-extract`. The result is that the bitcode library correctly will maintain the needed shared state, while the static library will be able to internalize it and optimize it out. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D135280	2022-10-05 14:40:01 -05:00
Johannes Doerfert	f8ee045c6d	[OpenMP] Eliminate the ThreadStates array in favor of indirection If we have thread states, the program is going to be rather slow. If we don't, we want to avoid wasting shared memory. This patch introduces a slight penalty (malloc + indirection) for the slow path and reduces resource usage for the fast path. Differential Revision: https://reviews.llvm.org/D135037	2022-10-04 20:27:34 -07:00
Johannes Doerfert	b113965073	[OpenMP] Introduce more atomic operations into the runtime We should use OpenMP atomics but they don't take variable orderings. Maybe we should expose all of this in the header but that solves only part of the problem anyway. Differential Revision: https://reviews.llvm.org/D135036	2022-10-04 20:20:55 -07:00
Johannes Doerfert	f85c1f3b7c	[OpenMP] Replace __ATOMIC_XYZ with atomic::xyz for style Also fixes one ordering argument not used. Differential Revision: https://reviews.llvm.org/D135035	2022-10-04 19:43:30 -07:00
Johannes Doerfert	abbc3fa17b	[OpenMP] Replace pointer comparison with `isSharedMemPtr` check The pointer comparison was causing confusion for capture tracking, let's avoid confusion. Differential Revision: https://reviews.llvm.org/D135160	2022-10-04 19:24:22 -07:00
Dhruva Chakrabarti	839ac62c50	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `7539e9cf81`.	2022-09-15 03:08:46 +00:00
Giorgis Georgakoudis	7539e9cf81	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6, ABataev Differential Revision: https://reviews.llvm.org/D102107	2022-09-15 00:54:05 +00:00
Joseph Huber	2b8f722e63	[OpenMP] Add option to assert no nested OpenMP parallelism on the GPU The OpenMP device runtime needs to support the OpenMP standard. However, constructs like nested parallelism are very uncommon in real applications yet lead to complexity in the runtime that is sometimes difficult to optimize out. As a stop-gap for performance we should supply an argument that selectively disables this feature. This patch adds the `-fopenmp-assume-no-nested-parallelism` argument which explicitly disables the use of nested parallelism in OpenMP. Reviewed By: carlo.bertolli Differential Revision: https://reviews.llvm.org/D132074	2022-08-23 14:09:51 -05:00
Shilei Tian	db5a2afa62	[OpenMP][DeviceRTL] Implement libc function `memcmp` We will add some simple implementation of libc functions starting from this patch, and the first one is `memcmp`, which is reported in #56929. Note that `malloc` and `free` are not included in this patch because of the use of `declare variant`. In the near future we will implement the two functions w/o using any vendor provided function. This fixes #56929. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D131182	2022-08-04 14:37:54 -04:00
Joseph Huber	b08369f7f2	Revert "[OpenMP] Remove noinline attributes in the device runtime" The behaviour of this patch is not great, but it has some side-effects that are required for OpenMPOpt to work. The problem is that when we use `-mlink-builtin-bitcode` we only import used symbols from the runtime. Then OpenMPOpt will insert calls to symbols that were not previously included. This patch removed this implicit behaviour as these functions were kept alive by the `noinline` simply because it kept calls to them in the module. This caused regression in some tests that relied on some OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but will try to fix it more correctly on main. This reverts commit `d61d72dae6`. Fixes #56752	2022-07-27 11:09:18 -04:00
Joseph Huber	d61d72dae6	[OpenMP] Remove noinline attributes in the device runtime We previously used the `noinline` attributes to specify some definitions which should be kept alive in the runtime. These were then stripped immediately in the OpenMPOpt module pass. However, since the changes in D130298, we now explicitly state which functions will have external visibility in the bitcode library. Additionally the OpenMPOpt module pass should run before the inliner pass, so this shouldn't make a difference in whether or not the functions will be alive for the initial pass of OpenMPOpt. This should simplify the interface, and additionally save time spent on scanning function names for noinline. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D130368	2022-07-25 15:44:50 -04:00
Johannes Doerfert	d150152615	[OpenMP] Introduce more fine-grained control over the thread state use We can help optimizations by making sure we use the team state whenever it is clear there is no thread state. To this end we introduce a new state flag (`state::HasThreadState`) and explicit control for the `state::ValueRAII` helpers, including a dedicated "assert equal". Differential Revision: https://reviews.llvm.org/D130113	2022-07-21 12:30:38 -05:00
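
The bump-allocator idea mentioned in d3921e4670 (hand out memory by advancing an offset, never reuse it, trap when the heap is exhausted) can be illustrated with a minimal host-side sketch. This is not the DeviceRTL code: the class name `BumpAllocator`, its members, the 16-byte alignment, and the use of `std::abort` as a stand-in for a GPU trap are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch of a bump allocator; names and alignment are assumed,
// not taken from the OpenMP device runtime.
class BumpAllocator {
public:
  BumpAllocator(void *Base, std::size_t HeapSize)
      : Start(reinterpret_cast<std::uintptr_t>(Base)), Size(HeapSize),
        Offset(0) {}

  // Reserve Bytes (rounded up to Alignment) by atomically bumping the offset.
  void *allocate(std::size_t Bytes) {
    std::size_t Aligned = (Bytes + Alignment - 1) & ~(Alignment - 1);
    std::size_t Old = Offset.fetch_add(Aligned, std::memory_order_relaxed);
    if (Old + Aligned > Size)
      std::abort(); // Out of memory: stop immediately (stand-in for a trap).
    return reinterpret_cast<void *>(Start + Old);
  }

  // A bump allocator never reuses memory, so freeing is a no-op.
  void deallocate(void *) {}

  // Total number of bytes handed out so far, including alignment padding.
  std::size_t used() const { return Offset.load(std::memory_order_relaxed); }

private:
  static constexpr std::size_t Alignment = 16;
  std::uintptr_t Start;
  std::size_t Size;
  std::atomic<std::size_t> Offset;
};
```

Because the only shared state is one atomic counter, concurrent callers each receive a disjoint range without locking; the cost is that memory is only reclaimed when the whole heap is discarded.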

99 Commits