clang-p2996

Author	SHA1	Message	Date
Jon Chesterfield	c02b935a9b	[openmp][nfc] Refactor shared/lds smartstack for spirv (#131905 ) Spirv doesn't have implicit conversions between address spaces (at least at present, we might need to change that) and address space qualified *this pointers are not handled well by clang. This commit changes the single instance of the smartstack to be explicitly a singleton, for fractionally simpler IR generation (no this pointer) and to sidestep the work in progress spirv64-- openmp target not being able to compile the original version.	2025-03-18 20:33:24 +00:00
Joseph Huber	ed9107f2d7	[OpenMP] Replace use of target address space with <gpuintrin.h> local (#126119 ) Summary: This definition is more portable since it defines the correct value for the target. I got rid of the helper mostly because I think it's easy enough to use now that it's a type and being explicit about what's `undef` or `poison` is good.	2025-02-09 10:25:25 -06:00
Joseph Huber	bb7ab2557c	[OpenMP] Port the OpenMP device runtime to direct C++ compilation (#123673 ) Summary: This removes the use of OpenMP offloading to build the device runtime. The main benefit here is that we no longer need to rely on offloading semantics to build a device only runtime. Things like variants are now no longer needed and can just be simple if-defs. In the future, I will remove most of the special handling here and fold it into calls to the `<gpuintrin.h>` functions instead. Additionally I will rework the compilation to make this a separate runtime. The current plan is to have this, but make including OpenMP and offloading either automatically add it, or print a warning if it's missing. This will allow us to use a normal CMake workflow and delete all the weird 'lets pull the clang binary out of the build' business. ``` -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=offload -DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa ``` After that, linking the OpenMP device runtime will be `-Xoffload-linker -lomp`. I.e. no more fat binary business. Only look at the most recent commit since this includes the two dependencies (fix to AMDGPUEmitPrintfBinding and the PointerToMember bug).	2025-02-05 08:18:52 -06:00
Christian Clauss	1f56bb3137	[Offload][NFC] Fix typos discovered by codespell (#125119 ) https://github.com/codespell-project/codespell % `codespell --ignore-words-list=archtype,hsa,identty,inout,iself,nd,te,ths,vertexes --write-changes`	2025-01-31 09:35:29 -06:00
Joseph Huber	760a786d15	[Clang] Prevent `mlink-builtin-bitcode` from internalizing the RPC client (#118661 ) Summary: Currently, we only use `-mlink-builtin-bitcode` for non-LTO NVIDIA compilations. This has the problem that it will internalize the RPC client symbol which needs to be visible to the host. To counteract that, I put `retain` on it, but this also prevents optimizations on the global itself, so the passes we have that remove the symbol don't work on OpenMP anymore. This patch does the dumbest solution, adding a special string check for it in clang. Not the best solution, the runner up would be to have a clang attribute for `externally_initialized` because those can't be internalized, but that might have some unfortunate side-effects. Alternatively we could make NVIDIA compilations do LTO all the time, but that would affect some users and it's harder than I thought.	2025-01-27 19:30:59 -06:00
Joseph Huber	f233a54ae8	[OpenMP] Remove usage of pointer-to-member in lookup (#123671 ) Summary: This is buggy and is currently being tracked in https://github.com/llvm/llvm-project/issues/123241. For now, replace it with a macro so that we can use address spaces directly.	2025-01-21 07:50:40 -06:00
Joseph Huber	3274bf6b42	[OpenMP] Make each atomic helper take an atomic scope argument (#122786 ) Summary: Right now we just default to device for each type, and mix an ad-hoc scope with the one used by the compiler's builtins. Unify this and make each version take the scope optionally. For @ronlieb, this will remove the need for `add_system` in the fork as well as the extra `cas` with system scope, just pass `system`.	2025-01-20 21:58:27 -06:00
Joseph Huber	2d9f406943	[OpenMP] Adjust 'printf' handling in the OpenMP runtime (#123670 ) Summary: We used to avoid a lot of this stuff because we didn't properly handle variadics in device code. That's been solved for now, so we can just make an internal printf handler that forwards to the external `vprintf` function. This is either provided by NVIDIA's SDK or by the GPU libc implementation. The main reason for doing this is because it prevents the stupid AMDGPU printf pass from mangling our beautiful printfs!	2025-01-20 21:56:46 -06:00
Joseph Huber	723a3e746a	[OpenMP] Fix misspelled attribute and warning Summary: This is spelled `ompx_aligned_barrier` when used directly, but wasn't included in the list of known assumptions. Fix that so now the test works.	2025-01-20 08:40:19 -06:00
Joseph Huber	58af82b462	[OpenMP] Remove 'omp assumes' scopes now that we have no inline ASM (#123611 ) Summary: We used this globally scoped `ext_no_call_asm` as a sort of hack around the compiler that allowed the attributor to optimize out inline assembly calls to PTX instructions. Quite some time ago I got rid of every inline assembly call and replaced it with a builtin, so this can just be deleted. Furthermore, I use the `[[omp::assume]]` attribute directly for the aligned barrier usage. This prints an unknown assumption warning (even though it isn't) so I'm just silencing that for now until I fix it later. --------- Co-authored-by: Michael Kruse <github@meinersbur.de>	2025-01-20 08:11:06 -06:00
Joseph Huber	1c00d0d776	[OpenMP] Remove hack around missing atomic load (#122781 ) Summary: We used to do a fetch add of zero to approximate a load. This is because the NVPTX backend didn't handle this properly. It's not an issue anymore so simply use the proper atomic builtin.	2025-01-16 15:17:15 -06:00
Joseph Huber	74d5373f49	[OpenMP] Fix missing type getter for SFINAE helper Summary: This didn't get the type, which made using this always return false.	2025-01-10 19:35:41 -06:00
Joseph Huber	f53cb84df6	[OpenMP] Use __builtin_bit_cast instead of UB type punning (#122325 ) Summary: Use a normal bitcast, remove from the shared utils since it's not available in GCC 7.4	2025-01-09 13:59:21 -06:00
Joseph Huber	b57c0bac81	[OpenMP] Update atomic helpers to just use headers (#122185 ) Summary: Previously we had some indirection here, this patch updates these utilities to just be normal template functions. We use SFINAE to manage the special case handling for floats. Also this strips address spaces so it can be used more generally.	2025-01-09 13:57:39 -06:00
Joseph Huber	34f8573a51	[OpenMP] Use generic IR for the OpenMP DeviceRTL (#119091 ) Summary: We previously built this for every single architecture to deal with incompatibility. This patch updates it to use the 'generic' IR that `libc` and other projects use. Who knows if this will have any side-effects, probably worth testing more but it passes the tests I expect to pass on my side.	2024-12-24 18:05:28 -06:00
Joseph Huber	b0fbddde38	[OpenMP] Only put `retain` for NVPTX so it can be optimized out for AMD Summary: This is a hack that only NVPTX needs.	2024-12-17 15:16:51 -06:00
Joseph Huber	f4ee5a673f	[OpenMP] Replace AMDGPU fences with generic scoped fences (#119619 ) Summary: This is simpler and more common. I would've replaced the CUDA uses and made this the same but currently it doesn't codegen these fences fully and just emits a full system wide barrier as a fallback.	2024-12-12 07:54:51 -06:00
hidekisaito	f2bceb2311	[Offload][AMDGPU] accept generic target (#118919 ) Enables generic ISA, e.g., "--offload-arch=gfx11-generic" device code to run on gfx11-generic ISA capable device. Executable may contain one ELF that has specific target ISA and another ELF that has compatible generic ISA. Under that circumstance, this code should say both ELFs are compatible, leaving the rest to PluginManager to handle. Suggestions on how best to address that are welcome.	2024-12-09 19:11:38 -05:00
Michał Górny	69227a11fe	[offload] Support LIBOMPTARGET_DEVICE_ARCHITECTURES={amdgpu\|nvptx} (#119070 ) Add two more special values for LIBOMPTARGET_DEVICE_ARCHITECTURES: `amdgpu` and `nvptx`, to support building for all AMDGPU and NVPTX targets respectively. This can be used in place of `all` when offload is built with one of the GPU plugins only.	2024-12-07 15:37:28 +00:00
Michał Górny	b54ba5361e	[offload] Add gfx1012 (Navi 14) to AMDGPU models list (#118857 ) Fixes #118824	2024-12-06 03:24:55 +00:00
Jan Patrick Lehr	c7babfa6a3	[Offload] Find libc relative to DeviceRTL path (#118497 ) This was discussed as a potential solution in https://github.com/llvm/llvm-project/pull/118173	2024-12-03 16:37:57 +01:00
Joseph Huber	91f5f974cb	[OpenMP] Unconditionally provide an RPC client interface for OpenMP (#117933 ) Summary: This patch adds an RPC interface that lives directly in the OpenMP device runtime. This allows OpenMP to implement custom opcodes. Currently this is only providing the host call interface, which is the raw version of reverse offloading. Previously this lived in `libc/` as an extension which is not the correct place. The interface here uses a weak symbol for the RPC client by the same name that the `libc` interface uses. This means that it will defer to the libc one if both are present so we don't need to set up multiple instances. The presence of this symbol is what controls whether or not we set up the RPC server. Because this is an external symbol it normally won't be optimized out, so there's a special pass in OpenMPOpt that deletes this symbol if it is unused during linking. That means at `O0` the RPC server will always be present now, but will be removed trivially if it's not used at O1 and higher.	2024-12-02 14:31:51 -06:00
Joseph Huber	506ca19dc9	[OpenMP] Remove use of '__AMDGCN_WAVEFRONT_SIZE' (#113156 ) Summary: This is going to be deprecated in https://github.com/llvm/llvm-project/pull/112849. This patch ports it to use the builtin instead. This isn't a compile constant, so it could slightly negatively affect codegen. There really should be an IR pass to turn it into a constant if the function has known attributes. Using the builtin is correct when we just do it for knowing the size like we do here. Obviously guarding w32/w64 code with this check would be broken.	2024-11-25 07:38:28 -06:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Joseph Huber	e8d2057ca4	[OpenMP] Add critical region lock for NVPTX targets (#110148 ) Summary: We define this on AMDGCN but not NVPTX, which leads to some failures depending on the target.	2024-09-26 11:33:52 -07:00
Joseph Huber	c3ac3fe825	[OpenMP] Fix redefining `stdint.h` types (#108607 ) Summary: We can include `stdint.h` just fine as long as we don't allow it to find system headers, passing `-nostdlibinc` and `-nogpuinc` suppresses these extra paths so we will just use the clang resource headers for `stdint.h` and `stddef.h`.	2024-09-13 13:22:44 -05:00
Johannes Doerfert	08533a3ee8	[Offload][NFC] Reorganize `utils::` and make Device/Host/Shared clearer (#100280 ) We had three `utils::` namespaces, all with different "meaning" (host, device, hsa_utils). We should, when we can, keep "include/Shared" accessible from host and device, thus RefCountTy has been moved to a separate header. `hsa_utils` was introduced to make `utils::` less overloaded. And common functionality was de-duplicated, e.g., `utils::advance` and `utils::advanceVoidPtr` -> `utils::advancePtr`. Type punning now checks for the size of the result to make sure it matches the source type. No functional change was intended.	2024-09-05 13:36:26 -07:00
WÁNG Xuěruì	9adf81182e	[Offload] Fix stray libomptarget message helper calls (#106837 ) In #92581 the `LibomptargetUitls.cmake` helpers have been removed, but only uses of `libomptarget_say` were migrated. Migrate the remaining few warning and error messages so the `check-offload` target would not fail due to missing `libomptarget_warning_say`. While at it, update the `check-offload` unavailability message to say `check-offload` instead of `check-libomptarget`. Fixes #92581	2024-08-31 07:06:41 -05:00
Ethan Luis McDonough	fde2d23ee2	[PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587 ) (#102691 ) This pull request is a revised version of #76587. This pull request fixes some build issues that were present in the previous version of this change. > This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: > > - Adds blank registration functions to device RTL > - Gives PGO globals protected visibility when targeting a supported GPU > - Handles any addrspace casts for PGO calls > - Implements PGO global extraction in GPU plugins (currently only dumps info) > > These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-08-22 01:10:54 -05:00
Joseph Huber	74d23f15b6	[OpenMP] Implement 'omp_alloc' on the device (#102526 ) Summary: The 'omp_alloc' function should be callable from a target region. This patch implements it by simply calling `malloc` for every non-default trait value allocator. All the special access modifiers are unimplemented and return null. The null allocator returns null as the spec states it should not be usable from the target.	2024-08-14 13:38:55 -05:00
Joseph Huber	dbb8b7a0f4	Reapply "[OpenMP][libc] Remove special handling for OpenMP printf (#98940 )" This reverts commit `fea5914c92`.	2024-07-26 17:21:56 -05:00
Joseph Huber	fea5914c92	Revert "[OpenMP][libc] Remove special handling for OpenMP printf (#98940 )" This reverts commit `069e8bcd82`. Summary: Some tests failing, revert this for now.	2024-07-26 16:39:12 -05:00
Joseph Huber	069e8bcd82	[OpenMP][libc] Remove special handling for OpenMP printf (#98940 ) Summary: Currently there are several layers to handle `printf`. Since we now have varargs and an implementation of `printf` this can be heavily simplified. 1. The frontend renames `printf` into `omp_vprintf` and gives it an argument buffer. Removing 1. triggered some code in the AMDGPU backend meant for HIP / OpenCL, so I added an exception to it. 2. Forward this to CUDA vprintf or ignore it. We no longer need special handling for it since we have varargs. So now we just forward this to CUDA vprintf if we have libc, otherwise just leave `printf` as an external function and expect that `libc` will be linked in.	2024-07-26 16:03:36 -05:00
Joseph Huber	7ebd97b852	[OpenMP] Do not define '__assert_fail' if we have the GPU libc (#100409 ) Summary: The C library is intended to provide `__assert_fail`, so in the cases that we have both we should defer to that. This means that if you build the C library for GPUs you'll get the RPC based assert, and if not you'll get the trap based one.	2024-07-26 15:18:10 -05:00
Shilei Tian	41f6599ae1	[NFC][Offload] Move variables to where they are used (#99956 )	2024-07-22 19:52:16 -04:00
Joseph Huber	3c50cbfda4	[DeviceRTL] Make defined 'libc' functions weak in OpenMP (#97356 ) Summary: These functions provide special-case implementations internal to the OpenMP device runtime. This can potentially conflict with the symbols pulled in from the actual GPU `libc`. This patch makes these weak, so in the case that the GPU libc functions exist they will be overridden. This should not impact performance in the average case because the old `-mlink-builtin-bitcode` version does internalization, deleting weak, and the new LTO path will resolve to the strong reference and then internalize it.	2024-07-02 13:23:53 -05:00
Gheorghe-Teodor Bercea	1a478a69bc	[OpenMP][offload] Fix dynamic schedule tracking (#97065 ) This patch fixes the dynamic schedule tracking.	2024-07-01 10:23:11 -04:00
Ethan Luis McDonough	2c8b912f63	Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587 )" This reverts commit `5fd2af38e4`. It caused build issues and broke the buildbot.	2024-06-28 12:30:45 -05:00
Ethan Luis McDonough	5fd2af38e4	[PGO][OpenMP] Instrumentation for GPU devices (#76587 ) This pull request is the first part of an ongoing effort to extend PGO instrumentation to GPU device code. This PR makes the following changes: - Adds blank registration functions to device RTL - Gives PGO globals protected visibility when targeting a supported GPU - Handles any addrspace casts for PGO calls - Implements PGO global extraction in GPU plugins (currently only dumps info) These changes can be tested by supplying `-fprofile-instrument=clang` while targeting a GPU.	2024-06-28 10:42:19 -05:00
Shilei Tian	1ca0055f45	[AMDGPU] Add a new target gfx1152 (#94534 )	2024-06-06 12:16:11 -04:00
Shilei Tian	b448efb8ea	Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311 )" (#94139 )	2024-06-03 11:17:36 -04:00
Shilei Tian	cf9eeb67e5	Revert "Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311 )"" This reverts commit `7b48655822`.	2024-05-26 01:04:39 -04:00
Shilei Tian	7b48655822	Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311 )" This reverts commit `9b31cc71d6`.	2024-05-26 00:57:50 -04:00
Joseph Huber	9b31cc71d6	Revert "[OpenMP][OMPX] Add shfl_down_sync (#93311 )" This reverts commit `098c6dfa81`. This reverts commit `8c718a3a91`. This reverts commit `4fb02de9d4`.	2024-05-24 19:07:53 -05:00
Shilei Tian	4fb02de9d4	[OpenMP][OMPX] Add shfl_down_sync (#93311 )	2024-05-24 14:00:43 -04:00
Shilei Tian	7eeec8e6d1	[OpenMP][OMPX] Add ballot_sync (#91297 ) This patch adds the support for `ballot_sync` in ompx.	2024-05-24 09:54:54 -04:00
Joseph Huber	770d928303	[Offload][NFC] Remove 'libomptarget' message helpers (#92581 ) Summary: This isn't `libomptarget` anymore, and these messages were always unnecessary because no other project uses these prefixed messages. The effect of this is that no longer will the logs have `LIBOMPTARGET --` in front of everything. We have a message stating when we start building the offload project so it'll still be trivial to find.	2024-05-17 13:24:32 -05:00
Joseph Huber	16bb7e89a9	[Offload][NFC] Remove all trailing whitespace from offload/ (#92578 ) Summary: This patch cleans up the trailing whitespace in a bunch of tests and CMake files. Most just in preparation for other cleanups.	2024-05-17 13:15:04 -05:00
Joseph Huber	c4017cda00	[Offload][NFC] Remove header license in CMake files (#92544 ) Summary: No other project has these in the CMake itself, and they're wildly inconsistent even within the project. These don't really add anything so I think they should be removed.	2024-05-17 09:05:03 -05:00
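
Note on f53cb84df6 ("Use __builtin_bit_cast instead of UB type punning"): the commit message does not show the code, so the following is only a minimal sketch of the general technique, not the DeviceRTL's actual utility. The helper name `bitCast` and the `SKETCH_HAS_BUILTIN_BIT_CAST` macro are hypothetical. The sketch reinterprets the bytes of a value as another type of the same size with `__builtin_bit_cast` where the compiler provides it, and otherwise falls back to `std::memcpy`, which is also well-defined. Both avoid the undefined behavior of reading through a `reinterpret_cast`ed pointer of an unrelated type.

```cpp
#include <cstdint>
#include <cstring>

// Detect the builtin without tripping compilers that lack __has_builtin.
// SKETCH_HAS_BUILTIN_BIT_CAST is a hypothetical macro used only in this sketch.
#if defined(__has_builtin)
#if __has_builtin(__builtin_bit_cast)
#define SKETCH_HAS_BUILTIN_BIT_CAST 1
#endif
#endif

// Hypothetical helper (assumption): copy the object representation of `Value`
// into a new object of type `To`. Both types must be trivially copyable.
template <typename To, typename From> To bitCast(const From &Value) {
  static_assert(sizeof(To) == sizeof(From),
                "bit cast requires source and destination of equal size");
#ifdef SKETCH_HAS_BUILTIN_BIT_CAST
  // Compiler builtin: well-defined, no pointer aliasing involved.
  return __builtin_bit_cast(To, Value);
#else
  // Portable fallback: memcpy between objects is also well-defined.
  To Result;
  std::memcpy(&Result, &Value, sizeof(To));
  return Result;
#endif
}

// Example use: view the bits of a float as a 32-bit integer and back.
inline std::uint32_t floatBits(float F) { return bitCast<std::uint32_t>(F); }
inline float bitsToFloat(std::uint32_t B) { return bitCast<float>(B); }
```

The size check mirrors the remark in 08533a3ee8 that type punning should verify the result size matches the source type; here a mismatch is a compile-time error.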

51 Commits