clang-p2996

Author	SHA1	Message	Date
Ross Brunton	637df705e5	[Offload] Add `OFFLOAD_INCLUDE_TESTS` (#143388 ) This is a cmake variable which, if set to `OFF` will disable building of tests. It defaults to the value of `LLVM_INCLUDE_TESTS`.	2025-06-09 10:27:40 -05:00
Callum Fare	835497a4dc	[Offload] Make olMemcpy src parameter const (#143161 )	2025-06-06 10:25:00 -05:00
Ross Brunton	269c29ae67	[Offload] Allow setting null arguments in olLaunchKernel (#141958 )	2025-06-06 07:05:11 -05:00
Joseph Huber	051945304b	[Offload] Fix APU detection for MI300 testing (#143026 ) Summary: We have this check when the target is MI300 but it fails if this environment variable isn't set. Set a default value of '0' if not present so that will be converted to bool false.	2025-06-05 15:31:55 -05:00
Callum Fare	f44df93a9c	[Offload] Explicitly create directories that contain tablegen output (#142817 ) This isn't required when building with Ninja, but with the Makefile generator these directories don't get implicitly created.	2025-06-04 13:46:19 -05:00
Callum Fare	817af2ddf2	[Offload] Fix missing dependencies in Offload API generation (#142776 ) Thanks to @RossBrunton for spotting this. We attempt to clang-format the generated Offload header files, but if clang-format isn't available we just copy the generated files instead. That fallback path was missing the correct dependencies. Fixes #142756	2025-06-04 08:51:50 -05:00
Callum Fare	b78bc35d16	[Offload] Don't check in generated files (#141982 ) Previously we decided to check in files that we generate with tablegen. The justification at the time was that it helped reviewers unfamiliar with `offload-tblgen` see the actual changes to the headers in PRs. After trying it for a while, it's ended up causing some headaches and is also not how tablegen is used elsewhere in LLVM. This changes our use of tablegen to be more conventional. Where possible, files are still clang-formatted, but this is no longer a hard requirement. Because `OffloadErrcodes.inc` is shared with libomptarget it now gets generated in a more appropriate place.	2025-06-03 10:39:04 -05:00
Jan Patrick Lehr	e97f42e931	[OpenMP][Offload] Fix typo in error message (#142589 ) It appears that the spelling was incorrect in those test cases. At least on machines with ROCm version > 6.3. I had no chance to test with ROCm version < 6.2 and would be interested in the result if someone has the chance.	2025-06-03 07:33:45 -05:00
Joseph Huber	eb9ed93fce	[Offload] Optimistically accept SM architectures (#142399 ) Summary: We try to clamp these to ones known to work, but we should probably just optimistically accept these. I'd prefer to update the flag check, but since NVIDIA refuses to publish their ELF format it's too much effort to reverse engineer. Fixes: https://github.com/llvm/llvm-project/issues/138532	2025-06-02 14:32:05 -05:00
Ross Brunton	e83c80340f	[Offload] Split offload unittests into multiple files (#142418 ) Rather than a single `offload.unittests` file, this will produce `device.unittests`, `event.unittests`, etc. This should reduce time spent building tests, and make it easier to manually run a subset of the tests. Note that `check-offload-unit` will still run all the tests.	2025-06-02 11:48:12 -05:00
Joseph Huber	5b8031a7f7	[Offload][AMDGPU] Correctly handle variable implicit argument sizes (#142199 ) Summary: The size of the implicit argument struct can vary depending on optimizations, it is not always the size as listed by the full struct. Additionally, the implicit arguments are always aligned on a pointer boundary. This patch updates the handling to use the correctly aligned offset and only initialize the members if they are contained in the reported size. Additionally, we modify the `alloc` and `free` routines to allow `alloc(0)` and `free(nullptr)` as these are mandated by the C standard and allow us to easily handle cases where the user calls a kernel with no arguments.	2025-06-02 09:35:16 -05:00
Ross Brunton	41e22aa31b	[Offload] Set size correctly in olLaunchKernel cts test (#142398 ) It was previously not scaled by `sizeof(uint32_t)`.	2025-06-02 09:27:09 -05:00
Joseph Huber	b26baf1779	[Offload] Make AMDGPU plugin handle empty allocation properly (#142383 ) Summary: `malloc(0)` and `free(nullptr)` are both defined by the standard but we currently trigger errors and assertions on them. Fix that so this works with empty arguments.	2025-06-02 08:12:20 -05:00
Ross Brunton	7efb79b705	[Offload] Fix Error checking (#141939 ) All errors must be checked - this includes the local variable we were using to increase the lifetime of `Res`. As we were not explicitly checking it, it resulted in an `abort` in debug builds.	2025-05-29 08:17:08 -05:00
Ross Brunton	a1191b4875	[Offload] Fix broken tablegen test after #140879 (#141796 )	2025-05-28 11:30:15 -05:00
Joseph Huber	0ebe5557d9	[Offload] Add specifier for the host type (#141635 ) Summary: We use this special type to indicate a host value, this will be refined later but for now it's used as a stand-in device for transfers and queues. It needs a special kind because it is not a device target as the other ones so we need to differentiate it between a CPU and GPU type. Fixes: https://github.com/llvm/llvm-project/issues/141436	2025-05-28 08:51:14 -05:00
Joseph Huber	a9b64bb318	[Offload] Fix segfault when looking for host device name (#141632 ) Summary: This is done using the generic device info pointer, but no such thing exists for the host device, leading to a segfault. This patch fixes that for now, but in the future we should probably be more careful in general handling the possibility that the handle is null everywhere. Fixes: https://github.com/llvm/llvm-project/issues/141434	2025-05-27 13:43:29 -05:00
Joseph Huber	20f9f1fc02	[Offload][NFCI] Remove coupling to `omp` target for version scripting (#141637 ) Summary: This is a weird dependency on libomp just for testing if version scripts work. We shouldn't need to do this because LLVM already checks for this. I believe this should be available as well in standalone when we call `addLLVM` but I did not test that directly.	2025-05-27 13:43:07 -05:00
Ross Brunton	7e9d708be0	[Offload] Use llvm::Error throughout liboffload internals (#140879 ) This removes the `ol_impl_result_t` helper class, replacing it with `llvm::Error`. In addition, some internal functions that returned `ol_errc_t` now return `llvm::Error` (with a fancy message).	2025-05-27 13:42:56 -05:00
Johannes Doerfert	57a90edacd	[OpenMP][GPU][FIX] Enable generic barriers in single threaded contexts (#140786 ) The generic GPU barrier implementation checked if it was the main thread in generic mode to identify single threaded regions. This doesn't work since inside of a non-active (=sequential) parallel, that thread becomes the main thread of a team, and is not the main thread in generic mode. At least that is the implementation of the APIs today. To identify single threaded regions we now check the team size explicitly. This exposed three other issues; one is, for now, expected and not a bug, the second one is a bug and has a FIXME in the single_threaded_for_barrier_hang_1.c file, and the final one is also benign as described in the end. The non-bug issue comes up if we ever initialize a thread state. Afterwards we will never run any region in parallel. This is a little conservative, but I guess thread states are really bad for performance anyway. The bug comes up if we optimize single_threaded_for_barrier_hang_1 and execute it in Generic-SPMD mode. For some reason we lose all the updates to b. This looks very much like a compiler bug, but could also be another logic issue in the runtime. Needs to be investigated. Issue number 3 comes up if we have nested parallels inside of a target region. The clang SPMD-check logic gets confused, determines SPMD (which is fine) but picks an unreasonable thread count. This is all benign, I think, just weird: ``` #pragma omp target teams #pragma omp parallel num_threads(64) #pragma omp parallel num_threads(10) {} ``` Was launched with 10 threads, not 64.	2025-05-20 19:33:54 -07:00
Ross Brunton	c19a3cb613	[Offload] Make OffloadAPI gtest error messages more readable (#140728 )	2025-05-20 08:50:26 -05:00
Ross Brunton	050892d2f8	[Offload] Use new error code handling mechanism and lower-case messages (#139275 ) [Offload] Use new error code handling mechanism This removes the old ErrorCode-less error method and requires every user to provide a concrete error code. All calls have been updated. In addition, for consistency with error messages elsewhere in LLVM, all messages have been made to start lower case.	2025-05-20 08:50:20 -05:00
Ross Brunton	1532ee6916	[Offload] Add Error Codes to PluginInterface (#138258 ) A new ErrorCode enumeration is present in PluginInterface which can be used when returning an llvm::Error from offload and PluginInterface functions. This enum must be kept up to sync with liboffload's ol_errc_t enum, so both are automatically generated from liboffload's enum definition. Some error codes have also been shuffled around to allow for future work. Note that this patch only adds the machinery; actual error codes will be added in a future patch. ~~Depends on #137339 , please ignore first commit of this MR.~~ This has been merged.	2025-05-19 09:38:34 -05:00
Ethan Luis McDonough	1043810769	[PGO][Offload] Update PGO GPU tests (#132262 )	2025-05-14 17:17:52 -05:00
Dhruva Chakrabarti	f965996cfb	[Offload] Remove unused field IsBareKernel. (#139815 )	2025-05-13 17:35:55 -07:00
agozillon	f687ed9ff7	[Flang][OpenMP] Initial defaultmap implementation (#135226 ) This aims to implement most of the initial arguments for defaultmap aside from firstprivate and none, and some of the more recent OpenMP 6 additions which will come in subsequent updates (with the OpenMP 6 variants needing parsing/semantic support first).	2025-05-12 16:30:43 +02:00
Joseph Huber	d60eeda2e5	[Offload] Do not load images from the same descriptor on the same device (#139147 ) Summary: Right now we generally assume that we have one image per device. The binary descriptor represents a single 'compilation'. This means that each image is going to contain the same code built for different architectures when used through the OpenMP interface. This is problematic when we have cases where the same code will then be loaded multiple times (like with sm_80, sm_89 or the generic GFX ISAs). This patch is the quick and dirty solution, we just prevent this from happening at all. This means we use the first one we find, which might not be overly optimal, but it should be better than the alternative. Note that this does not affect shared library loads as it is per binary descriptor, not per device.	2025-05-09 08:21:40 -05:00
agozillon	b291cfcad4	[Flang][OpenMP] Generate correct present checks for implicit maps of optional allocatables (#138210 ) Currently, we do not generate the appropriate checks to check if an optional allocatable argument is present before accessing relevant components of it, in particular when creating bounds, we must generate a presence check and we must make sure we do not generate/keep a load external to the presence check by utilising the raw address rather than the regular address of the info data structure. Similarly in cases for optional allocatables we must treat them like non-allocatable arguments and generate an intermediate allocation that we can have as a location in memory that we can access later in the lowering without causing segfaults when we perform "mapping" on it, even if the end result is an empty allocatable (basically, we shouldn't explode if someone tries to map a non-present optional, similar to C++ when mapping null data).	2025-05-09 13:57:45 +02:00
Joseph Huber	dbe070eb3e	[Offload] Fix PowerPC builds that pass -mcpu (#138327 ) Summary: Another hacky fix done until https://github.com/llvm/llvm-project/pull/136729 lands. This time for `-mcpu`.	2025-05-06 14:14:16 -05:00
Joseph Huber	dfcb8cb2a9	[OpenMP] Add pre sm_70 load hack back in (#138589 ) Summary: Different ordering modes aren't supported for an atomic load, so we just do an add of zero as the same thing. It's less efficient, but it works. Fixes https://github.com/llvm/llvm-project/issues/138560	2025-05-05 16:33:41 -05:00
Ye Luo	dcb43307ce	[Offload] Fix dependency issue #126143 in CMake	2025-05-05 00:38:48 -05:00
Michał Górny	d1e38eab95	[offload] Fix enabling unittests in standalone builds (#138418 ) Modify the unittest logic in offload to only look for `third-party/unittest` directory when `llvm_gtest` is not provided by LLVM itself (in-tree or installed). This makes it possible to run unittests in sparse checkouts without the `third-party/unittest` tree. While at it, also make sure `LLVM_THIRD_PARTY_DIR` is actually set while performing standalone builds. The logic is copied from `compiler-rt`. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-05-03 20:06:14 +02:00
Ross Brunton	f6ac5276ee	[Offload] Ensure all `llvm::Error`s are handled (#137339 ) `llvm::Error`s containing errors must be explicitly handled or an assert will be raised. With this change, `ol_impl_result_t` can accept and consume an `llvm::Error` for errors raised by PluginInterface that have multiple causes and other places now call `llvm::consumeError`. Note that there is currently no facility for PluginInterface to communicate exact error codes, but the constructor is designed in such a way that it can be easily added later. This MR is to convert a crash into an error code. A new test was added, however due to the aforementioned issue with error codes, it does not pass and instead is marked as a skip.	2025-05-02 07:37:19 -05:00
Joseph Huber	a60984ec8d	[Offload] Add 'Maintainers.md' file for offload (#138177 ) Summary: The offload project lacks a maintainers file. Adding it with myself and Johannes as the still active maintainers.	2025-05-01 14:06:33 -05:00
Ross Brunton	49941749a8	[offload] Don't print device path during configure (#138109 )	2025-05-01 07:44:26 -05:00
Callum Fare	7bc16a0f63	[Offload] Adding missing Offload unit tests for event entry points (#137315 ) A couple of liboffload entry points were missed out from the tests, and unsurprisingly a crash in one of them made it in. Add the tests and fix the unchecked error in `olDestroyEvent`.	2025-04-30 09:06:00 -05:00
Callum Fare	6022a5214b	[Offload] Add check-offload-unit for liboffload unittests (#137312 ) Adds a `check-offload-unit` target for running the liboffload unit test suite. This unit test binary runs the tests for every available device. This can optionally be filtered to devices from a single platform, but the check target runs on everything. The target is not part of `check-offload` and does not get propagated to the top level build. I'm not sure if either of these things are desirable, but I'm happy to look into it if we want. Also remove the `offload/unittests/Plugins` test as it's dead code and doesn't build.	2025-04-29 11:21:59 -05:00
Joseph Huber	346792aafb	[Offload] Override linker for device build (#137246 ) Summary: Override the default linker in case the user is passing it separately. This requires `lld` but it always did. This will be fixed properly when https://github.com/llvm/llvm-project/pull/136729 lands. Fixes https://github.com/llvm/llvm-project/issues/136822	2025-04-25 17:22:07 +02:00
Joseph Huber	6d0d50f0ac	[OpenMP] Update the bitcode library install and search path (#136754 ) Summary: This was accidentally kept in the old location when we moved to the new `lib/<triple>/` location for the DeviceRTL. Move this to reduce the delta with https://github.com/llvm/llvm-project/pull/136729.	2025-04-23 08:20:15 -05:00
Joseph Huber	92bba68634	[Offload] Fix handling of 'bare' mode when environment missing (#136794 ) Summary: We treated the missing kernel environment as a unique mode, but it was kind of this random bool that was doing the same thing and it explicitly expects the kernel environment to be zero. It broke after the previous change since it used to default to SPMD and didn't handle zero in any of the other cases despite being used. This fixes that and queries for it without needing to consume an error.	2025-04-23 08:16:39 -05:00
Callum Fare	800d949bb3	[Offload] Implement the remaining initial Offload API (#122106 ) Implement the complete initial version of the Offload API, to the extent that is usable for simple offloading programs. Tested with a basic SYCL program. As far as possible, these are simple wrappers over existing functionality in the plugins. * Allocating and freeing memory (host, device, shared). * Creating a program * Creating a queue (wrapper over asynchronous stream resource) * Enqueuing memcpy operations * Enqueuing kernel executions * Waiting on (optional) output events from the enqueue operations * Waiting on a queue to finish Objects created with the API have reference counting semantics to handle their lifetime. They are created with an initial reference count of 1, which can be incremented and decremented with retain and release functions. They are freed when their reference count reaches 0. Platform and device objects are not reference counted, as they are expected to persist as long as the library is in use, and it's not meaningful for users to create or destroy them. Tests have been added to `offload.unittests`, including device code for testing program and kernel related functionality. The API should still be considered unstable and it's very likely we will need to change the existing entry points.	2025-04-22 13:27:50 -05:00
Joseph Huber	56bf0e7202	[OpenMP] Remove dependency on LLVM include directory from DeviceRTL (#136359 ) Summary: Currently we depend on a single LLVM include directory. This is actually only required to define one enum, which is highly unlikely to change. This patch makes the `Environment.h` include directory more hermetic so we no longer depend on other libraries. In exchange, we get a simpler dependency list for the price of hard-coding `1` somewhere. I think it's a valid trade considering that this flag is highly unlikely to change at this point. @ronlieb AMD version https://gist.github.com/jhuber6/3313e6f957be14dc79fe85e5126d2cb3	2025-04-21 15:21:47 -05:00
Michał Górny	ac8fc09688	[offload] Unset `-march` when building GPU libraries (#136442 ) Unset `-march` when invoking the compiler and linker to build the GPU libraries. These libraries use GPU targets rather than the CPU targets, and an incidental `-march=native` causes Clang to be able to determine the GPU used — which causes the build to fail when there is no GPU available. Resetting `-march=` should suffice to revert to building generic code for the time being. See the discussion in: https://github.com/llvm/llvm-project/pull/126143#issuecomment-2816718492	2025-04-20 04:16:19 +00:00
Joseph Huber	5eabececb0	[Offload] Fix JIT test	2025-04-18 12:01:04 -05:00
Joseph Huber	6c5f50f186	[Offload] Fix typo on `-Xoffload-linker`	2025-04-18 10:47:45 -05:00
Joseph Huber	db0f754c5a	[OpenMP] Remove 'libomptarget.devicertl.a' fatbinary and use static library (#126143 ) Summary: Currently, we build a single `libomptarget.devicertl.a` which is a fatbinary. It is a host object file that contains the embedded archive files for both the NVIDIA and AMDGPU targets. This was done primarily as a convenience due to naming conflicts. Now that the clang driver for the GPU targets can appropriately link via the per-target runtime-dir, we can just make two separate static libraries and remove the indirection. This patch creates two new static libraries that get installed into ``` /lib/amdgcn-amd-amdhsa/libomp.a /lib/nvptx64-nvidia-cuda/libomp.a ``` for AMDGPU and NVPTX respectively. The link job created by the linker wrapper now simply needs to do `-lomp` and it will search those directories and link those static libraries. This requires far less special handling. This patch is a precursor to changing the build system entirely to be a runtimes based one. Soon this target will be a standard `add_library` and done through the GPU runtime targets. NOTE that this actually does remove an additional optimization step. Previously we merged all of the files into a single bitcode object and forcibly internalized some definitions. This, instead, just treats them like a normal static library. This may possibly affect performance for some files, but I think it's better overall to use static library semantics because it allows us to have an 'include-what-you-use' relationship with the library. Performance testing will be required. If we really need the merged blob then we can simply pack that into a new static library.	2025-04-18 07:43:31 -05:00
agozillon	b2c9a58b8f	[Flang][OpenMP][MLIR] Check for presence of Box type before emitting store in MapInfoFinalization pass (#135477 ) Currently we don't check for the presence of descriptor/BoxTypes before emitting stores which lower to memcpys, the issue with this is that users can have optional arguments, where they don't provide an input, making the argument effectively null. This can still be mapped and this causes issues at the moment as we'll emit a memcpy for function arguments to store to a local variable for certain edge cases, when we perform this memcpy on a null input, we cause a segfault at runtime. The fix to this is to simply create a branch around the store that checks if the data we're copying from is actually present. If it is, we proceed with the store, if it isn't we skip it.	2025-04-14 17:15:56 +02:00
Joseph Huber	2f41fa387d	[AMDGPU] Fix code object version not being set to 'none' (#135036 ) Summary: Previously, we removed the special handling for the code object version global. I erroneously thought that this meant we could get rid of this weird `-Xclang` option. However, this also emits an LLVM IR module flag, which will then cause linking issues.	2025-04-10 11:31:21 -05:00
Zequan Wu	78b21ddba7	Revert "Reland "Symbolize line zero as if no source info is available (#124846 )" (#133798 )" This reverts commit `3483740289` because #128619 doesn't handle the case when we have an empty frame from `getInliningInfoForAddress` because line num is 0 which makes it non-differentiable from missing debug info. So, we end up using the base filename from symtab again. Reverting for now until that issue is solved.	2025-04-09 18:09:31 -07:00
Joel E. Denny	5709506de0	[offload] Fix finding amdgpu/nvptx-arch to generate tests (#135072 ) PR #134713, which landed as `79cb6f05da`, causes this on my test systems: ``` -- Building AMDGPU plugin for dlopened libhsa -- Not generating AMDGPU tests, no supported devices detected. Use 'LIBOMPTARGET_FORCE_AMDGPU_TESTS' to override. -- Building CUDA plugin for dlopened libcuda -- Not generating NVIDIA tests, no supported devices detected. Use 'LIBOMPTARGET_FORCE_NVIDIA_TESTS' to override. ``` The problem is it cannot locate amdgpu-arch and nvptx-arch. This patch enables it to. I suspect there is more cleanup to do here. amdgpu-arch and nvptx-arch do not appear to exist as cmake targets anymore, but there is still cmake code here that looks for those targets.	2025-04-09 15:54:29 -04:00

298 Commits