clang-p2996

Author	SHA1	Message	Date
agozillon	f687ed9ff7	[Flang][OpenMP] Initial defaultmap implementation (#135226 ) This aims to implement most of the initial arguments for defaultmap aside from firstprivate and none, and some of the more recent OpenMP 6 additions which will come in subsequent updates (with the OpenMP 6 variants needing parsing/semantic support first).	2025-05-12 16:30:43 +02:00
Joseph Huber	d60eeda2e5	[Offload] Do not load images from the same descriptor on the same device (#139147 ) Summary: Right now we generally assume that we have one image per device. The binary descriptor represents a single 'compilation'. This means that each image is going to contain the same code built for different architectures when used through the OpenMP interface. This is problematic when we have cases where the same code will then be loaded multiple times (like wiht sm_80, sm_89 or the generic GFX ISAs). This patch is the quick and dirty slution, we just prevent this from happening at all. This means we use the first one we find, which might not be overly optimal, but it should be better than the alternative. Note that this does not affect shared library loads as it is per binary descriptor, not per device.	2025-05-09 08:21:40 -05:00
agozillon	b291cfcad4	[Flang][OpenMP] Generate correct present checks for implicit maps of optional allocatables (#138210 ) Currently, we do not generate the appropriate checks to check if an optional allocatable argument is present before accessing relevant components of it, in particular when creating bounds, we must generate a presence check and we must make sure we do not generate/keep an load external to the presence check by utilising the raw address rather than the regular address of the info data structure. Similarly in cases for optional allocatables we must treat them like non-allocatable arguments and generate an intermediate allocation that we can have as a location in memory that we can access later in the lowering without causing segfaults when we perform "mapping" on it, even if the end result is an empty allocatable (basically, we shouldn't explode if someone tries to map a non-present optional, similar to C++ when mapping null data).	2025-05-09 13:57:45 +02:00
Joseph Huber	dbe070eb3e	[Offload] Fix PowerPC builds that pass -mcpu (#138327 ) Summary: Another hacky fix done until https://github.com/llvm/llvm-project/pull/136729 lands. This time for `-mcpu`.	2025-05-06 14:14:16 -05:00
Joseph Huber	dfcb8cb2a9	[OpenMP] Add pre sm_70 load hack back in (#138589 ) Summary: Different ordering modes aren't supported for an atomic load, so we just do an add of zero as the same thing. It's less efficient, but it works. Fixes https://github.com/llvm/llvm-project/issues/138560	2025-05-05 16:33:41 -05:00
Ye Luo	dcb43307ce	[Offload] Fix dependency issue #126143 in CMake	2025-05-05 00:38:48 -05:00
Michał Górny	d1e38eab95	[offload] Fix enabling unittests in standalone builds (#138418 ) Modify the unittest logic in offload to only look for `third-party/unittest` directory when `llvm_gtest` is not provided by LLVM itself (in-tree or installed). This makes it possible to run unittests in sparse checkouts without the `third-party/unittest` tree. While at it, also make sure `LLVM_THIRD_PARTY_DIR` is actually set while performing standalone builds. The logic is copied from `compiler-rt`. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-05-03 20:06:14 +02:00
Ross Brunton	f6ac5276ee	[Offload] Ensure all `llvm::Error`s are handled (#137339 ) `llvm::Error`s containing errors must be explicitly handled or an assert will be raised. With this change, `ol_impl_result_t` can accept and consume an `llvm::Error` for errors raised by PluginInterface that have multiple causes and other places now call `llvm::consumeError`. Note that there is currently no facility for PluginInterface to communicate exact error codes, but the constructor is designed in such a way that it can be easily added later. This MR is to convert a crash into an error code. A new test was added, however due to the aforementioned issue with error codes, it does not pass and instead is marked as a skip.	2025-05-02 07:37:19 -05:00
Joseph Huber	a60984ec8d	[Offload] Add 'Maintainers.md' file for offload (#138177 ) Summary: The offload project lacks a maintainers file. Adding it with myself and Johannes as the still active maintainers.	2025-05-01 14:06:33 -05:00
Ross Brunton	49941749a8	[offload] Don't print device path during configure (#138109 )	2025-05-01 07:44:26 -05:00
Callum Fare	7bc16a0f63	[Offload] Adding missing Offload unit tests for event entry points (#137315 ) A couple of liboffload entry points were missed out from the tests, and unsurprisingly a crash in one of them made it in. Add the tests and fix the unchecked error in `olDestroyEvent`.	2025-04-30 09:06:00 -05:00
Callum Fare	6022a5214b	[Offload] Add check-offload-unit for liboffload unittests (#137312 ) Adds a `check-offload-unit` target for running the liboffload unit test suite. This unit test binary runs the tests for every available device. This can optionally filtered to devices from a single platform, but the check target runs on everything. The target is not part of `check-offload` and does not get propagated to the top level build. I'm not sure if either of these things are desirable, but I'm happy to look into it if we want. Also remove the `offload/unittests/Plugins` test as it's dead code and doesn't build.	2025-04-29 11:21:59 -05:00
Joseph Huber	346792aafb	[Offload] Override linker for device build (#137246 ) Summary: Override the default linker in case the user is passing it separately. This requires `lld` but it always did. This will be fixed properly when https://github.com/llvm/llvm-project/pull/136729 lands. Fixes https://github.com/llvm/llvm-project/issues/136822	2025-04-25 17:22:07 +02:00
Joseph Huber	6d0d50f0ac	[OpenMP] Update the bitcode library install and search path (#136754 ) Summary: This was accidentally kept in the old location when we moved to the new `lib/<triple>/` location for the DeviceRTL. Move this to reduce the delta with https://github.com/llvm/llvm-project/pull/136729.	2025-04-23 08:20:15 -05:00
Joseph Huber	92bba68634	[Offload] Fix handling of 'bare' mode when environment missing (#136794 ) Summary: We treated the missing kernel environment as a unique mode, but it was kind of this random bool that was doing the same thing and it explicitly expects the kernel environment to be zero. It broke after the previous change since it used to default to SPMD and didn't handle zero in any of the other cases despite being used. This fixes that and queries for it without needing to consume an error.	2025-04-23 08:16:39 -05:00
Callum Fare	800d949bb3	[Offload] Implement the remaining initial Offload API (#122106 ) Implement the complete initial version of the Offload API, to the extent that is usable for simple offloading programs. Tested with a basic SYCL program. As far as possible, these are simple wrappers over existing functionality in the plugins. * Allocating and freeing memory (host, device, shared). * Creating a program * Creating a queue (wrapper over asynchronous stream resource) * Enqueuing memcpy operations * Enqueuing kernel executions * Waiting on (optional) output events from the enqueue operations * Waiting on a queue to finish Objects created with the API have reference counting semantics to handle their lifetime. They are created with an initial reference count of 1, which can be incremented and decremented with retain and release functions. They are freed when their reference count reaches 0. Platform and device objects are not reference counted, as they are expected to persist as long as the library is in use, and it's not meaningful for users to create or destroy them. Tests have been added to `offload.unittests`, including device code for testing program and kernel related functionality. The API should still be considered unstable and it's very likely we will need to change the existing entry points.	2025-04-22 13:27:50 -05:00
Joseph Huber	56bf0e7202	[OpenMP] Remove dependency on LLVM include directory from DeviceRTL (#136359 ) Summary: Currently we depend on a single LLVM include directory. This is actually only required to define one enum, which is highly unlikely to change. THis patch makes the `Environment.h` include directory more hermetic so we no long depend on other libraries. In exchange, we get a simpler dependency list for the price of hard-coding `1` somewhere. I think it's a valid trade considering that this flag is highly unlikely to change at this point. @ronlieb AMD version https://gist.github.com/jhuber6/3313e6f957be14dc79fe85e5126d2cb3	2025-04-21 15:21:47 -05:00
Michał Górny	ac8fc09688	[offload] Unset `-march` when building GPU libraries (#136442 ) Unset `-march` when invoking the compiler and linker to build the GPU libraries. These libraries use GPU targets rather than the CPU targets, and an incidental `-march=native` causes Clang to be able to determine the GPU used — which causes the build to fail when there is no GPU available. Resetting `-march=` should suffice to revert to building generic code for the time being. See the discussion in: https://github.com/llvm/llvm-project/pull/126143#issuecomment-2816718492	2025-04-20 04:16:19 +00:00
Joseph Huber	5eabececb0	[Offload] Fix JIT test	2025-04-18 12:01:04 -05:00
Joseph Huber	6c5f50f186	[Offload] Fix typo on `-Xoffload-linker`	2025-04-18 10:47:45 -05:00
Joseph Huber	db0f754c5a	[OpenMP] Remove 'libomptarget.devicertl.a' fatbinary and use static library (#126143 ) Summary: Currently, we build a single `libomptarget.devicertl.a` which is a fatbinary. It is a host object file that contains the embedded archive files for both the NVIDIA and AMDGPU targets. This was done primarily as a convenience due to naming conflicts. Now that the clang driver for the GPU targets can appropriate link via the per-target runtime-dir, we can just make two separate static libraries and remove the indirection. This patch creates two new static libraries that get installed into ``` /lib/amdgcn-amd-amdhsa/libomp.a /lib/nvptx64-nvidia-cuda/libomp.a ``` for AMDGPU and NVPTX respectively. The link job created by the linker wrapper now simply needs to do `-lomp` and it will search those directories and link those static libraries. This requires far less special handling. This patch is a precursor to changing the build system entirely to be a runtimes based one. Soon this target will be a standard `add_library` and done through the GPU runtime targets. NOTE that this actually does remove an additional optimization step. Previously we merged all of the files into a single bitcode object and forcibly internalized some definitions. This, instead, just treats them like a normal static library. This may possibly affect performance for some files, but I think it's better overall to use static library semantics because it allows us to have an 'include-what-you-use' relationship with the library. Performance testing will be required. If we really need the merged blob then we can simply pack that into a new static library.	2025-04-18 07:43:31 -05:00
agozillon	b2c9a58b8f	[Flang][OpenMP][MLIR] Check for presence of Box type before emitting store in MapInfoFinalization pass (#135477 ) Currently we don't check for the presence of descriptor/BoxTypes before emitting stores which lower to memcpys, the issue with this is that users can have optional arguments, where they don't provide an input, making the argument effectively null. This can still be mapped and this causes issues at the moment as we'll emit a memcpy for function arguments to store to a local variable for certain edge cases, when we perform this memcpy on a null input, we cause a segfault at runtime. The fix to this is to simply create a branch around the store that checks if the data we're copying from is actually present. If it is, we proceed with the store, if it isn't we skip it.	2025-04-14 17:15:56 +02:00
Joseph Huber	2f41fa387d	[AMDGPU] Fix code object version not being set to 'none' (#135036 ) Summary: Previously, we removed the special handling for the code object version global. I erroneously thought that this meant we cold get rid of this weird `-Xclang` option. However, this also emits an LLVM IR module flag, which will then cause linking issues.	2025-04-10 11:31:21 -05:00
Zequan Wu	78b21ddba7	Revert "Reland "Symbolize line zero as if no source info is available (#124846 )" (#133798 )" This reverts commit `3483740289` because #128619 doesn't handle the case when we have an empty frame from `getInliningInfoForAddress` because line num is 0 which makes it non-differentiable from missing debug info. So, we end up using the base filename from symtab again. Reverting for now until that issus is solved.	2025-04-09 18:09:31 -07:00
Joel E. Denny	5709506de0	[offload] Fix finding amdgpu/nvptx-arch to generate tests (#135072 ) PR #134713, which landed as `79cb6f05da`, causes this on my test systems: ``` -- Building AMDGPU plugin for dlopened libhsa -- Not generating AMDGPU tests, no supported devices detected. Use 'LIBOMPTARGET_FORCE_AMDGPU_TESTS' to override. -- Building CUDA plugin for dlopened libcuda -- Not generating NVIDIA tests, no supported devices detected. Use 'LIBOMPTARGET_FORCE_NVIDIA_TESTS' to override. ``` The problem is it cannot locate amdgpu-arch and nvptx-arch. This patch enables it to. I suspect there is more cleanup to do here. amdgpu-arch and nvptx-arch do not appear to exist as cmake targets anymore, but there is still cmake code here that looks for those targets.	2025-04-09 15:54:29 -04:00
Joel E. Denny	ad9f6d3cee	[PGO][Offload] Use %profdata in PGO tests (#135015 ) So that the wrong llvm-profdata is not picked up from PATH.	2025-04-09 10:40:46 -04:00
Jan Leyonberg	fbc8335311	[MLIR][OpenMP] Add codegen for teams reductions (#133310 ) This patch adds the lowering of teams reductions from the omp dialect to LLVM-IR. Some minor cleanup was done in clang to remove an unused parameter.	2025-04-07 12:47:16 -04:00
Sergio Afonso	66fca0674d	[OpenMP] Fix num_iters in __kmpc__loop DeviceRTL functions (#133435 ) This patch removes the addition of 1 to the number of iterations when calling the following DeviceRTL functions: - `__kmpc_distribute_for_static_loop` - `__kmpc_distribute_static_loop` - `__kmpc_for_static_loop` Calls to these functions are currently only produced by the OMPIRBuilder from flang, which already passes the correct number of iterations to these functions. By adding 1 to the received `num_iters` variable, worksharing can produce incorrect results. This impacts flang OpenMP offloading of `do`, `distribute` and `distribute parallel do` constructs. Expecting the application to pass `tripcount - 1` as the argument seems unexpected as well, so rather than updating flang I think it makes more sense to update the runtime.	2025-04-01 10:29:08 +01:00
Zequan Wu	3483740289	Reland "Symbolize line zero as if no source info is available (#124846 )" (#133798 ) This land commits `23aca2f88d` and `1b15a89a23`. https://github.com/llvm/llvm-project/pull/128619 makes symbolizer to always use debug info when available so we can reland this chagnge.	2025-03-31 19:13:46 -04:00
Alex	021a1f6974	[OFFLOAD] Stricter enforcement of user offload disable (#133470 ) If user specifies offload is disabled (e.g., OMP_TARGET_OFFLOAD=disable), disable library almost completely. This reduces resources spent to a minimum and ensures all APIs behave as if the only available device is the host device. Currently some of the APIs behave as if there were devices avaible for offload even when under OMP_TARGET_OFFLOAD=disable. --------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2025-03-28 17:28:14 -05:00
Ethan Luis McDonough	0c81105373	[PGO][Offload] Disable PGO on NVPTX (#133522 )	2025-03-28 16:32:32 -05:00
Joseph Huber	772173f548	[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870 ) Summary: When we were first porting to COV5, this lead to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in https://github.com/llvm/llvm-project/pull/131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.	2025-03-28 07:35:16 -05:00
macurtis-amd	21a8c63cdc	[offload] Remove bad assert in StaticLoopChunker::Distribute (#132705 ) When building with asserts enabled, this can actually cause strange miscompilations because an incorrect llvm.assume is generated at the point of the assertion.	2025-03-28 04:53:00 -05:00
Joseph Huber	75f810e025	[Offload] Guard HSA implicit arguments if they aren't created (#133073 ) Summary: We conditionally allocate the implicit arguments, so they possibly are null. The flang compiler seems to hit this case, even though it shouldn't when it's supposed to conform to the HSA code object. For now guard this to fix the regression and cover a case in the future where someone rolls a fully custom implementatation. Fixes: https://github.com/llvm/llvm-project/issues/132982	2025-03-26 08:54:33 -05:00
Joseph Huber	25bf4e262c	[Offload] Remove handling for COV4 binaries from offload/ (#131033 ) Summary: We moved from cov4 to cov5 a long time ago, and it guards simplifying some front end code, so we should be able to move up with this.	2025-03-24 18:58:20 -05:00
Ethan Luis McDonough	c50d39f073	[PGO][Offload] Allow PGO flags to be used on GPU targets (#94268 ) This pull request is the third part of an ongoing effort to extends PGO instrumentation to GPU device code and depends on https://github.com/llvm/llvm-project/pull/93365. This PR makes the following changes: - Allows PGO flags to be supplied to GPU targets - Pulls version global from device - Modifies `__llvm_write_custom_profile` and `lprofWriteDataImpl` to allow the PGO version to be overridden	2025-03-19 19:01:38 -05:00
Joseph Huber	cb493d2bab	[OpenMP] Replace utilities with 'gpuintrin.h' definitions (#131644 ) Summary: Port more instructions. AMD version is at https://gist.github.com/jhuber6/235d7ee95f747c75f9a3cfd8eedac6aa	2025-03-19 10:47:21 -05:00
Jon Chesterfield	deb0f3c09b	[openmp][nfc] Use builtin align in the devicertl (#131918 ) Noticed while extracting the smartstack as a test case	2025-03-18 21:31:49 +00:00
Jon Chesterfield	395bdebebd	Revert "[openmp][nfc] Refactor shared/lds smartstack for spirv (#131905 )" This reverts commit `c02b935a9b`. Failed a check-offload test under CI	2025-03-18 20:43:05 +00:00
Joseph Huber	206f78dfec	[OpenMP] Use 'gpuintrin.h' definitions for simple block identifiers (#131631 ) Summary: This patch ports the runtime to use `gpuintrin.h` instead of calling the builtins for most things. The `lanemask_gt` stuff was left for now with a fallback. AMD version for Ron https://gist.github.com/jhuber6/42014d635b9a8158727640876bf47226.	2025-03-18 15:38:46 -05:00
Jon Chesterfield	c02b935a9b	[openmp][nfc] Refactor shared/lds smartstack for spirv (#131905 ) Spirv doesn't have implicit conversions between address spaces (at least at present, we might need to change that) and address space qualified *this pointers are not handled well by clang. This commit changes the single instance of the smartstack to be explicitly a singleton, for fractionally simpler IR generation (no this pointer) and to sidestep the work in progress spirv64-- openmp target not being able to compile the original version.	2025-03-18 20:33:24 +00:00
Joseph Huber	8437b7f558	[libc] Make RPC server handling header only (#131205 ) Summary: This patch moves the RPC server handling to be a header only utility stored in the `shared/` directory. This is intended to be shared within LLVM for the loaders and `offload/` handling. Generally, this makes it easier to share code without weird cross-project binaries being plucked out of the build system. It also allows us to soon move the loader interface out of the `libc` project so that we don't need to bootstrap those and can build them in LLVM.	2025-03-13 19:23:21 -05:00
Michael Kruse	d3255474be	Reapply "[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-" (#130274 ) Enable the LLVM_ENABLE_RUNTIMES=flang-rt build of the Fortran runtime for the amdgpu-offload- buildbots. This pre-population cmake cache files is referred to by the llvm-zorg annotated builder factory [script](`872f477610/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py (L26)`). The corresponding change in llvm-zorg is llvm/llvm-zorg#402 This reverts commit `e296fb8ff6`. The worker of amdgpu-offload-rhel-8-cmake-build-only has been updated with a newer version of Ninja that supports Fortran.	2025-03-13 13:21:36 +01:00
Krzysztof Parzyszek	f4fc2d731c	[flang][OpenMP] Map ByRef if size/alignment exceed that of a pointer (#130832 ) Improve the check for whether a type can be passed by copy. Currently, passing by copy is done via the OMP_MAP_LITERAL mapping, which can only transfer as much data as can be contained in a pointer representation.	2025-03-12 19:41:11 -05:00
Nikita Popov	f137c3d592	[TargetRegistry] Accept Triple in createTargetMachine() (NFC) (#130940 ) This avoids doing a Triple -> std::string -> Triple round trip in lots of places, now that the Module stores a Triple.	2025-03-12 17:35:09 +01:00
Krzysztof Parzyszek	d67947162f	[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568 ) The HAS_DEVICE_ADDR indicates that the object(s) listed exists at an address that is a valid device address. Specifically, `has_device_addr(x)` means that (in C/C++ terms) `&x` is a device address. When entering a target region, `x` does not need to be allocated on the device, or have its contents copied over (in the absence of additional mapping clauses). Passing its address verbatim to the region for use is sufficient, and is the intended goal of the clause. Some Fortran objects use descriptors in their in-memory representation. If `x` had a descriptor, both the descriptor and the contents of `x` would be located in the device memory. However, the descriptors are managed by the compiler, and can be regenerated at various points as needed. The address of the effective descriptor may change, hence it's not safe to pass the address of the descriptor to the target region. Instead, the descriptor itself is always copied, but for objects like `x`, no further mapping takes place (as this keeps the storage pointer in the descriptor unchanged). --------- Co-authored-by: Sergio Afonso <safonsof@amd.com>	2025-03-10 08:11:01 -05:00
agozillon	f1178815d2	[Flang][OpenMP][MLIR] Implement close, present and ompx_hold modifiers for Flang maps (#129586 ) This PR adds an initial implementation for the map modifiers close, present and ompx_hold, primarily just required adding the appropriate map type flags to the map type bits. In the case of ompx_hold it required adding the map type to the OpenMP dialect. Close has a bit of a problem when utilised with the ALWAYS map type on descriptors, so it is likely we'll have to make sure close and always are not applied to the descriptor simultaneously in the future when we apply always to the descriptors to facilitate movement of descriptor information to device for consistency, however, we may find an alternative to this with further investigation. For the moment, it is a TODO/Note to keep track of it.	2025-03-07 22:22:30 +01:00
Michael Kruse	e296fb8ff6	Revert "[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-*" (#130274 ) Reverts llvm/llvm-project#129692 The builder amdgpu-offload-rhel-8-cmake-build-only fails because its version of Ninja is too old. At least Ninja 1.10 is required for its support for dependencies between Fortran modules. https://lab.llvm.org/buildbot/#/builders/204/builds/2696	2025-03-07 12:30:18 +01:00
Michael Kruse	68578b38cf	[Offload][AMDGPU] LLVM_ENABLE_RUNTIMES=flang-rt for amdgpu-offload-* (#129692 ) Enable the LLVM_ENABLE_RUNTIMES=flang-rt build of the Fortran runtime for the amdgpu-offload-* buildbots. This pre-population cmake cache files is referred to by the llvm-zorg annotated builder factory [script](`872f477610/zorg/buildbot/builders/annotated/amdgpu-offload-cmake.py (L26)`). The corresponding change in llvm-zorg is https://github.com/llvm/llvm-zorg/pull/402	2025-03-07 11:56:00 +01:00
Nikita Popov	4f469ae046	[offload] Fix build after Module::getTargetTriple() change Adjust for #129868.	2025-03-06 11:04:00 +01:00

1 2 3 4 5 ...

273 Commits