Commit Graph

327 Commits

Author SHA1 Message Date
Ross Brunton
4f02965ae2 [Offload] Store kernel name in GenericKernelTy (#142799)
GenericKernelTy has a pointer to the name that was used to create it.
However, the name passed in as an argument may not outlive the kernel.
Instead, GenericKernelTy now contains a std::string, and copies the
name into there.
2025-07-02 14:11:05 +01:00
Callum Fare
acb52a8a98 [Offload] Improve liboffload documentation (#142403)
- Update the main README to reflect the current project status
- Rework the main API generation documentation. General fixes/tidying,
but also spell out explicitly how to make API changes at the top of the
document since this is what most people will care about.

---------

Co-authored-by: Martin Grant <martingrant@outlook.com>
2025-07-02 13:52:27 +01:00
Kewen12
2b16af8df2 [Offload][cmake] Add GPU test job limit for AMDGPU buildbot cmake cache (#146611)
Added GPU test job limit to make it consistent with current config
https://github.com/llvm/llvm-zorg/blob/main/buildbot/osuosl/master/config/builders.py#L2027C31-L2027C77
2025-07-01 19:18:28 -05:00
Joseph Huber
3cff3d882b [Offload] Add skeleton for offload conformance tests (#146391)
Summary:
This adds a basic outline for adding 'conformance' tests. These are
tests that are intended to check device code against a standard. In this
case, we will expect this to be filled with math conformance tests to
make sure their results are within the ULP requirements we demand.

Right now this just *assumes* the GPU libc is there, meaning you'll
likely need to do a manual `ninja` before doing `ninja -C
runtimes/runtimes-bins offload.conformance`.
2025-07-01 10:20:40 -05:00
Callum Fare
1a253e213d [NFC][Offload] Fix possible edge cases in offload-tblgen (#146511)
Fix a couple of unhandled edge cases in offload-tblgen that were found
by static analysis
* `LineStart` may wrap around to 0 when processing multi-line strings.
The value is not actually being used in that case, but still better to
explicitly handle it
* Possible unchecked nullptr when processing parameter flags
2025-07-01 14:09:49 +01:00
Ye Luo
536ba87726 [libomptarget] Add a test for OMP_TARGET_OFFLOAD=disabled (#146385)
closes https://github.com/llvm/llvm-project/issues/144786
2025-06-30 13:29:36 -05:00
Ross Brunton
67e73ba605 [Offload] Refactor device/platform info queries (#146345)
This makes several small changes to how the platform and device info
queries are handled:
* ReturnHelper has been replaced with InfoWriter which is more explicit
  in how it is invoked.
* InfoWriter consumes `llvm::Expected` rather than values directly, and
  will early exit if it returns an error.
* As a result of the above, `GetInfoString` now correctly returns errors
  rather than empty strings.
* The host device now has its own dedicated "getInfo" function rather
  than being checked in multiple places.
2025-06-30 15:00:43 +01:00
Ross Brunton
003145d0c8 [Offload] Implement olShutDown (#144055)
`olShutDown` was not properly calling deinit on the platforms, resulting
in random segfaults on AMD devices.

As part of this, `olInit` and `olShutDown` now alloc and free the
offload context rather than it being static. This
allows `olShutDown` to be called within a destructor of a static object
(like the tests do) without having to worry about destructor ordering.
2025-06-30 12:14:00 +01:00
Julian Brown
b62b58d1bb [OpenMP] Fix crash with duplicate mapping on target directive (#146136)
OpenMP allows duplicate mappings, i.e. in OpenMP 6.0, 7.9.6 "map
Clause":

  Two list items of the map clauses on the same construct must not share
  original storage unless one of the following is true: they are the same
  list item [or other omitted reasons]"

Duplicate mappings can arise as a result of user-defined mapper
processing (which I think is a separate bug, and is not addressed here),
but also in straightforward cases such as:

  #pragma omp target map(tofrom: s.mem[0:10]) map(tofrom: s.mem[0:10])

Both these cases cause crashes at runtime at present, due to an
unfortunate interaction between reference counting behaviour and shadow
pointer handling for blocks. This is what happens:

  1.  The member "s.mem" is copied to the target
  2.  A shadow pointer is created, modifying the pointer on the target
  3.  The member "s.mem" is copied to the target again
  4. The previous shadow pointer metadata is still present, so the runtime doesn't modify the target pointer a second time.

The fix is to disable step 3 if we've already done step 2 for a given
block that has the "is new" flag set.
2025-06-29 22:41:24 +01:00
Ross Brunton
39f19f2f1f [Offload] Store device info tree in device handle (#145913)
Rather than creating a new device info tree for each call to
`olGetDeviceInfo`, we instead do it on device initialisation. As well
as improving performance, this fixes a few lifetime issues with returned
strings.

This does unfortunately mean that device information is immutable,
but hopefully that shouldn't be a problem for any queries we want to
implement.

This also meant allowing offload initialization to fail, which it can
now do.
2025-06-27 15:10:43 +01:00
Ross Brunton
102cf1b999 [Offload] Make CUDA Driver Version a string (#146049)
AMD treats this value as a string, so for consistency require this in
NVIDIA as well. This shouldn't change the output of the
`llvm-offload-device-info` tool, but does fix an issue in liboffload
when it tries to query the version.
2025-06-27 15:07:04 +01:00
Joseph Huber
df5097dd94 [Offload] Add default for HSA agent type to silence warning (#145943)
Summary:
There's a new one called the AIE (AI Engine). We could handle this, but
since we don't use it currently I'm just making it future-proof. Adding
the AIE check would require checking the HSA version which isn't
worthwhile just yet.
2025-06-26 14:46:08 -05:00
Ross Brunton
3e337bc308 [Offload] Add a stub unloadBinaryImpl for host device (#145716) 2025-06-25 17:06:17 +01:00
Ross Brunton
0870c8838b [Offload] Add an unloadBinary interface to PluginInterface (#143873)
This allows removal of a specific Image from a Device, rather than
requiring all image data to outlive the device they were created for.

This is required for `ol_program_handle_t`s, which now specify the
lifetime of the buffer used to create the program.
2025-06-25 14:53:18 +01:00
Ross Brunton
4359e55838 [Offload] Properly report errors when jit compiling (#145498)
Previously, if a binary failed to load due to failures when jit
compiling, the function would return success with nullptr. Now it
returns a new plugin error, `COMPILE_FAILURE`.
2025-06-24 16:27:12 +01:00
Ross Brunton
4785832144 [Offload] Fix cmake warning (#145488)
Cmake was unhappy that there was no space between arguments, now it
is.
2025-06-24 13:42:03 +01:00
Ross Brunton
02d2a1646a [Offload] Fix entry_points.td test (#145292)
This was broken as part of #144494 , and just needs an update to the
check lines.
2025-06-23 11:09:08 +01:00
Ross Brunton
613c38a992 [Offload] Fix type mismatch warning in test (#143700) 2025-06-23 10:14:12 +01:00
Joseph Huber
3f1de197b1 [Offload] Rework compiling device code for unit test suites (#144776)
Summary:
I'll probably want to use this as a more generic utility in the future.
This patch reworks it to make it a top level function. I also tried to
decouple this from the OpenMP utilities to make that easier in the
future. Instead, I just use `-march=native` functionality which is the
same thing. Needed a small hack to skip the linker stage for checking if
that works.

This should still create the same output as far as I'm aware.
2025-06-20 10:31:54 -05:00
Ross Brunton
f242360e15 [Offload] Add type information to device info nodes (#144535)
Rather than being "stringly typed", store values as a std::variant that
can hold various types. This means that liboffload doesn't have to do
any string parsing for integer/bool device info keys.
2025-06-20 09:05:05 -05:00
Ross Brunton
e0633d59b9 [Offload] Check for initialization (#144370)
All entry points (except olInit) now check that offload has been
initialized. If not, a new `OL_ERRC_UNINITIALIZED` error is returned.
2025-06-20 09:04:50 -05:00
Ross Brunton
53336ad488 [Offload] Move (most) global state to an OffloadContext struct (#144494)
Rather than having a number of static local variables, we now use
a single `OffloadContext` struct to store global state. This is
initialised by `olInit`, but is never deleted (de-initialization of
Offload isn't yet implemented).

The error reporting mechanism has not been moved to the struct, since
that's going to cause issues with teardown (error messages must outlive
liboffload).
2025-06-19 16:02:03 -05:00
Jan Patrick Lehr
dd65e6e060 [Offload][libc] Add cmake cache AMDGPU buildbot (#144500)
An upcoming libc4GPU buildbot will be using this CMake cache file for
its build configuration.
2025-06-17 20:51:40 +02:00
Ross Brunton
e6a3579653 [Offload] Replace device info queue with a tree (#144050)
Previously, device info was returned as a queue with each element having
a "Level" field indicating its nesting level. This replaces this queue
with a more traditional tree-like structure.

This should not result in a change to the output of
`llvm-offload-device-info`.
2025-06-13 09:22:47 -05:00
Ethan Luis McDonough
daee5eee85 [Offload][PGO] Fix new GPU PGO tests (#143645)
`pgo_atomic_teams.c` and `pgo_atomic_threads.c` currently are set to run
on NVPTX despite the changes for that target not being upstreamed yet.
This patch also replaces instances of `llvm-profdata` with `%profdata`
in those tests.
2025-06-12 11:14:21 -05:00
Ross Brunton
4f60321ca1 [Offload] Add ol_dimensions_t and convert ranges from size_t -> uint32_t (#143901)
This is a three element x, y, z size_t vector that can be used any place
where a 3D vector is required. This ensures that all vectors across
liboffload are the same and don't require any resizing/reordering
dances.
2025-06-12 09:59:59 -05:00
Abhinav Gaba
02b6849cf1 [Clang][OpenMP] Fix mapping of arrays of structs with members with mappers (#142511)
This builds upon #101101 from @jyu2-git, which used compiler-generated
mappers when mapping an array-section of structs with members that have
user-defined default mappers.

Now we do the same when mapping arrays of structs.
2025-06-11 19:03:55 +00:00
Kewen12
bbe59e19b6 [OpenMP][Offload] Update the Logic for Configuring Auto Zero-Copy (#143638)
Summary:

Currently the Auto Zero-Copy is enabled by checking every initialized
device to ensure that no dGPU is attached to an APU. However, an APU is
designed to comprise a homogeneous set of GPUs, therefore, it should be
sufficient to check any device for configuring Auto Zero-Copy. In this
PR, it checks the first initialized device in the list.

The changes in this PR are to clearly reflect the design and logic of
enabling the feature for further improving the readibility.
2025-06-11 14:12:54 -04:00
Ethan Luis McDonough
67ff66e677 [PGO][Offload] Fix offload coverage mapping (#143490)
This pull request fixes coverage mapping on GPU targets. 

- It adds an address space cast to the coverage mapping generation pass.
- It reads the profiled function names from the ELF directly. Reading it
from public globals was causing issues in cases where multiple
device-code object files are linked together.
2025-06-10 20:19:38 -05:00
Ross Brunton
637df705e5 [Offload] Add OFFLOAD_INCLUDE_TESTS (#143388)
This is a cmake variable which, if set to `OFF` will disable building of
tests. It defaults to the value of `LLVM_INCLUDE_TESTS`.
2025-06-09 10:27:40 -05:00
Callum Fare
835497a4dc [Offload] Make olMemcpy src parameter const (#143161) 2025-06-06 10:25:00 -05:00
Ross Brunton
269c29ae67 [Offload] Allow setting null arguments in olLaunchKernel (#141958) 2025-06-06 07:05:11 -05:00
Joseph Huber
051945304b [Offload] Fix APU detection for MI300 testing (#143026)
Summary:
We have this check when the target is MI300 but it fails if this
environment variable isn't set. Set a default value of '0' if not
present so that will be converted to bool false.
2025-06-05 15:31:55 -05:00
Callum Fare
f44df93a9c [Offload] Explicitly create directories that contain tablegen output (#142817)
This isn't required when building with Ninja, but with the Makefile
generator these directories don't get implicitly created.
2025-06-04 13:46:19 -05:00
Callum Fare
817af2ddf2 [Offload] Fix missing dependencies in Offload API generation (#142776)
Thanks to @RossBrunton for spotting this.

We attempt to clang-format the generated Offload header files, but if
clang-format isn't available we just copy the generated files instead.
That fallback path was missing the correct dependencies.

Fixes #142756
2025-06-04 08:51:50 -05:00
Callum Fare
b78bc35d16 [Offload] Don't check in generated files (#141982)
Previously we decided to check in files that we generate with tablegen.
The justification at the time was that it helped reviewers unfamiliar
with `offload-tblgen` see the actual changes to the headers in PRs.
After trying it for a while, it's ended up causing some headaches and is
also not how tablegen is used elsewhere in LLVM.

This changes our use of tablegen to be more conventional. Where
possible, files are still clang-formatted, but this is no longer a hard
requirement. Because `OffloadErrcodes.inc` is shared with libomptarget
it now gets generated in a more appropriate place.
2025-06-03 10:39:04 -05:00
Jan Patrick Lehr
e97f42e931 [OpenMP][Offload] Fix typo in error message (#142589)
It appears that the spelling was incorrect in those test cases. At least
on machines with ROCm version > 6.3.

I had no chance to test with ROCm version version < 6.2 and would be
interested in the result if someone has the chance.
2025-06-03 07:33:45 -05:00
Joseph Huber
eb9ed93fce [Offload] Optimistically accept SM architectures (#142399)
Summary:
We try to clamp these to ones known to work, but we should probably just
optimistically accept these. I'd prefer to update the flag check, but
since NVIDIA refuses to publish their ELF format it's too much effort to
reverse engineer.

Fixes: https://github.com/llvm/llvm-project/issues/138532
2025-06-02 14:32:05 -05:00
Ross Brunton
e83c80340f [Offload] Split offload unittests into multiple files (#142418)
Rather than a single `offload.unittests` file, this will produce
`device.unittests`, `event.unittests`, etc.. This should reduce time
spent building tests, and make it easier to manually run a subset of
the tests.

Note that `check-offload-unit` will still run all the tests.
2025-06-02 11:48:12 -05:00
Joseph Huber
5b8031a7f7 [Offload][AMDGPU] Correctly handle variable implicit argument sizes (#142199)
Summary:
The size of the implicit argument struct can vary depending on
optimizations, it is not always the size as listed by the full struct.
Additionally, the implicit arguments are always aligned on a pointer
boundary. This patch updates the handling to use the correctly aligned
offset and only initialize the members if they are contained in the
reported size.

Additionally, we modify the `alloc` and `free` routines to allow
`alloc(0)` and `free(nullptr)` as these are mandated by the C standard
and allow us to easily handle cases where the user calls a kernel with
no arguments.
2025-06-02 09:35:16 -05:00
Ross Brunton
41e22aa31b [Offload] Set size correctly in olLaunchKernel cts test (#142398)
It was previously not scaled by `sizeof(uint32_t)`.
2025-06-02 09:27:09 -05:00
Joseph Huber
b26baf1779 [Offload] Make AMDGPU plugin handle empty allocation properly (#142383)
Summary:
`malloc(0)` and `free(nullptr)` are both defined by the standard but we
current trigger erros and assertions on them. Fix that so this works
with empty arguments.
2025-06-02 08:12:20 -05:00
Ross Brunton
7efb79b705 [Offload] Fix Error checking (#141939)
All errors must be checked - this includes the local variable we were
using to increase the lifetime of `Res`. As we were not explicitly
checking it, it resulted in an `abort` in debug builds.
2025-05-29 08:17:08 -05:00
Ross Brunton
a1191b4875 [Offload] Fix broken tablegen test after #140879 (#141796) 2025-05-28 11:30:15 -05:00
Joseph Huber
0ebe5557d9 [Offload] Add specifier for the host type (#141635)
Summary:
We use this sepcial type to indicate a host value, this will be refined
later but for now it's used as a stand-in device for transfers and
queues. It needs a special kind because it is not a device target as the
other ones so we need to differentiate it between a CPU and GPU type.

Fixes: https://github.com/llvm/llvm-project/issues/141436
2025-05-28 08:51:14 -05:00
Joseph Huber
a9b64bb318 [Offload] Fix segfault when looking for host device name (#141632)
Summary:
This is done using the generic device into pointe, but no such thing
exists for the host device, leading to a segfault. This patch fixes that
for now, but in the future we should probably be more careful in general
handling the possibility that the handle is null everywhere.

Fixes: https://github.com/llvm/llvm-project/issues/141434
2025-05-27 13:43:29 -05:00
Joseph Huber
20f9f1fc02 [Offload][NFCI] Remove coupling to omp target for version scripting (#141637)
Summary:
This is a weird dependency on libomp just for testing if version scripts
work. We shouldn't need to do  this because LLVM already checks for
this. I believe this should be available as well in standalone when we
call `addLLVM` but I did not test that directly.
2025-05-27 13:43:07 -05:00
Ross Brunton
7e9d708be0 [Offload] Use llvm::Error throughout liboffload internals (#140879)
This removes the `ol_impl_result_t` helper class, replacing it with
`llvm::Error`. In addition, some internal functions that returned
`ol_errc_t` now return `llvm::Error` (with a fancy message).
2025-05-27 13:42:56 -05:00
Johannes Doerfert
57a90edacd [OpenMP][GPU][FIX] Enable generic barriers in single threaded contexts (#140786)
The generic GPU barrier implementation checked if it was the main thread
in generic mode to identify single threaded regions. This doesn't work
since inside of a non-active (=sequential) parallel, that thread becomes
the main thread of a team, and is not the main thread in generic mode.
At least that is the implementation of the APIs today.

To identify single threaded regions we now check the team size
explicitly.

This exposed three other issues; one is, for now, expected and not a
bug, the second one is a bug and has a FIXME in the
single_threaded_for_barrier_hang_1.c file, and the final one is also
benign as described in the end.

The non-bug issue comes up if we ever initialize a thread state.
Afterwards we will never run any region in parallel. This is a little
conservative, but I guess thread states are really bad for performance
anyway.

The bug comes up if we optimize single_threaded_for_barrier_hang_1 and
execute it in Generic-SPMD mode. For some reason we loose all the
updates to b. This looks very much like a compiler bug, but could also
be another logic issue in the runtime. Needs to be investigated.

Issue number 3 comes up if we have nested parallels inside of a target
region. The clang SPMD-check logic gets confused, determines SPMD (which
is fine) but picks an unreasonable thread count. This is all benign, I
think, just weird:

```
  #pragma omp target teams
  #pragma omp parallel num_threads(64)
  #pragma omp parallel num_threads(10)
  {}
```
Was launched with 10 threads, not 64.
2025-05-20 19:33:54 -07:00
Ross Brunton
c19a3cb613 [Offload] Make OffloadAPI gtest error messages more readable (#140728) 2025-05-20 08:50:26 -05:00