Summary:
I'll probably want to use this as a more generic utility in the future.
This patch reworks it to make it a top level function. I also tried to
decouple this from the OpenMP utilities to make that easier in the
future. Instead, I just use `-march=native` functionality which is the
same thing. Needed a small hack to skip the linker stage for checking if
that works.
This should still create the same output as far as I'm aware.
Rather than being "stringly typed", store values as a std::variant that
can hold various types. This means that liboffload doesn't have to do
any string parsing for integer/bool device info keys.
Rather than having a number of static local variables, we now use
a single `OffloadContext` struct to store global state. This is
initialised by `olInit`, but is never deleted (de-initialization of
Offload isn't yet implemented).
The error reporting mechanism has not been moved to the struct, since
that's going to cause issues with teardown (error messages must outlive
liboffload).
Previously, device info was returned as a queue with each element having
a "Level" field indicating its nesting level. This replaces this queue
with a more traditional tree-like structure.
This should not result in a change to the output of
`llvm-offload-device-info`.
`pgo_atomic_teams.c` and `pgo_atomic_threads.c` currently are set to run
on NVPTX despite the changes for that target not being upstreamed yet.
This patch also replaces instances of `llvm-profdata` with `%profdata`
in those tests.
This is a three element x, y, z size_t vector that can be used any place
where a 3D vector is required. This ensures that all vectors across
liboffload are the same and don't require any resizing/reordering
dances.
This builds upon #101101 from @jyu2-git, which used compiler-generated
mappers when mapping an array-section of structs with members that have
user-defined default mappers.
Now we do the same when mapping arrays of structs.
Summary:
Currently the Auto Zero-Copy is enabled by checking every initialized
device to ensure that no dGPU is attached to an APU. However, an APU is
designed to comprise a homogeneous set of GPUs, therefore, it should be
sufficient to check any device for configuring Auto Zero-Copy. In this
PR, it checks the first initialized device in the list.
The changes in this PR are to clearly reflect the design and logic of
enabling the feature for further improving the readibility.
This pull request fixes coverage mapping on GPU targets.
- It adds an address space cast to the coverage mapping generation pass.
- It reads the profiled function names from the ELF directly. Reading it
from public globals was causing issues in cases where multiple
device-code object files are linked together.
Summary:
We have this check when the target is MI300 but it fails if this
environment variable isn't set. Set a default value of '0' if not
present so that will be converted to bool false.
Thanks to @RossBrunton for spotting this.
We attempt to clang-format the generated Offload header files, but if
clang-format isn't available we just copy the generated files instead.
That fallback path was missing the correct dependencies.
Fixes#142756
Previously we decided to check in files that we generate with tablegen.
The justification at the time was that it helped reviewers unfamiliar
with `offload-tblgen` see the actual changes to the headers in PRs.
After trying it for a while, it's ended up causing some headaches and is
also not how tablegen is used elsewhere in LLVM.
This changes our use of tablegen to be more conventional. Where
possible, files are still clang-formatted, but this is no longer a hard
requirement. Because `OffloadErrcodes.inc` is shared with libomptarget
it now gets generated in a more appropriate place.
It appears that the spelling was incorrect in those test cases. At least
on machines with ROCm version > 6.3.
I had no chance to test with ROCm version version < 6.2 and would be
interested in the result if someone has the chance.
Summary:
We try to clamp these to ones known to work, but we should probably just
optimistically accept these. I'd prefer to update the flag check, but
since NVIDIA refuses to publish their ELF format it's too much effort to
reverse engineer.
Fixes: https://github.com/llvm/llvm-project/issues/138532
Rather than a single `offload.unittests` file, this will produce
`device.unittests`, `event.unittests`, etc.. This should reduce time
spent building tests, and make it easier to manually run a subset of
the tests.
Note that `check-offload-unit` will still run all the tests.
Summary:
The size of the implicit argument struct can vary depending on
optimizations, it is not always the size as listed by the full struct.
Additionally, the implicit arguments are always aligned on a pointer
boundary. This patch updates the handling to use the correctly aligned
offset and only initialize the members if they are contained in the
reported size.
Additionally, we modify the `alloc` and `free` routines to allow
`alloc(0)` and `free(nullptr)` as these are mandated by the C standard
and allow us to easily handle cases where the user calls a kernel with
no arguments.
Summary:
`malloc(0)` and `free(nullptr)` are both defined by the standard but we
current trigger erros and assertions on them. Fix that so this works
with empty arguments.
All errors must be checked - this includes the local variable we were
using to increase the lifetime of `Res`. As we were not explicitly
checking it, it resulted in an `abort` in debug builds.
Summary:
We use this sepcial type to indicate a host value, this will be refined
later but for now it's used as a stand-in device for transfers and
queues. It needs a special kind because it is not a device target as the
other ones so we need to differentiate it between a CPU and GPU type.
Fixes: https://github.com/llvm/llvm-project/issues/141436
Summary:
This is done using the generic device into pointe, but no such thing
exists for the host device, leading to a segfault. This patch fixes that
for now, but in the future we should probably be more careful in general
handling the possibility that the handle is null everywhere.
Fixes: https://github.com/llvm/llvm-project/issues/141434
Summary:
This is a weird dependency on libomp just for testing if version scripts
work. We shouldn't need to do this because LLVM already checks for
this. I believe this should be available as well in standalone when we
call `addLLVM` but I did not test that directly.
This removes the `ol_impl_result_t` helper class, replacing it with
`llvm::Error`. In addition, some internal functions that returned
`ol_errc_t` now return `llvm::Error` (with a fancy message).
The generic GPU barrier implementation checked if it was the main thread
in generic mode to identify single threaded regions. This doesn't work
since inside of a non-active (=sequential) parallel, that thread becomes
the main thread of a team, and is not the main thread in generic mode.
At least that is the implementation of the APIs today.
To identify single threaded regions we now check the team size
explicitly.
This exposed three other issues; one is, for now, expected and not a
bug, the second one is a bug and has a FIXME in the
single_threaded_for_barrier_hang_1.c file, and the final one is also
benign as described in the end.
The non-bug issue comes up if we ever initialize a thread state.
Afterwards we will never run any region in parallel. This is a little
conservative, but I guess thread states are really bad for performance
anyway.
The bug comes up if we optimize single_threaded_for_barrier_hang_1 and
execute it in Generic-SPMD mode. For some reason we loose all the
updates to b. This looks very much like a compiler bug, but could also
be another logic issue in the runtime. Needs to be investigated.
Issue number 3 comes up if we have nested parallels inside of a target
region. The clang SPMD-check logic gets confused, determines SPMD (which
is fine) but picks an unreasonable thread count. This is all benign, I
think, just weird:
```
#pragma omp target teams
#pragma omp parallel num_threads(64)
#pragma omp parallel num_threads(10)
{}
```
Was launched with 10 threads, not 64.
[Offload] Use new error code handling mechanism
This removes the old ErrorCode-less error method and requires
every user to provide a concrete error code. All calls have been
updated.
In addition, for consistency with error messages elsewhere in LLVM, all
messages have been made to start lower case.
A new ErrorCode enumeration is present in PluginInterface which can
be used when returning an llvm::Error from offload and PluginInterface
functions.
This enum must be kept up to sync with liboffload's ol_errc_t enum, so
both are automatically generated from liboffload's enum definition.
Some error codes have also been shuffled around to allow for future
work. Note that this patch only adds the machinery; actual error codes
will be added in a future patch.
~~Depends on #137339 , please ignore first commit of this MR.~~ This has
been merged.
This aims to implement most of the initial arguments for defaultmap
aside from firstprivate and none, and some of the more recent OpenMP 6
additions which will come in subsequent updates (with the OpenMP 6
variants needing parsing/semantic support first).
Summary:
Right now we generally assume that we have one image per device. The
binary descriptor represents a single 'compilation'. This means that
each image is going to contain the same code built for different
architectures when used through the OpenMP interface. This is
problematic when we have cases where the same code will then be loaded
multiple times (like wiht sm_80, sm_89 or the generic GFX ISAs). This
patch is the quick and dirty slution, we just prevent this from
happening at all. This means we use the first one we find, which might
not be overly optimal, but it should be better than the alternative.
Note that this does not affect shared library loads as it is per binary
descriptor, not per device.
Currently, we do not generate the appropriate checks to check if an
optional
allocatable argument is present before accessing relevant components of
it,
in particular when creating bounds, we must generate a presence check
and we
must make sure we do not generate/keep an load external to the presence
check
by utilising the raw address rather than the regular address of the info
data structure.
Similarly in cases for optional allocatables we must treat them like
non-allocatable
arguments and generate an intermediate allocation that we can have as a
location
in memory that we can access later in the lowering without causing
segfaults when
we perform "mapping" on it, even if the end result is an empty
allocatable
(basically, we shouldn't explode if someone tries to map a non-present
optional,
similar to C++ when mapping null data).
Summary:
Different ordering modes aren't supported for an atomic load, so we just
do an add of zero as the same thing. It's less efficient, but it works.
Fixes https://github.com/llvm/llvm-project/issues/138560
Modify the unittest logic in offload to only look for
`third-party/unittest` directory when `llvm_gtest` is not provided by
LLVM itself (in-tree or installed). This makes it possible to run
unittests in sparse checkouts without the `third-party/unittest` tree.
While at it, also make sure `LLVM_THIRD_PARTY_DIR` is actually set while
performing standalone builds. The logic is copied from `compiler-rt`.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
`llvm::Error`s containing errors must be explicitly handled or an assert
will be raised. With this change, `ol_impl_result_t` can accept and
consume an `llvm::Error` for errors raised by PluginInterface that have
multiple causes and other places now call `llvm::consumeError`.
Note that there is currently no facility for PluginInterface to
communicate exact error codes, but the constructor is designed in
such a way that it can be easily added later. This MR is to
convert a crash into an error code.
A new test was added, however due to the aforementioned issue with
error codes, it does not pass and instead is marked as a skip.
A couple of liboffload entry points were missed out from the tests, and
unsurprisingly a crash in one of them made it in. Add the tests and fix
the unchecked error in `olDestroyEvent`.
Adds a `check-offload-unit` target for running the liboffload unit test
suite. This unit test binary runs the tests for every available device.
This can optionally filtered to devices from a single platform, but the
check target runs on everything.
The target is not part of `check-offload` and does not get propagated to
the top level build. I'm not sure if either of these things are
desirable, but I'm happy to look into it if we want.
Also remove the `offload/unittests/Plugins` test as it's dead code and
doesn't build.