Commit Graph

174 Commits

Author SHA1 Message Date
Joseph Huber
4213f4a9ae [Libomptarget] Fix resizing the buffer of RPC handles
Summary:
The previous code would potentially make it smaller if a device with a
lower ID touched it later. Also we should minimize changes to the state
for multi threaded reasons. This just sets up an owned slot for each at
initialization time.
2024-04-01 07:29:57 -05:00
Joseph Huber
a1a8bb1d3a [libc] Change RPC interface to not use device ids (#87087)
Summary:
The current implementation of RPC tied everything to device IDs and
forced us to do init / shutdown to manage some global state. This turned
out to be a bad idea in situations where we want to track multiple
hetergeneous devices that may report the same device ID in the same
process.

This patch changes the interface to instead create an opaque handle to
the internal device and simply allocates it via `new`. The user will
then take this device and store it to interface with the attached
device. This interface puts the burden of tracking the device identifier
to mapped d evices onto the user, but in return heavily simplifies the
implementation.
2024-03-29 12:49:16 -05:00
Joseph Huber
ed68aac9f2 [Libomptarget] Move API implementations into GenericPluginTy (#86683)
Summary:
The plan is to remove the entire plugin interface and simply use the
`GenericPluginTy` inside of `libomptarget` by statically linking against
it. This means that inside of `libomptarget` we will simply do
`Plugin.data_alloc` without the dynamically loaded interface. To reduce
the amount of code required, this patch simply moves all of the RTL
implementation functions inside of the Generic device. Now the
`__tgt_rtl_` interface is simply a shallow wrapper that will soon go
away. There is some redundancy here, this will be improved later. For
now what is important is minimizing the changes to the API.
2024-03-27 14:10:54 -05:00
Joseph Huber
4dc3225248 [Libomptarget] Replace global PluginTy::get interface with references (#86595)
Summary:
We have a plugin singleton that implements the Plugin interface. This
then spawns separate device and kernels. Previously when these needed to
reach into the global singleton they would use the `PluginTy::get`
routine to get access to it. In the future we will move away from this
as the lifetime of the plugin will be handled by `libomptarget`
directly. This patch removes uses of this inside of the plugin
implementaion themselves by simply keeping a reference to the plugin
inside of the device.

The external `__tgt_rtl` functions still use the global method, but will
be removed later.
2024-03-26 07:13:59 -05:00
Joseph Huber
2cad43c1ba [Libomptarget] Factor functions out of 'Plugin' interface (#86528)
Summary:
This patch factors common functions out of the `Plugin` interface prior
to its removal in a future patch. This simply temporarily renames it to
`PluginTy` so that we could re-use `Plugin::check` internally as this
needs to be defined statically per plugin now. We can refactor this
later.

The future patch will delete `PluginTy` and `PluginTy::get` entirely.
This simply tries to minimize a few changes to make it easier to land.
2024-03-25 15:24:39 -05:00
Joseph Huber
9f0321ccf1 [Libomptarget] Make plugins depend explicitly on intrinsics_gen
Summary:
It's possible for the OpenMP offloading plugins to be build before
tablegen is run despite the fact that we rely on it. Simply make it
depend on it currently.
2024-03-24 15:24:35 -05:00
Joseph Huber
909ea28ac6 [Libomptarget] Specificall add LLVM include dirs in plugins 2024-03-24 14:29:39 -05:00
Joseph Huber
dcbddc2525 [Libomptarget] Unify and simplify plugin CMake (#86191)
Summary:
This patch reworks the CMake handling for building plugins. All this
does is pull a lot of shared and common logic into a single helper
function.
This also simplifies the OMPT libraries from being built separately
instead of just added.
2024-03-22 16:13:58 -05:00
dhruvachak
b5d02bbd0d [OpenMP] Increment kernel args version, used by runtime for detecting dyn_ptr. (#85363)
A kernel implicit parameter (dyn_ptr) was introduced some time back.
This patch increments the kernel args version for a compiler supporting
dyn_ptr. The version will be used by the runtime to determine whether
the implicit parameter is generated by the compiler. The versioning is
required to support use cases where code generated by an older compiler
is linked with a newer runtime.

If approved, this patch should be backported to release 18.
2024-03-19 16:40:22 -07:00
Joseph Huber
470040bd4d [Libomptarget][NFC] Remove warning on return value const 2024-03-15 18:50:33 -05:00
Ulrich Weigand
2210c85a66 Reapply [libomptarget] Support BE ELF files in plugins-nextgen (#85246)
Code in plugins-nextgen reading ELF files is currently hard-coded to
assume a 64-bit little-endian ELF format. Unfortunately, this assumption
is even embedded in the interface between GlobalHandler and Utils/ELF
routines, which use ELF64LE types.

To fix this, I've refactored the interface to use generic types, in
particular by using (a unique_ptr to) ObjectFile instead of
ELF64LEObjectFile, and ELFSymbolRef instead of ELF64LE::Sym.

This allows properly templating over multiple ELF format variants inside
Utils/ELF; specifically, this patch adds support for 64-bit big-endian
ELF files in addition to 64-bit little-endian files.
2024-03-15 18:28:28 +01:00
Ulrich Weigand
4c8714efc5 Revert "[libomptarget] Support BE ELF files in plugins-nextgen (#85246)"
This reverts commit 611c62b30d.
2024-03-14 18:38:13 +01:00
Ulrich Weigand
611c62b30d [libomptarget] Support BE ELF files in plugins-nextgen (#85246)
Code in plugins-nextgen reading ELF files is currently hard-coded to
assume a 64-bit little-endian ELF format. Unfortunately, this assumption
is even embedded in the interface between GlobalHandler and Utils/ELF
routines, which use ELF64LE types.

To fix this, I've refactored the interface to use generic types, in
particular by using (a unique_ptr to) ObjectFile instead of
ELF64LEObjectFile, and ELFSymbolRef instead of ELF64LE::Sym.

This allows properly templating over multiple ELF format variants inside
Utils/ELF; specifically, this patch adds support for 64-bit big-endian
ELF files in addition to 64-bit little-endian files.
2024-03-14 18:19:12 +01:00
Joseph Huber
8a79003307 [libc] Move RPC opcodes include out of the header
Summary:
This header isn't strictly necessary, and is currently broken because we
install these to separate locations.
2024-03-10 14:07:47 -05:00
Ulrich Weigand
fb7cc73975 Revert "[libomptarget] Support BE ELF files in plugins-nextgen (#83976)"
This reverts commit 15b7b3182c.
2024-03-06 21:37:45 +01:00
Ulrich Weigand
15b7b3182c [libomptarget] Support BE ELF files in plugins-nextgen (#83976)
Code in plugins-nextgen reading ELF files is currently hard-coded to
assume a 64-bit little-endian ELF format. Unfortunately, this assumption
is even embedded in the interface between GlobalHandler and Utils/ELF
routines, which use ELF64LE types.

To fix this, I've refactored the interface to push all ELF specific
types into Utils/ELF. Specifically, this patch removes both the
getSymbol and getSymbolAddress routines and replaces them with a
single findSymbolInImage, which gets a StringRef identifying the
raw object file image as input, and returns a StringRef covering
the data addressed by the symbol (address and size) if found, or
std::nullopt otherwise.

This allows properly templating over multiple ELF format variants inside
Utils/ELF; specifically, this patch adds support for 64-bit big-endian
ELF files in addition to 64-bit little-endian files.
2024-03-06 20:49:12 +01:00
Daniel Martinez
aa6ebf9be1 Replace some C headers with C++ ones (#82697)
#81434

Replaced some C headers with C++ ones

Co-authored-by: Daniel Martinez <danielmartinez@cock.li>
2024-03-04 01:21:31 -05:00
Joseph Huber
1a2ecbb398 [libc] Remove 'llvm-gpu-none' directory from build (#82816)
Summary:
This directory is leftover from when we handled both AMDGPU and NVPTX in
the same build and merged them into a pseudo triple. Now the only thing
it contains is the RPC server header. This gets rid of it, but now that
it's in the base install directory we should make it clear that it's an
LLVM libc header.
2024-02-23 14:11:31 -06:00
Joseph Huber
47b7c91abe [libc] Rework the GPU build to be a regular target (#81921)
Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.

This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.

The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.

The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.

The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.

We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.

This approach is vastly superior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.

Some certain utilities need to be built with 
`--target=${LLVM_HOST_TRIPLE}` as well. I think this is a fine
workaround as we
will always assume that the GPU `libc` is a cross-build with a
functioning host.

Depends on https://github.com/llvm/llvm-project/pull/81557
2024-02-22 15:29:29 -06:00
Joseph Huber
0ac4438560 [Libomptarget] Remove unused 'SupportsEmptyImages' API function (#80316)
Summary:
This function is always false in the current implementation and is not
even considered required. Just remove it and if someone needs it in the
future they can add it back in. This is done to simplify the interface
prior to other changes
2024-02-05 10:00:09 -06:00
Joseph Huber
621bafd5c1 [Libomptarget] Move target table handling out of the plugins (#77150)
Summary:
This patch removes the bulk of the handling of the
`__tgt_offload_entries` out of the plugins itself. The reason for this
is because the plugins themselves should not be handling this
implementation detail of the OpenMP runtime. Instead, we expose two new
plugin API functions to get the points to a device pointer for a global
as well as a kernel type.

This required introducing a new type to represent a binary image that
has been loaded on a device. We can then use this to load the addresses
as needed. The creation of the mapping table is then handled just in
`libomptarget` where we simply look up each address individually. This
should allow us to expose these operations more generically when we
provide a separate API.
2024-01-22 11:06:47 -06:00
carlobertolli
ae99966a27 [OpenMP] Enable automatic unified shared memory on MI300A. (#77512)
This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.

This patch also introduces an environment variable OMPX_APU_MAPS that,
if set, triggers automatic zero-copy also on non APU GPUs (e.g., on
discrete GPUs).
This patch is still missing support for global variables, which will be
provided in a subsequent patch.

Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
2024-01-22 10:30:22 -06:00
Joseph Huber
37c1a5e3f5 [Libomptarget] Fix GPU Dtors referencing possibly deallocated image (#77828)
Summary:
The constructors and destructors look up a symbol in the ELF quickly to
determine if they need to be run on the GPU. This allows us to avoid the
very slow actions required to do the slower lookup using the vendor API.

One problem occurs with how we handle the lifetime of these images.
Right now there is no invariant to specify the lifetime of the
underlying binary image that is loaded. In the typical case, this comes
from the binary itself in the `.llvm.offloading` section, meaning that
the lifetime of the binary should match the executable itself. This
would work fine, if it weren't for the fact that the plugin is loaded
via `dlopen` and can have a teardown order out of sync with the main
executable.

This was likely what was occuring when this failed on some systems but
not others. A potential solution would be to simply copy images into
memory so the runtime does not rely on external references. Another
would be to manually zero these out after initialization as to prevent
this mistake from happening accidentally. The former has the benefit of
making some checks easier, and allowing for constant initialization be
done on the ELF itself (normally we can't do this because writing to a
constant section, e.g. .llvm.offloading is a segfault.). The downside
would be the extra time required to copy the image in bulk (Although we
are likely doing this in the vendor runtimes as well).

This patch went with a quick solution to simply set a boolean value at
initialization time if we need to call destructors.

Fixes: https://github.com/llvm/llvm-project/issues/77798
2024-01-11 15:00:53 -06:00
Joseph Huber
e203968e41 [Libomptarget] Do not abort on failed plugin init (#77623)
Summary:
The current code logic is supposed to skip plugins that aren't found or
could not be loaded. However, the plugic ontained a call to `abort` if
it failed, which prevented us from continuing if initilalization the
plugin failed (such as if `dlopen` failed for the dyanmic plugins).
2024-01-10 11:42:04 -06:00
Joseph Huber
d03b8c3a04 [Libomptarget][NFC] Format in-line comments consistently (#77530)
Summary:
The LLVM style uses /*Foo=*/ when indicating the name of a constant. See
https://llvm.org/docs/CodingStandards.html#comment-formatting. This is
useful for consistency, as well as because `clang-format` understands
this syntax and formats it more cleanly. Do a bulk update of this
syntax.
2024-01-10 10:10:08 -06:00
carlobertolli
ce4144406c Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (#77371)
Reverts llvm/llvm-project#75999

lit test is failing.
2024-01-08 14:38:29 -06:00
carlobertolli
22a73e7c46 [OpenMP][libomptarget] Enable automatic unified shared memory executi… (#75999)
…on (zero-copy) on MI300A.

This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.

This patch is still missing support for global variables, which will be
provided in a subsequent patch.

Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
2024-01-08 14:17:28 -06:00
Joseph Huber
bda562519b [Libomptarget][NFC] Fix unhandled allocator enum value 2024-01-08 10:17:05 -06:00
Joseph Huber
fb32977ac7 [Libomptarget] Fix RPC-based malloc on NVPTX (#72440)
Summary:
The device allocator on NVPTX architectures is enqueued to a stream that
the kernel is potentially executing on. This can lead to deadlocks as
the kernel will not proceed until the allocation is complete and the
allocation will not proceed until the kernel is complete. CUDA 11.2
introduced async allocations that we can manually place on separate
streams to combat this. This patch makes a new allocation type that's
guaranteed to be non-blocking so it will actually make progress, only
Nvidia needs to care about this as the others are not blocking in this
way by default.

I had originally tried to make the `alloc` and `free` methods take a
`__tgt_async_info`. However, I observed that with the large volume of
streams being created by a parallel test it quickly locked up the system
as presumably too many streams were being created. This implementation
not just creates a new stream and immediately destroys it. This
obviously isn't very fast, but it at least gets the cases to stop
deadlocking for now.
2024-01-02 16:53:53 -06:00
Joseph Huber
64f0681e97 [Libomptarget] Rework image checking further (#76120)
Summary:
In the future, we may have more checks for different kinds of inputs,
e.g. SPIR-V. This patch simply reworks the handling to be more generic
and do the magic detection up-front. The checks inside the routines are
now asserts so we don't spend time checking this stuff over and over
again.

This patch also tweaked the bitcode check. I used a different function
to get the Lazy-IR module now, as it returns the raw expected value
rather than the SM diganostic.

No functionality change intended.
2023-12-29 15:14:39 -06:00
Joseph Huber
f324584ae3 [Libomptarget][NFCI] Remove caching of created ELF files (#76080)
Summary:
We currently keep a cache of created ELF files from the relevant images.
This shouldn't be necessary as the entire ELF interface is generally
trivially constructable and extremely cheap. The cost of constructing
one of these objects is simply a size check and writing a pointer to the
underlying data. Given that, keeping a cache of these images should not
be necessary overall.
2023-12-20 17:13:41 -06:00
Joseph Huber
e4f4022b70 [Libomptarget][NFC] Fix linting warnings in the plugins
Summary:
Fix some linting warnings present in the plugins.
2023-12-20 10:07:34 -06:00
Joseph Huber
ac029e02a9 [Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720)
Summary:
This patch reorganizes a lot of the code used to check for compatibility
with the current environment. The main bulk of this patch involves
moving from using a separate `__tgt_image_info` struct (which just
contains a string for the architecture) to instead simply checking this
information from the ELF directly. Checking information in the ELF is
very inexpensive as creating an ELF file is simply writing a base
pointer.

The main desire to do this was to reorganize everything into the ELF
image. We can then do the majority of these checks without first
initializing the plugin. A future patch will move the first ELF checks
to happen without initializing the plugin so we no longer need to
initialize and plugins that don't have needed images.

This patch also adds a lot more sanity checks for whether or not the ELF
is actually compatible. Such as if the images have a valid ABI, 64-bit
width, executable, etc.
2023-12-19 20:01:31 -06:00
Shilei Tian
3768039913 [OpenMP] Directly use user's grid and block size in kernel language mode (#70612)
In kernel language mode, use user's grid and blocks size directly. No
validity
check, which means if user's values are too large, the launch will fail,
similar
to what CUDA and HIP are doing right now.
2023-12-18 12:26:18 -05:00
Joseph Huber
913622d012 [Libomptarget] Remove remaining global constructors in plugins (#75814)
Summary:
This patch fixes the remaining global constructor in the plguins after
addressing the ones in the JIT interface. This struct was mistakenly
using global constructors as not all the members were being initialized
properly. This was almost certainly being optimized out because it's
trivial, but would still be present in debug builds and prevented us
from compiling with `-Werror=global-constructors`. We will want to do
that once offloading is moved to a runtimes only build.
2023-12-18 11:01:02 -06:00
Joseph Huber
1580877555 [Libomptarget] Remove bitcode image map used for JIT processing (#75672)
Summary:
Libomptarget supports JIT by treating an LLVM-IR file as a regular input
image. The handling here used a global map to keep track of triples once
it was parsed. This was done to same time, however this created a global
constructor as well as an extra mutex to handle it. This patch removes
the use of this map.

Instead, we simply use the file magic to perform a quick check if the
input image is valid bitcode. If not, we then create a lazy module. This
should roughly equivalent to the old handling that create an IR symbol
table. Here we can prevent the module from materializing everything but
the single triple metadata we read in later.
2023-12-18 09:28:06 -06:00
Kazu Hirata
b8f89b84bc Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-16 15:02:17 -08:00
Joseph Huber
0ab663d202 [Libomptarget] Move ELF symbol extraction to the ELF utility (#74717)
Summary:
We shouldn't have the format specific ELF handling in the generic plugin
manager. This patch moves that out of the implementation and into the
ELF utilities. This patch changes the SHT_NOBITS case to be a hard
error, which should be correct as the existing use already seemed to
return an error if the result was a null pointer.

This also uses a `const_cast`, which is bad practice. However,
rebuilding the `constness` of all of this would be a massive overhaul,
and this matches the previous behaviour (We would take a pointer to the
image that is most likely read-only in the ELF).
2023-12-14 11:04:13 -06:00
Johannes Doerfert
12cbccc312 [OpenMP] Add extra flags to libomptarget and plugin builds (#74520) 2023-12-11 10:41:50 -08:00
Johannes Doerfert
0ace6ee73a [OpenMP][FIX] Ensure we do not read outside the device image (#74669)
Before we expected all symbols in the device image to be backed up with
data that we could read. However, uninitialized values are not. We now
check for this case and avoid reading random memory.

This also replaces the correct readGlobalFromImage call with a
isSymbolInImage check after
https://github.com/llvm/llvm-project/pull/74550 picked the wrong one.

Fixes: https://github.com/llvm/llvm-project/issues/74582
2023-12-06 14:57:57 -08:00
Joseph Huber
6f3bd3a2f6 [Libomptarget] Add a utility function for checking existence of symbols (#74550)
Summary:
There are now a few cases that check if a symbol is present before
continuing, effectively making them optional features if present in the
image. This was done in at least three locations and required an ugly
operation to consume the error. This patch makes a utility function to
handle that instead.
2023-12-06 07:41:27 -06:00
Johannes Doerfert
68db7aef74 [OpenMP] Reorganize the initialization of PluginAdaptorTy (#74397)
This introduces checked errors into the creation and initialization of
`PluginAdaptorTy`. We also allow the adaptor to "hide" devices from the
user if the initialization failed. The new organization avoids the
"initOnce" stuff but we still do not eagerly initialize the plugin
devices (I think we should merge `PluginAdaptorTy::initDevices` into
`PluginAdaptorTy::init`)
2023-12-05 16:04:01 -08:00
Johannes Doerfert
9f87509b19 [OpenMP][FIX] Ensure we allow shared libraries without kernels (#74532)
This fixes two bugs and adds a test for them:
- A shared library with declare target functions but without kernels
should not error out due to missing globals.
- Enabling LIBOMPTARGET_INFO=32 should not deadlock in the presence of
indirect declare targets.
2023-12-05 15:25:10 -08:00
Johannes Doerfert
5fe741f08e [OpenMP] Separate Requirements into a standalone header (#74126)
This is not completely NFC since we now check all 4 requirements and the
test is checking the good and the bad case for combining flags.
2023-12-01 14:47:00 -08:00
dhruvachak
ca2d79f9ca [OpenMP] Add an INFO message for data transfer of kernel launch env. (#74030) 2023-12-01 10:58:23 -08:00
Johannes Doerfert
148dec9fa4 [OpenMP][NFC] Separate Envar (environment variable) handling (#73994) 2023-11-30 15:23:34 -08:00
Johannes Doerfert
2e7f47d4a8 [OpenMP][NFC] Move out plugin API and APITypes into standalone headers (#73868) 2023-11-29 16:04:19 -08:00
Johannes Doerfert
fae233c63f [OpenMP] Avoid initializing the KernelLaunchEnvironment if possible (#73864)
If we don't have a team reduction we don't need a kernel launch
environment (for now). In that case we can avoid the cost.
2023-11-29 14:49:13 -08:00
Johannes Doerfert
e2299e8d9d [OpenMP][NFC] Move OMPT headers into OpenMP/OMPT (#73718) 2023-11-29 08:29:41 -08:00
Johannes Doerfert
db96a9c3b7 [OpenMP][NFC] Flatten plugin-nextgen/common folder sturcture (#73725)
For historic reasons we had it setup that there was
`  plugin-nextgen/common/PluginInterface/<sources + headers>`
which is not what we do anywhere else.
Now it looks like the rest:
```
  plugin-nextgen/common/include/<headers>
  plugin-nextgen/common/src/<sources>
```
As part of this, `dlwrap.h` was moved into common/include (as
`DLWrap.h`)
since it is exclusively used by the plugins.
2023-11-29 07:57:01 -08:00