Summary:
Relanding after reverting, only applies to AMDGPU for now.
This patch adds an implementation of printf that's provided by the GPU
C library runtime. This pritnf currently implemented using the same
wrapper handling that OpenMP sets up. This will be removed once we have
proper varargs support.
This printf differs from the one CUDA offers in that it is synchronous
and uses a finite size. Additionally we support pretty much every
format specifier except the %n option.
Depends on #85331
Summary:
This patch adds an implementation of `printf` that's provided by the GPU
C library runtime. This `pritnf` currently implemented using the same
wrapper handling that OpenMP sets up. This will be removed once we have
proper varargs support.
This `printf` differs from the one CUDA offers in that it is synchronous
and uses a finite size. Additionally we support pretty much every format
specifier except the `%n` option.
Depends on https://github.com/llvm/llvm-project/pull/85331
Summary:
The previous code would potentially make it smaller if a device with a
lower ID touched it later. Also we should minimize changes to the state
for multi threaded reasons. This just sets up an owned slot for each at
initialization time.
Similar to H2D and D2H, use synchronous mode for large data transfers
beyond a certain size for D2D as well. As with H2D and D2H, this size is
controlled by an env-var.
Summary:
The current implementation of RPC tied everything to device IDs and
forced us to do init / shutdown to manage some global state. This turned
out to be a bad idea in situations where we want to track multiple
hetergeneous devices that may report the same device ID in the same
process.
This patch changes the interface to instead create an opaque handle to
the internal device and simply allocates it via `new`. The user will
then take this device and store it to interface with the attached
device. This interface puts the burden of tracking the device identifier
to mapped d evices onto the user, but in return heavily simplifies the
implementation.
Summary:
This patch removes most of the special handling from the
`PluginAdaptorTy` in preparation for changing this to be the
`GenericPluginTy`. Doing this requires that the OpenMP specific handling
of stuff like device offsets be contained within the OpenMP plugin
manager. Generally this was uninvasive expect for the change to tracking
the offset and size of the used devices. The eaiest way I could think to
do this was to use some maps, which double as indicators for which
plugins have devices active. This should not affect the logic.
Summary:
Previously we had an interface that checked these functions pointers to
see if they are implemented by the plugin. This was removed as currently
every single function is implemented as a part of the common interface.
These checks are now always true and do nothing.
Summary:
The plan is to remove the entire plugin interface and simply use the
`GenericPluginTy` inside of `libomptarget` by statically linking against
it. This means that inside of `libomptarget` we will simply do
`Plugin.data_alloc` without the dynamically loaded interface. To reduce
the amount of code required, this patch simply moves all of the RTL
implementation functions inside of the Generic device. Now the
`__tgt_rtl_` interface is simply a shallow wrapper that will soon go
away. There is some redundancy here, this will be improved later. For
now what is important is minimizing the changes to the API.
Summary:
We have a plugin singleton that implements the Plugin interface. This
then spawns separate device and kernels. Previously when these needed to
reach into the global singleton they would use the `PluginTy::get`
routine to get access to it. In the future we will move away from this
as the lifetime of the plugin will be handled by `libomptarget`
directly. This patch removes uses of this inside of the plugin
implementaion themselves by simply keeping a reference to the plugin
inside of the device.
The external `__tgt_rtl` functions still use the global method, but will
be removed later.
Summary:
This patch factors common functions out of the `Plugin` interface prior
to its removal in a future patch. This simply temporarily renames it to
`PluginTy` so that we could re-use `Plugin::check` internally as this
needs to be defined statically per plugin now. We can refactor this
later.
The future patch will delete `PluginTy` and `PluginTy::get` entirely.
This simply tries to minimize a few changes to make it easier to land.
Summary:
It's possible for the OpenMP offloading plugins to be build before
tablegen is run despite the fact that we rely on it. Simply make it
depend on it currently.
Use `LINK_COMPONENTS` parameter of `add_llvm_library` rather than
passing LLVM components directly to `target_link_libraries`, in order to
ensure that LLVM dylib is linked correctly when used. Otherwise, CMake
insists on linking to static libraries that aren't present on
distributions doing pure dylib installs, such as Gentoo.
This fixes a regression introduced
in dcbddc2525.
Summary:
This patch reworks the CMake handling for building plugins. All this
does is pull a lot of shared and common logic into a single helper
function.
This also simplifies the OMPT libraries from being built separately
instead of just added.
This PR refactors bounds offsetting by combining the two differing
implementations (one applying to initial derived type member map
implementation for descriptors and the other for regular arrays,
effectively allocatable array vs regular array in fortran) now that it's
a little simpler to do.
The PR also moves the utilization of createAlteredByCaptureMap into
genMapInfoOp, where it will be correctly applied to all MapInfoData,
appropriately offsetting and altering Pointer data set in the kernel
argument structure on the host. This primarily means bounds offsets will
now correctly apply to enter/exit/update map clauses as opposed to just
the Target directive that is currently the case. A few fortran runtime
tests have been added to verify this new behavior.
This PR depends on: https://github.com/llvm/llvm-project/pull/84328 and
is an extraction of the larger derived type member map PR stack (so a
requirement for it to land).
Commit a7d5f73a03 introduced an
error in a target_compile_definitions on the SystemZ, causing
the build to break. Fixed by adding the missing "PRIVATE".
Summary:
All of these CPU targets use the same underlying implementation. We
should consolidate them into a single target to make it easier to update
this to a static library based approach. I have decided to call this the
'host' target so it can be given a single name. We still only build
these if the system processor matches and we are on Linux.
This patch updates the construction of packet headers to replace the
usage of ACQUIRE/RELEASE with SCACQUIRE/SCRELEASE which is now
recommended.
The patch also ensures consistency across kernel dispatches.
A kernel implicit parameter (dyn_ptr) was introduced some time back.
This patch increments the kernel args version for a compiler supporting
dyn_ptr. The version will be used by the runtime to determine whether
the implicit parameter is generated by the compiler. The versioning is
required to support use cases where code generated by an older compiler
is linked with a newer runtime.
If approved, this patch should be backported to release 18.
This patch enables the BodyCodeGen callback to still trigger for the
TargetData nested region during the device pass. There maybe Target code
nested within the TargetData region for which this is required.
Also add tests for the same.
This adds an API call ompx_dump_mapping_tables.
This allows users to debug the mapping tables and can be especially
useful for unified shared memory applications to check if the code
behaves in the way it should. The implementation reuses code already
present to dump mapping tables (in a debug setting).
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
When running OpenMP offloading application with LIBOMPTARGET_INFO=-1,
the addresses of the Copying data from **device** to **host**, the
address are swap.
As an example, Currently the address would be
```
omptarget device 0 info: Mapping exists with HstPtrBegin=0x00007ffddaf0fb90, TgtPtrBegin=0x00007fb385404000, Size=8000, DynRefCount=0 (decremented, delayed deletion), HoldRefCount=0
omptarget device 0 info: Copying data from device to host, TgtPtr=0x00007ffddaf0fb90, HstPtr=0x00007fb385404000, Size=8000, Name=d
```
And it should be
```
omptarget device 0 info: Copying data from device to host, TgtPtr=0x00007fb385404000, HstPtr=0x00007ffddaf0fb90, Size=8000, Name=d
```
---------
Co-authored-by: fel-cab <fel-cab@github.com>
The plugin was not getting built as the build_generic_elf64 macro
assumes the LLVM triple processor name matches the CMake processor name,
which is unfortunately not the case for SystemZ.
Fix this by providing two separate arguments instead.
Actually building the plugin exposed a number of other issues causing
various test failures. Specifically, I've had to add the SystemZ target
to
- CompilerInvocation::ParseLangArgs
- linkDevice in ClangLinuxWrapper.cpp
- OMPContext::OMPContext (to set the device_kind_cpu trait)
- LIBOMPTARGET_ALL_TARGETS in libomptarget/CMakeLists.txt
- a check_plugin_target call in libomptarget/src/CMakeLists.txt
Finally, I've had to set a number of test cases to UNSUPPORTED on
s390x-ibm-linux-gnu; all these tests were already marked as UNSUPPORTED
for x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu and are failing on
s390x for what seem to be the same reason.
In addition, this also requires support for BE ELF files in
plugins-nextgen: https://github.com/llvm/llvm-project/pull/85246
Code in plugins-nextgen reading ELF files is currently hard-coded to
assume a 64-bit little-endian ELF format. Unfortunately, this assumption
is even embedded in the interface between GlobalHandler and Utils/ELF
routines, which use ELF64LE types.
To fix this, I've refactored the interface to use generic types, in
particular by using (a unique_ptr to) ObjectFile instead of
ELF64LEObjectFile, and ELFSymbolRef instead of ELF64LE::Sym.
This allows properly templating over multiple ELF format variants inside
Utils/ELF; specifically, this patch adds support for 64-bit big-endian
ELF files in addition to 64-bit little-endian files.
Code in plugins-nextgen reading ELF files is currently hard-coded to
assume a 64-bit little-endian ELF format. Unfortunately, this assumption
is even embedded in the interface between GlobalHandler and Utils/ELF
routines, which use ELF64LE types.
To fix this, I've refactored the interface to use generic types, in
particular by using (a unique_ptr to) ObjectFile instead of
ELF64LEObjectFile, and ELFSymbolRef instead of ELF64LE::Sym.
This allows properly templating over multiple ELF format variants inside
Utils/ELF; specifically, this patch adds support for 64-bit big-endian
ELF files in addition to 64-bit little-endian files.
Summary:
We are currently taking the lower 5 bites of the thread ID as the warp
ID. This doesn't work in non-1D grids and is also slower than just using
the dedicated hardware register.
As per the OpenMP standard, "If a variable appears in a link clause on a
declare target directive that does not have a device_type clause with
the nohost device-type-description then it is treated as if it had
appeared in a map clause with a map-type of tofrom" is an implicit
mapping rule. Before this change, such variables were mapped as to by
default.
After 3ecd38c8e, the Handler.getELFObjectFile routine is no
longer available. Call ELF64LEObjectFile::create directly,
which should always be suitable for CUDA images.
The plugin was not getting built as the build_generic_elf64 macro
assumes the LLVM triple processor name matches the CMake processor name,
which is unfortunately not the case for SystemZ.
Fix this by providing two separate arguments instead.
Actually building the plugin exposed a number of other issues causing
various test failures. Specifically, I've had to add the SystemZ target
to
- CompilerInvocation::ParseLangArgs
- linkDevice in ClangLinuxWrapper.cpp
- OMPContext::OMPContext (to set the device_kind_cpu trait)
- LIBOMPTARGET_ALL_TARGETS in libomptarget/CMakeLists.txt
- a check_plugin_target call in libomptarget/src/CMakeLists.txt
Finally, I've had to set a number of test cases to UNSUPPORTED on
s390x-ibm-linux-gnu; all these tests were already marked as UNSUPPORTED
for x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu and are failing on
s390x for what seem to be the same reason.
In addition, this also requires support for BE ELF files in
plugins-nextgen: https://github.com/llvm/llvm-project/pull/83976
Code in plugins-nextgen reading ELF files is currently hard-coded to
assume a 64-bit little-endian ELF format. Unfortunately, this assumption
is even embedded in the interface between GlobalHandler and Utils/ELF
routines, which use ELF64LE types.
To fix this, I've refactored the interface to push all ELF specific
types into Utils/ELF. Specifically, this patch removes both the
getSymbol and getSymbolAddress routines and replaces them with a
single findSymbolInImage, which gets a StringRef identifying the
raw object file image as input, and returns a StringRef covering
the data addressed by the symbol (address and size) if found, or
std::nullopt otherwise.
This allows properly templating over multiple ELF format variants inside
Utils/ELF; specifically, this patch adds support for 64-bit big-endian
ELF files in addition to 64-bit little-endian files.