Commit Graph

213 Commits

Author SHA1 Message Date
Joseph Huber
ea707baca2 [Libomptarget][NFCI] Move logic out of PluginAdaptorTy (#86971)
Summary:
This patch removes most of the special handling from the
`PluginAdaptorTy` in preparation for changing this to be the
`GenericPluginTy`. Doing this requires that the OpenMP specific handling
of stuff like device offsets be contained within the OpenMP plugin
manager. Generally this was uninvasive expect for the change to tracking
the offset and size of the used devices. The eaiest way I could think to
do this was to use some maps, which double as indicators for which
plugins have devices active. This should not affect the logic.
2024-03-29 07:19:22 -05:00
nicebert
20f5bcfb1a [OpenMP] Add OpenMP extension API to dump mapping tables (#85381)
This adds an API call ompx_dump_mapping_tables.
This allows users to debug the mapping tables and can be especially
useful for unified shared memory applications to check if the code
behaves in the way it should. The implementation reuses code already
present to dump mapping tables (in a debug setting).

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-03-18 14:09:20 -05:00
Daniel Martinez
aa6ebf9be1 Replace some C headers with C++ ones (#82697)
#81434

Replaced some C headers with C++ ones

Co-authored-by: Daniel Martinez <danielmartinez@cock.li>
2024-03-04 01:21:31 -05:00
Michael Halkenhäuser
e521752c04 [OpenMP][OMPT] Add OMPT callback for device data exchange 'Device-to-Device' (#81991)
Since there's no `ompt_target_data_transfer_tofrom_device` (within
ompt_target_data_op_t enum) or something other that conveys the meaning
of inter-device data exchange we decided to indicate a Device-to-Device
transfer by using: optype == ompt_target_data_transfer_from_device (=3)

Hence, a device transfer may be identified e.g. by checking for: (optype
== 3) &&
(src_device_num < omp_get_num_devices()) &&
(dest_device_num < omp_get_num_devices())

Fixes: #66478
2024-02-26 11:16:25 +01:00
Joseph Huber
87b4108211 [Libomptarget][NFC] Remove concept of optional plugin functions (#82681)
Summary:
Ever since the introduction of the new plugins we haven't exercised the
concept of "optional" plugin functions. This is done in perparation for
making the plugins use a static interface as it will greatly simplify
the implementation if we assert that every function has the entrypoints.
Currently some unsupported functions will just return failure or some
other default value, so this shouldn't change anything.
2024-02-22 16:49:21 -06:00
Daniel Martinez
45fe67dd61 Fix build on musl by including stdint.h (#81434)
openmp fails to build on musl since it lacks the defines for int32_t

Co-authored-by: Daniel Martinez <danielmartinez@cock.li>
2024-02-22 13:14:27 -08:00
Joseph Huber
ea174c0934 [Libomptarget] Remove global ctor and use reference counting (#80499)
Summary:
Currently we rely on global constructors to initialize and shut down the
OpenMP runtime library and plugin manager. This causes some issues
because we do not have a defined lifetime that we can rely on to release
and allocate resources. This patch instead adds some simple reference
counted initialization and deinitialization function.

A future patch will use the `deinit` interface to more intelligently
handle plugin deinitilization. Right now we do nothing and rely on
`atexit` inside of the plugins to tear them down. This isn't great
because it limits our ability to control these things.

Note that I made the `__tgt_register_lib` functions do the
initialization instead of adding calls to the new runtime functions in
the linker wrapper. The reason for this is because in the past it's been
easier to not introduce a new function call, since sometimes the user's
compiler will link against an older `libomptarget`. Maybe if we change
the name with offloading in the future we can simplify this.

Depends on https://github.com/llvm/llvm-project/pull/80460
2024-02-22 12:01:52 -06:00
Joseph Huber
cc374d8056 [OpenMP] Remove register_requires global constructor (#80460)
Summary:
Currently, OpenMP handles the `omp requires` clause by emitting a global
constructor into the runtime for every translation unit that requires
it. However, this is not a great solution because it prevents us from
having a defined order in which the runtime is accessed and used.

This patch changes the approach to no longer use global constructors,
but to instead group the flag with the other offloading entires that we
already handle. This has the effect of still registering each flag per
requires TU, but now we have a single constructor that handles
everything.

This function removes support for the old `__tgt_register_requires` and
replaces it with a warning message. We just had a recent release, and
the OpenMP policy for the past four releases since we switched to LLVM
is that we do not provide strict backwards compatibility between major
LLVM releases now that the library is versioned. This means that a user
will need to recompile if they have an old binary that relied on
`register_requires` having the old behavior. It is important that we
actively deprecate this, as otherwise it would not solve the problem of
having no defined init and shutdown order for `libomptarget`. The
problem of `libomptarget` not having a define init and shutdown order
cascades into a lot of other issues so I have a strong incentive to be
rid of it.

It is worth noting that the current `__tgt_offload_entry` only has space
for a 32-bit integer here. I am planning to overhaul these at some point
as well.
2024-02-21 11:33:32 -06:00
dhruvachak
0d7f232baf [libomptarget] [OMPT] Fixed return address computation for OMPT events. (#80498)
Currently, __builtin_return_address is used to generate the return
address when the callback invoker is created. However, this may result
in the return address pointing to an internal runtime function. This is
not what a tool would typically want. A tool would want to know the
corresponding user code from where the runtime entry point is invoked.

This change adds a thread local variable that is assigned the return
address at the OpenMP runtime entry points. An RAII is used to manage
the modifications to the thread local variable. Whenever the return
address is required for OMPT events, it is read from the thread local
variable.
2024-02-07 17:29:08 -08:00
Joseph Huber
0ac4438560 [Libomptarget] Remove unused 'SupportsEmptyImages' API function (#80316)
Summary:
This function is always false in the current implementation and is not
even considered required. Just remove it and if someone needs it in the
future they can add it back in. This is done to simplify the interface
prior to other changes
2024-02-05 10:00:09 -06:00
Joseph Huber
2333865546 [Libomptarget] Fix data mapping on dynamic loads (#80559)
Summary:
The current logic tries to map target mapping tables to the current
device. Right now it assumes that data is only mapped a single time per
device. This is only true if we have a single instance of the runtime
running on a single program. However, in the case of dynamic library
loads or shared libraries, this may happen multiple times.

Given a case of a simple dynamic library load which has its own target
kernel instruction, the current logic had only the first call to
`__tgt_target_kernel` to the data mapping for that device. Then, when
the next dynamic library load got called, it would see that the global
were already mapped for that device and skip registering its own
entires, even though they were distinct. This resulted in none of the
mappings being done and hitting an assertion.

This patch simply gets rid of this per-device check. The check should
instead be on the host offloading entries. We already have logic that
calls `continue` if we already have entries for that pointer, so we can
simply rely on that instead.
2024-02-03 15:28:20 -06:00
Joseph Huber
254287658f [Libomptarget] Remove handling of old ctor / dtor entries (#80153)
Summary:
A previous patch removed creating these entries in clang in favor of the
backend emitting a callable kernel and having the runtime call that if
present. The support for the old style was kept around in LLVM 18.0 but
now that we have forked to 19.0 we should remove the support.

The effect of this would be that an application linking against a newer
libomptarget that still had the old constructors will no longer be
called. In that case, they can either recompile or use the
`libomptarget.so.18` that comes with the previous release.
2024-01-31 11:48:07 -06:00
Joseph Huber
621bafd5c1 [Libomptarget] Move target table handling out of the plugins (#77150)
Summary:
This patch removes the bulk of the handling of the
`__tgt_offload_entries` out of the plugins itself. The reason for this
is because the plugins themselves should not be handling this
implementation detail of the OpenMP runtime. Instead, we expose two new
plugin API functions to get the points to a device pointer for a global
as well as a kernel type.

This required introducing a new type to represent a binary image that
has been loaded on a device. We can then use this to load the addresses
as needed. The creation of the mapping table is then handled just in
`libomptarget` where we simply look up each address individually. This
should allow us to expose these operations more generically when we
provide a separate API.
2024-01-22 11:06:47 -06:00
carlobertolli
ae99966a27 [OpenMP] Enable automatic unified shared memory on MI300A. (#77512)
This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.

This patch also introduces an environment variable OMPX_APU_MAPS that,
if set, triggers automatic zero-copy also on non APU GPUs (e.g., on
discrete GPUs).
This patch is still missing support for global variables, which will be
provided in a subsequent patch.

Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
2024-01-22 10:30:22 -06:00
Joseph Huber
d03b8c3a04 [Libomptarget][NFC] Format in-line comments consistently (#77530)
Summary:
The LLVM style uses /*Foo=*/ when indicating the name of a constant. See
https://llvm.org/docs/CodingStandards.html#comment-formatting. This is
useful for consistency, as well as because `clang-format` understands
this syntax and formats it more cleanly. Do a bulk update of this
syntax.
2024-01-10 10:10:08 -06:00
Joseph Huber
0fe86f9c51 [Libomptarget] Remove extra cache for offloading entries (#77012)
Summary:
The offloading entries right now are assumed to be baked into the binary
itself, and thus always valid whenever the library is executing. This
means that we don't need to copy them to additional storage and can
instead simply pass around references to it.

This is not likely to change in the expected operation of the OpenMP
library. Additionally, the indirection for the offload entry struct is
simply two pointers, so moving it by value is trivial.
2024-01-08 16:49:33 -06:00
carlobertolli
ce4144406c Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (#77371)
Reverts llvm/llvm-project#75999

lit test is failing.
2024-01-08 14:38:29 -06:00
carlobertolli
22a73e7c46 [OpenMP][libomptarget] Enable automatic unified shared memory executi… (#75999)
…on (zero-copy) on MI300A.

This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.

This patch is still missing support for global variables, which will be
provided in a subsequent patch.

Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
2024-01-08 14:17:28 -06:00
Joseph Huber
fb32977ac7 [Libomptarget] Fix RPC-based malloc on NVPTX (#72440)
Summary:
The device allocator on NVPTX architectures is enqueued to a stream that
the kernel is potentially executing on. This can lead to deadlocks as
the kernel will not proceed until the allocation is complete and the
allocation will not proceed until the kernel is complete. CUDA 11.2
introduced async allocations that we can manually place on separate
streams to combat this. This patch makes a new allocation type that's
guaranteed to be non-blocking so it will actually make progress, only
Nvidia needs to care about this as the others are not blocking in this
way by default.

I had originally tried to make the `alloc` and `free` methods take a
`__tgt_async_info`. However, I observed that with the large volume of
streams being created by a parallel test it quickly locked up the system
as presumably too many streams were being created. This implementation
not just creates a new stream and immediately destroys it. This
obviously isn't very fast, but it at least gets the cases to stop
deadlocking for now.
2024-01-02 16:53:53 -06:00
Felipe Cabarcas
9b6ea5e8f8 [OpenMP] Improve omp offload profiler (#68016)
Summary:
Adding information to the LIBOMPTARGET profiler runtime kernel and API
calls.

Key changes:
* Adding information to runtime calls for better understanding of how
the application
is executing. For example teams requested by the user, size of memory
transfers.
* Profile timer was changed from 'us' to 'ns', since 'us' was too
coarse-grain
  to register some important details like key kernel duration
* Removed non API or Runtime calls, to reduce complexity of profile for
application
  developers.

---------

Co-authored-by: Felipe Cabarcas <cabarcas@leia.crpl.cis.udel.edu>
Co-authored-by: fel-cab <fel-cab@github.com>
2023-12-22 14:58:11 -05:00
Joseph Huber
ac029e02a9 [Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720)
Summary:
This patch reorganizes a lot of the code used to check for compatibility
with the current environment. The main bulk of this patch involves
moving from using a separate `__tgt_image_info` struct (which just
contains a string for the architecture) to instead simply checking this
information from the ELF directly. Checking information in the ELF is
very inexpensive as creating an ELF file is simply writing a base
pointer.

The main desire to do this was to reorganize everything into the ELF
image. We can then do the majority of these checks without first
initializing the plugin. A future patch will move the first ELF checks
to happen without initializing the plugin so we no longer need to
initialize and plugins that don't have needed images.

This patch also adds a lot more sanity checks for whether or not the ELF
is actually compatible. Such as if the images have a valid ABI, 64-bit
width, executable, etc.
2023-12-19 20:01:31 -06:00
dhruvachak
e4de6a602f [OpenMP] [OMPT] A pointer to HostOpId should be passed in EMI callbacks. (#75574)
With this change, TargetRegionOpId is no more used and hence deleted.
2023-12-15 12:07:42 -08:00
Johannes Doerfert
fe6f137e48 [OpenMP][NFC] Move mapping related code into OpenMP/Mapping.cpp (#75239)
DeviceTy provides an abstraction for "middle-level" operations that can
be done with a offload device. Mapping was tied into it but is not
strictly necessary. Other languages do not track mapping, and even
OpenMP can be used completely without mapping. This simply moves the
relevant code into the OpenMP/Mapping.cpp as part of a new class
MappingInfoTy. Each device still has one, but it does not clutter the
device.cpp anymore.
2023-12-12 12:49:46 -08:00
Johannes Doerfert
5dd1fc7008 [OpenMP][NFC] Improve profiling for the offload runtime 2023-12-11 17:30:35 -08:00
Johannes Doerfert
2ada7bb68b [OpenMP][NFCI] Remove effectively unused mutex
The only use was already guarded by a different lock in the caller of
loadBinary.
2023-12-11 17:30:35 -08:00
Johannes Doerfert
cee6918d87 [OpenMP][NFC] Move api.cpp to OpenMP/API.cpp 2023-12-11 17:30:34 -08:00
Johannes Doerfert
13b8826508 Revert " [OpenMP][NFC] Remove DelayedBinDesc" (#74679)
Reverts llvm/llvm-project#74360

As I wrote in the analysis of #74360:

Since
bc4e0c048a
we will not add PluginAdaptors into the container of all plugin adaptors
before the plugin is not ready. The error is thereby gone. When and old
HSA loads other libraries they can call register_image but that will
simply not register the image with the plugin we are currently
initializing. That seems like reasonable behavior, thought it is good to
keep in mind if we ever want a kernel library (@jhuber6 @mjklemm). We
can still have a standalone kernel library though or load it late after
all plugins are setup (which seems reasonable).

I did not expect one our tests actually doing exactly what this will not
allow anymore, at least when you use rocm <5.5.0. Need to figure out if
we want this behavior (for rocm <5.5.0).
2023-12-06 16:04:23 -08:00
Johannes Doerfert
d552ce2638 [OpenMP][NFC] Remove DelayedBinDesc (#74360)
Remove `DelayedBinDesc` as it is not necessary since
bc4e0c048a.
See
https://github.com/llvm/llvm-project/pull/74360#issuecomment-1843603736
for details.
2023-12-06 14:48:23 -08:00
Johannes Doerfert
68db7aef74 [OpenMP] Reorganize the initialization of PluginAdaptorTy (#74397)
This introduces checked errors into the creation and initialization of
`PluginAdaptorTy`. We also allow the adaptor to "hide" devices from the
user if the initialization failed. The new organization avoids the
"initOnce" stuff but we still do not eagerly initialize the plugin
devices (I think we should merge `PluginAdaptorTy::initDevices` into
`PluginAdaptorTy::init`)
2023-12-05 16:04:01 -08:00
Johannes Doerfert
9f87509b19 [OpenMP][FIX] Ensure we allow shared libraries without kernels (#74532)
This fixes two bugs and adds a test for them:
- A shared library with declare target functions but without kernels
should not error out due to missing globals.
- Enabling LIBOMPTARGET_INFO=32 should not deadlock in the presence of
indirect declare targets.
2023-12-05 15:25:10 -08:00
Johannes Doerfert
66784dcb3b [OpenMP] Ensure Devices is accessed exlusively (#74374)
We accessed the `Devices` container most of the time while holding the
RTLsMtx, but not always. Sometimes we used the mutex for the size query,
but then accessed Devices again unguarded. From now we properly
encapsulate the container in a ProtectedObj which ensures exclusive
accesses. We also hide the "isReady" part in the `getDevice` accessor
and use an `llvm::Expected` to allow to return errors.
2023-12-04 17:10:37 -08:00
Johannes Doerfert
27f17837bb [OpenMP][NFC] Remove PluginAdaptorManagerTy 2023-12-01 15:23:17 -08:00
Johannes Doerfert
7169c45efa [OpenMP][NFCI] Organize offload entry logic
This moves the offload entry logic into classes and provides convenient
accessors. No functional change intended but we can now print all
offload entries (and later look them up), tested via
`OMPTARGET_DUMP_OFFLOAD_ENTRIES=<device_no>`.
2023-12-01 15:10:52 -08:00
Johannes Doerfert
b091a887e0 [OpenMP][NFC] Extract device image handling into a class/header (#74129) 2023-12-01 14:59:12 -08:00
Johannes Doerfert
5fe741f08e [OpenMP] Separate Requirements into a standalone header (#74126)
This is not completely NFC since we now check all 4 requirements and the
test is checking the good and the bad case for combining flags.
2023-12-01 14:47:00 -08:00
Johannes Doerfert
3530428b8f [OpenMP][NFC] Extract OffloadPolicy into a helper class (#74029)
OpenMP allows 3 different offload policies, handling of which we want to
encapsulate.
2023-12-01 10:55:18 -08:00
Johannes Doerfert
bc4e0c048a [OpenMP][NFC] Modernize the plugin handling (#74034)
This basically moves code around again, but this time to provide cleaner
interfaces and remove duplication. PluginAdaptorManagerTy is almost all
gone after this.
2023-12-01 10:36:59 -08:00
Johannes Doerfert
51fc8544c7 [OpenMP][NFC] Move mapping related logic into Mapping.h (#74009) 2023-11-30 17:08:41 -08:00
Johannes Doerfert
1035cc7029 [OpenMP][NFC] Encapsulate Devices.size() (#74010) 2023-11-30 16:44:47 -08:00
Johannes Doerfert
b8b2a279d0 [OpenMP][NFC] Encapsulate profiling logic (#74003)
This simply puts the profiling logic into the `Profiler` class and
allows non-RAII profiling via `beginSection` and `endSection`.
2023-11-30 15:52:02 -08:00
Johannes Doerfert
148dec9fa4 [OpenMP][NFC] Separate Envar (environment variable) handling (#73994) 2023-11-30 15:23:34 -08:00
Johannes Doerfert
b80b5f180b [OpenMP] Replace copy and paste code with instantiation (#73991) 2023-11-30 14:16:34 -08:00
Johannes Doerfert
fce4c0acd6 [OpenMP] Start organizing PluginManager, PluginAdaptors (#73875) 2023-11-30 13:47:47 -08:00
Johannes Doerfert
2e7f47d4a8 [OpenMP][NFC] Move out plugin API and APITypes into standalone headers (#73868) 2023-11-29 16:04:19 -08:00
Johannes Doerfert
40422bf150 [OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code (#73817)
While this does not really encapsulate the mapping code, it at least
moves most of the declarations out of the way.
2023-11-29 10:29:54 -08:00
Johannes Doerfert
8391bb3f5c [OpenMP][NFC] Move more declarations out of private.h (#73823) 2023-11-29 09:22:03 -08:00
Johannes Doerfert
b465f94b7c [OpenMP][NFC] Put ExponentialBackoff in a Utils header (#73816)
"private.h" will go.
2023-11-29 09:10:29 -08:00
Johannes Doerfert
fd2d0bf90e [OpenMP][NFC] Replace unnecessary typedefs (#73815) 2023-11-29 08:40:41 -08:00
Johannes Doerfert
e2299e8d9d [OpenMP][NFC] Move OMPT headers into OpenMP/OMPT (#73718) 2023-11-29 08:29:41 -08:00
Johannes Doerfert
db96a9c3b7 [OpenMP][NFC] Flatten plugin-nextgen/common folder sturcture (#73725)
For historic reasons we had it setup that there was
`  plugin-nextgen/common/PluginInterface/<sources + headers>`
which is not what we do anywhere else.
Now it looks like the rest:
```
  plugin-nextgen/common/include/<headers>
  plugin-nextgen/common/src/<sources>
```
As part of this, `dlwrap.h` was moved into common/include (as
`DLWrap.h`)
since it is exclusively used by the plugins.
2023-11-29 07:57:01 -08:00