Commit Graph

194 Commits

Author SHA1 Message Date
Felipe Cabarcas
9b6ea5e8f8 [OpenMP] Improve omp offload profiler (#68016)
Summary:
Adding information to the LIBOMPTARGET profiler runtime kernel and API
calls.

Key changes:
* Adding information to runtime calls for better understanding of how
the application
is executing. For example teams requested by the user, size of memory
transfers.
* Profile timer was changed from 'us' to 'ns', since 'us' was too
coarse-grain
  to register some important details like key kernel duration
* Removed non API or Runtime calls, to reduce complexity of profile for
application
  developers.

---------

Co-authored-by: Felipe Cabarcas <cabarcas@leia.crpl.cis.udel.edu>
Co-authored-by: fel-cab <fel-cab@github.com>
2023-12-22 14:58:11 -05:00
Joseph Huber
ac029e02a9 [Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720)
Summary:
This patch reorganizes a lot of the code used to check for compatibility
with the current environment. The main bulk of this patch involves
moving from using a separate `__tgt_image_info` struct (which just
contains a string for the architecture) to instead simply checking this
information from the ELF directly. Checking information in the ELF is
very inexpensive as creating an ELF file is simply writing a base
pointer.

The main desire to do this was to reorganize everything into the ELF
image. We can then do the majority of these checks without first
initializing the plugin. A future patch will move the first ELF checks
to happen without initializing the plugin so we no longer need to
initialize and plugins that don't have needed images.

This patch also adds a lot more sanity checks for whether or not the ELF
is actually compatible. Such as if the images have a valid ABI, 64-bit
width, executable, etc.
2023-12-19 20:01:31 -06:00
dhruvachak
e4de6a602f [OpenMP] [OMPT] A pointer to HostOpId should be passed in EMI callbacks. (#75574)
With this change, TargetRegionOpId is no more used and hence deleted.
2023-12-15 12:07:42 -08:00
Johannes Doerfert
fe6f137e48 [OpenMP][NFC] Move mapping related code into OpenMP/Mapping.cpp (#75239)
DeviceTy provides an abstraction for "middle-level" operations that can
be done with a offload device. Mapping was tied into it but is not
strictly necessary. Other languages do not track mapping, and even
OpenMP can be used completely without mapping. This simply moves the
relevant code into the OpenMP/Mapping.cpp as part of a new class
MappingInfoTy. Each device still has one, but it does not clutter the
device.cpp anymore.
2023-12-12 12:49:46 -08:00
Johannes Doerfert
5dd1fc7008 [OpenMP][NFC] Improve profiling for the offload runtime 2023-12-11 17:30:35 -08:00
Johannes Doerfert
2ada7bb68b [OpenMP][NFCI] Remove effectively unused mutex
The only use was already guarded by a different lock in the caller of
loadBinary.
2023-12-11 17:30:35 -08:00
Johannes Doerfert
cee6918d87 [OpenMP][NFC] Move api.cpp to OpenMP/API.cpp 2023-12-11 17:30:34 -08:00
Johannes Doerfert
13b8826508 Revert " [OpenMP][NFC] Remove DelayedBinDesc" (#74679)
Reverts llvm/llvm-project#74360

As I wrote in the analysis of #74360:

Since
bc4e0c048a
we will not add PluginAdaptors into the container of all plugin adaptors
before the plugin is not ready. The error is thereby gone. When and old
HSA loads other libraries they can call register_image but that will
simply not register the image with the plugin we are currently
initializing. That seems like reasonable behavior, thought it is good to
keep in mind if we ever want a kernel library (@jhuber6 @mjklemm). We
can still have a standalone kernel library though or load it late after
all plugins are setup (which seems reasonable).

I did not expect one our tests actually doing exactly what this will not
allow anymore, at least when you use rocm <5.5.0. Need to figure out if
we want this behavior (for rocm <5.5.0).
2023-12-06 16:04:23 -08:00
Johannes Doerfert
d552ce2638 [OpenMP][NFC] Remove DelayedBinDesc (#74360)
Remove `DelayedBinDesc` as it is not necessary since
bc4e0c048a.
See
https://github.com/llvm/llvm-project/pull/74360#issuecomment-1843603736
for details.
2023-12-06 14:48:23 -08:00
Johannes Doerfert
68db7aef74 [OpenMP] Reorganize the initialization of PluginAdaptorTy (#74397)
This introduces checked errors into the creation and initialization of
`PluginAdaptorTy`. We also allow the adaptor to "hide" devices from the
user if the initialization failed. The new organization avoids the
"initOnce" stuff but we still do not eagerly initialize the plugin
devices (I think we should merge `PluginAdaptorTy::initDevices` into
`PluginAdaptorTy::init`)
2023-12-05 16:04:01 -08:00
Johannes Doerfert
9f87509b19 [OpenMP][FIX] Ensure we allow shared libraries without kernels (#74532)
This fixes two bugs and adds a test for them:
- A shared library with declare target functions but without kernels
should not error out due to missing globals.
- Enabling LIBOMPTARGET_INFO=32 should not deadlock in the presence of
indirect declare targets.
2023-12-05 15:25:10 -08:00
Johannes Doerfert
66784dcb3b [OpenMP] Ensure Devices is accessed exlusively (#74374)
We accessed the `Devices` container most of the time while holding the
RTLsMtx, but not always. Sometimes we used the mutex for the size query,
but then accessed Devices again unguarded. From now we properly
encapsulate the container in a ProtectedObj which ensures exclusive
accesses. We also hide the "isReady" part in the `getDevice` accessor
and use an `llvm::Expected` to allow to return errors.
2023-12-04 17:10:37 -08:00
Johannes Doerfert
27f17837bb [OpenMP][NFC] Remove PluginAdaptorManagerTy 2023-12-01 15:23:17 -08:00
Johannes Doerfert
7169c45efa [OpenMP][NFCI] Organize offload entry logic
This moves the offload entry logic into classes and provides convenient
accessors. No functional change intended but we can now print all
offload entries (and later look them up), tested via
`OMPTARGET_DUMP_OFFLOAD_ENTRIES=<device_no>`.
2023-12-01 15:10:52 -08:00
Johannes Doerfert
b091a887e0 [OpenMP][NFC] Extract device image handling into a class/header (#74129) 2023-12-01 14:59:12 -08:00
Johannes Doerfert
5fe741f08e [OpenMP] Separate Requirements into a standalone header (#74126)
This is not completely NFC since we now check all 4 requirements and the
test is checking the good and the bad case for combining flags.
2023-12-01 14:47:00 -08:00
Johannes Doerfert
3530428b8f [OpenMP][NFC] Extract OffloadPolicy into a helper class (#74029)
OpenMP allows 3 different offload policies, handling of which we want to
encapsulate.
2023-12-01 10:55:18 -08:00
Johannes Doerfert
bc4e0c048a [OpenMP][NFC] Modernize the plugin handling (#74034)
This basically moves code around again, but this time to provide cleaner
interfaces and remove duplication. PluginAdaptorManagerTy is almost all
gone after this.
2023-12-01 10:36:59 -08:00
Johannes Doerfert
51fc8544c7 [OpenMP][NFC] Move mapping related logic into Mapping.h (#74009) 2023-11-30 17:08:41 -08:00
Johannes Doerfert
1035cc7029 [OpenMP][NFC] Encapsulate Devices.size() (#74010) 2023-11-30 16:44:47 -08:00
Johannes Doerfert
b8b2a279d0 [OpenMP][NFC] Encapsulate profiling logic (#74003)
This simply puts the profiling logic into the `Profiler` class and
allows non-RAII profiling via `beginSection` and `endSection`.
2023-11-30 15:52:02 -08:00
Johannes Doerfert
148dec9fa4 [OpenMP][NFC] Separate Envar (environment variable) handling (#73994) 2023-11-30 15:23:34 -08:00
Johannes Doerfert
b80b5f180b [OpenMP] Replace copy and paste code with instantiation (#73991) 2023-11-30 14:16:34 -08:00
Johannes Doerfert
fce4c0acd6 [OpenMP] Start organizing PluginManager, PluginAdaptors (#73875) 2023-11-30 13:47:47 -08:00
Johannes Doerfert
2e7f47d4a8 [OpenMP][NFC] Move out plugin API and APITypes into standalone headers (#73868) 2023-11-29 16:04:19 -08:00
Johannes Doerfert
40422bf150 [OpenMP][NFC] Separate OpenMP/OpenACC specific mapping code (#73817)
While this does not really encapsulate the mapping code, it at least
moves most of the declarations out of the way.
2023-11-29 10:29:54 -08:00
Johannes Doerfert
8391bb3f5c [OpenMP][NFC] Move more declarations out of private.h (#73823) 2023-11-29 09:22:03 -08:00
Johannes Doerfert
b465f94b7c [OpenMP][NFC] Put ExponentialBackoff in a Utils header (#73816)
"private.h" will go.
2023-11-29 09:10:29 -08:00
Johannes Doerfert
fd2d0bf90e [OpenMP][NFC] Replace unnecessary typedefs (#73815) 2023-11-29 08:40:41 -08:00
Johannes Doerfert
e2299e8d9d [OpenMP][NFC] Move OMPT headers into OpenMP/OMPT (#73718) 2023-11-29 08:29:41 -08:00
Johannes Doerfert
db96a9c3b7 [OpenMP][NFC] Flatten plugin-nextgen/common folder sturcture (#73725)
For historic reasons we had it setup that there was
`  plugin-nextgen/common/PluginInterface/<sources + headers>`
which is not what we do anywhere else.
Now it looks like the rest:
```
  plugin-nextgen/common/include/<headers>
  plugin-nextgen/common/src/<sources>
```
As part of this, `dlwrap.h` was moved into common/include (as
`DLWrap.h`)
since it is exclusively used by the plugins.
2023-11-29 07:57:01 -08:00
Johannes Doerfert
2cfe7b1b66 [OpenMP][NFC] Extract timescope profile support into its own header (#73727) 2023-11-29 07:54:35 -08:00
Johannes Doerfert
d1057014a1 [OpenMP][NFC] Create an "OpenMP" folder in the include folder (#73713)
Not everything in libomptarget (include) is "OpenMP", but some things
most certainly are. This commit moves some code around to start making
this distinction without the intention to change functionality.
2023-11-28 15:41:31 -08:00
Johannes Doerfert
7233e42dff [OpenMP][NFC] Move Environment.h and SourceInfo.h into "Shared" folder (#73703) 2023-11-28 15:10:06 -08:00
Johannes Doerfert
8327f4a851 [OpenMP][NFC] Move Utils.h and Debug.h into a "Shared" include folder (#73701)
Headers used throughout the different runtimes are different from the
internal headers. This is a first step to bring structure in into the
include folder.
2023-11-28 13:44:57 -08:00
Johannes Doerfert
7bfcce3e94 [OpenMP] Tear down GenericDeviceTy's with GenericPluginTy (#73557)
There is no point in keeping GenericDeviceTy objects alive longer than
the associated GenericPluginTy. Instead of the old API we now tear them
down with the plugin, avoiding ordering issues.
2023-11-27 11:42:12 -08:00
Johannes Doerfert
f9436464a9 [OpenMP][NFC] Minor name and code simplification 2023-11-27 11:08:29 -08:00
Johannes Doerfert
2b2e711afc [OpenMP][NFC] Remove no-op __tgt_rtl_deinit_plugin
The order in which we deinit things, especially when shared libraries
are involved, is complicated. To simplify our lives the nextgen plugin
deinitializes the GenericPluginTy and subclasses automatically. The old
__tgt_rtl_deinit_plugin is not needed anymore.
2023-11-27 11:07:57 -08:00
Johannes Doerfert
9c33bf62a7 [OpenMP][NFC] Remove unused (un)register_lib plugin API
These APIs have not been hooked up for a while. No need to carry them.
2023-11-27 11:07:57 -08:00
Johannes Doerfert
f48c4d8aa1 [OpenMP] Be more forgiving during record and replay
When we record and replay kernels we should not error out early if there
is a chance the program might still run fine. This patch will:
1) Fallback to the allocation heuristic if the VAMap doesn't work.
2) Adjust the memory start to match the required address if possible.
3) Adjust the (guessed) pointer arguments if the memory start adjustment
   is impossible. This will allow kernels without indirect accesses to
   work while indirect accesses will most likely fail.
2023-11-20 17:15:34 -08:00
Johannes Doerfert
3de645efe3 [OpenMP][NFC] Split the reduction buffer size into two components
Before we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits the number
into two parts, the size of the reduction data (=all reduction
variables) and the (maximal) length of the buffer. This will allow us to
allocate less if we need less, e.g., if we have less teams than the
maximal length. It also allows us to move code from clangs codegen into
the runtime as we now know how large the reduction data is.
2023-11-06 11:50:41 -08:00
Johannes Doerfert
f9a89e6b9c [OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752)
We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.

Fixes: https://github.com/llvm/llvm-project/issues/70249
2023-11-01 11:11:48 -07:00
Johannes Doerfert
b8cbc5c02c [OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401)
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the rutime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More uses cases will
follow, e.g., per launch memory pools.

Fixes: https://github.com/llvm/llvm-project/issues/70249
2023-10-31 19:38:43 -07:00
Konstantinos Parasyris
d6a3d6b96d [openmp] Fixed Support for VA for record-replay. (#70396)
The commit was discussed in phabricator
(https://reviews.llvm.org/D157186).

Record replay currently fails on AMD as it conflicts with the heap
memory allocator introduced in #69806. The workaround is setting
`LIBOMPTARGET_HEAP_SIZE=0` during both record and replay run.
2023-10-29 12:27:19 -07:00
Mehdi Amini
f390a76b7e Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)""
This reverts commit ddbaa11e9f.

Reapply the original commit, the broken test was repaired in 5e51363f38 in the meantime.
2023-10-26 17:30:01 -07:00
Mehdi Amini
ddbaa11e9f Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)"
This reverts commit c2a1249a82.

The MLIR bots are broken with an omp test failure.
2023-10-26 17:25:20 -07:00
Johannes Doerfert
c2a1249a82 [OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)
The runtime needs to know about the acceptable launch bounds, especially
if the compiler (middle- or backend) assumed those bounds. While this
patch does not yet inform the runtime, it stores the bounds in a place
that can/will be accessed and is associated with the kernel.
2023-10-26 14:46:55 -07:00
Johannes Doerfert
d3921e4670 [OpenMP] Basic BumpAllocator for (AMD)GPUs (#69806)
The patch contains a basic BumpAllocator for (AMD)GPUs to allow us to
run more tests. The allocator implements `malloc`, both internally and
externally, while we continue to default to the NVIDIA `malloc` when we
target NVIDIA GPUs. Once we have smarter or customizable allocators we
should consider this choice, for now, this allocator is better than
none. It traps if it is out of memory, making it easy to debug. Heap
size is configured via `LIBOMPTARGET_HEAP_SIZE` and defaults to 512MB.
It allows to track allocation statistics via
`LIBOMPTARGET_DEVICE_RTL_DEBUG=8` (together with
`-fopenmp-target-debug=8`). Two tests were added, and one was enabled.

This is the next step towards fixing
 https://github.com/llvm/llvm-project/issues/66708
2023-10-21 14:49:30 -07:00
Johannes Doerfert
1cea309b7e [OpenMP][NFC] Move DebugKind to make it reusable from the host 2023-10-20 19:28:09 -07:00
Michael Klemm
f93a697e47 [libomptarget][OpenMP] Initial implementation of omp_target_memset() and omp_target_memset_async() (#68706)
Implement a slow-path version of omp_target_memset*() 

There is a TODO to implement a fast path that uses an on-device
kernel instead of the host-based memory fill operation.  This may
require some additional plumbing to have kernels in libomptarget.so
2023-10-19 15:29:36 +02:00