Commit Graph

214 Commits

Author SHA1 Message Date
Jon Chesterfield
4f4c826e75 [libomptarget] Drop remote plugin cmake version requirement to match llvm
LLVM docs at https://llvm.org/docs/CMake.html#quick-start state 3.13.4

Reviewed By: atmnpatel

Differential Revision: https://reviews.llvm.org/D113271
2021-11-05 17:34:28 +00:00
Kazu Hirata
3cfc1757c5 Ensure newlines at the end of files (NFC) 2021-10-29 20:26:09 -07:00
Jon Chesterfield
6c7b203d1d Revert "[libomptarget] Build DeviceRTL for amdgpu"
- more tests failing on CI than failed locally when writing this patch

This reverts commit 33427fdb7b.
2021-10-28 01:01:53 +01:00
Jon Chesterfield
33427fdb7b [libomptarget] Build DeviceRTL for amdgpu
Passes same tests as the current deviceRTL. Includes cmake change from D111987.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112227
2021-10-28 00:41:45 +01:00
Kazu Hirata
d8e4170b0a Ensure newlines at the end of files (NFC) 2021-10-23 08:45:29 -07:00
Jon Chesterfield
bf6f955f39 [libomptarget] Run GPU offloading tests on both new and old runtime
Implemented by patching python config instead of modifying all
the tests so that -generic and XFAIL work as usual. Expectation is for
this to be reverted once the old runtime is deleted.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D112225
2021-10-22 23:28:44 +01:00
Joseph Huber
b1ce454930 [OpenMP] Remove macro guards for device debugging
The plugin currently uses a macro to check if this is a debug built
before assigning the debug kind variable to the device environment
struct. This is being deprecated because the new device runtime does not
maintain separate debug builds and should always be availible.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112083
2021-10-19 12:21:43 -04:00
Ron Lieberman
d022f39d9f [libomptarget][amdgpu][NFC] tweak a comment 2021-10-09 12:51:53 -04:00
Shilei Tian
c060c634ef [OpenMP][NVPTX] Fix an error in configuring #teams and #threads
It must be a copy mistake.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D111407
2021-10-08 11:07:43 -04:00
Jon Chesterfield
1bc3a6e41b [libomptarget] Reapply 2bc4d48a78 which was accidentally reverted 2021-10-07 20:17:48 +01:00
Jon Chesterfield
0c554a4769 [libomptarget] Move device environment to shared header, remove divergence
Follow on to D110006, related to D110957

Where implementations have diverged this resolves to match the new DeviceRTL

- replaces definitions of this struct in deviceRTL and plugins with include
- changes the dynamic_shared_size field from D110006 to 32 bits
- handles stdint being unavailable in DeviceRTL
- adds a zero initializer for the field to amdgpu
- moves the extern declaration for deviceRTL to target_interface
  (omptarget.h is more natural, but doesn't work due to include order
  with debug.h)
- Renames the fields everywhere to match the LLVM format used in DeviceRTL
- Makes debug_level uint32_t everywhere (previously sometimes int32_t)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111069
2021-10-07 12:03:48 +01:00
Michał Górny
0873b9bef4 [openmp] [elf_common] Fix linking against LLVM dylib
The hand-rolled linking logic in elf_common does not account for
the possibility of using LLVM dylib rather than a dozen static
libraries.  Since it does not seem to be easily convertible
to add_llvm_library, just hand-roll support for LLVM_LINK_LLVM_DYLIB.
This is necessary to support stand-alone builds against installed LLVM.

Differential Revision: https://reviews.llvm.org/D111038
2021-10-04 09:29:06 +02:00
Jon Chesterfield
05ba9ff6a6 [libomptarget][amdgpu] Refactor memory pool collection 2021-10-01 14:58:01 +01:00
Jon Chesterfield
b75a7481ba [libomptarget] Apply D110029 to amdgpu
Use enum for execution mode.

This is partly a port from ROCm and partly a port from D110029. Attempted to
make the same choices as ROCm as far as comments etc go to reduce the merge
conflicts.

There is some cleanup warranted here - in particular I like the cuda patch
factoring out the comparisons into named variables - but I'd like to leave
that for a follow up patch, keeping this one minimal.

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D110845
2021-09-30 21:29:37 +01:00
Dhruva Chakrabarti
6226270253 [libomptarget] [amdgpu] After a kernel dispatch packet is published, its contents must not be accessed.
Fixes: SWDEV-275232 (With contributions from Ammar Elwazir, Laurent Morichetti, and Tony Tye)

The current code is racy. After the packet is submitted, the GPU will increment the read index. If this wraps around before the memory is read from it'll refer to a signal from an unrelated packet. Change avoids reading from the packet post-submission.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110679
2021-09-29 09:22:07 -07:00
Jon Chesterfield
2bc4d48a78 [libomptarget][amdgpu] Follow on to D110513, empty kernarg pools are not fatal 2021-09-27 22:44:35 +01:00
Jon Chesterfield
738734f655 [libomptarget][amdgpu] Report zero devices if plugin construction fails, instead of segv 2021-09-27 22:13:12 +01:00
Pushpinder Singh
b1695c2eb8 [AMDGPU][OpenMP] Add memory pool size check to isValidMemoryPool
Keeping all the checks in one place for future simplification.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D110513
2021-09-27 12:29:00 +00:00
Pushpinder Singh
9d0eb440ff [libomptarget][nfc][amdgpu] Reorder function to clarify review diff 2021-09-27 09:30:55 +00:00
Jon Chesterfield
726a34f063 [libomptarget][amdgpu] Replace dead exit call with returning error 2021-09-27 09:43:37 +01:00
Jon Chesterfield
8cf93a35d4 [libomptarget][amdgpu] Destruct HSA queues
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109511
2021-09-26 15:34:21 +01:00
Shilei Tian
ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.

This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
Shilei Tian
49e976c934 [OpenMP][NVPTX] Fix a warning that data argument not used by format string
Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D110104
2021-09-20 17:22:14 -04:00
Joseph Huber
f1c821fa85 [OpenMP] Add support for dynamic shared memory in new RTL
This patch adds support for using dynamic shared memory in the new
device runtime. The new function `__kmpc_get_dynamic_shared` will return a
pointer to the buffer of dynamic shared memory. Currently the amount of memory
allocated is set by an environment variable.

In the future this amount will be added to the amount used for the smart stack
which will be configured in a similar way.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D110006
2021-09-17 21:25:36 -04:00
Joseph Huber
ec02c34b6d [OpenMP] Add additional fields to device environment
This patch adds fields for the device number and number of devices into
the device environment struct and debugging values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110004
2021-09-17 21:25:32 -04:00
Hansang Bae
ae2a5facce [OpenMP][libomptarget] Minor fix in x86_64 plugin
Call to remove() was passing invalid address for the file name.

Differential Revision: https://reviews.llvm.org/D109846
2021-09-15 15:57:06 -05:00
Jon Chesterfield
6760234e8d [libomptarget][amdgpu] Precisely manage hsa lifetime
The hsa library must be initialized before any calls into it and
destructed after the last call into it. There have been a number of bugs in
this area related to member variables which would like to use raii to manage
resources acquired from hsa.

This patch moves the init/shutdown of hsa into a class, such that when used as
the first member variable (could be a base), the lifetime of other member
variables are reliably scoped within it. This will allow other classes to use
raii reliably when used as member variables within the global.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109512
2021-09-09 17:28:11 +01:00
Jon Chesterfield
d642156f8f [libomptarget][nfc] Hoist hsa_init into rtl.cpp 2021-09-09 16:09:34 +01:00
Jon Chesterfield
3153bdd547 [libomptarget][amdgpu] Drop env variables
Use the same debug print as the rest of libomptarget plugins with
the same environment control. Also drop the max queue size debugging hook as
I don't believe it is still in use, can bring it back near the rest of the env
handling in rtl.cpp if someone objects.

That makes most of rt.h and all of utils.cpp unused. Clean that up and simplify
control flow in a couple of places.

Behaviour change is that debug prints that used to use the old environment
variable now use the new one and print in slightly different format, and the
removal of the max queue size variable.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D108784
2021-09-02 11:02:39 +01:00
Jon Chesterfield
f8bcbb82a7 [libomptarget] Normalise a cmake debug string, checking it triggers CI 2021-09-01 14:24:28 +01:00
Shilei Tian
e8fdacfd81 [OpenMP][NVPTX] Fixed missing variables for CUDA free compilation in NVPTX plugin
`CU_EVENT_DEFAULT` is defined in CUDA header. It should be added to
`openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for CUDA free build.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108878
2021-08-28 18:08:10 -04:00
Shilei Tian
29df4ab3f3 [OpenMP][Offloading] Add support for event related interfaces
This patch adds the support form event related interfaces, which will be used
later to fix data race. See D104418 for more details.

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D108528
2021-08-28 16:24:14 -04:00
Jon Chesterfield
78f92c3810 [openmp][amdgpu] Initial gfx10 offloading implementation
Lets wavefront size be 32 for amdgpu openmp, as well as 64.

Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108708
2021-08-27 12:34:03 +01:00
Jon Chesterfield
3d85342982 [libomptarget][amdgpu][nfc] Rename variables, delete dead code 2021-08-26 19:58:38 +01:00
Jon Chesterfield
68ab93f4d7 [libomptarget][amdgpu][nfc] Rename source files 2021-08-26 18:29:44 +01:00
Jon Chesterfield
ba0af885e7 [libomptarget][amdgpu][nfc] Make grid value access match devicertl 2021-08-25 15:11:19 +01:00
Jon Chesterfield
9b2c6c07b5 [libomptarget][amdgpu] Refactor debug printing
Move most debug printing in rtl.cpp behind DP() macro
Adjust the print output for gpu arch mismatch when the architectures match
Convert an assert into graceful failure

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108562
2021-08-25 14:57:51 +01:00
Jon Chesterfield
ba8547775b [libomptarget][amdgpu] Fix debug build from D104696 2021-08-25 01:27:51 +01:00
Pushpinder Singh
9b8b7c1180 [AMDGPU][Libomptarget] Delete g_atl_machine global
With uses of g_atl_machine gone, a significant portion of dead
code has been removed.

This patch depends on D104691 and D104695.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104696
2021-08-24 07:59:40 +00:00
Jon Chesterfield
77579b99e9 [openmp][nfc] Replace OMPGridValues array with struct
[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.

Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108339
2021-08-19 13:25:42 +01:00
Joseph Huber
edb8acdc6e [Libomptarget] Correctly default to Generic if exec_mode is not present
Currently, the runtime returns an error when the `exec_mode` global is
not present. The expected behvaiour is that the region will default to
Generic. This prevents global constructors from being called because
they do not contain execution mode globals.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108255
2021-08-18 11:24:28 -04:00
Dimitry Andric
400cd6d2f0 [libomptarget][amdgpu] use --allow-shlib-undefined to link on FreeBSD
On FreeBSD, the `environ` symbol is undefined at link time for shared
libraries, but resolved by the dynamic linker at runtime. Therefore,
allow the symbol to be undefined when creating a shared library, by
using the `--allow-shlib-undefined` linker flag, instead of `-z defs`
(a.k.a `--no-undefined`).

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D107698
2021-08-08 13:52:44 +02:00
Dimitry Andric
71ae2e0221 [libomptarget][amdgpu] don't declare Elf_Note on FreeBSD
On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note`
indirectly (via `<sys/elf_common.h>`). This results in compile errors
when building the libomptarget amdgpu plugin. Avoid redeclaring `struct
Elf_Note` on FreeBSD to fix the errors.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D107661
2021-08-06 21:45:26 +02:00
Shilei Tian
28939b6ae5 [NFC] Clean up and clang-format openmp/libomptarget/plugins/cuda/src/rtl.cpp 2021-08-05 22:32:28 -04:00
Jon Chesterfield
a90da62adb [libomptarget][amdgpu] Update printed plugin name 2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz
88e66fa60a [OpenMP] Fixing missing variables when CUDA SDK not in system
This patch fixes the error reported in D106751. When there is no CUDA SDK
installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE`
variables.

Using @zsrkmyn sugested fix

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106933
2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz
313c523995 [OpenMP][Tool] Introducing the llvm-omp-device-info tool
This patch introduces the `llvm-omp-device-info` tool, which uses the
omptarget library and interface to query the device info from all the
available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`

Since omptarget usually requires a description structure with executable
kernels, I split the initialization of the RTLs and Devices to be able to
initialize all possible devices and query each of them.

This revision relies on the patch that introduces the print device info.

A limitation is that the order in which the devices are initialized, and the
corresponding device ID is not necesarily the one seen by OpenMP.

The changes are as follows:
1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function
2. Create an `initAllRTLs` method that initializes all available RTLs at runtime
3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information.

Example Output:
```
Device (0):
    print_device_info not implemented

Device (1):
    print_device_info not implemented

Device (2):
    print_device_info not implemented

Device (3):
    print_device_info not implemented

Device (4):
    CUDA Driver Version:                11000
    CUDA Device Number:                 0
    Device Name:                        Quadro P1000
    Global Memory Size:                 4236312576 bytes
    Number of Multiprocessors:          5
    Concurrent Copy and Execution:      Yes
    Total Constant Memory:              65536 bytes
    Max Shared Memory per Block:        49152 bytes
    Registers per Block:                65536
    Warp Size:                          32 Threads
    Maximum Threads per Block:          1024
    Maximum Block Dimensions:           1024, 1024, 64
    Maximum Grid Dimensions:            2147483647 x 65535 x 65535
    Maximum Memory Pitch:               2147483647 bytes
    Texture Alignment:                  512 bytes
    Clock Rate:                         1480500 kHz
    Execution Timeout:                  Yes
    Integrated Device:                  No
    Can Map Host Memory:                Yes
    Compute Mode:                       DEFAULT
    Concurrent Kernels:                 Yes
    ECC Enabled:                        No
    Memory Clock Rate:                  2505000 kHz
    Memory Bus Width:                   128 bits
    L2 Cache Size:                      1048576 bytes
    Max Threads Per SMP:                2048
    Async Engines:                      Yes (2)
    Unified Addressing:                 Yes
    Managed Memory:                     Yes
    Concurrent Managed Memory:          Yes
    Preemption Supported:               Yes
    Cooperative Launch:                 Yes
    Multi-Device Boars:                 No
    Compute Capabilities:               61
```

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106752
2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz
d2f85d0910 [OpenMP][Libomptarget] Adding print_device_info to RTL and omptarget
This patch introduces a function in the device's plugin to print the
device information. This patch relates to another patch that introduces
a CLI tool to obtain the device information from the omplibrary directly.
It is inspired by PGI's pgaccelinfo.

The modifications are as follows:
1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL.
2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented
3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy`
4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106751
2021-07-27 21:47:57 -04:00
Jon Chesterfield
2a613a7790 [libomptarget] Build amdgpu plugin without hsa
Default to building the amdgpu plugin to use dlopen when hsa is
not found instead of disabling it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106600
2021-07-26 09:54:51 +01:00
Jon Chesterfield
dd0b463dd9 [libomptarget][amdgpu] More robust handling of failure to init HSA
If hsa_init fails, subsequent calls into hsa are not safe. Except for
hsa_init, but we don't retry on failure.

This patch:
- deletes a print that called into hsa to ask why it can't call into hsa
- drops a merge conflict block next to that print
- reliably initializes number of devices to zero
- skips the plugin destructor contents if the constructor failed to init hsa

Tested by making hsa_init return error, and by forcing the dynamic library
use which was then deleted from disk. Before this patch, both segv. After it,
friendly message about offloading being unavailable.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106774
2021-07-25 23:15:58 +01:00