We can now dump the IR before and after JIT optimizations into the
files passed via `LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE` and
`LIBOMPTARGET_JIT_POST_OPT_IR_MODULE`, respectively.
Similarly, users can set `LIBOMPTARGET_JIT_REPLACEMENT_MODULE` to
replace the IR in the image with a custom IR module in a file.
All options take file paths, documentation was added.
Reviewed by: tianshilei1992
Differential revision: https://reviews.llvm.org/D140945
To JIT kernels for AMDGPUs we need to provide the architecture, the
triple, and a post-link callback. The first two are simple, the last one
is a little more complicated since we need to invoke `lld`. There is
some library interface but for that we need the lld library, which is
not generally available, thus we go with the executable for now. In
either way we need to manifest the (amdgcn) object file and read the
output from another file. We should try to avoid that in the future.
The options for `lld` are copied from the way clang invokes it.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D140720
Defaulting to Generic mode doesn't make much sense as the kernel needs
to be prepared for it. SPMD mode is the "native" execution, e.g., for
"bare" kernels. It also is the execution method for constructors and
destructors (as we might otherwise throw an extra warp onto them).
Differential Revision: https://reviews.llvm.org/D140718
This patch fixes a couple of issues:
1. Instead of using `llvm_unreachable` for those base virtual functions, unknown
value will be returned. The previous method could cause runtime error for those
targets where the image is not compatible but JIT is not implemented.
2. Fixed the type in CMake that causes the `Target` CMake variable is undefined.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D140732
This patch adds the basic JIT support for OpenMP. Currently it only works on Nvidia GPUs.
The support for AMDGPU can be extended easily by just implementing three interface functions. However, the infrastructure requires a small extra extension (add a pre process hook) to support portability for AMDGPU because the AMDGPU backend reads target features of functions. 02bc7effcc (diff-321c2038035972ad4994ff9d85b29950ba72c08a79891db5048b8f5d46915314R432) shows how it roughly works.
As for the test, even though I added the corresponding code in CMake files, the test still cannot be triggered because some code is missing in the new plugin CMake file, which has nothing to do with this patch. It will be fixed later.
In order to enable JIT mode, when compiling, `-foffload-lto` is needed, and when linking, `-foffload-lto -Wl,--embed-bitcode` is needed. That implies that, LTO is required to enable JIT mode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D139287
This patch adds the basic JIT support for OpenMP. Currently it only works on Nvidia GPUs.
The support for AMDGPU can be extended easily by just implementing three interface functions. However, the infrastructure requires a small extra extension (add a pre process hook) to support portability for AMDGPU because the AMDGPU backend reads target features of functions. 02bc7effcc (diff-321c2038035972ad4994ff9d85b29950ba72c08a79891db5048b8f5d46915314R432) shows how it roughly works.
As for the test, even though I added the corresponding code in CMake files, the test still cannot be triggered because some code is missing in the new plugin CMake file, which has nothing to do with this patch. It will be fixed later.
In order to enable JIT mode, when compiling, `-foffload-lto` is needed, and when linking, `-foffload-lto -Wl,--embed-bitcode` is needed. That implies that, LTO is required to enable JIT mode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D139287
I think it's reasonable to upgrade to Python 3 for LIT test requirement because `lit` itself (`llvm/utils/lit/lit.py`) already switched to Python 3. In addition, LLVM already requires Python 3.6 to be the minimum version (https://llvm.org/docs/GettingStarted.html#software).
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D139855
This patch moves the management/tracking of host pinned buffers to the common PluginInterface
in NextGen plugins. For the moment, the management consists of tracking the host pinned
allocations into a map in each device.
Differential Revision: https://reviews.llvm.org/D140502
Support for taskwait nowait clause with placeholder for runtime changes.
Reviewed By: cchen, ABataev
Differential Revision: https://reviews.llvm.org/D131830
With this patch we:
- pick more sensible defaults for the number of teams, inspired by the
old plugin, and configured via LIBOMPTARGET_AMDGPU_TEAMS_PER_CU.
- check the input signal of a kernel launch late, after the queue lock
was taken, to avoid a barrier packet more often.
- copy the kernel arguments in one swoop into the appropriate memory.
- manually specialize the callbacks to avoid potential indirect calls.
This commit adds the AMDGPU NextGen plugin inheriting from PluginInterface's classes.
It also implements the asynchronous behavior in the plugin operations: kernel launches
and memory transfers. To this end, it implements the concept of streams of asynchronous
operations. The streams are implemented using the HSA signals to define input and output
dependencies between asynchronous operations.
Missing features:
- Retrieve the maximum number of threads per group that a kernel can run. This requires
reading the image.
- Implement __tgt_rtl_sync_event, not used on the libomptarget side.
Differential Revision: https://reviews.llvm.org/D138389
Summary:
We use the dynamic HSA file to forward declare needed definitions from
the HSA runtime if not present at build time. These definitions were not
included so using them caused problems on systems without it if used.
Just add them.
I accidentally left out the test for the pinned API introduced by D138933. Adding it back.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D140077
This commit adds the AMDGPU NextGen plugin inheriting from PluginInterface's classes.
It also implements the asynchronous behavior in the plugin operations: kernel launches
and memory transfers. To this end, it implements the concept of streams of asynchronous
operations. The streams are implemented using the HSA signals to define input and output
dependencies between asynchronous operations.
Missing features:
- Retrieve the maximum number of threads per group that a kernel can run. This requires
reading the image.
- Implement __tgt_rtl_sync_event, not used on the libomptarget side.
Differential Revision: https://reviews.llvm.org/D138389
This patch prepares the PluginInterface for the new AMDGPU NextGen plugin. The original and the
NextGen plugin will share some structures and functionalities. We use this header for defining
them and avoiding code duplication.
Differential Revision: https://reviews.llvm.org/D139792
This patch better integrates the target nowait functions with the tasking runtime. It splits the nowait execution into two stages: a dispatch stage, which triggers all the necessary asynchronous device operations and stores a set of post-processing procedures that must be executed after said ops; and a synchronization stage, responsible for synchronizing the previous operations in a non-blocking manner and running the appropriate post-processing functions. Suppose during the synchronization stage the operations are not completed. In that case, the attached hidden helper task is re-enqueued to any hidden helper thread to be later synchronized, allowing other target nowait regions to be concurrently dispatched.
Reviewed By: jdoerfert, tianshilei1992
Differential Revision: https://reviews.llvm.org/D132005
This patch adds API support for the atk_pinned trait for omp_alloc.
It does not implement kmp_target_lock_mem and kmp_target_unlock_mem in libomptarget,
but prepares libomp for it. Patches to libomptarget to implement
lock/unlock coming after this one.
Reviewed by: jlpeyton, jdoerfert
Differential Revision: https://reviews.llvm.org/D138933
This new function was added in b72f1ec9fb,
but wasn't exported from the DLL on Windows.
This fixes the parallel/omp_parallel_if.c OpenMP testcase on
Windows.
If testing for a warning option like -Wno-<foo> with GCC, GCC won't
print any diagnostic at all, leading to the options being accepted
incorrectly. However later, if compiling a file that actually prints
another warning, GCC will also print warnings about these -Wno-<foo>
options being unrecognized.
This avoids warning spam like this, for every OpenMP source file that
produces build warnings with GCC:
cc1plus: warning: unrecognized command line option ‘-Wno-int-to-void-pointer-cast’
cc1plus: warning: unrecognized command line option ‘-Wno-return-type-c-linkage’
cc1plus: warning: unrecognized command line option ‘-Wno-covered-switch-default’
cc1plus: warning: unrecognized command line option ‘-Wno-enum-constexpr-conversion’
This matches how such warning options are detected and added in
llvm/cmake/modules/HandleLLVMOptions.cmake, e.g. like this:
check_cxx_compiler_flag("-Wclass-memaccess" CXX_SUPPORTS_CLASS_MEMACCESS_FLAG)
append_if(CXX_SUPPORTS_CLASS_MEMACCESS_FLAG "-Wno-class-memaccess" CMAKE_CXX_FLAGS)
This also matches how LLDB warning options were restructured for
GCC compatibility in e546bbfda0.
Differential Revision: https://reviews.llvm.org/D139922
This fixes the following test cases:
* affinity/kmp-affinity.c
* affinity/kmp-hw-subset.c
* affinity/omp-places.c
Differential Revision: https://reviews.llvm.org/D139802
Code for serial parallel regions and teams construct have been moved
out of __kmp_fork_call and into separate functions. This is to reduce
the size of the __kmp_fork_call function, and aid in debugging.
Differential Revision: https://reviews.llvm.org/D139116