The "old" OpenMP GPU device runtime (D14254) has served us well for many years, but modernizing it has caused some pain recently. This patch introduces an alternative that is mostly written from scratch, embracing OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual interfaces. The new runtime is opt-in through a clang flag (D106793). It is currently only built for nvptx and has "-new" in its name.

The design is tailored towards middle-end optimizations rather than front-end code generation choices, a trend we already started in the old runtime a while back. In contrast to the old runtime, state is organized in a simple manner rather than a "smart" one. While this can incur costs, it helps optimizations. Our expectation is that the majority of codes can be optimized, so a "simple" design is preferable.

The new runtime also avoids making users pay for features they do not use, especially with respect to memory. The unlikely case of nested parallelism is supported but costly, so that the more likely case uses fewer resources. The worksharing and reduction implementations have been taken from the old runtime and will be rewritten in the future if necessary. Documentation and debug features are still mostly missing and will be added over time.

All external symbols start with `__kmpc` for legacy reasons but should be renamed once we switch over to a single runtime. All internal symbols are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name clashes with user symbols.

Differential Revision: https://reviews.llvm.org/D106803
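To make the symbol-naming rule concrete, here is a minimal host-compilable sketch. It is not code from the runtime: the entry point `__kmpc_example_thread_id_in_block`, the helper `readThreadIdStub`, and its fixed return value 7 are hypothetical, and the function bodies are stubs rather than the device implementation. It only shows the layering: an `extern "C"` function with the legacy `__kmpc` prefix forwards to an implementation inside `_OMP`, and a file-local helper lives in an anonymous namespace.

```cpp
#include <cassert>
#include <cstdint>

// Internal symbols live in _OMP; file-local helpers additionally sit in an
// anonymous namespace. Neither can clash with a user symbol of the same name.
namespace _OMP {
namespace {
// Hypothetical stand-in for a hardware query; always reports thread 7.
uint32_t readThreadIdStub() { return 7; }
} // namespace

namespace mapping {
// Mirrors the MaxThreadsPerTeam constant declared in Mapping.h.
constexpr uint32_t MaxThreadsPerTeam = 1024;

// Host stub, not the runtime's implementation of this query.
uint32_t getThreadIdInBlock() {
  uint32_t Id = readThreadIdStub();
  assert(Id < MaxThreadsPerTeam && "thread id exceeds the team size limit");
  return Id;
}
} // namespace mapping
} // namespace _OMP

// External symbol: C linkage plus the legacy __kmpc prefix. It only forwards
// to the namespaced implementation. The name is hypothetical.
extern "C" uint32_t __kmpc_example_thread_id_in_block() {
  return _OMP::mapping::getThreadIdInBlock();
}

// A user function with the same unqualified name coexists with the runtime's
// internal one, because the latter is qualified by _OMP::mapping.
uint32_t getThreadIdInBlock() { return 0; }
```

Only the `__kmpc` entry point is part of the external surface here, so renaming it later (as the commit message anticipates) would not require touching the namespaced internals.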
//===--------- Mapping.h - OpenMP device runtime mapping helpers -- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
//
//===----------------------------------------------------------------------===//

#ifndef OMPTARGET_MAPPING_H
#define OMPTARGET_MAPPING_H

#include "Types.h"

namespace _OMP {

namespace mapping {

#pragma omp declare target

inline constexpr uint32_t MaxThreadsPerTeam = 1024;

#pragma omp end declare target

/// Initialize the mapping machinery.
void init(bool IsSPMD);

/// Return true if the kernel is executed in SPMD mode.
bool isSPMDMode();

/// Return true if the kernel is executed in generic mode.
bool isGenericMode();

/// Return true if the executing thread is the main thread in generic mode.
bool isMainThreadInGenericMode();

/// Return true if the executing thread has the lowest Id of the active threads
/// in the warp.
bool isLeaderInWarp();

/// Return a mask describing all active threads in the warp.
LaneMaskTy activemask();

/// Return a mask describing all threads with a smaller Id in the warp.
LaneMaskTy lanemaskLT();

/// Return a mask describing all threads with a larger Id in the warp.
LaneMaskTy lanemaskGT();

/// Return the thread Id in the warp, in [0, getWarpSize()).
uint32_t getThreadIdInWarp();

/// Return the thread Id in the block, in [0, getBlockSize()).
uint32_t getThreadIdInBlock();

/// Return the warp Id in the block.
uint32_t getWarpId();

/// Return the warp size, i.e., the number of threads in a warp.
uint32_t getWarpSize();

/// Return the number of warps in the block.
uint32_t getNumberOfWarpsInBlock();

/// Return the block Id in the kernel, in [0, getNumberOfBlocks()).
uint32_t getBlockId();

/// Return the block size, i.e., the number of threads in the block.
uint32_t getBlockSize();

/// Return the number of blocks in the kernel.
uint32_t getNumberOfBlocks();

/// Return the kernel size, i.e., the number of threads in the kernel.
uint32_t getKernelSize();

/// Return the number of processing elements on the device.
uint32_t getNumberOfProcessorElements();

} // namespace mapping

} // namespace _OMP

#endif
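The lane-mask and index queries declared in this header are easiest to read with concrete bit patterns. The following host-only sketch models a few of them on plain integers; it is not the device implementation, which reads hardware state. It assumes a 32-lane warp, a 32-bit mask type (the runtime's actual `LaneMaskTy` is defined in `Types.h` and may be wider), a one-dimensional block, and a warp count rounded up for partial warps. The free functions are hypothetical stand-ins named after `lanemaskLT()`, `lanemaskGT()`, `isLeaderInWarp()`, `getThreadIdInWarp()`, `getWarpId()`, `getNumberOfWarpsInBlock()`, and `getKernelSize()`, taking as arguments what the real queries obtain implicitly.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a 32-lane warp and a 32-bit lane mask.
using LaneMask = uint32_t;
constexpr uint32_t WarpSize = 32;

// Bits of all lanes with an Id strictly smaller than Lane (cf. lanemaskLT()).
LaneMask lanemaskLT(uint32_t Lane) {
  assert(Lane < WarpSize && "lane out of range");
  return (LaneMask(1) << Lane) - 1;
}

// Bits of all lanes with an Id strictly larger than Lane (cf. lanemaskGT()).
LaneMask lanemaskGT(uint32_t Lane) {
  assert(Lane < WarpSize && "lane out of range");
  // Shifting a 32-bit value by 32 is undefined, so the last lane is special.
  return Lane == WarpSize - 1 ? 0 : ~LaneMask(0) << (Lane + 1);
}

// A lane leads its warp if no active lane has a smaller Id
// (cf. isLeaderInWarp(); ActiveMask plays the role of activemask()).
bool isLeaderInWarp(LaneMask ActiveMask, uint32_t Lane) {
  return (ActiveMask & lanemaskLT(Lane)) == 0;
}

// Index arithmetic for a 1-D block.
uint32_t threadIdInWarp(uint32_t ThreadIdInBlock) {
  return ThreadIdInBlock % WarpSize;
}
uint32_t warpId(uint32_t ThreadIdInBlock) { return ThreadIdInBlock / WarpSize; }

// Round up so that a partially filled last warp is still counted.
uint32_t numberOfWarpsInBlock(uint32_t BlockSize) {
  return (BlockSize + WarpSize - 1) / WarpSize;
}

// Total number of threads in the kernel.
uint32_t kernelSize(uint32_t NumberOfBlocks, uint32_t BlockSize) {
  return NumberOfBlocks * BlockSize;
}
```

With lanes 2, 5, and 9 active, lane 2 sees no active lane below it and is the leader, while lane 5 is not. Thread 70 of a block sits in warp 2 at lane 6, and a 100-thread block occupies 4 warps. Expressing leadership as an AND of two masks needs no shared state, which fits the "simple state" design described in the commit message.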