Commit Graph

64 Commits

Author SHA1 Message Date
Ivan Butygin
26ca8ef836 [libc] GPU RPC interface: add return value to rpc_host_call (#111288) 2024-10-06 20:22:07 +03:00
agozillon
0215579dab [Flang][Offload][Tests] Set default OpenMP version to 5.2 (52) (#110138)
We recently added versioning support to Flang's OpenMP, which restricts
and enables certain things based on the OpenMP specification version.
Currently one of the check-offload tests makes use of a feature that's
at a slightly higher version than the current default causing it to
fail.

This PR basically applies the highest current OpenMP version number as a
default argument for the lit.cfg, if we need more fine grained control
in the future we can expand it to different lit commands for each
relevant version than can then be added in each test. But for now, to
keep it simple, just set the max level version.
2024-09-26 18:56:26 +02:00
Joseph Huber
5f02558d82 [OpenMP] Fix not linking C libraries when enabled (#109168)
Summary:
We used to do this automatically, add it back in to do it manually.
2024-09-18 10:02:16 -07:00
Akash Banerjee
a67b6e1635 Fix typo in test. 2024-09-06 14:23:06 +01:00
Akash Banerjee
142433684a [OpenMP][Flang] Fix dynamic-extent array mapping (#107247)
This patch fixes the mapping and lowering of arrays with dynamic extents
and adds a new test for the same. The fix discards the incomplete the
dynamic extent information and replacing it with just the base type.
When lowering to llvm later, the bounds information is used instead.
2024-09-05 12:44:10 +01:00
Jan Patrick Lehr
1a0cf245ac [Offload] Change x86_64-pc-linux to x86_64-unknown-linux (#107023)
It appears that the RUNTIMES build prefers the x86-64-unknown-linux-gnu
triple notation for the host. This fixes runtime / test breakages when
compiler-rt is used as the CLANG_DEFAULT_RTLIB.
2024-09-03 14:25:33 +02:00
WÁNG Xuěruì
9adf81182e [Offload] Fix stray libomptarget message helper calls (#106837)
In #92581 the `LibomptargetUitls.cmake` helpers have been removed, but
only uses of `libomptarget_say` were migrated. Migrate the remaining few
warning and error messages so the `check-offload` target would not fail
due to missing `libomptarget_warning_say`.

While at it, update the `check-offload` unavailability message to say
`check-offload` instead of `check-libomptarget`.

Fixes #92581
2024-08-31 07:06:41 -05:00
agozillon
f4cf93fb50 [Flang][OpenMP] Align map clause generation and fix issue with non-shared allocations for assumed shape/size descriptor types (#97855)
This PR aims to unify the map argument generation behavior across both
the implicit capture (captured in a target region) and the explicit
capture (process map), currently the varPtr field of the MapInfo for the
same variable will be different depending on how it's captured. This PR
tries to align that across the generations of MapInfoOp in the OpenMP
lowering.

Currently, I have opted to utilise the rawInput (input memref to a HLFIR
DeclareInfoOp) as opposed to the addr field which includes more
information. The side affect of this is that we have to deal with
BoxTypes less often, which will result in simpler maps in these cases.
The negative side affect of this is that we don't have access to the
bounds information through the resulting value, however, I believe the
bounds information we require in our case is still appropriately stored
in the map bounds, and this seems to be the case from testing so far.

The other fix is for cases where we end up with a BoxType argument into
a function (certain assumed shape and sizes cases do this) that has no
fir.ref wrapping it. As we need the Box to be a reference type to
actually utilise the operation to access the base address stored inside
and create the correct mappings we currently generate an intermediate
allocation in these cases, and then store into it, and utilise this as
the map argument, as opposed to the original.

However, as we were not sharing the same intermediate allocation across
all of the maps for a variable, this resulted in errors in certain cases
when detatching/attatching the data e.g. via enter and exit. This PR
adjusts this for cases

Currently we only maintain tracking of all intermediate allocations for
the current function scope, as opposed to module. Primarily as the only
case I am aware of that this is required is in cases where we pass
certain types of arguments to functions (so I opted to minimize the
overhead of the pass for now). It could likely be extended to module
scope if required if we find other cases where it's applicable and
causing issues.
2024-08-23 19:48:43 +02:00
Ethan Luis McDonough
fde2d23ee2 [PGO][OpenMP] Instrumentation for GPU devices (Revision of #76587) (#102691)
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.

> This pull request is the first part of an ongoing effort to extends
PGO instrumentation to GPU device code. This PR makes the following
changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-08-22 01:10:54 -05:00
Johannes Doerfert
3b7611594f [Offload] Improve error reporting on memory faults (#104254)
Since we can already track allocations, we can diagnose memory faults to
some degree. If the fault happens in a prior allocation (use after free)
or "close but outside" one, we can provide that information to the user.
Note that the fault address might be page aligned, and not all accesses
trigger a fault, especially for allocations that are backed by a
MemoryManager. Still, if people disable the MemoryManager or the
allocation is big enough, we can sometimes provide valueable feedback.
2024-08-21 10:01:35 -07:00
Joseph Huber
e96146cd46 [OpenMP] Temporarily disable test to keep bots green
Summary:
This test mysteriously fails on the bots but not locally, disable until
I can figure out why.
2024-08-20 15:16:05 -05:00
Joseph Huber
e0326b668e [OpenMP] Map omp_default_mem_alloc to global memory (#104790)
Summary:
Currently, we assign this to private memory. This causes failures on
some SOLLVE tests. The standard isn't clear on the semantics of this
allocation type, but there seems to be a consensus that it's supposed to
be shared memory.
2024-08-20 12:00:41 -05:00
Joseph Huber
161e250add [OpenMP] Fix buildbot failing on allocator test 2024-08-14 13:56:12 -05:00
Joseph Huber
74d23f15b6 [OpenMP] Implement 'omp_alloc' on the device (#102526)
Summary:
The 'omp_alloc' function should be callable from a target region. This
patch implemets it by simply calling `malloc` for every non-default
trait value allocator. All the special access modifiers are
unimplemented and return null. The null allocator returns null as the
spec states it should not be usable from the target.
2024-08-14 13:38:55 -05:00
Johannes Doerfert
ff12c0061b [Offload] Ensure to load images when the device is used (#103002)
When we use the device, e.g., with an API that interacts with it, we
need to ensure the image is loaded and the constructors are executed.
Two tests are included to verify we 1) load images and run constructors
when needed, and 2) we do so lazily only if the device is actually used.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-08-13 14:41:26 -07:00
Joseph Huber
dcc27ea41e [LinkerWrapper] Always pass -flto if the linker supports it (#102972)
Summary;
Now that we use the linker to do LTO / device linking, we need to inform
the `clang` invocation to use `-flto` so it forwards arguments like
`-On` correctly.
2024-08-13 11:23:55 -05:00
Johannes Doerfert
3f9c9acedd [Offload] Add the right paths to the CUDA lit tests (#102997) 2024-08-12 20:34:33 -07:00
Johannes Doerfert
80525dfcde [Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.

As we are about to redefine the Offload API, this wil help us in the
design process as a second offload language.

We do not support any CUDA APIs yet, however, we could:
  https://www.osti.gov/servlets/purl/1892137

For proper host execution we need to resurrect/rebase
  https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).

```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}

__global__ void square(int *A) { *A = 42; }

int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}

❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native

❯❯❯ llvm-objdump --offloading test123

test123:        file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind            elf
arch            gfx90a
triple          amdgcn-amd-amdhsa
producer        openmp

❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
2024-08-12 17:44:58 -07:00
jyu2-git
a43677c172 Test faild with amd. (#101781)
Add unspport.

This is relate #101101
2024-08-02 17:53:23 -07:00
jyu2-git
d8b61dd84b [OpenMP] Generate implicit default mapper for mapping array section. (#101101)
This is only for struct containing nested structs with user defined
mappers.

Add four functions:
1>buildImplicitMap: build map for default mapper
2>buildImplicitMapper:  build default mapper.
3>hasUserDefinedMapper for given mapper name and mapper type, lookup
user defined map, if found one return true.
4>isImplicitMapperNeeded check if Mapper is needed

During create map, in checkMappableExpressionList, call
isImplicitMapperNeeded when it return true, call buildImplicitMapper to
generate implicit mapper and added to map clause.

https://github.com/llvm/llvm-project/pull/101101
2024-08-02 17:22:40 -07:00
Johannes Doerfert
f3bfc56327 [Offload][OpenMP] Prettify error messages by "demangling" the kernel name (#101400)
The kernel names for OpenMP are manually mangled and not ideal when we
report something to the user. We demangle them now, providing the
function and line number of the target region, together with the actual
kernel name.
2024-08-01 15:24:15 -07:00
Johannes Doerfert
9a1013220b [Offload] Allow to record kernel launch stack traces (#100472)
Similar to (de)allocation traces, we can record kernel launch stack
traces and display them in case of an error. However, the AMD GPU plugin
signal handler, which is invoked on memroy faults, cannot pinpoint the
offending kernel. Insteade print `<NUM>`, set via
`OFFLOAD_TRACK_NUM_KERNEL_LAUNCH_TRACES=<NUM>`, many traces. The
recoding/record uses a ring buffer of fixed size (for now 8).
For `trap` errors, we print the actual kernel name, and trace if
recorded.
2024-07-31 11:49:50 -07:00
Johannes Doerfert
c95abe94ae [Offload] Implement double free (and other allocation error) reporting (#100261)
As a first step towards a GPU sanitizer we now can track allocations and
deallocations in order to report double frees, and other problems during
deallocation.
2024-07-30 10:10:57 -07:00
Joseph Huber
363c1e6e51 [OpenMP] Re-enable test after correctly forwarding mllvm 2024-07-29 10:06:54 -05:00
Joseph Huber
7ebd97b852 [OpenMP] Do not define '__assert_fail' if we have the GPU libc (#100409)
Summary:
The C library is intended to provide `__assert_fail`, so in the cases
that we have both we should defer to that. This means that if you build
the C library for GPUs you'll get the RPC based asser, and if not you'll
get the trap based one.
2024-07-26 15:18:10 -05:00
Johannes Doerfert
7102592af7 [Offload] Repair and rename llvm-omp-device-info (to -offload-) (#100309)
The `llvm-omp-device-info` tool is very handy, but broke due to the lazy
evaluation of devices. This repairs the functionality and adds a test.
The tool is also renamed into `llvm-offload-device-info` as `-omp-` is
going away.
2024-07-24 09:35:09 -07:00
Joseph Huber
8d8fa01a66 Reapply "[libc] Remove 'packaged' GPU build support (#100208)"
This reverts commit 550b83d658.
2024-07-24 10:24:53 -05:00
Johannes Doerfert
3c8efd7928 [OpenMP] Ensure the actual kernel is annotated with launch bounds (#99927)
In debug mode there is a wrapper (the kernel) around the function in
which we generate the kernel code. We worked around this before to get
the correct kernel name, but now we really distinguish both to attach
the launch bounds to the kernel, not the inner function.
2024-07-23 09:02:47 -07:00
Joseph Huber
4854e25359 [Offload] Re-enable tests that are now passing
Summary:
Some recent patches made these stop failing so the XFAIL now makes the
bots go red.

Fixes https://github.com/llvm/llvm-project/issues/98903
2024-07-23 10:56:55 -05:00
Pranav Bhandarkar
d7e185cca9 [OMPIRBuilder] - Handle dependencies in createTarget (#93977)
This patch handles dependencies specified by the `depend` clause on an
OpenMP target construct. It does this much the same way clang does it by
materializing an OpenMP `task` that is tagged with the dependencies.

The following functions are relevant to this patch -
1) `createTarget` - This function itself is largely unchanged except
that it now accepts a vector of `DependData` objects that it simply
forwards to `emitTargetCall`
2) `emitTargetCall` - This function has changed now to check if an outer
target-task needs to be materialized (i.e if `target` construct has
`nowait` or has `depend` clause). If yes, it calls `emitTargetTask` to
do all the heavy lifting for creating and dispatching the task.
3) `emitTargetTask` - Bulk of the change is here. See the large comment
explaining what it does at the beginning of this function
2024-07-22 10:56:45 -05:00
Jan Patrick Lehr
caaf8099ef [Offload][OMPT] Add callbacks for (dis)associate_ptr (#99046)
This adds the OMPT callbacks for the API functions disassociate_ptr and
associate_ptr.
2024-07-17 10:15:19 +02:00
Jan Patrick Lehr
4ed0f84d38 [Offload] XFAIL four tests while working on fix (#98899)
omp_dynamic_shared_memory_mixed_amdgpu.c
omp_dynamic_shared_memory_amdgpu.c
amdgcn-amd-amdhsa::bug51982.c
amdgcn-amd-amdhsa::bug51781.c
2024-07-15 15:45:59 +02:00
Jinsong Ji
6556ba66b2 [Offload][test]Fix typo of requires (#98327)
Typos in 8823448807.
2024-07-10 10:51:47 -04:00
jyu2-git
32f7672acc [Clang][OpenMP] This is addition fix for #92210. (#94802)
Fix another runtime problem when explicit map both pointer and pointee
in target data region.

In #92210, problem is only addressed in target region, but missing for
target data region.

The change just passing AreBothBasePtrAndPteeMapped in
generateInfoForComponentList when processing target data.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-07-03 20:56:53 -07:00
dhruvachak
946f5d111d [OpenMP] [OMPT] Callback registration should not depend on the device init callback. (#96371)
Even if the device init callback is not registered, a tool should be
allowed to register other callbacks.
2024-07-01 10:07:05 -07:00
Gheorghe-Teodor Bercea
1a478a69bc [OpenMP][offload] Fix dynamic schedule tracking (#97065)
This patch fixes the dynamic schedule tracking.
2024-07-01 10:23:11 -04:00
Ethan Luis McDonough
8823448807 [Offload] Refactor offload test requirements (#95196)
Many tests in the `offload` project have requirements defined by which
targets are not supported rather than which platforms are supported.
This patch aims to streamline the requirement definitions by adding four
new feature tags: `host`, `gpu`, `amdgpu`, and `nvidiagpu`.
2024-06-29 00:56:18 -05:00
Ethan Luis McDonough
2c8b912f63 Revert "[PGO][OpenMP] Instrumentation for GPU devices (#76587)"
This reverts commit 5fd2af38e4. It caused build issues and broke the buildbot.
2024-06-28 12:30:45 -05:00
Ethan Luis McDonough
5fd2af38e4 [PGO][OpenMP] Instrumentation for GPU devices (#76587)
This pull request is the first part of an ongoing effort to extends PGO
instrumentation to GPU device code. This PR makes the following changes:

- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)

These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
2024-06-28 10:42:19 -05:00
agozillon
aec735cf47 [Flang][OpenMP][MLIR] Fix common block mapping for regular and declare target link (#91829)
This PR attempts to fix common block mapping for regular mapping of
these types as well as when they have been marked as "declare target
link". This PR should allow correct mapping of both the members of a
common block and the full common block via its block symbol.

The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and
adjustments to the way the we handle target kernel map argument
rebinding inside of the OMPIRBuilder.

For the Fortran OpenMP lowering were two changes, one to prevent the
implicit capture of common block members when the common block symbol
itself has been marked and the other creates intermediate member access
inside of the target region to be used in-place of those external to the
target region, this prevents external usages breaking the
IsolatedFromAbove pact.

In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles renaming of declare target symbols in
the module to the reference pointer, now it will only apply to those
within the kernel that is currently being generated and we also perform
a modification to replace constants with instructions as necessary as we
cannot replace these with our reference pointer (non-constant and
constants do not mix nicely).

In the case of the OpenMPIRBuilder some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs bound.

Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.
2024-06-25 20:54:04 +02:00
Joseph Huber
3de162fee4 [Offload] Fix using old deprecated CUDA root variable (#96307)
Summary:
This variable isn't being set properly since we moved to the new way to
find the CUDA directory. That means this variable was just unset the
whole time. This patch adds it in by calculating it using the binary
directory so it can be passed to `--cuda-path`.
2024-06-21 10:08:05 -05:00
Tim Gymnich
597d2f7662 [OpenMP] Add Environment Variable to disable Reuse of Blocks for High Loop Trip Counts (#89239)
Sometimes it might be beneficial to spawn more thread blocks instead of
reusing existing for multiple loop iterations.

**Alternatives considered:**

Make `DefaultNumBlocks` settable via an environment variable.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-06-14 07:35:23 -07:00
agozillon
0aeaa2d93d [OMPIRBuilder][OpenMP][LLVM] Modify and use ReplaceConstant utility in convertTarget (#94541)
This PR seeks to expand/replace the Constant -> Instruction conversion
that needs to occur inside of the OpenMP Target kernel generation to
allow kernel argument replacement of uses within the kernel (cannot
replace constant uses within constant expressions with non-constants).
It does so by making use of the new-ish utility
convertUsersOfConstantsToInstructions which is a much more expansive
version of what the smaller "version" of the function I wrote does,
effectively expanding uses of the input argument that are constant
expressions into instructions so that we can replace with the
appropriate kernel argument.

Also alters convertUsersOfConstantsToInstructions to optionally
restrict the replacement to a function and optionally leave
dead constants alone, the latter is necessary when lowering from 
MLIR as we cannot be sure we can remove the constants at this 
stage, even if rewritten to instructions the ModuleTranslation may 
maintain links to the original constants and utilise them in further 
lowering steps (as when we're lowering the kernel, the module is 
still in the process of being lowered). This can result in unusual 
ICEs later. These dead constants can be tidied up later (and 
appear to be in subsequent lowering from checking with 
emit-llvm).
2024-06-13 15:57:15 +02:00
Joseph Huber
2cc1644299 [Offload] Fix missing abs function for test
Summary:
We don't have the abs function to link against, just use the builtin.
2024-06-06 14:37:06 -05:00
Shilei Tian
4cd115ca34 [OpenMP][OMPX] Fix ompx_ballot_sync test (#94140)
The current test is not really correct because the mask is set to
0xffffffff
even if it is on an AMDGPU whose wavefront size is 64. Besides,
`__AMDGCN_WAVEFRONT_SIZE` is not set on host compilation so the
verification
happens to work.
2024-06-03 11:18:26 -04:00
Shilei Tian
b448efb8ea Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)" (#94139) 2024-06-03 11:17:36 -04:00
Krzysztof Parzyszek
adc4e45f2e [Offload] Update test to use target parallel for reduction
Re-enable test disabled in 1bf1f93d with a fix.
2024-05-30 09:17:17 -05:00
Krzysztof Parzyszek
1bf1f93d94 [Offload] Temporarily disable failing test after eb88e7c1
The `target reduction` combination is no longer accepted.
Disable the test to avoid build failures, until a better fix is ready.
2024-05-30 08:52:29 -05:00
Shilei Tian
cf9eeb67e5 Revert "Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)""
This reverts commit 7b48655822.
2024-05-26 01:04:39 -04:00
Shilei Tian
7b48655822 Reapply "[OpenMP][OMPX] Add shfl_down_sync (#93311)"
This reverts commit 9b31cc71d6.
2024-05-26 00:57:50 -04:00