Fixes a small issues in an offloading test where the test dependec on
the host and device being assigned certains numeric IDs. This however is
not stable and fails in situations where any of the devices is assigned
an ID different from the expected value. The fix just checks that
offloading succeeded by making sure the IDs are different.
The test was failing locally for me.
This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.
The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.
This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.
Which I think is the only difference observable in code generation.
This runtime test checks for the (non-)occurence of data movement between host
and device. It does one run without the flag and one with the flag to
also see that both versions behave as expected. In the case w/o the new
flag data movement between host and device is expected. In the case with
the flag such data movement should not be present / reported.
When we generate the loop body function, we need to be sure, that all
original loop counters are replaced by the new counter.
We need to save all items which use the original loop counter and then
perform replacement of the original loop counter. If we don't do it,
there is a risk that some values are not updated.
Summary:
This patch is an attempt to get a clean run of `check-openmp` running on
an NVPTX machine. I simply took the lists of tests that failed on my
`sm_89` machine and disabled them or fixed them. A lot of these tests
are disabled on AMDGPU already, so it makes sense that NVPTX fails. The
others are simply problems with NVPTX optimized debugging which will
need to be fixed. I opened an issue on one of them.
These tests were slightly broken, in one case a failing test that now works. In the other case
some accidentally left over code during a name change that broke compilation due to missing
symbols.
This patch fixes the erroneous multiple-target requirement in Fortran
offloading tests. Additionally, it adds two new variables
(test_flags_clang, test_flags_flang) to lit.cfg so that
compiler-specific flags for Clang and Flang can be specified.
This patch re-lands: #74543. The error was caused by having:
```
config.substitutions.append(("%flags", config.test_flags))
config.substitutions.append(("%flags_clang", config.test_flags_clang))
config.substitutions.append(("%flags_flang", config.test_flags_flang))
```
when instead it has to be:
```
config.substitutions.append(("%flags_clang", config.test_flags_clang))
config.substitutions.append(("%flags_flang", config.test_flags_flang))
config.substitutions.append(("%flags", config.test_flags))
```
because LIT replaces with the first longest sub-string match.
This patch reduces the size of heap memory required by the test
`malloc_parallel.c` and `malloc.c`. The original size is too large such
that `malloc` returns `nullptr` on many threads, causing illegal
memory access.
This patch fixes the erroneous multiple-target requirement in Fortran
offloading tests. Additionally, it adds two new variables
(`test_flags_clang`, `test_flags_flang`) to `lit.cfg` so that
compiler-specific flags for Clang and Flang can be specified.
In kernel language mode, use user's grid and blocks size directly. No
validity
check, which means if user's values are too large, the launch will fail,
similar
to what CUDA and HIP are doing right now.
Fix mapping of structs to device.
The following example fails:
```
#include <stdio.h>
#include <stdlib.h>
struct Descriptor {
int *datum;
long int x;
int xi;
long int arr[1][30];
};
int main() {
Descriptor dat = Descriptor();
dat.datum = (int *)malloc(sizeof(int)*10);
dat.xi = 3;
dat.arr[0][0] = 1;
#pragma omp target enter data map(to: dat.datum[:10]) map(to: dat)
#pragma omp target
{
dat.xi = 4;
dat.datum[dat.arr[0][0]] = dat.xi;
}
#pragma omp target exit data map(from: dat)
return 0;
}
```
This is a rework of the previous attempt:
https://github.com/llvm/llvm-project/pull/72410
Currently we are missing set up-boundary address for FinalArraySection
as highests elements in partial struct data.
Currently for:
\#pragma omp target map(D.a) map(D.b[:2])
The size is:
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%2 = getelementptr float, ptr %arrayidx, i32 1
%3 = ptrtoint ptr %2 to i64
%4 = ptrtoint ptr %a to i64
%5 = sub i64 %3, %4
%6 = sdiv exact i64 %5, ptrtoint (ptr getelementptr (i8, ptr null, i32
1) to i64)
Where %2 is wrong for (D.b[:2]) is pointer to first element of array
section. It should pointe to last element of array section.
The fix is to emit the pointer to the last element of array section and
use this pointer as the highest element in partial struct data.
After change IR:
%a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
%b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
%b1 = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
%arrayidx2 = getelementptr inbounds [2 x float], ptr %b1, i64 0, i64 1
%1 = getelementptr float, ptr %arrayidx2, i32 1
%2 = ptrtoint ptr %1 to i64
%3 = ptrtoint ptr %a to i64
%4 = sub i64 %2, %3
%5 = sdiv exact i64 %4, ptrtoint (ptr getelementptr (i8, ptr null, i32
1) to i64)
Reverts llvm/llvm-project#74360
As I wrote in the analysis of #74360:
Since
bc4e0c048a
we will not add PluginAdaptors into the container of all plugin adaptors
before the plugin is not ready. The error is thereby gone. When and old
HSA loads other libraries they can call register_image but that will
simply not register the image with the plugin we are currently
initializing. That seems like reasonable behavior, thought it is good to
keep in mind if we ever want a kernel library (@jhuber6 @mjklemm). We
can still have a standalone kernel library though or load it late after
all plugins are setup (which seems reasonable).
I did not expect one our tests actually doing exactly what this will not
allow anymore, at least when you use rocm <5.5.0. Need to figure out if
we want this behavior (for rocm <5.5.0).
Before we expected all symbols in the device image to be backed up with
data that we could read. However, uninitialized values are not. We now
check for this case and avoid reading random memory.
This also replaces the correct readGlobalFromImage call with a
isSymbolInImage check after
https://github.com/llvm/llvm-project/pull/74550 picked the wrong one.
Fixes: https://github.com/llvm/llvm-project/issues/74582
This fixes two bugs and adds a test for them:
- A shared library with declare target functions but without kernels
should not error out due to missing globals.
- Enabling LIBOMPTARGET_INFO=32 should not deadlock in the presence of
indirect declare targets.
This moves the offload entry logic into classes and provides convenient
accessors. No functional change intended but we can now print all
offload entries (and later look them up), tested via
`OMPTARGET_DUMP_OFFLOAD_ENTRIES=<device_no>`.
This patch removes the val field from the `MapInfoOp`.
Previously when lowering `TargetOp`, the bounds information for the
`BoxValues` were also being mapped. Instead these ops are now cloned
inside the target region to prevent mapping of non reference typed
values.
Currently there's an edge cases where constant indexing in target
regions can lead to incorrect results as we do not correctly replace
uses of mapped variables in generated target functions with the target
arguments (and accessor instructions) that replace them. This patch
seeks to fix that by extending the current logic in the OMPIRBuilder.
Things like GEP's can come in the form of Constants/ConstantExprs,
Constants and ConstantExpr's do not have access to the knowledge of what
they're contained in, so we must dig a little to find an instruction so
we can tell if they're used inside of the function we're outlining so we
can be sure they are replaceable and we are not accidentally replacing a
usage somewhere else in the module that's still necessary.
This patch handles these by replacing the original constant expression
with a new instruction equivalent; an instruction as it allows easy
modification in the following loop, as we can now know the constant
(instruction) is owned by our target function (as it holds this
knowledge) and replaceUsesOfWith can now be invoked on it (cannot do
this with constants it seems), a brand new one also allows us to be
cautious as it is perhaps possible the old expression was used inside of
the function but exists and is used externally (unlikely by the nature
of a Constant, but still a positive side affect).
If we have more than one reduction variable we need to be consistent
wrt. indexing. In 3de645efe3 we broke this
as the buffer type was reduced to a singleton but the index computation
was not adjusted to account for that offset. This fixes it by
interleaving the reduction variables properly in a array-of-struct
style. We can revert it back to struct-of-array in a follow up if turns
out to be a problem. I doubt it since half the accesses should benefit
from the locallity this layout offers and only the other half were
consecutive before.
Based on https://github.com/llvm/llvm-project/pull/70766 I think it
would be good to have a few more offloading reduction tests, so we do
not accidentally break minimum and maximum reductions another time.
We already have all the information to automatically map function
pointers that have been declared as `indirect` declare target by the
user. This is just enabling and testing the functionality by looking
through the one level of indirection.
target_map_common_block2.f90
- Fix the extra space in the print message.
- #67164 fixes this. So moving it outside of failing and also removing XFAIL marker.
basic-target-region-3D-array.f90
- Corrected the check to account for the new lines printed.
Depends on #67319
This reverts commit e9a48f9e05 because it breaks
3 sollve 5.0 tests:
test_loop_reduction_and_device.c
test_loop_reduction_bitand_device.c
test_loop_reduction_multiply_device.c
A lot of the code was from a time when we had multiple parallel levels.
The new runtime is much simpler, the code can be simplified a lot which
should speed up reductions too.
We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.
Fixes: https://github.com/llvm/llvm-project/issues/70249
This was a regression introduced by myself in:
6a62707c04
where I too hastily removed the basic handling of implicit captures
we have currently. This will be superseded by all implicit captures
being added to target operations map_info entries in a soon landing
series of patches, however, that is currently not the case so we must
continue to do some basic handling of these captures for the time
being. This patch re-adds that behaviour to avoid regressions.
Unfortunately this means some test changes as well as
getUsedValuesDefinedAbove grabs constants used outside
of the target region which aren't handled particularly
well currently.