1461 Commits

Author SHA1 Message Date
Carlos Eduardo Seo
dcd7c8b7c9 [OpenMP][AArch64] Workaround for ompt/synchronization tests (#75848)
ompt/synchronization/[masked.c | master.c] tests fail due to a wrong
offset being calculated for the possible return addresses. PR #65936
fixes this for Darwin and the same has to be done for Linux.

Updates #69627
2023-12-19 19:26:23 +01:00
Shilei Tian
a4d1d5f5b5 [OpenMP] Use simple VLA implementation to replace uses of actual VLA
Use of a VLA can cause a compile warning that was introduced in D156565. This patch
implements a simple stack/heap-based VLA that can mimic the behavior of an
actual VLA and prevent the warning. By default the stack accommodates the
elements. If the number of elements is greater than N, which by default is 8,
a heap buffer will be allocated and used to accommodate the elements.
2023-12-15 15:12:33 -05:00
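The stack/heap fallback described in a4d1d5f5b5 can be pictured with a minimal sketch. The class name `SmallBuffer` and its members are hypothetical and are not taken from the patch; the sketch only assumes what the message states: an inline buffer of N elements (default 8) and a heap buffer when more are requested.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical illustration, not the class added by the patch.
// Up to N elements live in a buffer inside the object (on the stack when
// the object is a local variable); larger requests use a heap allocation.
// T must be default-constructible.
template <typename T, std::size_t N = 8>
class SmallBuffer {
  T inline_storage_[N];       // used when size <= N
  std::unique_ptr<T[]> heap_; // used when size > N, otherwise empty
  T *data_;
  std::size_t size_;

public:
  explicit SmallBuffer(std::size_t size)
      : heap_(size > N ? new T[size] : nullptr),
        data_(heap_ ? heap_.get() : inline_storage_), size_(size) {}

  // data_ may point into the object itself, so copying is disabled.
  SmallBuffer(const SmallBuffer &) = delete;
  SmallBuffer &operator=(const SmallBuffer &) = delete;

  T &operator[](std::size_t i) {
    assert(i < size_);
    return data_[i];
  }
  std::size_t size() const { return size_; }
  bool on_heap() const { return heap_ != nullptr; }
};
```

With the default N, `SmallBuffer<int> deps(4)` stays inside the object, while `SmallBuffer<int> deps(100)` allocates on the heap and frees the buffer in its destructor.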
Andrew Brown
68ea91dd8b [openmp][wasm] Allow compiling OpenMP to WebAssembly (#71297)
This change allows building the static OpenMP runtime, `libomp.a`, as
WebAssembly. It builds on the work done in [D142593] but goes further in
several ways:
- it makes the OpenMP CMake files more WebAssembly-aware
- it conditions much more code (or code that had been refactored since
[D142593]) for `KMP_ARCH_WASM` and `KMP_OS_WASI`
- it fixes a Clang crash due to unimplemented common symbols in
WebAssembly

The commit messages have more details. Please understand this PR as a
start, not the completed work, for WebAssembly support in OpenMP.
Getting the tests running would be a good next step, for example, but
what is contained here works, at least with recent versions of
[wasi-sdk] and engines that support [wasi-threads]. I suspect the same
is true for Emscripten and browsers, but I have not tested that
workflow.

[D142593]: https://reviews.llvm.org/D142593
[wasi-sdk]: https://github.com/WebAssembly/wasi-sdk
[wasi-threads]: https://github.com/WebAssembly/wasi-threads

---------

Co-authored-by: Atanas Atanasov <atanas.atanasov@intel.com>
2023-12-14 13:48:01 -06:00
Brad Smith
8b5af3139c [OpenMP] Change check for OS to check for defined for a macro (#75012)
Check for the existence of the macro instead of checking for Solaris.
illumos has this macro in sys/time.h.

/export/home/brad/llvm-brad/openmp/runtime/src/z_Linux_util.cpp:77:9: warning: 'TIMEVAL_TO_TIMESPEC' macro redefined [-Wmacro-redefined]
   77 | #define TIMEVAL_TO_TIMESPEC(tv, ts)                                            \
      |         ^
/usr/include/sys/time.h:424:9: note: previous definition is here
  424 | #define TIMEVAL_TO_TIMESPEC(tv, ts) { \
      |         ^
2023-12-11 09:54:24 -05:00
Sandeep Kosuri
ecc080c07d [OpenMP] return empty stmt for nothing (#74042)
- The `nothing` directive was affecting the `if` block structure, which it
should not. Return an empty statement instead of an error statement
while parsing to avoid this.
2023-12-03 13:33:38 +05:30
Brad Smith
027935d3cd [OpenMP] Re-enable KMP_HAVE_QUAD on NetBSD 10.0 with GCC 10.5 (#73478) 2023-12-01 16:07:16 -05:00
Shilei Tian
5f864ba195 Revert "[OpenMP] Use simple VLA implementation to replace uses of actual VLA"
This reverts commit 97e16da450 because it
causes build error on i386 system.
2023-11-30 16:15:54 -05:00
Joseph Huber
8b9a6af450 [OpenMP] Add an 'stddef.h' include to 'omp.h' (#73876)
Summary:
We use `size_t` internally in the omp.h header, which is normally
provided by `stdlib.h` which is already included. However, some cases
when using `-ffreestanding` can result in this not being defined via
`stdlib.h`. This patch simply adds an explicit inclusion of this header,
which is provided by the `clang` resource directory, to resolve this in
all cases.
2023-11-29 18:53:30 -06:00
Shilei Tian
97e16da450 [OpenMP] Use simple VLA implementation to replace uses of actual VLA
Use of a VLA can cause a compile warning that was introduced in D156565. This patch
implements a simple stack/heap-based VLA that can mimic the behavior of an
actual VLA and prevent the warning. By default the stack accommodates the
elements. If the number of elements is greater than N, which by default is 8,
a heap buffer will be allocated and used to accommodate the elements.
2023-11-28 19:04:30 -05:00
Shilei Tian
351c3ee5f6 Revert "[OpenMP] Use simple VLA implementation to replace uses of actual VLA"
This reverts commit d46f63553a.
2023-11-28 18:58:47 -05:00
Shilei Tian
d46f63553a [OpenMP] Use simple VLA implementation to replace uses of actual VLA
Use of a VLA can cause a compile warning that was introduced in D156565. This patch
implements a simple stack/heap-based VLA that can mimic the behavior of an
actual VLA and prevent the warning. By default the stack accommodates the
elements. If the number of elements is greater than N, which by default is 8,
a heap buffer will be allocated and used to accommodate the elements.
2023-11-28 18:54:48 -05:00
Shilei Tian
e7f5d609dd Revert "[OpenMP] Use simple VLA implementation to replace uses of actual VLA (#71412)"
This reverts commit eaab947a8a because it
causes link error.
2023-11-28 18:34:24 -05:00
Shilei Tian
eaab947a8a [OpenMP] Use simple VLA implementation to replace uses of actual VLA (#71412)
Use of a VLA can cause a compile warning that was introduced in D156565. This patch
implements a simple stack/heap-based VLA that can mimic the behavior of an
actual VLA and prevent the warning. By default the stack accommodates the
elements. If the number of elements is greater than N, which by default is 8,
a heap buffer will be allocated and used to accommodate the elements.
2023-11-28 18:30:06 -05:00
Alex
d6f00654fb [OpenMP][Runtime][test] Fix ompt task testcase fail randomly (#72337)
Fixed #72231
2023-11-28 14:22:57 +01:00
Brad Smith
20406af31b [runtime] Have the runtime use the compiler builtin for alloca on NetBSD (#73480)
Most of the tests were failing with the following in their logs..

| /usr/bin/ld: /home/brad/llvm-build/runtimes/runtimes-bins/openmp/runtime/src/libomp.so:
warning: Warning: reference to the libc supplied alloca(3); this most likely will not
work. Please use the compiler provided version of alloca(3), by supplying the appropriate
compiler flags (e.g. -std=gnu99).

By making use of __builtin_alloca..

before:

Total Discovered Tests: 353
  Unsupported:  59 (16.71%)
  Passed     :  51 (14.45%)
  Failed     : 243 (68.84%)

after:

Total Discovered Tests: 353
  Unsupported:  59 (16.71%)
  Passed     : 290 (82.15%)
  Failed     :   4 (1.13%)
2023-11-27 13:22:54 -05:00
Lixi Zhou
a3c0f705db [NFC] fix failed ompt tests on M1 device (#65696)
Fix the 2 failed ompt tests on M1 device found on #63194.

```
libomp :: ompt/synchronization/masked.c
libomp :: ompt/synchronization/master.c
```

For the details of this fix, please check the original discussion in
https://github.com/llvm/llvm-project/issues/63194#issuecomment-1710494689

Thanks @jprotze for the fix.
2023-11-24 23:40:14 +01:00
Joachim Jenke
f5e50b21da [OpenMP] Optimized trivial multiple edges from task dependency graph
From "3.1 Reducing the number of edges" of this [[ https://hal.science/hal-04136674v1/ | paper ]] - Optimization (b)

Task (dependency) nodes have a `successors` list built upon passed dependency.
Given the following code, B will be added to A's successors list building the graph `A` -> `B`
```
// A
#pragma omp task depend(out: x)
{}

// B
#pragma omp task depend(in: x)
{}
```

In the following code, B is currently added twice to A's successor list
```
// A
#pragma omp task depend(out: x, y)
{}

// B
#pragma omp task depend(in: x, y)
{}
```

This patch removes such duplicates by checking the most recently inserted task in `A`'s successor list.

Authored by: Romain Pereira (rpereira-dev)
Differential Revision: https://reviews.llvm.org/D158544
2023-11-21 18:36:12 +01:00
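The duplicate-edge check from f5e50b21da can be sketched as follows. `TaskNode` and `add_successor` are hypothetical stand-ins, not the runtime's data structures. The sketch assumes that all dependences of one task are processed back to back, so a repeated edge `A` -> `B` would always be adjacent to the previous one in `A`'s list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a task dependency node.
struct TaskNode {
  std::vector<TaskNode *> successors;
};

// Adds `succ` as a successor of `pred` unless it was the most recently
// inserted successor. Returns true if a new edge was created.
inline bool add_successor(TaskNode &pred, TaskNode &succ) {
  assert(&pred != &succ); // a task does not depend on itself
  if (!pred.successors.empty() && pred.successors.back() == &succ)
    return false; // same edge as the previous insertion: skip it
  pred.successors.push_back(&succ);
  return true;
}
```

Checking only the last entry keeps the test constant-time. It does not remove duplicates that are separated by another task's edge.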
Brad Smith
3425e11a11 [OpenMP] Add missing pieces in __kmp_launch_worker for Solaris support (#72613) 2023-11-17 13:04:13 -05:00
Brad Smith
5feebdcef2 [OpenMP] Link against libm on OpenBSD (#70614)
Needed for some math functions in libomp.
2023-11-11 20:37:50 -05:00
Ilya Leoshkevich
72552fc5cb [OpenMP][SystemZ] Compile __kmpc_omp_task_begin_if0() with backchain (#71834)
OpenMP runtime fails to build on SystemZ with clang with the following
error message:

    LLVM ERROR: Unsupported stack frame traversal count

__kmpc_omp_task_begin_if0() uses OMPT_GET_FRAME_ADDRESS(1), which
delegates to __builtin_frame_address(), which in turn works with nonzero
values on SystemZ only if backchain is in use. If backchain is not in
use, the above error is emitted.

Compile __kmpc_omp_task_begin_if0() with backchain. Note that this only
resolves the build error. If at runtime its caller is compiled without
backchain, __builtin_frame_address() will produce an incorrect value,
but will not cause a crash. Since the value is relevant only for OMPT,
this is acceptable.
2023-11-09 23:54:16 +01:00
xingxue-ibm
90a9e9f638 [OpenMP] Fix a condition for KMP_OS_SOLARIS. (#71831)
Line 75 of `z_Linux_util.cpp` checks `#ifdef KMP_OS_SOLARIS`, which is
always true regardless of the build platform because the macro
`KMP_OS_SOLARIS` is always defined in line 23 of `kmp_platform.h`:
`#define KMP_OS_SOLARIS 0`.
2023-11-09 13:30:36 -05:00
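The difference between testing a macro's existence and testing its value, which is the bug described in 90a9e9f638, can be shown in isolation. `DEMO_OS_SOLARIS` is a hypothetical name standing in for `KMP_OS_SOLARIS`.

```cpp
// Hypothetical macro: always defined, with value 0 where it does not apply.
#define DEMO_OS_SOLARIS 0

#ifdef DEMO_OS_SOLARIS
constexpr bool taken_with_ifdef = true; // reached: the macro exists
#else
constexpr bool taken_with_ifdef = false;
#endif

#if DEMO_OS_SOLARIS
constexpr bool taken_with_if = true;
#else
constexpr bool taken_with_if = false; // reached: the macro's value is 0
#endif

static_assert(taken_with_ifdef && !taken_with_if,
              "#ifdef tests existence, #if tests the value");
```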
Jonathan Peyton
5cc603cb22 [OpenMP] Add skewed iteration distribution on hybrid systems (#69946)
This commit adds skewed distribution of iterations in
nonmonotonic:dynamic schedule (static steal) for hybrid systems when
thread affinity is assigned. Currently, it distributes the iterations at
a 60:40 ratio. Consider this loop with dynamic schedule type:
`for (int i = 0; i < 100; ++i)`. In a hybrid system with 20 hardware
threads (16 CORE and 4 ATOM cores), 88 iterations will be assigned to
performance cores and 12 iterations will be assigned to efficiency cores.
Each thread on a CORE core will process 5 iterations + extras and each
thread on an ATOM core will process 3 iterations.

Differential Revision: https://reviews.llvm.org/D152955
2023-11-08 10:19:37 -06:00
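The figures in 5cc603cb22 can be reproduced with a small calculation. This is only one rounding scheme that happens to match the numbers in the message; the runtime's actual computation may differ. `Split` and `skewed_split` are hypothetical names.

```cpp
#include <cassert>

struct Split {
  int per_p_thread; // base iterations per performance-core thread
  int per_e_thread; // iterations per efficiency-core thread
  int extras;       // leftover iterations, assumed to go to performance cores
  int p_total;      // total iterations on performance cores
  int e_total;      // total iterations on efficiency cores
};

// Assumption: each performance-core thread is weighted 60 and each
// efficiency-core thread 40, with integer division rounding down.
inline Split skewed_split(int iterations, int p_threads, int e_threads) {
  assert(iterations >= 0 && p_threads > 0 && e_threads >= 0);
  const int total_weight = 60 * p_threads + 40 * e_threads;
  Split s{};
  s.per_p_thread = iterations * 60 / total_weight;
  s.per_e_thread = iterations * 40 / total_weight;
  s.extras =
      iterations - s.per_p_thread * p_threads - s.per_e_thread * e_threads;
  s.e_total = s.per_e_thread * e_threads;
  s.p_total = s.per_p_thread * p_threads + s.extras;
  return s;
}
```

For 100 iterations, 16 CORE threads and 4 ATOM threads, the total weight is 1120, which gives 5 and 3 iterations per thread, 8 extras, and totals of 88 and 12.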
Neale Ferguson
1111ef0257 Add openmp support to System z (#66081)
* openmp/README.rst
  - Add s390x to those platforms supported

* openmp/libomptarget/plugins-nextgen/CMakeLists.txt
  - Add s390x subdirectory

* openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
  - Add s390x definitions

* openmp/runtime/CMakeLists.txt
  - Add s390x to those platforms supported

* openmp/runtime/cmake/LibompGetArchitecture.cmake
  - Define s390x ARCHITECTURE

* openmp/runtime/cmake/LibompMicroTests.cmake
  - Add dependencies for System z (aka s390x)

* openmp/runtime/cmake/LibompUtils.cmake
  - Add S390X to the mix

* openmp/runtime/cmake/config-ix.cmake
  - Add s390x as a supported LIBOMP_ARCH

* openmp/runtime/src/kmp_affinity.h
  - Define __NR_sched_[get|set]affinity for s390x

* openmp/runtime/src/kmp_config.h.cmake
  - Define CACHE_LINE for s390x

* openmp/runtime/src/kmp_os.h
  - Add KMP_ARCH_S390X to support checks

* openmp/runtime/src/kmp_platform.h
  - Define KMP_ARCH_S390X

* openmp/runtime/src/kmp_runtime.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/src/kmp_tasking.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
  - Define ITT_ARCH_S390X

* openmp/runtime/src/z_Linux_asm.S
  - Instantiate __kmp_invoke_microtask for s390x

* openmp/runtime/src/z_Linux_util.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/test/ompt/callback.h
  - Define print_possible_return_addresses for s390x

* openmp/runtime/tools/lib/Platform.pm
  - Return s390x as platform and host architecture

* openmp/runtime/tools/lib/Uname.pm
  - Set hardware platform value for s390x
2023-11-03 12:42:55 +01:00
Brad Smith
b5b251aac8 [OpenMP] Add support for Solaris/x86_64 (#70593)
Tested on `amd64-pc-solaris2.11`.
2023-11-02 23:29:02 -04:00
Brad Smith
0a29879e41 [OpenMP] Add missing bit with the Hurd support (#70609)
Looking at 855d09855d it looks like a bit was
missing. The padding variable is used further down by the KMP_ALLOCA()
function.
2023-10-29 22:35:03 -04:00
Brad Smith
0d1da7c37f [OpenMP] Make use of getloadavg() on *BSD OS's (#70586)
OpenBSD does not have /proc filesystem, neither does FreeBSD (by default).
2023-10-29 18:30:11 -04:00
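As a rough illustration of 0d1da7c37f, the load average can be read without `/proc` through `getloadavg()`, which the BSDs and glibc declare in `<stdlib.h>`. The wrapper name `one_minute_load` is hypothetical, and this is not the code the runtime uses.

```cpp
#include <cassert>
#include <stdlib.h> // getloadavg() is a BSD/glibc extension declared here

// Returns the 1-minute load average, or -1.0 if it cannot be obtained.
inline double one_minute_load() {
  double avg[1];
  if (getloadavg(avg, 1) != 1)
    return -1.0; // the system could not provide a sample
  assert(avg[0] >= 0.0);
  return avg[0];
}
```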
Brad Smith
223852aecf [OpenMP] Fix building for 32-bit DragonFly, NetBSD, OpenBSD (#70527)
Fixing `#error "Unknown or unsupported OS"`
2023-10-27 22:53:24 -04:00
Joseph Huber
84d8ace51a [OpenMP][Obvious] Fix function prototype when used in C mode
Summary:
The `llvm_omp_target_dynamic_shared_alloc` prototype in `omp.h`
accidentally left the void argument unspecified. This created unintended
code when called from the C language, causing some `nvlink` failures in
certain scenarios.
2023-10-25 09:35:23 -05:00
Ilya Leoshkevich
77c2b623ca [OpenMP][Tests] Sync struct DEP with the runtime (#69982)
struct DEP defined in multiple testcases must correspond to runtime's
struct kmp_depend_info. The former defines flags as int, and the latter
as kmp_uint8_t. This discrepancy goes unnoticed on little-endian
systems, but breaks big-endian ones.

Make flags in struct DEP unsigned char.
2023-10-24 19:40:08 +02:00
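The mismatch fixed in 77c2b623ca can be reproduced with two reduced layouts. The struct names and the `addr`/`len` fields are hypothetical. Only the `flags` width difference, `int` versus a single byte, is taken from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, reduced layouts used only for this illustration.
struct DepWithIntFlags { // test-side declaration: flags is an int
  std::intptr_t addr;
  std::size_t len;
  int flags;
};
struct DepWithByteFlags { // runtime-side declaration: flags is one byte
  std::intptr_t addr;
  std::size_t len;
  std::uint8_t flags;
};

// Writes `flags` through the test-side layout and reads it back, byte for
// byte, through the runtime-side layout.
inline std::uint8_t flags_seen_by_runtime(int flags) {
  assert(flags >= 0);
  static_assert(sizeof(DepWithByteFlags) <= sizeof(DepWithIntFlags),
                "layout assumption of this sketch");
  DepWithIntFlags written{0, 0, flags};
  DepWithByteFlags read{};
  std::memcpy(&read, &written, sizeof(read));
  return read.flags; // the first byte of the int in memory
}
```

On a little-endian machine `flags_seen_by_runtime(1)` yields 1, because the low-order byte comes first. On a big-endian machine the first byte is the high-order one, so the runtime would see 0.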
Ilya Leoshkevich
34459b72da [OpenMP] Provide big-endian bitfield definitions (#69995)
structs kmp_depend_info.flags and kmp_tasking_flags contain bitfields,
which overlay integer flag values. The current bitfield definitions
target little-endian machines. On big-endian machines bitfields are laid
out in the opposite order, so the current definitions do not work there.

There are two ways to fix this: either provide big-endian bitfield
definitions, or bit-swap integer flag values. Go with the former, since
it's localized to one place and therefore is more maintainable.
2023-10-24 19:39:50 +02:00
Michael Klemm
f93a697e47 [libomptarget][OpenMP] Initial implementation of omp_target_memset() and omp_target_memset_async() (#68706)
Implement a slow-path version of omp_target_memset*() 

There is a TODO to implement a fast path that uses an on-device
kernel instead of the host-based memory fill operation.  This may
require some additional plumbing to have kernels in libomptarget.so
2023-10-19 15:29:36 +02:00
Shilei Tian
103bb69c04 [OpenMP] Fix a potential memory buffer overflow (#67252)
#67167 reports a potential memory overflow caused by the wrong size
passed to the function `memcpy_s`. This patch fixes it.

Fix #67167.
2023-09-29 12:41:32 -04:00
Kazushi Marukawa
7b8130c2c3 [OpenMP][VE] Limit the number of threads to create (#66729)
VE supports up to 64 threads per VE process. So, we limit the number
of threads defined by KMP_MAX_NTH. We also modify the __kmp_sys_max_nth
initialization to use KMP_MAX_NTH as a limit.
2023-09-20 17:44:24 +09:00
Terry Wilmarth
102d864719 Fix /tmp approach, and add environment variable method as third fallback during library registration
The /tmp fallback for /dev/shm did not write to a fixed filename, so multiple instances of the runtime would not be able to detect each other. Now, we create the /tmp file in much the same way as the /dev/shm file was created, since the mkstemp approach would not work to create a file that other instances of the runtime would detect. Also, add the environment variable method as a third fallback to /dev/shm and /tmp for library registration, as some systems do not have either. Also, add the ability to fall back to a subsequent method should a failure occur during any part of the registration process. When unregistering, it is assumed that the method chosen during registration should work, so errors at that point are ignored. This also avoids a problem with multiple threads trying to unregister the library.
2023-09-13 13:50:49 -05:00
Rodrigo Ceccato de Freitas
f94b6f3396 [OpenMP] Remove optimization skipping reduction struct initialization (#65697)
This commit removes an optimization that skips the initialization of the
reduction struct if the number of threads in a team is 1. This optimization
caused a bug with Hidden Helper Threads. When the task group is initially
initialized by the master thread but a Hidden Helper Thread executes a
target nowait region, it requires the reduction struct initialization to
properly accumulate the data.

This commit also adds a LIT test for issue #57522 to ensure that the issue
is properly addressed and that the optimization removal does not introduce
any regressions.

Fixes: #57522
2023-09-12 16:09:16 -05:00
Kazushi Marukawa
e8679b93da [OpenMP][test][VE] Limit the number of AFFINITY_MAX_CPUS for VE (#65872)
Limit the number of AFFINITY_MAX_CPUS for VE because VE's
sched_getaffinity doesn't work correctly with a large mask buffer.
2023-09-12 23:45:56 +09:00
Kazushi Marukawa
f8efa65ca5 [OpenMP][test][VE] Change to use VE_LD_LIBRARY_PATH for VE (#65869)
Change to use VE_LD_LIBRARY_PATH for VE instead of LD_LIBRARY_PATH. The
VE is connected to the host, and test programs compiled for VE are
invoked on the host and transferred to the VE. If programs are compiled
for the host, we use LD_LIBRARY_PATH. Otherwise, we use
VE_LD_LIBRARY_PATH.
2023-09-10 12:07:16 +09:00
Kazushi (Jam) Marukawa
18b6724355 [OpenMP][VE] Support OpenMP runtime on VE
Support OpenMP runtime library on VE.  This patch makes OpenMP compilable
for VE architecture.  Almost all tests run correctly on VE.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D159401
2023-09-10 08:29:53 +09:00
Brad Smith
7e31b45d6a [OpenMP] Use the more appropriate function to retrieve the thread id on OpenBSD (#65553)
Use the getthrid() function instead of a syscall.
2023-09-07 21:05:25 -04:00
Shilei Tian
010a5a737b [OpenMP] Fix build issue with libomp when OMPT is disabled 2023-09-06 23:40:24 -04:00
Brad Smith
fd4c80dec9 [OpenMP] Fix gettid warnings on DragonFly (#65549)
Define __kmp_gettid() as appropriate for DragonFly.
2023-09-06 20:21:11 -04:00
Shilei Tian
99d67fb9aa [OpenMP] Align up the size when calling aligned_alloc (#65525)
Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the `size`
is supposed to be a multiple of `alignment`, and the behavior is
implementation-defined if not. We have a non-conformant use in
`kmp_barrier.h` when allocating the distribute barrier. The size of the
barrier is 576 and the alignment is `4*CACHE_LINE`, which is 256 on most
systems. Apparently it works perfectly fine for Linux and Intel-based Mac,
but not for Apple Silicon based Mac.

Fix #63194.
2023-09-06 16:28:07 -04:00
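The rounding that 99d67fb9aa calls for can be sketched as below. `align_up` and `aligned_alloc_padded` are hypothetical helper names, and the sketch assumes a power-of-two alignment and a C++17 standard library that provides `std::aligned_alloc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Rounds `size` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Calls std::aligned_alloc with a size that is a multiple of the alignment.
// The caller releases the memory with std::free.
inline void *aligned_alloc_padded(std::size_t alignment, std::size_t size) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return std::aligned_alloc(alignment, align_up(size, alignment));
}
```

For the values in the message, `align_up(576, 256)` is 768, so the request becomes three full 256-byte units instead of two and a quarter.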
Shilei Tian
ff5c7261ef [OpenMP] Fix a wrong assertion in __kmp_get_global_thread_id
The function assumes that `__kmp_gtid_get_specific` always returns a valid gtid.
That is not always true, because when creating the key for thread-specific data,
a destructor is assigned. The dtor will be called at thread exit. However, before
the dtor is called, the thread-specific data will be reset to NULL first
(https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html):

> At thread exit, if a key value has a non-NULL destructor pointer, and the thread
> has a non-NULL value associated with that key, the value of the key is set to NULL.

As a result, `__kmp_gtid_get_specific` returns `KMP_GTID_DNE`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D159369
2023-09-06 12:21:43 -04:00
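The POSIX behavior quoted in ff5c7261ef can be observed with a small pthread program. All names below are hypothetical and unrelated to the runtime's own key handling. On older toolchains the program may need to be built with `-pthread`.

```cpp
#include <cassert>
#include <pthread.h>

static pthread_key_t demo_key;
static bool value_was_null_in_dtor = false;

// Destructor for the key. The old value arrives as the argument, while the
// key itself has already been reset to NULL for the exiting thread.
static void demo_dtor(void *old_value) {
  assert(old_value != nullptr);
  value_was_null_in_dtor = (pthread_getspecific(demo_key) == nullptr);
}

static void *demo_thread(void *) {
  static int payload = 42;
  pthread_setspecific(demo_key, &payload);
  return nullptr; // thread exit triggers demo_dtor
}

// Returns true if the key read as NULL inside the destructor.
inline bool key_is_null_during_destructor() {
  pthread_key_create(&demo_key, demo_dtor);
  pthread_t t;
  pthread_create(&t, nullptr, demo_thread, nullptr);
  pthread_join(t, nullptr); // the destructor has run once join returns
  pthread_key_delete(demo_key);
  return value_was_null_in_dtor;
}
```

Any lookup made through the key during or after the destructor therefore cannot assume a valid value is still stored there.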
Shilei Tian
518b08c193 [OpenMP] Fix issue of indirect function call in __kmpc_fork_call_if (#65436)
The outlined function is typically invoked by using `__kmp_invoke_microtask`,
which is written in asm. D138495 introduces a new interface function for
parallel regions for OpenMPIRBuilder, where the outlined function is called
via the function pointer. For some reason, it works perfectly well on x86 and
x86-64 systems, but doesn't work on Apple Silicon. The 3rd argument in the
callee is always `nullptr`, even if it is not in the caller. It appears `x2`
always contains `0x0`. This patch adopts the typical method to invoke the
function pointer. It works on my M2 Ultra Mac.

Fix #63194.
2023-09-06 12:17:45 -04:00
Fangrui Song
678e3ee123 [lldb] Fix duplicate word typos; NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 21:32:24 -07:00
Martin Storsjö
c2019c416c [OpenMP] [test] Fix target_thread_limit.cpp to not assume 4 or more cores
Previously, the test ran a section with

    #pragma omp target thread_limit(4)

and expected it to execute exactly 4 times, even though it would
in practice execute min(cores, 4) times.

Increment a counter and check that it executed 1-4 times.

Differential Revision: https://reviews.llvm.org/D159311
2023-09-01 21:16:58 +03:00
Shilei Tian
35fdf8d703 [OpenMP] Fix a segment fault in __kmp_get_global_thread_id
In `__kmp_get_global_thread_id`, if the gtid mode is 1, after getting the gtid
from TLS, it will store the gtid value to the thread stack maintained in the thread
descriptor. However, `__kmp_get_global_thread_id` can be called when the library
is destructed, after the corresponding thread info has been released. This will
cause a segmentation fault. This can happen on an Intel-based Mac.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D159324
2023-08-31 21:15:28 -04:00
Martin Storsjö
81ecc887aa [OpenMP] Export __kmpc_set_thread_limit on Windows
This fixes the new test target/target_thread_limit.cpp on
Windows, which was added recently in
08bbff4aad /
https://reviews.llvm.org/D152054.

Differential Revision: https://reviews.llvm.org/D159070
2023-08-29 23:22:21 +03:00
Joachim Jenke
cec855af3e [OpenMP][OMPT] Fix ompt_get_task_memory implementation
Since td_allow_completion_event is a member of the taskdata struct, not all
firstprivate/shared variables are stored at the end of the task memory
allocation. Simply report the whole allocation instead.

Furthermore, the function should always return 0 since there is never
another block to report.

Differential Review: https://reviews.llvm.org/D158080
2023-08-28 09:19:52 +02:00
Sandeep Kosuri
08bbff4aad [OpenMP] Codegen support for thread_limit on target directive for host offloading

- This patch adds support for the thread_limit clause on the target directive according to OpenMP 5.1 [2.14.5]
- The idea is to create an outer task for target region, when there is a thread_limit clause, and manipulate the thread_limit of task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region.

Differential Revision: https://reviews.llvm.org/D152054
2023-08-26 22:18:49 -05:00