1517 Commits

Author SHA1 Message Date
Xing Xue
b3792ae42a [OpenMP][AIX] Fix test config for AIX (#88272)
This patch fixes the test config so that it works for
`tasking/omp50_taskdep_depobj.c` which uses different flags to test with
compiler's `omp.h`.
* set test environment variable `OBJECT_MODE` to `64` if it is set
explicitly to `64` in the AIX environment. `OBJECT_MODE` defaults to
`32` and is recognized by AIX compilers and toolchain. In this way, we
don't need to set `-m64` for all compiler flags for 64-bit mode
* add option `-Wl,-bmaxdata` to 32-bit `test_openmp_flags` used by
`tasking/omp50_taskdep_depobj.c`
2024-04-10 16:06:31 -04:00
Joseph Huber
d022f6b8ff [Libomp] Place generated OpenMP headers into build resource directory (#88007)
Summary:
These headers are a part of the compiler's resource directory once
installed. However, they are currently placed in the binary directory
temporarily. This makes it more difficult to use the compiler out of the
build directory and will cause issues when moving to `liboffload`. This
patch changes the logic to write these instead to the compiler's
resource directory inside of the build tree.

NOTE: This doesn't change the Fortran headers, I don't know enough about
those and it won't use the same directory.
2024-04-09 08:47:51 -05:00
Pete Steinfeld
25e3d2b0fc Revert "[Libomp] Place generated OpenMP headers into build resource directory (#88007)" (#88083)

This reverts commit 8671429151.

This commit broke the flang build, so I'm reverting it. See the comments
in merge request #88007 for more information.
2024-04-08 20:20:27 -07:00
Joseph Huber
8671429151 [Libomp] Place generated OpenMP headers into build resource directory (#88007)
Summary:
These headers are a part of the compiler's resource directory once
installed. However, they are currently placed in the binary directory
temporarily. This makes it more difficult to use the compiler out of the
build directory and will cause issues when moving to `liboffload`. This
patch changes the logic to write these instead to the compiler's
resource directory inside of the build tree.

NOTE: This doesn't change the Fortran headers, I don't know enough about
those and it won't use the same directory.
2024-04-08 15:26:54 -05:00
Jonathan Peyton
eeaaf33fc2 [OpenMP] Unsupport absolute KMP_HW_SUBSET test for s390x (#87555) 2024-04-04 13:54:40 -05:00
Jonathan Peyton
2ff3850ea1 [OpenMP] Add absolute KMP_HW_SUBSET functionality (#85326)
Users can put a : in front of KMP_HW_SUBSET to indicate that the
specified subset is an "absolute" subset. Currently, when a user puts
KMP_HW_SUBSET=1t, this gets translated to KMP_HW_SUBSET="*s,*c,1t",
where * means "use all of". If a user wants only one thread as the
entire topology they can now do KMP_HW_SUBSET=:1t.

Along with the absolute syntax is a fix for newer machines and making
them easier to use with only the 3-level topology syntax. When a user
puts KMP_HW_SUBSET=1s,4c,2t on a machine which actually has 4 layers,
(say 1s,2m,3c,2t as the entire machine) the user gets an unexpected "too
many resources asked" message because KMP_HW_SUBSET currently translates
the "4c" value to mean 4 cores per module. To help users out, the
runtime can assume that these newer layers, module in this case, should
be ignored if they are not specified, but the topology should always
take into account the sockets, cores, and threads layers.
2024-04-03 11:43:23 -05:00
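The 3-level example in 2ff3850ea1 above comes down to simple arithmetic, sketched below. This is only an illustration of the counting described in the commit message, not the runtime's code; `SocketShape`, `fits_per_module` and `fits_per_socket` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical description of one socket of the example machine from the
// commit message (1s,2m,3c,2t): 2 modules, 3 cores per module, 2 threads.
struct SocketShape {
  int modules_per_socket;
  int cores_per_module;
  int threads_per_core;
};

// Old reading of "4c": cores are counted per module, the layer directly
// above the core layer on this machine.
bool fits_per_module(const SocketShape &m, int requested_cores) {
  assert(requested_cores > 0);
  return requested_cores <= m.cores_per_module;
}

// New reading: the unspecified module layer is ignored, so cores are
// counted per socket.
bool fits_per_socket(const SocketShape &m, int requested_cores) {
  assert(requested_cores > 0);
  return requested_cores <= m.modules_per_socket * m.cores_per_module;
}
```

With `SocketShape{2, 3, 2}`, a request for 4 cores fails the per-module check (only 3 cores per module) but passes the per-socket check (6 cores per socket), which matches the "too many resources asked" scenario the commit describes.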
Jonathan Peyton
4ea24946e3 [OpenMP] Fix nested parallel with tasking (#87309)
When a nested parallel region ends, the runtime calls __kmp_join_call().
During this call, the primary thread of the nested parallel region will
reset its tid (retval of omp_get_thread_num()) to what it was in the
outer parallel region. A data race occurs with the current code when
another worker thread from the nested inner parallel region tries to
steal tasks from the primary thread's task deque. The worker thread
reads the tid value directly from the primary thread's data structure
and may read the wrong value.

This change just uses the calculated victim_tid from execute_tasks()
directly in the steal_task() routine rather than reading tid from the
data structure.

Fixes: #87307
2024-04-02 15:56:50 -05:00
nihui
31880df994 [OpenMP] get logical core count on modern apple platform (#87231)
`hw.logicalcpu` returns the available logical core count

Fix build error for watchOS

```
runtime/src/z_Linux_util.cpp:1821:8: error: 'host_info' is unavailable: not available on watchOS
  rc = host_info(mach_host_self(), HOST_BASIC_INFO, (host_info_t)&info, &num);
       ^
/Applications/Xcode_15.2.app/Contents/Developer/Platforms/WatchOS.platform/Developer/SDKs/WatchOS10.2.sdk/usr/include/mach/mach_host.h:82:15: note: 'host_info' has been explicitly marked unavailable here
kern_return_t host_info
              ^
1 warning and 1 error generated.
make[2]: *** [runtime/src/CMakeFiles/omp.dir/z_Linux_util.cpp.o] Error 1
```
2024-04-02 11:38:40 -04:00
nihui
c5bbdb6494 [OpenMP] arm64_32 port for Apple WatchOS (#87246)
detect `aarch64_32` with compiler defined macro `__ARM64_ARCH_8_32__`
reuse ARM `__kmp_unnamed_critical_addr` and add `KMP_PREFIX_UNDERSCORE`
macro like AARCH64
reuse AARCH64 `__kmp_invoke_microtask`


build log for watchos armv7k + arm64_32 and watchos simulator x86_64 +
arm64

https://github.com/nihui/action-protobuf/actions/runs/8520684611/job/23337305030
2024-04-02 11:38:32 -04:00
Jonathan Peyton
038e66fe59 [OpenMP] Have hidden helper team allocate new OS threads only (#87119)
The hidden helper team pre-allocates the gtid space [1,
num_hidden_helpers] (inclusive). If regular host threads are allocated,
then put back in the thread pool, then the hidden helper team is
initialized, the hidden helper team tries to allocate the threads from
the thread pool with gtids higher than [1, num_hidden_helpers]. Instead,
have the hidden helper team fork OS threads so the correct gtid range is
used for hidden helper threads.

Fixes: #87117
2024-03-29 17:26:00 -05:00
Ulrich Weigand
b999e631c0 [OpenMP] Fix node destruction race in __kmpc_omp_taskwait_deps_51 (#86130)
The __kmpc_omp_taskwait_deps_51 function allocates a kmp_depnode_t node on its
stack, and there is currently a race condition where another thread
might still be accessing that node after the function has returned and
its stack frame was released.

While the function does wait until the node's npredecessors count has
reached zero before exiting, there is still a window where the function
that last decremented the npredecessors count assumes the node is still
accessible.

For heap-allocated kmp_depnode_t nodes, this normally works via a
separate ndeps count that only reaches zero at the point where no
accesses to the node are expected at all; in fact, at this point the
heap allocation will be freed.

For this case of a stack-allocated kmp_depnode_t node, it therefore
makes sense to similarly respect the ndeps count; we need to wait until
this reaches 1 (not 0, because it is not heap-allocated so there's
always one extra count to prevent it from being freed), before we can
safely deallocate our stack frame.

As this is expected to be a short race window of only a few
instructions, it should be fine to just use a busy wait loop checking
the ndeps count.

Fixes: https://github.com/llvm/llvm-project/issues/85963
2024-03-28 12:15:39 +01:00
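The busy wait described in b999e631c0 above can be sketched as follows. This is a minimal illustration assuming a plain atomic reference count; `StackDepNode` and `wait_for_other_references` are hypothetical names, and the real `kmp_depnode_t` has more fields.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the stack-allocated dependency node. Only the
// reference count matters for this sketch.
struct StackDepNode {
  std::atomic<int> ndeps{1}; // 1 = the extra count that keeps the node alive
};

// Spin until only the owner's own count is left, i.e. no other thread can
// still be touching the node. Returns how many times the loop had to spin.
long wait_for_other_references(const StackDepNode &node) {
  assert(node.ndeps.load() >= 1);
  long spins = 0;
  while (node.ndeps.load(std::memory_order_acquire) > 1) {
    ++spins; // busy wait; the window is expected to be a few instructions
  }
  return spins;
}
```

The loop waits for 1 rather than 0 because the node is not heap-allocated, so one count always remains. Only after the function returns is it safe to let the stack frame holding the node go away.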
Terry Wilmarth
aa2c14de1a [OpenMP] Close up permissions on /tmp files (#85469)
The SHM or /tmp files that might be created during library registration
don't need to have such open permissions, so this change fixes that.
2024-03-27 11:27:28 -04:00
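As an illustration of the permission tightening in aa2c14de1a above, here is a minimal POSIX sketch that creates a file accessible to its owner only. The commit message does not state the exact mode, so `0600` is an assumption, and `create_owner_only_file` and `permission_bits` are hypothetical helpers.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Creates a new file that only the owner can read and write (mode 0600,
// an assumed value). Returns the file descriptor, or -1 on failure.
int create_owner_only_file(const char *path) {
  assert(path != nullptr);
  return open(path, O_CREAT | O_EXCL | O_RDWR, S_IRUSR | S_IWUSR);
}

// Reads back the permission bits of an open file, or -1 on failure.
int permission_bits(int fd) {
  struct stat st;
  if (fstat(fd, &st) != 0)
    return -1;
  return static_cast<int>(st.st_mode & 0777);
}
```

The process umask can only clear bits from the requested mode, so group and others never gain access to a file created this way.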
Shilei Tian
a7ac0dd624 [NFC][OpenMP] Use SimpleVLA to replace variable length arrays in C++ 2024-03-27 00:23:32 -04:00
Shilei Tian
fa9ee4a7f9 [NFC][OpenMP] Silent unused variable in kmp_collapse.cpp 2024-03-27 00:09:40 -04:00
Vadim Paretsky
7db4046322 [OpenMP] add loop collapse tests (#86243)
This PR adds loop collapse tests ported from MSVC.

---------

Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>
2024-03-26 16:41:31 -07:00
Xing Xue
d394f3a162 [OpenMP][AIX] Affinity implementation for AIX (#84984)
This patch implements `affinity` for AIX, which is quite different from
platforms such as Linux.
- Setting CPU affinity through masks and related functions is not
supported. System call `bindprocessor()` is used to bind a thread to one
CPU per call.
- There are no system routines to get the affinity info of a thread. The
implementation of `get_system_affinity()` for AIX gets the mask of all
available CPUs, to be used as the full mask only.
- Topology is not available from the file system. It is obtained through
system SRAD (Scheduler Resource Allocation Domain).

This patch has run through the libomp LIT tests successfully with
`affinity` enabled.
2024-03-22 15:25:08 -04:00
Michael Klemm
fb5fd2d82f [flang][OpenMP] Compile proper omp_lib.mod from the openmp/src/include sources (#80874)
This PR changes the build system to use the sources for the module
`omp_lib` and the `omp_lib.h` include file from the `openmp` runtime
project and not from a separate copy of these files. This will greatly
reduce potential for inconsistencies when adding features to the OpenMP
runtime implementation.

When the OpenMP subproject is not configured, this PR also disables the
corresponding LIT tests with a "REQUIRES" directive at the beginning of
the OpenMP test files.

---------

Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
2024-03-20 13:47:26 +01:00
Brad Smith
c7de4a39d5 [OpenMP] Enable the affinity tests on FreeBSD, NetBSD and DragonFly (#85500)
FreeBSD, NetBSD and DragonFly also have affinity support. So enable the tests there as well.
2024-03-19 13:29:19 -04:00
nicebert
20f5bcfb1a [OpenMP] Add OpenMP extension API to dump mapping tables (#85381)
This adds an API call ompx_dump_mapping_tables.
This allows users to debug the mapping tables and can be especially
useful for unified shared memory applications to check if the code
behaves in the way it should. The implementation reuses code already
present to dump mapping tables (in a debug setting).

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-03-18 14:09:20 -05:00
David CARLIER
6d3cec01a6 Revert "[openmp] __kmp_x86_cpuid fix for i386/PIC builds." (#85526)
Reverts llvm/llvm-project#84626
2024-03-16 13:41:12 +00:00
Andrew Brown
d83660827f [openmp][wasm] Fix microtask type mismatch (#84355)
When OpenMP is compiled for WebAssembly (see #71297), it invokes a
microtask via a `switch` statement that dispatches to the `void *`
microtask pointer with spelled-out arguments (not varargs). As #83329
points out, however, this can result in a type mismatch when the
indirect call is executed by WebAssembly; WebAssembly expects the called
pointer to have the precise type of the call site. This change fixes the
issue by bringing back the approach in [D142593] of type-casting all the
`switch` arms to the precise type. This fixes #83329.

[D142593]: https://reviews.llvm.org/D142593
2024-03-14 10:23:44 -05:00
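The dispatch technique described in d83660827f above, casting each `switch` arm to the precise function type before the call, can be sketched as below. This is a simplified illustration and not the runtime's routine: the type aliases and `invoke_microtask` are hypothetical, and a generic function-pointer type is used in place of `void *` so that the casts stay within standard C++.

```cpp
#include <cassert>

// Generic handle for a microtask (assumed stand-in for the `void *` pointer).
using generic_fn = void (*)();

// Precise signatures, one per supported argument count.
using microtask0_t = void (*)(int *gtid, int *tid);
using microtask1_t = void (*)(int *gtid, int *tid, void *a0);
using microtask2_t = void (*)(int *gtid, int *tid, void *a0, void *a1);

// Calls `fn` with `argc` spelled-out arguments. Each arm casts to the exact
// type, so the indirect call site and the callee agree on the signature.
// Returns false if the argument count is not supported by this sketch.
bool invoke_microtask(generic_fn fn, int gtid, int tid, int argc,
                      void **argv) {
  assert(fn != nullptr);
  switch (argc) {
  case 0:
    reinterpret_cast<microtask0_t>(fn)(&gtid, &tid);
    return true;
  case 1:
    reinterpret_cast<microtask1_t>(fn)(&gtid, &tid, argv[0]);
    return true;
  case 2:
    reinterpret_cast<microtask2_t>(fn)(&gtid, &tid, argv[0], argv[1]);
    return true;
  default:
    return false;
  }
}
```

On most native targets a mismatched call type often goes unnoticed, whereas WebAssembly checks the signature of an indirect call, which is why the precise cast matters there.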
MessyHack
ea848d0a6d [OpenMP] Sort topology after adding processor group layer. (#83943)
Various behavior around creating affinity masks and detecting uniform
topology depends on the topology being sorted.

Re-sort the topology after adding the processor group layer to ensure that
the updated topology reflects the newly added processor group info.

Observed that the topology was not sorted correctly on high core count
AMD Epyc Genoa (2 sockets, 96 cores, 2 threads) using NUMA (NPS 2+).
2024-03-13 16:22:23 -05:00
Jonathan Peyton
6272500e0b [OpenMP] Remove unused logical/physical CPUID information (#83298) 2024-03-12 11:37:01 -07:00
Jonathan Peyton
3303be63fc [OpenMP] Make sure mask is set to nullptr (#83299) 2024-03-12 11:36:43 -07:00
Jonathan Peyton
f5334f5da5 [OpenMP] Add debug checks for divide by zero (#83300) 2024-03-12 11:36:19 -07:00
Jonathan Peyton
9b1c496898 [OpenMP] Fixup while loops to avoid bad NULL check (#83302) 2024-03-11 10:28:12 -05:00
Jonathan Peyton
de4d7015d0 [OpenMP] Remove unnecessary check of ap (#83303) 2024-03-11 10:27:53 -05:00
Jonathan Peyton
1ed463d961 [OpenMP] Make sure ptr is used after NULL check (#83304) 2024-03-11 10:27:31 -05:00
Jonathan Peyton
b4e39ad117 [OpenMP] Remove dead code of checking int > INT_MAX (#83305) 2024-03-11 10:26:53 -05:00
David CARLIER
facb89ae12 [openmp] __kmp_x86_cpuid fix for i386/PIC builds. (#84626) 2024-03-11 13:15:43 +00:00
David CARLIER
fa4cc39255 [openmp] adding affinity support to DragonFlyBSD. (#84672) 2024-03-10 09:56:55 +00:00
Vadim Paretsky
110141b378 [OpenMP] fix endianness dependent definitions in OMP headers for MSVC (#84540)
MSVC does not define __BYTE_ORDER__ making the check for BigEndian
erroneously evaluate to true and breaking the struct definitions in MSVC
compiled builds correspondingly. The fix adds an additional check for
whether __BYTE_ORDER__ is defined by the compiler to fix these.

---------

Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>
2024-03-09 10:47:31 -08:00
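The pitfall fixed in 110141b378 above is that an undefined macro evaluates to `0` inside `#if`, so comparing two undefined macros yields `0 == 0`, which is true. The sketch below shows a guarded check of that kind; `EXAMPLE_BIG_ENDIAN` and the helper functions are hypothetical names, not the ones used in the OpenMP headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Guarded check: the comparison is only evaluated when the compiler
// actually defines the macros. Without the defined() guards, a compiler
// that defines neither macro would see "0 == 0" and select big-endian.
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) &&                \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define EXAMPLE_BIG_ENDIAN 1
#else
#define EXAMPLE_BIG_ENDIAN 0
#endif

// What the preprocessor decided at compile time.
bool compiled_as_big_endian() { return EXAMPLE_BIG_ENDIAN != 0; }

// What the machine actually does: inspect the first byte of the value 1.
bool runtime_is_big_endian() {
  const std::uint32_t probe = 1;
  unsigned char first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 0;
}

// Sanity check that the compile-time and run-time answers agree.
void check_endianness_consistency() {
  assert(compiled_as_big_endian() == runtime_is_big_endian());
}
```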
David CARLIER
11cd2a33f1 [openmp] porting affinity feature to netbsd. (#84618)
NetBSD supports the portable hwloc layer as well. For hardware with
4 CPUs, a cpu set is 4 and maxcpus is 256.
2024-03-09 11:45:07 +00:00
David CARLIER
05280b582a [OpenMP] Implements __kmp_is_address_mapped for Solaris/Illumos. (#82930)
Also fixing OpenMP build itself for this platform.
2024-03-08 20:34:43 +00:00
vadikp-intel
fcd2d48325 [OpenMP] runtime support for efficient partitioning of collapsed triangular loops (#83939)
This PR adds OMP runtime support for more efficient partitioning of
certain types of collapsed loops that can be used by compilers that
support loop collapsing (i.e. MSVC) to achieve more optimal thread load
balancing.

In particular, this PR addresses doubly nested upper and lower isosceles
triangular loops of the following types:

1. lower triangular 'less_than'
   for (int i=0; i<N; i++)
     for (int j=0; j<i; j++)
2. lower triangular 'less_than_equal'
   for (int i=0; i<N; i++)
     for (int j=0; j<=i; j++)
3. upper triangular
   for (int i=0; i<N; i++)
     for (int j=i; j<N; j++)

Includes tests for the three supported loop types.

---------

Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>
2024-03-07 16:28:03 -08:00
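To show why collapsing a triangular nest helps load balancing (fcd2d48325 above), here is a minimal sketch for the lower triangular 'less_than' case. It flattens the nest into one index range and splits that range evenly. This is not the algorithm used in the runtime; all names are hypothetical and the row search is deliberately simple.

```cpp
#include <cassert>
#include <cstdint>

struct Iteration {
  std::uint64_t i, j;
};

// Number of (i, j) pairs with 0 <= j < i < n.
std::uint64_t lower_triangle_size(std::uint64_t n) {
  return n == 0 ? 0 : n * (n - 1) / 2;
}

// Maps a flattened index k back to (i, j). Row i holds i iterations and
// starts at flattened index i * (i - 1) / 2.
Iteration unflatten_lower_triangle(std::uint64_t k) {
  std::uint64_t row = 1; // row 0 is empty, so the first iteration is in row 1
  while ((row + 1) * row / 2 <= k)
    ++row;
  return Iteration{row, k - row * (row - 1) / 2};
}

// First flattened index owned by thread `tid`; thread `tid` handles
// [chunk_begin(total, nthreads, tid), chunk_begin(total, nthreads, tid + 1)).
std::uint64_t chunk_begin(std::uint64_t total, std::uint64_t nthreads,
                          std::uint64_t tid) {
  assert(nthreads > 0 && tid <= nthreads);
  return total * tid / nthreads;
}
```

For N = 6 there are 15 iterations. Split over 4 threads, the flattened chunks hold 3, 4, 4 and 4 iterations, whereas splitting only the outer loop hands out whole rows of very different lengths.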
Jonathan Peyton
0e0bee26e7 [OpenMP] Fix distributed barrier hang for OMP_WAIT_POLICY=passive (#83058)
The resume thread logic inside __kmp_free_team() is faulty. Only
checking b_go for sleep status doesn't wake up distributed barrier.
Change to generic check for th_sleep_loc and calling
__kmp_null_resume_wrapper().

Fixes: #80664
2024-02-27 14:15:48 -06:00
Joachim
822142ffdf [OpenMP][OMPD] libompd must not link libomp (#83119)
Fixes a regression introduced in 91ccd8248.
The code for libompd includes kmp.h for enum kmp_sched. The dependency
on hwloc is not necessary. Avoid the dependency by skipping the
definitions in kmp.h using types from hwloc.h.

Fixes #80750
2024-02-27 16:24:55 +01:00
Xing Xue
a4dcfbcb78 [OpenMP][AIX] XFAIL capacity tests on AIX in 32-bit (#83014)
This patch XFAILs two capacity tests on AIX in 32-bit mode because they run
out of resources with `4 x omp_get_max_threads()`.
2024-02-26 13:13:05 -05:00
David CARLIER
9e7c0b1385 [OpenMP] Implement __kmp_is_address_mapped on DragonFlyBSD. (#82895)
implement internal __kmp_is_address_mapped.
2024-02-25 14:13:04 +00:00
Xing Xue
94100bc2fb [OpenMP][AIX]Add assembly file containing microtasking routines and unnamed common block definitions (#81770)
This patch adds assembly file `z_AIX_asm.S` that contains the 32- and
64-bit XCOFF version of microtasking routines and unnamed common block
definitions. This code has been run through the libomp LIT tests and a
user package successfully.
2024-02-20 12:08:37 -05:00
Martin Storsjö
4b9c089381 [OpenMP] [test] Skip the -mlong-double-80 test on MSVC ABI (#81115)
Within the MSVC ABI, long doubles are the same as regular 64 bit
doubles. This test case, which is compiled with -mlong-double-80, cannot
work when libomp has been compiled without that flag, as
-mlong-double-80 changes the calling convention for the tested
functions.
2024-02-19 11:33:28 +02:00
Xing Xue
2de269a641 [OpenMP][AIX] Set worker stack size to 2 x KMP_DEFAULT_STKSIZE if system stack size is too big (#81996)
This patch sets the stack size of worker threads to `2 x
KMP_DEFAULT_STKSIZE` (2 x 4MB) for AIX if the system stack size is too
big. Also defines maximum stack size for 32-bit AIX.
2024-02-16 15:12:41 -05:00
Xing Xue
ac97562c99 [OpenMP][AIX]Define struct kmp_base_tas_lock with the order of two members swapped for big-endian (#79188)
The direct lock data structure has bit `0` (the least significant bit)
of the first 32-bit word set to `1` to indicate it is a direct lock. On
the other hand, the first word (in 32-bit mode) or first two words (in
64-bit mode) of an indirect lock are the address of the entry allocated
from the indirect lock table. The runtime checks bit `0` of the first
32-bit word to tell if this is a direct or an indirect lock. This works
fine for 32-bit and 64-bit little-endian because its memory layout of a
64-bit address is (`low word`, `high word`). However, this causes
problems for big-endian where the memory layout of a 64-bit address is
(`high word`, `low word`). If an address of the indirect lock table
entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it
is treated as a direct lock. This patch defines `struct
kmp_base_tas_lock` with the ordering of the two 32-bit members flipped
for big-endian PPC64 so that when checking/setting tags in member
`poll`, the second word (the low word) is used. This patch also changes
places where `poll` is not already explicitly specified for
checking/setting tags.
2024-02-13 15:11:24 -05:00
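The misclassification described in ac97562c99 above can be reproduced with the address from the commit message. The sketch models which 32-bit word comes first in memory under each byte order instead of depending on the host; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// First 32-bit word in memory of a 64-bit value on a little-endian
// machine: the low word.
std::uint32_t first_word_little_endian(std::uint64_t value) {
  return static_cast<std::uint32_t>(value);
}

// First 32-bit word in memory on a big-endian machine: the high word.
std::uint32_t first_word_big_endian(std::uint64_t value) {
  return static_cast<std::uint32_t>(value >> 32);
}

// The tag check from the commit message: bit 0 of the first 32-bit word
// set to 1 marks a direct lock.
bool tagged_as_direct(std::uint32_t first_word) {
  return (first_word & 1u) != 0;
}
```

For the address `0x110035300`, the low word is `0x10035300` (bit 0 clear, so correctly seen as indirect) while the high word is `0x1` (bit 0 set, so wrongly seen as direct). Swapping the two members on big-endian makes the check read the low word again.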
Daniil Fukalov
94272a5a5d [OpenMP] Fix libomp debug build. (#81029)
Disable libstdc++ assertions in the runtime library just like in
https://reviews.llvm.org/D143168.
2024-02-09 17:54:14 +01:00
Xing Xue
7a9b0e4acb [OpenMP][test]Flip bit-fields in 'struct flags' for big-endian in test cases (#79895)
This patch flips bit-fields in `struct flags` for big-endian in test
cases to be consistent with the definition of the structure in libomp
`kmp.h`.
2024-02-07 15:24:52 -05:00
vigbalu
edfc21a575 [OMPD] Runtime Entry Point functions for OMPD in libomp.so need C linkage as per standard. (#79246)
Adding extern "C" to all the entry point functions to make sure that
these functions are not mangled.
2024-02-06 10:12:47 +01:00
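For edfc21a575 above, the effect of `extern "C"` can be shown with two otherwise identical functions. Both names are hypothetical and are not OMPD entry points.

```cpp
#include <cassert>

// C linkage: the symbol is emitted under its plain name, so a tool that
// looks the function up by name can find it.
extern "C" int example_entry_point(int value) { return value + 1; }

// C++ linkage: the compiler encodes the parameter types into the symbol
// name, so a lookup by the plain name would not find it.
int example_mangled_function(int value) { return value + 1; }
```

Both functions behave the same when called from C++. The difference only shows in the object file, where a symbol lister such as `nm` would print the first under its plain name and the second under a mangled one.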
Martin Storsjö
2d2f962c9b [openmp] Add a dependency on the separate import library (#80449)
Currently, when doing e.g. "ninja check-openmp", the check-openmp target
only depends on the target "omp", which builds the library. Thus by
doing that, the separate import library "libomp.lib", which is generated
directly from a def file, never gets created, unless one does a separate
invocation first, that builds all targets.

To fix this, make the "omp" target depend on the target for the separate
import library, whenever that is created/used.
2024-02-03 01:06:40 +01:00
Alexandre Ganea
ca0e241791 [openmp] Silence warning when compiling with MSVC targetting x86
This fixes:
```
[3593/7449] Building CXX object projects\openmp\runtime\src\CMakeFiles\omp.dir\kmp_debug.cpp.obj
C:\git\llvm-project\openmp\runtime\src\kmp_os.h(471): warning C4163: '_InlineInterlockedExchange64': not available as an intrinsic function
```
2024-01-25 09:34:19 -05:00
Alexandre Ganea
15fdc7646c Re-land [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853)
This reverts 94f960925b and fixes it.
2024-01-23 12:48:38 -05:00
Alexandre Ganea
94f960925b Revert 10f3296dd7 - [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853)
It broke the AMDGPU buildbot: https://lab.llvm.org/buildbot/#/builders/193/builds/45378
2024-01-23 08:51:12 -05:00