Commit Graph

411 Commits

Author SHA1 Message Date
Xing Xue
b3792ae42a [OpenMP][AIX] Fix test config for AIX (#88272)
This patch fixes the test config so that it works for
`tasking/omp50_taskdep_depobj.c` which uses different flags to test with
compiler's `omp.h`.
* set test environment variable `OBJECT_MODE` to `64` if it is set
explicitly to `64` in the AIX environment. `OBJECT_MODE` is default to
`32` and is recognized by AIX compilers and toolchain. In this way, we
don't need to set `-m64` for all compiler flags for 64-bit mode
* add option `-Wl,-bmaxdata` to 32-bit `test_openmp_flags` used by
`tasking/omp50_taskdep_depobj.c`
2024-04-10 16:06:31 -04:00
Jonathan Peyton
eeaaf33fc2 [OpenMP] Unsupport absolute KMP_HW_SUBSET test for s390x (#87555) 2024-04-04 13:54:40 -05:00
Jonathan Peyton
2ff3850ea1 [OpenMP] Add absolute KMP_HW_SUBSET functionality (#85326)
Users can put a : in front of KMP_HW_SUBSET to indicate that the
specified subset is an "absolute" subset. Currently, when a user puts
KMP_HW_SUBSET=1t. This gets translated to KMP_HW_SUBSET="*s,*c,1t",
where * means "use all of". If a user wants only one thread as the
entire topology they can now do KMP_HW_SUBSET=:1t.

Along with the absolute syntax is a fix for newer machines and making
them easier to use with only the 3-level topology syntax. When a user
puts KMP_HW_SUBSET=1s,4c,2t on a machine which actually has 4 layers,
(say 1s,2m,3c,2t as the entire machine) the user gets an unexpected "too
many resources asked" message because KMP_HW_SUBSET currently translates
the "4c" value to mean 4 cores per module. To help users out, the
runtime can assume that these newer layers, module in this case, should
be ignored if they are not specified, but the topology should always
take into account the sockets, cores, and threads layers.
2024-04-03 11:43:23 -05:00
Jonathan Peyton
4ea24946e3 [OpenMP] Fix nested parallel with tasking (#87309)
When a nested parallel region ends, the runtime calls __kmp_join_call().
During this call, the primary thread of the nested parallel region will
reset its tid (retval of omp_get_thread_num()) to what it was in the
outer parallel region. A data race occurs with the current code when
another worker thread from the nested inner parallel region tries to
steal tasks from the primary thread's task deque. The worker thread
reads the tid value directly from the primary thread's data structure
and may read the wrong value.

This change just uses the calculated victim_tid from execute_tasks()
directly in the steal_task() routine rather than reading tid from the
data structure.

Fixes: #87307
2024-04-02 15:56:50 -05:00
nihui
c5bbdb6494 [OpenMP] arm64_32 port for Apple WatchOS (#87246)
detect `aarch64_32` with compiler defined macro `__ARM64_ARCH_8_32__`
reuse ARM `__kmp_unnamed_critical_addr` and add `KMP_PREFIX_UNDERSCORE`
macro like AARCH64
reuse AARCH64 `__kmp_invoke_microtask`


build log for watchos armv7k + arm64_32 and watchos simulator x86_64 +
arm64

https://github.com/nihui/action-protobuf/actions/runs/8520684611/job/23337305030
2024-04-02 11:38:32 -04:00
Jonathan Peyton
038e66fe59 [OpenMP] Have hidden helper team allocate new OS threads only (#87119)
The hidden helper team pre-allocates the gtid space [1,
num_hidden_helpers] (inclusive). If regular host threads are allocated,
then put back in the thread pool, then the hidden helper team is
initialized, the hidden helper team tries to allocate the threads from
the thread pool with gtids higher than [1, num_hidden_helpers]. Instead,
have the hidden helper team fork OS threads so the correct gtid range
used for hidden helper threads.

Fixes: #87117
2024-03-29 17:26:00 -05:00
Vadim Paretsky
7db4046322 [OpenMP] add loop collapse tests (#86243)
This PR adds loop collapse tests ported from MSVC.

---------

Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>
2024-03-26 16:41:31 -07:00
Xing Xue
d394f3a162 [OpenMP][AIX] Affinity implementation for AIX (#84984)
This patch implements `affinity` for AIX, which is quite different from
platforms such as Linux.
- Setting CPU affinity through masks and related functions are not
supported. System call `bindprocessor()` is used to bind a thread to one
CPU per call.
- There are no system routines to get the affinity info of a thread. The
implementation of `get_system_affinity()` for AIX gets the mask of all
available CPUs, to be used as the full mask only.
- Topology is not available from the file system. It is obtained through
system SRAD (Scheduler Resource Allocation Domain).

This patch has run through the libomp LIT tests successfully with
`affinity` enabled.
2024-03-22 15:25:08 -04:00
Brad Smith
c7de4a39d5 [OpenMP] Enable the affinity tests on FreeBSD, NetBSD and DragonFly (#85500)
FreeBSD, NetBSD and DragonFly also have affinity support. So enable the tests there as well.
2024-03-19 13:29:19 -04:00
Vadim Paretsky
110141b378 [OpenMP] fix endianness dependent definitions in OMP headers for MSVC (#84540)
MSVC does not define __BYTE_ORDER__ making the check for BigEndian
erroneously evaluate to true and breaking the struct definitions in MSVC
compiled builds correspondingly. The fix adds an additional check for
whether __BYTE_ORDER__ is defined by the compiler to fix these.

---------

Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>
2024-03-09 10:47:31 -08:00
vadikp-intel
fcd2d48325 [OpenMP] runtime support for efficient partitioning of collapsed triangular loops (#83939)
This PR adds OMP runtime support for more efficient partitioning of
certain types of collapsed loops that can be used by compilers that
support loop collapsing (i.e. MSVC) to achieve more optimal thread load
balancing.

In particular, this PR addresses double nested upper and lower isosceles
triangular loops of the following types

1. lower triangular 'less_than'
   for (int i=0; i<N; i++)
     for (int j=0; j<i; j++)
2. lower triangular 'less_than_equal'
   for (int i=0; i<N; j++)
     for (int j=0; j<=i; j++)
3. upper triangular
   for (int i=0; i<N; i++)
     for (int j=i; j<N; j++)

Includes tests for the three supported loop types.

---------

Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>
2024-03-07 16:28:03 -08:00
Jonathan Peyton
0e0bee26e7 [OpenMP] Fix distributed barrier hang for OMP_WAIT_POLICY=passive (#83058)
The resume thread logic inside __kmp_free_team() is faulty. Only
checking b_go for sleep status doesn't wake up distributed barrier.
Change to generic check for th_sleep_loc and calling
__kmp_null_resume_wrapper().

Fixes: #80664
2024-02-27 14:15:48 -06:00
Xing Xue
a4dcfbcb78 [OpenMP][AIX] XFAIL capacity tests on AIX in 32-bit (#83014)
This patch XFAILs two capacity tests on AIX in 32-bit because running
out resource with `4 x omp_get_max_threads()` in 32-bit mode.
2024-02-26 13:13:05 -05:00
Martin Storsjö
4b9c089381 [OpenMP] [test] Skip the -mlong-double-80 test on MSVC ABI (#81115)
Within the MSVC ABI, long doubles are the same as regular 64 bit
doubles. This test case, which is compiled with -mlong-double-80, cannot
work when libomp has been compiled without that flag, as
-mlong-double-80 changes the calling convention for the tested
functions.
2024-02-19 11:33:28 +02:00
Xing Xue
7a9b0e4acb [OpenMP][test]Flip bit-fields in 'struct flags' for big-endian in test cases (#79895)
This patch flips bit-fields in `struct flags` for big-endian in test
cases to be consistent with the definition of the structure in libomp
`kmp.h`.
2024-02-07 15:24:52 -05:00
Xing Xue
2edce427a8 [openmp][AIX]Initial changes for porting to AIX (#76841)
This PR contains initial changes for building and testing libomp on AIX.
More changes will follow.
- `KMP_OS_AIX` is defined for the AIX platform
- `KMP_ARCH_PPC` is defined for 32-bit PPC
- `KMP_ARCH_PPC_XCOFF` and `KMP_ARCH_PPC64_XCOFF` are for 32- and 64-bit
XCOFF object formats respectively
- Assembly file `z_AIX_asm.S` is used for AIX specific assembly code and
will be added in a separate PR
- The target library is disabled because AIX does not have the device
support
- OMPT is temporarily disabled
2024-01-08 08:33:00 -05:00
Carlos Eduardo Seo
dcd7c8b7c9 [OpenMP][AArch64] Workaround for ompt/synchronization tests (#75848)
ompt/synchronization/[masked.c | master.c] tests fail due to a wrong
offset being calculated for the possible return addreses. PR #65936
fixes this for Darwin and the same has to be done for Linux.

Updates #69627
2023-12-19 19:26:23 +01:00
Sandeep Kosuri
ecc080c07d [OpenMP] return empty stmt for nothing (#74042)
- `nothing` directive was effecting the `if` block structure which it
should not. So return an empty statement instead of an error statement
while parsing to avoid this.
2023-12-03 13:33:38 +05:30
Alex
d6f00654fb [OpenMP][Runtime][test] Fix ompt task testcase fail randomly (#72337)
Fixed #72231
2023-11-28 14:22:57 +01:00
Lixi Zhou
a3c0f705db [NFC] fix failed ompt tests on M1 device (#65696)
Fix the 2 failed ompt tests on M1 device found on #63194.

```
libomp :: ompt/synchronization/masked.c
libomp :: ompt/synchronization/master.c
```

For the details of this fix, please check the origin discussion in
https://github.com/llvm/llvm-project/issues/63194#issuecomment-1710494689

Thanks @jprotze for the fix.
2023-11-24 23:40:14 +01:00
Joachim Jenke
f5e50b21da [OpenMP] Optimized trivial multiple edges from task dependency graph
From "3.1 Reducing the number of edges" of this [[ https://hal.science/hal-04136674v1/ | paper ]] - Optimization (b)

Task (dependency) nodes have a `successors` list built upon passed dependency.
Given the following code, B will be added to A's successors list building the graph `A` -> `B`
```
// A
 # pragma omp task depend(out: x)
{}

// B
 # pragma omp task depend(in: x)
{}
```

In the following code, B is currently added twice to A's successor list
```
// A
 # pragma omp task depend(out: x, y)
{}

// B
 # pragma omp task depend(in: x, y)
{}
```

This patch removes such dupplicates by checking lastly inserted task in `A` successor list.

Authored by: Romain Pereira (rpereira-dev)
Differential Revision: https://reviews.llvm.org/D158544
2023-11-21 18:36:12 +01:00
Jonathan Peyton
5cc603cb22 [OpenMP] Add skewed iteration distribution on hybrid systems (#69946)
This commit adds skewed distribution of iterations in
nonmonotonic:dynamic schedule (static steal) for hybrid systems when
thread affinity is assigned. Currently, it distributes the iterations at
60:40 ratio. Consider this loop with dynamic schedule type,
for (int i = 0; i < 100; ++i). In a hybrid system with 20 hardware
threads (16 CORE and 4 ATOM core), 88 iterations will be assigned to
performance cores and 12 iterations will be assigned to efficient cores.
Each thread with CORE core will process 5 iterations + extras and with
ATOM core will process 3 iterations.

Differential Revision: https://reviews.llvm.org/D152955
2023-11-08 10:19:37 -06:00
Neale Ferguson
1111ef0257 Add openmp support to System z (#66081)
* openmp/README.rst
  - Add s390x to those platforms supported

* openmp/libomptarget/plugins-nextgen/CMakeLists.txt
  - Add s390x subdirectory

* openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
  - Add s390x definitions

* openmp/runtime/CMakeLists.txt
  - Add s390x to those platforms supported

* openmp/runtime/cmake/LibompGetArchitecture.cmake
  - Define s390x ARCHITECTURE

* openmp/runtime/cmake/LibompMicroTests.cmake
  - Add dependencies for System z (aka s390x)

* openmp/runtime/cmake/LibompUtils.cmake
  - Add S390X to the mix

* openmp/runtime/cmake/config-ix.cmake
  - Add s390x as a supported LIPOMP_ARCH

* openmp/runtime/src/kmp_affinity.h
  - Define __NR_sched_[get|set]addinity for s390x

* openmp/runtime/src/kmp_config.h.cmake
  - Define CACHE_LINE for s390x

* openmp/runtime/src/kmp_os.h
  - Add KMP_ARCH_S390X to support checks

* openmp/runtime/src/kmp_platform.h
  - Define KMP_ARCH_S390X

* openmp/runtime/src/kmp_runtime.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/src/kmp_tasking.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
  - Define ITT_ARCH_S390X

* openmp/runtime/src/z_Linux_asm.S
  - Instantiate __kmp_invoke_microtask for s390x

* openmp/runtime/src/z_Linux_util.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/test/ompt/callback.h
  - Define print_possible_return_addresses for s390x

* openmp/runtime/tools/lib/Platform.pm
  - Return s390x as platform and host architecture

* openmp/runtime/tools/lib/Uname.pm
  - Set hardware platform value for s390x
2023-11-03 12:42:55 +01:00
Ilya Leoshkevich
77c2b623ca [OpenMP][Tests] Sync struct DEP with the runtime (#69982)
struct DEP defined in multiple testcases must correspond to runtime's
struct kmp_depend_info. The former defines flags as int, and the latter
as kmp_uint8_t. This discrepancy goes unnoticed on little-endian
systems, but breaks big-endian ones.

Make flags in struct DEP unsigned char.
2023-10-24 19:40:08 +02:00
Kazushi Marukawa
e8679b93da [OpenMP][test][VE] Limit the number of AFFINITY_MAX_CPUS for VE (#65872)
Limit the number of AFFINITY_MAX_CPUS for VE because VE's
sched_getaffinity doesn't work correctly with large sized mask buffer.
2023-09-12 23:45:56 +09:00
Kazushi Marukawa
f8efa65ca5 [OpenMP][test][VE] Change to use VE_LD_LIBRARY_PATH for VE (#65869)
Change to use VE_LD_LIBRARY_PATH for VE instead of LD_LIBRARY_PATH. The
VE is connected to the host, and compiled test programs for VE is
invoked on the host and transferred to the VE. If programs are compiled
for the host, we use LD_LIBRARY_PATH. Otherwise, we use
VE_LD_LIBRARY_PATH.
2023-09-10 12:07:16 +09:00
Kazushi (Jam) Marukawa
18b6724355 [OpenMP][VE] Support OpenMP runtime on VE
Support OpenMP runtime library on VE.  This patch makes OpenMP compilable
for VE architecture.  Almost all tests run correctly on VE.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D159401
2023-09-10 08:29:53 +09:00
Martin Storsjö
c2019c416c [OpenMP] [test] Fix target_thread_limit.cpp to not assume 4 or more cores
Previously, the test ran a section with

    #pragma omp target thread_limit(4)

and expected it to execute exactly 4 times, even though it would
in practice execute min(cores, 4) times.

Increment a counter and check that it executed 1-4 times.

Differential Revision: https://reviews.llvm.org/D159311
2023-09-01 21:16:58 +03:00
Sandeep Kosuri
08bbff4aad [OpenMP] Codegen support for thread_limit on target directive for host
offloading

- This patch adds support for thread_limit clause on target directive according to OpenMP 51 [2.14.5]
- The idea is to create an outer task for target region, when there is a thread_limit clause, and manipulate the thread_limit of task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region.

Differential Revision: https://reviews.llvm.org/D152054
2023-08-26 22:18:49 -05:00
Jonathan Peyton
b34c7d8c8e [OpenMP] Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY
* Add KMP_CPU_EQUAL and KMP_CPU_ISEMPTY to affinity mask API

* Add printout of leader to hardware thread dump

* Allow OMP_PLACES to restrict fullMask

This change fixes an issue with the OMP_PLACES=resource(#) syntax.
Before this change, specifying the number of resources did NOT change
the default number of threads created by the runtime. e.g.,
OMP_PLACES=cores(2) would still create __kmp_avail_proc number of
threads. After this change, the fullMask and __kmp_avail_proc are
modified if necessary so that the final place list dictates which
resources are available and how thus, how many threads are created by
default.

* Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY

For OMP_PLACES, two new features are added:
  1) OMP_PLACES=cores:<attribute> where <attribute> is either
     intel_atom, intel_core, or eff# where # is 0 - number of core
     efficiencies-1. This syntax also supports the optional (#)
     number selection of resources.
  2) OMP_PLACES=core_types|core_effs where this setting will create
     the number of core_types (or core_effs|core_efficiencies).

For KMP_AFFINITY, the granularity setting is expanded to include two new
keywords: core_type, and core_eff (or core_efficiency). This will set
the granularity to include all cores with a particular core type (or
efficiency). e.g., KMP_AFFINITY=granularity=core_type,compact will
create threads which can float across a single core type.

Differential Revision: https://reviews.llvm.org/D154547
2023-07-31 13:55:32 -05:00
Joachim Jenke
81bc7cf609 [OpenMP][NFC] lit: Allow setting default environment variables for test
Add CHECK_OPENMP_ENV environment variable which will be passed to environment
variables for test (make check-* target). This provides a handy way to
exercise various openmp code with different settings during development.

For example, to change default barrier pattern:
```
$ env CHECK_OPENMP_ENV="KMP_FORKJOIN_BARRIER_PATTERN=hier,hier \
KMP_PLAIN_BARRIER_PATTERN=hier,hier \
KMP_REDUCTION_BARRIER_PATTERN=hier,hier" \
ninja check-openmp
```

Even with this, each test can set appropriate environment variables if needed
as before.

Also, this commit adds missing documention about how to run tests in README.

Patch provided by t-msn

Differential Revision: https://reviews.llvm.org/D122645
2023-07-11 15:00:40 +02:00
Joachim Jenke
124d36e093 [OpenMP][OMPT] Change OMPT kind for OpenMP test lock functions
The OpenMP specification mentions that omp_test_lock and
omp_test_nest_lock dispatch OMPT callbacks with ompt_mutex_test_lock
and ompt_mutex_test_nest_lock for their kind respectively. Previously,
the values ompt_mutex_lock and ompt_mutex_nest_lock were used. This
could cause issues in application relying on the kind to correctly
determine lock states. This commit changes the kind to the expected
ones.

Also update callback.h and OMPT tests to reflect this change.

Patch prepared by Thyre

Differential Review: https://reviews.llvm.org/D153028
Differential Review: https://reviews.llvm.org/D153031
Differential Review: https://reviews.llvm.org/D153032
2023-07-07 14:49:47 +02:00
Joachim Jenke
6ef16f2618 [OpenMP] Add OMPT support for omp_all_memory task dependence
omp_all_memory currently has no representation in OMPT.

Adding new dependency flags as suggested by omp-lang issue #3007.

Differential Revision: https://reviews.llvm.org/D111788
2023-07-07 13:44:53 +02:00
Adrian Munera
028cf8c016 [OpenMP] Implement printing TDGs to dot files
This patch implements the "__kmp_print_tdg_dot" function, that prints a task dependency graph into a dot file containing the tasks and their dependencies.

It is activated through a new environment variable "KMP_TDG_DOT"

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D150962
2023-06-19 08:27:38 -05:00
Animesh Kumar
0c6f2f629c [OpenMP] Update the default version of OpenMP to 5.1
The default version of OpenMP is updated from 5.0 to 5.1 which means if -fopenmp is specified but -fopenmp-version is not specified with clang, the default version of OpenMP is taken to be 5.1.  After modifying the Frontend for that, various LIT tests were updated. This patch contains all such changes. At a high level, these are the patterns of changes observed in LIT tests -

  # RUN lines which mentioned `-fopenmp-version=50` need to kept only if the IR for version 5.0 and 5.1 are different. Otherwise only one RUN line with no version info(i.e. default version) needs to be there.

  # Test cases of this sort already had the RUN lines with respect to the older default version 5.0 and the version 5.1. Only swapping the version specification flag `-fopenmp-version` from newer version RUN line to older version RUN line is required.

  # Diagnostics: Remove the 5.0 version specific RUN lines if there was no difference in the Diagnostics messages with respect to the default 5.1.

  # Diagnostics: In case there was any difference in diagnostics messages between 5.0 and 5.1, mention version specific messages in tests.

  # If the test contained version specific ifdef's e.g. "#ifdef OMP5" but there were no RUN lines for any other version than 5.X, then bring the code guarded by ifdef's outside and remove the ifdef's.

  # Some tests had RUN lines for both 5.0 and 5.1 versions, but it is found that the IR for 5.0 is not different from the 5.1, therefore such RUN lines are redundant. So, such duplicated lines are removed.

  # To generate CHECK lines automatically, use the script llvm/utils/update_cc_test_checks.py

Reviewed By: saiislam, ABataev

Differential Revision: https://reviews.llvm.org/D129635

(cherry picked from commit 9dd2999907dc791136a75238a6000f69bf67cf4e)
2023-06-15 12:41:09 +05:30
Shilei Tian
375862b481 [OpenMP] Fix the issue in openmp/runtime/test/parallel/bug63197.c
If the system has 32 threads, then the test will fail because of partial match.
2023-06-14 12:23:37 -04:00
Shilei Tian
b14dc71c5e [OpenMP] Use 0 instead of false in the test bug63197.c 2023-06-14 11:51:51 -04:00
Shilei Tian
85592d3d4d [OpenMP] Fix the issue where num_threads still takes effect incorrectly
This patch fixes the issue that, if we have a compile-time serialized parallel
region (such as `if (0)`) with `num_threads`, followed by a regular parallel
region, the regular parallel region will pick up the value set in the serialized
parallel region incorrectly. The reason is, in the front end, if we can prove a
parallel region has to serialized, instead of emitting `__kmpc_fork_call`, the
front end directly emits `__kmpc_serialized_parallel`, body, and `__kmpc_end_serialized_parallel`.
However, this "optimization" doesn't consider the case where `num_threads` is
used such that `__kmpc_push_num_threads` is still emitted. Since we don't reset
the value in `__kmpc_serialized_parallel`, it will affect the next parallel region
followed by it.

Fix #63197.

Reviewed By: tlwilmar

Differential Revision: https://reviews.llvm.org/D152883
2023-06-14 11:46:12 -04:00
Tobias Hieta
f98ee40f4b [NFC][Py Reformat] Reformat python files in the rest of the dirs
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: jhenderson, #libc, Mordante, sivachandra

Differential Revision: https://reviews.llvm.org/D150784
2023-05-25 11:17:05 +02:00
Jonathan Peyton
d67c91b5e7 [OpenMP] Insert missing variable update inside loop
While loop within task priority code did not have necessary update of
variable which could lead to hangs if two threads collided when both
attempted to execute the compare_and_exchange.

Fixes: https://github.com/llvm/llvm-project/issues/62867
Differential Revision: https://reviews.llvm.org/D151138
2023-05-23 09:19:04 -05:00
Joachim Jenke
fe7f620ed6 [OpenMP][Tests][NFC] Mark unsupported libomp tests for GCC
This patch properly marks the support level for libomp test when testing with
GCC.

Some new OpenMP features were only introduced with GCC 11.
Tests using the target construct are incompatibe with GCC.

Tests pass now with GCC 10, 11, 12
2023-05-23 10:33:09 +02:00
Joachim Jenke
39a959eac7 [OpenMP][Tests][NFC] Mark unsupported OMPT tests for GCC
Codegen for some OpenMP directives is different from clang, so some
OMPT tests fail. As we don't expect GCC codegen to change significantly,
we mark the tests as unsupported for GCC.

OMPT Tests pass now with GCC 10, 11, 12
2023-05-23 10:33:09 +02:00
Chenle Yu
36d4e4c9b5 [OpenMP] Implement task record and replay mechanism
This patch implements the "task record and replay" mechanism.  The idea is to be able to store tasks and their dependencies in the runtime so that we do not pay the cost of task creation and dependency resolution for future executions. The objective is to improve fine-grained task performance, both for those from "omp task" and "taskloop".

The entry point of the recording phase is __kmpc_start_record_task, and the end of record is triggered by __kmpc_end_record_task.

Tasks encapsulated between a record start and a record end are saved, meaning that the runtime stores their dependencies and structures, referred to as TDG, in order to replay them in subsequent executions. In these TDG replays, we start the execution by scheduling all root tasks (tasks that do not have input dependencies), and there will be no involvement of a hash table to track the dependencies, yet tasks do not need to be created again.

At the beginning of __kmpc_start_record_task, we must check if a TDG has already been recorded. If yes, the function returns 0 and starts to replay the TDG by calling __kmp_exec_tdg; if not, we start to record, and the function returns 1.

An integer uniquely identifies TDGs. Currently, this identifier needs to be incremented manually in the source code. Still, depending on how this feature would eventually be used in the library, the caller function must do it; also, the caller function needs to implement a mechanism to skip the associated region, according to the return value of __kmpc_start_record_task.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D146642
2023-05-15 10:00:55 -05:00
Martin Storsjö
1bd3fba8f7 Revert "[openmp] [test] Set __COMPAT_LAYER=RunAsInvoker when running tests on Windows"
This reverts commit 63f0fdc262.

Since f1431bbfb1, this environment
variable is always set up by lit itself, so individual test suites
don't need to set it.

Differential Revision: https://reviews.llvm.org/D149356
2023-05-03 09:30:54 +03:00
Animesh Kumar
578b2a36b6 [OpenMP] Add LIT test on task depend clause
The working of depend clause with iterator modifier
can be correctly tested by means of execution tests
and not at the LLVM IR level. These tests
are imported/inspired from the SOLLVE tests.

SOLLVE repo: https://github.com/SOLLVE/sollve_vv

Differential Revision: https://reviews.llvm.org/D146706
2023-04-28 15:53:41 +05:30
Alexey Bataev
0cfe5ae0b6 [OPENMP]Fix PR59947: "Partially-triangular" loop collapse crashes.
The indeces of the dependent loops are properly ordered, just start from
1, so need just subtract 1 to get correct loop index.

Differential Revision: https://reviews.llvm.org/D145514
2023-03-08 13:06:53 -08:00
Alexey Bataev
ddde06906b [OpenMP]Fix PR55970: Miscompile of collapse(3) with non-rectangular loop nest.
Need to assign the calculated lower bound back to temp variable,
otherwise incorrect value (upper bound instead of lower bound) might be
used.

Differential Revision: https://reviews.llvm.org/D144015
2023-02-14 10:39:04 -08:00
Shilei Tian
544f8c7f39 [OpenMP] Fix stack overflow for test bug54082.c
When `N` is 1024, `int result[N][N]` is obviously large stack that Windows cannot support...

Fix #60326.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142684
2023-01-26 23:45:11 -05:00
Shilei Tian
5ba8ecb6cc [Clang][OpenMP] Find the type omp_allocator_handle_t from identifier table
In Clang, in order to determine the type of `omp_allocator_handle_t`, Clang
checks the type of those predefined allocators. The first one it checks is
`omp_null_allocator`. If the language is C, and the system is 64-bit, what Clang
gets is a `int`, instead of an enum of size 8, given the fact how we define
`omp_allocator_handle_t` in `omp.h`.  If the allocator is captured by a region,
let's say a parallel region, the allocator will be privatized. Because Clang deems
`omp_allocator_handle_t` as an `int`, it will first cast the value returned by
the runtime library (for `libomp` it is a `void *`) to `int`, and then in the
outlined function, it casts back to `omp_allocator_handle_t`. This two casts
completely shaves the first 32-bit of the pointer value returned from `libomp`,
and when the private "new" pointer is fed to another runtime function
`__kmpc_allocate()`, it causes segment fault. That is the root cause of PR54082.
I have no idea why `-fno-pic` could hide this bug.

In this patch, we detect `omp_allocator_handle_t` using roughly the same method
as `omp_event_handle_t`, by looking it up into the identifier table.

Fix #54082.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D142297
2023-01-24 22:49:05 -05:00
Shilei Tian
7e89420116 [OpenMP] Disable tests that are not supported by GCC if it is used for testing
GCC doesn't support `-fopenmp-version`, causing test failure if the compiler used
for testing is GCC.

GCC's OpenMP 5.2 support is very limited yet. Disable those tests requiring 5.2
feature for GCC as well.

We might want to take a look at all `libomp` tests and mark those tests that
don't support GCC yet.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D142173
2023-01-24 17:00:15 -05:00