ompt/synchronization/[masked.c | master.c] tests fail due to a wrong
offset being calculated for the possible return addreses. PR #65936
fixes this for Darwin and the same has to be done for Linux.
Updates #69627
- `nothing` directive was effecting the `if` block structure which it
should not. So return an empty statement instead of an error statement
while parsing to avoid this.
From "3.1 Reducing the number of edges" of this [[ https://hal.science/hal-04136674v1/ | paper ]] - Optimization (b)
Task (dependency) nodes have a `successors` list built upon passed dependency.
Given the following code, B will be added to A's successors list building the graph `A` -> `B`
```
// A
# pragma omp task depend(out: x)
{}
// B
# pragma omp task depend(in: x)
{}
```
In the following code, B is currently added twice to A's successor list
```
// A
# pragma omp task depend(out: x, y)
{}
// B
# pragma omp task depend(in: x, y)
{}
```
This patch removes such dupplicates by checking lastly inserted task in `A` successor list.
Authored by: Romain Pereira (rpereira-dev)
Differential Revision: https://reviews.llvm.org/D158544
This commit adds skewed distribution of iterations in
nonmonotonic:dynamic schedule (static steal) for hybrid systems when
thread affinity is assigned. Currently, it distributes the iterations at
60:40 ratio. Consider this loop with dynamic schedule type,
for (int i = 0; i < 100; ++i). In a hybrid system with 20 hardware
threads (16 CORE and 4 ATOM core), 88 iterations will be assigned to
performance cores and 12 iterations will be assigned to efficient cores.
Each thread with CORE core will process 5 iterations + extras and with
ATOM core will process 3 iterations.
Differential Revision: https://reviews.llvm.org/D152955
* openmp/README.rst
- Add s390x to those platforms supported
* openmp/libomptarget/plugins-nextgen/CMakeLists.txt
- Add s390x subdirectory
* openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
- Add s390x definitions
* openmp/runtime/CMakeLists.txt
- Add s390x to those platforms supported
* openmp/runtime/cmake/LibompGetArchitecture.cmake
- Define s390x ARCHITECTURE
* openmp/runtime/cmake/LibompMicroTests.cmake
- Add dependencies for System z (aka s390x)
* openmp/runtime/cmake/LibompUtils.cmake
- Add S390X to the mix
* openmp/runtime/cmake/config-ix.cmake
- Add s390x as a supported LIPOMP_ARCH
* openmp/runtime/src/kmp_affinity.h
- Define __NR_sched_[get|set]addinity for s390x
* openmp/runtime/src/kmp_config.h.cmake
- Define CACHE_LINE for s390x
* openmp/runtime/src/kmp_os.h
- Add KMP_ARCH_S390X to support checks
* openmp/runtime/src/kmp_platform.h
- Define KMP_ARCH_S390X
* openmp/runtime/src/kmp_runtime.cpp
- Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/kmp_tasking.cpp
- Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
- Define ITT_ARCH_S390X
* openmp/runtime/src/z_Linux_asm.S
- Instantiate __kmp_invoke_microtask for s390x
* openmp/runtime/src/z_Linux_util.cpp
- Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/test/ompt/callback.h
- Define print_possible_return_addresses for s390x
* openmp/runtime/tools/lib/Platform.pm
- Return s390x as platform and host architecture
* openmp/runtime/tools/lib/Uname.pm
- Set hardware platform value for s390x
struct DEP defined in multiple testcases must correspond to runtime's
struct kmp_depend_info. The former defines flags as int, and the latter
as kmp_uint8_t. This discrepancy goes unnoticed on little-endian
systems, but breaks big-endian ones.
Make flags in struct DEP unsigned char.
Change to use VE_LD_LIBRARY_PATH for VE instead of LD_LIBRARY_PATH. The
VE is connected to the host, and compiled test programs for VE is
invoked on the host and transferred to the VE. If programs are compiled
for the host, we use LD_LIBRARY_PATH. Otherwise, we use
VE_LD_LIBRARY_PATH.
Support OpenMP runtime library on VE. This patch makes OpenMP compilable
for VE architecture. Almost all tests run correctly on VE.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D159401
Previously, the test ran a section with
#pragma omp target thread_limit(4)
and expected it to execute exactly 4 times, even though it would
in practice execute min(cores, 4) times.
Increment a counter and check that it executed 1-4 times.
Differential Revision: https://reviews.llvm.org/D159311
offloading
- This patch adds support for thread_limit clause on target directive according to OpenMP 51 [2.14.5]
- The idea is to create an outer task for target region, when there is a thread_limit clause, and manipulate the thread_limit of task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region.
Differential Revision: https://reviews.llvm.org/D152054
* Add KMP_CPU_EQUAL and KMP_CPU_ISEMPTY to affinity mask API
* Add printout of leader to hardware thread dump
* Allow OMP_PLACES to restrict fullMask
This change fixes an issue with the OMP_PLACES=resource(#) syntax.
Before this change, specifying the number of resources did NOT change
the default number of threads created by the runtime. e.g.,
OMP_PLACES=cores(2) would still create __kmp_avail_proc number of
threads. After this change, the fullMask and __kmp_avail_proc are
modified if necessary so that the final place list dictates which
resources are available and how thus, how many threads are created by
default.
* Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY
For OMP_PLACES, two new features are added:
1) OMP_PLACES=cores:<attribute> where <attribute> is either
intel_atom, intel_core, or eff# where # is 0 - number of core
efficiencies-1. This syntax also supports the optional (#)
number selection of resources.
2) OMP_PLACES=core_types|core_effs where this setting will create
the number of core_types (or core_effs|core_efficiencies).
For KMP_AFFINITY, the granularity setting is expanded to include two new
keywords: core_type, and core_eff (or core_efficiency). This will set
the granularity to include all cores with a particular core type (or
efficiency). e.g., KMP_AFFINITY=granularity=core_type,compact will
create threads which can float across a single core type.
Differential Revision: https://reviews.llvm.org/D154547
Add CHECK_OPENMP_ENV environment variable which will be passed to environment
variables for test (make check-* target). This provides a handy way to
exercise various openmp code with different settings during development.
For example, to change default barrier pattern:
```
$ env CHECK_OPENMP_ENV="KMP_FORKJOIN_BARRIER_PATTERN=hier,hier \
KMP_PLAIN_BARRIER_PATTERN=hier,hier \
KMP_REDUCTION_BARRIER_PATTERN=hier,hier" \
ninja check-openmp
```
Even with this, each test can set appropriate environment variables if needed
as before.
Also, this commit adds missing documention about how to run tests in README.
Patch provided by t-msn
Differential Revision: https://reviews.llvm.org/D122645
The OpenMP specification mentions that omp_test_lock and
omp_test_nest_lock dispatch OMPT callbacks with ompt_mutex_test_lock
and ompt_mutex_test_nest_lock for their kind respectively. Previously,
the values ompt_mutex_lock and ompt_mutex_nest_lock were used. This
could cause issues in application relying on the kind to correctly
determine lock states. This commit changes the kind to the expected
ones.
Also update callback.h and OMPT tests to reflect this change.
Patch prepared by Thyre
Differential Review: https://reviews.llvm.org/D153028
Differential Review: https://reviews.llvm.org/D153031
Differential Review: https://reviews.llvm.org/D153032
omp_all_memory currently has no representation in OMPT.
Adding new dependency flags as suggested by omp-lang issue #3007.
Differential Revision: https://reviews.llvm.org/D111788
This patch implements the "__kmp_print_tdg_dot" function, that prints a task dependency graph into a dot file containing the tasks and their dependencies.
It is activated through a new environment variable "KMP_TDG_DOT"
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D150962
The default version of OpenMP is updated from 5.0 to 5.1 which means if -fopenmp is specified but -fopenmp-version is not specified with clang, the default version of OpenMP is taken to be 5.1. After modifying the Frontend for that, various LIT tests were updated. This patch contains all such changes. At a high level, these are the patterns of changes observed in LIT tests -
# RUN lines which mentioned `-fopenmp-version=50` need to kept only if the IR for version 5.0 and 5.1 are different. Otherwise only one RUN line with no version info(i.e. default version) needs to be there.
# Test cases of this sort already had the RUN lines with respect to the older default version 5.0 and the version 5.1. Only swapping the version specification flag `-fopenmp-version` from newer version RUN line to older version RUN line is required.
# Diagnostics: Remove the 5.0 version specific RUN lines if there was no difference in the Diagnostics messages with respect to the default 5.1.
# Diagnostics: In case there was any difference in diagnostics messages between 5.0 and 5.1, mention version specific messages in tests.
# If the test contained version specific ifdef's e.g. "#ifdef OMP5" but there were no RUN lines for any other version than 5.X, then bring the code guarded by ifdef's outside and remove the ifdef's.
# Some tests had RUN lines for both 5.0 and 5.1 versions, but it is found that the IR for 5.0 is not different from the 5.1, therefore such RUN lines are redundant. So, such duplicated lines are removed.
# To generate CHECK lines automatically, use the script llvm/utils/update_cc_test_checks.py
Reviewed By: saiislam, ABataev
Differential Revision: https://reviews.llvm.org/D129635
(cherry picked from commit 9dd2999907dc791136a75238a6000f69bf67cf4e)
This patch fixes the issue that, if we have a compile-time serialized parallel
region (such as `if (0)`) with `num_threads`, followed by a regular parallel
region, the regular parallel region will pick up the value set in the serialized
parallel region incorrectly. The reason is, in the front end, if we can prove a
parallel region has to serialized, instead of emitting `__kmpc_fork_call`, the
front end directly emits `__kmpc_serialized_parallel`, body, and `__kmpc_end_serialized_parallel`.
However, this "optimization" doesn't consider the case where `num_threads` is
used such that `__kmpc_push_num_threads` is still emitted. Since we don't reset
the value in `__kmpc_serialized_parallel`, it will affect the next parallel region
followed by it.
Fix#63197.
Reviewed By: tlwilmar
Differential Revision: https://reviews.llvm.org/D152883
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: jhenderson, #libc, Mordante, sivachandra
Differential Revision: https://reviews.llvm.org/D150784
This patch properly marks the support level for libomp test when testing with
GCC.
Some new OpenMP features were only introduced with GCC 11.
Tests using the target construct are incompatibe with GCC.
Tests pass now with GCC 10, 11, 12
Codegen for some OpenMP directives is different from clang, so some
OMPT tests fail. As we don't expect GCC codegen to change significantly,
we mark the tests as unsupported for GCC.
OMPT Tests pass now with GCC 10, 11, 12
This patch implements the "task record and replay" mechanism. The idea is to be able to store tasks and their dependencies in the runtime so that we do not pay the cost of task creation and dependency resolution for future executions. The objective is to improve fine-grained task performance, both for those from "omp task" and "taskloop".
The entry point of the recording phase is __kmpc_start_record_task, and the end of record is triggered by __kmpc_end_record_task.
Tasks encapsulated between a record start and a record end are saved, meaning that the runtime stores their dependencies and structures, referred to as TDG, in order to replay them in subsequent executions. In these TDG replays, we start the execution by scheduling all root tasks (tasks that do not have input dependencies), and there will be no involvement of a hash table to track the dependencies, yet tasks do not need to be created again.
At the beginning of __kmpc_start_record_task, we must check if a TDG has already been recorded. If yes, the function returns 0 and starts to replay the TDG by calling __kmp_exec_tdg; if not, we start to record, and the function returns 1.
An integer uniquely identifies TDGs. Currently, this identifier needs to be incremented manually in the source code. Still, depending on how this feature would eventually be used in the library, the caller function must do it; also, the caller function needs to implement a mechanism to skip the associated region, according to the return value of __kmpc_start_record_task.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D146642
This reverts commit 63f0fdc262.
Since f1431bbfb1, this environment
variable is always set up by lit itself, so individual test suites
don't need to set it.
Differential Revision: https://reviews.llvm.org/D149356
The working of depend clause with iterator modifier
can be correctly tested by means of execution tests
and not at the LLVM IR level. These tests
are imported/inspired from the SOLLVE tests.
SOLLVE repo: https://github.com/SOLLVE/sollve_vv
Differential Revision: https://reviews.llvm.org/D146706
The indeces of the dependent loops are properly ordered, just start from
1, so need just subtract 1 to get correct loop index.
Differential Revision: https://reviews.llvm.org/D145514
Need to assign the calculated lower bound back to temp variable,
otherwise incorrect value (upper bound instead of lower bound) might be
used.
Differential Revision: https://reviews.llvm.org/D144015
When `N` is 1024, `int result[N][N]` is obviously large stack that Windows cannot support...
Fix#60326.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142684
In Clang, in order to determine the type of `omp_allocator_handle_t`, Clang
checks the type of those predefined allocators. The first one it checks is
`omp_null_allocator`. If the language is C, and the system is 64-bit, what Clang
gets is a `int`, instead of an enum of size 8, given the fact how we define
`omp_allocator_handle_t` in `omp.h`. If the allocator is captured by a region,
let's say a parallel region, the allocator will be privatized. Because Clang deems
`omp_allocator_handle_t` as an `int`, it will first cast the value returned by
the runtime library (for `libomp` it is a `void *`) to `int`, and then in the
outlined function, it casts back to `omp_allocator_handle_t`. This two casts
completely shaves the first 32-bit of the pointer value returned from `libomp`,
and when the private "new" pointer is fed to another runtime function
`__kmpc_allocate()`, it causes segment fault. That is the root cause of PR54082.
I have no idea why `-fno-pic` could hide this bug.
In this patch, we detect `omp_allocator_handle_t` using roughly the same method
as `omp_event_handle_t`, by looking it up into the identifier table.
Fix#54082.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D142297
GCC doesn't support `-fopenmp-version`, causing test failure if the compiler used
for testing is GCC.
GCC's OpenMP 5.2 support is very limited yet. Disable those tests requiring 5.2
feature for GCC as well.
We might want to take a look at all `libomp` tests and mark those tests that
don't support GCC yet.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D142173
The test `openmp/runtime/test/atomic/kmp_atomic_float10_max_min.c` uses a compiler
flag `-mlong-double-80` that might not be supported by all targets. Currently it
requires `x86-registered-target`, but that requirement can be true when LLVM
supports X86 while the actual `libomp` arch is not X86. For example, when LLVM
is built on AArch64 with all targets enabled, `x86-registered-target` can be met.
If `libomp` is built with native target, aka. AArch64, the test will still be enabled,
causing test failure.
This patch only enables the test if the actual target is X86. The actual target
is determined by `LIBOMP_ARCH`.
Fix#53696.
Reviewed By: jlpeyton
Differential Revision: https://reviews.llvm.org/D142172
D91464 introduced verbose tool loading, but the test check only considers Linux.
On macOS, the outputs are totally different, causing the regression afterwards.
This patch simply sets the test to XFAIL on macOS.
Fix#56833.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142045
When a team nested inside a teams construct is allocated, it is
allocated to a size specified by the teams thread_limit. In the case
where any mechanism that might not grant the full thread_limit is in
use, we may get a smaller team. This possibility was not reflected in
the code when using the th_teams_size.nth value stored on the master
thread for the team. This value was never updated even when t_nproc on
the team itself was different. I added a line to update it shortly
before the team is forked.
Added a simple teams test that uses KMP_DYNAMIC_MODE=random to mimic
allocating teams with sizes <= thread_limit. Eventually, this
will segfault without the fix in this commit.
Differential Revision: https://reviews.llvm.org/D139960
I accidentally left out the test for the pinned API introduced by D138933. Adding it back.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D140077
This patch adds a new runtime function `fork_call_if` and uses that
to lower parallel if statements when going through OpenMPIRBuilder.
This fixes an issue where the OpenMPIRBuilder passes all arguments to
fork_call as a struct but this struct is not filled corretly in the
non-if branch by handling the fork inside the runtime.
Differential Revision: https://reviews.llvm.org/D138495
This fixes passing an arbitrarily large number of arguments to
microtasks, fixing the misc_bugs/many-microtask-args.c testcase on
ARM.
Differential Revision: https://reviews.llvm.org/D138704
On ARM, a C fallback version of __kmp_invoke_microtask is used,
which only handles up to a fixed number of arguments - while
many-microtask-args.c tests that the function can handle an
arbitrarily large number of arguments (the testcase produces 17
arguments).
On the CMake level, we can't add ${LIBOMP_ARCH} directly to
OPENMP_TEST_COMPILER_FEATURES in OpenMPTesting.cmake, since
that file is parsed before LIBOMP_ARCH is set. Instead
convert the feature list into a proper CMake list, and append
${LIBOMP_ARCH} into it before serializing it to an Python array.
Reapply: Make sure OPENMP_TEST_COMPILER_FEATURES is defined
properly in all other test subdirectories other than
runtime/test too.
Differential Revision: https://reviews.llvm.org/D138738
This reverts commit 03bf001b6d.
This commit broke a number of OpenMP buildbots, e.g.
https://lab.llvm.org/buildbot#builders/84/builds/31839, where
the build ends up with errors like this:
[0/1] Running OpenMP tests
llvm-lit: /b/1/openmp-clang-x86_64-linux-debian/llvm.src/llvm/utils/lit/lit/TestingConfig.py:140: fatal: unable to parse config file '/b/1/openmp-clang-x86_64-linux-debian/llvm.build/projects/openmp/libomptarget/test/x86_64-pc-linux-gnu/lit.site.cfg', traceback: Traceback (most recent call last):
File "/b/1/openmp-clang-x86_64-linux-debian/llvm.src/llvm/utils/lit/lit/TestingConfig.py", line 129, in load_from_path
exec(compile(data, path, 'exec'), cfg_globals, None)
File "/b/1/openmp-clang-x86_64-linux-debian/llvm.build/projects/openmp/libomptarget/test/x86_64-pc-linux-gnu/lit.site.cfg", line 6
config.test_compiler_features =
^
SyntaxError: invalid syntax
Use the correct data type for pointer sized integers on Windows;
"long" is always 32 bit, even on 64 bit Windows - don't use it
for the kmp_intptr_t type.
Provide the exact correct definition of the kmp_depend_info
struct - avoid the risk of mismatches (if a platform would pack
things slightly differently when things are declared differently).
Zero initialize the whole dep_info struct before filling it in;
if only setting the in/out bits, the rest of the unallocated bits
in the bitfield can have undefined values. Libomp reads the flags
in combined form as an kmp_uint8 by reading the flag field - thus,
the unused bits do need to be zeroed. (Alternatively, the flag field
could be set to zero before setting the individual bits in the
bitfield).
Use kmp_intptr_t instead of long for casting pointers to integers.
Differential Revision: https://reviews.llvm.org/D137748
On ARM, a C fallback version of __kmp_invoke_microtask is used,
which only handles up to a fixed number of arguments - while
many-microtask-args.c tests that the function can handle an
arbitrarily large number of arguments (the testcase produces 17
arguments).
On the CMake level, we can't add ${LIBOMP_ARCH} directly to
OPENMP_TEST_COMPILER_FEATURES in OpenMPTesting.cmake, since
that file is parsed before LIBOMP_ARCH is set. Instead
convert the feature list into a proper CMake list, and append
${LIBOMP_ARCH} into it before serializing it to an Python array.
Differential Revision: https://reviews.llvm.org/D138738
Windows heuristics may decide to want to run some tested processes
as elevated (since it may think some of them are installers - executables
with "dispatch" in the name may hit a heuristic looking for "patch").
Set this environment variable to disable this heuristic and just run
the executable with whatever privileges the caller has.
This fixes a couple tests on such versions of Windows where this
heuristic is active.
Differential Revision: https://reviews.llvm.org/D137772
This fixes warnings like this:
```
openmp/runtime/test/omp_testsuite.h:107:60: warning: format specifies type 'unsigned int' but the argument has type 'DWORD' (aka 'unsigned long') [-Wformat]
fprintf(stderr, "CreateThread() failed: Error #%u.\n", GetLastError());
~~ ^~~~~~~~~~~~~~
%lu
```
Differential Revision: https://reviews.llvm.org/D137747
The hidden helper task is only enabled on Linux (kmp_runtime.cpp
initializes __kmp_enable_hidden_helper to TRUE for linux but to
FALSE for any other OS), and the __kmp_stg_parse_use_hidden_helper
function always makes it disabled on non-Linux OSes too.
Add a lit test feature for hidden helper tasks (only made available
on Linux) and mark two tests as requiring this feature.
Disable hidden helper tasks in the test that doesn't really involve
them, for consistent behaviour across platforms.
Differential Revision: https://reviews.llvm.org/D137749