On ARM, a C fallback version of __kmp_invoke_microtask is used,
which only handles up to a fixed number of arguments - while
many-microtask-args.c tests that the function can handle an
arbitrarily large number of arguments (the testcase produces 17
arguments).
On the CMake level, we can't add ${LIBOMP_ARCH} directly to
OPENMP_TEST_COMPILER_FEATURES in OpenMPTesting.cmake, since
that file is parsed before LIBOMP_ARCH is set. Instead
convert the feature list into a proper CMake list, and append
${LIBOMP_ARCH} into it before serializing it to an Python array.
Reapply: Make sure OPENMP_TEST_COMPILER_FEATURES is defined
properly in all other test subdirectories other than
runtime/test too.
Differential Revision: https://reviews.llvm.org/D138738
This reverts commit 03bf001b6d.
This commit broke a number of OpenMP buildbots, e.g.
https://lab.llvm.org/buildbot#builders/84/builds/31839, where
the build ends up with errors like this:
[0/1] Running OpenMP tests
llvm-lit: /b/1/openmp-clang-x86_64-linux-debian/llvm.src/llvm/utils/lit/lit/TestingConfig.py:140: fatal: unable to parse config file '/b/1/openmp-clang-x86_64-linux-debian/llvm.build/projects/openmp/libomptarget/test/x86_64-pc-linux-gnu/lit.site.cfg', traceback: Traceback (most recent call last):
File "/b/1/openmp-clang-x86_64-linux-debian/llvm.src/llvm/utils/lit/lit/TestingConfig.py", line 129, in load_from_path
exec(compile(data, path, 'exec'), cfg_globals, None)
File "/b/1/openmp-clang-x86_64-linux-debian/llvm.build/projects/openmp/libomptarget/test/x86_64-pc-linux-gnu/lit.site.cfg", line 6
config.test_compiler_features =
^
SyntaxError: invalid syntax
Use the correct data type for pointer sized integers on Windows;
"long" is always 32 bit, even on 64 bit Windows - don't use it
for the kmp_intptr_t type.
Provide the exact correct definition of the kmp_depend_info
struct - avoid the risk of mismatches (if a platform would pack
things slightly differently when things are declared differently).
Zero initialize the whole dep_info struct before filling it in;
if only setting the in/out bits, the rest of the unallocated bits
in the bitfield can have undefined values. Libomp reads the flags
in combined form as an kmp_uint8 by reading the flag field - thus,
the unused bits do need to be zeroed. (Alternatively, the flag field
could be set to zero before setting the individual bits in the
bitfield).
Use kmp_intptr_t instead of long for casting pointers to integers.
Differential Revision: https://reviews.llvm.org/D137748
On ARM, a C fallback version of __kmp_invoke_microtask is used,
which only handles up to a fixed number of arguments - while
many-microtask-args.c tests that the function can handle an
arbitrarily large number of arguments (the testcase produces 17
arguments).
On the CMake level, we can't add ${LIBOMP_ARCH} directly to
OPENMP_TEST_COMPILER_FEATURES in OpenMPTesting.cmake, since
that file is parsed before LIBOMP_ARCH is set. Instead
convert the feature list into a proper CMake list, and append
${LIBOMP_ARCH} into it before serializing it to an Python array.
Differential Revision: https://reviews.llvm.org/D138738
Windows heuristics may decide to want to run some tested processes
as elevated (since it may think some of them are installers - executables
with "dispatch" in the name may hit a heuristic looking for "patch").
Set this environment variable to disable this heuristic and just run
the executable with whatever privileges the caller has.
This fixes a couple tests on such versions of Windows where this
heuristic is active.
Differential Revision: https://reviews.llvm.org/D137772
This fixes warnings like this:
```
openmp/runtime/test/omp_testsuite.h:107:60: warning: format specifies type 'unsigned int' but the argument has type 'DWORD' (aka 'unsigned long') [-Wformat]
fprintf(stderr, "CreateThread() failed: Error #%u.\n", GetLastError());
~~ ^~~~~~~~~~~~~~
%lu
```
Differential Revision: https://reviews.llvm.org/D137747
The hidden helper task is only enabled on Linux (kmp_runtime.cpp
initializes __kmp_enable_hidden_helper to TRUE for linux but to
FALSE for any other OS), and the __kmp_stg_parse_use_hidden_helper
function always makes it disabled on non-Linux OSes too.
Add a lit test feature for hidden helper tasks (only made available
on Linux) and mark two tests as requiring this feature.
Disable hidden helper tasks in the test that doesn't really involve
them, for consistent behaviour across platforms.
Differential Revision: https://reviews.llvm.org/D137749
OpenMP tests that use pthread functions include this header instead.
On Unix systems, this header includes pthread.h, while it provides
minimal implementations of the used pthread functions for Windows.
Differential Revision: https://reviews.llvm.org/D137746
Add a missing <process.h> include for _getpid. Don't typedef the
pid_t type on mingw, as mingw headers already provide a typedef for
it.
Differential Revision: https://reviews.llvm.org/D137745
Fix setting affinity type and topology method when affinity is disabled
and fix places that were not taking into account that affinity can be
explicitly disabled by putting proper KMP_AFFINITY_CAPABLE() check.
Differential Revision: https://reviews.llvm.org/D137176
Add new hidden helper affinity via the environment variable,
KMP_HIDDEN_HELPER_AFFINITY, which allows users to assign thread
affinity to hidden helper threads using the same syntax as
KMP_AFFINITY. OMP_PLACES/OMP_PROC_BIND have no interaction with
KMP_HIDDEN_HELPER_AFFINITY.
Differential Revision: https://reviews.llvm.org/D135113
GCC, glibc, binutils, and LLVM have added support for LoongArch64.
This patch adds support for LLVM OpenMP following D59880 for RISCV64.
Reviewed By: MaskRay, SixWeining
Differential Revision: https://reviews.llvm.org/D132925
Fix the LD_LIBRARY_PATH prepending order to make sure that
config.library_path ends up before any potentially-system directories
(e.g. config.hwloc_library_dir). This makes sure that we are testing
against the just-built openmp libraries rather than the version that is
already installed.
Also rename the function to `prepend_*` to make it clearer what it
actually does.
https://github.com/llvm/llvm-project/issues/56821
Differential Revision: https://reviews.llvm.org/D130825
Added control to reset affinity of primary thread after outermost parallel
region to initial affinity encountered before OpenMP runtime was initialized.
KMP_AFFINITY environment variable reset/noreset modifier introduced.
Default behavior is unchanged.
Differential Revision: https://reviews.llvm.org/D125993
This patch implements omp_get_device_num() in the host and the device.
It uses the already existing getDeviceNum in the device config for the device.
And in the host it uses the omp_get_num_devices().
Two simple tests added
Differential Revision: https://reviews.llvm.org/D128347
Without this patch, arguments to the
`llvm::OpenMPIRBuilder::AtomicOpValue` initializer are reversed.
Reviewed By: ABataev, tianshilei1992
Differential Revision: https://reviews.llvm.org/D126619
OpenMP 5.2, sec. 10.2 "teams Construct", p. 232, L9-12 restricts what
regions can be strictly nested within a `teams` construct. This patch
relaxes Clang's enforcement of this restriction in the case of nested
`atomic` constructs unless `-fno-openmp-extensions` is specified.
Cases like the following then seem to work fine with no additional
implementation changes:
```
#pragma omp target teams map(tofrom:x)
#pragma omp atomic update
x++;
```
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D126323
This change adds support for ompt_callback_dispatch with the new
dispatch chunk type introduced in 5.2. Definitions of the new
ompt_work_loop types were also added in the header file.
Differential Revision: https://reviews.llvm.org/D122107
Before this patch task priorities were ignored, that was a valid implementation
as the task priority is a hint according to OpenMP specification.
Implemented shared list of sorted (high -> low) task deques one per task
priority value. Tasks execution changed to first check if priority tasks ready
for execution exist, and these tasks executed before others;
otherwise usual tasks execution mechanics work.
Differential Revision: https://reviews.llvm.org/D119676
Introduce KMP_COMPILER_ICX macro to represent compilation with oneAPI
compiler.
Fixup flag detection and compiler ID detection in CMake. Older CMake's
detect IntelLLVM as Clang.
Fix compiler warnings.
Fixup many of the tests to have non-empty parallel regions as they are
elided by oneAPI compiler.
The __kmp_hidden_helper_threads_num set to N+1 if user requested N threads.
Thus number of worker hidden helper threads corresponds to user request,
main thread of helper team excluded as it does not participate in actual work.
This also fixes divide-by-0 issue in the code.
Fixes#48656
Differential Revision: https://reviews.llvm.org/D119586
Eachempati.
This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D113540
Remove restriction forcing users to specify the KMP_HW_SUBSET value in
topology order. This patch sorts the user KMP_HW_SUBSET value before
trying to apply it. For example: 1s,4c,2t is equivalent to 2t,1s,4c
Differential Revision: https://reviews.llvm.org/D112027
__ompt_get_task_info_internal function is adapted to support thread_num
determination during the execution of multiple nested serialized
parallel regions enclosed by a regular parallel region.
Consider the following program that contains parallel region R1 executed
by two threads. Let the worker thread T of region R1 executes serialized
parallel regions R2 that encloses another serialized parallel region R3.
Note that the thread T is the master thread of both R2 and R3 regions.
Assume that __ompt_get_task_info_internal function is called with the
argument "ancestor_level == 1" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside
the team of region R2, whose implicit task is at level 1 inside the
hierarchy of active tasks. Since the thread T is the master thread of
region R2, one should expected that "thread_num" takes a value 0.
After the while loop finishes, the following stands: "lwt != NULL",
"prev_lwt == NULL", "prev_team" represents the team information about
the innermost serialized parallel region R3. This results in executing
the assignment "thread_num = prev_team->t.t_master_tid". Note that
"prev_team->t.t_master_tid" was initialized at the moment of
R2’s creation and represents the "thread_num" of the thread T inside
the region R1 which encloses R2. Since the thread T is the worker thread
of the region R1, "the thread_num" takes value 1, which is a contradiction.
This patch proposes to use "lwt" instead of "prev_lwt" when determining
the "thread_num". If "lwt" exists, the task at the requested level belongs
to the serialized parallel region. Since the serialized parallel region
is executed by one thread only, the "thread_num" takes value 0.
Similarly, assume that __ompt_get_task_info_internal function is called
with the argument "ancestor_level == 2" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside the
team of region R1. Since the thread is the worker inside the region R1,
one should expected that "thread_num" takes value 1. After the loop finishes,
the following stands: "lwt == NULL", "prev_lwt != NULL", "prev_team" represents
the team information about the innermost serialized parallel region R3.
This leads to execution of the assignment "thread_num = 0", which causes
a contradiction.
Ignoring the "prev_lwt" leads to executing the assignment
"thread_num = prev_team->t.t_master_tid" instead. From the previous explanation,
it is obvious that "thread_num" takes value 1.
Note that the "prev_lwt" variable is marked as unnecessary and thus removed.
This patch introduces the test case which represents the OpenMP program
described earlier in the summary.
Differential Revision: https://reviews.llvm.org/D110699
__kmp_fork_call sets the enter_frame of the active task (th_curren_task)
before new parallel region begins. After the region is finished, the
enter_frame is cleared.
The old implementation of __kmpc_fork_call didn’t clear the enter_frame of
active task.
Also, the way of initializing the enter_frame of the active task was wrong.
Consider the following two OpenMP programs.
The first program: Let R1 be the serialized parallel region that encloses
another serialized parallel region R2. Assume that thread that executes R2 is
going to create a new serialized parallel region R3 by executing
__kmpc_fork_call. This thread is responsible to set enter_frame of R2's
implicit task. Note that the information about R2's implicit task is present
inside master_th->th.th_current_task at this moment, while lwt represents the
information about R1's implicit task. The old implementation uses lwt and
resets enter_frame of R1's implicit task instead of R2's implicit task. The
new implementation uses master_th->th.th_current_task instead.
The second program: Consider the OpenMP program that contains parallel region
R1 which encloses an explicit task T. Assume that thread should create another
parallel region R2 during the execution of the T. The __kmpc_fork_call is
responsible to create R2 and set enter frame of T whose information is present
inside the master_th->th.th_current_task.
Old implementation tries to set the frame of
parent_team->t.t_implicit_task_taskdata[tid] which corresponds to the implicit
task of the R1, instead of T.
Differential Revision: https://reviews.llvm.org/D112419
As discussed in D108488, testing for invariants of omp_get_wtime would be more
reliable than testing for duration of sleep, as return from sleep might be
delayed due to system load.
Alternatively/in addition, we could compare the time measured by omp_get_wtime
to time measured with C++11 chrono (for portability?).
Differential Revision: https://reviews.llvm.org/D112458
The CHECK: line in the test had no effect, because the test does not
pipe to FileCheck. Since the test only checks for a single value,
encode the result in the return value of the test.
Where possible change to declare the variable before the loop.
Where not possible, specifically request -std=c99 (could be limited to
specific compilers like icc).
KMP_API_NAME_GOMP_PARALLEL_SECTIONS function was missing the task frame support.
This patch introduced a fix responsible to set properly the exit_frame of
the innermost implicit task that corresponds to the parallel section construct,
as well as the enter_frame of the task that encloses the mentioned implicit task.
This patch also introduced a simple test case sections_serialized.c that contains
serialized parallel section construct and validates whether the mentioned
task frames are set correctly.
Differential Revision: https://reviews.llvm.org/D112205
This patch allows to simplify compiler implementation on "taskwait nowait"
construct. The "taskwait nowait" is semantically equivalent to the empty task.
Instead of creating an empty routine as a task entry, compiler can just send
NULL pointer to the runtime. Then the runtime will make all the work with
dependences and return because of the absent task routine.
Differential Revision: https://reviews.llvm.org/D112015
__ompt_get_task_info_internal is now able to determine the right value of the
“thread_num” argument during the execution of an explicit task.
During the execution of a while loop that iterates over the ancestor tasks
hierarchy, the “prev_team” variable was always set to “team” variable at the
beginning of each loop iteration.
Assume that the program contains a parallel region which encloses an explicit
task executed by the worker thread of the region. Also assume that the tool
inquires the “thread_num” of a worker thread for the implicit task that
corresponds to the region (task at “ancestor_level == 1”) and expects to
receive the value of “thread_num > 0”.
After the loop finishes, both “team” and “prev_team” variables are equal and
point to the team information of the parallel region.
The “thread_num” is set to “prev_team->t.t_master_tid”, that is equal to
“team->t.t_master_tid”. In this case, “team->t.t_master_tid” is 0, since
the master thread of the region is the initial master thread of the program.
This leads to a contradiction.
To prevent this, “prev_team” variable is set to “team” variable only at the
time when the loop that has already encountered the implicit task (“taskdata”
variable contains the information about an implicit task) continues iterating
over the implicit task’s ancestors, if any.
After the mentioned loop finishes, the “prev_team” variable might be equal to
NULL. This means that the task at requested “ancestor_level” belongs to the
innermost parallel region, so the “thread_num” will be determined by calling
the “__kmp_get_tid”.
To prove that this patch works, the test case “explicit_task_thread_num.c” is
provided.
It contains the example of the program explained earlier in the summary.
Differential Revision: https://reviews.llvm.org/D110473
Older intel compilers miss the privatization of nested loop variables for
doacross loops. Declaring the variable in the loop makes the test more
robust.
This patch implements teams affinity on the host.
The default is spread. A user can specify either spread, close, or
primary using KMP_TEAMS_PROC_BIND environment variable. Unlike
OMP_PROC_BIND, KMP_TEAMS_PROC_BIND is only a single value and is not a
list of values. The values follow the same semantics under the OpenMP
specification for parallel regions except T is the number of teams in
a league instead of the number of threads in a parallel region.
Differential Revision: https://reviews.llvm.org/D109921
Added functions those implement "atomic compare".
Though clang does not use library interfaces to implement OpenMP atomics,
the functions added for consistency.
Also added missed functions for 80-bit floating min/max atomics.
Differential Revision: https://reviews.llvm.org/D110109