clang-p2996

Author	SHA1	Message	Date
Xing Xue	b3792ae42a	[OpenMP][AIX] Fix test config for AIX (#88272) This patch fixes the test config so that it works for `tasking/omp50_taskdep_depobj.c` which uses different flags to test with the compiler's `omp.h`. * set test environment variable `OBJECT_MODE` to `64` if it is set explicitly to `64` in the AIX environment. `OBJECT_MODE` defaults to `32` and is recognized by AIX compilers and toolchain. In this way, we don't need to set `-m64` for all compiler flags for 64-bit mode * add option `-Wl,-bmaxdata` to 32-bit `test_openmp_flags` used by `tasking/omp50_taskdep_depobj.c`	2024-04-10 16:06:31 -04:00
Jonathan Peyton	eeaaf33fc2	[OpenMP] Unsupport absolute KMP_HW_SUBSET test for s390x (#87555 )	2024-04-04 13:54:40 -05:00
Jonathan Peyton	2ff3850ea1	[OpenMP] Add absolute KMP_HW_SUBSET functionality (#85326) Users can put a : in front of KMP_HW_SUBSET to indicate that the specified subset is an "absolute" subset. Currently, when a user puts KMP_HW_SUBSET=1t, this gets translated to KMP_HW_SUBSET="*s,*c,1t", where * means "use all of". If a user wants only one thread as the entire topology they can now do KMP_HW_SUBSET=:1t. Along with the absolute syntax is a fix for newer machines and making them easier to use with only the 3-level topology syntax. When a user puts KMP_HW_SUBSET=1s,4c,2t on a machine which actually has 4 layers, (say 1s,2m,3c,2t as the entire machine) the user gets an unexpected "too many resources asked" message because KMP_HW_SUBSET currently translates the "4c" value to mean 4 cores per module. To help users out, the runtime can assume that these newer layers, module in this case, should be ignored if they are not specified, but the topology should always take into account the sockets, cores, and threads layers.	2024-04-03 11:43:23 -05:00
Jonathan Peyton	4ea24946e3	[OpenMP] Fix nested parallel with tasking (#87309 ) When a nested parallel region ends, the runtime calls __kmp_join_call(). During this call, the primary thread of the nested parallel region will reset its tid (retval of omp_get_thread_num()) to what it was in the outer parallel region. A data race occurs with the current code when another worker thread from the nested inner parallel region tries to steal tasks from the primary thread's task deque. The worker thread reads the tid value directly from the primary thread's data structure and may read the wrong value. This change just uses the calculated victim_tid from execute_tasks() directly in the steal_task() routine rather than reading tid from the data structure. Fixes: #87307	2024-04-02 15:56:50 -05:00
nihui	c5bbdb6494	[OpenMP] arm64_32 port for Apple WatchOS (#87246 ) detect `aarch64_32` with compiler defined macro `__ARM64_ARCH_8_32__` reuse ARM `__kmp_unnamed_critical_addr` and add `KMP_PREFIX_UNDERSCORE` macro like AARCH64 reuse AARCH64 `__kmp_invoke_microtask` build log for watchos armv7k + arm64_32 and watchos simulator x86_64 + arm64 https://github.com/nihui/action-protobuf/actions/runs/8520684611/job/23337305030	2024-04-02 11:38:32 -04:00
Jonathan Peyton	038e66fe59	[OpenMP] Have hidden helper team allocate new OS threads only (#87119) The hidden helper team pre-allocates the gtid space [1, num_hidden_helpers] (inclusive). If regular host threads are allocated, then put back in the thread pool, then the hidden helper team is initialized, the hidden helper team tries to allocate the threads from the thread pool with gtids higher than [1, num_hidden_helpers]. Instead, have the hidden helper team fork OS threads so the correct gtid range is used for hidden helper threads. Fixes: #87117	2024-03-29 17:26:00 -05:00
Vadim Paretsky	7db4046322	[OpenMP] add loop collapse tests (#86243 ) This PR adds loop collapse tests ported from MSVC. --------- Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>	2024-03-26 16:41:31 -07:00
Xing Xue	d394f3a162	[OpenMP][AIX] Affinity implementation for AIX (#84984) This patch implements `affinity` for AIX, which is quite different from platforms such as Linux. - Setting CPU affinity through masks and related functions is not supported. System call `bindprocessor()` is used to bind a thread to one CPU per call. - There are no system routines to get the affinity info of a thread. The implementation of `get_system_affinity()` for AIX gets the mask of all available CPUs, to be used as the full mask only. - Topology is not available from the file system. It is obtained through system SRAD (Scheduler Resource Allocation Domain). This patch has run through the libomp LIT tests successfully with `affinity` enabled.	2024-03-22 15:25:08 -04:00
Brad Smith	c7de4a39d5	[OpenMP] Enable the affinity tests on FreeBSD, NetBSD and DragonFly (#85500 ) FreeBSD, NetBSD and DragonFly also have affinity support. So enable the tests there as well.	2024-03-19 13:29:19 -04:00
Vadim Paretsky	110141b378	[OpenMP] fix endianness dependent definitions in OMP headers for MSVC (#84540 ) MSVC does not define __BYTE_ORDER__ making the check for BigEndian erroneously evaluate to true and breaking the struct definitions in MSVC compiled builds correspondingly. The fix adds an additional check for whether __BYTE_ORDER__ is defined by the compiler to fix these. --------- Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>	2024-03-09 10:47:31 -08:00
vadikp-intel	fcd2d48325	[OpenMP] runtime support for efficient partitioning of collapsed triangular loops (#83939) This PR adds OMP runtime support for more efficient partitioning of certain types of collapsed loops that can be used by compilers that support loop collapsing (i.e. MSVC) to achieve more optimal thread load balancing. In particular, this PR addresses double nested upper and lower isosceles triangular loops of the following types 1. lower triangular 'less_than' for (int i=0; i<N; i++) for (int j=0; j<i; j++) 2. lower triangular 'less_than_equal' for (int i=0; i<N; i++) for (int j=0; j<=i; j++) 3. upper triangular for (int i=0; i<N; i++) for (int j=i; j<N; j++) Includes tests for the three supported loop types. --------- Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com>	2024-03-07 16:28:03 -08:00
Jonathan Peyton	0e0bee26e7	[OpenMP] Fix distributed barrier hang for OMP_WAIT_POLICY=passive (#83058 ) The resume thread logic inside __kmp_free_team() is faulty. Only checking b_go for sleep status doesn't wake up distributed barrier. Change to generic check for th_sleep_loc and calling __kmp_null_resume_wrapper(). Fixes: #80664	2024-02-27 14:15:48 -06:00
Xing Xue	a4dcfbcb78	[OpenMP][AIX] XFAIL capacity tests on AIX in 32-bit (#83014) This patch XFAILs two capacity tests on AIX in 32-bit because they run out of resources with `4 x omp_get_max_threads()` in 32-bit mode.	2024-02-26 13:13:05 -05:00
Martin Storsjö	4b9c089381	[OpenMP] [test] Skip the -mlong-double-80 test on MSVC ABI (#81115 ) Within the MSVC ABI, long doubles are the same as regular 64 bit doubles. This test case, which is compiled with -mlong-double-80, cannot work when libomp has been compiled without that flag, as -mlong-double-80 changes the calling convention for the tested functions.	2024-02-19 11:33:28 +02:00
Xing Xue	7a9b0e4acb	[OpenMP][test]Flip bit-fields in 'struct flags' for big-endian in test cases (#79895 ) This patch flips bit-fields in `struct flags` for big-endian in test cases to be consistent with the definition of the structure in libomp `kmp.h`.	2024-02-07 15:24:52 -05:00
Xing Xue	2edce427a8	[openmp][AIX]Initial changes for porting to AIX (#76841 ) This PR contains initial changes for building and testing libomp on AIX. More changes will follow. - `KMP_OS_AIX` is defined for the AIX platform - `KMP_ARCH_PPC` is defined for 32-bit PPC - `KMP_ARCH_PPC_XCOFF` and `KMP_ARCH_PPC64_XCOFF` are for 32- and 64-bit XCOFF object formats respectively - Assembly file `z_AIX_asm.S` is used for AIX specific assembly code and will be added in a separate PR - The target library is disabled because AIX does not have the device support - OMPT is temporarily disabled	2024-01-08 08:33:00 -05:00
Carlos Eduardo Seo	dcd7c8b7c9	[OpenMP][AArch64] Workaround for ompt/synchronization tests (#75848) ompt/synchronization/[masked.c \| master.c] tests fail due to a wrong offset being calculated for the possible return addresses. PR #65936 fixes this for Darwin and the same has to be done for Linux. Updates #69627	2023-12-19 19:26:23 +01:00
Sandeep Kosuri	ecc080c07d	[OpenMP] return empty stmt for `nothing` (#74042) - `nothing` directive was affecting the `if` block structure, which it should not. So return an empty statement instead of an error statement while parsing to avoid this.	2023-12-03 13:33:38 +05:30
Alex	d6f00654fb	[OpenMP][Runtime][test] Fix ompt task testcase fail randomly (#72337 ) Fixed #72231	2023-11-28 14:22:57 +01:00
Lixi Zhou	a3c0f705db	[NFC] fix failed ompt tests on M1 device (#65696 ) Fix the 2 failed ompt tests on M1 device found on #63194. ``` libomp :: ompt/synchronization/masked.c libomp :: ompt/synchronization/master.c ``` For the details of this fix, please check the origin discussion in https://github.com/llvm/llvm-project/issues/63194#issuecomment-1710494689 Thanks @jprotze for the fix.	2023-11-24 23:40:14 +01:00
Joachim Jenke	f5e50b21da	[OpenMP] Optimized trivial multiple edges from task dependency graph From "3.1 Reducing the number of edges" of this [[ https://hal.science/hal-04136674v1/ \| paper ]] - Optimization (b) Task (dependency) nodes have a `successors` list built upon passed dependency. Given the following code, B will be added to A's successors list building the graph `A` -> `B` ``` // A # pragma omp task depend(out: x) {} // B # pragma omp task depend(in: x) {} ``` In the following code, B is currently added twice to A's successor list ``` // A # pragma omp task depend(out: x, y) {} // B # pragma omp task depend(in: x, y) {} ``` This patch removes such duplicates by checking the last inserted task in `A` successor list. Authored by: Romain Pereira (rpereira-dev) Differential Revision: https://reviews.llvm.org/D158544	2023-11-21 18:36:12 +01:00
Jonathan Peyton	5cc603cb22	[OpenMP] Add skewed iteration distribution on hybrid systems (#69946 ) This commit adds skewed distribution of iterations in nonmonotonic:dynamic schedule (static steal) for hybrid systems when thread affinity is assigned. Currently, it distributes the iterations at 60:40 ratio. Consider this loop with dynamic schedule type, for (int i = 0; i < 100; ++i). In a hybrid system with 20 hardware threads (16 CORE and 4 ATOM core), 88 iterations will be assigned to performance cores and 12 iterations will be assigned to efficient cores. Each thread with CORE core will process 5 iterations + extras and with ATOM core will process 3 iterations. Differential Revision: https://reviews.llvm.org/D152955	2023-11-08 10:19:37 -06:00
Neale Ferguson	1111ef0257	Add openmp support to System z (#66081) * openmp/README.rst - Add s390x to those platforms supported * openmp/libomptarget/plugins-nextgen/CMakeLists.txt - Add s390x subdirectory * openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt - Add s390x definitions * openmp/runtime/CMakeLists.txt - Add s390x to those platforms supported * openmp/runtime/cmake/LibompGetArchitecture.cmake - Define s390x ARCHITECTURE * openmp/runtime/cmake/LibompMicroTests.cmake - Add dependencies for System z (aka s390x) * openmp/runtime/cmake/LibompUtils.cmake - Add S390X to the mix * openmp/runtime/cmake/config-ix.cmake - Add s390x as a supported LIBOMP_ARCH * openmp/runtime/src/kmp_affinity.h - Define __NR_sched_[get\|set]affinity for s390x * openmp/runtime/src/kmp_config.h.cmake - Define CACHE_LINE for s390x * openmp/runtime/src/kmp_os.h - Add KMP_ARCH_S390X to support checks * openmp/runtime/src/kmp_platform.h - Define KMP_ARCH_S390X * openmp/runtime/src/kmp_runtime.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/kmp_tasking.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h - Define ITT_ARCH_S390X * openmp/runtime/src/z_Linux_asm.S - Instantiate __kmp_invoke_microtask for s390x * openmp/runtime/src/z_Linux_util.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/test/ompt/callback.h - Define print_possible_return_addresses for s390x * openmp/runtime/tools/lib/Platform.pm - Return s390x as platform and host architecture * openmp/runtime/tools/lib/Uname.pm - Set hardware platform value for s390x	2023-11-03 12:42:55 +01:00
Ilya Leoshkevich	77c2b623ca	[OpenMP][Tests] Sync struct DEP with the runtime (#69982 ) struct DEP defined in multiple testcases must correspond to runtime's struct kmp_depend_info. The former defines flags as int, and the latter as kmp_uint8_t. This discrepancy goes unnoticed on little-endian systems, but breaks big-endian ones. Make flags in struct DEP unsigned char.	2023-10-24 19:40:08 +02:00
Kazushi Marukawa	e8679b93da	[OpenMP][test][VE] Limit the number of AFFINITY_MAX_CPUS for VE (#65872 ) Limit the number of AFFINITY_MAX_CPUS for VE because VE's sched_getaffinity doesn't work correctly with large sized mask buffer.	2023-09-12 23:45:56 +09:00
Kazushi Marukawa	f8efa65ca5	[OpenMP][test][VE] Change to use VE_LD_LIBRARY_PATH for VE (#65869) Change to use VE_LD_LIBRARY_PATH for VE instead of LD_LIBRARY_PATH. The VE is connected to the host, and compiled test programs for VE are invoked on the host and transferred to the VE. If programs are compiled for the host, we use LD_LIBRARY_PATH. Otherwise, we use VE_LD_LIBRARY_PATH.	2023-09-10 12:07:16 +09:00
Kazushi (Jam) Marukawa	18b6724355	[OpenMP][VE] Support OpenMP runtime on VE Support OpenMP runtime library on VE. This patch makes OpenMP compilable for VE architecture. Almost all tests run correctly on VE. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D159401	2023-09-10 08:29:53 +09:00
Martin Storsjö	c2019c416c	[OpenMP] [test] Fix target_thread_limit.cpp to not assume 4 or more cores Previously, the test ran a section with #pragma omp target thread_limit(4) and expected it to execute exactly 4 times, even though it would in practice execute min(cores, 4) times. Increment a counter and check that it executed 1-4 times. Differential Revision: https://reviews.llvm.org/D159311	2023-09-01 21:16:58 +03:00
Sandeep Kosuri	08bbff4aad	[OpenMP] Codegen support for thread_limit on target directive for host offloading - This patch adds support for thread_limit clause on target directive according to OpenMP 51 [2.14.5] - The idea is to create an outer task for target region, when there is a thread_limit clause, and manipulate the thread_limit of task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region. Differential Revision: https://reviews.llvm.org/D152054	2023-08-26 22:18:49 -05:00
Jonathan Peyton	b34c7d8c8e	[OpenMP] Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY * Add KMP_CPU_EQUAL and KMP_CPU_ISEMPTY to affinity mask API * Add printout of leader to hardware thread dump * Allow OMP_PLACES to restrict fullMask This change fixes an issue with the OMP_PLACES=resource(#) syntax. Before this change, specifying the number of resources did NOT change the default number of threads created by the runtime. e.g., OMP_PLACES=cores(2) would still create __kmp_avail_proc number of threads. After this change, the fullMask and __kmp_avail_proc are modified if necessary so that the final place list dictates which resources are available and thus how many threads are created by default. * Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY For OMP_PLACES, two new features are added: 1) OMP_PLACES=cores:<attribute> where <attribute> is either intel_atom, intel_core, or eff# where # is 0 - number of core efficiencies-1. This syntax also supports the optional (#) number selection of resources. 2) OMP_PLACES=core_types\|core_effs where this setting will create the number of core_types (or core_effs\|core_efficiencies). For KMP_AFFINITY, the granularity setting is expanded to include two new keywords: core_type, and core_eff (or core_efficiency). This will set the granularity to include all cores with a particular core type (or efficiency). e.g., KMP_AFFINITY=granularity=core_type,compact will create threads which can float across a single core type. Differential Revision: https://reviews.llvm.org/D154547	2023-07-31 13:55:32 -05:00
Joachim Jenke	81bc7cf609	[OpenMP][NFC] lit: Allow setting default environment variables for test Add CHECK_OPENMP_ENV environment variable which will be passed to environment variables for test (make check-* target). This provides a handy way to exercise various openmp code with different settings during development. For example, to change default barrier pattern: ``` $ env CHECK_OPENMP_ENV="KMP_FORKJOIN_BARRIER_PATTERN=hier,hier \ KMP_PLAIN_BARRIER_PATTERN=hier,hier \ KMP_REDUCTION_BARRIER_PATTERN=hier,hier" \ ninja check-openmp ``` Even with this, each test can set appropriate environment variables if needed as before. Also, this commit adds missing documentation about how to run tests in README. Patch provided by t-msn Differential Revision: https://reviews.llvm.org/D122645	2023-07-11 15:00:40 +02:00
Joachim Jenke	124d36e093	[OpenMP][OMPT] Change OMPT kind for OpenMP test lock functions The OpenMP specification mentions that omp_test_lock and omp_test_nest_lock dispatch OMPT callbacks with ompt_mutex_test_lock and ompt_mutex_test_nest_lock for their kind respectively. Previously, the values ompt_mutex_lock and ompt_mutex_nest_lock were used. This could cause issues in application relying on the kind to correctly determine lock states. This commit changes the kind to the expected ones. Also update callback.h and OMPT tests to reflect this change. Patch prepared by Thyre Differential Review: https://reviews.llvm.org/D153028 Differential Review: https://reviews.llvm.org/D153031 Differential Review: https://reviews.llvm.org/D153032	2023-07-07 14:49:47 +02:00
Joachim Jenke	6ef16f2618	[OpenMP] Add OMPT support for omp_all_memory task dependence omp_all_memory currently has no representation in OMPT. Adding new dependency flags as suggested by omp-lang issue #3007. Differential Revision: https://reviews.llvm.org/D111788	2023-07-07 13:44:53 +02:00
Adrian Munera	028cf8c016	[OpenMP] Implement printing TDGs to dot files This patch implements the "__kmp_print_tdg_dot" function, that prints a task dependency graph into a dot file containing the tasks and their dependencies. It is activated through a new environment variable "KMP_TDG_DOT" Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D150962	2023-06-19 08:27:38 -05:00
Animesh Kumar	0c6f2f629c	[OpenMP] Update the default version of OpenMP to 5.1 The default version of OpenMP is updated from 5.0 to 5.1 which means if -fopenmp is specified but -fopenmp-version is not specified with clang, the default version of OpenMP is taken to be 5.1. After modifying the Frontend for that, various LIT tests were updated. This patch contains all such changes. At a high level, these are the patterns of changes observed in LIT tests - # RUN lines which mentioned `-fopenmp-version=50` need to be kept only if the IR for version 5.0 and 5.1 are different. Otherwise only one RUN line with no version info (i.e. default version) needs to be there. # Test cases of this sort already had the RUN lines with respect to the older default version 5.0 and the version 5.1. Only swapping the version specification flag `-fopenmp-version` from newer version RUN line to older version RUN line is required. # Diagnostics: Remove the 5.0 version specific RUN lines if there was no difference in the Diagnostics messages with respect to the default 5.1. # Diagnostics: In case there was any difference in diagnostics messages between 5.0 and 5.1, mention version specific messages in tests. # If the test contained version specific ifdef's e.g. "#ifdef OMP5" but there were no RUN lines for any other version than 5.X, then bring the code guarded by ifdef's outside and remove the ifdef's. # Some tests had RUN lines for both 5.0 and 5.1 versions, but it is found that the IR for 5.0 is not different from the 5.1, therefore such RUN lines are redundant. So, such duplicated lines are removed. # To generate CHECK lines automatically, use the script llvm/utils/update_cc_test_checks.py Reviewed By: saiislam, ABataev Differential Revision: https://reviews.llvm.org/D129635 (cherry picked from commit 9dd2999907dc791136a75238a6000f69bf67cf4e)	2023-06-15 12:41:09 +05:30
Shilei Tian	375862b481	[OpenMP] Fix the issue in openmp/runtime/test/parallel/bug63197.c If the system has 32 threads, then the test will fail because of partial match.	2023-06-14 12:23:37 -04:00
Shilei Tian	b14dc71c5e	[OpenMP] Use 0 instead of false in the test bug63197.c	2023-06-14 11:51:51 -04:00
Shilei Tian	85592d3d4d	[OpenMP] Fix the issue where `num_threads` still takes effect incorrectly This patch fixes the issue that, if we have a compile-time serialized parallel region (such as `if (0)`) with `num_threads`, followed by a regular parallel region, the regular parallel region will pick up the value set in the serialized parallel region incorrectly. The reason is, in the front end, if we can prove a parallel region has to be serialized, instead of emitting `__kmpc_fork_call`, the front end directly emits `__kmpc_serialized_parallel`, body, and `__kmpc_end_serialized_parallel`. However, this "optimization" doesn't consider the case where `num_threads` is used such that `__kmpc_push_num_threads` is still emitted. Since we don't reset the value in `__kmpc_serialized_parallel`, it will affect the next parallel region that follows it. Fix #63197. Reviewed By: tlwilmar Differential Revision: https://reviews.llvm.org/D152883	2023-06-14 11:46:12 -04:00
Tobias Hieta	f98ee40f4b	[NFC][Py Reformat] Reformat python files in the rest of the dirs This is an ongoing series of commits that are reformatting our Python code. This catches the last of the python files to reformat. Since they were so few I bunched them together. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: jhenderson, #libc, Mordante, sivachandra Differential Revision: https://reviews.llvm.org/D150784	2023-05-25 11:17:05 +02:00
Jonathan Peyton	d67c91b5e7	[OpenMP] Insert missing variable update inside loop While loop within task priority code did not have necessary update of variable which could lead to hangs if two threads collided when both attempted to execute the compare_and_exchange. Fixes: https://github.com/llvm/llvm-project/issues/62867 Differential Revision: https://reviews.llvm.org/D151138	2023-05-23 09:19:04 -05:00
Joachim Jenke	fe7f620ed6	[OpenMP][Tests][NFC] Mark unsupported libomp tests for GCC This patch properly marks the support level for libomp tests when testing with GCC. Some new OpenMP features were only introduced with GCC 11. Tests using the target construct are incompatible with GCC. Tests pass now with GCC 10, 11, 12	2023-05-23 10:33:09 +02:00
Joachim Jenke	39a959eac7	[OpenMP][Tests][NFC] Mark unsupported OMPT tests for GCC Codegen for some OpenMP directives is different from clang, so some OMPT tests fail. As we don't expect GCC codegen to change significantly, we mark the tests as unsupported for GCC. OMPT Tests pass now with GCC 10, 11, 12	2023-05-23 10:33:09 +02:00
Chenle Yu	36d4e4c9b5	[OpenMP] Implement task record and replay mechanism This patch implements the "task record and replay" mechanism. The idea is to be able to store tasks and their dependencies in the runtime so that we do not pay the cost of task creation and dependency resolution for future executions. The objective is to improve fine-grained task performance, both for those from "omp task" and "taskloop". The entry point of the recording phase is __kmpc_start_record_task, and the end of record is triggered by __kmpc_end_record_task. Tasks encapsulated between a record start and a record end are saved, meaning that the runtime stores their dependencies and structures, referred to as TDG, in order to replay them in subsequent executions. In these TDG replays, we start the execution by scheduling all root tasks (tasks that do not have input dependencies), and there will be no involvement of a hash table to track the dependencies, yet tasks do not need to be created again. At the beginning of __kmpc_start_record_task, we must check if a TDG has already been recorded. If yes, the function returns 0 and starts to replay the TDG by calling __kmp_exec_tdg; if not, we start to record, and the function returns 1. An integer uniquely identifies TDGs. Currently, this identifier needs to be incremented manually in the source code. Still, depending on how this feature would eventually be used in the library, the caller function must do it; also, the caller function needs to implement a mechanism to skip the associated region, according to the return value of __kmpc_start_record_task. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D146642	2023-05-15 10:00:55 -05:00
Martin Storsjö	1bd3fba8f7	Revert "[openmp] [test] Set __COMPAT_LAYER=RunAsInvoker when running tests on Windows" This reverts commit `63f0fdc262`. Since `f1431bbfb1`, this environment variable is always set up by lit itself, so individual test suites don't need to set it. Differential Revision: https://reviews.llvm.org/D149356	2023-05-03 09:30:54 +03:00
Animesh Kumar	578b2a36b6	[OpenMP] Add LIT test on task depend clause The working of depend clause with iterator modifier can be correctly tested by means of execution tests and not at the LLVM IR level. These tests are imported/inspired from the SOLLVE tests. SOLLVE repo: https://github.com/SOLLVE/sollve_vv Differential Revision: https://reviews.llvm.org/D146706	2023-04-28 15:53:41 +05:30
Alexey Bataev	0cfe5ae0b6	[OPENMP]Fix PR59947: "Partially-triangular" loop collapse crashes. The indices of the dependent loops are properly ordered, they just start from 1, so we only need to subtract 1 to get the correct loop index. Differential Revision: https://reviews.llvm.org/D145514	2023-03-08 13:06:53 -08:00
Alexey Bataev	ddde06906b	[OpenMP]Fix PR55970: Miscompile of collapse(3) with non-rectangular loop nest. Need to assign the calculated lower bound back to temp variable, otherwise incorrect value (upper bound instead of lower bound) might be used. Differential Revision: https://reviews.llvm.org/D144015	2023-02-14 10:39:04 -08:00
Shilei Tian	544f8c7f39	[OpenMP] Fix stack overflow for test bug54082.c When `N` is 1024, `int result[N][N]` is obviously large stack that Windows cannot support... Fix #60326. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142684	2023-01-26 23:45:11 -05:00
Shilei Tian	5ba8ecb6cc	[Clang][OpenMP] Find the type `omp_allocator_handle_t` from identifier table In Clang, in order to determine the type of `omp_allocator_handle_t`, Clang checks the type of those predefined allocators. The first one it checks is `omp_null_allocator`. If the language is C, and the system is 64-bit, what Clang gets is an `int`, instead of an enum of size 8, given how we define `omp_allocator_handle_t` in `omp.h`. If the allocator is captured by a region, let's say a parallel region, the allocator will be privatized. Because Clang deems `omp_allocator_handle_t` as an `int`, it will first cast the value returned by the runtime library (for `libomp` it is a `void *`) to `int`, and then in the outlined function, it casts back to `omp_allocator_handle_t`. These two casts completely shave the first 32 bits of the pointer value returned from `libomp`, and when the private "new" pointer is fed to another runtime function `__kmpc_allocate()`, it causes a segmentation fault. That is the root cause of PR54082. I have no idea why `-fno-pic` could hide this bug. In this patch, we detect `omp_allocator_handle_t` using roughly the same method as `omp_event_handle_t`, by looking it up in the identifier table. Fix #54082. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D142297	2023-01-24 22:49:05 -05:00
Shilei Tian	7e89420116	[OpenMP] Disable tests that are not supported by GCC if it is used for testing GCC doesn't support `-fopenmp-version`, causing test failure if the compiler used for testing is GCC. GCC's OpenMP 5.2 support is very limited yet. Disable those tests requiring 5.2 feature for GCC as well. We might want to take a look at all `libomp` tests and mark those tests that don't support GCC yet. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D142173	2023-01-24 17:00:15 -05:00
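
The three triangular loop shapes named in commit fcd2d48325 can be illustrated with a small C sketch. It only counts iterations per shape and checks them against the closed forms N(N-1)/2 and N(N+1)/2; it is not the runtime's partitioning code, and the function names (`count_lower_lt`, `count_lower_le`, `count_upper`, `report_triangular_work`) are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Shape 1, lower triangular 'less_than':
   for (i = 0; i < n; i++) for (j = 0; j < i; j++) */
long count_lower_lt(int n) {
  long iters = 0;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < i; j++)
      iters++;
  return iters;
}

/* Shape 2, lower triangular 'less_than_equal':
   for (i = 0; i < n; i++) for (j = 0; j <= i; j++) */
long count_lower_le(int n) {
  long iters = 0;
  for (int i = 0; i < n; i++)
    for (int j = 0; j <= i; j++)
      iters++;
  return iters;
}

/* Shape 3, upper triangular:
   for (i = 0; i < n; i++) for (j = i; j < n; j++) */
long count_upper(int n) {
  long iters = 0;
  for (int i = 0; i < n; i++)
    for (int j = i; j < n; j++)
      iters++;
  return iters;
}

/* Check the counts against the closed forms, then print how many inner
   iterations each outer row i contributes in each shape. */
void report_triangular_work(int n) {
  long with_diagonal = (long)n * (n + 1) / 2;
  assert(count_lower_lt(n) == with_diagonal - n); /* n(n-1)/2 */
  assert(count_lower_le(n) == with_diagonal);     /* n(n+1)/2 */
  assert(count_upper(n) == with_diagonal);        /* n(n+1)/2 */
  for (int i = 0; i < n; i++)
    printf("row %d: less_than=%d less_than_equal=%d upper=%d\n",
           i, i, i + 1, n - i);
}
```

Calling `report_triangular_work(5)` from a `main` prints the per-row work. The row sizes differ, so handing each thread an equal number of outer rows gives unequal work, which is the imbalance the commit message sets out to reduce.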


411 Commits