clang-p2996

Author	SHA1	Message	Date
Andrey Churbanov	5bf494e73d	Fixed x2APIC discovery for 256-processor architectures. Mask for value read from ebx register returned by CPUID expanded to 0xFFFF. Differential Revision: https://reviews.llvm.org/D23203 llvm-svn: 277825	2016-08-05 15:59:11 +00:00
Paul Osmialowski	ecbe2ea002	Make balanced affinity work on AArch64. This patch enables balanced affinity on machines that do not have hardware threads and have cores clustered into packages. In fact, the balancing algorithm could be generalized for any arrangement with at least two levels of hierarchy (depth > 1). Differential Revision: https://reviews.llvm.org/D22365 llvm-svn: 277212	2016-07-29 20:55:03 +00:00
Andrey Churbanov	cb28d6e3a0	D22136: Memory leaks fixed by adding missed __kmp_free() calls llvm-svn: 274850	2016-07-08 14:40:20 +00:00
Jonathan Peyton	fd7cc42fed	Improvements to process affinity mask setting A couple improvements: 1) Add ability to limit fullMask size when KMP_HW_SUBSET limits resources. 2) Make KMP_HW_SUBSET work for affinity_none, and only limit fullMask in this case. Patch by Andrey Churbanov. Differential Revision: http://reviews.llvm.org/D21528 llvm-svn: 273278	2016-06-21 15:54:38 +00:00
Jonathan Peyton	bf35771bcc	Change hwloc discovery algorithm to print topology only for accessible resources Change hwloc discovery algorithm to print topology for only accessible resources, and report uniformity correspondingly, similar to what other topology discovery algorithms do. Fixes minor inconsistency in total topology reported and resources used for threads binding in case hwloc used. Patch by Andrey Churbanov. Differential Revision: http://reviews.llvm.org/D21389 llvm-svn: 272952	2016-06-16 20:31:19 +00:00
Jonathan Peyton	72a8498e08	Fixed missing memory cleanup in __kmp_affinity_create_hwloc_map() Cleanup: fixed missing memory cleanup in a couple of corner cases. Fixes a possible memory leak in some corner cases. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21355 llvm-svn: 272946	2016-06-16 20:14:54 +00:00
Jonathan Peyton	b9d28fbeb3	Deprecate KMP_PLACE_THREADS and rename as KMP_HW_SUBSET Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET due to confusion about its purpose and function among users. KMP_HW_SUBSET is an environment variable which allows users to easily pick a subset of the hardware topology to use. e.g., KMP_HW_SUBSET=30c,2t means use 30 cores, 2 threads per core. Patch by Andrey Churbanov Differential Revision: http://reviews.llvm.org/D21340 llvm-svn: 272937	2016-06-16 18:53:48 +00:00
Jonathan Peyton	c5304aa3c4	Affinity mask processing improvements Remove static specifier from var fullMask and remove kmp_get_fullMask() routine. When iterating through procs in a mask, always check if proc is in fullMask (this check was missing in a few places). Patch by Brian Bliss. Differential Revision: http://reviews.llvm.org/D21300 llvm-svn: 272589	2016-06-13 21:28:03 +00:00
Jonathan Peyton	202a24dd9b	Hwloc refactoring patch These changes remove the hwloc_topology_ignore_type function which doesn't exist in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc has the cache levels stripped out and then assumes the final stripped topology follows the typical three-level topology: packages -> cores -> HW threads. But the code is doing unclean manipulations to determine at what level those resources are located and also assumes too much about what hwloc is detecting (there could be intermediate levels in between socket and core for instance). This new way of extracting the topology doesn't strip out any hardware objects that hwloc detects. It does not assume the three level topology, and instead searches for the relevant three levels within the topology for each bit of information using hwloc interface functions. i.e., the three level topology subset that our affinity code is interested in is extracted from the hwloc topology tree directly. For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the number of cores under a socket reliably without worrying if there are unexpected objects between the socket object and core object in the hwloc topology structure. Also, now that all topology information is kept, there are also possibilities of using the caches/numa nodes to determine more sophisticated affinity settings in the future. There is also some cleanup code added for the destruction of the __kmp_hwloc_topology object. Differential Revision: http://reviews.llvm.org/D21195 llvm-svn: 272565	2016-06-13 17:30:08 +00:00
Jonathan Peyton	8407f5b3bd	Remove architecture dependent Hwloc DEBUG section This debug section's functionality can be replicated using the environment variable KMP_TOPOLOGY_METHOD with different values and KMP_AFFINITY=verbose llvm-svn: 267472	2016-04-25 21:11:26 +00:00
Jonathan Peyton	1d5487c5d0	Fix buffer problem with printing long Hwloc affinity mask This change has the hwloc_bitmap_list_snprintf() function use the entire buffer to print the mask. There is no need to shorten the buffer length by 7. It only needs to be shortened by one byte. llvm-svn: 267470	2016-04-25 21:08:31 +00:00
Jonathan Peyton	3076fa4c35	New API for restoring current thread's affinity to init affinity of application This new API, int kmp_set_thread_affinity_mask_initial(), is available for use by other parallel runtime libraries inside a possibly OpenMP-registered thread. This entry point restores the current thread's affinity mask to the affinity mask of the application when it first began. If -1 is returned it can be assumed that either the thread hasn't called affinity initialization or that the thread isn't registered with the OpenMP library. If 0 is returned, then the call was successful. Any return value greater than zero indicates an error occurred when setting affinity. Differential Revision: http://reviews.llvm.org/D15867 llvm-svn: 257489	2016-01-12 17:21:55 +00:00
Jonathan Peyton	01dcf36bd5	Adding Hwloc library option for affinity mechanism These changes allow libhwloc to be used as the topology discovery/affinity mechanism for libomp. It is supported on Unices. The code additions: * Canonicalize KMP_CPU_* interface macros so bitmask operations are implementation independent and work with both hwloc bitmaps and libomp bitmaps. So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and the like. These are all in kmp.h and appropriately placed. * Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc interface to create a libomp address2os object which the rest of libomp knows how to handle already. * To build, use -DLIBOMP_USE_HWLOC=on and -DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake can't find the library or hwloc.h, then it will tell you and exit. Differential Revision: http://reviews.llvm.org/D13991 llvm-svn: 254320	2015-11-30 20:02:59 +00:00
Jonathan Peyton	7dee82e729	Improvements to machine_hierarchy code for re-sizing These changes include: 1) Machine hierarchy now uses the base_num_threads field to indicate the maximum number of threads the current hierarchy can handle without a resize. 2) In __kmp_get_hierarchy, we need to get depth after any potential resize is done. 3) Cleanup of hierarchy resize code to support 1 above. Differential Revision: http://reviews.llvm.org/D14455 llvm-svn: 252475	2015-11-09 16:24:53 +00:00
Jonathan Peyton	6778c73243	Fix OMP_PLACES negation operator parsing (!place) Just moved the *scan++ line up before the recursive call. Otherwise, infinite recursion occurs and leads to a segmentation fault. llvm-svn: 250729	2015-10-19 19:43:01 +00:00
Jonathan Peyton	dd4aa9b6b5	Added sockets to the syntax of KMP_PLACE_THREADS environment variable. Added (optional) sockets to the syntax of the KMP_PLACE_THREADS environment variable. Some limitations: * The number of sockets and then optional offset should be specified first (before other parameters). * The letter designation is mandatory for sockets and then for other parameters. * If number of cores is specified first, then the number of sockets is defaulted to all sockets on the machine; also, the old syntax is partially supported if sockets are skipped. * If number of threads per core is specified first, then the number of sockets and cores per socket are defaulted to all sockets and all cores per socket respectively. * The number of cores per socket cannot be specified before sockets or after threads per core. * The number of threads per core can be specified before or after core-offset (old syntax required it to be before core-offset); * Parameters delimiter can be: empty, comma, lower-case x; * Spaces are allowed around numbers, around letters, around delimiter. Approximate shorthand specification: KMP_PLACE_THREADS="[num_sockets(S\|s)[[delim]offset(O\|o)][delim]][num_cores_per_socket(C\|c)[[delim]offset(O\|o)][delim]][num_threads_per_core(T\|t)]" Differential Revision: http://reviews.llvm.org/D13175 llvm-svn: 249708	2015-10-08 17:55:54 +00:00
Jonathan Peyton	7edeef1bbf	Fix memory corruption in Windows debug library This patch adjusts the buffer size when reducing the buffer used for printing. This solves the memory corruption in Windows debug library, and potential memory corruption in other builds. llvm-svn: 248588	2015-09-25 17:23:17 +00:00
Jonathan Peyton	df4d3dd659	Fix depth field bug and resize() function in hierarchical barrier This is a follow up to the hierarchy cleanup patch. Added some clarifying comments to hierarchy_info. Fixed a bug with the depth field not being updated cleanly during a resize. Fixed resize to first check capacity as determined by maxLevels before actually doing the full resize. Differential Revision: http://reviews.llvm.org/D12562 llvm-svn: 247333	2015-09-10 20:34:32 +00:00
Jonathan Peyton	1707836b68	Cleanup of affinity hierarchy code. Some of this is improvement to code suggested by Hal Finkel. Four changes here: 1.Cleanup of hierarchy code to handle all hierarchy cases whether affinity is available or not 2.Separated this and other classes and common functions out to a header file 3.Added a destructor-like fini function for the hierarchy (and call in __kmp_cleanup) 4.Remove some redundant code that is hopefully no longer needed Differential Revision: http://reviews.llvm.org/D12449 llvm-svn: 247326	2015-09-10 19:22:07 +00:00
Jonathan Peyton	62f3840c9b	Fix machine topology pruning. This patch fixes a bug when eliminating layers in the machine topology (namely cores, and threads). Before this patch, if a user specifies using only one thread per socket, then affinity is not set properly due to bad topology pruning. Differential Revision: http://reviews.llvm.org/D11158 llvm-svn: 245966	2015-08-25 18:44:41 +00:00
Jonathan Peyton	7f09a98ab1	Allow machine hierarchy expansion This fix allows the machine hierarchy to be expanded in case it needs to handle more threads. It adds a resize function to accomplish this. Differential Revision: http://reviews.llvm.org/D9900 llvm-svn: 240292	2015-06-22 15:59:18 +00:00
Jonathan Peyton	7be075335d	Re-enable Visual Studio Builds. I tried to compile with Visual Studio using CMake and found these two sections of code causing problems for Visual Studio. The first one removes the use of variable length arrays by instead using KMP_ALLOCA(). The second part eliminates a redundant cpuid assembly call by using the already existing __kmp_x86_cpuid() call instead. llvm-svn: 240290	2015-06-22 15:53:50 +00:00
Jonathan Peyton	663382950d	Apply name change to src/* files. These changes are mostly in comments, but there are a few that aren't. Change libiomp5 => libomp everywhere. One internal function name is changed in kmp_gsupport.c, and in kmp_i18n.c, the static char[] variable 'name' is changed to "libomp". llvm-svn: 238712	2015-06-01 02:37:28 +00:00
Jonathan Peyton	caf09fe022	Fix comment about balanced affinity A while back, Hal mentioned fixing a comment concerning balanced affinity. http://lists.cs.uiuc.edu/pipermail/openmp-dev/2014-December/000358.html I forgot about fixing it until now, but now is better than never. llvm-svn: 238378	2015-05-27 23:27:33 +00:00
Andrey Churbanov	aa1f2b6306	The generation of the hierarchy used by hierarchical barrier improved in how the generation reacts to affinity set to none, or disabled, or no affinity available, or oversubscription. Some cleanup actions based on review comments to follow: need to use meaningful names instead of digital constants, e.g. use enumerators. llvm-svn: 234775	2015-04-13 18:51:59 +00:00
Andrey Churbanov	74bf17b8ff	Replace some unsafe API calls with safe alternatives on Windows, prepare code for similar actions on other platforms - wrap unsafe API calls into macros. llvm-svn: 233915	2015-04-02 13:27:08 +00:00
Andrey Churbanov	1362ae750f	Eliminated the write to depth field of the machine_hierarchy data structure in __kmp_get_hierarchy(), thus fixing race condition. Now local variable used by each thread. llvm-svn: 233914	2015-04-02 13:18:50 +00:00
Andrey Churbanov	16a1432176	issuing of incorrect warning fixed llvm-svn: 231779	2015-03-10 09:34:38 +00:00
Andrey Churbanov	1f037e495a	cleanup: usages of mask size wrapped into macros llvm-svn: 231775	2015-03-10 09:15:26 +00:00
Andrey Churbanov	128755741f	changed unsigned types to signed - caused by comments of Hal Finkel on one of earlier patches llvm-svn: 231773	2015-03-10 09:00:36 +00:00
Andrey Churbanov	e4b9213f80	minor change: comment improved llvm-svn: 231381	2015-03-05 17:46:50 +00:00
Andrey Churbanov	b41e62b713	Fixed memory corruption problem. llvm-svn: 228736	2015-02-10 20:10:21 +00:00
Andrey Churbanov	5cd50e3c0a	enable environment variable KMP_PLACE_THREADS also for non-MIC architectures llvm-svn: 227467	2015-01-29 17:14:58 +00:00
Andrey Churbanov	4b2f17a1d3	fixing typo in error message llvm-svn: 227451	2015-01-29 15:49:22 +00:00
Andrey Churbanov	d9e775edfc	Comments only: removing the Revision and Date svn variables from the top of all the source files. llvm-svn: 227207	2015-01-27 17:13:53 +00:00
Andrey Churbanov	1c33129956	Enables a cpuid leaf 4 check for non-MIC x86 architectures. llvm-svn: 227204	2015-01-27 17:03:42 +00:00
Andrey Churbanov	f696c820cd	Removes some unused variables (__kmp_ht_*) and changes __kmp_ncores and __kmp_nThreadsPerCore to static globals within kmp_affinity.cpp. llvm-svn: 227201	2015-01-27 16:55:43 +00:00
Andrey Churbanov	7daf9803f5	Replaces KMP_OS_WINDOWS && KMP_ARCH_X86_64 or any combination of those two options with the feature macro KMP_GROUP_AFFINITY. llvm-svn: 227199	2015-01-27 16:52:57 +00:00
Andrey Churbanov	f28f613eda	This patch enables the use of KMP_AFFINITY=balanced on non-MIC Architectures. The restriction for using balanced affinity on non-MIC architectures is it only works for one-package machines. llvm-svn: 225794	2015-01-13 14:54:00 +00:00
Jim Cownie	4cc4bb4c60	I apologise in advance for the size of this check-in. At Intel we do understand that this is not friendly, and are working to change our internal code-development to make it easier to make development features available more frequently and in finer (more functional) chunks. Unfortunately we haven't got that in place yet, and unpicking this into multiple separate check-ins would be non-trivial, so please bear with me on this one. We should be better in the future. Apologies over, what do we have here? GCC 4.9 compatibility -------------------- * We have implemented the new entrypoints used by code compiled by GCC 4.9 to implement the same functionality in gcc 4.8. Therefore code compiled with gcc 4.9 that used to work will continue to do so. However, there are some other new entrypoints (associated with task cancellation) which are not implemented. Therefore user code compiled by gcc 4.9 that uses these new features will not link against the LLVM runtime. (It remains unclear how to handle those entrypoints, since the GCC interface has potentially unpleasant performance implications for join barriers even when cancellation is not used) --- new parallel entry points --- new entry points that aren't OpenMP 4.0 related These are implemented fully :- GOMP_parallel_loop_dynamic() GOMP_parallel_loop_guided() GOMP_parallel_loop_runtime() GOMP_parallel_loop_static() GOMP_parallel_sections() GOMP_parallel() --- cancellation entry points --- Currently, these only give a runtime error if OMP_CANCELLATION is true because our plain barriers don't check for cancellation while waiting GOMP_barrier_cancel() GOMP_cancel() GOMP_cancellation_point() GOMP_loop_end_cancel() GOMP_sections_end_cancel() --- taskgroup entry points --- These are implemented fully. 
GOMP_taskgroup_start() GOMP_taskgroup_end() --- target entry points --- These are empty (as they are in libgomp) GOMP_target() GOMP_target_data() GOMP_target_end_data() GOMP_target_update() GOMP_teams() Improvements in Barriers and Fork/Join -------------------------------------- * Barrier and fork/join code is now in its own file (which makes it easier to understand and modify). * Wait/release code is now templated and in its own file; suspend/resume code is also templated * There's a new, hierarchical, barrier, which exploits the cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve fork/join and barrier performance. *BEWARE* the new source files have not been added to the legacy CMake build system. If you want to use that, fixes will be required. Statistics Collection Code -------------------------- * New code has been added to collect application statistics (if this is enabled at library compile time; by default it is not). The statistics code itself is generally useful, the lightweight timing code uses the X86 rdtsc instruction, so will require changes for other architectures. The intent of this code is not for users to tune their codes but rather 1) For timing code-paths inside the runtime 2) For gathering general properties of OpenMP codes to focus attention on which OpenMP features are most used. Nested Hot Teams ---------------- * The runtime now maintains more state to reduce the overhead of creating and destroying inner parallel teams. This improves the performance of code that repeatedly uses nested parallelism with the same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL environment variable to a depth to enable this (and, of course, OMP_NESTED=true to enable nested parallelism at all). Improved Intel(r) VTune(tm) Amplifier support --------------------------------------------- * The runtime provides additional information to VTune via the itt_notify interface to allow it to display better OpenMP specific analyses of load-imbalance. 
Support for OpenMP Composite Statements --------------------------------------- * Implement new entrypoints required by some of the OpenMP 4.1 composite statements. Improved ifdefs --------------- * More separation of concepts ("Does this platform do X?") from platforms ("Are we compiling for platform Y?"), which should simplify future porting. ScaleMP* contribution --------------------- Stack padding to improve the performance in their environment where cross-node coherency is managed at the page level. Redesign of wait and release code --------------------------------- The code is simplified and performance improved. Bug Fixes --------- Fixes for Windows multiple processor groups. Fix Fortran module build on Linux: offload attribute added. Fix entry names for distribute-parallel-loop construct to be consistent with the compiler codegen. Fix an inconsistent error message for KMP_PLACE_THREADS environment variable. llvm-svn: 219214	2014-10-07 16:25:50 +00:00
Alp Toker	763b93965c	Add support for FreeBSD Port the OpenMP runtime to FreeBSD along with associated build system changes. Also begin to generalize affinity capabilities so they aren't tied explicitly to Windows and Linux. The port builds with stock clang and gmake and has no additional runtime dependencies. All but a handful of the validation suite tests are now passing on FreeBSD 10 x86_64. llvm-svn: 202478	2014-02-28 09:42:41 +00:00
Alp Toker	8f2d3f0f90	Fix typos llvm-svn: 202018	2014-02-24 10:40:15 +00:00
Jim Cownie	181b4bb3bb	For your Christmas hacking pleasure. This release aligns with Intel(r) Composer XE 2013 SP1 Product Update 2 New features * The library can now be built with clang (though with some limitations since clang does not support 128 bit floats) * Support for VTune analysis of load imbalance * Code contribution from Steven Noonan to build the runtime for ARM* architecture processors * First implementation of runtime API for OpenMP cancellation Bug Fixes * Fixed hang on Windows (only) when using KMP_BLOCKTIME=0 llvm-svn: 197914	2013-12-23 17:28:57 +00:00
Jim Cownie	5e8470af09	First attempt to import OpenMP runtime llvm-svn: 191506	2013-09-27 10:38:44 +00:00

44 Commits