clang-p2996

Author	SHA1	Message	Date
Ivan A. Kosarev	8264bb8d34	[NEON] Fix support for vrndi_f32(), vrndiq_f32() and vrndns_f32() intrinsics This patch adds support for vrndi_f32() and vrndiq_f32() intrinsics in AArch32 mode and for vrndns_f32() intrinsic in AArch64 mode. Differential Revision: https://reviews.llvm.org/D48829 llvm-svn: 337690	2018-07-23 13:26:37 +00:00
Erich Keane	3efe00206f	Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552	2018-07-20 14:13:28 +00:00
Fangrui Song	99337e246c	Change \t to spaces llvm-svn: 337530	2018-07-20 08:19:20 +00:00
Nemanja Ivanovic	2600b839d5	NFC: Remove extraneous semicolons as pointed out in the differential review The commit for https://reviews.llvm.org/D49424 missed the comment about the extraneous semicolons. Remove them. llvm-svn: 337451	2018-07-19 12:49:27 +00:00
Nemanja Ivanovic	1ac56bd33f	[PowerPC] Handle __builtin_xxpermdi the same way as GCC does The codegen for this builtin was initially implemented to match GCC. However, due to interest from users GCC changed behaviour to account for the big endian bias of the instruction and correct it. This patch brings the handling inline with GCC. Fixes https://bugs.llvm.org/show_bug.cgi?id=38192 Differential Revision: https://reviews.llvm.org/D49424 llvm-svn: 337449	2018-07-19 12:44:15 +00:00
Mandeep Singh Grang	0054f48b44	[COFF] Add more missing MSVC ARM64 intrinsics Summary: Added the following intrinsics: _BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64 _InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64, _InterlockedExchangeAdd64, _InterlockedExchangeSub64, _InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64. Reviewers: compnerd, mstorsjo, rnk, javed.absar Reviewed By: mstorsjo Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49445 llvm-svn: 337327	2018-07-17 22:03:24 +00:00
Craig Topper	1faf953d75	[X86] Remove custom handling for __builtin_ia32_divss_round_mask and __builtin_ia32_divsd_round_mask. llvm-svn: 336628	2018-07-10 00:50:03 +00:00
Craig Topper	638426fc36	[X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics. This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics. This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling. I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle. llvm-svn: 336622	2018-07-10 00:37:25 +00:00
Craig Topper	74c10e3236	[Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter. Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible. To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future. To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin. To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute. To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins. There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch. Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue. Differential Revision: https://reviews.llvm.org/D48617 llvm-svn: 336583	2018-07-09 19:00:16 +00:00
Craig Topper	8a8d72794f	[X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma llvm-svn: 336507	2018-07-08 01:10:47 +00:00
Craig Topper	f89f62a680	[X86] When creating a select for scalar masked sqrt and div builtins make sure we optimize the all ones mask case. This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases. Factor the code into a helper function to make this easy. llvm-svn: 336472	2018-07-06 22:46:52 +00:00
Craig Topper	be4c2933a2	[X86] Implement _builtin_ia32_vfmaddss and _builtin_ia32_vfmaddsd with native IR using llvm.fma intrinsic. This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns. llvm-svn: 336417	2018-07-06 07:14:47 +00:00
Craig Topper	284c5f342c	[X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission. Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles. While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle. llvm-svn: 336388	2018-07-05 20:38:31 +00:00
Gabor Buella	9679eb6527	[X86] Fix some vector cmp builtins - TRUE/FALSE predicates This patch removes on optimization used with the TRUE/FALSE predicates, as was suggested in https://reviews.llvm.org/D45616 for r335339. The optimization was buggy, since r335339 used it also for *_mask builtins, without actually applying the mask -- the mask argument was just ignored. Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D48715 llvm-svn: 336355	2018-07-05 14:26:56 +00:00
Craig Topper	8bf793fb35	[X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead. llvm-svn: 335945	2018-06-29 05:43:33 +00:00
Craig Topper	851f363691	[X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm. llvm-svn: 335745	2018-06-27 15:57:57 +00:00
Ivan A. Kosarev	a9f484ac4a	[NEON] Support vldNq intrinsics in AArch32 (Clang part) This patch reworks the support for dup NEON intrinsics as described in D48439. Differential Revision: https://reviews.llvm.org/D48440 llvm-svn: 335734	2018-06-27 13:58:43 +00:00
Craig Topper	4ef61aecbd	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. Additional IR is emitted to convert between scalar and vXi1 type to match the expected software inferface for the builtin that clang exposes. llvm-svn: 335564	2018-06-26 00:44:02 +00:00
Gabor Buella	716863c820	[X86] Lower _mm[256\|512]_cmp[.]_mask intrinsics to native llvm IR Summary: Lowering some vector comparision builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins. Affected AVX512 builtins: __builtin_ia32_cmpps128_mask __builtin_ia32_cmpps256_mask __builtin_ia32_cmpps512_mask __builtin_ia32_cmppd128_mask __builtin_ia32_cmppd256_mask __builtin_ia32_cmppd512_mask Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: craig.topper, spatel, efriedma Differential Revision: https://reviews.llvm.org/D45616 llvm-svn: 335339	2018-06-22 11:59:16 +00:00
Craig Topper	342b095689	[X86] Update handling in CGBuiltin to be tolerant of out of range immediates. D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled. This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled. llvm-svn: 335308	2018-06-21 23:39:47 +00:00
Tomasz Krupa	83ba6fa98d	Fix a bug introduced by rL334850 Summary: All *_sqrt_round_s[s\|d] intrinsics should execute a square root on zeroth element from B (Ops[1]) and insert in to A (Ops[0]), not the other way around. Reviewers: itaraban, craig.topper Reviewed By: craig.topper Subscribers: craig.topper, cfe-commits Differential Revision: https://reviews.llvm.org/D48288 llvm-svn: 334964	2018-06-18 17:57:05 +00:00
Tomasz Krupa	f1792bb3d6	[X86] Lowering sqrt intrinsics to native IR Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, cfe-commits Differential Revision: https://reviews.llvm.org/D41168 llvm-svn: 334850	2018-06-15 18:05:59 +00:00
Luke Geeson	da2b2e8c26	[AArch64] Reverted rC334696 with Clang VCVTA test fix llvm-svn: 334820	2018-06-15 10:10:45 +00:00
Craig Topper	31730ae761	[X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files. The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8. This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic. Fixes the remaining issue from PR37795. llvm-svn: 334773	2018-06-14 22:02:35 +00:00
Tomasz Krupa	82aa42af49	[X86] Lowering Mask Scalar intrinsics to native IR (Clang part) Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR. Reviewers: craig.topper, RKSimon, spatel, sroland Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47979 llvm-svn: 334741	2018-06-14 17:36:23 +00:00
Luke Geeson	bb399f8013	[AArch64] reverting rC334693 due to build failures llvm-svn: 334696	2018-06-14 08:59:33 +00:00
Luke Geeson	010bbbf390	[AArch64] Added support for the vcvta_u16_f16 instrinsic for FP16 Armv8.2-A llvm-svn: 334693	2018-06-14 08:28:56 +00:00
Mandeep Singh Grang	2d28383097	[COFF] Add ARM64 intrinsics: __yield, __wfe, __wfi, __sev, __sevl Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility. Reviewers: mstorsjo, compnerd, javed.absar Reviewed By: mstorsjo Subscribers: kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D48132 llvm-svn: 334639	2018-06-13 18:49:35 +00:00
Craig Topper	201b9dd334	[X86] Fix operand order in the shuffle created for blend builtins. This was broken when the builtin was added in r334249. llvm-svn: 334422	2018-06-11 17:06:01 +00:00
Craig Topper	3cce6a7ed9	[X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins. Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them. Reviewers: RKSimon, delena, spatel, GBuella Reviewed By: RKSimon Subscribers: cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D47693 llvm-svn: 334366	2018-06-10 17:27:05 +00:00
Ivan A. Kosarev	73c76c35a5	[NEON] Support VST1xN intrinsics in AArch32 mode (Clang part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47446 llvm-svn: 334362	2018-06-10 09:28:10 +00:00
Craig Topper	88097d9355	[X86] Add back some masked vector truncate builtins. Custom IRgen a a few others. I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl. By using special buitlins we can emit a select without using the 128/256-bit select builtins. llvm-svn: 334331	2018-06-08 21:50:08 +00:00
Craig Topper	5f50f33806	[X86] Fold masking into subvector extract builtins. I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features. The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported. llvm-svn: 334330	2018-06-08 21:50:07 +00:00
Craig Topper	03f4f04b91	[X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking. llvm-svn: 334311	2018-06-08 18:00:25 +00:00
Craig Topper	422a1bbb84	[X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking. llvm-svn: 334266	2018-06-08 07:18:33 +00:00
Craig Topper	03de166ccd	[X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking. llvm-svn: 334265	2018-06-08 06:13:16 +00:00
Craig Topper	3428beeb2f	[X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking. Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc. llvm-svn: 334261	2018-06-08 03:24:47 +00:00
Craig Topper	acf5601961	[X86] Add builtins for vpermilps/pd instructions to enable target feature checking. llvm-svn: 334256	2018-06-08 00:59:27 +00:00
Craig Topper	7d17d7278b	[X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range. llvm-svn: 334249	2018-06-08 00:00:21 +00:00
Craig Topper	9392136414	[X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking. llvm-svn: 334244	2018-06-07 23:03:08 +00:00
Reid Kleckner	aa46ed9278	[MS] Re-add support for the ARM interlocked bittest intrinscs Adds support for these intrinsics, which are ARM and ARM64 only: _interlockedbittestandreset_acq _interlockedbittestandreset_rel _interlockedbittestandreset_nf _interlockedbittestandset_acq _interlockedbittestandset_rel _interlockedbittestandset_nf Refactor the bittest intrinsic handling to decompose each intrinsic into its action, its width, and its atomicity. llvm-svn: 334239	2018-06-07 21:39:04 +00:00
Craig Topper	e56819eb69	[X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking. We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature. I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that. llvm-svn: 334237	2018-06-07 21:27:41 +00:00
Craig Topper	d3623155a2	[X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics. We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits. This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously. llvm-svn: 334208	2018-06-07 17:28:03 +00:00
Craig Topper	b92c77d176	[X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins. Summary: We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't. I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim. The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform. I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it. Reviewers: tkrupa, RKSimon, spatel, GBuella Reviewed By: tkrupa Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47724 llvm-svn: 334159	2018-06-07 02:46:02 +00:00
Reid Kleckner	11c99ed05f	[MS][ARM64]: Promote _setjmp to_setjmpex as there is no _setjmp in the ARM64 libvcruntime.lib Factor out the common setjmp call emission code. Based on a patch by Chris January Differential Revision: https://reviews.llvm.org/D47784 llvm-svn: 334112	2018-06-06 18:39:47 +00:00
Reid Kleckner	05df851327	Fix std::tuple errors llvm-svn: 334060	2018-06-06 01:44:10 +00:00
Reid Kleckner	368d52b7e0	Implement bittest intrinsics generically for non-x86 platforms I tested these locally on an x86 machine by disabling the inline asm codepath and confirming that it does the same bitflips as we do with the inline asm. Addresses code review feedback. llvm-svn: 334059	2018-06-06 01:35:08 +00:00
Craig Topper	f3914b74c1	[X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics. Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load. By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count. User code still has the option to use extended vector operations themselves if they need non-constant indexing. llvm-svn: 334057	2018-06-06 00:24:55 +00:00
Craig Topper	6b5b5ce06c	[X86] Implement __builtin_ia32_vec_ext_v2si correctly even though we only use it with an index of 0. This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But its more correct to use the real operand. llvm-svn: 334054	2018-06-05 22:40:03 +00:00
Reid Kleckner	1d9c249db5	Reimplement the bittest intrinsic family as builtins with inline asm We need to implement _interlockedbittestandset as a builtin for windows.h, so we might as well do the whole family. It reduces code duplication anyway. Fixes PR33188, a long standing bug in our bittest implementation encountered by Chakra. llvm-svn: 333978	2018-06-05 01:33:40 +00:00

1 2 3 4 5 ...

940 Commits