clang-p2996

Author	SHA1	Message	Date
Sam Parker	aaec3c6260	[ARM] Allow truncs as sources in ARM CGP We previously only allowed truncs as sinks, but now allow them as sources too. We do this by checking that the result type is the narrow type that we're trying to optimise for. Differential Revision: https://reviews.llvm.org/D51978 llvm-svn: 342141	2018-09-13 15:14:12 +00:00
Sam Parker	96f77f142b	[ARM] Fix FixConst for ARMCodeGenPrepare Part of FixConsts wrongly assumes either a 8- or 16-bit constant which can result in the wrong constants being generated during promotion. Differential Revision: https://reviews.llvm.org/D52032 llvm-svn: 342140	2018-09-13 14:48:10 +00:00
Jonas Devlieghere	64c901d2b1	[MC/Dwarf] Unclamp DWARF linetables format on Darwin. In r319995, we fixed the line table format to version 2 on Darwin because dsymutil didn't yet understand the new format which caused test failures for the LLDB bots. This has been resolved in the meantime so there's no reason to keep this limitation. rdar://problem/35968332 llvm-svn: 342136	2018-09-13 13:13:50 +00:00
Matt Arsenault	ff987ac6ea	AMDGPU: Fix not preserving alignment in call setups If an argument was passed on the stack, this was using the default alignment. I'm not sure there's an observable change from this. This was observable due to bugs in expansion of unaligned loads and stores, but since that is fixed I don't think this matters much. llvm-svn: 342133	2018-09-13 12:14:31 +00:00
Matt Arsenault	842cda6312	DAG: Fix expansion of unaligned FP loads and stores This was trying to scalarize a scalar FP type, resulting in an assert. Fixes unaligned f64 stack stores for AMDGPU. llvm-svn: 342132	2018-09-13 12:14:23 +00:00
Matt Arsenault	9de2fb58fa	AMDGPU: Fix some outdated datalayouts in tests llvm-svn: 342131	2018-09-13 11:56:28 +00:00
Tim Northover	c15d47bb01	ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4. The Technical Reference Manuals for these two CPUs state that branching to an unaligned 32-bit instruction incurs an extra pipeline reload penalty. That's bad. This also enables the optimization at -Os since it costs on average one byte per loop in return for 1 cycle per iteration, which is pretty good going. llvm-svn: 342127	2018-09-13 10:28:05 +00:00
Alexander Timofeev	2fb44808b1	[AMDGPU] Preliminary patch for divergence driven instruction selection. Load offset inlining pattern changed. Differential revision: https://reviews.llvm.org/D51975 Reviewers: rampitec llvm-svn: 342115	2018-09-13 06:34:56 +00:00
Craig Topper	f107123a88	[X86] Type legalize v2i32 div/rem by scalarizing rather than promoting Summary: Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall. This patch switches type legalization to do the scalarizing itself using i32. It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support. Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalarizing to 4 scalar divides. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51325 llvm-svn: 342114	2018-09-13 06:13:37 +00:00
Krzysztof Parzyszek	a6d4fc0e29	[Hexagon] Use shuffles when lowering "gather" shufflevectors Shufflevector instructions in LLVM IR that extract a subset of elements of a longer input into a shorter vector can be done using VECTOR_SHUFFLEs. This will avoid expanding them into costly extracts and inserts. llvm-svn: 342091	2018-09-12 22:14:52 +00:00
Heejin Ahn	300f42fbce	[WebAssembly] Make tied inline asm operands work again Summary: rL341389 broke code with tied register operands in inline assembly. For example, `asm("" : "=r"(var) : "0"(var));` The code above specifies the input operand to be in the same register as the output operand, tying the two registers. This patch makes this kind of code work again. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, eraman, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51991 llvm-svn: 342084	2018-09-12 21:34:39 +00:00
Michael Berg	22a53cbc7f	Guard FMF context by excluding some FP operators from FPMathOperator Summary: Some FPMathOperators succeed and then retrieve FMF context when they never have it; we should omit these cases to keep from removing FMF context. For instance, when we visit some FPMathOperator mapped Instructions which never have FMF flags and a Node was associated which does have FMF flags, that Node today will have all its flags cleared via the intersect operation. With this change, we exclude associating Nodes that never have FPMathOperator status under FMF. Reviewers: spatel, wristow, arsenm, hfinkel, aemerson Reviewed By: spatel Subscribers: llvm-commits, wdng Differential Revision: https://reviews.llvm.org/D51145 llvm-svn: 342081	2018-09-12 21:09:59 +00:00
Konstantin Zhuravlyov	71e43ee47d	AMDGPU: Re-apply r341982 after fixing the layering issue Move isa version determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). llvm-svn: 342069	2018-09-12 18:50:47 +00:00
Thomas Lively	ebd4c906d8	[WebAssembly] SIMD comparisons Summary: Match the ordering semantics of non-vector comparisons. For floating point comparisons that do not correspond to instructions, the tests check that some vector comparison instruction was emitted but do not care about the full implementation. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D51765 llvm-svn: 342064	2018-09-12 17:56:00 +00:00
Diogo N. Sampaio	01b916e188	[ARM] Tighten f64<->f16 conversion requirements Fix missing Requires fields. Patch by Bernard Ogden (bogden) Reviewers: SjoerdMeijer, javed.absar, t.p.northover Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D51631 llvm-svn: 342061	2018-09-12 16:24:43 +00:00
Craig Topper	2262613532	[X86] Remove isel patterns for ADCX instruction There's no advantage to this instruction unless you need to avoid touching other flag bits. Its encoding is longer, it can't fold an immediate, and it doesn't write all the flags. I don't think gcc will generate this instruction either. Fixes PR38852. Differential Revision: https://reviews.llvm.org/D51754 llvm-svn: 342059	2018-09-12 15:47:34 +00:00
Sander de Smalen	2d77e788f2	[AArch64] Implement aarch64_vector_pcs codegen support. This patch adds codegen support for the saving/restoring V8-V23 for functions specified with the aarch64_vector_pcs calling convention attribute, as added in patch D51477. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D51479 llvm-svn: 342049	2018-09-12 12:10:22 +00:00
Sam Parker	a023c7a9cb	[ARM] Exchange MAC operands in ARMParallelDSP SMLAD and SMLALD instructions also come in the form of SMLADX and SMLALDX which perform an exchange on their second operand. To support this, more of the loads in the MAC candidates are compared for sequential access and a boolean value has been added to BinOpChain. AddMACCandiate has been refactored into a small pattern matching state machine to reduce the amount of duplicated code, but also to enable the matching to be more flexible. CreateParallelMACPairs now iterates through all the candidates to find parallel ones. Differential Revision: https://reviews.llvm.org/D51424 llvm-svn: 342033	2018-09-12 09:17:44 +00:00
Sam Parker	569b24549e	[ARM] Allow bitcasts in ARMCodeGenPrepare Allow bitcasts in the use-def chains, treating them as sources. Differential Revision: https://reviews.llvm.org/D50758 llvm-svn: 342032	2018-09-12 09:11:48 +00:00
Ilya Biryukov	95066496d0	Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser." This reverts commit r341982. The change introduced a layering violation. Reverting to unbreak our integrate. llvm-svn: 342023	2018-09-12 07:05:30 +00:00
Craig Topper	dc32e91bc6	[X86] Teach X86SelectionDAGInfo::EmitTargetCodeForMemcpy about GNUX32 Summary: In GNUX32, is64BitMode returns true, but pointers are 32 bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match. Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger. Reviewers: rnk, efriedma, MatzeB, echristo Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51893 llvm-svn: 342015	2018-09-12 01:57:22 +00:00
Jessica Paquette	2386eab360	[MachineOutliner] Add codegen size remarks to the MachineOutliner Since the outliner is a module pass, it doesn't get codegen size remarks like the other codegen passes do. This adds size remarks to the outliner. This is kind of a workaround, so it's peppered with FIXMEs; size remarks really ought to not ever be handled by the pass itself. However, since the outliner is the only "MachineModulePass", this works for now. Since the entire purpose of the MachineOutliner is to produce code size savings, it really ought to be included in codegen size remarks. If we ever go ahead and make a MachineModulePass (say, something similar to MachineFunctionPass), then all of this ought to be moved there. llvm-svn: 342009	2018-09-11 23:05:34 +00:00
Michael Berg	c72a7259be	add IR flags to MI Summary: Initial support for nsw, nuw and exact flags in MI Reviewers: spatel, hfinkel, wristow Reviewed By: spatel Subscribers: nlopes Differential Revision: https://reviews.llvm.org/D51738 llvm-svn: 341996	2018-09-11 21:35:32 +00:00
Konstantin Zhuravlyov	941615e4c8	AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). Differential Revision: https://reviews.llvm.org/D51890 llvm-svn: 341982	2018-09-11 18:56:51 +00:00
Craig Topper	8238580aae	[X86] Prefer unpckhpd over movhlps in isel for fake unary cases In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order. This patch removes the hack. This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turn it into unpckhpd when it's an advantage. Differential Revision: https://reviews.llvm.org/D49499 llvm-svn: 341973	2018-09-11 17:57:27 +00:00
Craig Topper	cc9efaffad	[X86] Teach X86FastISel::X86SelectRet to use EAX for the sret pointer in GNUX32 GNUX32 uses 32-bit pointers despite is64BitMode being true. So we should use EAX to return the value. Fixes one of the failures from PR38865. Differential Revision: https://reviews.llvm.org/D51940 llvm-svn: 341972	2018-09-11 17:57:23 +00:00
Josh Stone	f446facab0	[GlobalISel] Lower dbg.declare into indirect DBG_VALUE Summary: D31439 changed the semantics of dbg.declare to take the address of a variable as the first argument, making it indirect. It specifically updated FastISel for this change here: https://reviews.llvm.org/D31439#change-WVArzi177jPl GlobalISel needs to follow suit, or else it will be missing a level of indirection in the generated debuginfo. This problem was seen in a Rust debuginfo test on aarch64, since GlobalISel is used at -O0 for aarch64. https://github.com/rust-lang/rust/issues/49807 https://bugzilla.redhat.com/show_bug.cgi?id=1611597 https://bugzilla.redhat.com/show_bug.cgi?id=1625768 Reviewers: dblaikie, aprantl, t.p.northover, javed.absar, rnk Reviewed By: rnk Subscribers: #debug-info, rovka, kristof.beyls, JDevlieghere, llvm-commits, tstellar Differential Revision: https://reviews.llvm.org/D51749 llvm-svn: 341969	2018-09-11 17:52:01 +00:00
Roman Lebedev	baf2628043	[DagCombine][NFC] Some more tests for X % C == 0 (UREM case) transform For https://reviews.llvm.org/D50222 Patch by: hermord (Dmytro Shynkevych)! llvm-svn: 341953	2018-09-11 15:34:26 +00:00
Simon Atanasyan	16c2311c59	[MIPS] Fix illegal type assert in single-float mode An fp_to_sint node would be incorrectly lowered to a TruncIntFP node in single-float mode. This would trigger an "Unexpected illegal type!" assert. Patch by Dan Ravensloft. Differential revision: https://reviews.llvm.org/D51810 llvm-svn: 341952	2018-09-11 15:32:47 +00:00
Roman Lebedev	de9d787131	[Hexagon] [Test] Remove undef and infinite loop from test Summary: The undef and the infinite loop at the end cause this test to be translated unpredictably. In particular, the checked-for `mpy` disappears under certain legal optimizations (e.g. the one in D50222). Since the use of these constructs is not relevant to the behavior tested, according to the header comment, this change, suggested by @kparzysz, eliminates them. Was initially committed in r341046, but was reverted. Patch by: hermord (Dmytro Shynkevych)! Reviewers: kparzysz Reviewed By: kparzysz Subscribers: lebedev.ri, llvm-commits, kparzysz Differential Revision: https://reviews.llvm.org/D50944 llvm-svn: 341943	2018-09-11 14:06:14 +00:00
Sam Parker	01db2983cd	[ARM] Add smlald support in ARMParallelDSP Search from i64 reducing phis, as well as i32, to allow the generation of smlald instructions. Differential Revision: https://reviews.llvm.org/D51101 llvm-svn: 341941	2018-09-11 14:01:22 +00:00
Sanjay Patel	e368f46788	[AArch64] test codegen for unsigned saturated add; NFC This is identical to the tests added for x86 at rL341845. A semi-generic DAGCombine should improve things universally. llvm-svn: 341935	2018-09-11 13:21:28 +00:00
Alexander Timofeev	db7ee7660a	[AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928	2018-09-11 11:56:50 +00:00
Simon Atanasyan	32d8d1bf04	[mips] Add a pattern for 64-bit GPR variant of the `rdhwr` instruction MIPS ISAs start to support third operand for the `rdhwr` instruction starting from Revision 6. But LLVM generates assembler code with three-operands version of this instruction on any MIPS64 ISA. The third operand is always zero, so in case of direct code generation we get correct code. This patch fixes the bug by adding an instruction alias. The same alias already exists for 32-bit ISA. Ideally, we also need to reject three-operands version of the `rdhwr` instruction in an assembler code if ISA revision is less than 6. That is a task for a separate patch. This fixes PR38861 (https://bugs.llvm.org/show_bug.cgi?id=38861) Differential revision: https://reviews.llvm.org/D51773 llvm-svn: 341919	2018-09-11 09:57:25 +00:00
Craig Topper	844f035e1e	[X86] In combineMOVMSK, look through int->fp bitcasts before calling SimplifyDemandedBits. MOVMSKPS and MOVMSKPD both take FP types, but likely the operations before it are on integer types with just an int->fp bitcast between them. If the bitcast isn't used by anything else and doesn't change the element width we can look through it to simplify the integer ops. llvm-svn: 341915	2018-09-11 08:20:02 +00:00
Craig Topper	85210311ba	[X86] Add test cases inspired by PR38840. These are test cases inspired by sequences like below for extracting the same bit from every vector element and checking for all zeros/ones. define i1 @and256_x8(<8 x i32>) { %a = trunc <8 x i32> %0 to <8 x i1> %b = bitcast <8 x i1> %a to i8 %d = icmp eq i8 %b, -1 ret i1 %d } This is what the above looks like after InstCombine. define i1 @and256_x8_opt(<8 x i32>) { %2 = and <8 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1> %a = icmp ne <8 x i32> %2, zeroinitializer %b = bitcast <8 x i1> %a to i8 %d = icmp eq i8 %b, -1 ret i1 %d } llvm-svn: 341908	2018-09-11 07:23:29 +00:00
Matt Arsenault	d0cf1b26d4	AMDGPU: Fix r600 test llvm-svn: 341898	2018-09-11 04:39:16 +00:00
Matt Arsenault	99c780159d	AMDGPU: Don't error on out of bounds address spaces We should never abort on valid IR. The most reasonable interpretation of an arbitrary address space pointer is probably some kind of special subset of global memory. llvm-svn: 341894	2018-09-11 04:00:41 +00:00
Craig Topper	07889079fa	[X86] Explicitly enable aes in aes-schedule.ll to fix failures after r341861. llvm-svn: 341868	2018-09-10 21:49:01 +00:00
Sanjay Patel	7feb3ed78c	[x86] test codegen for unsigned saturated add; NFC All of the ISA holes are going to make this difficult, but we can't canonicalize the IR and try to solve PR14613 until we have backend support to get this right. https://bugs.llvm.org/show_bug.cgi?id=14613 https://rise4fun.com/Alive/Guv https://rise4fun.com/Alive/AADG llvm-svn: 341845	2018-09-10 17:40:15 +00:00
Alexander Timofeev	20cbe6f319	[AMDGPU] Preliminary patch for divergence driven instruction selection. Inline immediate move to V_MADAK_F32. Differential revision: https://reviews.llvm.org/D51586 Reviewer: rampitec llvm-svn: 341843	2018-09-10 16:42:49 +00:00
Petar Jovanovic	ce4dd0ae38	[MIPS GlobalISel] Select icmp Select 32bit integer compare instructions for MIPS32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D51489 llvm-svn: 341840	2018-09-10 15:56:52 +00:00
Matt Arsenault	7f6dc597d3	AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bit This will require something to cast. Before this would eliminate the cast, which would result in copies of $noreg. llvm-svn: 341803	2018-09-10 11:59:27 +00:00
Matt Arsenault	57b5966dad	DAG: Handle odd vector sizes in calling conv splitting This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces. Fixes not splitting v3i16/v3f16 into two registers for AMDGPU. This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets. llvm-svn: 341801	2018-09-10 11:49:23 +00:00
Carl Ritson	f898edd117	[AMDGPU] Prevent sequences of non-instructions disrupting GCNHazardRecognizer wait state counting Summary: This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted. Reviewers: nhaehnle, arsenm Reviewed By: arsenm Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51726 Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b llvm-svn: 341798	2018-09-10 10:14:48 +00:00
Matt Arsenault	72d27f5525	AMDGPU: Fix tests using old number for constant address space llvm-svn: 341770	2018-09-10 02:54:25 +00:00
Matt Arsenault	d77fcc2a92	AMDGPU: Use GOT PSV since it has an address space now llvm-svn: 341768	2018-09-10 02:23:39 +00:00
Matt Arsenault	b998674610	AMDGPU: Don't abort on unknown addrspace argument llvm-svn: 341767	2018-09-10 02:23:30 +00:00
Craig Topper	3823516103	[X86] Custom type legalize (v2i32 (fp_to_uint v2f64))) without avx512vl by widening to v4i32 and v4f64 instead of v8i32 and v8f64. Make it aware of x86-experimental-vector-widening-legalization We have isel patterns for v4i32/v4f64 that artificially widen to v8i32/v8f64 so just use that. If x86-experimental-vector-widening-legalization is enabled, we don't need any custom legalization and can just return. I've modified the test RUN lines to cover this case. llvm-svn: 341765	2018-09-09 20:36:36 +00:00
Sanjay Patel	6ebf218e4c	[SelectionDAG] enhance vector demanded elements to look at a vector select condition operand This is the DAG equivalent of D51433. If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition. The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit vectors because we don't need those anyway. I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed to be running there? Differential Revision: https://reviews.llvm.org/D51696 llvm-svn: 341762	2018-09-09 14:13:22 +00:00

25829 Commits