clang-p2996

Author	SHA1	Message	Date
Matt Arsenault	06028dd7be	R600/SI: Fix verifier error with pseudo store instructions. Use i32 instead of specifying SReg_32. When this is the pseudo INDIRECT_BASE_ADDR, this would give a bogus verifier error. llvm-svn: 207770	2014-05-01 16:37:52 +00:00
Bradley Smith	3567cc1b42	[ARM64] Prefer generation of bzero on Darwin only llvm-svn: 207760	2014-05-01 13:11:59 +00:00
Tim Northover	534acbdf73	AArch64/ARM64: print BFM instructions as BFI or BFXIL The canonical form of the BFM instruction is always one of the more explicit extract or insert operations, which makes reading output much easier. llvm-svn: 207752	2014-05-01 12:29:38 +00:00
Weiming Zhao	7f6daf1799	[ARM64] Prevent bit extraction to be adjusted by following shift For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx more difficult. For example: Given %shr = lshr i64 %x, 4 %and = and i64 %shr, 15 %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and %0 = load i64* %arrayidx With current shift folding, it takes 3 instrs to compute base address: lsr x8, x0, #1 and x8, x8, #0x78 add x8, x9, x8 If using ubfx, it only needs 2 instrs: ubfx x8, x0, #4, #4 add x8, x9, x8, lsl #3 This fixes bug 19589 llvm-svn: 207702	2014-04-30 21:07:24 +00:00
Michael Zolotukhin	1f4a960ccf	[X86] Never hoist the shift value of a shift instruction. There is no need to check if we want to hoist the immediate value of an shift instruction. Simply return TCC_Free right away. This change is like r206101, but for X86. rdar://problem/16190769 llvm-svn: 207692	2014-04-30 19:17:32 +00:00
Tim Northover	a8c577e454	ARM64: print fp immediates without using scientific notation. llvm-svn: 207669	2014-04-30 16:13:34 +00:00
Tom Stellard	1bd80725b3	R600/SI: Use VALU instructions for copying i1 values We can't use SALU instructions for this since they ignore the EXEC mask and are always executed. This fixes several OpenCV tests. llvm-svn: 207661	2014-04-30 15:31:33 +00:00
Tom Stellard	0c354f25c9	R600/SI: Teach moveToVALU how to handle some SMRD instructions llvm-svn: 207660	2014-04-30 15:31:29 +00:00
Chad Rosier	864e35db0a	[ARM64][fast-isel] Fast-isel doesn't know how to handle f128. llvm-svn: 207659	2014-04-30 15:29:57 +00:00
Sasa Stankovic	7b061a42b1	[mips] Fix MipsLongBranch pass to work when the offset from the branch to the target cannot be determined accurately. This is the case for NaCl where the sandboxing instructions are added in MC layer, after the MipsLongBranch pass. It is also the case when the code has inline assembly. Instead of calculating offset in the MipsLongBranch pass, use %hi(sym1 - sym2) and %lo(sym1 - sym2) expressions that are resolved during the fixup. This patch also deletes microMIPS test file test/CodeGen/Mips/micromips-long-branch.ll and implements microMIPS CHECKs in a much simpler way in a file test/CodeGen/Mips/longbranch.ll, together with MIPS32 and MIPS64. llvm-svn: 207656	2014-04-30 15:06:25 +00:00
Tim Northover	0ac99404f0	ARM64: print lsr instead of lsrv for variable shifts (etc) The canonical syntax for shifts by a variable amount does not end with 'v', but that syntax should be supported as an alias (presumably for legacy reasons). llvm-svn: 207649	2014-04-30 13:37:07 +00:00
Tim Northover	20ad359b77	AArch64/ARM64: use HS instead of CS & LO instead of CC. On instructions using the NZCV register, a couple of conditions have dual representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and unsigned-lower/carry-clear). The first of these is more descriptive in most circumstances, so we should print it. llvm-svn: 207644	2014-04-30 13:14:03 +00:00
Daniel Sanders	e296a0fce5	[mips][msa] Fix vector insertions where the index is variable Summary: This isn't supported directly so we rotate the vector by the desired number of elements, insert to element zero, then rotate back. The i64 case generates rather poor code on MIPS32. There is an obvious optimisation to be made in future (do both insert.w's inside a shared rotate/unrotate sequence) but for now it's sufficient to select valid code instead of aborting. Depends on D3536 Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3537 llvm-svn: 207640	2014-04-30 12:09:32 +00:00
Tim Northover	970c4a8d35	ARM64: use hex immediates for movz/movk instructions Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to piece together an immediate in 16-bit chunks, hex is probably the most appropriate format. llvm-svn: 207635	2014-04-30 11:19:40 +00:00
Tim Northover	4b2f8a990e	ARM64: hexify printing various immediate operands This is mostly aimed at the NEON logical operations and MOVI/MVNI (since they accept weird shifts which are more naturally understandable in hex notation). Also changes BRK/HINT etc, which is probably a neutral change, but easier than the alternative. llvm-svn: 207634	2014-04-30 11:19:28 +00:00
Tim Northover	cfd6e66544	ARM64: print canonical syntax for add/sub (imm) instructions. Since these instructions only accept a 12-bit immediate, possibly shifted left by 12, the canonical syntax used by the architecture reference manual is "#N {, lsl #12 }". We should accept an immediate that has already been shifted, (e.g. Also, print a comment giving the full addend since it can be helpful. llvm-svn: 207633	2014-04-30 11:19:15 +00:00
James Molloy	7c39df37b2	[ARM64] Ensure arm64_be is dealt with when emitting debug info. This is a partial port of r204816 (cpirker "Elf support for MC-JIT runtime dynamic linker") from AArch64 to ARM64. llvm-svn: 207625	2014-04-30 10:15:35 +00:00
Tim Northover	41cec5c3cb	ARM64: make sure FastISel uses a GPR64 source in 64-bit extensions. llvm-svn: 207620	2014-04-30 09:32:01 +00:00
Saleem Abdulrasool	25947c318b	ARM: support stack probe emission for Windows on ARM This introduces the stack lowering emission of the stack probe function for Windows on ARM. The stack on Windows on ARM is a dynamically paged stack where any page allocation which crosses a page boundary of the following guard page will cause a page fault. This page fault must be handled by the kernel to ensure that the page is faulted in. If this does not occur and a write access any memory beyond that, the page fault will go unserviced, resulting in an abnormal program termination. The watermark for the stack probe appears to be at 4080 bytes (for accommodating the stack guard canaries and stack alignment) when SSP is enabled. Otherwise, the stack probe is emitted on the page size boundary of 4096 bytes. llvm-svn: 207615	2014-04-30 07:05:07 +00:00
Saleem Abdulrasool	f8222631a5	ARM: partially handle 32-bit relocations for WoA IMAGE_REL_ARM_MOV32T relocations require that the movw/movt pair-wise relocation is not split up and reordered. When expanding the mov32imm pseudo-instruction, create a bundle if the machine operand is referencing an address. This helps ensure that the relocatable address load is not reordered by subsequent passes. Unfortunately, this only partially handles the case as the Constant Island Pass occurs after the instructions are unbundled and does not properly handle bundles. That is a more fundamental issue with the pass itself and beyond the scope of this change. llvm-svn: 207608	2014-04-30 04:54:58 +00:00
Reid Kleckner	fb69308568	Implement X86 code generation for musttail Currently, musttail codegen is relying on sibcall optimization, and reporting a fatal error if fails. Sibcall optimization fails when stack arguments need to be modified, which is insufficient for musttail. The logic for moving arguments in memory safely is already implemented for GuaranteedTailCallOpt. This change merely arranges for musttail calls to use it. No functional change for GuaranteedTailCallOpt. Reviewers: espindola Differential Revision: http://reviews.llvm.org/D3493 llvm-svn: 207598	2014-04-29 23:55:41 +00:00
Tom Stellard	919bb6b83f	R600/SI: Custom lower SI_IF and SI_ELSE to avoid machine verifier errors SI_IF and SI_ELSE are terminators which also produce a value. For these instructions ISel always inserts a COPY to move their value to another basic block. This COPY ends up between SI_(IF|ELSE) and the S_BRANCH* instruction at the end of the block. This breaks MachineBasicBlock::getFirstTerminator() and also the machine verifier which assumes that terminators are grouped together at the end of blocks. To solve this we coalesce the copy away right after ISel to make sure there are no instructions in between terminators at the end of blocks. llvm-svn: 207591	2014-04-29 23:12:53 +00:00
Tom Stellard	58ac7440e6	R600/SI: Only select SALU instructions in the entry or exit block SALU instructions ignore control flow, so it is not always safe to use them within branches. This is a partial solution to this problem until we can come up with something better. llvm-svn: 207590	2014-04-29 23:12:48 +00:00
Tom Stellard	676f571999	R600: optimize the UDIVREM 64 algorithm This is a squash of several optimization commits: - calculate DIV_Lo and DIV_Hi separately - use BFE_U32 if we are operating on 32bit values - use precomputed constants instead of shifting in UDIVREM - skip the first 32 iterations of udivrem v2: Check whether BFE is supported before using it Patch by: Jan Vesely Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 207589	2014-04-29 23:12:46 +00:00
Reed Kotler	67077b3032	Add Simple return instruction to Mips fast-isel Reviewers: dsanders Reviewed by: dsanders Differential Revision: http://reviews.llvm.org/D3430 llvm-svn: 207565	2014-04-29 17:57:50 +00:00
Daniel Sanders	6857800b67	[mips][msa] Use CHECK-LABEL in basic_operations*.ll Differential Revision: http://reviews.llvm.org/D3536 llvm-svn: 207529	2014-04-29 14:28:58 +00:00
Daniel Sanders	b3268e71e2	[mips][msa] Fix element extraction where the index is variable. Summary: This isn't supported directly so we splat the vector element and extract the most convenient copy. Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3530 llvm-svn: 207524	2014-04-29 13:31:37 +00:00
Tim Northover	aacce57d61	ARM: fix test after change to indirect symbol emission. llvm-svn: 207519	2014-04-29 10:13:10 +00:00
Tim Northover	9e7782dcf3	X86: emit hidden stubs into a proper non_lazy_symbol_pointer section. rdar://problem/16660411 llvm-svn: 207518	2014-04-29 10:06:10 +00:00
Tim Northover	2372301bcf	ARM: emit hidden stubs into a proper non_lazy_symbol_pointer section. rdar://problem/16660411 llvm-svn: 207517	2014-04-29 10:06:05 +00:00
Benjamin Kramer	e1ab3f062e	AArch64: Mark vector long multiplication as expand. There are no patterns for this. This was already fixed for ARM64 but I forgot to apply it to AArch64 too. llvm-svn: 207515	2014-04-29 09:37:54 +00:00
Elena Demikhovsky	299cf511c4	AVX-512: optimized a shuffle pattern to VINSERTI64x4. Added intrinsics for VPERMT2PS/PD/D/Q instructions. llvm-svn: 207513	2014-04-29 09:09:15 +00:00
Hao Liu	6db3410071	[ARM64]Fix a bug about incorrect operand order in an EXT instruction, which is introduced by r207485. llvm-svn: 207500	2014-04-29 07:51:19 +00:00
Hao Liu	cf37110920	[ARM64]Fix a bug when lowering shuffle vector to an EXT instruction. E.g. Mask like <-1, -1, 1, ...> will generate incorrect EXT index. llvm-svn: 207485	2014-04-29 01:50:36 +00:00
Chad Rosier	0def8e2652	[ARM64] Fix an issue where we were always assuming a copy was coming from a D subregister. llvm-svn: 207423	2014-04-28 16:21:50 +00:00
Hao Liu	9a342778b9	[ARM64]Fix a bug cannot select UQSHL/SQSHL with constant i64 shift amount. llvm-svn: 207399	2014-04-28 07:34:27 +00:00
Benjamin Kramer	3693e77cb4	X86: If SSE4.1 is missing lower SMUL_LOHI of v4i32 to pmuludq and fix up the high parts. This is more expensive than pmuldq but still cheaper than scalarizing the whole thing. llvm-svn: 207370	2014-04-27 18:47:41 +00:00
Benjamin Kramer	99767ddf0b	Update test not to check for a shuffle of an all-zero vector. llvm-svn: 207354	2014-04-27 11:54:45 +00:00
Benjamin Kramer	6bca8ef667	SelectionDAG: Aggressively fold shuffles of constant splats. llvm-svn: 207352	2014-04-27 11:41:06 +00:00
Benjamin Kramer	da4841b3a9	DAGCombiner: Simplify code a bit, make more transforms work with vectors. llvm-svn: 207338	2014-04-26 23:09:49 +00:00
Benjamin Kramer	6d2dff61f9	X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. llvm-svn: 207318	2014-04-26 14:12:19 +00:00
Benjamin Kramer	c9827ab103	X86: Add patterns for MULHU/MULHS of v8i16 and v16i16. This gets us pretty code for divs of i16 vectors. Turn the existing intrinsics into the corresponding nodes. llvm-svn: 207317	2014-04-26 13:01:03 +00:00
Benjamin Kramer	4dae598bc8	DAGCombiner: Turn divs of vector splats into vectorized multiplications. Otherwise the legalizer would just scalarize everything. Support for mulhi in the targets isn't that great yet so on most targets we get exactly the same scalarized output. Add a test for x86 vector udiv. I had to disable the mulhi nodes on ARM because there aren't any patterns for it. As far as I know ARM has instructions for getting the high part of a multiply so this should be fixed. llvm-svn: 207315	2014-04-26 12:06:28 +00:00
Michael Zolotukhin	1a97a7bcbf	Revert r206749 till a final decision about the intrinsics is made. llvm-svn: 207313	2014-04-26 09:56:41 +00:00
Juergen Ributzka	a6bda8bae2	[DAG] During DAG legalization keep opaque constants even after expanding. The included test case would return the incorrect results, because the expansion of an shift with a constant shift amount of 0 would generate undefined behavior. This is because ExpandShiftByConstant assumes that all shifts by constants with a value of 0 have already been optimized away. This doesn't happen for opaque constants and usually this isn't a problem, because opaque constants won't take this code path - they are not supposed to. In the case that the opaque constant has to be expanded by the legalizer, the legalizer would drop the opaque flag. In this case we hit the limitations of ExpandShiftByConstant and create incorrect code. This commit fixes the legalizer by not dropping the opaque flag when expanding opaque constants and adding an assertion to ExpandShiftByConstant to catch this not supported case in the future. This fixes <rdar://problem/16718472> llvm-svn: 207304	2014-04-26 02:58:04 +00:00
Quentin Colombet	ea18933d97	[X86] Implement TargetLowering::getScalingFactorCost hook. Scaling factors are not free on X86 because every "complex" addressing mode breaks the related instruction into 2 allocations instead of 1. <rdar://problem/16730541> llvm-svn: 207301	2014-04-26 01:11:26 +00:00
Filipe Cabecinhas	d71f110fe9	Appease the almighty buildbots. llvm-svn: 207295	2014-04-26 00:02:37 +00:00
Filipe Cabecinhas	363b570d2a	Optimization for certain shufflevector by using insertps. Summary: If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower certain shufflevectors to an insertps instruction: When most of the shufflevector result's elements come from one vector (and keep their index), and one element comes from another vector or a memory operand. Added tests for insertps optimizations on shufflevector. Added support and tests for v4i32 vector optimization. Reviewers: nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3475 llvm-svn: 207291	2014-04-25 23:51:17 +00:00
Saleem Abdulrasool	99f0d458c3	ARM: remove @llvm.arm.sevl This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic which provides a generic, extensible manner for adding hint instructions. This functionality can now be represented as @llvm.arm.hint(i32 5). llvm-svn: 207246	2014-04-25 17:51:25 +00:00
Saleem Abdulrasool	7e7c2f9ca6	ARM: provide a new generic hint intrinsic Introduce the llvm.arm.hint(i32) intrinsic that can be used to inject hints into the instruction stream. This is particularly useful for generating IR from a compiler where the user may inject an intrinsic (e.g. __yield). These are then pattern substituted into the correct instruction which already existed. llvm-svn: 207242	2014-04-25 17:24:24 +00:00

9673 Commits