clang-p2996

Author	SHA1	Message	Date
Dan Gohman	8887d1faed	[WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify. Move RegStackify after coalescing and teach it to use LiveIntervals instead of depending on SSA form. This avoids a problem where a register in a COPY instruction is stackified and then subsequently coalesced with a register that is not stackified. This also puts it after the scheduler, which allows us to simplify the EXPR_STACK constraint, as we no longer have instructions being reordered after stackification and before coloring. llvm-svn: 256402	2015-12-25 00:31:02 +00:00
Elena Demikhovsky	9e225a2f52	AVX-512: Kreg set 0/1 optimization The patterns that set a mask register to 0/1 KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn are replaced with KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization. KNL does not recognize dependency-breaking idioms for mask registers, so kxnor %k1, %k1, %k2 has a RAW dependence on %k1. Using %k0 as the undef input register is a performance heuristic based on the assumption that %k0 is used less frequently than the other mask registers, since it is not usable as a write mask. Differential Revision: http://reviews.llvm.org/D15739 llvm-svn: 256365	2015-12-24 08:12:22 +00:00
Igor Breger	268f6f53c5	AVX512: VPMOVM2B/W/D/Q intrinsic implementation. Differential Revision: http://reviews.llvm.org//D15747 llvm-svn: 256364	2015-12-24 07:11:53 +00:00
JF Bastien	3e9f10ad3d	WebAssembly: remove 'external' from test Summary: Linker testing was sad at seeing an unresolved external symbol. For now don't do that: it's valid but we're not playing with multi-file linking yet, and the LLVM tests are used as hacky sanity tests for single-file linking (the GCC torture tests are much better for this purpose). Another solution would be to use '.extern' to make the intent explicit (don't simple-file link this, there's an unresolved symbol), some assemblers use '.extern' while others ignore it, so we wouldn't really be inventing anything new. Reviewers: sunfish, kripken Subscribers: jfb, llvm-commits, dschuff Differential Revision: http://reviews.llvm.org/D15753 llvm-svn: 256353	2015-12-23 23:56:13 +00:00
Philip Reames	cb0f947a2a	[Statepoints] Use Indirect operands for spill slots Teach the statepoint lowering code to emit Indirect stackmap entries for spill inserted by StatepointLowering (i.e. SelectionDAG), but Direct stackmap entries for in-IR allocas which represent manual stack slots. This is what the docs call for (http://llvm.org/docs/StackMaps.html#stack-map-format), but we've been emitting both as Direct. This was pointed out recently on the mailing list as a bug. It also blocks http://reviews.llvm.org/D15632 which extends the lowering to handle vector-of-pointers since only Indirect references can encode a variable sized slot. To implement this, I introduced a new flag on the StackObject class used to maintain information about stack slots. I originally considered (and prototyped in http://reviews.llvm.org/D15632), the idea of using the existing isSpillSlot flag, but ended up deciding that was a bit too risky and that the cost of adding a new flag was low. Having the new flag will also allow us - in the future - to emit better comments in verbose assembly which indicate where a particular stack spill around a call comes from. (deopt, gc, regalloc). Differential Revision: http://reviews.llvm.org/D15759 llvm-svn: 256352	2015-12-23 23:44:28 +00:00
Simon Pilgrim	17377bdd45	[X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151. As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops. Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well. Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions). Differential Revision: http://reviews.llvm.org/D15477 llvm-svn: 256332	2015-12-23 13:10:07 +00:00
Igor Breger	7b46b4e798	AVX512BW: Enable packed word shift for 512bit vector. Enable lowering scalar immediate shift v64i8. Fix predicate for AVX1/2 shifts. Differential Revision: http://reviews.llvm.org/D15713 llvm-svn: 256324	2015-12-23 08:06:50 +00:00
David Majnemer	c640f863e0	[WinEH] Don't visit the same catchswitch twice We visited the same catchswitch twice because it was both the child of another funclet and the predecessor of a cleanuppad. Instead, change the numbering algorithm to only recurse if the unwind destination of the inner funclet agrees with the unwind destination of the catchswitch. This fixes PR25926. llvm-svn: 256317	2015-12-23 03:59:04 +00:00
Changpeng Fang	b41574a961	AMDGPU/SI: Use flat for global load/store when targeting HSA Summary: For some reason executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256282	2015-12-22 20:55:23 +00:00
Rafael Espindola	4b0d24c00a	Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA" This reverts commit r256273. It broke CodeGen/AMDGPU/llvm.dbg.value.ll llvm-svn: 256275	2015-12-22 19:46:44 +00:00
Changpeng Fang	9b8a9be058	AMDGPU/SI: Use flat for global load/store when targeting HSA Summary: For some reason executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256273	2015-12-22 19:32:28 +00:00
Jun Bum Lim	6755c3bc5f	[AArch64] Promote loads from stores This is a recommit of r256004 which was reverted in r256160. The issue was the incorrect promotion for half and byte loads transformed into mov instructions. This fix will replace half and byte type loads only with bit field extracts. Original commit message: This change promotes load instructions which directly read from stores by replacing them with mov instructions. If the store is wider than the load, the load will be replaced with a bitfield extract. For example : STRWui %W1, %X0, 1 %W0 = LDRHHui %X0, 3 becomes STRWui %W1, %X0, 1 %W0 = UBFMWri %W1, 16, 31 llvm-svn: 256249	2015-12-22 16:36:16 +00:00
Asaf Badouh	13ffa4bf7c	[X86][AVX512] Add rcp14 and rsqrt14 intrinsics Differential Revision: http://reviews.llvm.org/D15414 llvm-svn: 256237	2015-12-22 11:40:04 +00:00
Keno Fischer	4eccf11373	[ASMPrinter] Fix missing handling of DW_OP_bit_piece In r256077, I added printing for DIExpressions in DEBUG_VALUE comments, but neglected to handle DW_OP_bit_piece operands. Thanks to Mikael Holmen and Joerg Sonnenberger for spotting this. llvm-svn: 256236	2015-12-22 07:14:50 +00:00
David Majnemer	ff1d084aa2	[MC] Don't use the architecture to govern which object file format to use InitMCObjectFileInfo was trying to override the triple in awkward ways. For example, a triple specifying COFF but not Windows was forced as ELF. This makes it easy for internal invariants to get violated, such as those which triggered PR25912. This fixes PR25912. llvm-svn: 256226	2015-12-22 01:39:04 +00:00
Jun Bum Lim	a23e5f7516	Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst This is recommit of r256028 with minor fixes in unittests: CodeGen/Mips/eh.ll CodeGen/Mips/insn-zero-size-bb.ll Original commit message: When identifying blocks post-dominated by an unreachable-terminated block in BranchProbabilityInfo, consider only the edge to the normal destination block if the terminator is InvokeInst and let calcInvokeHeuristics() decide edge weights for the InvokeInst. llvm-svn: 256202	2015-12-21 22:00:51 +00:00
Cong Hou	8df93ce455	[X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. This patch transforms truncation between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in lowering phase because after type legalization, the original truncation will be turned into a BUILD_VECTOR with each element that is extracted from a vector and then truncated, and from them it is difficult to do this optimization. This greatly improves the performance of truncations on some specific types. Cost table is updated accordingly. Differential revision: http://reviews.llvm.org/D14588 llvm-svn: 256194	2015-12-21 20:42:43 +00:00
Adrian Prantl	6b44541015	Convert the CodeGen/ARM/sched-it-debug-nodes.ll testcase from IR -> MIR. NFC PR24563 llvm-svn: 256187	2015-12-21 19:44:42 +00:00
Adrian Prantl	5d9acc2443	Teach ARMLoadStoreOptimizer to ignore DBG_VALUE instructions when merging instructions. As noted in PR24563. rdar://problem/23963293 llvm-svn: 256183	2015-12-21 19:25:03 +00:00
Matthew Simpson	11c4de6054	[AArch64] Add additional extract-extend patterns for smov This patch adds to the target description two additional patterns for matching extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and v8i16-to-i64 cases. The existing patterns miss these cases because the extracted elements must first be legalized to i32, resulting in any_extend nodes. This was originally implemented as a DAG combine (r255895), but was reverted due to failing out-of-tree tests. llvm-svn: 256176	2015-12-21 18:31:25 +00:00
Jun Bum Lim	4bb171c8da	Revert "[AArch64] Promote loads from stores" This reverts commit r256004 due to a failure in cortex-a53. llvm-svn: 256160	2015-12-21 15:36:49 +00:00
Chad Rosier	d016574df8	[AArch64] Enable PostRAScheduler for AArch64 generic build. Disable post-ra scheduler for perturbed tests to appease the bots and to preserve the history of the tests. http://reviews.llvm.org/D15652 llvm-svn: 256158	2015-12-21 14:43:45 +00:00
Igor Breger	44b60a3687	AVX512BW: Enable AND/OR/XOR vector byte/word packed operations by promoting to qword, which is natively supported. llvm-svn: 256157	2015-12-21 14:40:36 +00:00
Amjad Aboud	60b5e1b6c0	Implemented Support of IA interrupt and exception handlers: http://lists.llvm.org/pipermail/cfe-dev/2015-September/045171.html Differential Revision: http://reviews.llvm.org/D15567 llvm-svn: 256155	2015-12-21 14:07:14 +00:00
Craig Topper	074e845260	[X86] Prevent constant hoisting for a couple compare immediates that the selection DAG knows how to optimize into a shift. This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code. Unfortunately, since getImmCost can't see the icmp predicate we can't tell if we're only catching these specific cases. llvm-svn: 256126	2015-12-20 18:41:54 +00:00
Weiming Zhao	613c6862fa	Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions for Thumb2 Summary: r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb. r250697: http://reviews.llvm.org/rL250697 Reviewers: rmaprath Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D15653 llvm-svn: 256115	2015-12-20 06:41:44 +00:00
JF Bastien	374ea4bda5	WebAssembly: add vtable test The test will mainly be useful to check that the .s file assembles and relocates properly because vtables reference functions in their data section. llvm-svn: 256102	2015-12-19 18:55:18 +00:00
Keno Fischer	f7346e0a6c	Hopefully fix debug-info-blocks.ll test on win32 bot llc_dwarf adds an mtriple, which forces this to use COFF, causing the test to fail. Hopefully using regular llc without the triple will work fine everywhere llvm-svn: 256084	2015-12-19 03:32:23 +00:00
Keno Fischer	00cbf9a69a	Clean up the processing of dbg.value in various places Summary: First up is instcombine, where in the dbg.declare -> dbg.value conversion, the llvm.dbg.value needs to be called on the actual loaded value, rather than the address (since the whole point of this transformation is to be able to get rid of the alloca). Further, now that that's cleaned up, we can remove a hack in the backend, that would add an implicit OP_deref if the argument to dbg.value was an alloca. This stems from before the existence of DIExpression and is no longer necessary since the deref can be expressed explicitly. Now, in order to make sure that the tests pass with this change, we need to correct the printing of DEBUG_VALUE comments to take into account the expression, which wasn't taken into account before. Unfortunately, for both these changes, there were a number of incorrect test cases (mostly the wrong number of DW_OP_derefs, but also a couple where the test itself was broken more badly). aprantl and I have gone through and adjusted these test cases in order to make them pass with these fixes and in some cases to make sure they're actually testing what they are meant to test. Reviewers: aprantl Subscribers: dsanders Differential Revision: http://reviews.llvm.org/D14186 llvm-svn: 256077	2015-12-19 02:02:44 +00:00
Matt Arsenault	2aed6ca1d3	AMDGPU: Switch barrier intrinsics to using convergent noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075	2015-12-19 01:46:41 +00:00
Matt Arsenault	10a509292c	Fix broken type legalization of min/max This was using an anyext when promoting the type when zext/sext is required. llvm-svn: 256074	2015-12-19 01:39:48 +00:00
Nicolai Haehnle	dd58705af6	AMDGPU: fix overlapping copies in copyPhysReg Summary: When copying aggregate registers within the same register class, there may be an overlap between source and destination that forces us to do the copy backwards. Do the simplest possible thing that guarantees the correct order of moves when there are overlaps, and does whatever when there is no overlap. (The last part forces some trivial adjustments to test cases.) Together with r255906, this fixes a VM fault in Unreal Elemental Demo. While at it, change the generation of kill and def flags to something that looks more reasonable. This method is used very late during compilation, so it probably doesn't matter in practice, and to be honest, I don't know if this change is actually correct because the semantics in connection with aggregate registers vs. sub-registers are not clear to me. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15622 llvm-svn: 256072	2015-12-19 01:16:06 +00:00
Rafael Espindola	708a91a103	Revert "Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst" This reverts commit r256028. It broke: LLVM :: CodeGen/Mips/eh.ll LLVM :: CodeGen/Mips/insn-zero-size-bb.ll llvm-svn: 256032	2015-12-18 21:23:32 +00:00
Jun Bum Lim	51a247065e	Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst When identifying blocks post-dominated by an unreachable-terminated block in BranchProbabilityInfo, consider only the edge to the normal destination block if the terminator is InvokeInst and let calcInvokeHeuristics() decide edge weights for the InvokeInst. llvm-svn: 256028	2015-12-18 20:53:47 +00:00
Krzysztof Parzyszek	21dc8bdd9e	[Hexagon] Add PIC support llvm-svn: 256025	2015-12-18 20:19:30 +00:00
Jun Bum Lim	3509d64c24	[AArch64] Promote loads from stores This change promotes load instructions which directly read from stores by replacing them with mov instructions. If the store is wider than the load, the load will be replaced with a bitfield extract. For example : STRWui %W1, %X0, 1 %W0 = LDRHHui %X0, 3 becomes STRWui %W1, %X0, 1 %W0 = UBFMWri %W1, 16, 31 llvm-svn: 256004	2015-12-18 18:08:30 +00:00
Hans Wennborg	a6a2e512cf	[X86] Use push-pop for materializing small constants under 'minsize' Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936	2015-12-17 23:18:39 +00:00
Matthew Simpson	13dddb0799	Revert "[AArch64] Add DAG combine for extract extend pattern" This reverts commit r255895. The patch breaks internal tests. Reverting until a fix is ready. llvm-svn: 255928	2015-12-17 21:29:47 +00:00
Dan Gohman	670a60ed52	[WebAssembly] Switch WebAssemblyMCAsmInfo.h from MCAsmInfo to MCAsmInfoELF. llvm-svn: 255925	2015-12-17 20:50:45 +00:00
Tom Stellard	caaa3aa07c	AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15583 Patch by: Changpeng Fang llvm-svn: 255908	2015-12-17 17:05:09 +00:00
Matthew Simpson	4355e404d5	[AArch64] Add DAG combine for extract extend pattern This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) -> (extract_vector_elt v, i). The combine enables us to better match some SMOV patterns. Differential Revision: http://reviews.llvm.org/D15515 llvm-svn: 255895	2015-12-17 14:30:55 +00:00
Alexey Bataev	7b72b658cc	[X86] Add option for enabling LEA optimization pass, by Andrey Turetsky Add option to enable/disable LEA optimization pass. By default the pass is disabled. Differential Revision: http://reviews.llvm.org/D15573 llvm-svn: 255881	2015-12-17 07:34:39 +00:00
Cong Hou	b9e8d483b5	Fix PR25838. This is a quick fix to PR25838. The issue comes from the restriction that we cannot normalize probabilities containing both known and unknown ones. A patch that removes this restriction is under the review now: http://reviews.llvm.org/D15548 llvm-svn: 255867	2015-12-17 01:29:08 +00:00
Dan Gohman	4172953813	[WebAssembly] Fix legalization of shift operators on large integer types. llvm-svn: 255847	2015-12-16 23:25:51 +00:00
Derek Schuff	8bb5f2927a	[WebAssembly] Implement eliminateCallFramePseudo Summary: Implement eliminateCallFramePseudo to handle ADJCALLSTACKUP/DOWN pseudo-instructions. Add a test calling a vararg function which causes non-0 adjustments. This revealed an issue with RegisterCoalescer wherein it eliminates a COPY from SP32 to a vreg but fails to update the live ranges of EXPR_STACK, causing a machineinstr verifier failure (so this test is commented out). Also add a dynamic alloca test, which causes a callseq_end dag node with a 0 (instead of undef) second argument to be generated. We currently fail to select that, so adjust the ADJCALLSTACKUP tablegen code to handle it. Differential Revision: http://reviews.llvm.org/D15587 llvm-svn: 255844	2015-12-16 23:21:30 +00:00
Eric Christopher	bfba572425	Fix funciton->function typo. llvm-svn: 255841	2015-12-16 23:10:53 +00:00
Manman Ren	cbe4f9417d	CXX_FAST_TLS calling convention: performance improvement for AArch64. The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allocator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. The target independent portion was committed as r255353. rdar://problem/23557469 Differential Revision: http://reviews.llvm.org/D15341 llvm-svn: 255821	2015-12-16 21:04:19 +00:00
Derek Schuff	45cd5a79b2	[WebAssembly] Print an extra local decl when the user stack pointer is used Differential Revision: http://reviews.llvm.org/D15546 llvm-svn: 255815	2015-12-16 20:43:06 +00:00
Dan Gohman	b3aa1ecab0	[WebAssembly] Fix the CFG Stackifier to handle unoptimized branches If a branch both branches to and falls through to the same block, treat it as an explicit branch. llvm-svn: 255803	2015-12-16 19:06:41 +00:00
Dan Gohman	e2831b4e27	[WebAssembly] Use the new offset syntax for memory operands in inline asm. llvm-svn: 255788	2015-12-16 18:14:49 +00:00


14508 Commits