clang-p2996

Author	SHA1	Message	Date
Tony Jiang	60c247de18	[PowerPC] Fix a performance bug for PPC::XXPERMDI. There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI Instruction, this patch recognizes them and does the selection to improve the PPC performance. Differential Revision: https://reviews.llvm.org/D33404 llvm-svn: 304298	2017-05-31 13:09:57 +00:00
Nemanja Ivanovic	accab033c9	[PowerPC] Eliminate integer compare instructions - vol. 3 This patch builds upon https://reviews.llvm.org/rL302810 to add handling for the 64-bit SETEQ patterns. Differential Revision: https://reviews.llvm.org/D33369 llvm-svn: 304286	2017-05-31 08:04:07 +00:00
Nemanja Ivanovic	e597bd8230	[PowerPC] Eliminate integer compare instructions - vol. 2 This patch builds upon https://reviews.llvm.org/rL302810 to add handling for bitwise logical operations in general purpose registers. The idea is to keep the values in GPRs as long as possible - only extracting them to a condition register bit when no further operations are to be done. Differential Revision: https://reviews.llvm.org/D31851 llvm-svn: 304282	2017-05-31 05:40:25 +00:00
Tim Shen	0bd0aa8f07	[AntiDepBreaker] Revert r299124 and add a test. Summary: AntiDepBreaker intends to add all live-outs, including the implicit CSRs, in StartBlock. r299124 was done without understanding that intention. Now with the live-ins propagated correctly (D32464), we can revert this change. Reviewers: MatzeB, qcolombet Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D33697 llvm-svn: 304251	2017-05-30 22:26:52 +00:00
Matthias Braun	eec1f3672a	LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI Re-commit r303938 and r303954 with a fix for addLiveIns(): the internal addPristines() function must be called on an empty set or it may accidentally reset saved registers. - addLiveOutsNoPristines() needs to add callee saved registers that are actually saved and restored somewhere to the set (they are not pristine). - Cleanup/rewrite the code for addLiveOuts()/addLiveOutsNoPristines(). This fixes the problem from D32156. Differential Revision: https://reviews.llvm.org/D32464 llvm-svn: 304001	2017-05-26 16:23:08 +00:00
Matthias Braun	c93c063993	Revert "LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI" Tentatively revert this to see if it fixes the buildbot stage2 breakages. This reverts commit r303938. This reverts commit r303954. llvm-svn: 303960	2017-05-26 02:25:20 +00:00
Matthias Braun	9e6826de77	Test for r303938 llvm-svn: 303954	2017-05-26 01:29:25 +00:00
Tim Shen	a2b85da879	[PPC] Fix atomics lowering in DAG lowering. I forgot to forward the chain, causing some missing instruction dependencies. The test crashes the compiler without this patch. Inspired by the test case, D33519 also tries to remove the extra sync. Differential Revision: https://reviews.llvm.org/D33573 llvm-svn: 303931	2017-05-25 22:58:35 +00:00
Tony Jiang	0a429f040e	[PowerPC] Fix a performance bug for PPC::XXSLDWI. There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI instruction, this patch recognizes them and does the selection to improve the PPC performance. llvm-svn: 303822	2017-05-24 23:48:29 +00:00
Zaara Syeda	932978315b	P9: D-form vector load/store. Differential Revision: https://reviews.llvm.org/D33248 llvm-svn: 303780	2017-05-24 17:50:37 +00:00
Hiroshi Inoue	37e63b1b21	Summary PPC backend eliminates compare instructions by using record-form instructions in PPCInstrInfo::optimizeCompareInstr, which is called from peephole optimization pass. This patch improves this optimization to eliminate more compare instructions in two types of common case. - comparison against a constant 1 or -1 The record-form instructions set CR bit based on signed comparison against 0. So, the current implementation does not exploit the record-form instruction for comparison against a non-zero constant. This patch enables record-form optimization for constant of 1 or -1 if possible; it changes the condition "greater than -1" into "greater than or equal to 0" and "less than 1" into "less than or equal to 0". With this patch, compare can be eliminated in the following code sequence, as an example. uint64_t a, b; if ((a \| b) & 0x8000000000000000ull) { ... } else { ... } - andi for 32-bit comparison on PPC64 Since record-form instructions execute 64-bit signed comparison and so we have limitation in eliminating 32-bit comparison, i.e. with cmplwi, using the record-form. The original implementation already has such checks but andi. is not recognized as an instruction which executes implicit zero extension and hence safe to convert into record-form if used for equality check. %1 = and i32 %a, 10 %2 = icmp ne i32 %1, 0 br i1 %2, label %foo, label %bar In this simple example, LLVM generates andi. + cmplwi + beq on PPC64. This patch make it possible to eliminate the cmplwi for this case. I added andi. for optimization targets if it is safe to do so. Differential Revision: https://reviews.llvm.org/D30081 llvm-svn: 303500	2017-05-21 06:00:05 +00:00
Kyle Butt	f6c61ef64d	CodeGen: Power: Add lowering for shifts of v1i128. When legalizing vector operations on vNi128, they will be split to v1i128 because that is a legal type on ppc64, but then the compiler will crash in selection dag because it fails to select for these operations. This patch fixes shift operations. Logical shift right and left shift can be performed in the vector unit, but algebraic shift right requires being split. Differential Revision: https://reviews.llvm.org/D32774 llvm-svn: 303307	2017-05-17 21:54:41 +00:00
Krzysztof Parzyszek	2b0533126e	[PPC] Properly update register save area offsets The variables MinGPR/MinG8R were not updated properly when resetting the offsets, which in the included testcase lead to saving the CR register in the same location as R30. This fixes another issue reported in PR26519. Differential Revision: https://reviews.llvm.org/D33017 llvm-svn: 303257	2017-05-17 13:25:09 +00:00
Tim Shen	0fbbef43e0	[PPC] Add -ppc-asm-full-reg-names to atomic-2.ll. NFC. Differential Revisions: https://reviews.llvm.org/D32763 llvm-svn: 303209	2017-05-16 20:58:55 +00:00
Tim Shen	3bef27cc6f	[PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync. Summary: This fixes pr32392. The lowering pipeline is: llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in expandPostRAPseudo. The reason why expandPostRAPseudo is chosen is because previous passes are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne- 7, .+4 (some branch pass(s)). Differential Revision: https://reviews.llvm.org/D32763 llvm-svn: 303205	2017-05-16 20:18:06 +00:00
Nirav Dave	da8f221273	Elide stores which are overwritten without being observed. Summary: In SelectionDAG, when a store is immediately chained to another store to the same address, elide the first store as it has no observable effects. This is causes small improvements dealing with intrinsics lowered to stores. Test notes: * Many testcases overwrite store addresses multiple times and needed minor changes, mainly making stores volatile to prevent the optimization from optimizing the test away. * Many X86 test cases optimized out instructions associated with associated with va_start. * Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has dependencies to check and can probably be removed and potentially replaced with another test. Reviewers: rnk, john.brawn Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33206 llvm-svn: 303198	2017-05-16 19:43:56 +00:00
Kyle Butt	7d531daece	CodeGen: BlockPlacement: Increase tail duplication size for O3. At O3 we are more willing to increase size if we believe it will improve performance. The current threshold for tail-duplication of 2 instructions is conservative, and can be relaxed at O3. Benchmark results: llvm test-suite: 6% improvement in aha, due to duplication of loop latch 3% improvement in hexxagon 2% slowdown in lpbench. Seems related, but couldn't completely diagnose. Internal google benchmark: Produces 4% improvement on internal google protocol buffer serialization benchmarks. Differential-Revision: https://reviews.llvm.org/D32324 llvm-svn: 303084	2017-05-15 17:30:47 +00:00
Guozhi Wei	22e7da9597	[PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0 According to Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0. This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified. Differential Revision: https://reviews.llvm.org/D32880 llvm-svn: 302834	2017-05-11 22:17:35 +00:00
Nemanja Ivanovic	96c3d626a2	[PowerPC] Eliminate integer compare instructions - vol. 1 This patch is the first in a series of patches to provide code gen for doing compares in GPRs when the compare result is required in a GPR. It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64 extensions. This first patch handles equality comparison on i32 operands with the result sign or zero extended. Differential Revision: https://reviews.llvm.org/D31847 llvm-svn: 302810	2017-05-11 16:54:23 +00:00
Serge Pavlov	d526b13e61	Add extra operand to CALLSEQ_START to keep frame part set up previously Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527	2017-05-09 13:35:13 +00:00
Krzysztof Parzyszek	038a0546db	[PPC] When restoring R30 (PIC base pointer), mark it as <def> This happened on the PPC32/SVR4 path and was discovered when building FreeBSD on PPC32. It was a typo-class error in the frame lowering code. This fixes PR26519. llvm-svn: 302183	2017-05-04 19:14:54 +00:00
Tim Shen	e59d06fe78	[PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction Summary: This is the corresponding llvm change to D28037 to ensure no performance regression. Reviewers: bogner, kbarton, hfinkel, iteratee, echristo Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28329 llvm-svn: 301990	2017-05-03 00:07:02 +00:00
Nemanja Ivanovic	b89c27f515	[PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE Fixes PR30730. This is a re-commit of a pulled commit. The commit was pulled because some software projects contained uses of Altivec vectors that violated alignment requirements. Known issues have now been fixed. Committing on behalf of Lei Huang. Differential Revision: https://reviews.llvm.org/D26861 llvm-svn: 301892	2017-05-02 01:47:34 +00:00
Sanjoy Das	ba0daee6b2	[StackMaps] Increase the size of the "location size" field Summary: In some cases LLVM (especially the SLP vectorizer) will create vectors that are 256 bytes (or larger). Given that this is intentional[0] is likely to get more common, this patch updates the StackMap binary format to deal with the spill locations for said vectors. This change also bumps the stack map version from 2 to 3. [0]: https://reviews.llvm.org/D32533#738350 Reviewers: reames, kavon, skatkov, javed.absar Subscribers: mcrosier, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D32629 llvm-svn: 301615	2017-04-28 04:48:42 +00:00
Adrian Prantl	083e6a5b5c	Don't emit CFI instructions at the end of a function When functions are terminated by unreachable instructions, the last instruction might trigger a CFI instruction to be generated. However, emitting it would be be illegal since the function (and thus the FDE the CFI is in) has already ended with the previous instruction. Darwin's dwarfdump --verify --eh-frame complains about this and the specification supports this. Relevant bits from the DWARF 5 standard (6.4 Call Frame Information): "[The] address_range [field in an FDE]: The number of bytes of program instructions described by this entry." "Row creation instructions: [...] The new location value is always greater than the current one." The first quotation implies that a CFI cannot describe a target address outside of the enclosing FDE's range. rdar://problem/26244988 Differential Revision: https://reviews.llvm.org/D32246 llvm-svn: 301219	2017-04-24 18:45:59 +00:00
Sanjay Patel	ae382bb6af	[DAG] add splat vector support for 'xor' in SimplifyDemandedBits This allows forming more 'not' ops, so we get improvements for ISAs that have and-not. Follow-up to: https://reviews.llvm.org/rL300725 llvm-svn: 300763	2017-04-19 21:23:09 +00:00
Sanjay Patel	c9d36f181f	[PowerPC] add test and auto-generate checks; NFC llvm-svn: 300700	2017-04-19 14:58:09 +00:00
Hal Finkel	cef9e52736	[PowerPC] multiply-with-overflow might use the CTR register Check the legality of ISD::[US]MULO to see whether Intrinsic::[us]mul_with_overflow will legalize into a function call (and, thus, will use the CTR register). Fixes PR32485. Patch by Tim Neumann! Differential Revision: https://reviews.llvm.org/D31790 llvm-svn: 299910	2017-04-11 02:03:17 +00:00
Matt Arsenault	f10061ec70	Add address space mangling to lifetime intrinsics In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876	2017-04-10 20:18:21 +00:00
Eli Friedman	5fba1e53f2	Turn on -addr-sink-using-gep by default. The new codepath has been in the tree for years, and there isn't any reason to use two codepaths here. Differential Revision: https://reviews.llvm.org/D30596 llvm-svn: 299723	2017-04-06 22:42:18 +00:00
Sanjay Patel	b2f1621bb1	[DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401) This is a generic combine enabled via target hook to reduce icmp logic as discussed in: https://bugs.llvm.org/show_bug.cgi?id=32401 It's likely that other targets will want to enable this hook for scalar transforms, and there are probably other patterns that can use bitwise logic to reduce comparisons. Note that we are missing an IR canonicalization for these patterns, and we will probably prefer the pair-of-compares form in IR (shorter, more likely to fold). Differential Revision: https://reviews.llvm.org/D31483 llvm-svn: 299542	2017-04-05 14:09:39 +00:00
Sanjay Patel	a4546efbc8	add/move codegen tests for and/or of setcc; NFC llvm-svn: 299396	2017-04-03 22:45:46 +00:00
Sanjay Patel	665021e7ee	[DAGCombiner] enable vector transforms for any/all {sign} bits set/clear The code already allowed vector types in via "isInteger" (which might want a more specific name), so use splat-friendly constant predicates to match those types. llvm-svn: 299304	2017-04-01 15:05:54 +00:00
Sanjay Patel	fe9340c168	[PowerPC, x86] add vector tests for any/all {sign} bits set/clear; NFC llvm-svn: 299303	2017-04-01 14:32:18 +00:00
Sanjay Patel	34da36e74f	[DAGCombiner] add fold for 'All sign bits set?' (and (setlt X, 0), (setlt Y, 0)) --> (setlt (and X, Y), 0) We have 7 similar folds, but this one got away. The fact that the x86 test with a branch didn't change is probably a separate bug. We may also be missing this and the related folds in instcombine. llvm-svn: 299252	2017-03-31 20:28:06 +00:00
Sanjay Patel	a97d36fdda	[PowerPC] add tests for setcc+setcc+logic; NFC These are the same tests added for x86 with r299238, but PPC doesn't specify all branches as cheap, so we see different patterns in tests with branches. llvm-svn: 299244	2017-03-31 18:51:03 +00:00
Eric Christopher	9fd267c221	Temporarily revert "[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64" as it's causing test failures, I've given Carrot a testcase offline. This reverts commit r298955. llvm-svn: 299153	2017-03-31 02:16:54 +00:00
Eric Christopher	bf8f759305	Add testcase for r299124. Patch by Tim Shen! llvm-svn: 299125	2017-03-30 22:35:10 +00:00
Adam Nemet	edaec6de73	[DAGCombiner] Initial support for the fast-math flag contract Now alternatively to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD can also use the per operation FMF to allow fusion. The idea here is not to port everything to the new scheme (e.g. fused multiply-and-sub will be ported later) but that this work all the way from clang. The transformation is conditionalized on both the FADD and the FMUL having the FMF contract flag. Differential Revision: https://reviews.llvm.org/D31169 llvm-svn: 299096	2017-03-30 18:53:04 +00:00
Sanjay Patel	b8a728f993	[CodeGen] clean up and add tests for scalar and-of-setcc; NFC https://bugs.llvm.org/show_bug.cgi?id=32401 llvm-svn: 299034	2017-03-29 21:58:52 +00:00
Guozhi Wei	f8d40181c9	[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64 In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64. This patch fixed PR32442. Differential Revision: https://reviews.llvm.org/D31407 llvm-svn: 298955	2017-03-28 22:55:01 +00:00
Dehao Chen	b197d5b0a0	Fix trellis layout to avoid mis-identify triangle. Summary: For the following CFG: A->B B->C A->C If there is another edge B->D, then ABC should not be considered as triangle. Reviewers: davidxl, iteratee Reviewed By: iteratee Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D31310 llvm-svn: 298661	2017-03-23 23:28:09 +00:00
Tim Shen	ce26a45f7c	[PPC] Add generated tests for all atomic operations Summary: Add tests for all atomic operations for powerpc64le, so that all changes can be easily examined. Reviewers: kbarton, hfinkel, echristo Subscribers: mehdi_amini, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D31285 llvm-svn: 298614	2017-03-23 16:02:47 +00:00
Oren Ben Simhon	9ce0ec5dbc	CalleeSavedRegister was removed from MIR and is recalculated upon MIR parsing. llvm-svn: 298210	2017-03-19 11:18:09 +00:00
Kyle Butt	d609d6ebf0	CodeGen: BlockPlacement: Adjust test case so it covers rL297925. NFC I had ajusted the test case before when testing a chain of length 2, and then reverted it with rL296845 when I switched to 3 triangles. After running benchmarks and examining generated code at length 2 I forgot to put the test back. llvm-svn: 298000	2017-03-16 21:33:29 +00:00
Nemanja Ivanovic	ffcf0fb1cc	[PowerPC][Altivec] Add mfvrd and mffprd extended mnemonic mfvrd and mffprd are both alias to mfvrsd. This patch enables correct parsing of the aliases, but we still emit a mfvrsd. Committing on behalf of brunoalr (Bruno Rosa). Differential Revision: https://reviews.llvm.org/D29177 llvm-svn: 297849	2017-03-15 16:04:53 +00:00
Nirav Dave	54e22f33d9	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695	2017-03-14 00:34:14 +00:00
Tim Shen	c7472d912b	Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size"" After inspection, it's an UB in our code base. Someone cast a var-arg function pointer to a non-var-arg one. :/ Re-commit r296771 to continue testing on the patch. Sorry for the trouble! llvm-svn: 297256	2017-03-08 02:41:35 +00:00
Tim Shen	70054bb827	Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size" This reverts commit r296771. We found some wide spread test failures internally. I'm working on a testcase. Politely revert the patch in the mean time. :) llvm-svn: 297124	2017-03-07 07:40:10 +00:00
Nemanja Ivanovic	12e67d868a	[PowerPC] Fix failure with STBRX when store is narrower than the bswap Fixes a crash caused by r296811 by truncating the input of the STBRX node when the bswap is wider than i32. Fixes https://bugs.llvm.org/show_bug.cgi?id=32140 Differential Revision: https://reviews.llvm.org/D30615 llvm-svn: 297001	2017-03-06 07:32:13 +00:00

1 2 3 4 5 ...

1456 Commits