clang-p2996

Author	SHA1	Message	Date
Devang Patel	94ad6ac13c	Fix DWARF description of Q registers. llvm-svn: 129952	2011-04-21 23:22:35 +00:00
Devang Patel	3712c14be9	Fix DWARF description of S registers. llvm-svn: 129947	2011-04-21 22:48:26 +00:00
Devang Patel	be22131c28	Test case for r129922 llvm-svn: 129934	2011-04-21 20:16:43 +00:00
Evan Cheng	5f1ba4cd2d	Remove -use-divmod-libcall. Let targets opt in when they are available. llvm-svn: 129884	2011-04-20 22:20:12 +00:00
Eric Christopher	bcaedb5ce0	Rewrite the expander for umulo/smulo to remember to sign extend the input manually and pass all (now) 4 arguments to the mul libcall. Add a new ExpandLibCall for just this (copied gratuitously from type legalization). Fixes rdar://9292577 llvm-svn: 129842	2011-04-20 01:19:45 +00:00
Daniel Dunbar	4a7783b0c2	CodeGen: Eliminate a use of getDarwinMajorNumber(). - There is a minor semantic change here (evidenced by the test change) for Darwin triples that have no version component. I debated changing the default behavior of isOSVersionLT, but decided it made more sense for triples to be explicit. llvm-svn: 129802	2011-04-19 20:32:39 +00:00
Bob Wilson	0858c3aaed	This patch combines several changes from Evan Cheng for rdar://8659675. Making use of VFP / NEON floating point multiply-accumulate / subtraction is difficult on current ARM implementations for a few reasons. 1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause additional pipeline stall. So it's frequently better to single codegen vmul + vadd. 2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart. 3. A vmla followed vmla is a special case. Obvious issuing back to back RAW vmla + vmla is very bad. But this isn't ideal either: vmul vadd vmla Instead, we want to expand the second vmla: vmla vmul vadd Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster. Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough but it isn't the optimial solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable. A. Add missing isel predicates which cause vmla to be codegen'ed. B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla. C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case). D. Add ARM hazard recognizer to model the vmla / vmls hazards. E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards. Enable these fp vmlx codegen changes for Cortex-A9. llvm-svn: 129775	2011-04-19 18:11:57 +00:00
Bob Wilson	d04a83f8f2	Add -mcpu=cortex-a9-mp. It's cortex-a9 with MP extension. rdar://8648637. llvm-svn: 129774	2011-04-19 18:11:52 +00:00
Bob Wilson	a2881ee8a4	Avoid some 's' 16-bit instruction which partially update CPSR (and add false dependency) when it isn't dependent on last CPSR defining instruction. rdar://8928208 llvm-svn: 129773	2011-04-19 18:11:49 +00:00
Bob Wilson	df612ba006	Avoid write-after-write issue hazards for Cortex-A9. Add a avoidWriteAfterWrite() target hook to identify register classes that suffer from write-after-write hazards. For those register classes, try to avoid writing the same register in two consecutive instructions. This is currently disabled by default. We should not spill to avoid hazards! The command line flag -avoid-waw-hazard can be used to enable waw avoidance. llvm-svn: 129772	2011-04-19 18:11:45 +00:00
Jakob Stoklund Olesen	fb1249548f	Tighten test case a bit. Ideally, we would match an S-register to its containing D-register, but that requires arithmetic (divide by 2). llvm-svn: 129756	2011-04-19 06:14:45 +00:00
Jakob Stoklund Olesen	bf78618db6	Make tests register allocation independent again. llvm-svn: 129739	2011-04-19 00:14:43 +00:00
Evan Cheng	4079133796	Do not lose mem_operands while lowering VLD / VST intrinsics. llvm-svn: 129738	2011-04-19 00:04:03 +00:00
Eric Christopher	c37aa0b26a	Fix a bug where we were counting the alias sets as completely used registers for fast allocation a different way. This has us updating used registers only when we're using that exact register. Fixes rdar://9207598 llvm-svn: 129711	2011-04-18 19:26:25 +00:00
Evan Cheng	b14ce09fca	Fix divmod libcall lowering. Convert to {S\|U}DIVREM first and then expand the node to a libcall. rdar://9280991 llvm-svn: 129633	2011-04-16 03:08:26 +00:00
Cameron Zwarich	9c65e4d69c	Add ORR and EOR to the CMP peephole optimizer. It's hard to get isel to generate a case involving EOR, so I only added a test for ORR. llvm-svn: 129610	2011-04-15 21:24:38 +00:00
Cameron Zwarich	0829b3065a	The AND instruction leaves the V flag unmodified, so it falls victim to the same problem as all of the other instructions we fold with CMPs. llvm-svn: 129602	2011-04-15 20:45:00 +00:00
Cameron Zwarich	93eae1571c	Add missing register forms of instructions to the ARM CMP-folding code. This fixes <rdar://problem/9287901>. llvm-svn: 129599	2011-04-15 20:28:28 +00:00
Evan Cheng	12bb05b75b	Fix another fcopysign lowering bug. If src is f64 and destination is f32, don't forget to right shift the source by 32 first. rdar://9287902 llvm-svn: 129556	2011-04-15 01:31:00 +00:00
Cameron Zwarich	415b5e8341	Fix a typo in an ARM-specific DAG combine. This fixes <rdar://problem/9278274>. llvm-svn: 129468	2011-04-13 21:01:19 +00:00
Cameron Zwarich	70be27e913	Fix an obvious problem with an alignment computation. AsmPrinter actually does the max itself, so it is not easy to write a test case for this, but I added a test case that would fail if the code in AsmPrinter were removed. llvm-svn: 129432	2011-04-13 09:02:43 +00:00
Cameron Zwarich	cdf59f7016	If a global variable has a specified alignment that is less than the preferred alignment for its type, use the minimum of the specified alignment and the ABI alignment. This fixes <rdar://problem/9275290>. llvm-svn: 129428	2011-04-13 06:03:16 +00:00
Andrew Trick	b53a00d2cb	Recommit r129383. PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency. Additional fixes: Do something reasonable for subtargets with generic itineraries by handle node latency the same as for an empty itinerary. Now nodes default to unit latency unless an itinerary explicitly specifies a zero cycle stage or it is a TokenFactor chain. Original fixes: UnitsSharePred was a source of randomness in the scheduler: node priority depended on the queue data structure. I rewrote the recent VRegCycle heuristics to completely replace the old heuristic without any randomness. To make the ndoe latency adjustments work, I also needed to do something a little more reasonable with TokenFactor. I gave it zero latency to its consumers and always schedule it as low as possible. llvm-svn: 129421	2011-04-13 00:38:32 +00:00
Eric Christopher	28f4c729f7	Temporarily revert r129408 to see if it brings the bots back. llvm-svn: 129417	2011-04-13 00:20:59 +00:00
Eric Christopher	d829f43c06	Fix a bug where we were counting the alias sets as completely used registers for fast allocation. Fixes rdar://9207598 llvm-svn: 129408	2011-04-12 23:23:14 +00:00
Andrew Trick	1b60ad6644	Revert 129383. It causes some targets to hit a scheduler assert. llvm-svn: 129385	2011-04-12 20:14:07 +00:00
Andrew Trick	c5dd24a542	PreRA scheduler heuristic fixes: VRegCycle, TokenFactor latency. UnitsSharePred was a source of randomness in the scheduler: node priority depended on the queue data structure. I rewrote the recent VRegCycle heuristics to completely replace the old heuristic without any randomness. To make these heuristic adjustments to node latency work, I also needed to do something a little more reasonable with TokenFactor. I gave it zero latency to its consumers and always schedule it as low as possible. llvm-svn: 129383	2011-04-12 19:54:36 +00:00
Cameron Zwarich	fbcd69b96a	Split a store of a VMOVDRR into two integer stores to avoid mixing NEON and ARM stores of arguments in the same cache line. This fixes the second half of <rdar://problem/8674845>. llvm-svn: 129345	2011-04-12 02:24:17 +00:00
Evan Cheng	ef42bea704	Look pass copies when determining whether hoisting would end up inserting more copies. rdar://9266679 llvm-svn: 129297	2011-04-11 21:09:18 +00:00
Chris Lattner	ea6afab4b0	remove a bunch of CHECK lines that aren't checking what they thought they were, because alternation was expanding wrong in {{}}'s. llvm-svn: 129194	2011-04-09 06:31:06 +00:00
Chris Lattner	1c42a4d159	don't test for codegen of 'store undef' llvm-svn: 129184	2011-04-09 02:31:26 +00:00
Evan Cheng	74d92c1924	Change -arm-trap-func= into a non-arm specific option. Now Intrinsic::trap is lowered into a call to the specified trap function at sdisel time. llvm-svn: 129152	2011-04-08 21:37:21 +00:00
Evan Cheng	9a3f2772f0	Add option to emit @llvm.trap as a function call instead of a trap instruction. rdar://9249183. llvm-svn: 129107	2011-04-07 20:31:12 +00:00
Andrew Trick	2ad0b37318	Added a check in the preRA scheduler for potential interference on a induction variable. The preRA scheduler is unaware of induction vars, so we look for potential "virtual register cycles" instead. Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing llvm-svn: 129100	2011-04-07 19:54:57 +00:00
Tanya Lattner	266792a55a	Prevent ARM DAG Combiner from doing an AND or OR combine on an illegal vector type (vectors of size 3). Also included test cases. llvm-svn: 129074	2011-04-07 15:24:20 +00:00
Evan Cheng	a7c7b54dde	Change -arm-divmod-libcall to a target neutral option. llvm-svn: 129045	2011-04-07 00:58:44 +00:00
Owen Anderson	bdff1c997a	Teach the ARM peephole optimizer that RSB, RSC, ADC, and SBC can be used for folded comparisons, just like ADD and SUB. llvm-svn: 129038	2011-04-06 23:35:59 +00:00
Jakob Stoklund Olesen	1ec41e2bd9	These tests no longer require linear scan because reserved register coalescing is now universal. llvm-svn: 128936	2011-04-05 21:40:41 +00:00
Johnny Chen	293875ef55	Fix test-llvm failures. llvm-svn: 128906	2011-04-05 18:41:40 +00:00
Eric Christopher	f392a69ff7	Fix up testcase for previous commit. llvm-svn: 128870	2011-04-05 00:56:01 +00:00
Cameron Zwarich	6fe5c29430	Do some peephole optimizations to remove pointless VMOVs from Neon to integer registers that arise from argument shuffling with the soft float ABI. These instructions are particularly slow on Cortex A8. This fixes one half of <rdar://problem/8674845>. llvm-svn: 128759	2011-04-02 02:40:43 +00:00
Jim Grosbach	360c369967	LDRD/STRD instructions should print both Rt and Rt2 in the asm string. llvm-svn: 128736	2011-04-01 20:26:57 +00:00
Evan Cheng	a6a992a662	Add test case. llvm-svn: 128707	2011-04-01 06:27:25 +00:00
Evan Cheng	0f86d6de50	FileCheck'ify test. llvm-svn: 128706	2011-04-01 03:36:33 +00:00
Jakob Stoklund Olesen	0888bcf542	Fix ARM tests to be register allocator independent. llvm-svn: 128680	2011-03-31 22:14:03 +00:00
Evan Cheng	38bf5adcea	Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier accumulator forwarding: vadd d3, d0, d1 vmul d3, d3, d2 => vmul d3, d0, d2 vmla d3, d1, d2 llvm-svn: 128665	2011-03-31 19:38:48 +00:00
Jakob Stoklund Olesen	ae044c06bf	Pick a conservative register class when creating a small live range for remat. The rematerialized instruction may require a more constrained register class than the register being spilled. In the test case, the spilled register has been inflated to the DPR register class, but we are rematerializing a load of the ssub_0 sub-register which only exists for DPR_VFP2 registers. The register class is reinflated after spilling, so the conservative choice is only temporary. llvm-svn: 128610	2011-03-31 03:54:44 +00:00
Cameron Zwarich	53dd03d537	Add a ARM-specific SD node for VBSL so that forms with a constant first operand can be recognized. This fixes <rdar://problem/9183078>. llvm-svn: 128584	2011-03-30 23:01:21 +00:00
Evan Cheng	18381b4257	Add intrinsics @llvm.arm.neon.vmulls and @llvm.arm.neon.vmullu.* back. Frontends was lowering them to sext / uxt + mul instructions. Unfortunately the optimization passes may hoist the extensions out of the loop and separate them. When that happens, the long multiplication instructions can be broken into several scalar instructions, causing significant performance issue. Note the vmla and vmls intrinsics are not added back. Frontend will codegen them as intrinsics vmull* + add / sub. Also note the isel optimizations for catching mul + sext / zext are not changed either. First part of rdar://8832507, rdar://9203134 llvm-svn: 128502	2011-03-29 23:06:19 +00:00
Cameron Zwarich	143f9aea2b	Add Neon SINT_TO_FP and UINT_TO_FP lowering from v4i16 to v4f32. Fixes <rdar://problem/8875309> and <rdar://problem/9057191>. llvm-svn: 128492	2011-03-29 21:41:55 +00:00

1 2 3 4 5 ...

889 Commits