Commit Graph

4894 Commits

Author SHA1 Message Date
Jim Grosbach
8d77bb5f06 Use regex to remove false dependencies on register allocation.
llvm-svn: 138137
2011-08-19 23:10:31 +00:00
Jim Grosbach
066e9ec1e4 Update tests.
llvm-svn: 138116
2011-08-19 22:19:48 +00:00
Jakob Stoklund Olesen
90b6018c8f Add test case for r138018.
llvm-svn: 138033
2011-08-19 04:30:24 +00:00
Akira Hatanaka
fb4161ae88 Use subword loads instead of a 4-byte load when the size of a structure (or a
piece of it) that is being passed by value is smaller than a word.

llvm-svn: 138007
2011-08-18 23:39:37 +00:00
Ivan Krasin
d7cbd4c518 FastISel: avoid function calls between the materialization of the constant and its use.
llvm-svn: 137993
2011-08-18 22:06:10 +00:00
Jim Grosbach
90103ccc05 Thumb assembly parsing and encoding for LDM instruction.
Fix base register type and canonicallize to the "ldm" spelling rather than
"ldmia." Add diagnostics for incorrect writeback token and out-of-range
registers.

llvm-svn: 137986
2011-08-18 21:50:53 +00:00
Richard Osborne
56f3b70225 Add intrinsics for SETEV, GETED, GETET.
llvm-svn: 137938
2011-08-18 13:00:48 +00:00
Bruno Cardoso Lopes
3c7d6eb64c Cleanup vector logical ops in AVX and add use int versions for simple
v2i64

llvm-svn: 137919
2011-08-18 02:11:34 +00:00
Bruno Cardoso Lopes
1a87fcb9ba Fix PR10688. Add support for spliting 256-bit vector shifts when the
shift amount is variable

llvm-svn: 137885
2011-08-17 22:12:20 +00:00
Jim Grosbach
e2a0404a69 Thumb assembly parsing and encoding for ADR.
llvm-svn: 137864
2011-08-17 20:37:40 +00:00
Bruno Cardoso Lopes
be5e987379 Introduce matching patterns for vbroadcast AVX instruction. The idea is to
match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there are no ways to match loads, cause
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.

llvm-svn: 137810
2011-08-17 02:29:19 +00:00
Bruno Cardoso Lopes
3400825b41 Update test to not use the scalar type to splat from a load
llvm-svn: 137809
2011-08-17 02:29:15 +00:00
Bruno Cardoso Lopes
ed786a346e Now that we have a canonical way to handle 256-bit splats:
vinsertf128 $1 + vpermilps $0, remove the old code that used to first
do the splat in a 128-bit vector and then insert it into a larger one.
This is better because the handling code gets simpler and also makes a
better room for the upcoming vbroadcast!

llvm-svn: 137807
2011-08-17 02:29:10 +00:00
Akira Hatanaka
5360f88355 Add support for ext and ins.
llvm-svn: 137804
2011-08-17 02:05:42 +00:00
Bruno Cardoso Lopes
2e99f1b3aa Instead of always leaving the work to the generic legalizer when
there is no support for native 256-bit shuffles, be more smart in some
cases, for example, when you can extract specific 128-bit parts and use
regular 128-bit shuffles for them. Example:

For this shuffle:
  shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32>
                <i32 1, i32 0, i32 7, i32 6>

This was expanded to:
  vextractf128  $1, %ymm1, %xmm2
  vpextrq $0, %xmm2, %rax
  vmovd %rax, %xmm1
  vpextrq $1, %xmm2, %rax
  vmovd %rax, %xmm2
  vpunpcklqdq %xmm1, %xmm2, %xmm1
  vpextrq $0, %xmm0, %rax
  vmovd %rax, %xmm2
  vpextrq $1, %xmm0, %rax
  vmovd %rax, %xmm0
  vpunpcklqdq %xmm2, %xmm0, %xmm0
  vinsertf128 $1, %xmm1, %ymm0, %ymm0
  ret

Now we get:
  vshufpd $1, %xmm0, %xmm0, %xmm0
  vextractf128  $1, %ymm1, %xmm1
  vshufpd $1, %xmm1, %xmm1, %xmm1
  vinsertf128 $1, %xmm1, %ymm0, %ymm0

llvm-svn: 137733
2011-08-16 18:21:54 +00:00
Akira Hatanaka
7d7bec5acf Add test case for r137711.
llvm-svn: 137725
2011-08-16 17:32:01 +00:00
Akira Hatanaka
2263c10946 Fix handling of double precision loads and stores when Mips1 is targeted.
Mips1 does not support double precision loads or stores, therefore two single
precision loads or stores must be used in place of these instructions. This 
patch treats double precision loads and stores as if they are legal
instructions until MCInstLowering, instead of generating the single precision
instructions during instruction selection or Prolog/Epilog code insertion.

Without the changes made in this patch, llc produces code that has the same 
problem described in r137484 or bails out when
MipsInstrInfo::storeRegToStackSlot or loadRegFromStackSlot is called before
register allocation.

llvm-svn: 137711
2011-08-16 03:51:51 +00:00
Bruno Cardoso Lopes
cbe7feeab9 Fix PR10656. It's only profitable to use 128-bit inserts and extracts
when AVX mode is one. Otherwise is just more work for the type
legalizer.

llvm-svn: 137661
2011-08-15 21:45:54 +00:00
Eric Christopher
8c5f3f7624 Fix this test to avoid leaving a temporary file behind.
llvm-svn: 137651
2011-08-15 20:55:03 +00:00
Bob Wilson
d1de7764be Expand VMOVQQQQ pseudo instructions.
Apparently we never added code to expand these pseudo instructions, and in
over a year, no one has noticed.  Our register allocator must be awesome!

llvm-svn: 137551
2011-08-13 05:14:55 +00:00
Bruno Cardoso Lopes
f15dfe5818 The VPERM2F128 is a AVX instruction which permutes between two 256-bit
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.

llvm-svn: 137519
2011-08-12 21:48:26 +00:00
Akira Hatanaka
2fcc1cfdce Define unaligned load and store.
llvm-svn: 137515
2011-08-12 21:30:06 +00:00
Akira Hatanaka
2f6b944f56 Test case for 137484
llvm-svn: 137486
2011-08-12 18:12:06 +00:00
Akira Hatanaka
79d60d0e94 Enclose directive .cprestore with .set macro and nomacro to silence assembler
warning. 

llvm-svn: 137378
2011-08-11 22:42:31 +00:00
Bruno Cardoso Lopes
8fbf023c9b Add a dag combine to xform 256-bit shuffles into simple vector
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.

llvm-svn: 137362
2011-08-11 21:50:44 +00:00
Bruno Cardoso Lopes
9eb3762e08 Fix the test added by Nadav in r137308. Make it more strict:
1) check for the "v" version of movaps
2) add a couple of CHECK-NOT to guarantee the behavior
3) move to a more appropriate test file

llvm-svn: 137361
2011-08-11 21:50:35 +00:00
Bruno Cardoso Lopes
043c820800 Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict.
llvm-svn: 137324
2011-08-11 18:59:13 +00:00
Jim Grosbach
27ad83d8a9 ARM push of a single register encodes as pre-indexed STR.
Per the ARM ARM, a 'push' of a single register encodes as an STR,
not an STM.

llvm-svn: 137318
2011-08-11 18:07:11 +00:00
Jim Grosbach
8ba76c6d5c ARM pop of a single register encodes as post-indexed LDR.
Per the ARM ARM, a 'pop' of a single register encodes as an LDR,
not an LDM.

llvm-svn: 137316
2011-08-11 17:35:48 +00:00
Nadav Rotem
1542d5a00a [AVX] If the data which is going to be saved is already in two XMM registers
(for example, after integer operation), do not pack the registers into a YMM
before saving. Its better to save as two XMM registers.

Before:
                vinsertf128         $1, %xmm3, %ymm0, %ymm3
                vinsertf128         $0, %xmm1, %ymm3, %ymm1
                vmovaps              %ymm1, 416(%rsp)

After:
                vmovaps              %xmm3, 416+16(%rsp)
                vmovaps              %xmm1, 416(%rsp)

llvm-svn: 137308
2011-08-11 16:41:21 +00:00
Chris Lattner
5a2c70cc8f add missing colon, thanks peter.
llvm-svn: 137306
2011-08-11 16:15:10 +00:00
Chris Lattner
96710b4308 fix PR10605 / rdar://9930964 by adding a pretty scary missed check.
It's somewhat surprising anything works without this.  Before we would
compile the testcase into:

test:                                   # @test
	movl	$4, 8(%rdi)
	movl	8(%rdi), %eax
	orl	%esi, %eax
	cmpl	$32, %edx
	movl	%eax, -4(%rsp)          # 4-byte Spill
	je	.LBB0_2

now we produce:

test:                                   # @test
	movl	8(%rdi), %eax
	movl	$4, 8(%rdi)
	orl	%esi, %eax
	cmpl	$32, %edx
	movl	%eax, -4(%rsp)          # 4-byte Spill
	je	.LBB0_2
llvm-svn: 137303
2011-08-11 06:26:54 +00:00
Bruno Cardoso Lopes
a2d8bb97b9 Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing
infinite recursive calls in legalize. Fix PR10562

llvm-svn: 137296
2011-08-11 02:49:44 +00:00
Bruno Cardoso Lopes
572c9aaf53 Use the splat index to generate the desired shuffle. Otherwise we
could only get undefs and the vector shuffle becomes an undef,
generating wrong code.

llvm-svn: 137295
2011-08-11 02:49:41 +00:00
Eli Friedman
3ae39f8ad1 Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2).
Fixes PR9693.

llvm-svn: 137292
2011-08-11 01:48:05 +00:00
NAKAMURA Takumi
504769fc2f test/CodeGen/X86/opt-shuff-tstore.ll: Add explicit -mtriple=x86_64-linux.
llvm-svn: 137262
2011-08-10 22:52:48 +00:00
Devang Patel
37a62058fe While extending definition range of a debug variable, consult lexical scopes also. There is no point extending debug variable out side its lexical block. This provides 6x compile time speedup in some cases.
llvm-svn: 137250
2011-08-10 21:25:34 +00:00
Nadav Rotem
d2b071f562 Fix the test. Add cpu target.
llvm-svn: 137241
2011-08-10 19:49:19 +00:00
Nadav Rotem
410a11fe82 When performing a truncating store, it is sometimes possible to rearrange the
data in-register prior to saving to memory.  When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.

llvm-svn: 137238
2011-08-10 19:30:14 +00:00
Bruno Cardoso Lopes
3ff111c12d The following X86 pattern is incorrect:
def : Pat<(X86Movss VR128:$src1,
                   (bc_v4i32 (v2i64 (load addr:$src2)))),
          (MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase is added and illustrates the bug and also modified the one that was already present. Patch by Tanya Lattner.

llvm-svn: 137227
2011-08-10 17:45:17 +00:00
Rafael Espindola
36a3abc671 Add support for the R and Q constraints.
llvm-svn: 137217
2011-08-10 16:26:42 +00:00
Bruno Cardoso Lopes
278ffd7d8e Fix a bug in vpermilps mask checking. Fix PR10560
llvm-svn: 137194
2011-08-10 01:54:17 +00:00
Bruno Cardoso Lopes
72323966c8 Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556
llvm-svn: 137179
2011-08-09 23:27:13 +00:00
Bruno Cardoso Lopes
fc481959d2 Add v16i16 and v32i8 store patterns
llvm-svn: 137166
2011-08-09 22:39:53 +00:00
Bruno Cardoso Lopes
6963062a99 Use fp unpack instructions to unpack int types. Until we have AVX2, this
is the best we can do for these patterns. This fix PR10554.

llvm-svn: 137161
2011-08-09 22:18:37 +00:00
Eli Friedman
4ef2426b87 Fix a couple ridiculous copy-paste errors. rdar://9914773 .
llvm-svn: 137160
2011-08-09 22:17:39 +00:00
Bill Wendling
d7f41b7f66 Revert r137134. It breaks some code as Eli pointed out.
llvm-svn: 137135
2011-08-09 18:56:35 +00:00
Bill Wendling
84ec8f65d1 Print out the variable declaration only if it is a declaration. Otherwise, a
'static' variable will be emitted twice.
PR10081

llvm-svn: 137134
2011-08-09 18:31:50 +00:00
Jakob Stoklund Olesen
53910d6aae Inflate register classes after coalescing.
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class.  Examples are:

  x86: GR32_ABCD:sub_8bit_hi -> GR32
  arm: DPR_VFP2:ssub0 -> DPR

Recompute the register class of any virtual registers that are used by
less instructions after coalescing.

This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:

  vadd.f32  d16, d1, d0
  vcvt.s32.f32  d0, d16

The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.

llvm-svn: 137133
2011-08-09 18:19:41 +00:00
Bruno Cardoso Lopes
bed48dc8ff Reapply a more appropriate solution than in r137114. AVX supports
v4f64 = sitofp v4i32. This fix PR10559.
Also add support for v4i32 = fptosi v4f64.

llvm-svn: 137128
2011-08-09 17:39:13 +00:00