Commit Graph

8182 Commits

Author SHA1 Message Date
Venkatraman Govindaraju
e9ef51222b [Sparc] Emit .register directive to declare the use of global registers %g2, %g4, %g6 and %g7.
llvm-svn: 191158
2013-09-22 00:42:30 +00:00
Venkatraman Govindaraju
829aec5900 [Sparc] Fix lowering FABS on fp128 (long double) on pre-v9 targets.
llvm-svn: 191154
2013-09-21 23:51:08 +00:00
Juergen Ributzka
f043a65327 Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too."
This reverts commit r191130.

llvm-svn: 191138
2013-09-21 15:09:46 +00:00
Juergen Ributzka
ab930591c7 [X86] Emulate AVX 256bit MIN/MAX support by splitting the vector.
In AVX 256bit vectors are valid vectors and therefore the Type Legalizer doesn't
split the VSELECT and SETCC nodes. AVX only supports MIN/MAX on 128bit vectors
and this fix enables vector splitting for this special case in the X86 DAG
Combiner.

This fix is related to PR16695, PR17002, and <rdar://problem/14594431>.

llvm-svn: 191131
2013-09-21 04:55:22 +00:00
Juergen Ributzka
e9a80fc912 SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask for the given target. This mask has usually
te same size as the VSELECT return type (except for Intel KNL). Now the type
legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

llvm-svn: 191130
2013-09-21 04:55:18 +00:00
NAKAMURA Takumi
68fa6f9d36 Initialize BSSSection explicitly in InitMachOMCObjectFileInfo() to appease msvc.
This can revert r191087.

llvm-svn: 191128
2013-09-21 02:34:45 +00:00
Reed Kotler
78fb291e62 Set .reorder for the stub so that gas takes care of delay slot processing.
llvm-svn: 191125
2013-09-21 01:37:52 +00:00
NAKAMURA Takumi
2867a0de5d llvm/test: Mark 3 tests as XFAIL:msvc.
llvm-svn: 191087
2013-09-20 12:57:34 +00:00
Kai Nacke
d09bb4614b PR16726: extend rol/ror matching
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.

This commit extends the DAGCombiner in the way that the pattern

(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))

is folded into

([az]ext (rotl x, y))

The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.

This fixes PR16726.

llvm-svn: 191049
2013-09-19 23:00:28 +00:00
Kai Nacke
2d967b2751 Revert PR16726: extend rol/ror matching
There is a buildbot failure. Need to investigate this.

llvm-svn: 191048
2013-09-19 22:53:36 +00:00
Kai Nacke
4eaf6444fa PR16726: extend rol/ror matching
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.

This commit extends the DAGCombiner in the way that the pattern

(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))

is folded into

([az]ext (rotl x, y))

The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.

This fixes PR16726.

llvm-svn: 191045
2013-09-19 22:36:39 +00:00
Bill Wendling
64587097b6 Add testcase to make sure we don't generate too many jumps for a une compare.
<rdar://problem/7859988>

llvm-svn: 191040
2013-09-19 21:58:20 +00:00
Benjamin Kramer
d443e4a080 DAGCombiner: Don't fold vector muls with constants that look like a splat of a power of 2 but differ in bit width.
PR17283.

llvm-svn: 191000
2013-09-19 13:28:20 +00:00
Justin Holewinski
a54daa4640 [NVPTX] Make constant vector test case endian-independent
llvm-svn: 190998
2013-09-19 13:14:44 +00:00
Justin Holewinski
95564bdf5e [NVPTX] Support constant vector globals
llvm-svn: 190997
2013-09-19 12:51:46 +00:00
Amara Emerson
3308909508 [ARMv8] Add support for the v8 cryptography extensions.
llvm-svn: 190996
2013-09-19 11:59:01 +00:00
Tim Northover
97347a81bc X86: FrameIndex addressing modes do have a base register.
When selecting the DAG (add (WrapperRIP ...), (FrameIndex ...)), X86 code had
spotted the FrameIndex possibility and was working out whether it could fold
the WrapperRIP into this.

The test for forming a %rip version is notionally whether we already have a
base or index register (%rip precludes both), but we were forgetting to account
for the register that would be inserted later to access the frame.

rdar://problem/15024520

llvm-svn: 190995
2013-09-19 11:33:53 +00:00
Reed Kotler
d6aadc797c Fix two issues regarding Got pointer (GP) setup.
1) make sure that the first two instructions of the sequence cannot
separate from each other. The linker requires that they be sequential.
If they get separated, it can still work but it will not work in all
cases because the first of the instructions mostly involves the hi part
of the pc relative offset and that part changes slowly. You would have
to be at the right boundary for this to matter.
2) make sure that this sequence begins  on a longword boundary. 
There appears to be a bug in binutils which makes some of these calculations
get messed up if the instruction sequence does not begin on a longword
boundary. This is being investigated with the appropriate binutils folks.

llvm-svn: 190966
2013-09-18 22:46:09 +00:00
Preston Gurd
dd9891f22d Attempt to fix llvm-ppc64-linux2 buildbot failure by adding
-march=x86 to SLM test.

llvm-svn: 190958
2013-09-18 21:39:33 +00:00
Preston Gurd
457daddc9b Verify that llvm can generate the prefetchw instruction when the CPU is
Atom Silvermont.

Patch by Sriram Murali.

llvm-svn: 190957
2013-09-18 21:08:09 +00:00
Richard Sandiford
93183ee78c [SystemZ] Add unsigned compare-and-branch instructions
For some reason I never got around to adding these at the same time as
the signed versions.  No idea why.

I'm not sure whether this SystemZII::BranchC* stuff is useful, or whether
it should just be replaced with an "is normal" flag.  I'll leave that
for later though.

There are some boundary conditions that can be tweaked, such as preferring
unsigned comparisons for equality with [128, 256), and "<= 255" over "< 256",
but again I'll leave those for a separate patch.

llvm-svn: 190930
2013-09-18 09:56:40 +00:00
Craig Topper
98064b9f4d Lift alignment restrictions for load/store folding on VINSERTF128/VEXTRACTF128. Fixes PR17268.
llvm-svn: 190916
2013-09-18 03:55:53 +00:00
Reid Kleckner
c1e7621e01 COFF: Ensure that objects produced by LLVM link with /safeseh
Summary:
We indicate that the object files are safe by emitting a @feat.00
absolute address symbol.  The address is presumably interpreted as a
bitfield of features that the compiler would like to enable.  Bit 0 is
documented in the PE COFF spec to opt in to "registered SEH", which is
what /safeseh enables.

LLVM's object files are safe by default because LLVM doesn't know how to
produce SEH handlers.

Reviewers: Bigcheese

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1691

llvm-svn: 190898
2013-09-17 23:18:05 +00:00
Bill Schmidt
bb381d7063 [PowerPC] Fix problems with large code model (PR17169).
Large code model on PPC64 requires creating and referencing TOC entries when
using the addis/ld form of addressing.  This was not being done in all cases.
The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this.  Two
test cases are also modified to reflect this requirement.

Fast-isel was not creating correct code for loading floating-point constants
using large code model.  This also requires the addis/ld form of addressing.
Previously we were using the addis/lfd shortcut which is only applicable to
medium code model.  One test case is modified to reflect this requirement.

llvm-svn: 190882
2013-09-17 20:03:25 +00:00
Kevin Qin
36399e6b68 Implement 3 AArch64 neon instructions : umov smov ins.
llvm-svn: 190839
2013-09-17 02:21:02 +00:00
Quentin Colombet
d30a9585b8 [SelectionDAG] Teach the vector scalarizer about TRUNCATE.
When a truncate node defines a legal vector type but uses an illegal
vector type, the legalization process was splitting the vector until
<1 x vector> type, but then it was failing to scalarize the node because
it did not know how to handle TRUNCATE.

<rdar://problem/14989896>

llvm-svn: 190830
2013-09-17 00:26:56 +00:00
Preston Gurd
f4f8d8acc8 Add Atom Silvermont (slm) tests
- check that -mcpu=slm uses the call register indirect optimization
- check that -mcpu=slm runs the scheduler 
- check that -mcpu=slm supports the movbe instruction

llvm-svn: 190814
2013-09-16 22:22:07 +00:00
Richard Sandiford
109a7c6ff1 [SystemZ] Improve extload handling
The port originally had special patterns for extload, mapping them to the
same instructions as sextload.  It seemed neater to have patterns that
match "an extension that is allowed to be signed" and "an extension that
is allowed to be unsigned".

This was originally meant to be a clean-up, but it does improve the handling
of promoted integers a little, as shown by args-06.ll.

llvm-svn: 190777
2013-09-16 09:03:10 +00:00
Peter Collingbourne
3fa50f9b05 Implement function prefix data as an IR feature.
Previous discussion:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063909.html

Differential Revision: http://llvm-reviews.chandlerc.com/D1191

llvm-svn: 190773
2013-09-16 01:08:15 +00:00
Hal Finkel
40c34781b5 PPC: Don't restrict lvsl generation to after type legalization
This is a re-commit of r190764, with an extra check to make sure that we're not
performing the transformation on illegal types (a small test case has been
added for this as well).

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190771
2013-09-15 22:09:58 +00:00
Hal Finkel
31025a6325 Revert r190764: PPC: Don't restrict lvsl generation to after type legalization
This is causing test-suite failures.

Original commit message:

The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190765
2013-09-15 15:41:11 +00:00
Hal Finkel
2945d4e916 PPC: Don't restrict lvsl generation to after type legalization
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.

llvm-svn: 190764
2013-09-15 15:20:54 +00:00
Hal Finkel
31658834e6 Prevent assert in CombinerGlobalAA with null values
DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we
can't use AA in this case (if we try, then the casting code in AA will assert).

llvm-svn: 190763
2013-09-15 02:19:49 +00:00
Reed Kotler
655531521e Expand the mask capability for deciding which functions are mips16 and mips32
so it can be better used for general interoperability testing between mips32
and mips16.

llvm-svn: 190762
2013-09-15 02:09:08 +00:00
Joey Gouly
ccd04894c4 [ARMv8] Change hasV8Fp to hasFPARMv8, and other command line options
to be more consistent.

llvm-svn: 190692
2013-09-13 13:46:57 +00:00
Joey Gouly
3c0e5567a9 [ARMv8] Emit the proper .fpu directive.
Patch by Bradley Smith!

llvm-svn: 190683
2013-09-13 11:51:52 +00:00
Richard Sandiford
030c165710 [SystemZ] Try to fold shifts into TMxx
E.g. "SRL %r2, 2; TMLL %r2, 1" => "TMLL %r2, 4".

llvm-svn: 190672
2013-09-13 09:09:50 +00:00
Vincent Lejeune
9a248e5c2d R600: Move code handling literal folding into R600ISelLowering.
llvm-svn: 190644
2013-09-12 23:44:53 +00:00
Vincent Lejeune
ab3baf80a8 R600: Move fabs/fneg/sel folding logic into PostProcessIsel
This move makes possible to correctly handle multiples instructions
from a single pattern.

llvm-svn: 190643
2013-09-12 23:44:44 +00:00
Hal Finkel
a5ebe426a5 Remove unnecessary TBAA metadata from r190636's test case
llvm-svn: 190637
2013-09-12 23:23:12 +00:00
Hal Finkel
262a224712 Fix PPC ABI for ByVal structs with vector members
When a structure is passed by value, and that structure contains a vector
member, according to the PPC ABI, the structure will receive enhanced alignment
(so that the vector within the structure will always be aligned).

This should resolve PR16641.

llvm-svn: 190636
2013-09-12 23:20:06 +00:00
Hal Finkel
1e2e3ea584 Make the PPC fast-math sqrt expansion safe at 0
In fast-math mode sqrt(x) is calculated using the fast expansion of the
reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal
sqrt expansions use the associated estimate instructions along with some Newton
iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN,
which is not correct. Now we explicitly return a result of zero if the input is
zero.

llvm-svn: 190624
2013-09-12 19:04:12 +00:00
Elena Demikhovsky
8952974e29 AVX-512: implemented extractelement with variable index.
Added parsing of mask register and "zeroing" semantic, like {%k1} {z}.

llvm-svn: 190595
2013-09-12 08:55:00 +00:00
Hal Finkel
7fe6a5390f PPC: Enable aggressive anti-dependency breaking
Aggressive anti-dependency breaking is enabled by default for all PPC cores.
This provides a general speedup on the P7 and other platforms (among other
factors, the instruction group formation for the non-embedded PPC cores is done
during post-RA scheduling). In order to do this safely, the incompatibility
between uses of the MFOCRF instruction and anti-dependency breaking are
resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed
FIXME, the problem was that MFOCRF's output is sensitive to the identify of the
source register, and always paired with a shift to undo this effect. Because
anti-dependency breaking is unaware of this hidden dependency of the shift
amount on the source register of the MFOCRF instruction, changing that register
must be inhibited.

Two test cases were adjusted: The SjLj test was made more insensitive to
register choices and scheduling; the saveCR test disabled anti-dependency
breaking because part of what it is testing is proper register reuse.

llvm-svn: 190587
2013-09-12 05:24:49 +00:00
Tom Stellard
afcf12f33a R600/SI: expose TBUFFER_STORE_FORMAT_* for OpenGL transform feedback
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.

The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.

The maximum number of input SGPRs is bumped to 17.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575
2013-09-12 02:55:14 +00:00
Bill Wendling
4c3b514da0 Try to fix the atom buildbots by adding an explicit 'cpu' to the 'llc' command.
llvm-svn: 190541
2013-09-11 19:06:04 +00:00
Daniel Sanders
7fab912257 [mips][msa] Added test cases that were supposed to be part of r190507, r190509, r190512, and r190518.
llvm-svn: 190522
2013-09-11 12:39:25 +00:00
Daniel Sanders
fbcb582942 [mips][msa] Added support for matching mulv, nlzc, sll, sra, srl, and subv from normal IR (i.e. not intrinsics)
llvm-svn: 190518
2013-09-11 11:58:30 +00:00
Daniel Sanders
f5bd937bc4 [mips][msa] Added support for matching fadd, fdiv, flog2, fmul, frint, fsqrt, and fsub from normal IR (i.e. not intrinsics)
llvm-svn: 190512
2013-09-11 10:51:30 +00:00
Daniel Sanders
607952bdad [mips][msa] Added support for matching div_[su] from normal IR (i.e. not intrinsics)
llvm-svn: 190509
2013-09-11 10:38:58 +00:00