Commit Graph

68 Commits

Author SHA1 Message Date
Rafael Espindola
5a52b9f139 Revert "Implement global merge optimization for global variables."
This reverts commit r208934.

The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.

The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.

llvm-svn: 208978
2014-05-16 13:02:18 +00:00
Hao Liu
8579f0d4d1 [ARM64]Implement NEON post-increment LD1(lane) and post-increment LD1R.
llvm-svn: 208955
2014-05-16 09:39:02 +00:00
Jiangning Liu
932e1c3924 Implement global merge optimization for global variables.
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.

For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.

llvm-svn: 208934
2014-05-15 23:45:42 +00:00
Jiangning Liu
09cc564310 [ARM64] Support aggressive fastcc/tailcallopt breaking ABI by popping out argument stack from callee.
llvm-svn: 208837
2014-05-15 01:33:17 +00:00
Jay Foad
a0653a3e6c Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Weiming Zhao
dd83691cc3 Folding into CSEL when there is ZEXT between SETCC and ADD
Normally, patterns like (add x, (setcc cc ...)) will be folded into
(csel x, x+1, not cc). However, if there is a ZEXT after SETCC, they
won't be folded. This patch recognizes the ZEXT and allows the
generation of CSINC.

This patch fixes bug 19680.

llvm-svn: 208660
2014-05-13 00:40:58 +00:00
Hal Finkel
f0e086a0bc Pass the value type to TLI::getRegisterByName
We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type was used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.

No functionality change.

llvm-svn: 208508
2014-05-11 19:29:07 +00:00
Tim Northover
55b3e22927 ARM64: fix SELECT_CC lowering in absence of NaNs.
We were swapping the true & false results while testing for FMAX/FMIN,
but not putting them back to the original state if the later checks
failed.

Should fix PR19700.

llvm-svn: 208469
2014-05-10 07:37:50 +00:00
Hao Liu
1187a3d8db AArch64/ARM64: Port NEON post-increment load/store with 2/3/4 vectors to ARM64 backend.
llvm-svn: 208284
2014-05-08 07:38:13 +00:00
Tim Northover
88a51d983e AArch64/ARM64: optimise vector selects & enable test
When performing a scalar comparison that feeds into a vector select,
it's actually better to do the comparison on the vector side: the
scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP"
since the vector comparisons are all mask based.

llvm-svn: 208210
2014-05-07 14:10:27 +00:00
James Molloy
36132057da [ARM64-BE] Fix variable-argument saving.
llvm-svn: 208199
2014-05-07 12:33:48 +00:00
James Molloy
ccc7f982c1 [ARM64-BE] Make big endian (scalar) argument passing work correctly.
This completes the port of r204814 (cpirker "AArch64_BE function argument
passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues
found during later development along the way. The biggest of these was
that the alignment fixup logic wasn't replicated into all the places it
should have been.

llvm-svn: 208192
2014-05-07 11:28:36 +00:00
Renato Golin
c7aea40ec6 Implememting named register intrinsics
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).

So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.

llvm-svn: 208104
2014-05-06 16:51:25 +00:00
Kevin Qin
1353c3405d [ARM64] Enable alignment control option in front-end for ARM64.
This is the modification in llvm part.

llvm-svn: 208074
2014-05-06 09:48:52 +00:00
Tim Northover
d0b07e133b AArch64/ARM64: support indexed loads/stores on vector types.
While post-indexed LD1/ST1 instructions do exist for vector loads,
this patch makes use of the more flexible addressing-modes in LDR/STR
instructions.

llvm-svn: 207838
2014-05-02 14:54:15 +00:00
Weiming Zhao
7f6daf1799 [ARM64] Prevent bit extraction to be adjusted by following shift
For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
  %shr = lshr i64 %x, 4
  %and = and i64 %shr, 15
  %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and
  %0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
  lsr x8, x0, #1
  and x8, x8, #0x78
  add x8, x9, x8

If using ubfx, it only needs 2 instrs:
  ubfx  x8, x0, #4, #4
  add x8, x9, x8, lsl #3

This fixes bug 19589

llvm-svn: 207702
2014-04-30 21:07:24 +00:00
Tim Northover
d53a671354 AArch64/ARM64: expunge CPSR from the sources
AArch64 does not have a CPSR register in the same way that AArch32 does. Most
of its compiler-relevant roles have been taken over by the more specific NZCV
register (representing just the flags set by normal instructions).

Its system control functions still remain, but are now under the
pseudo-register referred to as "PSTATE". They're accessed via various MRS & MSR
instructions described in the reference manual.

llvm-svn: 207645
2014-04-30 13:14:14 +00:00
Tim Northover
20ad359b77 AArch64/ARM64: use HS instead of CS & LO instead of CC.
On instructions using the NZCV register, a couple of conditions have dual
representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and
unsigned-lower/carry-clear). The first of these is more descriptive in most
circumstances, so we should print it.

llvm-svn: 207644
2014-04-30 13:14:03 +00:00
James Molloy
54f3485dba [ARM64] Simplify if condition.
v2f32 and v4f32 were missed out of these conditions, so this is also
a bugfix.

llvm-svn: 207628
2014-04-30 10:15:50 +00:00
Craig Topper
2d2aa0ca1f Use makeArrayRef insted of calling ArrayRef<T> constructor directly. I introduced most of these recently.
llvm-svn: 207616
2014-04-30 07:17:30 +00:00
Hao Liu
6db3410071 [ARM64]Fix a bug about incorrect operand order in an EXT instruction, which is introduced by r207485.
llvm-svn: 207500
2014-04-29 07:51:19 +00:00
Hao Liu
cf37110920 [ARM64]Fix a bug when lowering shuffle vector to an EXT instruction.
E.g. Mask like <-1, -1, 1, ...> will generate incorrect EXT index.

llvm-svn: 207485
2014-04-29 01:50:36 +00:00
Craig Topper
64941d9786 Convert SelectionDAG::getMergeValues to use ArrayRef.
llvm-svn: 207374
2014-04-27 19:20:57 +00:00
Craig Topper
48d114bed1 Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
llvm-svn: 207327
2014-04-26 18:35:24 +00:00
Benjamin Kramer
4dae598bc8 DAGCombiner: Turn divs of vector splats into vectorized multiplications.
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.

I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.

llvm-svn: 207315
2014-04-26 12:06:28 +00:00
Michael Zolotukhin
1a97a7bcbf Revert r206749 till a final decision about the intrinsics is made.
llvm-svn: 207313
2014-04-26 09:56:41 +00:00
Kevin Qin
022d395c9c [ARM64] Add RUN lines for "–target arm64 –mattr=-fp-armv8" on AArch64 no-fp test.
This patch is a supplement of implementing predicate of FP, enabling aarch64 backend
no-fp tests on arm64 target for verification. During this, one bug is exposed and
fixed by this patch.

llvm-svn: 207215
2014-04-25 09:44:20 +00:00
Craig Topper
062a2baef0 [C++] Use 'nullptr'. Target edition.
llvm-svn: 207197
2014-04-25 05:30:21 +00:00
Reid Kleckner
5772b77789 Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

llvm-svn: 207143
2014-04-24 20:14:34 +00:00
Kevin Qin
a4ee178762 [ARM64] Enable feature predicates for NEON / FP / CRYPTO.
AArch64 has feature predicates for NEON, FP and CRYPTO instructions.
This allows the compiler to generate code without using FP, NEON
or CRYPTO instructions.

llvm-svn: 206949
2014-04-23 06:22:48 +00:00
Tim Northover
a962398a3f AArch64/ARM64: make use of ANDS and BICS instructions for comparisons.
llvm-svn: 206888
2014-04-22 12:45:42 +00:00
Chandler Carruth
84e68b2994 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

llvm-svn: 206842
2014-04-22 02:41:26 +00:00
Yi Jiang
d069f6393a ARM64: Combine shifts and uses from different basic block to bit-extract instruction
llvm-svn: 206774
2014-04-21 19:34:27 +00:00
Michael Zolotukhin
f2ba994bf6 Reapply r206732. This time without optimization of branches.
llvm-svn: 206749
2014-04-21 12:01:33 +00:00
Chandler Carruth
a2533a7bef Revert r206732 which is causing llc to crash on most of the build bots.
Original commit message:
  Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN,
  safe.srem.iN, safe.urem.iN (iN = i8, i61, i32, or i64).

llvm-svn: 206735
2014-04-21 07:11:15 +00:00
Michael Zolotukhin
137a84616c Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN, safe.srem.iN,
safe.urem.iN (iN = i8, i16, i32, or i64).

llvm-svn: 206732
2014-04-21 05:33:09 +00:00
Tim Northover
e3028832d1 AArch64/ARM64: add non-scalar lowering for more FCVT operations.
llvm-svn: 206591
2014-04-18 13:16:42 +00:00
Tim Northover
01f315a556 AArch64/ARM64: improve spotting of EXT instructions from VECTOR_SHUFFLE.
We couldn't cope if the first mask element was UNDEF before, which
isn't ideal.

llvm-svn: 206588
2014-04-18 12:50:58 +00:00
Tim Northover
a2c4c71c12 AArch64/ARM64: spot a greater variety of concat_vector operations.
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.

This also enables the AArch64 tests which inspired the previous
untested commits.

llvm-svn: 206574
2014-04-18 09:31:27 +00:00
Tim Northover
5ec51a8981 ARM64: spot a vector_shuffle that maps to INS and expand.
Tests will be coming very shortly when all the optimisations needed to
support AArch64's neon-copy.ll file are committed.

llvm-svn: 206572
2014-04-18 09:31:15 +00:00
Tim Northover
8b2fa3dfef AArch64/ARM64: emit all vector FP comparisons as such.
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.

More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.

llvm-svn: 206570
2014-04-18 09:31:07 +00:00
Tim Northover
0a44e66bb8 AArch64/ARM64: port BSL logic from AArch64 & enable test.
I enhanced it a little in the process. The decision shouldn't really be beased
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.

Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.

llvm-svn: 206569
2014-04-18 09:31:01 +00:00
Tim Northover
547a4ae6fa AArch64/ARM64: copy byval implementation from AArch64.
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.

llvm-svn: 206568
2014-04-18 09:30:52 +00:00
Louis Gerbarg
153e695ee2 Improve ARM64 vector creation
This patch improves the performance of vector creation in caseiswhere where
several of the lanes in the vector are a constant floating point value. It
also includes new patterns to fold together some of the instructions when the
value is 0.0f. Test cases included.

rdar://16349427

llvm-svn: 206496
2014-04-17 20:51:50 +00:00
Tim Northover
11a6082e33 ARM64: switch to IR-based atomic operations.
Goodbye code!

(Game: spot the bug fixed by the change).

llvm-svn: 206490
2014-04-17 20:00:33 +00:00
Tim Northover
0129f298c4 ARM64: add acquire/release versions of the existing atomic intrinsics.
These will be needed to support IR-level lowering of atomic
operations.

llvm-svn: 206489
2014-04-17 20:00:24 +00:00
Adam Nemet
287f989dde [ARM64] Fix "Cannot select" for vector ctpop
The commit of r205855:

Author: Arnold Schwaighofer <aschwaighofer@apple.com>
Date:   Wed Apr 9 14:20:47 2014 +0000

    SLPVectorizer: Only vectorize intrinsics whose operands are widened equally

    The vectorizer only knows how to vectorize intrinics by widening all operands by
    the same factor.

    Patch by Tyler Nowicki!

exposed a backend bug causing a regression (Cannot select ctpop).

The commit msg is a bit confusing because the patch actually changes the
behavior for the loop-vectorizer as well.  As things got refactored into a
helper ctpop got snuck in to the trivially-vectorizable helper which is now
used by both vectorizers.  In other words, we started seeing vector-ctpops in
the backend.

This change makes ctpop LegalizeAction::Expand for the types not supported by
the byte-only CNT instruction.  We may be able to custom-lower these later to
a single CNT but this is to fix the compiler crash first.

Fixes <rdar://problem/16578951>

llvm-svn: 206433
2014-04-17 01:01:37 +00:00
Tim Northover
6e27b8ded5 AArch64/ARM64: add support for large code-model jump tables.
I've left the MachO CodeGen as it is, there's a reasonable chance it should use
the GOT like ConstPools, but I'm not certain.

llvm-svn: 206288
2014-04-15 14:00:11 +00:00
Tim Northover
b37cff1ae2 AArch64/ARM64: add half as a storage type on ARM64.
This brings it into line with the AArch64 behaviour and should open the way for
certain OpenCL features.

llvm-svn: 206286
2014-04-15 14:00:03 +00:00
Tim Northover
23b1f08282 ARM64: optimise (cmp x, (sub 0, y)) to (cmn x, y).
This transformation is only valid when being used for an EQ or NE
comparison since the flags change otherwise.

llvm-svn: 206167
2014-04-14 12:50:47 +00:00