Commit Graph

18429 Commits

Author SHA1 Message Date
Pete Cooper
336d90b61b Revert "Refactor UpdatePredRedefs and StepForward to avoid duplication. NFC"
This reverts commit 963cdbccf6e5578822836fd9b2ebece0ba9a60b7 (ie r236514)

This is to get the bots green while i investigate.

llvm-svn: 236518
2015-05-05 18:49:08 +00:00
Pete Cooper
05b84d4168 Revert "Fix IfConverter to handle regmask machine operands."
This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515).

This is to get the bots green while i investigate the failures.

llvm-svn: 236517
2015-05-05 18:49:05 +00:00
Pete Cooper
6ebc207703 Fix IfConverter to handle regmask machine operands.
A regmask (typically seen on a call) clobbers the set of registers it lists.  The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.

These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier.  Otherwise, uses after the if converted call could think they are reading an undefined register.

Reviewed by Matthias Braun and Quentin Colombet.

llvm-svn: 236515
2015-05-05 18:31:36 +00:00
Pete Cooper
bbd1c727d1 Refactor UpdatePredRedefs and StepForward to avoid duplication. NFC
The code was basically the same here already.  Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.

Will be used in the next commit to also handle regmasks.

llvm-svn: 236514
2015-05-05 18:31:31 +00:00
Reid Kleckner
0738a9c02e Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236360.

This change exposed a bug in WinEHPrepare by opting win32 code into EH
preparation. We already knew that WinEHPrepare has bugs, and is the
status quo for x64, so I don't think that's a reason to hold off on this
change. I disabled exceptions in the sanitizer tests in r236505 and an
earlier revision.

llvm-svn: 236508
2015-05-05 17:44:16 +00:00
Quentin Colombet
61b305edfd [ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false

true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false

false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret

With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

llvm-svn: 236507
2015-05-05 17:38:16 +00:00
Tim Northover
851ff69b42 CodeGen: match up correct insertvalue indices when assessing tail calls.
When deciding whether a value comes from the aggregate or inserted value of an
insertvalue instruction, we compare the indices against those of the location
we're interested in. One of the lists needs reversing because the input data is
backwards (so that modifications take place at the end of the SmallVector), but
we were reversing both before leading to incorrect results.

Should fix PR23408

llvm-svn: 236457
2015-05-04 20:41:51 +00:00
Pete Cooper
300069a019 ScheduleDAGInstrs should toggle kill flags on bundled instrs.
ScheduleDAGInstrs wasn't setting or clearing the kill flags on instructions inside bundles.  This led to code such as this

%R3<def> = t2ANDrr %R0
BUNDLE %ITSTATE<imp-def,dead>, %R0<imp-use,kill>
  t2IT 1, 24, %ITSTATE<imp-def>
  R6<def,tied6> = t2ORRrr %R0<kill>, ...

being transformed to

BUNDLE %ITSTATE<imp-def,dead>, %R0<imp-use>
  t2IT 1, 24, %ITSTATE<imp-def>
  R6<def,tied6> = t2ORRrr %R0<kill>, ...
%R3<def> = t2ANDrr %R0<kill>

where the kill flag was removed from the BUNDLE instruction, but not the t2ORRrr inside it.  The verifier then thought that
R0 was undefined when read by the AND.

This change make the toggleKillFlags method also check for bundles and toggle flags on bundled instructions.
Setting the kill flag is special cased as we only want to set the kill flag on the last instruction in the bundle.

llvm-svn: 236428
2015-05-04 16:52:06 +00:00
Elena Demikhovsky
1b60ed7069 Masked gather and scatter intrinsics - enabled codegen for KNL.
llvm-svn: 236394
2015-05-03 07:12:25 +00:00
Simon Pilgrim
017ca19384 [DAGCombiner] Enabled vector float/double -> int constant folding
llvm-svn: 236387
2015-05-02 13:04:07 +00:00
David Blaikie
72d03efa6d DebugInfo: Use low_pc relative debug_ranges under fission when the CU has a low_pc
Seems we were setting the base address on the wrong DwarfCompileUnit
object so it wasn't being used when generating the ranges.

llvm-svn: 236377
2015-05-02 02:31:49 +00:00
Jim Grosbach
bfe3a9c318 Fix spelling.
llvm-svn: 236367
2015-05-02 00:44:07 +00:00
Reid Kleckner
83d89fa546 Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236359. Things are still broken despite testing. :(

llvm-svn: 236360
2015-05-01 22:50:14 +00:00
Reid Kleckner
51476acd77 Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236340.

llvm-svn: 236359
2015-05-01 22:40:25 +00:00
Reid Kleckner
2747d3d55a Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236339, it breaks the win32 clang-cl self-host.

llvm-svn: 236340
2015-05-01 20:14:04 +00:00
Reid Kleckner
4856fc61b4 [WinEH] Add an EH registration and state insertion pass for 32-bit x86
This pass is responsible for constructing the EH registration object
that gets linked into fs:00, which is all it does in this change. In the
future, it will also insert stores to update the EH state number.

I considered keeping this functionality in WinEHPrepare, but it's pretty
separable and X86 specific. It has conceptually very little to do with
the task of WinEHPrepare, which is currently outlining.  WinEHPrepare is
also in theory useful on ARM, but this logic is pretty x86 specific.

Reviewers: andrew.w.kaylor, majnemer

Differential Revision: http://reviews.llvm.org/D9422

llvm-svn: 236339
2015-05-01 20:04:54 +00:00
Simon Pilgrim
9fb06bca67 [SelectionDAG] Unary vector constant folding integer legality fixes
This patch fixes issues with vector constant folding not correctly handling scalar input operands if they require implicit truncation - this was tested with llvm-stress as recommended by Patrik H Hagglund.

The patch ensures that integer input scalars from a build vector are correctly truncated before folding, and that constant integer scalar results are promoted to a legal type before inclusion in the new folded build vector.

I have added another crash test case and also a test for UINT_TO_FP / SINT_TO_FP using an non-truncated scalar input, which was failing before this patch.

Differential Revision: http://reviews.llvm.org/D9282

llvm-svn: 236308
2015-05-01 08:20:04 +00:00
Matt Arsenault
59d2ca1cba Fix typo
llvm-svn: 236283
2015-04-30 23:20:56 +00:00
Pete Cooper
451755d370 Commute the internal flag on MachineOperands.
When commuting a thumb instruction in the size reduction pass, thumb
instructions are represented as a bundle and so some operands may be marked
as internal.  The internal flag has to move with the operand when commuting.

This test is sensitive to register allocation so can't specifically check that
this error was happening, but so long as it continues to pass with -verify then
hopefully its still ok.

rdar://problem/20752113

llvm-svn: 236282
2015-04-30 23:14:14 +00:00
Andrea Di Biagio
c84b5bdd69 Fix for PR23103. Correctly propagate the 'IsUndef' flag to the register operands of a commuted instruction.
Revision 220239 exposed a latent bug in method
'TargetInstrInfo::commuteInstruction'. When commuting the operands of a machine
instruction, method 'commuteInstruction' didn't correctly propagate the
'IsUndef' flag to the register operands of the new (commuted) instruction.

Before this patch, the following instruction:
  %vreg4<def> = VADDSDrr  %vreg14, %vreg5<undef>; FR64:%vreg4,%vreg14,%vreg5

was wrongly converted by method 'commuteInstruction' into:
  %vreg4<def> = VADDSDrr  %vreg5, %vreg14<undef>; FR64:%vreg4,%vreg5,%vreg14

The correct instruction should have been:
  %vreg4<def> = VADDSDrr  %vreg5<undef>, %vreg14; FR64:%vreg4,%vreg5,%vreg14

This patch fixes the problem in method 'TargetInstrInfo::commuteInstruction'.
When swapping the operands of a machine instruction, we now make sure that
'IsUndef' flags are correctly set.
Added test case 'pr23103.ll'.

Differential Revision: http://reviews.llvm.org/D9406

llvm-svn: 236258
2015-04-30 21:03:29 +00:00
Matt Arsenault
ee5c2ab734 MachineVerifier: Don't crash if MachineOperand has no parent
If you somehow added a MachineOperand to an instruction
that did not have the parent set, the verifier would
crash since it attempts to use the operand's parent.

llvm-svn: 236249
2015-04-30 19:35:41 +00:00
Pete Cooper
4d8d2ec3eb Don't rewrite jumps to empty BBs to landing pads.
In the test case here, the 'unreachable' BB was removed by BranchFolding because its empty.

It then rewrote the jump from 'entry' to jump to its fallthrough, which was a landing pad.

This results in 'entry' jumping to 2 different landing pads, which fails the machine verifier.

rdar://problem/20750162

llvm-svn: 236248
2015-04-30 18:58:23 +00:00
Reid Kleckner
582786b6cc Add a note about permitting default member initializers
Use them in WinEHPrepare so that we can spot any toolchain bugs that
come up.

llvm-svn: 236244
2015-04-30 18:17:12 +00:00
Jan Vesely
808fff585b Reinstate revisions r234755, r234759, r234760
changes:
  Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
  Add location to getConstant
  Drop comment about the ops being turned into expand

llvm-svn: 236240
2015-04-30 17:15:56 +00:00
Daniel Jasper
0366cd23ac Inline local variable to silence unused warning.
llvm-svn: 236212
2015-04-30 08:51:13 +00:00
Elena Demikhovsky
e1eda8a9e6 Masked gather and scatter - added DAGCombine visitors
and AVX-512 instruction selection patterns.
All other patches, including tests will follow.

http://reviews.llvm.org/D7665

llvm-svn: 236211
2015-04-30 08:38:48 +00:00
Owen Anderson
d8a029c81b Semantically revert r236031, which is not a good idea for in-order targets.
At the least it should be guarded by some kind of target hook.
It also introduced catastrophic compile time and code quality
regressions on some out of tree targets (test case still being
reduced/sanitized).

Sanjay agreed with reverting this patch until these issues can be
resolved.

llvm-svn: 236199
2015-04-30 04:06:32 +00:00
Hans Wennborg
4b828d35fd Switch lowering: use profile info to build weight-balanced binary search trees
This will cause hot nodes to appear closer to the root.

The literature says building the tree like this makes it a near-optimal (in
terms of search time given key frequencies) binary search tree. In LLVM's case,
we can do up to 3 comparisons in each leaf node, so it might be better to opt
for lower tree height in some cases; that's something to look into in the
future.

Differential Revision: http://reviews.llvm.org/D9318

llvm-svn: 236192
2015-04-30 00:57:37 +00:00
Reid Kleckner
bcda1cd45a [WinEH] Start EH preparation for 32-bit x86, it uses no arguments
32-bit x86 MSVC-style exceptions are functionaly similar to 64-bit, but
they take no arguments. Instead, they implicitly use the value of EBP
passed in by the caller as a pointer to the parent's frame. In LLVM, we
can represent this as llvm.frameaddress(1), and feed that into all of
our calls to llvm.framerecover.

The next steps are:
- Add an alloca to the fs:00 linked list of handlers
- Add something like llvm.sjlj.lsda or generalize it to store in the
  alloca
- Move state number calculation to WinEHPrepare, arrange for
  FunctionLoweringInfo to call it
- Use the state numbers to insert explicit loads and stores in the IR

llvm-svn: 236172
2015-04-29 22:49:54 +00:00
Sanjay Patel
04b0e92766 generalize binop reassociation; NFC
Move the fold introduced in r236031:
http://reviews.llvm.org/rL236031

to its own helper function, so we can use it for other binops.

This is a preliminary step before partially solving:
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116

llvm-svn: 236171
2015-04-29 22:30:02 +00:00
Pat Gavlin
022c5acad8 Run StatepointLowering.{cpp,h} through clang-format.
llvm-svn: 236166
2015-04-29 21:52:45 +00:00
David Blaikie
f64246be72 [opaque pointer type] Pass GlobalAlias the actual pointer type rather than decomposing it into pointee type + address space
Many of the callers already have the pointer type anyway, and for the
couple of callers that don't it's pretty easy to call PointerType::get
on the pointee type and address space.

This avoids LLParser from using PointerType::getElementType when parsing
GlobalAliases from IR.

llvm-svn: 236160
2015-04-29 21:22:39 +00:00
Sanjay Patel
caf5180ff7 tidy up; NFC
llvm-svn: 236156
2015-04-29 21:01:41 +00:00
Sanjay Patel
ee6678119d too much space again; NFC
llvm-svn: 236150
2015-04-29 20:38:02 +00:00
Sanjay Patel
435efaadff too much space; NFC
llvm-svn: 236147
2015-04-29 20:32:57 +00:00
Andrew Kaylor
a33f159056 [WinEH] Fix minor bug in begincatch block splitting
llvm-svn: 236129
2015-04-29 17:21:26 +00:00
Duncan P. N. Exon Smith
a9308c49ef IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Jan Vesely
7539548738 CodeGen: Default overflow operations to expand so we don't have to assume targets are lying
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: ab
Differential Revision: http://reviews.llvm.org/D9265

llvm-svn: 236119
2015-04-29 16:30:46 +00:00
Elena Demikhovsky
ac969012ef Fixed masked gather/scatter switch-case
llvm-svn: 236092
2015-04-29 08:38:53 +00:00
Elena Demikhovsky
744fe0de33 fixed comments, blanks, nullptr; NFC
llvm-svn: 236086
2015-04-29 06:49:50 +00:00
Matthias Braun
5295793bca RegisterCoalescer: hide terminal rule option by default
llvm-svn: 236062
2015-04-28 23:55:11 +00:00
Andrew Kaylor
91307434f4 Style updates
llvm-svn: 236048
2015-04-28 22:01:51 +00:00
Andrew Kaylor
046f7b42f2 [WinEH] Split blocks at calls to llvm.eh.begincatch
Differential Revision: http://reviews.llvm.org/D9311

llvm-svn: 236046
2015-04-28 21:54:14 +00:00
Sanjay Patel
2fbc4e5c49 transform fadd chains to increase parallelism
This is a compromise: with this simple patch, we should always handle a chain of exactly 3
operations optimally, but we're not generating the optimal balanced binary tree for a longer
sequence.

In general, this transform will reduce the dependency chain for a sequence of instructions
using N operands from a worst case N-1 dependent operations to N/2 dependent operations. 
The optimal balanced binary tree would reduce the chain to log2(N).

The trade-off for not dealing with longer sequences is: (1) we have less complexity in the
compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we
don't need to worry about the increased register pressure required to parallelize longer
sequences. It also seems unlikely that we would ever encounter really long strings of
dependent ops like that in the wild, but I'm not sure how to verify that speculation.
FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math
and this patch.

We can extend this patch to cover other associative operations such as fmul, fmax, fmin, 
integer add, integer mul.

This is a partial fix for:
https://llvm.org/bugs/show_bug.cgi?id=17305

and if extended:
https://llvm.org/bugs/show_bug.cgi?id=21768
https://llvm.org/bugs/show_bug.cgi?id=23116

The issue also came up in:
http://reviews.llvm.org/D8941

Differential Revision: http://reviews.llvm.org/D9232

llvm-svn: 236031
2015-04-28 21:03:22 +00:00
Sanjay Patel
ba55804ea3 move IR-level optimization flags into their own struct
This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900).

In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment, 
we should eventually share this data between the IR passes and the backend.

We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to
use the new struct.

The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a
data member to the subclass. This means we don't have to repeat all of the get/set methods per flag,
but we're potentially adding size to all nodes of this subclassi type.

In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an
SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at
least one free byte to play with due to struct alignment.

Differential Revision: http://reviews.llvm.org/D9325

llvm-svn: 235997
2015-04-28 16:39:12 +00:00
Sergey Dmitrouk
842a51bad8 Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"
[DebugInfo] Add debug locations to constant SD nodes

This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235989
2015-04-28 14:05:47 +00:00
Daniel Jasper
48e93f7181 Revert "[DebugInfo] Add debug locations to constant SD nodes"
This breaks a test:
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870

llvm-svn: 235987
2015-04-28 13:38:35 +00:00
Sergey Dmitrouk
adb4c69d5c [DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).

Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.

Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.

This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.

Differential Revision: http://reviews.llvm.org/D9084

llvm-svn: 235977
2015-04-28 11:56:37 +00:00
Elena Demikhovsky
584ce378ab Masked gather and scatter: Added code for SelectionDAG.
All other patches, including tests will follow.

http://reviews.llvm.org/D7665

llvm-svn: 235970
2015-04-28 07:57:37 +00:00
Hans Wennborg
7bf4d4eee0 Switch lowering: use uint32_t for weights everywhere
I previously thought switch clusters would need to use uint64_t in case
the weights of multiple cases overflowed a 32-bit int. It turns
out that the weights on a terminator instruction are capped to allow for
being added together, so using a uint32_t should be safe.

llvm-svn: 235945
2015-04-27 23:52:19 +00:00