Commit Graph

8509 Commits

Author SHA1 Message Date
Sanjay Patel
1c8c6a457d [InstCombine] consolidate rem tests and update checks; NFC
llvm-svn: 297747
2017-03-14 16:27:46 +00:00
Sanjay Patel
9deec85c34 [InstCombine] regenerate checks; NFC
llvm-svn: 297746
2017-03-14 16:16:40 +00:00
Oliver Stannard
062041113f [ValueTracking] Out of range shifts might be undef
If it is possible for the RHS of a shift operation to be greater than or equal
to the bit-width, then the result might be undef, and we can't report any known
bits.

In some cases, this was allowing a transformation in instcombine which widened
an undef value from i1 to i32, increasing the range of values that a function
could return.

Differential revision: https://reviews.llvm.org/D30781

llvm-svn: 297724
2017-03-14 10:13:17 +00:00
Jonas Paulsson
a48ea231c0 [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.

Tests updates:

Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll

The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.

Review: Hal Finkel
https://reviews.llvm.org/D29540

llvm-svn: 297705
2017-03-14 06:35:36 +00:00
Daniel Berlin
620f86ff2b Add missing condprop-xfail.ll that contains the remaining xfail'd tests
llvm-svn: 297699
2017-03-14 01:46:51 +00:00
Daniel Berlin
2aa23e8881 NewGVN: We pass rle-nonlocal, we just perform the replacement in a way that keeps the old name instead of the new one
llvm-svn: 297683
2017-03-13 22:43:30 +00:00
Sanjay Patel
caf369bd03 [SimplifyCFG] move tests for PR31028 from CGP
Hopefully, this will make sense with a forthcoming patch. If not, we can move these back.

llvm-svn: 297660
2017-03-13 19:59:14 +00:00
Matt Arsenault
d81f557fe2 AMDGPU: Fold icmp/fcmp into icmp intrinsic
The typical use is a library vote function which
compares to 0. Fold the user condition into the intrinsic.

llvm-svn: 297650
2017-03-13 18:14:02 +00:00
Sanjay Patel
6023a2501c [CGP] add tests for PR31028; NFC
llvm-svn: 297629
2017-03-13 15:45:37 +00:00
Sanjoy Das
3f1e8e0102 Use a WeakVH for UnknownInstructions in AliasSetTracker
Summary:
This change solves the same problem as D30726, except that this only
throws out the bathwater.

AST was not correctly tracking and deleting UnknownInstructions via
handles.  The existing code only tracks "pointers" in its
`ASTCallbackVH`, so an UnknownInstruction (that isn't also def'ing a
pointer used by another memory instruction) never gets a
`ASTCallbackVH`.

There are two other ways to solve this problem:

 - Use the `PointerRec` scheme for both known and unknown instructions.
 - Use a `CallbackVH` that erases the offending Instruction from the
   UnknownInstruction list.

Both of the above changes seemed to be significantly (and unnecessarily
IMO) more complex than this.

Reviewers: chandlerc, dberlin, hfinkel, reames

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D30849

llvm-svn: 297539
2017-03-11 01:15:48 +00:00
Peter Collingbourne
14dcf02fcb WholeProgramDevirt: Implement export/import support for VCP.
Differential Revision: https://reviews.llvm.org/D30017

llvm-svn: 297503
2017-03-10 20:13:58 +00:00
Peter Collingbourne
59675ba0f8 WholeProgramDevirt: Implement export/import support for unique ret val opt.
Differential Revision: https://reviews.llvm.org/D29917

llvm-svn: 297502
2017-03-10 20:09:11 +00:00
Michael Kuperstein
5fb39a7966 [SLP] Revert everything that has to do with memory access sorting.
This reverts r293386, r294027, r294029 and r296411.

Turns out the SLP tree isn't actually a "tree" and we don't handle
accessing the same packet of loads in several different orders well,
causing miscompiles.

Revert until we can fix this properly.

llvm-svn: 297493
2017-03-10 18:59:07 +00:00
Matt Arsenault
a3bdd8f27b AMDGPU: Fix insertion point when reducing load intrinsics
The insertion point may be later than the next instruction,
so it is necessary to set it when replacing the call.

llvm-svn: 297439
2017-03-10 05:25:49 +00:00
Daniel Berlin
e3e69e1680 NewGVN: Rewrite DCE during elimination so we do it as well as old GVN did.
llvm-svn: 297428
2017-03-10 00:32:33 +00:00
Sanjay Patel
962a8431ea [InstSimplify] allow folds for bool vector div/rem
llvm-svn: 297411
2017-03-09 21:56:03 +00:00
Sanjay Patel
7e56366204 [ConstantFold] vector div/rem with any zero element in divisor is undef
Follow-up for:
https://reviews.llvm.org/D30665
https://reviews.llvm.org/rL297390

llvm-svn: 297409
2017-03-09 20:42:30 +00:00
Matt Arsenault
efe949cc67 AMDGPU: Support for SimplifyDemandedVectorElts for load intrinsics
llvm-svn: 297408
2017-03-09 20:34:27 +00:00
Sanjay Patel
bb47616aef [InstSimplify] add tests for vector constant folding div/rem-by-0; NFC
llvm-svn: 297407
2017-03-09 20:31:20 +00:00
Sanjay Patel
2b1f6f4b92 [InstSimplify] vector div/rem with any zero element in divisor is undef
This was suggested as a DAG simplification in the review for rL297026 :
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435253.html
...but let's start with IR since we have actual docs for IR (LangRef).

Differential Revision:
https://reviews.llvm.org/D30665

llvm-svn: 297390
2017-03-09 16:20:52 +00:00
Chandler Carruth
20e588e1af [PM/Inliner] Make the new PM's inliner process call edges across an
entire SCC before iterating on newly-introduced call edges resulting
from any inlined function bodies.

This more closely matches the behavior of the old PM's inliner. While it
wasn't really clear to me initially, this behavior is actually essential
to the inliner behaving reasonably in its current design.

Because the inliner is fundamentally a bottom-up inliner and all of its
cost modeling is designed around that it often runs into trouble within
an SCC where we don't have any meaningful bottom-up ordering to use. In
addition to potentially cyclic, infinite inlining that we block with the
inline history mechanism, it can also take seemingly simple call graph
patterns within an SCC and turn them into *insanely* large functions by
accidentally working top-down across the SCC without any of the
threshold limitations that traditional top-down inliners use.

Consider this diabolical monster.cpp file that Richard Smith came up
with to help demonstrate this issue:
```
template <int N> extern const char *str;

void g(const char *);

template <bool K, int N> void f(bool *B, bool *E) {
  if (K)
    g(str<N>);
  if (B == E)
    return;
  if (*B)
    f<true, N + 1>(B + 1, E);
  else
    f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }

extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```

When compiled with '-DMAX=N' for various values of N, this will create an SCC
with a reasonably large number of functions. Previously, the inliner would try
to exhaust the inlining candidates in a single function before moving on. This,
unfortunately, turns it into a top-down inliner within the SCC. Because our
thresholds were never built for that, we will incrementally decide that it is
always worth inlining and proceed to flatten the entire SCC into that one
function.

What's worse, we'll then proceed to the next function, and do the exact same
thing except we'll skip the first function, and so on. And at each step, we'll
also make some of the constant factors larger, which is awesome.

The fix in this patch is the obvious one which makes the new PM's inliner use
the same technique used by the old PM: consider all the call edges across the
entire SCC before beginning to process call edges introduced by inlining. The
result of this is essentially to distribute the inlining across the SCC so that
every function incrementally grows toward the inline thresholds rather than
allowing the inliner to grow one of the functions vastly beyond the threshold.
The code for this is a bit awkward, but it works out OK.

We could consider in the future doing something more powerful here such as
prioritized order (via lowest cost and/or profile info) and/or a code-growth
budget per SCC. However, both of those would require really substantial work
both to design the system in a way that wouldn't break really useful
abstraction decomposition properties of the current inliner and to be tuned
across a reasonably diverse set of code and workloads. It also seems really
risky in many ways. I have only found a single real-world file that triggers
the bad behavior here and it is generated code that has a pretty pathological
pattern. I'm not worried about the inliner not doing an *awesome* job here as
long as it does *ok*. On the other hand, the cases that will be tricky to get
right in a prioritized scheme with a budget will be more common and idiomatic
for at least some frontends (C++ and Rust at least). So while these approaches
are still really interesting, I'm not in a huge rush to go after them. Staying
even closer to the existing PM's behavior, especially when this easy to do,
seems like the right short to medium term approach.

I don't really have a test case that makes sense yet... I'll try to find a
variant of the IR produced by the monster template metaprogram that is both
small enough to be sane and large enough to clearly show when we get this wrong
in the future. But I'm not confident this exists. And the behavior change here
*should* be unobservable without snooping on debug logging. So there isn't
really much to test.

The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there
   really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that
   we consider for inlining.

llvm-svn: 297374
2017-03-09 11:35:40 +00:00
Peter Collingbourne
0152c8156b WholeProgramDevirt: Implement importing for uniform ret val opt.
Differential Revision: https://reviews.llvm.org/D29854

llvm-svn: 297350
2017-03-09 01:11:15 +00:00
Peter Collingbourne
6d284fab20 WholeProgramDevirt: Implement importing for single-impl devirtualization.
Differential Revision: https://reviews.llvm.org/D29844

llvm-svn: 297333
2017-03-09 00:21:25 +00:00
Changpeng Fang
1be9b9f816 AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized.
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D30719

llvm-svn: 297328
2017-03-09 00:07:00 +00:00
Evgeniy Stepanov
8537d9994d Don't merge global constants with non-dbg metadata.
!type metadata can not be dropped. An alternative to this is adding
!type metadata from the replaced globals to the replacement, but that
may weaken type tests and make them slower at the same time.

The merged global gets !dbg metadata from replaced globals, and can
end up with multiple debug locations.

llvm-svn: 297327
2017-03-09 00:03:37 +00:00
Matthew Simpson
3388de1349 [LV] Select legal insert point when fixing first-order recurrences
Because IRBuilder performs constant-folding, it's not guaranteed that an
instruction in the original loop map to an instruction in the vector loop. It
could map to a constant vector instead. The handling of first-order recurrences
was incorrectly making this assumption when setting the IRBuilder's insert
point.

llvm-svn: 297302
2017-03-08 18:18:20 +00:00
Matthew Simpson
8966848d17 [LV] Make the test case for PR30183 less fragile
This patch also renames the PR number the test points to. The previous
reference was PR29559, but that bug was somehow deleted and recreated under
PR30183.

llvm-svn: 297295
2017-03-08 17:03:38 +00:00
Matthew Simpson
903dd5aa9b [LV] Add missing check labels to tests and reformat
llvm-svn: 297294
2017-03-08 16:55:34 +00:00
Jun Bum Lim
ac170872b2 [JumpThread] Use AA in SimplifyPartiallyRedundantLoad()
Summary: Use AA when scanning to find an available load value.

Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin

Reviewed By: rengolin, dberlin

Subscribers: aemerson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30352

llvm-svn: 297284
2017-03-08 15:22:30 +00:00
Sanjay Patel
62906af379 [InstCombine] avoid crashing on shuffle shrinkage when input type is not same as result type
llvm-svn: 297280
2017-03-08 15:02:23 +00:00
Sam Parker
0f4db38c20 [LoopRotate] Propagate dbg.value intrinsics
Recommitting patch which was previously reverted in r297159. These
changes should address the casting issues.

The original patch enables dbg.value intrinsics to be attached to
newly inserted PHI nodes.

Differential Review: https://reviews.llvm.org/D30701

llvm-svn: 297269
2017-03-08 09:56:22 +00:00
Sebastian Pop
4a4d245b19 Handle UnreachableInst in isGuaranteedToTransferExecutionToSuccessor
A block with an UnreachableInst does not transfer execution to a successor.
The problem was exposed by GVN-hoist. This patch fixes bug 32153.

Patch by Aditya Kumar.

Differential Revision: https://reviews.llvm.org/D30667

llvm-svn: 297254
2017-03-08 01:54:50 +00:00
Sanjay Patel
fe9705149b [InstCombine] shrink truncated insertelement into undef vector
This is the 2nd part of solving:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html

D30123 moves the trunc ahead of the shuffle, and this moves the trunc ahead of the insertelement. 
We're limiting this transform to undef rather than any constant to avoid backend problems.

Differential Revision: https://reviews.llvm.org/D30137

llvm-svn: 297242
2017-03-07 23:27:14 +00:00
Evgeniy Stepanov
7a5cfa9a11 Fix one-after-the-end type metadata handling in globalsplit.
Itanium ABI may have an address point one byte after the end of a
vtable. When such vtable global is split, the !type metadata needs to
follow the right vtable.

Differential Revision: https://reviews.llvm.org/D30716

llvm-svn: 297236
2017-03-07 22:18:48 +00:00
Sanjay Patel
53fa17a014 [InstCombine] shrink truncated splat shuffle (2nd try)
This was committed at r297155 and reverted at r297166 because of an
over-reaching clang test. That should be fixed with r297189.

This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html

This keeps with our general approach: changing arbitrary shuffles is off-limts,
but changing splat is ok. The transform is very similar to the existing
shrinkBitwiseLogic() canonicalization.

Differential Revision: https://reviews.llvm.org/D30123

llvm-svn: 297232
2017-03-07 21:45:16 +00:00
Gor Nishanov
c52006ab09 [coroutines] Add handling for unwind coro.ends
Summary:
The purpose of coro.end intrinsic is to allow frontends to mark the cleanup and
other code that is only relevant during the initial invocation of the coroutine
and should not be present in resume and destroy parts.

In landing pads coro.end is replaced with an appropriate instruction to unwind to
caller. The handling of coro.end differs depending on whether the target is
using landingpad or WinEH exception model.

For landingpad based exception model, it is expected that frontend uses the
`coro.end`_ intrinsic as follows:

```
    ehcleanup:
      %InResumePart = call i1 @llvm.coro.end(i8* null, i1 true)
      br i1 %InResumePart, label %eh.resume, label %cleanup.cont

    cleanup.cont:
      ; rest of the cleanup

    eh.resume:
      %exn = load i8*, i8** %exn.slot, align 8
      %sel = load i32, i32* %ehselector.slot, align 4
      %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
      %lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
      resume { i8*, i32 } %lpad.val29

```
The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions,
thus leading to immediate unwind to the caller, whereas in start function it
is replaced with ``False``, thus allowing to proceed to the rest of the cleanup
code that is only needed during initial invocation of the coroutine.

For Windows Exception handling model, a frontend should attach a funclet bundle
referring to an enclosing cleanuppad as follows:

```
    ehcleanup:
      %tok = cleanuppad within none []
      %unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ]
      cleanupret from %tok unwind label %RestOfTheCleanup
```

The `CoroSplit` pass, if the funclet bundle is present, will insert
``cleanupret from %tok unwind to caller`` before
the `coro.end`_ intrinsic and will remove the rest of the block.

Reviewers: majnemer

Reviewed By: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D25543

llvm-svn: 297223
2017-03-07 21:00:54 +00:00
Matthew Simpson
c86b2134c7 [LV] Consider users that are memory accesses in uniforms expansion step
When expanding the set of uniform instructions beyond the seed instructions
(e.g., consecutive pointers), we mark a new instruction uniform if all its
loop-varying users are uniform. We should also allow users that are consecutive
or interleaved memory accesses. This fixes cases where we have an instruction
that is used as the pointer operand of a consecutive access but also used by a
non-memory instruction that later becomes uniform as part of the expansion.

llvm-svn: 297179
2017-03-07 18:47:30 +00:00
Sanjay Patel
6d30606168 revert r297155 because there's a clang test that depends on InstCombine:
tools/clang/test/CodeGen/zvector.c

llvm-svn: 297166
2017-03-07 17:41:45 +00:00
Adrian Prantl
d4056501fb Revert "Strip debug info when inlining into a nodebug function."
This reverts commit r296488.

As noted by David Blaikie on llvm-commits, I overlooked the case of a
debug function being inlined into a nodebug function being inlined
into a debug function.

llvm-svn: 297163
2017-03-07 17:28:57 +00:00
Nico Weber
3b2f0094d7 Revert r297132, it caused PR32171
llvm-svn: 297159
2017-03-07 17:23:52 +00:00
Sanjay Patel
defdb7bed5 [InstCombine] shrink truncated splat shuffle
This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html

This keeps with our general approach: changing arbitrary shuffles is off-limts, 
but changing splat is ok. The transform is very similar to the existing 
shrinkBitwiseLogic() canonicalization.

Differential Revision: https://reviews.llvm.org/D30123

llvm-svn: 297155
2017-03-07 16:10:36 +00:00
Sam Parker
6ec5fdbc94 [LoopRotate] Update dbg.value intrinsics
Propagate debug info through the newly inserted PHI nodes.

Differential Revision: https://reviews.llvm.org/D30190

llvm-svn: 297132
2017-03-07 09:34:25 +00:00
Sanjoy Das
30c3538e2e [LoopUnrolling] Fix loop size check for peeling
Summary:
We should check if loop size allows us to peel at least one iteration
before we do so.

Patch by Max Kazantsev!

Reviewers: sanjoy, mkuper, efriedma

Reviewed By: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30632

llvm-svn: 297122
2017-03-07 06:03:15 +00:00
Michael Kuperstein
768d013a03 [SLP] Revert r296863 due to miscompiles.
Details and reproducer are on the email thread for r296863.

llvm-svn: 297103
2017-03-06 23:54:51 +00:00
Daniel Berlin
961b002714 NewGVN: We were not really failing this testcase, because the instructions it was looking for are unused. GVN value numbers unused instructions, NewGVN does not. Fix the instructions to be used, so we eliminate the redundancies it's checking for, and un-XFAIL it
llvm-svn: 297058
2017-03-06 20:01:31 +00:00
Sanjay Patel
3bbee79d9e [InstSimplify] add tests for vector div/rem with UB potential; NFC
llvm-svn: 297048
2017-03-06 18:45:39 +00:00
Sanjay Patel
c494239bd8 [InstSimplify] regenerate checks; NFC
llvm-svn: 297040
2017-03-06 18:13:01 +00:00
Dehao Chen
c632a393b7 Remove the sample pgo annotation heuristic that uses call count to annotate basic block count.
Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks.

Reviewers: davidxl, aprantl

Reviewed By: aprantl

Subscribers: llvm-commits, aprantl

Differential Revision: https://reviews.llvm.org/D30658

llvm-svn: 297038
2017-03-06 17:49:59 +00:00
Alexey Bataev
d7db344c17 [SLP] A test for vectorization of users of extractelement instructions,
NFC.

llvm-svn: 297024
2017-03-06 16:26:00 +00:00
Evgeny Stupachenko
e4b0813d62 Add test missed in r296770.
Differential Revision: http://reviews.llvm.org/D27004

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 296962
2017-03-04 05:20:02 +00:00