19534 Commits

Author SHA1 Message Date
Yaxun Liu
3c42f1c3c9 LoopUnroll: respect pragma unroll when AllowRemainder is disabled
Currently when AllowRemainder is disabled, pragma unroll count is not
respected even though there is no remainder. This bug causes a loop
to be fully unrolled in many cases even though the user specifies an unroll
count. It especially affects OpenCL/CUDA since in many cases a loop
contains convergent instructions and currently AllowRemainder is
disabled for such loops.

Differential Revision: https://reviews.llvm.org/D43826

llvm-svn: 326585
2018-03-02 16:22:32 +00:00
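As a hedged illustration of the situation this commit describes, the sketch below shows a loop whose trip count (16) is a multiple of the requested unroll count (4), so unrolling by 4 leaves no remainder iterations. `#pragma unroll 4` is the Clang spelling; the function name `sumSixteen` and both constants are invented for this example, and the code cannot show whether a particular compiler honors the pragma.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kTripCount = 16;   // hypothetical trip count
constexpr std::size_t kUnrollCount = 4;  // kept in sync with the pragma below

int sumSixteen(const std::array<int, kTripCount>& data) {
  // 16 is a multiple of 4, so an unroll by 4 needs no remainder loop.
  assert(kTripCount % kUnrollCount == 0);
  int sum = 0;
#if defined(__clang__)
#pragma unroll 4
#endif
  for (std::size_t i = 0; i < kTripCount; ++i)
    sum += data[i];
  return sum;
}
```

The result is the same with or without unrolling; the pragma only expresses how many copies of the loop body the user would like per iteration.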
Florian Hahn
515acd64fd [LV][CFG] Add irreducible CFG detection for outer loops
This patch adds support for detecting outer loops with irreducible control
flow in LV. Current detection uses SCCs and only works for innermost loops.
This patch adds a utility function that works on any CFG, given its RPO
traversal and its LoopInfoBase. This function is a generalization
of isIrreducibleCFG  from lib/CodeGen/ShrinkWrap.cpp. The code in
lib/CodeGen/ShrinkWrap.cpp is also updated to use the new generic utility
function.

Patch by Diego Caballero <diego.caballero@intel.com>

Differential Revision: https://reviews.llvm.org/D40874

llvm-svn: 326568
2018-03-02 12:24:25 +00:00
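The idea of checking reducibility from an RPO traversal plus loop information can be sketched roughly as follows. This is a simplified illustration, not the utility added by the patch: `LoopMap` is a hypothetical stand-in for LoopInfoBase, blocks are plain integer ids, and `hasIrreducibleEdge` is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for loop info: maps each loop header to the set of
// blocks that belong to that loop (the header included).
using LoopMap = std::map<int, std::set<int>>;

// rpo lists block ids in reverse post-order; succs[b] lists successors of b.
bool hasIrreducibleEdge(const std::vector<int>& rpo,
                        const std::vector<std::vector<int>>& succs,
                        const LoopMap& loops) {
  assert(rpo.size() == succs.size());
  std::vector<std::size_t> rpoNumber(rpo.size());
  for (std::size_t i = 0; i < rpo.size(); ++i)
    rpoNumber[rpo[i]] = i;

  for (int block : rpo) {
    for (int succ : succs[block]) {
      // A forward edge targets a block that appears later in RPO.
      if (rpoNumber[succ] > rpoNumber[block])
        continue;
      // Otherwise the edge goes backwards. It is acceptable only if succ is
      // the header of a known loop that contains block.
      auto it = loops.find(succ);
      if (it == loops.end() || it->second.count(block) == 0)
        return true;
    }
  }
  return false;
}
```

For a simple loop `0 -> 1 -> 2 -> 1` with header 1, the backward edge `2 -> 1` is a proper back edge and the function returns false. For a two-entry cycle such as `0 -> 1`, `0 -> 2`, `1 -> 2`, `2 -> 1`, no natural loop covers the backward edge, so it returns true.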
Fedor Indutny
1571b1271e [ArgumentPromotion] don't break musttail invariant PR36543
Summary:
Do not break musttail invariant by promoting arguments of musttail
callee or caller.

Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv, fhahn, rnk

Reviewed By: rnk

Subscribers: rnk, llvm-commits

Differential Revision: https://reviews.llvm.org/D43926

llvm-svn: 326521
2018-03-02 00:59:27 +00:00
Sanjay Patel
d0cdb2f861 [InstCombine] allow fmul fold with less than 'fast'
This is a retry of r326502 with updates to the reassociate 
test file that I missed the first time.

@test15_reassoc in the supposed -reassociate test file 
(except that it tests 2 other passes too...) shows that
there's no clear responsibility for reassociation transforms.

Instcombine now gets that case, but only because the
constant values are identical. Otherwise, it would still
miss that pattern. 

Reassociate doesn't get that case because it hasn't been 
updated to use less than 'fast' FMF.

llvm-svn: 326513
2018-03-02 00:14:51 +00:00
Sanjay Patel
eb5d046890 revert r326502: [InstCombine] allow fmul fold with less than 'fast'
I forgot that I added tests for 'reassoc' to -reassociate, but
surprisingly that file calls -instcombine too, so it is affected.
I'll update that file and try again.

llvm-svn: 326510
2018-03-01 23:39:24 +00:00
Sanjay Patel
7373ae5c9a [InstCombine] allow fmul fold with less than 'fast'
llvm-svn: 326502
2018-03-01 22:53:47 +00:00
Craig Topper
2915bc0046 [SimplifyLibCalls] Update an obviously copy and pasted header comment to match this file. NFC
llvm-svn: 326475
2018-03-01 20:05:09 +00:00
Sanjay Patel
f3b1af7aa4 [InstCombine] simplify code for (X*Y) * X => (X*X) * Y ; NFCI
llvm-svn: 326444
2018-03-01 15:50:26 +00:00
Benjamin Kramer
d1cf7ff5ab [SCCP] Fix unused variable warning in release builds.
llvm-svn: 326429
2018-03-01 11:31:44 +00:00
Reid Kleckner
3762a089d7 [IPSCCP] do not break musttail invariant (PR36485)
Do not replace results of `musttail` calls with a constant if the
call itself can't be removed.

Do not zap returns of `musttail` callees, if the call site can't be
removed and replaced with a constant.

Do not zap returns of `musttail`-calling blocks, this breaks
invariant too.

Patch by Fedor Indutny

Differential Revision: https://reviews.llvm.org/D43695

llvm-svn: 326404
2018-03-01 01:19:18 +00:00
Reid Kleckner
cb9611ca67 [DAE] don't remove args of musttail target/caller
`musttail` requires identical signatures of caller and callee. Removing
arguments breaks `musttail` semantics.

PR36441

Patch by Fedor Indutny

Differential Revision: https://reviews.llvm.org/D43708

llvm-svn: 326394
2018-03-01 00:09:35 +00:00
Sanjay Patel
eaf5a120ed [InstCombine] simplify code for X * -1.0 --> -X; NFC
I've added random FMF to one of the tests to show those are propagated.

llvm-svn: 326377
2018-02-28 22:30:04 +00:00
Jonas Devlieghere
9ca064552a [GlobalOpt] don't change CC of musttail calle(e|r)
When the function has a musttail call, its cc is fixed to be equal to the
cc of the musttail callee. In such a case (and in the case of the musttail
callee), GlobalOpt should not change the cc to fastcc as it will break
the invariant.

This fixes PR36546

Patch by: Fedor Indutny (indutny)

Differential revision: https://reviews.llvm.org/D43859

llvm-svn: 326376
2018-02-28 22:28:44 +00:00
Craig Topper
b95298b041 [InstCombine] Split the FP constant code out of lookThroughFPExtensions and use nullptr as a sentinel
Currently this code's control flow very much assumes that there are no meaningful checks after determining that it's a ConstantFP. So whenever it wants to stop it just does "return V". But V is also the variable name it uses when it wants to return a new value. So 'return V' appears multiple times with different meanings.

This patch just moves all the code into a helper function and returns nullptr when it wants to stop.

I've split this from D43774 while I try to figure out how to best handle the vector case there. But this change by itself at least seemed like a readability improvement.

Differential Revision: https://reviews.llvm.org/D43833

llvm-svn: 326361
2018-02-28 20:14:34 +00:00
Vedant Kumar
9a041a7522 [InstrProfiling] Emit the runtime hook when no counters are lowered
The API verification tool tapi has difficulty processing frameworks
which enable code coverage, but which have no code. The profile lowering
pass does not emit the runtime hook in this case because no counters are
lowered.

While the hook is not needed for program correctness (the profile
runtime doesn't have to be linked in), it's needed to allow tapi to
validate the exported symbol set of instrumented binaries.

It was not possible to add a workaround in tapi for empty binaries due
to an architectural issue: tapi generates its expected symbol set before
it inspects a binary. Changing that model has a higher cost than simply
forcing llvm to always emit the runtime hook.

rdar://36076904

Differential Revision: https://reviews.llvm.org/D43794

llvm-svn: 326350
2018-02-28 19:00:08 +00:00
Sanjay Patel
b3f4f62698 [InstCombine] move invariant call out of loop; NFC
We really shouldn't need a 2-loop here at all, but that's another cleanup.

llvm-svn: 326330
2018-02-28 16:50:51 +00:00
Sanjay Patel
8fdd87f929 [InstCombine] move constant check into foldBinOpIntoSelectOrPhi; NFCI
Also, rename 'foldOpWithConstantIntoOperand' because that's annoyingly 
vague. The constant check is redundant in some cases, but it allows 
removing duplication for most of the calls.

llvm-svn: 326329
2018-02-28 16:36:24 +00:00
Xin Tong
256869d8bc Fix typo. NFC
llvm-svn: 326319
2018-02-28 12:09:53 +00:00
Xin Tong
8ba674e43b [MergeICmp] Fix a bug in MergeICmp that can lead to a block being processed more than once.
Summary:
Fix a bug in MergeICmp that can lead to a BCECmp block being processed more than once and eventually to a broken LLVM module.
The problem is that if the non-constant value is not produced by the last block, the producer will be processed once when its parent block
is processed and a second time when the last block is processed.

We end up with two identical BCECmpBlocks in the merge queue, which eventually leads to a broken LLVM module.

Reviewers: courbet, davide

Reviewed By: courbet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43825

llvm-svn: 326318
2018-02-28 12:08:00 +00:00
David Green
7c35de124a [Dominators] Remove verifyDomTree and add some verifying for Post Dom Trees
Removes verifyDomTree, using assert(verify()) everywhere instead, and
changes verify a little to always run IsSameAsFreshTree first in order
to print good output when we find errors. Also adds verifyAnalysis for
PostDomTrees, which will allow checking of PostDomTrees in the same way
we check DomTrees and MachineDomTrees.

Differential Revision: https://reviews.llvm.org/D41298

llvm-svn: 326315
2018-02-28 11:00:08 +00:00
Florian Hahn
1807c516c7 [NewGVN] Update phi-of-ops def block when updating existing ValuePHI.
In case we update a ValuePHI node created earlier, we could update it
based on a different OpPHI which could be in a different block.
We need to update the TempToBlock mapping reflecting the new block,
otherwise we would end up placing the new phi node in a wrong block.

This problem is exposed by the test case in
https://bugs.llvm.org/show_bug.cgi?id=36504.

This patch fixes a slightly simpler problem than in the bug report. In
the bug's reproducer, the additional problem is that we are re-using a
ValuePHI node with too few incoming values for the new OpPHI. If this
patch makes sense, I will follow it up with a patch that creates a new
PHI node if the existing PHI node has a different number of incoming
values.

Reviewers: davide, dberlin

Reviewed By: dberlin

Differential Revision: https://reviews.llvm.org/D43770

llvm-svn: 326181
2018-02-27 09:34:51 +00:00
Sanjay Patel
31a90468e1 [InstCombine] allow fdiv folds with less than fully 'fast' ops
Note: gcc appears to allow this fold with -freciprocal-math alone, 
but clang/llvm require more than that with this patch. The wording
in the definitions seems fuzzy enough that it could go either way,
but we'll err on the conservative side of FMF interpretation.

This patch also changes the newly created fmul to have FMF propagated
by the last fdiv rather than intersecting the FMF of the fdivs. This
matches the behavior of other folds near here. The new fmul is only 
used to produce an intermediate op for the final fdiv result, so it
shouldn't be any stricter than that result. The previous behavior
could result in dropping FMF via other folds in instcombine or CSE.

Differential Revision: https://reviews.llvm.org/D43398

llvm-svn: 326098
2018-02-26 16:02:45 +00:00
Renato Golin
9d1b2acaaa [LV] Move isLegalMasked* functions from Legality to CostModel
All SIMD architectures can emulate masked load/store/gather/scatter
through element-wise condition check, scalar load/store, and
insert/extract. Therefore, bailing out of vectorization as legality
failure, when they return false, is incorrect. We should proceed to cost
model and determine profitability.

This patch is to address the vectorizer's architectural limitation
described above. As such, I tried to keep the cost model and
vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning
should be done separately.

Please see
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for
RFC and the discussions.

Closes D43208.

Patch by: Hideki Saito <hideki.saito@intel.com>

llvm-svn: 326079
2018-02-26 11:06:36 +00:00
Florian Hahn
a1822cbabc [LoopInterchange] Loops with empty dependency matrix are safe.
The dependency matrix is only empty if no conflicting load/store
instructions have been found. In that case, it is safe to interchange.

For the LLVM test-suite, after this change around 1900 loops are
interchanged, whereas it was 15 before this change. On cortex-a57,
this gives an improvement of -0.57% on the geomean execution
time of SPEC2006, SPEC2000 and the test-suite. There are a
few small perf regressions, but I think we can improve on those
by making the cost model better.

Reviewers: karthikthecool, mcrosier

Reviewed by: karthikthecool

Differential Revision: https://reviews.llvm.org/D43236

llvm-svn: 326077
2018-02-26 10:45:25 +00:00
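To make the transformation concrete, here is a minimal hand-written sketch of a loop nest before and after interchange. The input and output are distinct arrays, so no load conflicts with any store; treating this as the kind of loop with an empty dependency matrix is an assumption of the example. The names `doubleColumnMajor`, `doubleRowMajor` and `checkInterchange` are made up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 4;
constexpr std::size_t kCols = 3;
using Matrix = std::array<std::array<int, kCols>, kRows>;

// Original order: the inner loop walks down a column, striding across rows.
Matrix doubleColumnMajor(const Matrix& in) {
  Matrix out{};
  for (std::size_t j = 0; j < kCols; ++j)
    for (std::size_t i = 0; i < kRows; ++i)
      out[i][j] = 2 * in[i][j];
  return out;
}

// Interchanged order: the inner loop walks along a row, touching
// consecutive elements in memory.
Matrix doubleRowMajor(const Matrix& in) {
  Matrix out{};
  for (std::size_t i = 0; i < kRows; ++i)
    for (std::size_t j = 0; j < kCols; ++j)
      out[i][j] = 2 * in[i][j];
  return out;
}

// Each element is written once from an independent input element, so both
// loop orders must produce the same matrix.
void checkInterchange(const Matrix& in) {
  assert(doubleColumnMajor(in) == doubleRowMajor(in));
}
```

Whether the interchanged form is actually faster depends on the cost model and the target, which is the part the commit message leaves for later tuning.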
Adam Nemet
e4e1de60aa Revert "StructurizeCFG: Test for branch divergence correctly"
This reverts commit r325881.

Breaks many bots

llvm-svn: 326037
2018-02-24 17:29:09 +00:00
Sanjay Patel
2db2769499 [InstCombine] simplify code for fabs(X) * fabs(X) -> X * X; NFC
llvm-svn: 325968
2018-02-23 22:38:10 +00:00
Sanjay Patel
db53d1847b [InstSimplify] sqrt(X) * sqrt(X) --> X
This was misplaced in InstCombine. We can loosen the FMF as a follow-up step.

llvm-svn: 325965
2018-02-23 22:20:13 +00:00
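A small sketch of why this simplification is tied to fast-math flags: in ordinary IEEE-754 double arithmetic the product `sqrt(x) * sqrt(x)` is only approximately `x`, because each square root is rounded. The function names below are invented for illustration, and the flags LLVM actually requires are not modeled.

```cpp
#include <cassert>
#include <cmath>

// The expression as written in the source program.
double sqrtSquaredBefore(double x) {
  assert(x >= 0.0);  // keep the example away from NaN results
  return std::sqrt(x) * std::sqrt(x);
}

// The simplified form the fold would produce.
double sqrtSquaredAfter(double x) {
  return x;
}
```

For a perfect square such as 4.0 both forms agree exactly. For 2.0 the unsimplified form is typically off by one rounding step, so the fold changes the result slightly and is only acceptable when the program has opted in to relaxed floating-point semantics.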
Sanjay Patel
d32104e1b2 [InstCombine] allow fmul-sqrt folds with less than full -ffast-math
Also, add a Builder method for intrinsics to reduce code duplication for clients.

llvm-svn: 325960
2018-02-23 21:16:12 +00:00
Matt Davis
523c656e25 [Debug] Add dbg.value intrinsics for PHIs created during LCSSA.
Summary:
This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA.
I noticed a case where a value carried across a loop was reported as <optimized out>.

Specifically this case:
```
int bar(int x, int y) {
  return x + y;
}

int foo(int size) {
  int val = 0;
  for (int i = 0; i < size; ++i) {
    val = bar(val, i);  // Both val and i are correct
  }
  return val; // <optimized out>
}
```

In the above case, after all of the interesting computation completes our value
is reported as "optimized out." This change will add a dbg.value to correct this.

This patch also moves the dbg.value insertion routine from LoopRotation.cpp 
into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA).

Reviewers: mzolotukhin, aprantl, vsk, davide

Reviewed By: aprantl, vsk

Subscribers: dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D42551

llvm-svn: 325926
2018-02-23 17:38:27 +00:00
Sanjay Patel
6b9c7a9c83 [InstCombine] refactor fmul with negated op folds; NFCI
The existing code was inefficiently looking for 'nsz' variants.
That's unnecessary because we canonicalize those to the expected
form with -0.0.

We may also want to adjust or remove the fold that sinks negation.
We don't do that for fdiv (or integer ops?). That should be uniform?
It may also lead to missed optimization as in PR21914:
https://bugs.llvm.org/show_bug.cgi?id=21914
...or we just have to fix other passes to avoid that problem.

llvm-svn: 325924
2018-02-23 17:14:28 +00:00
Sanjay Patel
4a9116e897 [InstCombine] use FMF-copying functions to reduce code; NFCI
llvm-svn: 325923
2018-02-23 17:07:29 +00:00
Nicolai Haehnle
43c1115cd4 StructurizeCFG: Test for branch divergence correctly
Summary:
This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform.

This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.

Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4

Reviewers: arsenm, rampitec, jlebar

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D40546

llvm-svn: 325881
2018-02-23 10:45:46 +00:00
Bjorn Steinbrink
983d6c3f18 Mark MergedLoadStoreMotion as not preserving MemDep results
Summary:
MemDep caches results that signify that a dependence is non-local, and
there is currently no way to invalidate such cache entries.
Unfortunately, when MLSM sinks a store, that can result in a non-local
dependence becoming a local one, and then MemDep gives wrong answers.
The easiest way out here is to just say that MLSM does indeed not
preserve MemDep results.

Reviewers: davide, Gerolf

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43177

llvm-svn: 325880
2018-02-23 10:41:57 +00:00
Eric Christopher
675dcf02a8 Update comment for whether or not we can optimize an alias - we're
checking the alias and not the aliasee. If the alias can be interposed
then we shouldn't do anything.

llvm-svn: 325837
2018-02-22 23:12:11 +00:00
Peter Collingbourne
32f5405bff Fix DataFlowSanitizer instrumentation pass to take parameter position changes into account for custom functions.
When DataFlowSanitizer transforms a call to a custom function, the
new call has extra parameters. The attributes on parameters must be
updated to take the new position of each parameter into account.

Patch by Sam Kerner!

Differential Revision: https://reviews.llvm.org/D43132

llvm-svn: 325820
2018-02-22 19:09:07 +00:00
Daniel Neilson
20c9207be3 [AlignmentFromAssumptions] Set source and dest alignments of memory intrinsics separately
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment() API of
MemoryIntrinsic in favour of getting/setting source & dest specific alignments through
the new API. This allows us to simplify some of the code in this pass and also be more
aggressive about setting the source and destination alignments separately.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

Reviewers: hfinkel, bollu, reames

Reviewed By: reames

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D43081

llvm-svn: 325816
2018-02-22 18:55:59 +00:00
Luke Cheeseman
6c1e6bbe0c [FunctionAttrs][ArgumentPromotion][GlobalOpt] Disable some optimisation passes for naked functions
- Fix for bug 36078.
- Prevent the functionattrs, function-attrs, globalopt and argpromotion passes
  from changing naked functions.
- These passes can perform some alterations to the functions that should not be
  applied. An example is removing parameters that are seemingly not used because
  they are only referenced in the inline assembly. Another example is marking
  the function as fastcc.

llvm-svn: 325788
2018-02-22 14:42:08 +00:00
Mircea Trofin
56950974d4 [SampleProf] NFC. Expose reusable functionality in SampleProfile.
Summary:
Exposing getOffset and findFunctionSamples as members of
SampleProfile. They are intimately tied to design choices of the
sample profile format - using offsets instead of line numbers, and
traversing inlined functions stack, respectively.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43605

llvm-svn: 325747
2018-02-22 06:42:57 +00:00
Vedant Kumar
1ceabcf080 [Utils] Avoid a hash table lookup in salvageDI, NFC
According to the current coverage report salvageDebugInfo() is called
5.12 million times during testing and almost always returns early.

The early return depends on LocalAsMetadata::getIfExists returning null,
which involves a DenseMap lookup in an LLVMContextImpl. We can probably
speed this up by simply checking the IsUsedByMD bit in Value.

llvm-svn: 325738
2018-02-22 01:29:41 +00:00
Sanjay Patel
5a6f904520 [InstCombine] add and use Create*FMF functions; NFC
llvm-svn: 325730
2018-02-21 22:18:55 +00:00
Evgeniy Stepanov
43271b1803 [hwasan] Fix inline instrumentation.
This patch changes hwasan inline instrumentation:

- Fixes address untagging for shadow address calculation (use 0xFF instead of 0x00 for the top byte).
- Emits brk instruction instead of hlt for the kernel and user space.
- Uses 0x900 instead of 0x100 for brk immediate (0x100 - 0x800 are unavailable in the kernel).
- Fixes and adds appropriate tests.

Patch by Andrey Konovalov.

Differential Revision: https://reviews.llvm.org/D43135

llvm-svn: 325711
2018-02-21 19:52:23 +00:00
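The top-byte untagging mentioned in this commit can be pictured with the sketch below. It assumes that the 0xFF variant is meant for kernel addresses, whose upper bits are all set, while 0x00 suits user-space addresses; the commit message does not spell this out. The helper names and sample addresses are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kTopByteMask = 0xFFULL << 56;

// User-space style: clear the tag so the top byte becomes 0x00.
constexpr std::uint64_t untagUser(std::uint64_t addr) {
  return addr & ~kTopByteMask;
}

// Kernel style (assumed): overwrite the tag so the top byte becomes 0xFF.
constexpr std::uint64_t untagKernel(std::uint64_t addr) {
  return addr | kTopByteMask;
}

// Made-up addresses carrying the tag 0x2A in the top byte.
void checkUntag() {
  assert(untagUser(0x2A007FDEADBEEF00ULL) == 0x00007FDEADBEEF00ULL);
  assert(untagKernel(0x2AFF800012345678ULL) == 0xFFFF800012345678ULL);
}
```

Clearing the top byte of a kernel address would yield a pointer outside the kernel's address range, which is one plausible reading of why the shadow address calculation needed a different untagging constant.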
Vedant Kumar
56492f9177 [BDCE] Salvage debug info from dying insts
This results in 15 additional unique source variables in a stage2 build
of FileCheck (at '-Os -g'), with a negligible increase in the size of
the .debug_loc section.

llvm-svn: 325660
2018-02-21 01:55:33 +00:00
Sanjay Patel
6f716a7c5e [InstCombine] C / -X --> -C / X
We already do this in DAGCombiner, but it should
also be good to eliminate the fsub use in IR.

This is similar to rL325648.

llvm-svn: 325649
2018-02-21 00:01:45 +00:00
Sanjay Patel
d8dd0151fc [InstCombine] -X / C --> X / -C for FP
We already do this in DAGCombiner, but it should 
also be good to eliminate the fsub use in IR.

llvm-svn: 325648
2018-02-20 23:51:16 +00:00
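Both negation folds (`-X / C --> X / -C` here and `C / -X --> -C / X` in the commit above) move a sign flip onto the constant operand. A minimal sketch, with invented function names, of why this is value-preserving for ordinary inputs: negation is exact in IEEE-754 arithmetic, and the sign of a quotient depends only on the signs of its operands.

```cpp
#include <cassert>
#include <cmath>

double constOverNegBefore(double c, double x) { return c / -x; }  // C / -X
double constOverNegAfter(double c, double x) { return -c / x; }   // -C / X

double negOverConstBefore(double x, double c) { return -x / c; }  // -X / C
double negOverConstAfter(double x, double c) { return x / -c; }   // X / -C

// Returns true when both folds give the same value for this pair.
// NaN inputs are excluded because NaN never compares equal to itself;
// pairs whose quotient is NaN (such as 0.0 and 0.0) report false for the
// same reason.
bool foldsAgree(double c, double x) {
  assert(!std::isnan(c) && !std::isnan(x));
  return constOverNegBefore(c, x) == constOverNegAfter(c, x) &&
         negOverConstBefore(x, c) == negOverConstAfter(x, c);
}
```

In the folded forms the negation applies to a constant, so it can be evaluated at compile time and the explicit negate instruction on `X` disappears.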
Sanjoy Das
737fa40ffa [DSE] Don't DSE stores that subsequent memmove calls read from
Summary:
We used to remove the first memmove in cases like this:

  memmove(p, p+2, 8);
  memmove(p, p+2, 8);

which is incorrect.  Fix this by changing isPossibleSelfRead to what was most
likely the intended behavior.

Historical note: the buggy code was added in https://reviews.llvm.org/rL120974
to address PR8728.

Reviewers: rsmith

Subscribers: mcrosier, llvm-commits, jlebar

Differential Revision: https://reviews.llvm.org/D43425

llvm-svn: 325641
2018-02-20 23:19:34 +00:00
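The pair of calls quoted in this commit message can be replayed on a small buffer to see why the first memmove is not dead. This is only an illustration of the memory semantics, not of the DSE code; `shiftTimes` and the 16-byte buffer are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstring>
#include <numeric>

using Buffer = std::array<unsigned char, 16>;

// Starts from bytes 0, 1, 2, ..., 15 and applies memmove(p, p + 2, 8)
// the requested number of times.
Buffer shiftTimes(int times) {
  assert(times >= 0);
  Buffer buf{};
  std::iota(buf.begin(), buf.end(), static_cast<unsigned char>(0));
  for (int i = 0; i < times; ++i)
    std::memmove(buf.data(), buf.data() + 2, 8);
  return buf;
}
```

After one call the buffer starts with 2, 3, 4, ...; after two calls it starts with 4, 5, 6, ... The second call reads bytes 2 through 9, and bytes 2 through 7 of that range were written by the first call, so deleting the first call changes the final contents.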
Sanjay Patel
7365b44b85 [InstCombine] remove unneeded operand swap: NFCI
FMul is commutative, so complexity-based canonicalization should always 
take care of the swap via SimplifyAssociativeOrCommutative(). 

llvm-svn: 325628
2018-02-20 21:52:46 +00:00
Sanjay Patel
29b98ae337 [InstCombine] remove unneeded dyn_cast to prevent unused variable warning
llvm-svn: 325597
2018-02-20 17:14:53 +00:00
Sanjay Patel
b2d978682b [InstCombine] remove compound fdiv pattern folds
These are fdiv-with-constant-divisor, so they already become
reciprocal multiplies. The last gap for vector ops should be
closed with rL325590.

It's possible that we're missing folds for some edge cases 
with denormal intermediate constants after deleting these,
but there are no tests for those patterns, and it would be 
better to handle denormals more consistently (and less 
conservatively) as noted in TODO comments.

llvm-svn: 325595
2018-02-20 16:52:17 +00:00
Sanjay Patel
90f4c8ec29 [InstCombine] fold fdiv with non-splat divisor to fmul: X/C --> X * (1/C)
llvm-svn: 325590
2018-02-20 16:08:15 +00:00
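A rough picture of the `X/C --> X * (1/C)` fold for a non-splat divisor, meaning a constant vector whose lanes differ. The sketch models a 4-lane vector with `std::array`; the type `Vec4` and both function names are assumptions of this example, and the conditions LLVM checks before applying the fold are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<double, 4>;

// Lane-wise division by a constant vector.
Vec4 divideLanes(const Vec4& x, const Vec4& c) {
  Vec4 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = x[i] / c[i];
  return out;
}

// Lane-wise multiplication by the reciprocal of each constant lane.
Vec4 multiplyByReciprocals(const Vec4& x, const Vec4& c) {
  Vec4 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    assert(c[i] != 0.0);  // the example only covers finite reciprocals
    out[i] = x[i] * (1.0 / c[i]);
  }
  return out;
}
```

With a divisor such as `{2.0, 4.0, 8.0, 16.0}` every reciprocal is exactly representable, so both functions return identical lanes. For divisors like 3.0 the reciprocal is rounded and the two forms may differ in the last bit, so the general fold changes results slightly.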
Sanjay Patel
2816560b2c [InstCombine] use CreateWithCopiedFlags to reduce code; NFCI
Also, move the folds with constants closer to make it easier to follow. 

llvm-svn: 325541
2018-02-19 23:09:03 +00:00