Commit Graph

20610 Commits

Author SHA1 Message Date
Fangrui Song
f78650a8de Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
Evandro Menezes
a7d48286fb [SLC] Refactor the simplication of pow() (NFC)
Use more meaningful variable names.  Mostly NFC.

llvm-svn: 338266
2018-07-30 16:20:04 +00:00
Alexandros Lamprineas
de3ca964c1 [GVNHoist] Re-enable GVNHoist by default
My initial motivation for this came from https://reviews.llvm.org/D48122,
where it was pointed out that my change didn't fit well in SimplifyCFG and
therefore using GVNHoist was a better way to go. GVNHoist has been disabled
for a while as there was a list of bugs related to it.

I have fixed the following bugs:

https://bugs.llvm.org/show_bug.cgi?id=37808 -> https://reviews.llvm.org/D48372 (rL337149)
https://bugs.llvm.org/show_bug.cgi?id=36787 -> https://reviews.llvm.org/D49555 (rL337674)
https://bugs.llvm.org/show_bug.cgi?id=37445 -> https://reviews.llvm.org/D49425 (rL337680)

The next two bugs no longer occur, and it's unclear which commit fixed them:

https://bugs.llvm.org/show_bug.cgi?id=36635
https://bugs.llvm.org/show_bug.cgi?id=37791

I investigated this one and proved to be unrelated to GVNHoist, but a genuine bug in NewGvn:

https://bugs.llvm.org/show_bug.cgi?id=37660

To convince myself GVNHoist is in a good state I made a successful bootstrap build of LLVM.
Merging this change now in order to make it to the LLVM 7.0.0 branch.

Differential Revision: https://reviews.llvm.org/D49858

llvm-svn: 338240
2018-07-30 10:50:18 +00:00
Max Kazantsev
3327bcaeb1 [NFC] Prepare GuardWidening for widening of cond branches
llvm-svn: 338229
2018-07-30 07:07:32 +00:00
Sanjay Patel
577c705752 [InstCombine] try to fold 'add+sub' to 'not+add'
These are reassociated versions of the same pattern and
similar transforms as in rL338200 and rL338118.

The motivation is identical to those commits:
Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are
commutative/associative. It can also help codegen.

llvm-svn: 338221
2018-07-29 18:13:16 +00:00
Sanjay Patel
818b253d3a [InstCombine] try to fold 'sub' to 'not'
https://rise4fun.com/Alive/jDd

Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are 
commutative/associative. It can also help codegen.  

llvm-svn: 338200
2018-07-28 16:48:44 +00:00
David Green
fc4b0fe0a2 [GlobalOpt] Test array indices inside structs for out-of-bounds accesses
We now, from clang, can turn arrays of
  static short g_data[] = {16, 16, 16, 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 0, 0};
into structs of the form
  @g_data = internal global <{ [8 x i16], [8 x i16] }> ...

GlobalOpt will incorrectly SROA it, not realising that the access to the first
element may overflow into the second. This fixes it by checking geps more
thoroughly.

I believe this makes the globalsra-partial.ll test case invalid as the %i value
could be out of bounds. I've re-purposed it as a negative test for this case.

Differential Revision: https://reviews.llvm.org/D49816

llvm-svn: 338192
2018-07-28 08:20:10 +00:00
Alina Sbirlea
5666c7e4bd [SimpleLoopUnswitch] Fix DT updates for trivial branch unswitching.
Summary:
Fixing 2 issues with the DT update in trivial branch switching, though I don't have a case where DT update fails.
1. After splitting ParentBB->UnswitchedBB edge, new edges become: ParentBB->LoopExitBB->UnswitchedBB, so remove ParentBB->LoopExitBB edge.
2. AFAIU, for multiple CFG changes, DT should be updated using batch updates, vs consecutive addEdge and removeEdge calls.

Reviewers: chandlerc, kuhar

Subscribers: sanjoy, jlebar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49925

llvm-svn: 338180
2018-07-28 00:01:05 +00:00
Reid Kleckner
ba82788ff6 [InstrProf] Don't register __llvm_profile_runtime_user
Refactor some FileCheck prefixes while I'm at it.

Fixes PR38340

llvm-svn: 338172
2018-07-27 22:21:35 +00:00
Sanjay Patel
78e4b4d3c4 [InstCombine] not(sub X, Y) --> add (not X), Y
The tests with constants show a missing optimization.
Analysis for adds is better than subs, so this can also
help with other transforms. And codegen is better with 
adds for targets like x86 (destructive ops, no sub-from).

https://rise4fun.com/Alive/llK

llvm-svn: 338118
2018-07-27 10:54:48 +00:00
Max Kazantsev
4d980515d2 [SimplifyIndVar] Canonicalize comparisons to unsigned while eliminating truncs
This is a follow-up for the patch rL335020. When we replace compares against
trunc with compares against wide IV, we can also replace signed predicates with
unsigned where it is legal.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D48763

llvm-svn: 338115
2018-07-27 09:43:39 +00:00
Matt Arsenault
d149650760 PatternMatch: Add wrappers for fabs and canonicalize
llvm-svn: 338111
2018-07-27 09:04:35 +00:00
Anastasis Grammenos
f6e143e67f Revert "[LV][DebugInfo] Set DL to the middle block Icmp instruction"
This reverts commit r338106.

llvm-svn: 338109
2018-07-27 08:22:54 +00:00
Anastasis Grammenos
03948d0e0f [LV][DebugInfo] Set DL to the middle block Icmp instruction
Reviewers: hsaito

Differential Revision: https://reviews.llvm.org/D49746

llvm-svn: 338106
2018-07-27 07:12:44 +00:00
Chen Zheng
567485a72f [InstCombine] canonicalize abs pattern
Differential Revision: https://reviews.llvm.org/D48754

llvm-svn: 338092
2018-07-27 01:49:51 +00:00
Vedant Kumar
b572f64212 [DebugInfo] LowerDbgDeclare: Add derefs when handling CallInst users
LowerDbgDeclare inserts a dbg.value before each use of an address
described by a dbg.declare. When inserting a dbg.value before a CallInst
use, however, it fails to append DW_OP_deref to the DIExpression.

The DW_OP_deref is needed to reflect the fact that a dbg.value describes
a source variable directly (as opposed to a dbg.declare, which relies on
pointer indirection).

This patch adds in the DW_OP_deref where needed. This results in the
correct values being shown during a debug session for a program compiled
with ASan and optimizations (see https://reviews.llvm.org/D49520). Note
that ConvertDebugDeclareToDebugValue is already correct -- no changes
there were needed.

One complication is that SelectionDAG is unable to distinguish between
direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also
fixes this long-standing issue in order to not regress integration tests
relying on the incorrect assumption that all frame-index SDDbgValues are
indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot
be lowered properly otherwise. Basically the fix prevents a direct
SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice
by a debugger. There were a handful of tests relying on this incorrect
"FRAMEIX => indirect" assumption which actually had incorrect
DW_AT_locations: these are all fixed up in this patch.

Testing:

- check-llvm, and an end-to-end test using lldb to debug an optimized
  program.
- Existing unit tests for DIExpression::appendToStack fully cover the
  new DIExpression::append utility.
- check-debuginfo (the debug info integration tests)

Differential Revision: https://reviews.llvm.org/D49454

llvm-svn: 338069
2018-07-26 20:56:53 +00:00
Sanjay Patel
6d6eab66e0 [InstCombine] fold udiv with common factor from muls with nuw
Unfortunately, sdiv isn't as simple because of UB due to overflow.

This fold is mentioned in PR38239:
https://bugs.llvm.org/show_bug.cgi?id=38239

llvm-svn: 338059
2018-07-26 19:22:41 +00:00
David Green
eda3c9efa2 [UnJ] Common some code. NFC
Create a processHeaderPhiOperands for analysing the instructions
in the aft blocks that must be moved before the loop.

Differential Revision: https://reviews.llvm.org/D49061

llvm-svn: 338033
2018-07-26 15:19:07 +00:00
Fangrui Song
984a424c8a [LoadStoreVectorizer] Use const reference
llvm-svn: 337992
2018-07-26 01:11:36 +00:00
Roman Tereshin
4f10a9d3a3 [LSV] Look through selects for consecutive addresses
In some cases LSV sees (load/store _ (select _ <pointer expression>
<pointer expression>)) patterns in input IR, often due to sinking and
other forms of CFG simplification, sometimes interspersed with
bitcasts and all-constant-indices GEPs. With this
patch`areConsecutivePointers` method would attempt to handle select
instructions. This leads to an increased number of successful
vectorizations.

Technically, select instructions could appear in index arithmetic as
well, however, we don't see those in our test suites / benchmarks.
Also, there is a lot more freedom in IR shapes computing integral
indices in general than in what's common in pointer computations, and
it appears that it's quite unreliable to do anything short of making
select instructions first class citizens of Scalar Evolution, which
for the purposes of this patch is most definitely an overkill.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49428

llvm-svn: 337965
2018-07-25 21:33:00 +00:00
Florian Hahn
b6613ac665 Revert r337904: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
I suspect it is causing the clang-stage2-Rthinlto failures.

llvm-svn: 337956
2018-07-25 19:44:19 +00:00
Florian Hahn
6f5c6adbcd Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
r337828 resolves a PredicateInfo issue with unnamed types.

Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.

As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.

Reviewers: davide, mssimpso, dberlin, efriedma

Reviewed By: davide, dberlin

llvm-svn: 337904
2018-07-25 11:13:40 +00:00
Petr Hosek
47e5fcba57 [profile] Support profiling runtime on Fuchsia
This ports the profiling runtime on Fuchsia and enables the
instrumentation. Unlike on other platforms, Fuchsia doesn't use
files to dump the instrumentation data since on Fuchsia, filesystem
may not be accessible to the instrumented process. We instead use
the data sink to pass the profiling data to the system the same
sanitizer runtimes do.

Differential Revision: https://reviews.llvm.org/D47208

llvm-svn: 337881
2018-07-25 03:01:35 +00:00
Hideki Saito
ef380b0fc5 [LV] Fix for PR38110, LV encountered llvm_unreachable()
Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash).

Reviewers: sbaranga, mssimpso, hsaito

Reviewed By: hsaito

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49461

llvm-svn: 337861
2018-07-24 22:30:31 +00:00
Joel Galenson
8dbcc58917 Use SCEV to avoid inserting some bounds checks.
This patch uses SCEV to avoid inserting some bounds checks when they are not needed.  This slightly improves the performance of code compiled with the bounds check sanitizer.

Differential Revision: https://reviews.llvm.org/D49602

llvm-svn: 337830
2018-07-24 15:21:54 +00:00
Florian Hahn
36d2e25d5a [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.
This is a workaround and it would be better to fix this generally, but
doing it generally is quite tricky. See D48541 and PR38117.

Doing it in PredicateInfo directly allows us to use the type address to
differentiate different unnamed types, because neither the created
declarations nor the ssa_copy calls should be visible after
PredicateInfo got destroyed.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D49126

llvm-svn: 337828
2018-07-24 14:49:52 +00:00
Teresa Johnson
e214fdeb69 [ThinLTO] Ensure the TargetLibraryInfo is constructed early enough
Summary:
Without this change, the WholeProgramDevirt pass, which requires the
TargetLibraryInfo, will construct one from the default triple.

Fixes PR38139.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49278

llvm-svn: 337750
2018-07-23 21:58:19 +00:00
George Burgess IV
b00fb46479 [DebugCounters] Keep track of total counts
This patch makes debug counters keep track of the total number of times
we've called `shouldExecute` for each counter, so it's easier to build
automated tooling on top of these.

A patch to print these counts is coming soon.

Patch by Zhizhou Yang!

Differential Revision: https://reviews.llvm.org/D49560

llvm-svn: 337748
2018-07-23 21:49:36 +00:00
John Brawn
fc18a6ad7d [GVN] Don't use the eliminated load as an available value in phi construction
In ConstructSSAForLoadSet if an available value is actually the load that we're
doing SSA construction to eliminate, then we can omit it as SSAUpdate will add
in the value for the phi that will be replacing it anyway. This can result in
simpler IR which can allow further optimisation.

Differential Revision: https://reviews.llvm.org/D44160

llvm-svn: 337686
2018-07-23 12:14:45 +00:00
Alexandros Lamprineas
592cc78dd8 [GVNHoist] safeToHoistLdSt allows illegal hoisting
Bug fix for PR36787. When reasoning if it's safe to hoist a load we
want to make sure that the defining memory access dominates the new
insertion point of the hoisted instruction. safeToHoistLdSt calls
firstInBB(InsertionPoint,DefiningAccess) which returns false if
InsertionPoint == DefiningAccess, and therefore it falsely thinks
it's safe to hoist.

Differential Revision: https://reviews.llvm.org/D49555

llvm-svn: 337674
2018-07-23 09:42:35 +00:00
Aditya Kumar
373ce7eca5 Early exit with cheaper checks
Reviewers: sebpop,davide,fhahn,trentxintong
Differential Revision: https://reviews.llvm.org/D49617

llvm-svn: 337643
2018-07-21 14:13:44 +00:00
Peter Collingbourne
acf005676e Change the cap on the amount of padding for each vtable to 32-byte (previously it was 128-byte)
We tested different cap values with a recent commit of Chromium. Our results show that the 32-byte cap yields the smallest binary and all the caps yield similar performance.
Based on the results, we propose to change the cap value to 32-byte.

Patch by Zhaomo Yang!

Differential Revision: https://reviews.llvm.org/D49405

llvm-svn: 337622
2018-07-20 21:43:20 +00:00
Roman Tereshin
31d52847ef Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reapplies commit r337489 reverted by r337541
Additionally, this commit contains a speculative fix to the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace)

llvm-svn: 337606
2018-07-20 20:10:04 +00:00
Alexander Potapenko
80c6f41581 [MSan] Hotfix compilation
Make sure NewSI is used in materializeStores()

llvm-svn: 337577
2018-07-20 16:52:12 +00:00
Alexander Potapenko
5ff3abbc31 [MSan] run materializeChecks() before materializeStores()
When pointer checking is enabled, it's important that every pointer is
checked before its value is used.
For stores MSan used to generate code that calculates shadow/origin
addresses from a pointer before checking it.
For userspace this isn't a problem, because the shadow calculation code
is quite simple and compiler is able to move it after the check on -O2.
But for KMSAN getShadowOriginPtr() creates a runtime call, so we want the
check to be performed strictly before that call.

Swapping materializeChecks() and materializeStores() resolves the issue:
both functions insert code before the given IR location, so the new
insertion order guarantees that the code calculating shadow address is
between the address check and the memory access.

llvm-svn: 337571
2018-07-20 16:28:49 +00:00
Florian Hahn
ec3ca89a17 [IPSCCP] Fix for bot failure caused by r337548
llvm-svn: 337554
2018-07-20 14:37:10 +00:00
Florian Hahn
0a560d5d9c Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters.
This version contains a fix to add values for which the state in ParamState change
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true, if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.

Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.

Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.

Reviewers: dberlin, mssimpso, davide, efriedma

Reviewed By: mssimpso

Differential Revision: https://reviews.llvm.org/D43762

llvm-svn: 337548
2018-07-20 13:29:12 +00:00
Sam McCall
57743883f1 Revert "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reverts commit r337489.
It causes asserts to fire in some TensorFlow tests, e.g.
tensorflow/compiler/tests/gather_test.py on GPU.

Example stack trace:
Start test case: GatherTest.testHigherRank
assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits"
    @     0x5559446ebe10  __assert_fail
    @     0x55593ef32f5e  llvm::APInt::trunc()
    @     0x55593d78f86e  (anonymous namespace)::Vectorizer::lookThroughComplexAddresses()
    @     0x55593d78f2bc  (anonymous namespace)::Vectorizer::areConsecutivePointers()
    @     0x55593d78d128  (anonymous namespace)::Vectorizer::isConsecutiveAccess()
    @     0x55593d78c926  (anonymous namespace)::Vectorizer::vectorizeInstructions()
    @     0x55593d78c221  (anonymous namespace)::Vectorizer::vectorizeChains()
    @     0x55593d78b948  (anonymous namespace)::Vectorizer::run()
    @     0x55593d78b725  (anonymous namespace)::LoadStoreVectorizer::runOnFunction()
    @     0x55593edf4b17  llvm::FPPassManager::runOnFunction()
    @     0x55593edf4e55  llvm::FPPassManager::runOnModule()
    @     0x55593edf563c  (anonymous namespace)::MPPassManager::runOnModule()
    @     0x55593edf5137  llvm::legacy::PassManagerImpl::run()
    @     0x55593edf5b71  llvm::legacy::PassManager::run()
    @     0x55593ced250d  xla::gpu::IrDumpingPassManager::run()
    @     0x55593ced5033  xla::gpu::(anonymous namespace)::EmitModuleToPTX()
    @     0x55593ced40ba  xla::gpu::(anonymous namespace)::CompileModuleToPtx()
    @     0x55593ced33d0  xla::gpu::CompileToPtx()
    @     0x55593b26b2a2  xla::gpu::NVPTXCompiler::RunBackend()
    @     0x55593b21f973  xla::Service::BuildExecutable()
    @     0x555938f44e64  xla::LocalService::CompileExecutable()
    @     0x555938f30a85  xla::LocalClient::Compile()
    @     0x555938de3c29  tensorflow::XlaCompilationCache::BuildExecutable()
    @     0x555938de4e9e  tensorflow::XlaCompilationCache::CompileImpl()
    @     0x555938de3da5  tensorflow::XlaCompilationCache::Compile()
    @     0x555938c5d962  tensorflow::XlaLocalLaunchBase::Compute()
    @     0x555938c68151  tensorflow::XlaDevice::Compute()
    @     0x55593f389e1f  tensorflow::(anonymous namespace)::ExecutorState::Process()
    @     0x55593f38a625  tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()()
*** SIGABRT received by PID 7798 (TID 7837) from PID 7798; ***

llvm-svn: 337541
2018-07-20 12:03:00 +00:00
Eli Friedman
a3c78f5981 [SCCP] Don't use markForcedConstant on branch conditions.
It's more aggressive than we need to be, and leads to strange
workarounds in other places like call return value inference. Instead,
just directly mark an edge viable.

Tests by Florian Hahn.

Differential Revision: https://reviews.llvm.org/D49408

llvm-svn: 337507
2018-07-19 23:02:07 +00:00
Roman Tereshin
b49b2a601f [LSV] Refactoring + supporting bitcasts to a type of different size
This is mostly a preparation work for adding a limited support for
select instructions. It proved to be difficult to do due to size and
irregularity of Vectorizer::isConsecutiveAccess, this is fixed here I
believe.

It also turned out that these changes make it simpler to finish one of
the TODOs and fix a number of other small issues, namely:

1. Looking through bitcasts to a type of a different size (requires
careful tracking of the original load/store size and some math
converting sizes in bytes to expected differences in indices of GEPs).

2. Reusing partial analysis of pointers done by first attempt in proving
them consecutive instead of starting from scratch. This added limited
support for nested GEPs co-existing with difficult sext/zext
instructions. This also required a careful handling of negative
differences between constant parts of offsets.

3. Handing a case where the first pointer index is not an add, but
something else (a function parameter for instance).

I observe an increased number of successful vectorizations on a large
set of shader programs. Only few shaders are affected, but those that
are affected sport >5% less loads and stores than before the patch.

Reviewed By: rampitec

Differential-Revision: https://reviews.llvm.org/D49342
llvm-svn: 337489
2018-07-19 19:42:43 +00:00
Farhana Aleen
8c7a30baea [LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers.
Summary: Currently, isConsecutiveAccess() detects two pointers(PtrA and PtrB) as consecutive by
         comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or
         both of them are not factorized. But isConsecutiveAccess() fails if one of the
         pointers is factorized but the other one is not.

         Here is an example:
         PtrA = 4 * (A + B)
         PtrB = 4 + 4A + 4B

         This patch uses getMinusSCEV() to compute the distance between two pointers.
         getMinusSCEV() allows combining the expressions and computing the simplified distance.

Author: FarhanaAleen

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49516

llvm-svn: 337471
2018-07-19 16:50:27 +00:00
Teresa Johnson
28023dbed7 [ThinLTO] Enable ThinLTO WholeProgramDevirt and LowerTypeTests in new PM
Summary:
Enable these passes for CFI and WPD in ThinLTO and LTO with the new pass
manager. Add a couple of tests for both PMs based on the clang tests
tools/clang/test/CodeGen/thinlto-distributed-cfi*.ll, but just test
through llvm-lto2 and not with distributed ThinLTO.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49429

llvm-svn: 337461
2018-07-19 14:51:32 +00:00
Peter Collingbourne
4a653fa7f1 Rename __asan_gen_* symbols to ___asan_gen_*.
This prevents gold from printing a warning when trying to export
these symbols via the asan dynamic list after ThinLTO promotes them
from private symbols to external symbols with hidden visibility.

Differential Revision: https://reviews.llvm.org/D49498

llvm-svn: 337428
2018-07-18 22:23:14 +00:00
Xin Tong
074ccf32ce Skip debuginfo intrinsic in markLiveBlocks.
Summary:
The optimizer is 10%+ slower with vs without debuginfo. I started checking where
the difference is coming from.

I compiled sqlite3.c with and without debug info from CTMark and compare the time difference.

I use Xcode Instrument to find where time is spent. This brings about 20ms, out of ~20s.

Reviewers: davide, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D49337

llvm-svn: 337416
2018-07-18 18:40:45 +00:00
Simon Pilgrim
2b37ddce4b [SLPVectorizer] Avoid duplicate scalar cost calculations in BoUpSLP::getEntryCost. NFCI.
Pulled out from D49225, we have a lot of repeated scalar cost calculations, often with arguments that don't look the same but turn out to be.

llvm-svn: 337390
2018-07-18 13:53:55 +00:00
Roman Lebedev
3cb87e905c [InstCombine] Re-commit: Fold 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what i think the solution should be:
```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
```

This is a re-commit, as the original patch, committed in rL337190
was reverted in rL337344 as it broke chromium build:
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832
Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM

Differential Revision: https://reviews.llvm.org/D49320

llvm-svn: 337376
2018-07-18 10:55:17 +00:00
Bob Haarman
4ebe5d59b6 Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"
This reverts r337190 (and a few follow-up commits), which caused the
Chromium build to fail. See
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832

llvm-svn: 337344
2018-07-18 02:18:28 +00:00
Vedant Kumar
9ece818291 [InstCombine] Preserve debug value when simplifying cast-of-select
InstCombine has a cast transform that matches a cast-of-select:

  Orig = cast (Src = select Cond TV FV)

And tries to replace it with a select which has the cast folded in:

  NewSel = select Cond (cast TV) (cast FV)

The combiner does RAUW(Orig, NewSel), so any debug values for Orig would
survive the transform. But debug values for Src would be lost.

This patch teaches InstCombine to replace all debug uses of Src with
NewSel (taking care of doing any necessary DIExpression rewriting).

Differential Revision: https://reviews.llvm.org/D49270

llvm-svn: 337310
2018-07-17 18:08:36 +00:00
Florian Hahn
d95761d9d0 [IPSCCP] Run Solve each time we resolved an undef in a function.
Once we resolved an undef in a function we can run Solve, which could
lead to finding a constant return value for the function, which in turn
could turn undefs into constants in other functions that call it, before
resolving undefs there.

Computationally the amount of work we are doing stays the same, just the
order we process things is slightly different and potentially there are
a few less undefs to resolve.

We are still relying on the order of functions in the IR, which means
depending on the order, we are able to resolve the optimal undef first
or not. For example, if @test1 comes before @testf, we find the constant
return value of @testf too late and we cannot use it while solving
@test1.

This on its own does not lead to more constants removed in the
test-suite, probably because currently we have to be very lucky to visit
applicable functions in the right order.

Maybe we manage to come up with a better way of resolving undefs in more
'profitable' functions first.

Reviewers: efriedma, mssimpso, davide

Reviewed By: efriedma, davide

Differential Revision: https://reviews.llvm.org/D49385

llvm-svn: 337283
2018-07-17 14:04:59 +00:00
Simon Pilgrim
1a4f3c93fb [SLPVectorizer] Don't attempt horizontal reduction on pointer types (PR38191)
TTI::getMinMaxReductionCost typically can't handle pointer types - until this is changed its better to limit horizontal reduction to integer/float vector types only.

llvm-svn: 337280
2018-07-17 13:43:33 +00:00