Commit Graph

1581 Commits

Author SHA1 Message Date
Dinar Temirbulatov
b2a9a23213 [SLPVectorizer] buildTree_rec replace cast<Instruction>(VL[0]) to VL0, NFCI.
llvm-svn: 308745
2017-07-21 15:31:54 +00:00
Dinar Temirbulatov
3206409d91 [SLPVectorizer] Change canReuseExtract function parameter Opcode from unsigned to Value *, NFCI.
llvm-svn: 308739
2017-07-21 13:32:36 +00:00
Ayal Zaks
8c452d76ed [LV] Test once if vector trip count is zero, instead of twice
Generate a single test to decide if there are enough iterations to jump to the
vectorized loop, or else go to the scalar remainder loop. This test compares the
Scalar Trip Count: if STC < VF * UF go to the scalar loop. If
requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the
rest can be used to form vector iterations. So in this case the test checks
instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the
scalar loop if so. Otherwise the vector loop is entered for at-least one vector
iteration.

This test covers the case where incrementing the backedge-taken count will
overflow leading to an incorrect trip count of zero. In this (rare) case we will
also avoid the vector loop and jump to the scalar loop.

This patch simplifies the existing tests and effectively removes the basic-block
originally named "min.iters.checked", leaving the single test in block
"vector.ph".

Original observation and initial patch by Evgeny Stupachenko.

Differential Revision: https://reviews.llvm.org/D34150

llvm-svn: 308421
2017-07-19 05:16:39 +00:00
Simon Pilgrim
7ac3efacc0 Remove unnecessary cast. NFCI.
llvm-svn: 308166
2017-07-17 09:35:03 +00:00
Dinar Temirbulatov
3c64077c82 [SLPVectorizer] Add an extra parameter to tryScheduleBundle function, NFCI.
llvm-svn: 308081
2017-07-15 05:43:54 +00:00
Dinar Temirbulatov
21599fe2de [SLPVectorizer] Add an extra parameter to alreadyVectorized function, NFCI.
llvm-svn: 307996
2017-07-14 03:48:29 +00:00
Michael Kuperstein
fdb46b2fb4 [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions.
This fixes PR33706.
Differential Revision: https://reviews.llvm.org/D35227

llvm-svn: 307837
2017-07-12 19:53:55 +00:00
Dinar Temirbulatov
09b6779709 [SLPVectorizer] Revert change in cancelScheduling with referencing to FirstInBundle, NFCI.
llvm-svn: 307667
2017-07-11 15:54:50 +00:00
Dinar Temirbulatov
b78adec638 [SLPVectorizer] Add an extra parameter to cancelScheduling function, NFCI.
llvm-svn: 307158
2017-07-05 13:53:03 +00:00
Teresa Johnson
c12306c0ad Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."
This still breaks PPC tests we have. I'll forward reproduction
instructions to dehao.

llvm-svn: 306936
2017-07-01 03:24:09 +00:00
Teresa Johnson
eb4fba9d61 re-commit r306336: Enable vectorizer-maximize-bandwidth by default.
Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306935
2017-07-01 03:24:08 +00:00
Teresa Johnson
de56903bde revert r306336 for breaking ppc test.
llvm-svn: 306934
2017-07-01 03:24:07 +00:00
Teresa Johnson
1fbaffeba1 Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:

spec/2006/fp/C++/444.namd                 26.84  -0.31%
spec/2006/fp/C++/447.dealII               46.19  +0.89%
spec/2006/fp/C++/450.soplex               42.92  -0.44%
spec/2006/fp/C++/453.povray               38.57  -2.25%
spec/2006/fp/C/433.milc                   24.54  -0.76%
spec/2006/fp/C/470.lbm                    41.08  +0.26%
spec/2006/fp/C/482.sphinx3                47.58  -0.99%
spec/2006/int/C++/471.omnetpp             22.06  +1.87%
spec/2006/int/C++/473.astar               22.65  -0.12%
spec/2006/int/C++/483.xalancbmk           33.69  +4.97%
spec/2006/int/C/400.perlbench             33.43  +1.70%
spec/2006/int/C/401.bzip2                 23.02  -0.19%
spec/2006/int/C/403.gcc                   32.57  -0.43%
spec/2006/int/C/429.mcf                   40.35  +0.27%
spec/2006/int/C/445.gobmk                 26.96  +0.06%
spec/2006/int/C/456.hmmer                  24.4  +0.19%
spec/2006/int/C/458.sjeng                 27.91  -0.08%
spec/2006/int/C/462.libquantum            57.47  -0.20%
spec/2006/int/C/464.h264ref               46.52  +1.35%

geometric mean                                   +0.29%

The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.

I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.

Reviewers: hfinkel, mkuper, davidxl, chandlerc

Reviewed By: chandlerc

Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306933
2017-07-01 03:24:06 +00:00
Dinar Temirbulatov
2fb1075f14 [SLPVectorizer] Add isOdd() helper function, NFCI.
llvm-svn: 306887
2017-06-30 21:16:26 +00:00
Ayal Zaks
2ff59d4350 [LV] Sink casts to unravel first order recurrence
Check if a single cast is preventing handling a first-order-recurrence Phi,
because the scheduling constraints it imposes on the first-order-recurrence
shuffle are infeasible; but they can be made feasible by moving the cast
downwards. Record such casts and move them when vectorizing the loop.

Differential Revision: https://reviews.llvm.org/D33058

llvm-svn: 306884
2017-06-30 21:05:06 +00:00
Ayal Zaks
8d26f0a602 [LV] Optimize for size when vectorizing loops with tiny trip count
It may be detrimental to vectorize loops with very small trip count, as various
costs of the vectorized loop body as well as enclosing overheads including
runtime tests and scalar iterations may outweigh the gains of vectorizing. The
current cost model measures the cost of the vectorized loop body only, expecting
it will amortize other costs, and loops with known or expected very small trip
counts are not vectorized at all. This patch allows loops with very small trip
counts to be vectorized, but under OptForSize constraints, which ensure the cost
of the loop body is dominant, having no runtime guards nor scalar iterations.

Patch inspired by D32451.

Differential Revision: https://reviews.llvm.org/D34373

llvm-svn: 306803
2017-06-30 08:02:35 +00:00
Chandler Carruth
3545a9e1f9 Remove the BBVectorize pass.
It served us well, helped kick-start much of the vectorization efforts
in LLVM, etc. Its time has come and past. Back in 2014:
http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html

Time to actually let go and move forward. =]

I've updated the release notes both about the removal and the
deprecation of the corresponding C API.

llvm-svn: 306797
2017-06-30 07:09:08 +00:00
Daniel Jasper
5ce1ce742e Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."
This still breaks PPC tests we have. I'll forward reproduction
instructions to dehao.

llvm-svn: 306792
2017-06-30 06:32:21 +00:00
Dinar Temirbulatov
f05c73c132 [SLPVectorizer] Moving Entry->NeedToGather check out of inner loop,
since it is invariant there. NFCI.

llvm-svn: 306749
2017-06-29 21:56:33 +00:00
Dinar Temirbulatov
7b96266a16 [SLPVectorizer] Introducing getTreeEntry() helper function [NFC]
Differential Revision: https://reviews.llvm.org/D34756

llvm-svn: 306655
2017-06-29 08:46:18 +00:00
Ayal Zaks
d9bc43ef2a [LV] Fix PR33613 - retain order of insertelement per part
r306381 caused PR33613, by reversing the order in which insertelements were
generated per unroll part. This patch fixes PR33613 by retraining this order,
placing each set of insertelements per part immediately after the last scalar
being packed for this part. Includes a test case derived from PR33613.

Reference: https://bugs.llvm.org/show_bug.cgi?id=33613
Differential Revision: https://reviews.llvm.org/D34760

llvm-svn: 306575
2017-06-28 17:59:33 +00:00
Dehao Chen
920d022519 re-commit r306336: Enable vectorizer-maximize-bandwidth by default.
Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306473
2017-06-27 22:05:58 +00:00
Ayal Zaks
fc1e210d44 Recommitting 306331.
Undoing revert 306338 after fixed bug: add metadata to the load instead of the
reverse shuffle added to it, retaining the original ValueMap implementation.

llvm-svn: 306381
2017-06-27 08:41:19 +00:00
Dehao Chen
8b7effb344 revert r306336 for breaking ppc test.
llvm-svn: 306344
2017-06-26 23:05:35 +00:00
Ayal Zaks
3923c0c46b reverting 306331.
Causes TBAA metadata to be generates on reverse shuffles, investigating.

llvm-svn: 306338
2017-06-26 22:26:54 +00:00
Dehao Chen
79655792cc Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:

spec/2006/fp/C++/444.namd                 26.84  -0.31%
spec/2006/fp/C++/447.dealII               46.19  +0.89%
spec/2006/fp/C++/450.soplex               42.92  -0.44%
spec/2006/fp/C++/453.povray               38.57  -2.25%
spec/2006/fp/C/433.milc                   24.54  -0.76%
spec/2006/fp/C/470.lbm                    41.08  +0.26%
spec/2006/fp/C/482.sphinx3                47.58  -0.99%
spec/2006/int/C++/471.omnetpp             22.06  +1.87%
spec/2006/int/C++/473.astar               22.65  -0.12%
spec/2006/int/C++/483.xalancbmk           33.69  +4.97%
spec/2006/int/C/400.perlbench             33.43  +1.70%
spec/2006/int/C/401.bzip2                 23.02  -0.19%
spec/2006/int/C/403.gcc                   32.57  -0.43%
spec/2006/int/C/429.mcf                   40.35  +0.27%
spec/2006/int/C/445.gobmk                 26.96  +0.06%
spec/2006/int/C/456.hmmer                  24.4  +0.19%
spec/2006/int/C/458.sjeng                 27.91  -0.08%
spec/2006/int/C/462.libquantum            57.47  -0.20%
spec/2006/int/C/464.h264ref               46.52  +1.35%

geometric mean                                   +0.29%

The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.

I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.

Reviewers: hfinkel, mkuper, davidxl, chandlerc

Reviewed By: chandlerc

Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 306336
2017-06-26 21:41:09 +00:00
Ayal Zaks
e7e15d186b [LV] Changing the interface of ValueMap, NFC.
Instead of providing access to the internal MapStorage holding all Values
associated with a given Key, used for setting or resetting them all together,
ValueMap keeps its MapStorage internal; its new interface allows getting,
setting or resetting a single Value, per part or per part-and-lane.
Follows the discussion in https://reviews.llvm.org/D32871.

Differential Revision: https://reviews.llvm.org/D34473

llvm-svn: 306331
2017-06-26 21:03:51 +00:00
Diana Picus
b512e91515 Revert "Enable vectorizer-maximize-bandwidth by default."
This reverts commit r305960 because it broke self-hosting on AArch64.

llvm-svn: 305990
2017-06-22 10:00:28 +00:00
Dehao Chen
014db29b89 Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:

spec/2006/fp/C++/444.namd                 26.84  -0.31%
spec/2006/fp/C++/447.dealII               46.19  +0.89%
spec/2006/fp/C++/450.soplex               42.92  -0.44%
spec/2006/fp/C++/453.povray               38.57  -2.25%
spec/2006/fp/C/433.milc                   24.54  -0.76%
spec/2006/fp/C/470.lbm                    41.08  +0.26%
spec/2006/fp/C/482.sphinx3                47.58  -0.99%
spec/2006/int/C++/471.omnetpp             22.06  +1.87%
spec/2006/int/C++/473.astar               22.65  -0.12%
spec/2006/int/C++/483.xalancbmk           33.69  +4.97%
spec/2006/int/C/400.perlbench             33.43  +1.70%
spec/2006/int/C/401.bzip2                 23.02  -0.19%
spec/2006/int/C/403.gcc                   32.57  -0.43%
spec/2006/int/C/429.mcf                   40.35  +0.27%
spec/2006/int/C/445.gobmk                 26.96  +0.06%
spec/2006/int/C/456.hmmer                  24.4  +0.19%
spec/2006/int/C/458.sjeng                 27.91  -0.08%
spec/2006/int/C/462.libquantum            57.47  -0.20%
spec/2006/int/C/464.h264ref               46.52  +1.35%

geometric mean                                   +0.29%

The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.

I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.

Reviewers: hfinkel, mkuper, davidxl, chandlerc

Reviewed By: chandlerc

Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33341

llvm-svn: 305960
2017-06-21 22:01:32 +00:00
Taewook Oh
9083547ae3 Improve profile-guided heuristics to use estimated trip count.
Summary:
Existing heuristic uses the ratio between the function entry
frequency and the loop invocation frequency to find cold loops. However,
even if the loop executes frequently, if it has a small trip count per
each invocation, vectorization is not beneficial. On the other hand,
even if the loop invocation frequency is much smaller than the function
invocation frequency, if the trip count is high it is still beneficial
to vectorize the loop.

This patch uses estimated trip count computed from the profile metadata
as a primary metric to determine coldness of the loop. If the estimated
trip count cannot be computed, it falls back to the original heuristics.

Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson

Reviewed By: tejohnson

Subscribers: tejohnson, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32451

llvm-svn: 305729
2017-06-19 18:48:58 +00:00
Dinar Temirbulatov
e2c6991c07 Remove brackets, NFC.
llvm-svn: 305706
2017-06-19 16:44:07 +00:00
George Burgess IV
a20352e13e [LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops.
If we're shrinking a binary operation, it may be the case that the new
operations wraps where the old didn't. If this happens, the behavior
should be well-defined. So, we can't always carry wrapping flags with us
when we shrink operations.

If we do, we get incorrect optimizations in cases like:

void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] - 128;
}

which gets optimized to:

void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] | 128;
}

Because:
- InstCombine turned `sub i32 %from.i, 128` into
  `add nuw nsw i32 %from.i, 128`.
- LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a
  vector full of `i8 128`s
- InstCombine took advantage of the fact that the newly-shrunken add
  "couldn't wrap", and changed the `add` to an `or`.

InstCombine seems happy to figure out whether we can add nuw/nsw on its
own, so I just decided to drop the flags. There are already a number of
places in LoopVectorize where we rely on InstCombine to clean up.

llvm-svn: 305053
2017-06-09 03:56:15 +00:00
Chandler Carruth
6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Ayal Zaks
ab32aff838 [LV] Make scalarizeInstruction() non-virtual. NFC.
Following the request made in https://reviews.llvm.org/D32871,
scalarizeInstruction() which is no longer overridden by InnerLoopUnroller is
hereby made non-virtual in InnerLoopVectorizer.

Should have been part of r297580 originally.

llvm-svn: 304685
2017-06-04 13:29:51 +00:00
Galina Kistanova
96d51f5bcb Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304636
2017-06-03 05:18:46 +00:00
Alexey Bataev
e4e5923ef1 [SLP] Improve comments and naming of functions/variables/members, NFC.
Fixed some comments, added an additional description of the algorithms,
improved readability of the code.

Differential revision: https://reviews.llvm.org/D33320

llvm-svn: 304616
2017-06-03 00:08:21 +00:00
Alexey Bataev
03ca396b95 Revert "[SLP] Improve comments and naming of functions/variables/members, NFC."
This reverts commit 6e311de8b907aa20da9a1a13ab07c3ce2ef4068a.

llvm-svn: 304609
2017-06-02 23:09:15 +00:00
Alexey Bataev
2c08fde9e5 [SLP] Improve comments and naming of functions/variables/members, NFC.
Summary:
Fixed some comments, added an additional description of the algorithms,
improved readability of the code.

Reviewers: anemet

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33320

llvm-svn: 304593
2017-06-02 20:39:27 +00:00
Matthew Simpson
646475a9bc [LV] Reapply r303763 with fix for PR33193
r303763 caused build failures in some out-of-tree tests due to an assertion in
TTI. The original patch updated cost estimates for induction variable update
instructions marked for scalarization. However, it didn't consider that the
incoming value of an induction variable phi node could be a cast instruction.
This caused queries for cast instruction costs with a mix of vector and scalar
types. This patch includes a fix for cast instructions and the test case from
PR33193.

The fix was suggested by Jonas Paulsson <paulsson@linux.vnet.ibm.com>.

Reference: https://bugs.llvm.org/show_bug.cgi?id=33193
Original Differential Revision: https://reviews.llvm.org/D33457

llvm-svn: 304235
2017-05-30 19:55:57 +00:00
Joerg Sonnenberger
9375a25342 Revert r303763, results in asserts i.e. while building Ruby.
llvm-svn: 304179
2017-05-29 22:52:17 +00:00
Matthew Simpson
d6f179cad6 [LV] Update type in cost model for scalarization
For non-uniform instructions marked for scalarization, we should update
`VectorTy` when computing instruction costs to reflect the scalar type. In
addition to determining instruction costs, this type is also used to signal
that all instructions in the loop will be scalarized. This currently affects
memory instructions and non-pointer induction variables and their updates. (We
also mark GEPs scalar after vectorization, but their cost is computed together
with memory instructions.) For scalarized induction updates, this patch also
scales the scalar cost by the vectorization factor, corresponding to each
induction step.

llvm-svn: 303763
2017-05-24 15:26:15 +00:00
Jonas Paulsson
8624b7e1ce [LoopVectorizer] Let target prefer scalar addressing computations.
The loop vectorizer usually vectorizes any instruction it can and then
extracts the elements for a scalarized use. On SystemZ, all elements
containing addresses must be extracted into address registers (GRs). Since
this extraction is not free, it is better to have the address in a suitable
register to begin with. By forcing address arithmetic instructions and loads
of addresses to be scalar after vectorization, two benefits result:

* No need to extract the register
* LSR optimizations trigger (LSR isn't handling vector addresses currently)

Benchmarking show improvements on SystemZ with this new behaviour.

Any other target could try this by returning false in the new hook
prefersVectorizedAddressing().

Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand
https://reviews.llvm.org/D32422

llvm-svn: 303744
2017-05-24 13:42:56 +00:00
Ayal Zaks
589e1d9610 [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis
The default behavior of -Rpass-analysis=loop-vectorizer is to report only the
first reason encountered for not vectorizing, if one is found, at which time the
vectorizer aborts its handling of the loop. This patch allows multiple reasons
for not vectorizing to be identified and reported, at the potential expense of
additional compile-time, under allowExtraAnalysis which can currently be turned
on by Clang's -fsave-optimization-record and opt's -pass-remarks-missed.

Removed from LoopVectorizationLegality::canVectorize() the redundant checking
and reporting if we CantComputeNumberOfIterations, as LAI::canAnalyzeLoop() also
does that. This redundancy is caught by a lit test once multiple reasons are
reported.

Patch initially developed by Dror Barak.

Differential Revision: https://reviews.llvm.org/D33396

llvm-svn: 303613
2017-05-23 07:08:02 +00:00
Amara Emerson
4d33c86359 Fix vector pass-through value being unused in IRBuilder::CreateMaskedGather
Also s/0/nullptr in the call site in LV.

llvm-svn: 303416
2017-05-19 10:40:18 +00:00
Reid Kleckner
96ab8726a3 [IR] De-virtualize ~Value to save a vptr
Summary:
Implements PR889

Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:

https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing

This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal.  However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue.  I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.

I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.

Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv

Reviewed By: chandlerc

Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D31261

llvm-svn: 303362
2017-05-18 17:24:10 +00:00
Matthew Simpson
af60af1ed5 Revert 303174, 303176, and 303178
These commits are breaking the bots. Reverting to investigate.

llvm-svn: 303182
2017-05-16 15:50:30 +00:00
Matthew Simpson
b7b5d55c38 [LV] Avoid potentential division by zero when selecting IC
llvm-svn: 303174
2017-05-16 14:43:55 +00:00
Adam Nemet
e29686e5c1 [SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits).  This
is beneficial for example for 2D and 3D graphics.  This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP in the cold code.  The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
2017-05-15 21:15:01 +00:00
Craig Topper
1a36b7d836 [ValueTracking] Replace all uses of ComputeSignBit with computeKnownBits.
This patch finishes off the conversion of ComputeSignBit to computeKnownBits.

Differential Revision: https://reviews.llvm.org/D33166

llvm-svn: 303035
2017-05-15 06:39:41 +00:00
Simon Pilgrim
7d62e4b455 [LoopOptimizer][Fix]PR32859, PR24738
The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form.
Here it is:

before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ]
after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ]

and after this change we have:

%e.0.ph = phi i32 [ 0, %for.inc.2.i ]
%e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ]

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D33055

llvm-svn: 302988
2017-05-13 13:25:57 +00:00