Summary:
Use the new LoopVersioning facility (D16712) to add noalias metadata in
the vector loop if we versioned with memchecks. This can enable some
optimization opportunities further down the pipeline (see the included
test or the benchmark improvement quoted in D16712).
The test also covers the bug I had in the initial version in D16712.
The vectorizer did not previously use LoopVersioning. The reason is
that the vectorizer performs its transformations in single shot. It
creates an empty single-block vector loop that it then populates with
the widened, if-converted instructions. Thus creating an intermediate
versioned scalar loop seems wasteful.
So this patch (rather than bringing in LoopVersioning fully) adds a
special interface to LoopVersioning to allow the vectorizer to add
no-alias annotation while still performing its own versioning.
As the vectorizer propagates metadata from the instructions in the
original loop to the vector instructions we also check the pointer in
the original instruction and see if LoopVersioning can add no-alias
metadata based on the issued memchecks.
Reviewers: hfinkel, nadav, mzolotukhin
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17191
llvm-svn: 263744
Summary:
If we decide to version a loop to benefit a transformation, it makes
sense to record the now non-aliasing accesses in the newly versioned
loop. This allows non-aliasing information to be used by subsequent
passes.
One example is 456.hmmer in SPECint2006 where after loop distribution,
we vectorize one of the newly distributed loops. To vectorize we
version this loop to fully disambiguate may-aliasing accesses. If we
add the noalias markers, we can use the same information in a later DSE
pass to eliminate some dead stores which amounts to ~25% of the
instructions of this hot memory-pipeline-bound loop. The overall
performance improves by 18% on our ARM64.
The scoped noalias annotation is added in LoopVersioning. The patch
then enables this for loop distribution. A follow-on patch will enable
it for the vectorizer. Eventually this should be run by default when
versioning the loop but first I'd like to get some feedback whether my
understanding and application of scoped noalias metadata is correct.
Essentially my approach was to have a separate alias domain for each
versioning of the loop. For example, if we first version in loop
distribution and then in vectorization of the distributed loops, we have
a different set of memchecks for each versioning. By keeping the scopes
in different domains they can conveniently be defined independently
since different alias domains don't affect each other.
As written, I also have a separate domain for each loop. This is not
necessary and we could save some metadata here by using the same domain
across the different loops. I don't think it's a big deal either way.
Probably the best is to review the tests first to see if I mapped this
problem correctly to scoped noalias markers. I have plenty of comments
in the tests.
Note that the interface is prepared for the vectorizer which needs the
annotateInstWithNoAlias API. The vectorizer does not use LoopVersioning
so we need a way to pass in the versioned instructions. This is also
why the maps have to become part of the object state.
Also currently, we only have an AA-aware DSE after the vectorizer if we
also run the LTO pipeline. Depending how widely this triggers we may
want to schedule a DSE toward the end of the regular pass pipeline.
Reviewers: hfinkel, nadav, ashutosh.nema
Subscribers: mssimpso, aemerson, llvm-commits, mcrosier
Differential Revision: http://reviews.llvm.org/D16712
llvm-svn: 263743
This is similar to D18133 where we allowed profile weights on select instructions.
This extends that change to also allow the 'unpredictable' attribute of branches to apply to selects.
A test to check that 'unpredictable' metadata is preserved when cloning instructions was checked in at:
http://reviews.llvm.org/rL263648
Differential Revision: http://reviews.llvm.org/D18220
llvm-svn: 263716
This was a latent bug that got exposed by the change to add LoopSimplify
as a dependence to LoopLoadElimination. Since LoopInfo was corrupted
after LV, LoopSimplify mis-compiled nbench in the test-suite (more
details in the PR).
The problem was that when we create the blocks for predicated stores we
didn't add those to any loops.
The original testcase for store predication provides coverage for this
assuming we verify LI on the way out of LV.
Fixes PR26952.
llvm-svn: 263565
(Resubmitting after fixing missing file issue)
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263513
Summary:
Specifically, when we perform runtime loop unrolling of a loop that
contains a convergent op, we can only unroll k times, where k divides
the loop trip multiple.
Without this change, we'll happily unroll e.g. the following loop
for (int i = 0; i < N; ++i) {
if (i == 0) convergent_op();
foo();
}
into
int i = 0;
if (N % 2 == 1) {
convergent_op();
foo();
++i;
}
for (; i < N - 1; i += 2) {
if (i == 0) convergent_op();
foo();
foo();
}.
This is unsafe, because we've just added a control-flow dependency to
the convergent op in the prelude.
In general, runtime unrolling loops that contain convergent ops is safe
only if we don't have emit a prelude, which occurs when the unroll count
divides the trip multiple.
Reviewers: resistor
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17526
llvm-svn: 263509
With the changes in r263275, there are now more than just functions in
the summary. Completed the renaming of data structures (started in
r263275) to reflect the wider scope. In particular, changed the
FunctionIndex* data structures to ModuleIndex*, and renamed related
variables and comments. Also renamed the files to reflect the changes.
A companion clang patch will immediately succeed this patch to reflect
this renaming.
llvm-svn: 263490
As noted in:
https://llvm.org/bugs/show_bug.cgi?id=26636
This doesn't accomplish anything on its own. It's the first step towards preserving
and using branch weights with selects.
The next step would be to make sure we're propagating the info in all of the other
places where we create selects (SimplifyCFG, InstCombine, etc). I don't think there's
an easy fix to make this happen; we have to look at each transform individually to
determine how to correctly propagate the weights.
Along with that step, we need to then use the weights when making subsequent transform
decisions such as discussed in http://reviews.llvm.org/D16836.
The inliner test is independent but closely related. It verifies that metadata is
preserved when both branches and selects are cloned.
Differential Revision: http://reviews.llvm.org/D18133
llvm-svn: 263482
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date: Fri Mar 11 17:15:50 2016 +0000
Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
91177308-0d34-0410-b5e6-96231b3b80d8
until we can figure out what to do about clang and Release build testing.
This reverts commit 263258.
llvm-svn: 263321
Summary:
This intrinsic, together with deoptimization operand bundles, allow
frontends to express transfer of control and frame-local state from
one (typically more specialized, hence faster) version of a function
into another (typically more generic, hence slower) version.
In languages with a fully integrated managed runtime this intrinsic can
be used to implement "uncommon trap" like functionality. In unmanaged
languages like C and C++, this intrinsic can be used to represent the
slow paths of specialized functions.
Note: this change does not address how `@llvm.experimental_deoptimize`
is lowered. That will be done in a later change.
Reviewers: chandlerc, rnk, atrick, reames
Subscribers: llvm-commits, kmod, mjacob, maksfb, mcrosier, JosephTremoulet
Differential Revision: http://reviews.llvm.org/D17732
llvm-svn: 263281
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
Reviewers: chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18023
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself.
Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function.
Ideally this should be NFC, but in reality its possible that a function:
has no !dbg (in which case there's likely a bug somewhere in an opt pass), or
that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will
Reviewed by Duncan Exon Smith.
Differential Revision: http://reviews.llvm.org/D18074
llvm-svn: 263184
This is a fairly straightforward port to the new pass manager with one
exception. It removes a very questionable use of releaseMemory() in
the old pass to invalidate its caches between runs on a function.
I don't think this is really guaranteed to be safe. I've just used the
more direct port to the new PM to address this by nuking the results
object each time the pass runs. While this could cause some minor malloc
traffic increase, I don't expect the compile time performance hit to be
noticable, and it makes the correctness and other aspects of the pass
much easier to reason about. In some cases, it may make things faster by
making the sets and maps smaller with better locality. Indeed, the
measurements collected by Bruno (thanks!!!) show mostly compile time
improvements.
There is sadly very limited testing at this point as there are only two
tests of memdep, and both rely on GVN. I'll be porting GVN next and that
will exercise this heavily though.
Differential Revision: http://reviews.llvm.org/D17962
llvm-svn: 263082
This patch provides the following infrastructure for PGO enhancements in inliner:
Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.
Differential Revision: http://reviews.llvm.org/D16381
llvm-svn: 262636
The vectorization of first-order recurrences (r261346) caused PR26734. When
detecting these recurrences, we need to ensure that the previous value is
actually defined inside the loop. This patch includes the fix and test case.
llvm-svn: 262624
Summary:
This adds the beginning of an update API to preserve MemorySSA. In particular,
this patch adds a way to remove memory SSA accesses when instructions are
deleted.
It also adds relevant unit testing infrastructure for MemorySSA's API.
(There is an actual user of this API, i will make that diff dependent on this one. In practice, a ton of opt passes remove memory instructions, so it's hopefully an obviously useful API :P)
Reviewers: hfinkel, reames, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17157
llvm-svn: 262362
The cleanupret instruction has an invariant that it's 'from' operand be
a cleanuppad. This invariant was violated when we removed a dead block
which removed a cleanuppad leaving behind a cleanupret with an undef
'from' operand.
This was solved in r261731 by staving off the removal of the dead block
to a later pass.
However, it occured to me that we do not need to do this.
Instead, we can simply avoid processing the cleanupret if it has an
undef 'from' operand because we know that it will be removed soon.
llvm-svn: 261754
DeleteDeadBlock was called indiscriminately, leading to cleanuprets with
undef cleanuppad references.
Instead, try to drain the BB of most of it's instructions if it is
unreachable. We can then remove the BB if it solely consists of a
terminator (and maybe some phis).
llvm-svn: 261731
It is problematic if the inlinee has a cleanupret which unwinds to
caller and we inline it into a call site which doesn't unwind.
If the funclet unwinds anywhere other than to the caller,
then we will give the funclet two unwind destinations.
This will result in a verifier failure.
Seeing as how the caller wasn't an invoke (which would locally unwind)
and that the funclet cannot unwind to caller, we must conclude that an
'unwind to caller' cleanupret is dynamically unreachable.
This fixes PR26698.
Differential Revision: http://reviews.llvm.org/D17536
llvm-svn: 261656
Summary:
When we completely unroll a loop, it's pretty easy to update DT in-place and
thus avoid rebuilding it. DT recalculation is one of the most time-consuming
tasks in loop-unroll, so avoiding it at least in case of full unroll should be
beneficial.
On some extreme (but still real-world) tests this patch improves compile time by
~2x.
Reviewers: escha, jmolloy, hfinkel, sanjoy, chandlerc
Subscribers: joker.eph, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D17473
llvm-svn: 261595
The issue was that we only required LCSSA rebuilding if the immediate
parent-loop had values used outside of it. The fix is to enaable the
same logic for all outer loops, not only immediate parent.
llvm-svn: 261575
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.
Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators. (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)
There should be NFC here.
llvm-svn: 261498
Stop relying on `getNodePtrUnchecked()` being useful on invalid
iterators. This function is documented to be for internal use only, and
the pointer type will eventually have to change to remove UB from
ilist_iterator. Instead, check the iterator before it has been
invalidated.
llvm-svn: 261497
Cleanuppads may be merged together if one is the only predecessor of the
other in which case a simple transform can be performed: replace the
a cleanupret with a branch and remove an unnecessary cleanuppad.
Differential Revision: http://reviews.llvm.org/D17459
llvm-svn: 261390
This patch enables the vectorization of first-order recurrences. A first-order
recurrence is a non-reduction recurrence relation in which the value of the
recurrence in the current loop iteration equals a value defined in the previous
iteration. The load PRE of the GVN pass often creates these recurrences by
hoisting loads from within loops.
In this patch, we add a new recurrence kind for first-order phi nodes and
attempt to vectorize them if possible. Vectorization is performed by shuffling
the values for the current and previous iterations. The vectorization cost
estimate is updated to account for the added shuffle instruction.
Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16197
llvm-svn: 261346
routine.
We were getting this wrong in small ways and generally being very
inconsistent about it across loop passes. Instead, let's have a common
place where we do this. One minor downside is that this will require
some analyses like SCEV in more places than they are strictly needed.
However, this seems benign as these analyses are complete no-ops, and
without this consistency we can in many cases end up with the legacy
pass manager scheduling deciding to split up a loop pass pipeline in
order to run the function analysis half-way through. It is very, very
annoying to fix these without just being very pedantic across the board.
The only loop passes I've not updated here are ones that use
AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer.
They seemed less relevant.
With this patch, almost all of the problems in PR24804 around loop pass
pipelines are fixed. The one remaining issue is that we run simplify-cfg
and instcombine in the middle of the loop pass pipeline. We've recently
added some loop variants of these passes that would seem substantially
cleaner to use, but this at least gets us much closer to the previous
state. Notably, the seven loop pass managers is down to three.
I've not updated the loop passes using LoopAccessAnalysis because that
analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't
clear that those transforms want to support those forms anyways. They
all run late anyways, so this is harmless. Similarly, LSR is left alone
because it already carefully manages its forms and doesn't need to get
fused into a single loop pass manager with a bunch of other loop passes.
LoopReroll didn't use loop simplified form previously, and I've updated
the test case to match the trivially different output.
Finally, I've also factored all the pass initialization for the passes
that use this technique as well, so that should be done regularly and
reliably.
Thanks to James for the help reviewing and thinking about this stuff,
and Ben for help thinking about it as well!
Differential Revision: http://reviews.llvm.org/D17435
llvm-svn: 261316
more places to prevent gratuitous re-"runs" of these passes.
The passes themselves don't do any work when run, but we keep spending
time scheduling and running these needlessly when we really don't need
to do so.
This is the first patch towards fixing the really horrible loop pass
pipeline fragmentation pointed out by Sanjoy in PR24804.
llvm-svn: 261302