Summary:
As with other similar intrinsics, the presence of strip.invariant.group or
launder.invariant.group should not change the inlining cost.
This is because they are just markers and do not perform any computation.
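A rough sketch (not the actual CallAnalyzer code) of treating these marker
intrinsics as free inside the per-instruction inline-cost loop:
  if (const auto *II = dyn_cast<IntrinsicInst>(&I)) {
    switch (II->getIntrinsicID()) {
    case Intrinsic::launder_invariant_group:
    case Intrinsic::strip_invariant_group:
      continue; // pure markers: contribute nothing to the cost
    default:
      break;
    }
  }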
Reviewers: amharc, rsmith, reames, kuhar
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D51814
llvm-svn: 341725
AliasSetTracker has special case handling for memset, memcpy and memmove which pre-existed argmemonly on functions and readonly and writeonly on arguments. This patch generalizes it using the AA infrastructure to any call correctly annotated.
The motivation here is to cut down on confusion, not performance per se. For most instructions, there is a direct mapping to alias set. However, this is not guaranteed by the interface and was not in fact true for these three intrinsics *and only these three intrinsics*. I kept getting myself confused about this invariant, so I figured it would be good to clearly distinguish between instructions and alias sets. Calls happened to be an easy target.
The nice side effect is that custom implementations of memset/memcpy/memmove - including wrappers discovered by IPO - can now be optimized the same as builtins by LICM.
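As a rough sketch (not the code from the patch; addCallPointer is a
hypothetical helper and CS is the call site in hand), the generalized path
queries AA about the call's behavior instead of matching specific intrinsics:
  FunctionModRefBehavior MRB = AA.getModRefBehavior(CS);
  if (AAResults::onlyAccessesArgPointers(MRB))
    for (unsigned ArgIdx = 0, E = CS.arg_size(); ArgIdx != E; ++ArgIdx)
      if (CS.getArgOperand(ArgIdx)->getType()->isPointerTy())
        addCallPointer(CS.getArgOperand(ArgIdx),
                       AA.getArgModRefInfo(CS, ArgIdx));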
Note: The actual removal of the memset/memtransfer-specific handling will happen in a follow-on NFC patch. It was originally part of this one, but was split out for ease of review and rebase.
Differential Revision: https://reviews.llvm.org/D50730
llvm-svn: 341713
shuf (sel (shuf NarrowCond, undef, WideMask), X, Y), undef, NarrowMask -->
sel NarrowCond, (shuf X, undef, NarrowMask), (shuf Y, undef, NarrowMask)
The motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38691
...is the last regression test. In that case, we're just left with the narrow select.
Note that if we do create new shuffles, they use the existing extraction identity mask,
so there's no danger that this transform creates arbitrary shuffles.
Differential Revision: https://reviews.llvm.org/D51496
llvm-svn: 341708
If the ~X wasn't able to simplify above the max/min, we might be able to simplify it by moving it below the max/min.
I had to modify the ~(min/max ~X, Y) transform to prevent getting stuck in a loop when we saw the new ~(max/min X, ~Y) before the ~Y had been folded away to remove the new not.
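For example (illustrative, not one of the new tests), when the other operand
is a constant the not can be pushed below the max/min and folded into it:
max(~X, C) --> ~min(X, ~C)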
Differential Revision: https://reviews.llvm.org/D51398
llvm-svn: 341674
Fix a latent bug in the loop vectorizer which generates incorrect code for
memory accesses that are executed conditionally. As pointed out in review,
this bug definitely affects uniform loads and may affect conditional
stores that should have been turned into scatters as well.
The code gen for conditionally executed uniform loads on architectures
that support masked gather instructions is broken.
Without this patch, we were unconditionally executing the *conditional*
load in the vectorized version.
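An illustrative C shape of such a loop (not taken from the patch): the
pointer p is uniform across vector lanes, but the load through it must only
execute when the predicate holds:
  for (int i = 0; i < n; ++i)
    if (cond[i])
      out[i] = *p; // uniform, conditionally executed load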
This patch does the following:
1. Uniform conditional loads on architectures with gather support will
have correct code generated. In particular, the cost model
(setCostBasedWideningDecision) is fixed.
2. For the recipes which are handled after the widening decision is set,
we use the isScalarWithPredication(I, VF) form which is added in the
patch.
3. Fix the vectorization cost model for scalarization
(getMemInstScalarizationCost): implement and use isPredicatedInst to
identify *all* predicated instructions, not just scalar+predicated. So,
now the cost for scalarization will be increased for masked loads/stores
and gather/scatter operations. In short, we should be choosing
gather/scatter in place of scalarization on architectures where it is
profitable.
4. We needed to weaken the assert in useEmulatedMaskMemRefHack.
Reviewers: Ayal, hsaito, mkuper
Differential Revision: https://reviews.llvm.org/D51313
llvm-svn: 341673
Find cold blocks based on profile information (or optionally with static analysis).
Forward propagate profile information to all cold-blocks.
Outline a cold region.
Set calling conv and prof hint for the callsite of the outlined function.
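A hypothetical example of the shape this targets (names are illustrative):
the else-branch is cold per profile, so it can be outlined into its own
function whose call site gets a cold calling convention and profile hint:
  if (status == OK) {        // hot, per profile
    do_work();
  } else {                   // cold region: outlined
    log_error(status);
    abort_with(status);
  }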
Worked in collaboration with: Sebastian Pop <s.pop@samsung.com>
Differential Revision: https://reviews.llvm.org/D50658
llvm-svn: 341669
If OtherOpT or OtherOpF has a scalar type and the condition is a vector,
we would create an invalid select.
Reviewers: spatel, john.brawn, mssimpso, craig.topper
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D51781
llvm-svn: 341666
Currently eliminateInstructions only returns true if any instruction got
replaced. In the test case for this patch, we eliminate the trivially
dead calls, for which eliminateInstructions does not do a replacement, so
the function is not marked as changed, which is why the inliner crashes
while traversing the call graph.
Alternatively we could also change eliminateInstructions to return true
in case we mark instructions for deletion, but that's slightly more code
and doing it at the place where the replacement happens seems safer.
Fixes PR37517.
Reviewers: davide, mcrosier, efriedma, bjope
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D51169
llvm-svn: 341651
IndVars does not set the `Changed` flag when it eliminates dead instructions. As a result,
it may make IR modifications and report that it has done nothing. This leads to inconsistent
preserved-analyses results.
Differential Revision: https://reviews.llvm.org/D51770
Reviewed By: skatkov
llvm-svn: 341633
The patch tries to make sample profile loader independent of profile format
change. It moves compact format related code into FunctionSamples and
SampleProfileReader classes, and sample profile loader only has to interact
with those two classes and will be unaware of profile format changes.
The cleanup also contains some fixes to further remove the difference between
compactbinary format and binary format. After the cleanup, using different
formats originating from the same profile will generate the same binaries,
which we verified by compiling two large server benchmarks with and without ThinLTO.
Differential Revision: https://reviews.llvm.org/D51643
llvm-svn: 341591
This fold is needed to avoid a regression when we try
to recommit rL300977.
We can't see the most basic win currently because
demanded bits changes the patterns:
https://rise4fun.com/Alive/plpp
llvm-svn: 341559
Previously the alignment on the newly created global strings was not set,
meaning that DataLayout::getPreferredAlignment was free to overalign it
to 16 bytes. This caused unnecessary code bloat with the padding between
variables.
The main example of this happening was the printf->puts optimisation in
SimplifyLibCalls, but as the change here is made in
IRBuilderBase::CreateGlobalString, other globals using this will now be
aligned too.
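A minimal sketch against the C++ API of that time (setAlignment then took a
plain unsigned; not the exact patch):
  GlobalVariable *GV = Builder.CreateGlobalString(Str, "str");
  GV->setAlignment(1); // byte alignment; avoids 16-byte over-alignment padding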
Differential Revision: https://reviews.llvm.org/D51410
llvm-svn: 341527
I'm probably missing some way to use m_Deferred to remove the code
duplication, but that can be a follow-up.
The improvement in demand_shrink_nsw.ll is an example of missing
the fold because the pattern matching was deficient. I didn't try
to follow the bits in that test, but Alive says it's correct:
https://rise4fun.com/Alive/ugc
llvm-svn: 341426
Reland r341269. Use std::stable_sort when sorting constant candidates.
Reverting commit, r341365:
Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.
Original commit, r341269:
[Constant Hoisting] Hoisting Constant GEP Expressions
Leverage existing logic in constant hoisting pass to transform constant GEP
expressions sharing the same base global variable. Multi-dimensional GEPs are
rewritten into single-dimensional GEPs.
https://reviews.llvm.org/D51396
Differential Revision: https://reviews.llvm.org/D51654
llvm-svn: 341417
This is a fix for PR38786.
First order recurrence phis were incorrectly treated as uniform,
which caused them to be vectorized as uniform instructions.
Patch by Ayal Zaks and Orivej Desh!
Reviewed by: Anna
Differential Revision: https://reviews.llvm.org/D51639
llvm-svn: 341416
There are 2 bugs shown here that were untested before:
1. We fail to perform the fold in 1/2 the possible commuted variants.
2. When the fold is done, it disregards extra uses.
llvm-svn: 341415
Recent change to deleteDeadBlocksFromLoop was not enough to
fix all the problems related to dead blocks after nontrivial
unswitching of switches.
We need to delete all the dead blocks that were created during
unswitching, otherwise we will keep having problems with phi's
or dead blocks.
This change removes all the dead blocks that are reachable from the loop,
without trying to track whether these blocks are newly created by unswitching
or not. While not completely precise, we are unlikely to end up with stray
but reachable dead blocks that do not belong to our loop nest.
It does fix all the failures currently known, in particular PR38778.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D51519
llvm-svn: 341398
The tests attempted to check for commuted variants
of these folds, but complexity-based canonicalization
meant we had no coverage for at least 1/2 of the cases.
Also, the folds correctly check hasOneUse(), but there
was no coverage for that.
llvm-svn: 341394
Summary:
Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.
if (hot_cond1) { // Likely true.
  do_stg_hot1();
}
if (hot_cond2) { // Likely true.
  do_stg_hot2();
}
->
if (hot_cond1 && hot_cond2) { // Hot path.
  do_stg_hot1();
  do_stg_hot2();
} else { // Cold path.
  if (hot_cond1) {
    do_stg_hot1();
  }
  if (hot_cond2) {
    do_stg_hot2();
  }
}
This speeds up some internal benchmarks up to ~30%.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D50591
llvm-svn: 341386
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.
llvm-svn: 341365
Load Hardening.
Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.
Most of the churn here is adding the IR attribute. I talked about this
with Kristof Beyls, and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.
While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.
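A minimal sketch (assuming the LLVM C++ API) of enabling this per function
via the new attribute rather than the internal flag:
  F.addFnAttr(Attribute::SpeculativeLoadHardening);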
This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.
Differential Revision: https://reviews.llvm.org/D51157
llvm-svn: 341363
This patch removes the function `expandSCEVIfNeeded`, which does not behave as
intended. This function tries to look up an exact existing expansion
and only goes to normal expansion via `expandCodeFor` if this lookup hasn't found
anything. As a result of this, if some instruction above the loop has a `SCEVConstant`
SCEV, this logic will return this instruction when asked for this `SCEVConstant` rather
than return a constant value. This is both non-profitable and in some cases leads to
breach of LCSSA form (as in PR38674).
Whether or not it is possible to break LCSSA with this algorithm and with some
non-constant SCEVs is still an open question and is being investigated. I wasn't
able to construct such a test so far, so maybe this situation is impossible. If it
is possible, it will be addressed in a separate fix.
Rather than doing this lookup, it is always correct to just invoke `expandCodeFor`
unconditionally: it behaves more smartly about insertion points, and as a side effect
it will choose a constant value for SCEVConstants. For other SCEVs it may end up
finding a better insertion point. So it should not be worse in any case.
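Roughly what the simplified path boils down to (a sketch, not the exact call
site; S, InsertPt and Expander stand for the SCEV, insertion point and
SCEVExpander already in hand):
  Value *Expanded = Expander.expandCodeFor(S, S->getType(), InsertPt);
For a SCEVConstant this yields a plain constant instead of reusing an
instruction defined above the loop.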
NOTE: So far the only known case for which this transform may break LCSSA is mapping
of a SCEVConstant to an instruction. However, there is a suspicion that the entire algorithm
can compromise LCSSA form in other cases as well (not yet proven).
Differential Revision: https://reviews.llvm.org/D51286
Reviewed By: etherzhhb
llvm-svn: 341345
The fold was implemented for the general case with a use-limitation, but
the later constant version, which didn't check uses, was only
matching splat constants.
llvm-svn: 341292
If we have a pair of binops feeding another pair of binops, rearrange the operands so
the matching pair are together because that allows easy factorization folds to happen
in instcombine:
((X << S) & Y) & (Z << S) --> ((X << S) & (Z << S)) & Y (reassociation)
--> ((X & Z) << S) & Y (factorize shift from 'and' ops optimization)
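A source-level shape of the motivating pattern (illustrative only, not a
test from the patch):
  unsigned f(unsigned x, unsigned y, unsigned z, unsigned s) {
    // After reassociate pairs the shifted operands, instcombine can
    // factor the shift: ((x & z) << s) & y.
    return ((x << s) & y) & (z << s);
  }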
This is part of solving PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098
Note that there's an instcombine version of this patch attached there, but we're trying
to make instcombine have less responsibility to improve compile-time efficiency.
For reasons I still don't completely understand, reassociate does this kind of transform
sometimes, but misses everything in my motivating cases.
This patch on its own is gluing an independent cleanup chunk to the end of the existing
RewriteExprTree() loop. We can build on it and do something stronger to better order the
full expression tree like D40049. That might be an alternative to the proposal to add a
separate reassociation pass like D41574.
Differential Revision: https://reviews.llvm.org/D45842
llvm-svn: 341288
Leverage existing logic in constant hoisting pass to transform constant GEP
expressions sharing the same base global variable. Multi-dimensional GEPs are
rewritten into single-dimensional GEPs.
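An illustrative C-level shape (hypothetical code, not a test from the
patch): both accesses index the same global g, so their constant GEPs share
a base that can be hoisted, with each access rewritten as a flat offset from
it:
  extern int g[1024][1024];
  int sum() { return g[10][20] + g[30][40]; }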
Differential Revision: https://reviews.llvm.org/D51396
llvm-svn: 341269
It has essentially the same benefit it has on 64-bit ARM: it
substantially reduces the number of constants used by large GEP
operations. Seems to be generally helpful across a few different
codebases I've tried.
Differential Revision: https://reviews.llvm.org/D51462
llvm-svn: 341136
Summary:
Currently, the SafeStack analysis disallows out-of-bounds writes but not
out-of-bounds reads for mem intrinsics like llvm.memcpy. This could
cause leaks of pointers to the safe stack by leaking spilled registers/
frame pointers. Check for allocas used as source or destination pointers
to mem intrinsics.
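A hypothetical C example of the newly rejected pattern: the memcpy reads
well past the 8-byte alloca, so neighbouring unsafe-stack contents (such as
spilled registers or frame pointers) could be copied out:
  #include <string.h>
  void leak(char *out) {
    char buf[8];
    memcpy(out, buf, 64); // out-of-bounds read of the alloca
  }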
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: pcc, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D51334
llvm-svn: 341116
Generalize the simplification of `pow(2.0, y)` to `pow(2.0 ** n, y)` for all
scalar and vector types.
This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on
x86-64 or A64.
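For instance (an illustrative case, not a test from the patch),
`pow(16.0, x)`, whose base is `2.0 ** 4`, can now be expanded along the lines
of `exp2(4.0 * x)`, for vector operands as well as scalars.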
Differential revision: https://reviews.llvm.org/D49273
llvm-svn: 341095
Splitting an alloca can decrease the alignment of GEPs into the
partition. Normally, rewriting accounts for this, but the code was
missing for uses of PHI nodes and select instructions.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38707 .
Differential Revision: https://reviews.llvm.org/D51335
llvm-svn: 341094
We now only add +64bit to the CPU string for the "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've removed the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or the user passing -mattr=+64bit.
I've changed the assert to a report_fatal_error so it's not lost in Release builds.
The test updates are to fix things that tripped the new error.
Differential Revision: https://reviews.llvm.org/D51231
llvm-svn: 341022
The cost modeling was not accounting for the fact we were duplicating the instruction once per predecessor. With a default threshold of 1, this meant we were actually creating #pred copies.
Adding to the fun, there is *absolutely no* test coverage for this. Simply bailing for more than one predecessor passes all checked in tests.
llvm-svn: 341001
Teach LICM to hoist stores out of loops when the store writes to a location otherwise unused in the loop, writes a value which is invariant, and is guaranteed to execute if the loop is entered.
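An illustrative loop shape (not one of the new tests): the location *flag is
not otherwise read or written in the loop, the stored value is invariant, and
the store runs on every iteration, so it can execute once outside the loop
instead:
  for (int i = 0; i < n; ++i) {
    *flag = 1;   // hoistable under the conditions above
    sum += a[i];
  }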
Worth noting is that this transformation is partially overlapping with the existing promotion transformation. Reasons this is worthwhile anyway include:
* For multi-exit loops, this doesn't require duplication of the store.
* It kicks in for cases where we can't prove we exit through a normal exit (i.e. we may throw), but can prove the store executes before that possible side exit.
Differential Revision: https://reviews.llvm.org/D50925
llvm-svn: 340974
Summary:
Assert from PR38737 happens on the dead block inside the parent loop
after unswitching nontrivial switch in the inner loop.
deleteDeadBlocksFromLoop now takes extra care to detect/remove dead
blocks in all the parent loops in addition to the blocks from original
loop being unswitched.
Reviewers: asbirlea, chandlerc
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51415
llvm-svn: 340955
This is a follow-up to rL339604 which did the same transform
for a sin libcall. The handling of intrinsics vs. libcalls
is unfortunately scattered, so I'm just adding this next to
the existing transform for llvm.cos for now.
This should resolve PR38458:
https://bugs.llvm.org/show_bug.cgi?id=38458
If the call was already negated, the negates will cancel
each other out.
llvm-svn: 340952