Summary:
Loop idiom recognize tries to convert loops like
```
int foo(int x) {
int cnt = 0;
while (x) {
x >>= 1;
++cnt;
}
return cnt;
}
```
into calls to ctlz, but if x is initially negative this loop should be infinite.
It happens that the cases that motivated this change have an absolute value of x before the loop. So this patch restricts the transform to cases where we know x is positive. Note: We are relying on the absolute value of INT_MIN to be undefined so we can assume that the result is always positive.
Fixes PR37479
Reviewers: spatel, hfinkel, efriedma, javed.absar
Reviewed By: efriedma
Subscribers: dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D47348
llvm-svn: 333702
This is the planned enhancement to D47163 / rL333611.
We want to match cmp/select sizes because that will be recognized
as min/max more easily and lead to better codegen (especially for
vector types).
As mentioned in D47163, this improves some of the tests that would
also be folded by D46380, so we may want to adjust that patch to
match the new patterns where the extend op occurs after the select.
llvm-svn: 333689
Convert a vector load intrinsic into an llvm load instruction.
This is beneficial when the underlying object being addressed
comes from a constant, since we get constant-folding for free.
Differential Revision: https://reviews.llvm.org/D46273
llvm-svn: 333643
In post-commit review, Eric Christopher notes that many
new MSan warnings are being observed with this patch.
The probable reason is: if 'y' is undef here and we could
evaluate it twice and get different results.
We can't increase the number of uses of a value.
llvm-svn: 333631
Don't always:
cast (select (cmp x, y), z, C) --> select (cmp x, y), (cast z), C'
This is something that came up as far back as D26556, and I lost track of it.
I suspect that this transform is part of the underlying problem that is
inspiring some of the recent proposals that seek to match larger patterns
that include a cast op. Even if that's not true, this transform causes
problems for codegen (particularly with vector types).
A transform to actively match the size of cmp and select operand sizes should
follow. This patch just removes the harmful canonicalization in the other
direction.
Differential Revision: https://reviews.llvm.org/D47163
llvm-svn: 333611
Summary:
Fix PR37625. It's possible for an extern_weak declaration to be emitted
to the merged module when a definition exists in the ThinLTO portion of
the build; discard the linkage on the declaration in that case.
(otherwise we copy the linkage to the alias to the jumptable and fail)
Reviewers: pcc
Reviewed By: pcc
Subscribers: mehdi_amini, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47494
llvm-svn: 333604
Summary:
The isKnownNonZero() function have checks that abort the recursion when
it reaches the specified max depth. However one of the recursive calls
was placed before the max depth check was done, resulting in a endless
recursion that eventually triggered a segmentation fault.
Fixed the problem by moving the max depth check above the first
recursive call.
Reviewers: Prazek, nlopes, spatel, craig.topper, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, bjope, llvm-commits
Differential Revision: https://reviews.llvm.org/D47531
llvm-svn: 333557
Turning a table lookup intrinsic into a shuffle vector instruction
can be beneficial. If the mask used for the lookup is the constant
vector {7,6,5,4,3,2,1,0}, then the back-end generates byte reverse
instructions instead.
Differential Revision: https://reviews.llvm.org/D46133
llvm-svn: 333550
loop-cleanup passes at the beginning of the loop pass pipeline, and
re-enqueue loops after even trivial unswitching.
This will allow us to much more consistently avoid simplifying code
while doing trivial unswitching. I've also added a test case that
specifically shows effective iteration using this technique.
I've unconditionally updated the new PM as that is always using the
SimpleLoopUnswitch pass, and I've made the pipeline changes for the old
PM conditional on using this new unswitch pass. I added a bunch of
comments to the loop pass pipeline in the old PM to make it more clear
what is going on when reviewing.
Hopefully this will unblock doing *partial* unswitching instead of just
full unswitching.
Differential Revision: https://reviews.llvm.org/D47408
llvm-svn: 333493
be both simpler and substantially more efficient.
Rather than use a hand-rolled iteration technique that isn't quite the
same as RPO, use the pre-built RPO loop body traversal utility.
Once visiting the loop body in RPO, we can assert that we visit defs
before uses reliably. When this is the case, the only need to iterate is
when simplifying a def that is used by a PHI node along a back-edge.
With this patch, the first pass over the loop body is just a complete
simplification of every instruction across the loop body. When we
encounter a use of a simplified instruction that stems from a PHI node
in the loop body that has already been visited (due to some cyclic CFG,
potentially the loop itself, or a nested loop, or unstructured control
flow), we recall that specific PHI node for the second iteration.
Nothing else needs to be preserved from iteration to iteration.
On the second and later iterations, only instructions known to have
simplified inputs are considered, each time starting from a set of PHIs
that had simplified inputs along the backedges.
Dead instructions are collected along the way, but deleted in a batch at
the end of each iteration making the iterations themselves substantially
simpler. This uses a new batch API for recursively deleting dead
instructions.
This alsa changes the routine to visit subloops. Because simplification
is fundamentally transitive, we may need to visit the entire loop body,
including subloops, to handle knock-on simplification.
I've added a basic test file that helps demonstrate that all of these
changes work. It includes both straight-forward loops with
simplifications as well as interesting PHI-structures, CFG-structures,
and a nested loop case.
Differential Revision: https://reviews.llvm.org/D47407
llvm-svn: 333461
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
for i..
ForeBlocks(i)
for j..
SubLoopBlocks(i, j)
AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
for i... i+=2
ForeBlocks(i)
ForeBlocks(i+1)
for j..
SubLoopBlocks(i, j)
SubLoopBlocks(i+1, j)
AftBlocks(i)
AftBlocks(i+1)
Remainder
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 333358
Reverting this to see if this is causing the failures of the
clang-with-thin-lto-ubuntu bot.
[IPSCCP] Use PredicateInfo to propagate facts from cmp instructions.
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 333323
Libfuzzer tests have been fixed to prevent being optimized.
Original commit message:
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. N
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333300
Summary:
Look past debug intrinsics when querying whether an instruction is the
first instruction in the header block. The commit includes a reproducer
for a case where LICM would not hoist an instruction, due to the presence
of the intrinsic.
A caveat with this commit is that the check will not work properly if
the instruction at hand is a debug intrinsic. I assume that no one
depends on isGuaranteedToExecute() to return true for debug intrinsics
for these cases (and that this might be an indication of another debug
invariant issue), so I thought that it was not worth adding that extra
bit of complexity.
Reviewers: reames, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47197
llvm-svn: 333274
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 333268
Setting the "Debug Info Version" module flag makes it possible to pipe
synthetic debug info into llc, which is useful for testing backends.
llvm-svn: 333237
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. Need to check alive to make sure there are no corner cases.
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333226
Reassociation of math ops in some contexts (especially vector contexts)
has generally only been happening when the 'fast' FMF was set. This
enables reassoication when only the finer grained controls 'reassoc' and
'nsz' are set.
Differential Revision: https://reviews.llvm.org/D47335
llvm-svn: 333221
Summary:
In LICM, CFG could be changed in splitPredecessorsOfLoopExit(), which update
only DT and LoopInfo. Therefore, we should preserve only DT and LoopInfo specifically,
instead of all analyses that depend on the CFG (setPreservesCFG()).
This change should fix PR37323.
Reviewers: uabelho, davide, dberlin, Ka-Ka
Reviewed By: dberlin
Subscribers: mzolotukhin, bjope, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46775
llvm-svn: 333198
The ARM/ARM64 AESE and AESD instructions have a builtin XOR as the first step in
the instruction. Therefore, if the AES key is zero and the AES data was
previously XORed, it can be combined into a single instruction.
Differential Revision: https://reviews.llvm.org/D47239
Patch by Michael Brase!
llvm-svn: 333193
Summary:
If NaryReassociate succeed it will, when replacing the old instruction
with the new instruction, also recursively delete trivially
dead instructions from the old instruction. However, if the input to the
NaryReassociate pass contain dead code it is not save to recursively
delete trivially deadinstructions as it might lead to deleting the newly
created instruction.
This patch will fix the problem by using WeakVH to detect this
rare case, when the newly created instruction is dead, and it will then
restart the basic block iteration from the beginning.
This fixes pr37539
Reviewers: tra, meheff, grosser, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47139
llvm-svn: 333155
Summary:
StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order.
The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops.
To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited
before moving on to outer loop.
However, we found a problem for a SubRegion which is a loop itself:
--> BB1 --> BB2 --> BB3 -->
In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead
BB2 to be placed in the wrong order.
In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth
to guard the sorting.
Reviewers:
arsenm, jlebar
Differential Revision:
https://reviews.llvm.org/D46912
llvm-svn: 333111
Summary:
Finally fixes [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]].
Now that the backend is all done, we can finally fold it!
The canonical unfolded masked merge pattern is
```(x & m) | (y & ~m)```
There is a second, equivalent variant:
```(x | ~m) & (y | m)```
Only one of them (the or-of-and's i think) is canonical.
And if the mask is not a constant, we should fold it to:
```((x ^ y) & M) ^ y```
https://rise4fun.com/Alive/ndQw
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: nicholas, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D46814
llvm-svn: 333106
Loop unswitching makes substantial changes to a loop that can also affect cached
SCEV info in its outer loops as well, but it only cares to invalidate SCEV cache for the
innermost loop in case of full unswitching and does not invalidate anything at all in
case of trivial unswitching. As result, we may end up with incorrect data in cache.
Differential Revision: https://reviews.llvm.org/D46045
Reviewed By: mzolotukhin
llvm-svn: 333072
Summary:
Patch for capture tracking broke
bootstrap of clang with -fstict-vtable-pointers
which resulted in debbugging nightmare. It was fixed
https://reviews.llvm.org/D46900 but as it turned
out, there were other parts like inliner (computing of
noalias metadata) that I found after bootstraping with enabled
assertions.
Reviewers: hfinkel, rsmith, chandlerc, amharc, kuhar
Subscribers: JDevlieghere, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47088
llvm-svn: 333070
Also, produce the canonical IR abs (s<0) to be more efficient.
This is the libcall equivalent of the clang builtin change from:
rL333038
Pasting from that commit message:
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."
That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.
llvm-svn: 333042
Summary: Previous patch does not care if a value is changed between calloc and strlen. This needs to be removed from InstCombine and maybe moved to DSE later after some rework.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47218
llvm-svn: 333022
This patch fixes two bugs:
* test1: Previously assume(a >= 5) concluded that a == 5. That's only
valid for assume(a == 5)...
* test2: If operands were swapped, additional users were added to the
wrong cmp operand. This resulted in an "unsettled iteration"
assertion failure.
Patch by Nikita Popov
Differential Revision: https://reviews.llvm.org/D46974
llvm-svn: 333007
Summary:
When lowerswitch merge several cases into a new default block it's not
updating the PHI nodes accordingly. The code that update the PHI nodes
for the default edge only update the first entry and do not remove the
remaining ones, to make sure the number of entries match the number of
predecessors.
This is easily fixed by replacing the code that update the PHI node with
the already existing utility function for updating PHI nodes.
Reviewers: hans, reames, arsenm
Reviewed By: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D47055
llvm-svn: 332960
Summary:
In LoopVersioning::addPHINodes we need to iterate over all
users for a value "Inst", and if the user is outside of the
VersionedLoop we should replace the use of "Inst" by using
the value "PN" instead.
Replacing the use of "Inst" for a user of "Inst" also means
that Inst->users() is modified. So it is not safe to do the
replace while iterating over Inst->users() as we used to do.
This patch splits the task into two steps. First we iterate
over Inst->users() to find all users that should be updated.
Those users are saved into a local data structure on the stack.
And then, in the second step, we do the actual updates. This
time iterating over the local data structure.
Reviewers: mzolotukhin, anemet
Reviewed By: mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47134
llvm-svn: 332958
We can eliminate old value if bound_ctrl = 1 and row_mask = bank_mask = 0xf.
This is alternative implementation working with the intrinsic in InstCombine.
Original review for past-ISel optimization: D46570.
Differential Revision: https://reviews.llvm.org/D46596
llvm-svn: 332956
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.
Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.
llvm-svn: 332953
In all cases, we're pulling the cast above the select.
That's not a good canonicalization if we're creating
a select that then mismatches the operand size of its
condition.
llvm-svn: 332883