The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include:
1. IR InstCombine, sinking load operation to shorten lifetimes.
2. MIR LiveRangeShrink, similar to #1
3. MIR TwoAddressInstructionPass, i.e, opeq transform
4. MIR function argument copy elision
5. IR stack protection. (though not perf-critical but nice to have).
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95982
This is a special-case multiply that replicates bits of
the source operand. We need this fold to avoid regression
if we make canonicalization to `mul` more aggressive for
shl+or patterns.
I did not see a way to make Alive generalize the bit width
condition for even-number-of-bits only, but an example of
the proof is:
Name: i32
Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2))
%m = mul nuw i32 %x, C1
%t = lshr i32 %m, C2
=>
%t = and i32 %x, C1 - 2
Name: i14
%m = mul nuw i14 %x, 129
%t = lshr i14 %m, 7
=>
%t = and i14 %x, 127
https://rise4fun.com/Alive/e52
Instcombine will convert the nonnull and alignment assumption that use the boolean condtion
to an assumption that uses the operand bundles when knowledge retention is enabled.
Differential Revision: https://reviews.llvm.org/D82703
This is a yet another hint that we will eventually need InstCombineInverter,
which would consistently sink inversions, but but for that we'll need
to consistently hoist inversions where possible, so let's do that here.
Example of a proof: https://alive2.llvm.org/ce/z/78SbDq
See https://bugs.llvm.org/show_bug.cgi?id=48995
The constant trunc/ext may not be the optimal pre-condition,
but I think that handles the common cases.
Example of Alive2 proof:
https://alive2.llvm.org/ce/z/sREeLC
This is another step towards canonicalizing to the intrinsics.
Narrowing was identified as source of potential regression for
abs(), so we need to handle this for min/max - see:
https://llvm.org/PR48816
If this is not enough, we could process intrinsics in
the trunc-driven matching in canEvaluateTruncated().
We can sink extends after min/max if they match and would
not change the sign-interpreted compare. The only combo
that doesn't work is zext+smin/smax because the zexts
could change a negative number into positive:
https://alive2.llvm.org/ce/z/D6sz6J
Sext+umax/umin works:
define i32 @src(i8 %x, i8 %y) {
%0:
%sx = sext i8 %x to i32
%sy = sext i8 %y to i32
%m = umax i32 %sx, %sy
ret i32 %m
}
=>
define i32 @tgt(i8 %x, i8 %y) {
%0:
%m = umax i8 %x, %y
%r = sext i8 %m to i32
ret i32 %r
}
Transformation seems to be correct!
A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope.
When that is not the case for at least one of the two, the intrinsic call can as well be removed.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95141
Some utilities used by InstCombine, like SimplifyLibCalls, may add new
instructions and replace the uses of a call, but return nullptr because
the inserted call produces multiple results.
Previously, the replaced library calls would get removed by
InstCombine's deleter, but after
292077072e this may not happen, if the
willreturn attribute is missing.
As a work-around, update replaceInstUsesWith to set MadeIRChange, if it
replaces any uses. This catches the cases where it is used as replacer
by utilities used by InstCombine and seems useful in general; updating
uses will modify the IR.
This fixes an expensive-check failure when replacing
@__sinpif/@__cospifi with @__sincospif_sret.
In the motivating cases from https://llvm.org/PR48816 ,
we have a trailing trunc. But that is not required to
reduce the abs width:
https://alive2.llvm.org/ce/z/ECaz-p
...as long as we clear the int-min-is-poison bit (nsw).
We have some existing tests that are affected, and I'm
not sure what the overall implications are, but in general
we favor narrowing operations over preserving nsw/nuw.
If that causes problems, we could restrict this transform
based on type (shouldChangeType() and/or vector vs. scalar).
Differential Revision: https://reviews.llvm.org/D95235
Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
this decreasing instruction count.
Note that we could position this transformation as just hoisting
of the `not` (still, iff y is freely negatible), but the test changes
show a number of regressions, so let's not do that.
Iff we know we can get rid of the inversions in the new pattern,
we can thus get rid of the inversion in the old pattern,
this decreasing instruction count.
Relative to the original change, this adds a check that the
instruction on which we're replacing operands is safe to speculatively
execute, because that's what we're effectively doing. We're executing
the instruction with the replaced operand, which is fine if it's pure,
but not fine if can cause side-effects or UB (aka is not speculatable).
Additionally, we cannot (generally) replace operands in phi nodes,
as these may refer to a different loop iteration. This is also covered
by the speculation check.
-----
InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.
This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.
Among other things, this also subsumes D94860.
Differential Revision: https://reviews.llvm.org/D94862
This caused a miscompile in Chromium, see comments on the codereview for
discussion and pointer to a reproducer.
> InstCombine already performs a fold where X == Y ? f(X) : Z is
> transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
> if f(X) only has one use, then we can always directly replace the
> use inside the instruction. To actually be profitable, limit it to
> the case where Y is a non-expr constant.
>
> This could be further extended to replace uses further up a one-use
> instruction chain, but for now this only looks one level up.
>
> Among other things, this also subsumes D94860.
>
> Differential Revision: https://reviews.llvm.org/D94862
This also reverts the follow-up
a003f26539:
> [llvm] Prevent infinite loop in InstCombine of select statements
>
> This fixes an issue where the RHS and LHS the comparison operation
> creating the predicate were swapped back and forth forever.
>
> Differential Revision: https://reviews.llvm.org/D94934
This fixes an issue where the RHS and LHS the comparison operation
creating the predicate were swapped back and forth forever.
Differential Revision: https://reviews.llvm.org/D94934
InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.
This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.
Among other things, this also subsumes D94860.
Differential Revision: https://reviews.llvm.org/D94862
We can fold a ? b : false to a & b if is_poison(b) implies that
is_poison(a), at which point we're able to reuse all the usual fold
on ands. In particular, this covers the very common case of
icmp X, C && icmp X, C'. The same applies to ors.
This currently only has an effect if the
-instcombine-unsafe-select-transform=0 option is set.
Differential Revision: https://reviews.llvm.org/D94550
The load/store instruction will be transformed to amx intrinsics in the
pass of AMX type lowering. Prohibiting the pointer cast make that pass
happy.
Differential Revision: https://reviews.llvm.org/D94372
This is a more basic pattern that we should handle before trying to solve:
https://llvm.org/PR48640
There might be a better way to think about this because the pre-condition
that I came up with (number of sign bits in the compare constant) misses a
potential transform for each of ugt and ult as commented on in the test file.
Tried to model this is in Alive:
https://rise4fun.com/Alive/juX1
...but I couldn't get the ComputeNumSignBits() pre-condition to work as
expected, so replaced with leading 0/1 preconditions instead.
Name: ugt
Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
%a = ashr %x, C1
%r = icmp ugt i8 %a, C2
=>
%r = icmp slt i8 %x, 0
Name: ult
Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
%a = ashr %x, C1
%r = icmp ult i4 %a, C2
=>
%r = icmp sgt i4 %x, -1
Also approximated in Alive2:
https://alive2.llvm.org/ce/z/u5hCczhttps://alive2.llvm.org/ce/z/__szVL
Differential Revision: https://reviews.llvm.org/D94014
Currently make_early_inc_range cannot be used with iterators with
operator* implementations that do not return a reference.
Most notably in the LLVM codebase, this means the User iterator ranges
cannot be used with make_early_inc_range, which slightly simplifies
iterating over ranges while elements are removed.
Instead of directly using BaseT::reference as return type of operator*,
this patch uses decltype to get the actual return type of the operator*
implementation in WrappedIteratorT.
This patch also updates a few places to use make use of
make_early_inc_range.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93992
This patch
- Adds containsPoisonElement that checks existence of poison in constant vector elements,
- Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly
With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved.
Thanks!
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94053
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instruction. So amx intrinsics only operate on type x86_amx.
It can help to separate amx intrinsics from llvm IR instructions (+-*/).
Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981.
Differential Revision: https://reviews.llvm.org/D91927
As Mikael Holmén is noting in the post-commit review for the first fix
https://reviews.llvm.org/rGd4ccef38d0bb#967466
not hoisting constantexprs is not enough,
because if the xor originally was a constantexpr (i.e. X is a constantexpr).
`SimplifyAssociativeOrCommutative()` in `visitXor()` will immediately
undo this transform, thus again causing an infinite combine loop.
This transform has resulted in a surprising number of constantexpr failures.
This disables the poison-unsafe select -> and/or transform behind
a flag (we continue to perform the fold by default). This is intended
to simplify evaluation and testing while we teach various passes
to directly recognize the select pattern.
This only disables the main select -> and/or transform. A number of
related ones are instead changed to canonicalize to the a ? b : false
and a ? true : b forms which represent and/or respectively. This
requires a bit of care to avoid infinite loops, as we do not want
!a ? b : false to be converted into a ? false : b.
The basic idea here is the same as D93065, but keeps the change
behind a flag for now.
Differential Revision: https://reviews.llvm.org/D93840
As it is being reported (in post-commit review) in
https://reviews.llvm.org/D93857
this fold (as i expected, but failed to come up with test coverage
despite trying) has issues with constant expressions.
Since we only care about true constants, which constantexprs are not,
don't perform such hoisting for constant expressions.
Currently undef is used as a don’t-care vector when constructing a vector using a series of insertelement.
However, this is problematic because undef isn’t undefined enough.
Especially, a sequence of insertelement can be optimized to shufflevector, but using undef as its placeholder makes shufflevector a poison-blocking instruction because undef cannot be optimized to poison.
This makes a few straightforward optimizations incorrect, such as:
```
; https://bugs.llvm.org/show_bug.cgi?id=44185
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
%xv = insertelement <4 x float> %q, float %x, i32 2
%r = shufflevector <4 x float> %y, <4 x float> %xv, <4 x i32> { 0, 6, 2, undef }
ret <4 x float> %r ; %r[3] is undef
}
=>
define <4 x float> @insert_not_undef_shuffle_translate_commute(float %x, <4 x float> %y, <4 x float> %q) {
%r = insertelement <4 x float> %y, float %x, i32 1
ret <4 x float> %r ; %r[3] = %y[3], incorrect if %y[3] = poison
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source
```
I’d like to suggest
1. Using poison as insertelement’s placeholder value (IRBuilder::CreateVectorSplat should be patched too)
2. Updating shufflevector’s semantics to return poison element if mask is undef
Note that poison is currently lowered into UNDEF in SelDag, so codegen part is okay.
m_Undef() matches PoisonValue as well, so existing optimizations will still fire.
The only concern is hidden miscompilations that will go incorrect when poison constant is given.
A conservative way is copying all tests having `insertelement undef` & replacing it with `insertelement poison` & run Alive2 on it, but it will create many tests and people won’t like it. :(
Instead, I’ll simply locally maintain the tests and run Alive2.
If there is any bug found, I’ll report it.
Relevant links: https://bugs.llvm.org/show_bug.cgi?id=43958 , http://lists.llvm.org/pipermail/llvm-dev/2019-November/137242.html
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93586
This is one of the deficiencies that can be observed in
https://godbolt.org/z/YPczsG after D91038 patch set.
This exposed two missing folds, one was fixed by the previous commit,
another one is `(A ^ B) | ~(A ^ B) --> -1` / `(A ^ B) & ~(A ^ B) --> 0`.
`-early-cse` will catch it: https://godbolt.org/z/4n1T1v,
but isn't meaningful to fix it in InstCombine,
because we'd need to essentially do our own CSE,
and we can't even rely on `Instruction::isIdenticalTo()`,
because there are no guarantees that the order of operands matches.
So let's just accept it as a loss.
This reverts commit 899faa50f2.
Upon further consideration, this does not fix the right issue.
Doing this fold for non-inbounds GEPs is legal, because the
resulting pointer is still based-on null, which has no associated
address range, and as such and access to it is UB.
https://bugs.llvm.org/show_bug.cgi?id=48577#c3