In InstCombine, convert AArch64 SVE LD1/ST1 intrinsics to
llvm.masked.load/llvm.masked.store, and further to plain load/store
when the predicate operand is a ptrue-all pattern.
This allows existing IR optimizations such as dead-load removal to
occur.
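A minimal sketch of the intent (element type, alignment, and the i32*
pointer style are chosen for illustration; the actual patch covers the
full set of LD1/ST1 intrinsics):
```
; SVE ld1 whose governing predicate is ptrue-all (31 == SV_ALL):
define <vscale x 4 x i32> @ld1_all(i32* %addr) {
  %pg = call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
  %v = call <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1> %pg, i32* %addr)
  ret <vscale x 4 x i32> %v
}
; With an all-active predicate, the target intrinsic can become a plain
; load (via llvm.masked.load) that generic IR passes understand:
;   %v = load <vscale x 4 x i32>, <vscale x 4 x i32>* %cast
declare <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32)
declare <vscale x 4 x i32> @llvm.aarch64.sve.ld1.nxv4i32(<vscale x 4 x i1>, i32*)
```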
Differential Revision: https://reviews.llvm.org/D113489
For the scalar/splat case, this fold is subsumed by
foldLogOpOfMaskedICmps(). However, the conjugated fold for "or"
also supports splats with undef. Make both code paths consistent
by using m_ZeroInt() for the "and" implementation as well.
https://alive2.llvm.org/ce/z/tN63cu
https://alive2.llvm.org/ce/z/ufB_Ue
Tests for:
```
(a | ~(b & c)) & ~(a & (b ^ c)) --> ~(a | b) | (a ^ b ^ c)
(~(a & b) | c) & ~(a & (b & c)) --> ~(a & b)
(~(a & b) | c) & ~(b & (a & c)) --> ~(a & b)
(~a | b | c) & ~(a & b & c) --> ~a | (b ^ c)
(~a | b | c) & ~(a & b) --> (c & ~b) | ~a
```
When folding and/or of icmps, look through add of a constant and
adjust the icmp range instead. Effectively, this decomposes
X + C1 < C2 style range checks back into a normal range. This allows
us to fold comparisons involving two range checks or one range check
and some other condition. We had a fold for a really specific case
of this (an or of a range check and an eq, and only on one side!),
while this handles it in full generality.
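For example (a hand-picked illustration, not a test from the patch),
two decomposed range checks over adjacent ranges can merge into one:
```
define i1 @src(i8 %x) {
  %a1 = add i8 %x, -5
  %c1 = icmp ult i8 %a1, 10   ; x in [5, 15)
  %a2 = add i8 %x, -15
  %c2 = icmp ult i8 %a2, 10   ; x in [15, 25)
  %r = or i1 %c1, %c2         ; x in [5, 25)
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %a = add i8 %x, -5
  %r = icmp ult i8 %a, 20
  ret i1 %r
}
```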
Differential Revision: https://reviews.llvm.org/D113510
Since there is just a single use check for the LHS in the
~(A | B) & C | ... transforms, but multiple RHS checks inside (with
more coming), I am removing the m_OneUse checks for the LHS and adding
new checks for the RHS. This is acceptable as long as there is a net
benefit.
In addition, the checks for (~(A | B) & C) | (~(A | C) & B) --> (B ^ C) & ~A
were overly restrictive; it should be good without any
additional checks.
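As an illustration of that fold (a sketch, not one of the committed
tests):
```
define i8 @src(i8 %a, i8 %b, i8 %c) {
  %or1  = or i8 %a, %b
  %not1 = xor i8 %or1, -1
  %and1 = and i8 %not1, %c    ; ~(A | B) & C
  %or2  = or i8 %a, %c
  %not2 = xor i8 %or2, -1
  %and2 = and i8 %not2, %b    ; ~(A | C) & B
  %r    = or i8 %and1, %and2
  ret i8 %r
}

define i8 @tgt(i8 %a, i8 %b, i8 %c) {
  %xor  = xor i8 %b, %c
  %nota = xor i8 %a, -1
  %r    = and i8 %xor, %nota  ; (B ^ C) & ~A
  ret i8 %r
}
```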
Differential Revision: https://reviews.llvm.org/D113141
A problem was noted in the post-commit review for
c36b7e21bd / D113035:
If the source type is not an integer or integer vector,
then we could crash when calling ComputeNumSignBits().
Previously, InstCombine detected a pair of llvm.stacksave/stackrestore
instructions that are adjacent modulo debug instructions in order to
eliminate the llvm.stackrestore. This misses situations where
intervening instructions (e.g. loads) prevent the llvm.stacksave and
llvm.stackrestore from becoming adjacent. This commit extends the logic
and allows for eliminating the llvm.stackrestore when the range of
instructions between them does not include any alloca or side-effect
causing instructions.
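A minimal sketch of the newly handled shape (hypothetical function;
only the intervening load matters):
```
define i32 @f(i32* %p) {
  %ss = call i8* @llvm.stacksave()
  %v = load i32, i32* %p       ; intervening, but no alloca or side effects
  call void @llvm.stackrestore(i8* %ss)
  ret i32 %v                   ; the stackrestore can now be eliminated
}
declare i8* @llvm.stacksave()
declare void @llvm.stackrestore(i8*)
```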
Signed-off-by: Itay Bookstein <itay.bookstein@nextsilicon.com>
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D113105
Op0 - umax(X, Op0) --> 0 - usub.sat(X, Op0)
I'm not sure if this is really an improvement in IR because
we probably have better recognition/analysis for min/max,
but this lines up with the fold we do for the icmp+select
idiom and removes another diff from D98152.
This is similar to the previous fold in the code that was
added with:
83c2fb9f66baa6a85130
https://alive2.llvm.org/ce/z/5MrVB9
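A sketch of the fold in IR (illustrative types):
```
define i8 @src(i8 %op0, i8 %x) {
  %m = call i8 @llvm.umax.i8(i8 %x, i8 %op0)
  %r = sub i8 %op0, %m
  ret i8 %r
}

define i8 @tgt(i8 %op0, i8 %x) {
  %s = call i8 @llvm.usub.sat.i8(i8 %x, i8 %op0)
  %r = sub i8 0, %s
  ret i8 %r
}

declare i8 @llvm.umax.i8(i8, i8)
declare i8 @llvm.usub.sat.i8(i8, i8)
```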
(Cond & C) | (~bitcast(Cond) & D) --> bitcast (select Cond, (bc C), (bc D))
This is part of fixing:
https://llvm.org/PR34047
That report shows a case where a bitcast is sitting between the select condition
candidate and its 'not' value due to current cast canonicalization rules.
There's a bitcast type restriction that might be violated in existing matching,
but I still need to investigate if that is possible.
Alive2 shows we can only do this transform safely when the bitcast is from
narrow to wide vector elements (otherwise poison could leak into elements
that were safe in the original code):
https://alive2.llvm.org/ce/z/Hf66qh
Differential Revision: https://reviews.llvm.org/D113035
InstCombine converts range tests of the form (X > C1 && X < C2) or
(X < C1 || X > C2) into checks of the form (X + C3 < C4) or
(X + C3 > C4). It is possible to express all range tests in either
of these forms (with different choices of constants), but currently
neither of them is considered canonical. We may have equivalent
range tests using either ult or ugt.
This proposes to canonicalize all range tests to use ult. An
alternative would be to canonicalize to either ult or ugt depending
on the specific constants involved -- e.g. in practice we currently
generate ult for && style ranges and ugt for || style ranges when
going through the insertRangeTest() helper. In fact, the "clamp-like"
fold was relying on this, which is why I had to tweak it to not
assume whether inversion is needed based on just the predicate.
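For example (constants chosen for illustration), a ugt-style range
test becomes the equivalent ult form:
```
define i1 @src(i8 %x) {
  %a = add i8 %x, 10
  %r = icmp ugt i8 %a, 19    ; x in [10, 246)
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %a = add i8 %x, -10
  %r = icmp ult i8 %a, -20   ; -20 == 236; same range [10, 246)
  ret i1 %r
}
```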
Proof: https://alive2.llvm.org/ce/z/_SP_rQ
Differential Revision: https://reviews.llvm.org/D113366
As described in https://bugs.llvm.org/show_bug.cgi?id=52429 this
fold is incorrect, because inbounds only guarantees that the
pointers don't wrap in the unsigned space: it is possible that
the sign boundary is crossed by an object.
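A hypothetical counterexample (imagining an 8-bit address space for
brevity): let an object span [0x7E, 0x86), with %p at 0x7E (signed +126).
```
define i1 @cmp(i8* %p) {
  %q = getelementptr inbounds i8, i8* %p, i8 4
  ; %q = 0x82 (signed -126): %p u< %q holds, but %p s< %q does not,
  ; so deriving this signed compare's result from inbounds is unsound.
  %r = icmp slt i8* %p, %q
  ret i1 %r
}
```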
I'm dropping the fold entirely rather than adjusting it, because
computePointerICmp() fully subsumes it (just with correct predicate
handling).
Differential Revision: https://reviews.llvm.org/D113343
umax(X, Op1) - Op1 --> usub.sat(X, Op1)
https://alive2.llvm.org/ce/z/HpcGiJ
This happens in 2 or more steps with an icmp-select idiom
instead of an intrinsic. This is another step towards
canonicalization of the min/max intrinsics. See:
D98152
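A sketch of the fold, with the icmp+select spelling it previously
required shown for contrast (illustrative types):
```
define i8 @src(i8 %x, i8 %op1) {
  %m = call i8 @llvm.umax.i8(i8 %x, i8 %op1)
  %r = sub i8 %m, %op1
  ret i8 %r
}

; equivalent icmp+select idiom for the same computation:
;   %c = icmp ugt i8 %x, %op1
;   %m = select i1 %c, i8 %x, i8 %op1
;   %r = sub i8 %m, %op1

define i8 @tgt(i8 %x, i8 %op1) {
  %r = call i8 @llvm.usub.sat.i8(i8 %x, i8 %op1)
  ret i8 %r
}

declare i8 @llvm.umax.i8(i8, i8)
declare i8 @llvm.usub.sat.i8(i8, i8)
```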
There is a combine in instcombine to transform a saturated add/sub into
a saddsat/ssubsat, currently handling inputs which are both sign
extended (https://alive2.llvm.org/ce/z/68qpTn). This can generalize to,
for example, an ashr of at least the bitwidth (https://alive2.llvm.org/ce/z/4TFyX-
and https://alive2.llvm.org/ce/z/qDWzFs, for example), which means it
generalizes further to "the number of sign bits" needing to be enough
to truncate to the size of the saturate (an example using `or`, for
instance: https://alive2.llvm.org/ce/z/EI_h_A).
So this patch makes use of ComputeNumSignBits (with the newly added
ComputeMinSignedBits) in matchSAddSubSat to generalize the fold to any
inputs with enough sign bits known, truncating the inputs to the new
size of the saturate.
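A sketch of the already-handled sext form (illustrative widths; the
patch extends this to any inputs with enough known sign bits):
```
define i8 @src(i8 %a, i8 %b) {
  %sa  = sext i8 %a to i16
  %sb  = sext i8 %b to i16
  %add = add i16 %sa, %sb                              ; exact in i16
  %lo  = call i16 @llvm.smin.i16(i16 %add, i16 127)    ; clamp high
  %hi  = call i16 @llvm.smax.i16(i16 %lo, i16 -128)    ; clamp low
  %r   = trunc i16 %hi to i8
  ret i8 %r
}

define i8 @tgt(i8 %a, i8 %b) {
  %r = call i8 @llvm.sadd.sat.i8(i8 %a, i8 %b)
  ret i8 %r
}

declare i16 @llvm.smin.i16(i16, i16)
declare i16 @llvm.smax.i16(i16, i16)
declare i8 @llvm.sadd.sat.i8(i8, i8)
```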
Differential Revision: https://reviews.llvm.org/D112298
The added test has poison lanes due to the vector shuffle. This can
cause an infinite loop of combines in instcombine where it folds
xor(ashr, -1) -> select (icmp slt 0), -1, 0 -> sext (icmp slt 0) -> xor(ashr, -1).
We usually prevent this by checking that the xor constant is not -1,
but with vectors some of the lanes may be -1 and some may be poison. So
this changes the detection from "!C1->isAllOnesValue()" to
"!match(C1, m_AllOnes())", which also recognizes all-ones constants
that contain poison lanes.
Fixes PR52397
In D71220, a pattern was added to replace a shuffle's insertelement
operand if the inserted scalar is not demanded. The pattern was added
only for the case where the shuffle's mask size is equal to the
insertelement's vector size. However, that condition is not required
because the pattern does not change the shuffle's vector size.
This patch extends the pattern to also cover cases where the shuffle's
mask size is not equal to the insertelement's vector size.
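For example (a sketch; mask length 3 vs. vector length 4), the insert
can be bypassed because lane 0 is never read:
```
define <3 x i8> @src(<4 x i8> %v, i8 %s) {
  %ins = insertelement <4 x i8> %v, i8 %s, i32 0
  %r = shufflevector <4 x i8> %ins, <4 x i8> undef, <3 x i32> <i32 1, i32 2, i32 3>
  ret <3 x i8> %r
}

define <3 x i8> @tgt(<4 x i8> %v, i8 %s) {
  %r = shufflevector <4 x i8> %v, <4 x i8> undef, <3 x i32> <i32 1, i32 2, i32 3>
  ret <3 x i8> %r
}
```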
Differential Revision: https://reviews.llvm.org/D112318
The test diffs are logically equivalent, and so this is
generally NFC, but this makes the code match the code
comment.
It should also be more efficient. If we choose the 'not'
operand (rather than the 'not' instruction) as the select
condition, then we don't have to invert the select
condition/operands as a subsequent transform.
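A sketch of the difference (illustrative types):
```
; Using the 'not' instruction as the condition needs a follow-up
; transform to invert the select:
;   %notc = xor i1 %c, true
;   %r = select i1 %notc, i8 %a, i8 %b
; Using the 'not' operand directly yields the canonical form at once:
;   %r = select i1 %c, i8 %b, i8 %a
```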
Similar to 54e969cffd (and with cosmetic updates to hopefully
make that easier to read), this fold has been around since early
in LLVM history.
Intermediate folds have been added subsequently, so extra uses
are required to exercise this code.
The test example actually shows an unintended consequence with
extra uses - we end up with an extra instruction compared to what
we started with. But this at least makes scalar/vector consistent.
General proof:
https://alive2.llvm.org/ce/z/tmuBza
This fold was added long ago (part of fixing PR4216),
and it matched scalars only. Intermediate folds have
been added subsequently, so extra uses are required
to exercise this code.
General proof:
https://alive2.llvm.org/ce/z/G6BBhB
One of the specific tests:
https://alive2.llvm.org/ce/z/t0JhEB
The extra uses are needed to prevent intermediate folds.
Without that, there would be no coverage currently.
The vector tests show an artificial limitation in the code.