When the operands of a sub are ptrtoint casts, it is not possible to
determine computeKnownBits() for its arguments.
In the scalar case, a sub of two ptrtoints is generally converted to a
sub of indexes.
However, for a loop with a recursive GEP/PHI whose sub operands are
ptrtoint casts, if we can determine that a sub of this GEP and another
pointer with the same base is known non-zero, we can return that.
This helps subsequent passes optimize the loop further.
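For illustration, a hypothetical source-level shape (not taken from the
patch) that lowers to this pattern: a pointer advanced through a
recursive GEP/PHI, then subtracted from another pointer with the same
base:

```cpp
#include <cstddef>

// Hypothetical example: `p` becomes a PHI fed by a recursive GEP, and
// `p - begin` lowers to a sub of two ptrtoint values with the same
// base. Because the loop body advances `p` at least once
// (precondition: begin < end), the sub is known non-zero.
std::size_t advance(const char *begin, const char *end) {
  const char *p = begin;
  do {
    ++p; // recursive GEP feeding the PHI
  } while (p != end);
  return static_cast<std::size_t>(p - begin); // known non-zero
}
```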
We have a bunch of folds that rewrite X pred Y to ~Y pred ~X
for various special cases where this saves an instruction.
Generalize these folds to use isFreeToInvert(). We have to make sure
that we consume an instruction in at least one of the inversions;
otherwise we would just swap the icmp back and forth.
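The underlying identity (an exhaustive 8-bit check, for illustration
only): bitwise-not reverses both the unsigned and the signed order, so
negating both operands and swapping them preserves the predicate.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  for (int x = 0; x < 256; ++x) {
    for (int y = 0; y < 256; ++y) {
      uint8_t ux = uint8_t(x), uy = uint8_t(y);
      uint8_t nx = uint8_t(~ux), ny = uint8_t(~uy);
      // X pred Y  <=>  ~Y pred ~X, unsigned and signed.
      assert((ux < uy) == (ny < nx));                                 // ult
      assert((int8_t(ux) < int8_t(uy)) == (int8_t(ny) < int8_t(nx))); // slt
    }
  }
}
```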
Fixes https://github.com/llvm/llvm-project/issues/74302.
We can cover more cases by directly checking whether the result is
known non-zero for common patterns when `OrZero` is not set.
This patch adds `isKnownNonZero` checks for `shl`, `lshr`, `and`, and `mul`.
Differential Revision: https://reviews.llvm.org/D157309
1) If the LHS is constant:
- If the low bits of the LHS are set, the lower bound is non-zero
- The upper bound can be capped at popcount(LHS) high bits
2) If the RHS is constant:
- The upper bound can be capped at (Width - RHS) high bits
Bug was introduced in: https://reviews.llvm.org/D157807
The prior logic assumed that the knownbits information returned from
analyzing the `icmp` and its operands in the context basic block would
be consistent.
This is not necessarily the case if we are analyzing dead code.
For example, take `(icmp sgt (select cond, 0, 1), -1)`. If we compute
knownbits for the `select` from its operands, we know the sign bit is
zero. If we are analyzing the not-taken branch based on the `icmp`
(dead code), we will also "know" that the `select` must be negative
(sign bit is one). This results in conflicting knownbits.
The fix is to just give up on analyzing the phi node if it is dead
code. We 1) don't want to waste time continuing to analyze it and
2) will be removing it (and its dependencies) later.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D158960
Use the comparison-based analysis to strengthen the standard
knownbits analysis rather than choosing either/or.
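Conceptually (a toy 8-bit model, not LLVM's actual `KnownBits` API):
since both facts hold for the same value, their bit knowledge can be
OR-ed together rather than picking a single source:

```cpp
#include <cstdint>

struct ToyKnownBits {
  uint8_t Zero = 0; // bits known to be 0
  uint8_t One = 0;  // bits known to be 1
};

// Merge two facts that are both known to hold for the same value.
ToyKnownBits strengthen(ToyKnownBits A, ToyKnownBits B) {
  return {uint8_t(A.Zero | B.Zero), uint8_t(A.One | B.One)};
}
```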
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D157807
This is basically a copy-and-paste of the logic we use in
`computeKnownBits`, adapted for just `isKnownNonZero`.
Differential Revision: https://reviews.llvm.org/D157801
Just fill in the missing cases (TODO) for `ugt`, `uge`, `sgt`, `sge`,
`slt`, and `sle`. These are all in the same spirit as `ult`/`ule`, but
each of the other conditions has different constraints.
Proofs: https://alive2.llvm.org/ce/z/gnj4o-
Differential Revision: https://reviews.llvm.org/D157800
This check is redundant as it is covered by the call to
isPotentiallyReachable.
Depends on D155726.
Differential Revision: https://reviews.llvm.org/D155718
Set phi inputs to poison whenever we find a dead edge (either
during initial worklist population or the main InstCombine run),
instead of only doing this for successors of dead blocks.
This means that the phi operand is set to poison even for critical
edges without an intermediate block.
There are quite a few test changes, because the pattern is fairly
common in vectorizer output, for cases where we know the vectorized
loop will be entered.
InstCombine is a worklist-driven algorithm, which works roughly
as follows:
* All instructions are initially pushed to the worklist.
The initial order is in RPO program order.
* All newly inserted instructions get added to the worklist.
* When an instruction is folded, its users get added back to the
worklist.
* When the use-count of an instruction decreases, it gets added
back to the worklist.
* And a few other heuristics for when we should revisit
instructions (see the sketch after this list).
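A minimal self-contained sketch of this worklist discipline
(illustrative only; InstCombine's real worklist has considerably more
bookkeeping, e.g. deferred instructions and deduplication):

```cpp
#include <deque>
#include <unordered_set>
#include <vector>

struct Inst { std::vector<Inst *> Users; }; // stand-in for llvm::Instruction

struct Worklist {
  std::deque<Inst *> Queue;
  std::unordered_set<Inst *> InQueue; // avoid duplicate entries

  void push(Inst *I) {
    if (InQueue.insert(I).second)
      Queue.push_back(I);
  }
  Inst *pop() {
    Inst *I = Queue.front();
    Queue.pop_front();
    InQueue.erase(I);
    return I;
  }
  bool empty() const { return Queue.empty(); }
};

// Stub: a real combiner would pattern-match and rewrite `I` here,
// pushing any newly created instructions onto `WL`.
static bool simplify(Inst *I, Worklist &WL) { (void)I; (void)WL; return false; }

static void run(const std::vector<Inst *> &RPOOrder) {
  Worklist WL;
  for (Inst *I : RPOOrder) // initial population, RPO program order
    WL.push(I);
  while (!WL.empty()) {
    Inst *I = WL.pop();
    if (simplify(I, WL))       // a fold happened...
      for (Inst *U : I->Users) // ...so revisit the users
        WL.push(U);
  }
}
```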
On top of the worklist algorithm, InstCombine layers an additional
fix-point iteration: If any fold was performed in the previous
iteration, then InstCombine will re-populate the worklist from
scratch and fold the entire function again. This continues until
a fix-point is reached.
In the vast majority of cases, InstCombine will reach a fixpoint
within a single iteration; a second iteration is then performed only
to verify that this is indeed the fixpoint. We can see this in the
statistics for llvm-test-suite:
"instcombine.NumOneIteration": 411380,
"instcombine.NumTwoIterations": 117921,
"instcombine.NumThreeIterations": 236,
"instcombine.NumFourOrMoreIterations": 2,
The way to read these numbers is that in 411380 cases, InstCombine
performs no folds. In 117921 cases it performs a fold and reaches
the fixpoint within one iteration (the second iteration only verifies
the fixpoint). In the remaining 238 cases, more than one iteration
is needed to reach the fixpoint.
In other words, only in 0.04% of cases are additional iterations
needed to reach a fixpoint. Conversely, in 22.3% of cases InstCombine
performs a completely useless extra iteration to verify the fixpoint.
This patch removes the fixpoint iteration from InstCombine and always
performs only a single iteration. This results in a major compile-time
improvement of around 4% at negligible codegen impact.
This explicitly accepts that we will not reach a fixpoint in all
cases. However, this is mitigated by two factors: First, the data
suggests that this happens very rarely in practice. Second,
InstCombine runs many times during the optimization pipeline
(8 times even without LTO), so there are many chances to recover
such cases.
In order to prevent accidental optimization regressions in the
future, this implements a verify-fixpoint option, which is enabled
by default when instcombine is specified in -passes and disabled
when InstCombinePass() is constructed from C++. This means that
test cases need to explicitly use the no-verify-fixpoint option
if they fail to reach a fixpoint (for a well-understood reason that
we cannot / do not want to avoid).
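For example, such a test case can run
`opt -passes='instcombine<no-verify-fixpoint>'` instead of plain
`-passes=instcombine`.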
Differential Revision: https://reviews.llvm.org/D154579
Support the canonical range check pattern for KnownBits assumptions.
This is the same as the generic ConstantRange handling, just shifted
by an offset.
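For illustration (a toy 8-bit model with hypothetical values, not the
patch's code): an assumption like `(x - 32) u< 32`, i.e. `x` in
`[32, 64)`, fixes exactly the bits on which all values in the range
agree:

```cpp
#include <cstdint>

struct ToyKnownBits {
  uint8_t Zero, One; // bits known to be 0 / known to be 1
};

// Bits shared by every value in the unsigned range [Lo, Hi]:
// the common leading prefix of Lo and Hi is known, the rest is not.
ToyKnownBits knownBitsFromRange(uint8_t Lo, uint8_t Hi) {
  unsigned Diff = Lo ^ Hi;
  Diff |= Diff >> 1; // smear the highest differing bit downward
  Diff |= Diff >> 2;
  Diff |= Diff >> 4;
  uint8_t Common = uint8_t(~Diff);
  return {uint8_t(Common & ~Lo), uint8_t(Common & Lo)};
}
// knownBitsFromRange(32, 63): bits 7-6 known zero, bit 5 known one.
```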
Make ValueTracking directly call the KnownBits shift helpers, which
provides more precise results.
Unfortunately, ValueTracking has a special case where it sometimes
determines non-zero shift amounts using isKnownNonZero(). I have my
doubts about the usefulness of that special case (it is only tested
in a single unit test), but I've reproduced the special case via an
extra parameter to the KnownBits methods.
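A small illustration of the extra precision (an exhaustive 8-bit
check, not the LLVM code itself): a logical shift right by a non-zero
amount always clears the top bit, which cannot be concluded while the
shift amount's range still includes zero.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  for (int x = 0; x < 256; ++x)
    for (int s = 1; s < 8; ++s) // shift amount known non-zero
      assert(((uint8_t(x) >> s) & 0x80) == 0); // sign bit always cleared
}
```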
Differential Revision: https://reviews.llvm.org/D151816
The knownbits implementation covers all the cases previously handled
by `uadd.sat`/`usub.sat`, as well as some additional ones. We
previously were not handling the `ssub.sat`/`sadd.sat` cases at all.
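For instance, one property such an implementation can exploit (an
exhaustive 8-bit sanity check, illustration only): `usub.sat(x, y) <= x`,
so the result retains at least all of `x`'s leading zeros.

```cpp
#include <cassert>
#include <cstdint>

// Models usub.sat: unsigned subtraction clamped at 0.
static uint8_t usub_sat(uint8_t X, uint8_t Y) {
  return X > Y ? uint8_t(X - Y) : uint8_t(0);
}

int main() {
  for (int x = 0; x < 256; ++x)
    for (int y = 0; y < 256; ++y)
      assert(usub_sat(uint8_t(x), uint8_t(y)) <= x);
}
```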
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D150103
`abs` preserves the lowest set bit, so if we know the lowest set bit
of the operand, set it in the output.
Additionally, implement the case where the operand is known negative.
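An exhaustive 8-bit sanity check of the lowest-set-bit fact
(illustration only): two's complement negation, and hence `abs`,
leaves the lowest set bit in place.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  for (int x = 0; x < 256; ++x) {
    uint8_t V = uint8_t(x);
    uint8_t Neg = uint8_t(-V); // two's complement negation
    // v & -v isolates the lowest set bit; it is identical for V and -V.
    assert(uint8_t(V & -V) == uint8_t(Neg & -Neg));
  }
}
```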
Reviewed By: foad, RKSimon
Differential Revision: https://reviews.llvm.org/D150100
We can determine the upper bits more precisely by computing `MaxNum /
MinDenom`, as opposed to only using the MSB.
Additionally, if the `exact` flag is set, we can sometimes determine
some of the low bits.
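An exhaustive 8-bit check of the bound being used (illustration, with
hypothetical known limits): for `x <= MaxNum` and `y >= MinDenom`, the
quotient never exceeds `MaxNum / MinDenom`, so its high bits follow
from that one division.

```cpp
#include <cassert>

int main() {
  const unsigned MaxNum = 200, MinDenom = 3; // hypothetical known bounds
  for (unsigned X = 0; X <= MaxNum; ++X)
    for (unsigned Y = MinDenom; Y < 256; ++Y)
      assert(X / Y <= MaxNum / MinDenom);
}
```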
Differential Revision: https://reviews.llvm.org/D150094