Commit Graph

742 Commits

Author SHA1 Message Date
Krishna Kariya
da92e86263 [InstCombine] Fold IntToPtr/PtrToInt to bitcast
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support
to specific cases where this optimization works. This patch is the
first one on this line.

Consider the example:

    %i = ptrtoint i8* %X to i64
    %p = inttoptr i64 %i to i16*
    %cmp = icmp eq i8* %load, %p

In this specific case, the inttoptr/ptrtoint optimization is correct
as it only compares the pointer values. In this patch, we fold
inttoptr/ptrtoint to a bitcast (if src and dest types are different).

Differential Revision: https://reviews.llvm.org/D105088
2021-07-18 23:13:25 +02:00
Sanjay Patel
ca6e117d86 [InstCombine] reorder icmp with offset folds for better results
This set of folds was added recently with:
c7b658aeb5
0c400e8953
40b752d28d

...and I noted that this wasn't likely to fire in code derived
from C/C++ source because of nsw in particular. But I didn't
notice that I had placed the code above the no-wrap block
of transforms.

This is likely the cause of regressions noted from the previous
commit because -- as shown in the test diffs -- we may have
transformed into a compare with an arbitrary constant rather
than a simpler signbit test.
2021-07-14 12:12:05 -04:00
Sanjay Patel
a488c7879e [InstCombine] reduce signbit test of logic ops to cmp with zero
This is the pattern from the description of:
https://llvm.org/PR50816

There might be a way to generalize this to a smaller or more
generic pattern, but I have not found it yet.

https://alive2.llvm.org/ce/z/ShzJoF

define i1 @src(i8 %x) {
  %add = add i8 %x, -1
  %xor = xor i8 %x, -1
  %and = and i8 %add, %xor
  %r = icmp slt i8 %and, 0
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  %r = icmp eq i8 %x, 0
  ret i1 %r
}
2021-07-12 09:01:26 -04:00
Sanjay Patel
40b752d28d [InstCombine] fold icmp slt/sgt of offset value with constant
This follows up patches for the unsigned siblings:
0c400e8953
c7b658aeb5

We are translating an offset signed compare to its
unsigned equivalent when one end of the range is
at the limit (zero or unsigned max).

(X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1)
(X + C2) <s C --> X >u (C ^ SMAX) (if C == C2)

This probably does not show up much in IR derived
from C/C++ source because that would likely have
'nsw', and we have folds for that already.

As with the previous unsigned transforms, the folds
could be generalized to handle non-constant patterns:

https://alive2.llvm.org/ce/z/Y8Xrrm

  ; sgt
  define i1 @src(i8 %a, i8 %c) {
    %c2 = add i8 %c, 1
    %t = add i8 %a, %c2
    %ov = icmp sgt i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c) {
    %c_off = sub i8 127, %c ; SMAX
    %ov = icmp ult i8 %a, %c_off
    ret i1 %ov
  }

https://alive2.llvm.org/ce/z/c8uhnk

  ; slt
  define i1 @src(i8 %a, i8 %c) {
    %t = add i8 %a, %c
    %ov = icmp slt i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c) {
    %c_offnot = xor i8 %c, 127 ; SMAX
    %ov = icmp ugt i8 %a, %c_offnot
    ret i1 %ov
  }
2021-07-05 10:08:31 -04:00
Sanjay Patel
0c400e8953 [InstCombine] fold icmp ult of offset value with constant
This is one sibling of the fold added with c7b658aeb5 .

(X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN)
I'm still not sure how to describe it best, but we're
translating 2 constants from an unsigned range comparison
to signed because that eliminates the offset (add) op.

This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/K-fMBf

  define i1 @src(i8 %a, i8 %c2) {
    %t = add i8 %a, %c2
    %c = add i8 %c2, 128 ; SMIN
    %ov = icmp ult i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c2) {
    %not_c2 = xor i8 %c2, -1
    %ov = icmp sgt i8 %a, %not_c2
    ret i1 %ov
  }
2021-06-30 19:00:12 -04:00
Sanjay Patel
c7b658aeb5 [InstCombine] fold icmp of offset value with constant
There must be a better way to describe this pattern in words?
(X + C2) >u C --> X <s -C2 (if C == C2 + SMAX)

This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/rdfNFP

  define i1 @src(i8 %a, i8 %c1) {
    %t = add i8 %a, %c1
    %c2 = add i8 %c1, 127 ; SMAX
    %ov = icmp ugt i8 %t, %c2
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c1) {
    %neg_c1 = sub i8 0, %c1
    %ov = icmp slt i8 %a, %neg_c1
    ret i1 %ov
  }

The pattern was noticed as a by-product of D104932.
2021-06-30 13:37:31 -04:00
Sanjay Patel
9d0bf7699c [InstCombine] don't try to fold a constant expression that can trap (PR50906)
We could use a bigger hammer and bail out on any constant
expression, but there's a regression test that appears to
validly do the transform (although it may not have been
intending to check that optimization).
2021-06-28 17:00:21 -04:00
Nikita Popov
fdd4c199a1 Revert "[InstCombine] Make indexed compare fold opaque ptr compatible"
This reverts commit 5cb20ef8a2.

Assertion failures with this patch were reported on
https://reviews.llvm.org/rG5cb20ef8a235, revert for now.
2021-06-26 00:32:59 +02:00
Eli Friedman
8d5bf0709d [NFC] Prefer ConstantRange::makeExactICmpRegion over makeAllowedICmpRegion
The implementation is identical, but it makes the semantics a bit more
obvious.
2021-06-25 14:43:13 -07:00
Nikita Popov
5cb20ef8a2 [InstCombine] Make indexed compare fold opaque ptr compatible
Rather than relying on pointer type equality (which, for a change,
is silently incorrect with opaque pointers) check that the GEP
source element types match.
2021-06-24 22:33:01 +02:00
Juneyoung Lee
c038845f58 [InstCombine] Fold icmp (select c,const,arg), null if icmp arg, null can be simplified
This patch folds icmp (select c,const,arg), null if icmp arg, null can be simplified.

Resolves llvm.org/pr48975.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D96663
2021-06-21 17:39:05 +09:00
hyeongyukim
69b0ed9a0a [InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered as Eli say when sext(idx) * ElementSize overflows.

```
   // assume that GV is an array of 4-byte elements
   GEP = gep GV, 0, Idx // this is accessing Idx * 4
   L = load GEP
   ICI = icmp eq L, value
 =>
   ICI = icmp eq Idx, NewIdx
```

The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp.
And there is a problem because Idx * ElementSize can overflow.

Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.

This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx.

```
   ...
 =>
   Idx' = and Idx, 0x3F..FF
   ICI = icmp eq Idx', NewIdx
```

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D99481
2021-06-17 19:46:17 +09:00
Nathan Chancellor
e6b086bef2 Revert "[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)"
This reverts commit 4f2fd3818b.

The Linux kernel fails to build after this commit. See
https://reviews.llvm.org/D99481 for a reproducer.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2021-05-31 20:21:26 -07:00
Hyeongyu Kim
4f2fd3818b [InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210)
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered as Eli say when sext(idx) * ElementSize overflows.

```
   // assume that GV is an array of 4-byte elements
   GEP = gep GV, 0, Idx // this is accessing Idx * 4
   L = load GEP
   ICI = icmp eq L, value
 =>
   ICI = icmp eq Idx, NewIdx
```

The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp.
And there is a problem because Idx * ElementSize can overflow.

Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.

This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx.

```
   ...
 =>
   Idx' = and Idx, 0x3F..FF
   ICI = icmp eq Idx', NewIdx
```

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D99481
2021-05-31 14:08:20 +09:00
Nikita Popov
9a9421a461 Reapply [InstCombine] Fold multiuse shr eq zero
This was reverted due to performance regressions in ARM benchmarks,
which have since been addressed by D101196 (SCEV analysis improvement)
and D101778 (CGP reverse transform).

-----

The single-use case is handled implicity by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
2021-05-22 14:46:50 +02:00
Sanjay Patel
a6f79b5671 [InstCombine] avoid infinite loops with select/icmp transforms
This fixes https://llvm.org/PR48900 , but as seen in the
regression tests prevents some optimizations.

There are a few options to restore those (switch to min/max
intrinsics, add larger pattern matching for select with
dominating condition, improve CVP), but we need to prevent
the bug 1st.
2021-05-04 11:54:06 -04:00
Nikita Popov
24e9fbc1a3 Revert "[InstCombine] Fold multiuse shr eq zero"
This reverts commit 9423f78240.

A performance regression with this patch has been reported at
https://reviews.llvm.org/rG9423f78240a2#990953. Reverting for now.
2021-04-21 21:40:52 +02:00
Reid Kleckner
91f7a4fff7 Revert "[InstCombine] Recognize ((x * y) s/ x) !=/== y as an signed multiplication overflow check (PR48769)"
This reverts commit 13ec913bdf.

This commit introduces new uses of the overflow checking intrinsics that
depend on implementations in compiler-rt, which Windows users generally
do not link against. I filed an issue (somewhere) to make clang
auto-link the builtins library to resolve this situation, but until that
happens, it isn't reasonable for the optimizer to introduce new link
time dependencies.
2021-04-20 15:53:34 -07:00
Roman Lebedev
13ec913bdf [InstCombine] Recognize ((x * y) s/ x) !=/== y as an signed multiplication overflow check (PR48769)
We already had support for it's unsigned variant, so simply extend it
to also handle the signed variant.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48769
2021-04-20 21:29:43 +03:00
Nikita Popov
9423f78240 [InstCombine] Fold multiuse shr eq zero
The single-use case is handled implicity by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
2021-04-19 22:13:11 +02:00
Mehrnoosh Heidarpour
29f189f90d [InstCombine] Conditionally emit nowrap flags when combining two adds
Currently, the InstCombineCompare is combining two add operations
into a single add operation which always has a nsw flag, without
checking the conditions to see if this flag should be present
according to the original two add operations or not.

This patch will change the InstCombineCompare to emit the nsw or
nuw only when these flags are allowed to be generated according to
the original add operations and remove the possibility of applying
wrong optimization with passes that will perform on the IR later
in the pipeline.

To confirm that the current results are buggy and the results after
proposed patch are the correct IR the following examples from Alive2
are attached; the same results can be seen in the case of nuw flag
and nsw is just used as an example. The following link shows that
the generated IR with current LLVM is a buggy IR when none of the
original add operations have nsw flag.
https://alive2.llvm.org/ce/z/WGaDrm
The following link proves that the generated IR after the patch in
the former case is the correct IR.
https://alive2.llvm.org/ce/z/wQ7G_e

Differential Revision: https://reviews.llvm.org/D100095
2021-04-14 20:53:06 +02:00
Sanjay Patel
5354a213a0 [InstCombine] fold shift+trunc signbit check
https://alive2.llvm.org/ce/z/6vQvrP

This solves:
https://llvm.org/PR49866
2021-04-12 16:19:43 -04:00
Sanjay Patel
85294703a7 [InstCombine] fold fcmp-of-copysign idiom
As discussed in:
https://llvm.org/PR49179
...this pattern shows up in library code.
There are several potential generalizations as noted,
but we need to be careful that we get FP special-values
right, and it's not clear how much variation we should
expect to see from this exact idiom.
2021-02-17 10:32:33 -05:00
Roman Lebedev
4ed0d8f2f0 [NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate()
I'd like to use it in an upcoming fold.
2021-01-22 17:23:53 +03:00
Sanjay Patel
288f3fc5df [InstCombine] reduce icmp(ashr X, C1), C2 to sign-bit test
This is a more basic pattern that we should handle before trying to solve:
https://llvm.org/PR48640

There might be a better way to think about this because the pre-condition
that I came up with (number of sign bits in the compare constant) misses a
potential transform for each of ugt and ult as commented on in the test file.

Tried to model this is in Alive:
https://rise4fun.com/Alive/juX1
...but I couldn't get the ComputeNumSignBits() pre-condition to work as
expected, so replaced with leading 0/1 preconditions instead.

  Name: ugt
  Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
  %a = ashr %x, C1
  %r = icmp ugt i8 %a, C2
    =>
  %r = icmp slt i8 %x, 0

  Name: ult
  Pre: countLeadingZeros(C2) <= C1 && countLeadingOnes(C2) <= C1
  %a = ashr %x, C1
  %r = icmp ult i4 %a, C2
    =>
  %r = icmp sgt i4 %x, -1

Also approximated in Alive2:
https://alive2.llvm.org/ce/z/u5hCcz
https://alive2.llvm.org/ce/z/__szVL

Differential Revision: https://reviews.llvm.org/D94014
2021-01-11 15:53:39 -05:00
Florian Hahn
c701f85c45 [STLExtras] Use return type from operator* of the wrapped iter.
Currently make_early_inc_range cannot be used with iterators with
operator* implementations that do not return a reference.

Most notably in the LLVM codebase, this means the User iterator ranges
cannot be used with make_early_inc_range, which slightly simplifies
iterating over ranges while elements are removed.

Instead of directly using BaseT::reference as return type of operator*,
this patch uses decltype to get the actual return type of the operator*
implementation in WrappedIteratorT.

This patch also updates a few places to use make use of
make_early_inc_range.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D93992
2021-01-10 14:41:13 +00:00
Kazu Hirata
33bf1cad75 [llvm] Use *Set::contains (NFC) 2021-01-07 20:29:34 -08:00
Juneyoung Lee
29f8628d1f [Constant] Add containsPoisonElement
This patch

- Adds containsPoisonElement that checks existence of poison in constant vector elements,
- Renames containsUndefElement to containsUndefOrPoisonElement to clarify its behavior & updates its uses properly

With this patch, isGuaranteedNotToBeUndefOrPoison's tests w.r.t constant vectors are added because its analysis is improved.

Thanks!

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94053
2021-01-06 12:10:33 +09:00
Simon Pilgrim
313d982df6 [IR] Add ConstantInt::getBool helpers to wrap getTrue/getFalse. 2021-01-05 11:01:10 +00:00
Simon Pilgrim
89abe1cf83 [InstCombine] foldICmpUsingKnownBits - use KnownBits signed/unsigned getMin/MaxValue helpers. NFCI.
Replace the local compute*SignedMinMaxValuesFromKnownBits methods with the equivalent KnownBits helpers to determine the min/max value ranges.
2020-12-24 14:22:26 +00:00
Jun Ma
e12f584578 [InstCombine] Remove scalable vector restriction in InstCombineCompares
Differential Revision: https://reviews.llvm.org/D93269
2020-12-15 20:36:57 +08:00
LemonBoy
42732d33cc [InstCombine] Fix constant-folding of overflowing arithmetic ops on vectors
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D89628
2020-11-09 14:41:07 +03:00
Roman Lebedev
8d0fdd36a3 [IR] CmpInst: Add getFlippedSignednessPredicate()
And refactor a few places to use it
2020-11-06 11:31:09 +03:00
Sanjay Patel
5a6e66ec72 [InstCombine] add folds for icmp+ctpop
https://alive2.llvm.org/ce/z/XjFPQJ

  define void @src(i64 %value) {
    %t0 = call i64 @llvm.ctpop.i64(i64 %value)
    %gt = icmp ugt i64 %t0, 63
    %lt = icmp ult i64 %t0, 64
    call void @use(i1 %gt, i1 %lt)
    ret void
  }

  define void @tgt(i64 %value) {
    %eq = icmp eq i64 %value, -1
    %ne = icmp ne i64 %value, -1
    call void @use(i1 %eq, i1 %ne)
    ret void
  }

  declare i64 @llvm.ctpop.i64(i64) #1
  declare void @use(i1, i1)
2020-10-26 16:48:56 -04:00
Sanjay Patel
437d7551c5 [InstCombine] reduce code duplication in icmp intrinsic folds; NFC 2020-10-26 16:48:56 -04:00
Caroline Concatto
2415636475 [SVE]Clarify TypeSize comparisons in llvm/lib/Transforms
Use isKnownXY comparators when one of the operands can be with
scalable vectors or getFixedSize() for all the other cases.

This patch also does bug fixes for getPrimitiveSizeInBits by using
getFixedSize() near the places with the TypeSize comparison.

Differential Revision: https://reviews.llvm.org/D89703
2020-10-23 09:15:17 +01:00
Simon Pilgrim
17b9a91ec2 [InstCombine] canRewriteGEPAsOffset - don't dereference a dyn_cast<>. NFCI.
We know V is a IntToPtrInst or PtrToIntInst type so we know its a CastInst - so use cast<> directly.

Prevents clang static analyzer warning that we could deference a null pointer.
2020-10-06 14:48:34 +01:00
Simon Pilgrim
567049f892 [InstCombine] Use m_FAbs matcher helper. NFCI. 2020-10-01 14:42:34 +01:00
Huihui Zhang
9ad6049736 [InstCombine][SVE] Skip scalable type for InstCombiner::getFlippedStrictnessPredicateAndConstant.
We cannot iterate on scalable vector, the number of elements is unknown at compile-time.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87918
2020-09-18 11:26:36 -07:00
Nikita Popov
f6b87da0c7 [InstCombine] Fold comparison of abs with int min
If the abs is poisoning, this is already folded to true/false.
For non-poisoning abs, we can convert this to a comparison with
the operand.
2020-09-08 20:23:03 +02:00
Sanjay Patel
7a6d6f0f70 [InstCombine] improve folds for icmp with multiply operands (PR47432)
Check for no overflow along with an odd constant before
we lose information by converting to bitwise logic.

https://rise4fun.com/Alive/2Xl

  Pre: C1 != 0
  %mx = mul nsw i8 %x, C1
  %my = mul nsw i8 %y, C1
  %r = icmp eq i8 %mx, %my
  =>
  %r = icmp eq i8 %x, %y

  Name: nuw ne
  Pre: C1 != 0
  %mx = mul nuw i8 %x, C1
  %my = mul nuw i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y

  Name: odd ne
  Pre: C1 % 2 != 0
  %mx = mul i8 %x, C1
  %my = mul i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y
2020-09-07 12:40:37 -04:00
Nikita Popov
ada8a17d94 [InstCombine] Fold abs intrinsic eq zero
Following the same transform for the select version of abs.
2020-09-05 15:11:38 +02:00
Christopher Tetreault
640f20b0c7 [SVE] Remove calls to VectorType::getNumElements from InstCombine
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D82237
2020-08-31 12:59:10 -07:00
Roman Lebedev
e65f213178 [InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Benjamin Kramer
b98e25b6d7 Make helpers static. NFC. 2020-08-19 16:00:03 +02:00
Roman Lebedev
a512c89476 [NFC][InstCombine] Refactor '(-NSW x) pred x' fold 2020-08-06 11:50:36 +03:00
Roman Lebedev
141357663e [InstCombine] (-NSW x) u<= x --> x s<=0 (PR39480)
Name: (-x) u<= x  -->  x s<= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ule i8 %neg_x, %x
  =>
%r = icmp sle i8 %x, 0

https://rise4fun.com/Alive/V22

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:36 +03:00
Roman Lebedev
132be1f502 [InstCombine] (-NSW x) u< x --> x s< 0 (PR39480)
Name: (-x) u< x  -->  x s< 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ult i8 %neg_x, %x
  =>
%r = icmp slt i8 %x, 0

https://rise4fun.com/Alive/zSuf

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:36 +03:00
Roman Lebedev
0e1241a3c9 [InstCombine] (-NSW x) u>= x --> x s>= 0 (PR39480)
Name: (-x) u>= x  -->  x s>= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp uge i8 %neg_x, %x
  =>
%r = icmp sge i8 %x, 0

https://rise4fun.com/Alive/LLHd

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00
Roman Lebedev
16c642fa39 [InstCombine] (-NSW x) u> x --> x s> 0 (PR39480)
Name: (-x) u> x  -->  x s> 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ugt i8 %neg_x, %x
  =>
%r = icmp sgt i8 %x, 0

https://rise4fun.com/Alive/Raea

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00