There is a missed optimization in
``` llvm
define i8 @known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
  %2 = add i8 %x, -1
  %3 = tail call i8 @llvm.ctlz.i8(i8 %2, i1 true)
  %4 = lshr i8 -1, %3
  %5 = add i8 %4, 1
  %6 = icmp ugt i8 %x, 1
  %p = select i1 %6, i8 %5, i8 1
  %r = urem i8 %y, %p
  ret i8 %r
}
```
which is extracted from the following Rust code:
``` rust
fn func(x: usize, y: usize) -> usize {
    let z = x.next_power_of_two();
    y % z
}
```
Here `%p` (a.k.a. `z`) is always a power of two, so `y urem p` can
be folded to `y & (p - 1)`. (Alive2 proof:
https://alive2.llvm.org/ce/z/H3zooY)
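As an informal sanity check alongside the Alive2 proof, the equivalence can be brute-forced at 8 bits in Rust. This is only a sketch: the helper names `rem_by_next_pow2`, `mask_by_next_pow2`, and `forms_agree_for_all_u8` are hypothetical, and `checked_next_power_of_two` is used so that inputs whose next power of two does not fit in `u8` (`x > 128`) are skipped rather than relied on. Calling `forms_agree_for_all_u8()` from `main` or a test is expected to return `true`.

```rust
/// Hypothetical helper: `y % p`, where `p` is the next power of two of `x`.
/// Returns `None` when that power of two does not fit in `u8` (x > 128).
fn rem_by_next_pow2(x: u8, y: u8) -> Option<u8> {
    let p = x.checked_next_power_of_two()?;
    // `p` is at least 1 here, so the remainder never divides by zero.
    Some(y % p)
}

/// The same computation with the remainder rewritten as a bit mask.
fn mask_by_next_pow2(x: u8, y: u8) -> Option<u8> {
    let p = x.checked_next_power_of_two()?;
    // `p >= 1`, so `p - 1` cannot underflow.
    Some(y & (p - 1))
}

/// Exhaustively compares both forms over every pair of `u8` inputs.
fn forms_agree_for_all_u8() -> bool {
    (0..=u8::MAX).all(|x| {
        (0..=u8::MAX).all(|y| rem_by_next_pow2(x, y) == mask_by_next_pow2(x, y))
    })
}
```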
---
This can be generalized by recognizing `LShr(UINT_MAX, Y) + 1` as a
power of two, which is what this PR does.
Alive2 proof: https://alive2.llvm.org/ce/z/zUPTbc
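
To illustrate why the pattern qualifies, here is a small Rust sketch that evaluates `LShr(UINT_MAX, Y) + 1` at 8 bits. The function names are hypothetical, and the sketch does not describe how the PR implements the check. It assumes a wrapping add, so a shift amount of 0 gives `0xFF + 1 = 0`. That zero result is harmless for the `urem` use case, because `urem` by zero is undefined behavior in LLVM IR anyway.

```rust
/// Hypothetical model of `add (lshr i8 -1, s), 1` with a wrapping add.
/// Shift amounts >= the bit width are poison in LLVM, so they are rejected.
fn lshr_all_ones_plus_one(s: u32) -> u8 {
    assert!(s < 8, "shift amount must be smaller than the bit width");
    (u8::MAX >> s).wrapping_add(1)
}

/// True when `v` has at most one bit set, i.e. is a power of two or zero.
fn is_pow2_or_zero(v: u8) -> bool {
    (v & v.wrapping_sub(1)) == 0
}

/// Checks every valid 8-bit shift amount.
fn pattern_is_pow2_or_zero_for_all_shifts() -> bool {
    (0..8).all(|s| is_pow2_or_zero(lshr_all_ones_plus_one(s)))
}
```

For example, `lshr_all_ones_plus_one(3)` evaluates `(0xFF >> 3) + 1 = 31 + 1 = 32`, and `lshr_all_ones_plus_one(0)` wraps around to `0`.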