This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow indexed access in units of the stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.
We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
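A rough sketch (hypothetical names) of what access through such a
pointer looks like:

    ; %buf carries a 128-bit descriptor, a 32-bit offset, and a 32-bit
    ; index, and points at buffer[index * stride].
    %val = load i32, ptr addrspace(9) %buf  ; can lower to a buffer load using idxen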
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison-generating flags (nsw, nuw, exact, nneg, etc.)
which may not be valid for the newly added users.
This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expander to re-infer
the flags on the CSE candidate if doing so is legal for all possible users.
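For illustration (hypothetical names), LSR reusing a trip-count zext
like the one below would previously strip its flag:

    preheader:
      %wide.n = zext nneg i32 %n to i64

With this change, the expander re-infers nneg on the reused
instruction when %n can be shown non-negative for every user.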
This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.
This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile-time-wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tentative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with the disjoint flag matches what
InstCombine would produce.
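As an illustrative example (hypothetical values), an or whose operands
share no set bits can carry the flag and behaves like an add:

    %lo = and i32 %x, 15
    %hi = shl i32 %y, 4
    %r  = or disjoint i32 %lo, %hi  ; same value as add i32 %lo, %hi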
The current code structure results in cases where if a) we can't clone
the IV user (because it's not in our whitelist) or b) can't prove the
SCEV expressions are identical, we'd sometimes leave both the original
unwidened IV and the partially widened IV in the code. Instead, just
truncate the wide IV to the use - same as what we'd do if we couldn't
find an addrec to start with.
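A sketch of the fallback (hypothetical names): rather than keeping the
narrow IV alive alongside the widened one, the problematic use is fed
by a truncate of the wide IV:

    ; %iv.wide is the widened induction variable
    %use.narrow = trunc i64 %iv.wide to i32  ; feeds the un-cloneable use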
Noticed this while playing with changing how we produce addrecs. The
current structure results in a very tight interlock between SCEV's
internal capabilities and the indvars code.
As far as I can tell, there's nothing in this code which actually
assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred
RHS) are the same.
Noticed while investigating something else, this is purely an
opportunistic optimization while I'm looking at the code. Unfortunately,
this doesn't solve my original problem. :)
IndVars has an existing notion of a narrow definition which is known to
be positive, and for which sign and zero extension are therefore the
same operation. There's existing logic for forming a SCEV based on the
extension kind and the no-wrap flags. This change extends that logic to
form the opposite extension kind for a positive def if doing so is
allowed by the flags. Note that we already do something analogous for
the getWideRecurrence case as well.
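An illustrative example (hypothetical names): if the narrow def %d is
known non-negative, both extension kinds agree, so either can be formed:

    %z = zext i32 %d to i64
    %s = sext i32 %d to i64  ; equal to %z whenever %d is non-negative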
Adding test coverage in advance of upcoming changes. Note that these
tests specifically use unsigned comparisons for the backends; the
signed versions are fairly well handled by existing logic.
zext nneg was recently added to the IR in #67982. This patch teaches
SimplifyIndVars to prefer zext nneg over *both* sext and plain zext,
when a local SCEV query indicates the source is non-negative.
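A before/after sketch (hypothetical names):

    ; before:
    %w = sext i32 %narrow to i64
    ; after, when SCEV locally proves %narrow is non-negative:
    %w = zext nneg i32 %narrow to i64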
The choice to prefer zext nneg over sext looks slightly aggressive
here, but probably isn't so much in practice. For cases where we'd
"remember" the range fact, instcombine would convert the sext into
a zext nneg anyways. The only cases where this produces a different
result overall are when SCEV knows a non-local fact, and it doesn't
get materialized into the IR. Those are exactly the cases where
using zext nneg is most useful. We do run the risk of e.g. a
missing combine - since we haven't updated most of them yet - but
that seems like a manageable risk.
Note that there are much deeper algorithmic changes we could make
to this code to exploit zext nneg, but this seemed like a reasonable
and low risk starting point.
zext nneg was recently added to the IR in #67982. Teaching SCEVExpander
to emit nneg when possible is valuable since SCEV may have proved
non-trivial facts about loop bounds which would otherwise be lost when
materializing the value.
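For example (hypothetical names), if the loop is guarded by %n > 0,
SCEV knows %n is non-negative even when nothing in the IR records that
fact, and the expander can now emit:

    %n.ext = zext nneg i32 %n to i64  ; instead of a plain zext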
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is not currently possible is distinguishing a missing
datalayout from an explicitly empty one (`target datalayout = ""` in
the IR file), since the current APIs don't allow detecting this case.
If it is considered useful to support this case (instead of passing
"-data-layout=" on the command line), I can change the IR parsers to
track whether they have seen such a directive and change the callback
type.
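That is, a module containing the explicit directive below currently
looks the same to the consumer as one that omits the directive entirely:

    target datalayout = ""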
Differential Revision: https://reviews.llvm.org/D141060
When a SCEVCallbackVH is RAUWed, we currently do a def-use walk and
remove dependent instructions from the ValueExprMap. However, unlike
SCEV's usual invalidation, this does not forget memoized values.
The end result is that we might end up removing a SCEVUnknown from the
map, while that expression still has users. Due to that, we may later
fail to invalidate those expressions. In particular, invalidation of loop
dispositions only does something if there is an expression for the
value, which would not be the case here.
Fix this by using the standard forgetValue() API, instead of rolling a
custom variant.
Fixes https://github.com/llvm/llvm-project/issues/68285.
Unfortunately, the assumption underlying this optimization is
incorrect for getSCEVAtScope(): A SCEVUnknown instruction with
operands that have constant loop exit values can evaluate to
a constant, thus creating a dependency from an "always unknown"
instruction.
Losing this optimization is quite unfortunate, but it doesn't
seem like there is any simple workaround for this.
Fixes #68260.
This reverts commit 3ddd1ffb72.
The only thing we care about here is that we don't exit on the
first iteration. Whether the BTC is large enough to overflow the
signed integer space is not relevant.
SCEVExpander tries to reuse existing instructions with the same
SCEV expression. However, doing this replacement blindly is not
safe, because the instruction might be more poisonous.
What we were already doing is dropping poison-generating flags on
the reused instruction. But this is not the only way that more
poison can be introduced. The poison-generating flag might not
be directly on the reused instruction, or the poison contribution
might come from something like 0 * %var, which folds to 0 but can
still introduce poison.
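A sketch of that second case (hypothetical values): SCEV folds the
multiply to zero, so both instructions below have the SCEV of %x, yet
the second is poison whenever %var is:

    %zero  = mul i32 %var, 0    ; folds to 0 in SCEV, but poison if %var is
    %reuse = add i32 %x, %zero  ; same SCEV as %x, strictly more poisonous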
This patch fixes the issue in a principled way, by determining which
values can contribute poison to the SCEV expression, and then
checking whether any additional values can contribute poison to the
instruction being reused. Poison-generating flags are dropped if
doing that enables reuse.
This is a pretty big hammer and does cause some regressions in
tests, but less than I would have expected. I wasn't able to come
up with a less intrusive fix that still satisfies the correctness
requirements.
Fixes https://github.com/llvm/llvm-project/issues/63763.
Fixes https://github.com/llvm/llvm-project/issues/63926.
Fixes https://github.com/llvm/llvm-project/issues/64333.
Fixes https://github.com/llvm/llvm-project/issues/63727.
Differential Revision: https://reviews.llvm.org/D158181
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through MLIR,
but this is enough to fix existing tests.
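For context, a sketch of what a non-0 address space stack object looks
like (AMDGPU, for example, puts allocas in address space 5):

    %slot = alloca i32, align 4, addrspace(5)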
https://reviews.llvm.org/D156666
replaceCongruentIVs analysis is based on ScalarEvolution; this makes
comparing different PHIs and performing the replacement straightforward.
However, it can have some side-effects: it isn't aware whether an
induction variable is in canonical form, so it can perform replacements
which obscure the meaning of the IR.
In test22 in widen-loop-comp.ll, the resulting loop can't be analyzed by
ScalarEvolution at all.
My attempted solution is to restrict the transform: don't try to replace
induction variables using PHI nodes that don't represent simple
induction variables.
I'm not sure if this is the best solution; suggestions welcome.
Differential Revision: https://reviews.llvm.org/D121950
The backstory is that the LCSSA invalidation we perform here is not
really necessary from a SCEV perspective. However, other code may
rely on the fact that invalidating only LCSSA phi nodes is sufficient
for transforms like loop peeling
(see https://reviews.llvm.org/D149331#4398582 for more details).
However, performing invalidation during LCSSA construction also
means that SCEV expansion (which may need to construct LCSSA) can
invalidate SCEV, which is somewhat unexpected and code may not be
prepared to deal with it (see the added test case, reported at
https://reviews.llvm.org/D149435#4428219).
Instead of invalidating SCEV, ensure that the LCSSA phi node also
has cached SCEV if the original instruction did. This means that
later invalidation of LCSSA phi nodes will work as expected. This
should avoid both the above issues and be more efficient.
Differential Revision: https://reviews.llvm.org/D153145
Make ValueTracking directly call the KnownBits shift helpers, which
provides more precise results.
Unfortunately, ValueTracking has a special case where sometimes we
determine non-zero shift amounts using isKnownNonZero(). I have my
doubts about the usefulness of that special case (it is only exercised
by a single unit test), but I've reproduced it via an extra parameter
to the KnownBits methods.
Differential Revision: https://reviews.llvm.org/D151816
This is a follow-up to b71edfaa4e
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a Python file, the best way to handle that
is to run `git checkout --ours <yourfile>` and then reformat it
with `black`.
If you run into any problems, post to Discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
While pointers in address space 7 (128-bit rsrc + 32-bit offset)
should be rewritten out of the code before IR translation on AMDGPU,
higher-level analyses may still call MVT-returning hooks like
getPointerTy() on the target machine. Currently, since there is no
MVT::i160, such calls end up causing crashes.
The changes to the data layout that caused such crashes were D149776.
This patch causes getPointerTy() to return MVT::v5i32 and
getPointerMemTy() to return MVT::v8i32. These are accurate types,
but mean that we can't use vectors of address space 7 pointers during
codegen. This is mostly OK, since vectors of buffers aren't supported
in LPC anyway, but it's a noticeable limitation.
Potential alternative solutions include adjusting getPointerTy() to return
an EVT or adding MVT::i160 and MVT::i256, both of which are rather
disruptive to the rest of the compiler.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D150002
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.
The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.
The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take ptr addrspace(8) instead of <4 x i32> as their resource
arguments. These pointers are 128 bits long (with the same alignment).
They must not be used as arguments to getelementptr or otherwise used
in address computations, since they can have arbitrarily complex
inherent addressing semantics that can't be represented in LLVM. Even
though, like their address space 7 cousins, these pointers have
deterministic ptrtoint/inttoptr semantics, they are defined to be
non-integral in order to prevent optimizations that rely on pointers
being values in [0, addr_max] from applying to them.
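A brief sketch of the intended contrast (hypothetical names):

    ; address space 7: normal memory operations are allowed (until the
    ; late rewrite removes them)
    %p = getelementptr i32, ptr addrspace(7) %fat, i32 %i
    %v = load i32, ptr addrspace(7) %p
    ; address space 8: resource handles only; no getelementptr or other
    ; address arithmetic on ptr addrspace(8) values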
Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.
This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.
Depends on D143437
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D145441
The highest address at which the object can start is ObjSize bytes
before the end of the address space (the unsigned max value). If this
value is not a multiple of the alignment, the last possible start
address is the next lowest multiple of the alignment. Note that the
computations cannot overflow, because if they did, there would be no
possible start address for the object.
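A worked example under these rules (hypothetical 8-bit address space,
ObjSize = 5, alignment 4):

    highest possible start        = 2^8 - 5 = 251
    rounded down to the alignment = 248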
At the moment, this is limited to GlobalVariables, because I could not
find an API similar to getObjectSize to also get the alignment of the
object. With such an API, this could be generalized to arbitrary
addresses.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149483
Sometimes a phi can both be trivial and match the
createNodeFromSelectLikePHI() fold. In that case it is generally
more profitable to look through the phi node.
This lifts two TODOs from this function, allowing us to prove
no-overflow whether it happens through max int (up) or through
min int (down) for both add and sub.
Differential Revision: https://reviews.llvm.org/D148618
Reviewed By: dmakogon