When the operands of a sub are ptr2int values, computeKnownBits() cannot
determine anything useful about them.
In the scalar case, a sub of two ptr2int values is generally converted
into a sub of the indexes.
However, for a loop with a recursive GEP/PHI whose ptr2int value feeds
the sub, if we can determine that the sub of this GEP and another
pointer with the same base is known non-zero, we can return that result.
This helps subsequent passes to optimize the loop further.
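A hedged sketch of the kind of loop this is about (names and bounds are made
up for illustration): the pointer PHI starts past the base and only advances
by a non-zero GEP offset, so the difference between it and the base can be
proven non-zero even though both operands of the sub are ptr2int values.
void clear_from_second(char *base, char *end) {
  // p is a recursive GEP/PHI: p = phi(base + 1, p + 1). It always points
  // strictly past base, so (intptr_t)p - (intptr_t)base, i.e. the sub of
  // the two ptr2int values, is known non-zero on every iteration.
  for (char *p = base + 1; p != end; ++p)
    p[-1] = 0;
}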
The RISC-V vector crypto extensions have been ratified. This patch
updates the Clang and LLVM support for these extensions to be
non-experimental, while leaving the C intrinsics experimental, since they
are not yet standardized.
Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
Loop guards tend to provide better results when it comes to reasoning
about ranges than isLoopEntryGuardedByCond(). See the test change for
the motivating case.
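For reference, a minimal example of what "loop guard" means here
(illustrative only, not taken from the test): the dominating branch below
guarantees n > 0 inside the loop, and collecting such guards up front tends
to give SCEV tighter ranges than asking isLoopEntryGuardedByCond() for each
individual query.
long sum_guarded(const int *a, unsigned n) {
  long sum = 0;
  if (n > 0) {                        // the loop guard dominating the loop
    for (unsigned i = 0; i < n; ++i)  // so i stays in [0, n) and n - 1 cannot wrap
      sum += a[i];
  }
  return sum;
}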
I have retained both the loop guard check and the implied cond based
check for now, though the latter only seems to impact a single test and
only via side effects (nowrap flag calculation) at that.
depend_diff_types.ll already covers the same tests after it has been
converted to opaque pointers, so remove the redundant
depend_diff_types_opaque_ptr.ll.
We have a bunch of folds that basically convert X pred Y into ~Y pred ~X
for various special cases where this saves an instruction.
Generalize these folds to use isFreeToInvert(). We have to make sure
that at least one of the inversions consumes an instruction, otherwise
we would just keep swapping the icmp back and forth.
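A hedged illustration of the underlying identity (not taken from the patch):
in two's complement, ~x equals -x - 1, so comparing the bitwise-not of two
values is the same as comparing the original values in the opposite order.
The fold only pays off when at least one inversion removes an existing
'xor X, -1' rather than creating a new one.
int cmp_inverted(unsigned x, unsigned y) {
  // ~x < ~y  <=>  (UINT_MAX - x) < (UINT_MAX - y)  <=>  y < x
  return (~x < ~y) == (y < x);   // always 1
}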
Fixes https://github.com/llvm/llvm-project/issues/74302.
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended to aid discovery of frames missing from
profiled call stacks due to tail calls, for MemProf profiles of binaries
that did not disable tail call elimination. A follow-on change will add
the use of this new flag during MemProf context disambiguation.
The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we will now always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.
I have added explicit testing that the new tail call flag is generated in
the bitcode and IR assembly format, as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
Use the disjoint flag to convert or to add instead of calling the
haveNoCommonBitsSet() ValueTracking query. This ensures that we can
reliably undo add -> or canonicalization, even in cases where the
necessary information has been lost or is too complex to reinfer in
SCEV.
I have updated the bulk of the test coverage to add the necessary
disjoint flags in advance.
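For context, the fact the disjoint flag records (illustrative example, not
from the patch): when two values have no set bits in common there is no
carry, so their bitwise OR equals their sum, and an 'or disjoint' can be
treated as an 'add' without re-proving haveNoCommonBitsSet().
unsigned combine_fields(unsigned hi4, unsigned lo4) {
  unsigned a = (hi4 & 0xF) << 4;  // occupies bits 4..7
  unsigned b = lo4 & 0xF;         // occupies bits 0..3
  // a and b share no set bits, so there is no carry and (a | b) == (a + b);
  // this is exactly the fact the disjoint flag asserts about the 'or'.
  return a | b;
}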
As shown in #70473, the following loop was not considered safe to
vectorize. When determining the memory access dependencies in
a loop with a negative iteration step, we invert the source and
sink of the dependence. Perhaps we should instead just invert the operands
to getMinusSCEV(). This way the dependence is not regarded as a true
dependence, since the users of the `IsWrite` variables, which correspond
to each of the memory accesses, rely on program order and therefore
should not be swapped.
void vectorizable_Read_Write(int *A) {
  for (unsigned i = 1022; i >= 0; i--)
    A[i+1] = A[i] + 1;
}
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
When using `BranchProbabilityPrinterPass`, if a BB has no name, we get pretty unusable information like `edge -> has probability...` (i.e. we have no idea what the vertices of that edge are).
This patch uses `printAsOperand`, which uses the same naming scheme as `Function::dump`, so for example during debugging sessions, the IR obtained from a function and the names used by `BranchProbabilityPrinterPass` will match.
A shortcoming is that `printAsOperand` will result in the numbering algorithm re-running for every edge and every vertex (when `BranchProbabilityPrinterPass` is run on a function). If, for the given scenario, this is a problem, we can revisit this subsequently.
Another nuance is that the entry basic block will be numbered, which may be slightly confusing when it's anonymous, but it's easily identifiable: the first edge would have it as its source (and the number should be easily recognizable).
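A minimal sketch of what the printing looks like with this approach
(simplified; names and exact formatting are illustrative, not the verbatim
patch):
#include "llvm/IR/BasicBlock.h"
#include "llvm/Support/BranchProbability.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

static void printEdge(raw_ostream &OS, const BasicBlock *Src,
                      const BasicBlock *Dst, BranchProbability Prob) {
  OS << "edge ";
  Src->printAsOperand(OS, /*PrintType=*/false); // prints %N for unnamed blocks
  OS << " -> ";
  Dst->printAsOperand(OS, /*PrintType=*/false);
  OS << " has probability " << Prob << "\n";
}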
After 9645267, TypeByteSize is 0 if the two accesses do not have the same
size (i.e. HasSameSize will be false). This can cause an infinite loop
in couldPreventStoreLoadForward if HasSameSize is not checked first.
So check HasSameSize first instead of after
couldPreventStoreLoadForward. Checking HasSameSize first is also
cheaper.
Auto-generate checks for -loop-carried.ll to make it easier to update in
a follow-on patch. As this test only checks the dependence, mark pointers
as noalias to avoid also checking various runtime pointer check groups.
- Change `BranchProbabilityPrinterPass` output to match expectations of `update_analyze_test_checks.py`.
- Add `Branch Probability Analysis` to list of supported analyses.
- Process `llvm/test/Analysis/BranchProbabilityInfo/basic.ll` with `update_analyze_test_checks.py` as proof of concept. Leaving the other tests unchanged to reduce the amount of churn.
The current code incorrectly assumed that the absolute variable index
needs to be at least 1 if the variable is != 0. This is incorrect in
case the multiplication with Scale wraps.
The code below already checks for wrapping properly, so just remove the
incorrect assignment.
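A hedged illustration with a deliberately narrow type (not from the patch):
the variable is non-zero, yet multiplying by Scale wraps the scaled index to
zero, so the "at least 1" assumption does not hold.
#include <stdint.h>
uint8_t scaled_index_wraps(void) {
  uint8_t Var = 64;               // the variable index, != 0
  uint8_t Scale = 4;
  return (uint8_t)(Var * Scale);  // 64 * 4 = 256 wraps to 0 in 8 bits
}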
Fixes https://github.com/llvm/llvm-project/issues/72831.
This test is the last holdout that still uses the legacy loop simplify
CFG pass. The issues originally pointed out in the test comments seem to
have been fixed now as there are no MemorySSA verification failures.
Add an additional test case where we currently incorrectly identify a
dependence as Forward instead of ForwardButPreventsForwarding.
Also cleans up the names in the tests a bit to improve readability.
Avoids infinite loop issues in some upcoming patches to help D152928: x86 sees a number of regressions that are addressed by extending SimplifyDemandedVectorEltsForTargetNode to cover more binop opcodes.
This patch adds a new dependence kind, UnsafeIndirect, for cases where at
least one of the memory access instructions may access a loop-varying
object, e.g. the address of the underlying object is loaded inside the
loop, as in A[B[i]]. We cannot determine direction or distance in those
cases, and are also unable to generate any runtime checks.
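A hedged C sketch of the kind of access pattern meant here (illustrative
only):
void indirect_store(int *A, const int *B, int n) {
  // The address of the store depends on B[i], a value loaded inside the
  // loop, so neither a dependence distance nor a runtime bound for the
  // stores can be computed.
  for (int i = 0; i < n; ++i)
    A[B[i]] += 1;
}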
This fixes a miscompile if we attempt to generate runtime checks for
unknown dependences.
Note that in most cases we do not attempt to generate runtime checks for
unknown dependences, except if FoundNonConstantDistanceDependence is
true.
Fixes https://github.com/llvm/llvm-project/issues/69744.
For `{{regex}}` we don't really need a capturing group, and only add it
to properly handle cases like `{{foo|bar}}`. This is problematic,
because the use of capturing groups makes our regex implementation
slower (we have to go through the "dissect" stage, which can have
quadratic complexity).
Unfortunately, our regex implementation does not support non-capturing
groups like `(?:regex)`. So instead, avoid adding the group entirely if
the regex doesn't contain any alternations.
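A hedged sketch of the idea (simplified; not the exact FileCheck code): only
wrap the user-supplied regex in a group when it actually contains an
alternation, since that is the only case where the grouping changes the
match.
#include "llvm/ADT/StringRef.h"
#include <string>
using namespace llvm;

static std::string spliceRegex(StringRef UserRE) {
  std::string Out;
  bool NeedsGroup = UserRE.contains('|'); // only alternations need the group
  if (NeedsGroup)
    Out += '(';
  Out += UserRE.str();
  if (NeedsGroup)
    Out += ')';
  return Out;
}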
This causes a slight difference in escaping behavior, where previously
it was possible to write `{{{{}}` and get the same behavior as
`{{\{\{}}`. This will no longer work. I don't think this is a problem,
especially as we recently taught update_analyze_test_checks.py to emit
`{{\{\{}}`, so this shouldn't get introduced in any new tests.
For CodeGen/X86/vector-interleaved-store-i16-stride-7.ll (our slowest
X86 test) this drops FileCheck time from 6s to 5s (the remainder is
spent in a different regex issue). I expect similar speedups in other
tests using a lot of `{{}}`.
SCEV expressions may contain multiple {{ or }} in the debug output,
which need escaping.
See
llvm/test/Analysis/LoopAccessAnalysis/loops-with-indirect-reads-and-writes.ll
for a test that needs escaping.
Note that both loops in the test are needed for LAA to incorrectly
determine that the loops are safe with runtime checks, via its
FoundNonConstantDistanceDependence handling code.
Function parameters marked with inreg are supposed to be allocated to
SGPRs. However, for compute functions, this is ignored and function
parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in
AMDGPUCallingConv.td to use SGPRs if the input argument is marked inreg.
Co-authored-by: Jun Wang <jun.wang7@amd.com>
In commit 5a9a02f67b scalar evolution got support for
computing SCEVs for (ashr(add(shl(x, n), c), m)) constructs. The code
however used APInt::getZExtValue without first checking that the APInt
would fit inside a uint64_t. When for example using 128-bit types, we
ended up with assertion failures (or maybe miscompiles in non-assert
builds).
This patch simply avoids converting from APInt to uint64_t when creating
the truncated constant. We can just truncate the APInt instead.
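A hedged sketch of the difference (simplified, not the actual SCEV code):
build the truncated constant directly from the APInt rather than
round-tripping through uint64_t, which asserts once the value no longer
fits in 64 bits.
#include "llvm/ADT/APInt.h"
using namespace llvm;

static APInt truncateConstant(const APInt &C, unsigned NewBitWidth) {
  // Before (asserts once C no longer fits in 64 bits):
  //   APInt(NewBitWidth, C.getZExtValue());
  return C.trunc(NewBitWidth);  // truncate the APInt directly instead
}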
As far as I can tell, there's nothing in this code which actually
assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred
RHS) are the same.
Noticed while investigating something else; this is purely an
opportunistic optimization while I'm looking at the code. Unfortunately,
this doesn't solve my original problem. :)
zext nneg was recently added to the IR in #67982. This patch teaches
SimplifyIndVars to prefer zext nneg over *both* sext and plain zext,
when a local SCEV query indicates the source is non-negative.
The choice to prefer zext nneg over sext looks slightly aggressive
here, but probably isn't so much in practice. For cases where we'd
"remember" the range fact, instcombine would convert the sext into
a zext nneg anyways. The only cases where this produces a different
result overall are when SCEV knows a non-local fact, and it doesn't
get materialized into the IR. Those are exactly the cases where
using zext nneg is most useful. We do run the risk of e.g. a
missing combine (since we haven't updated most of them yet), but
that seems like a manageable risk.
Note that there are much deeper algorithmic changes we could make
to this code to exploit zext nneg, but this seemed like a reasonable
and low risk starting point.
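An illustrative example of the underlying fact (not from the patch): once
the narrow value is known non-negative, zero- and sign-extension produce the
same result, so the widening can use a zero extend while still recording the
sign information, which is what zext nneg expresses.
#include <stdint.h>
uint64_t widen_nonneg(int32_t x) {
  // Assume x >= 0 (e.g. proven by a SCEV query for the induction variable);
  // then (uint64_t)(uint32_t)x == (uint64_t)(int64_t)x, i.e. zext == sext.
  return (uint64_t)(uint32_t)x;
}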
update_analyze_test_checks.py will now insert check lines for
empty lines, which means that all the existing test coverage will
have a spurious change to check for the newline after "Predicates:".
I don't think we actually want to have that newline, so drop it
before it gets into more test coverage.
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.
There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.
This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
Currently the loop access analysis classifies this loop as unsafe to
vectorize because the memory dependencies are
'ForwardButPreventsForwarding'. However, the access pattern is
'write-after-read' with no subsequent read accessing the written memory
locations. I can't see how store-to-load forwarding is applicable here.
void vectorizable_Read_Write(int *A) {
  for (unsigned i = 1022; i >= 0; i--)
    A[i+1] = A[i] + 1;
}
update_analyze_test_checks.py is an invaluable tool in updating tests.
Unfortunately, it only supports output from the CostModel,
ScalarEvolution, and LoopVectorize analyses. Many LoopAccessAnalysis
tests use hand-crafted CHECK lines, and it is moreover tedious to
generate these CHECK lines, as the output from the analysis is not
stable and requires the test-writer to hand-craft FileCheck matches.
Alleviate this pain, and support output from:
$ opt -passes='print<loop-accesses>'
This patch includes several non-trivial changes, including:
- Preserving whitespace at the beginning of the line, so that the LAA
output can be properly indented.
- Regexes matching the unstable output, which is essentially a pointer
address in hex.
- Separating is_analyze from preserve_names clearly, as the former was
previously overloaded to also mean the latter.
To demonstrate the utility of this patch, several tests in
LoopAccessAnalysis have been auto-generated by
update_analyze_test_checks.py.