Commit Graph

4173 Commits

Author SHA1 Message Date
Simon Pilgrim
3736e1d1cd [SCEV] Ensure shift amount is in range before calling getZExtValue()
Fixes #76234
2023-12-22 14:16:54 +00:00
Nikita Popov
0df3200931 [ValueTracking] Fix KnownBits conflict for poison-only vector
If all the demanded elements are poison, return unknown instead of
conflict to avoid downstream assertions.

Fixes https://github.com/llvm/llvm-project/issues/75505.
2023-12-21 09:23:47 +01:00
bipmis
64987c648f [ValueTracking] isNonZero sub of ptr2int's with recursive GEP (#68680)
When the sub arguments are ptr2int, computeKnownBits() cannot determine
anything useful about them.
In the scalar case, a sub of two ptr2int values is generally converted
to a sub of indexes.
However, in a loop with a recursive GEP/PHI where the arguments to the
sub are ptr2int, if we can determine that a sub of this GEP and another
pointer with the same base is KnownNonZero, we can return this.
This helps subsequent passes to optimize the loop further.
2023-12-20 14:11:58 +00:00
Eric Biggers
09058654f6 [RISCV] Remove experimental from Vector Crypto extensions (#74213)
The RISC-V vector crypto extensions have been ratified. This patch
updates the Clang and LLVM support for these extensions to be
non-experimental, while leaving the C intrinsics as experimental since
the C intrinsics are not yet standardized.

Co-authored-by: Brandon Wu <brandon.wu@sifive.com>
2023-12-18 22:04:22 -08:00
Nikita Popov
337504683e [ValueTracking] Use isKnownNonEqual() in isNonZeroSub()
(x - y) != 0 is true iff x != y, so use the isKnownNonEqual()
helper, which knows some additional tricks.
2023-12-18 12:26:40 +01:00
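The identity behind this change can be checked exhaustively at a small bit width. The sketch below is only an illustration in Python, not LLVM code; `BITS`, `wrapping_sub` and `sub_is_nonzero_iff_operands_differ` are names made up for this example.

```python
from itertools import product

BITS = 4                      # small width so every pair can be enumerated
MASK = (1 << BITS) - 1


def wrapping_sub(x: int, y: int) -> int:
    """Subtract modulo 2**BITS, like a fixed-width integer sub."""
    return (x - y) & MASK


def sub_is_nonzero_iff_operands_differ() -> bool:
    """Check (x - y) != 0  <=>  x != y for every pair of BITS-wide values."""
    return all(
        (wrapping_sub(x, y) != 0) == (x != y)
        for x, y in product(range(1 << BITS), repeat=2)
    )
```

Because subtraction modulo a power of two is invertible, the difference is zero only when both operands are equal, so a "known non-equal" fact is enough to prove the sub non-zero.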
Nikita Popov
7c1d8c74e8 [ValueTracking] Add test for non-zero sub via known non equal (NFC) 2023-12-18 12:26:40 +01:00
bipmis
6df6320374 [ValueTracking] isNonEqual Pointers with a recursive GEP (#70459)
Handles canonical icmp eq(ptr1, ptr2) -> where ptr1/ptr2 is a recursive
GEP.
This helps in scenarios where InstCombineCompares folds icmp eq(sub(ptr2int,
ptr2int), 0) -> icmp eq(ptr1, ptr2)
and
icmp eq(phi(sub(ptr2int, ptr2int), ...)) -> phi i1 (icmp eq(sub(ptr2int,
ptr2int), 0), ....)
2023-12-15 10:02:57 +00:00
Mariusz Sikora
966416b9e8 [AMDGPU][GFX12] Add new v_permlane16 variants (#75475) 2023-12-15 10:14:38 +01:00
David Green
7433120137 [CostModel] Mark ssa_copy as free (#75294)
These intrinsics are only used ephemerally and should be given a
zero cost.
2023-12-13 11:24:47 +00:00
David Green
b003fed283 [CostModel] Add some ssa.copy costmodel tests. NFC 2023-12-13 07:26:17 +00:00
Nikita Popov
90d82412ea [SCEV] Use loop guards when checking that RHS >= Start (#75039)
Loop guards tend to provide better results when it comes to reasoning
about ranges than isLoopEntryGuardedByCond(). See the test change for
the motivating case.

I have retained both the loop guard check and the implied cond based
check for now, though the latter only seems to impact a single test and
only via side effects (nowrap flag calculation) at that.
2023-12-12 09:41:54 +01:00
Nikita Popov
dbee36c523 [SCEV] Add test for unnecessary umax in BECount (NFC) 2023-12-11 12:12:34 +01:00
Florian Hahn
184290e579 [LAA] Add tests with dependencies that may prevent st-to-ld forwarding.
Add test cases with varying distances between stores and loads that may
prevent store-to-load forwarding.
2023-12-10 13:56:53 +00:00
Florian Hahn
cd4067af36 [LAA] Remove duplicated test.
depend_diff_types.ll already covers the same tests after it has been
converted to opaque pointers, so remove the redundant
depend_diff_types_opaque_ptr.ll
2023-12-09 21:27:42 +00:00
Nikita Popov
cf47af493b [InstCombine] Generalize folds for inversion of icmp operands (#74317)
We have a bunch of folds that basically perform X pred Y to ~Y pred ~X
for various special cases where this saves an instruction.

Generalize these folds to use isFreeToInvert(). We have to make sure
that we consume an instruction in either of the inversions, otherwise
we're just going to swap the icmp back and forth.

Fixes https://github.com/llvm/llvm-project/issues/74302.
2023-12-08 11:25:41 +01:00
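The rewrite X pred Y to ~Y pred ~X mentioned above holds for the equality and ordering predicates, signed and unsigned alike, as long as the operands are swapped. The following is a brute-force sketch in Python at a 4-bit width; it is not InstCombine code, and `icmp`, `bit_not` and the predicate table are hypothetical helpers named after the IR predicates.

```python
import operator
from itertools import product

BITS = 4
MASK = (1 << BITS) - 1

# Comparison operator for each predicate name (signedness is handled in icmp).
OPS = {"eq": operator.eq, "ne": operator.ne,
       "lt": operator.lt, "le": operator.le,
       "gt": operator.gt, "ge": operator.ge}
PREDICATES = ["eq", "ne", "ult", "ule", "ugt", "uge", "slt", "sle", "sgt", "sge"]


def bit_not(x: int) -> int:
    """Bitwise NOT on a BITS-wide value."""
    return ~x & MASK


def to_signed(x: int) -> int:
    """Reinterpret a BITS-wide bit pattern as a two's complement integer."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def icmp(pred: str, x: int, y: int) -> bool:
    """Evaluate an icmp-style predicate on two BITS-wide bit patterns."""
    if pred in ("eq", "ne"):
        return OPS[pred](x, y)
    if pred[0] == "s":
        x, y = to_signed(x), to_signed(y)
    return OPS[pred[1:]](x, y)


def inversion_holds(pred: str) -> bool:
    """Check X pred Y == ~Y pred ~X for all BITS-wide X and Y."""
    return all(
        icmp(pred, x, y) == icmp(pred, bit_not(y), bit_not(x))
        for x, y in product(range(1 << BITS), repeat=2)
    )
```

Inverting both operands reverses their order, which is why the operands swap sides; inverting without swapping does not preserve the ordering predicates.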
Teresa Johnson
88fbc4d3df [ThinLTO] Add tail call flag to call edges in summary (#74043)
This adds support for a HasTailCall flag on function call edges in the
ThinLTO summary. It is intended for use in aiding discovery of missing
frames from tail calls in profiled call stacks for MemProf of profiled
binaries that did not disable tail call elimination. A follow on change
will add the use of this new flag during MemProf context disambiguation.

The new flag is encoded in the bitcode along with either the hotness
flag from the profile, or the relative block frequency under the
-write-relbf-to-summary flag when there is no profile data.
Because we now will always have some additional call edge information, I
have removed the non-profile function summary record format, and we
simply encode the tail call flag along with a hotness type of none when
there is no profile information or relative block frequency. The change
of record format and name caused most of the test case changes.

I have added explicit testing of generation of the new tail call flag
into the bitcode and IR assembly format as part of the changes to
llvm/test/Bitcode/thinlto-function-summary-refgraph.ll. I have also
added round trip testing through assembly and bitcode to
llvm/test/Assembler/thinlto-summary.ll.
2023-12-06 08:41:44 -08:00
Nikita Popov
ff0e4fb89a [SCEV] Use or disjoint flag (#74467)
Use the disjoint flag to convert or to add instead of calling the
haveNoCommonBitsSet() ValueTracking query. This ensures that we can
reliably undo add -> or canonicalization, even in cases where the
necessary information has been lost or is too complex to reinfer in
SCEV.

I have updated the bulk of the test coverage to add the necessary
disjoint flags in advance.
2023-12-05 17:01:46 +01:00
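The reason an or with the disjoint flag can be treated as an add is that two operands with no common set bits never produce a carry. A small Python check of that arithmetic fact, with names (`is_disjoint`, `disjoint_or_equals_add`) invented for this sketch:

```python
from itertools import product

BITS = 6
MASK = (1 << BITS) - 1


def is_disjoint(x: int, y: int) -> bool:
    """True when x and y have no set bits in common."""
    return x & y == 0


def disjoint_or_equals_add() -> bool:
    """Check x | y == x + y (mod 2**BITS) for every disjoint pair."""
    return all(
        (x | y) == ((x + y) & MASK)
        for x, y in product(range(1 << BITS), repeat=2)
        if is_disjoint(x, y)
    )
```

Without the disjointness guarantee the equality fails, for example 3 | 1 is 3 while 3 + 1 is 4, which is why the flag (or an equivalent analysis result) is needed before undoing the add -> or canonicalization.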
Alexandros Lamprineas
3ad6d1cbe5 [LAA] Fix incorrect dependency classification. (#70819)
As shown in #70473, the following loop was not considered safe to
vectorize. When determining the memory access dependencies in
a loop which has negative iteration step, we invert the source and
sink of the dependence. Perhaps we should just invert the operands
to getMinusSCEV(). This way the dependency is not regarded to be
true, since the users of the `IsWrite` variables, which correspond to
each of the memory accesses, rely on program order and therefore
should not be swapped.

void vectorizable_Read_Write(int *A) {
  for (unsigned i = 1022; i >= 0; i--)
    A[i+1] = A[i] + 1;
}
2023-12-05 15:27:30 +00:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Mircea Trofin
bb6497ffa6 [BPI] Reuse the AsmWriter's BB naming scheme in BranchProbabilityPrinterPass (#73593)
When using `BranchProbabilityPrinterPass`, if a BB has no name, we get pretty unusable information like `edge -> has probability...` (i.e. we have no idea what the vertices of that edge are).

This patch uses `printAsOperand`, which uses the same naming scheme as `Function::dump`, so for example during debugging sessions, the IR obtained from a function and the names used by `BranchProbabilityPrinterPass` will match.

A shortcoming is that `printAsOperand` will result in the numbering algorithm re-running for every edge and every vertex (when `BranchProbabilityPrinterPass` is run on a function). If, for the given scenario, this is a problem, we can revisit this subsequently.

Another nuance is that the entry basic block will be numbered, which may be slightly confusing when it's anonymous, but it's easily identifiable - the first edge would have it as source (and the number should be easily recognizable)
2023-12-02 13:01:48 -08:00
Allen
ab3fdbdfbe [ValueTracking] Support srem/urem for isKnownNonNullFromDominatingCondition (#74021)
Similar to div, a rem also proves that its second operand is
non-zero, since otherwise it is UB.

Fix https://github.com/llvm/llvm-project/issues/71782
2023-12-01 16:20:38 +08:00
Craig Topper
03d4a9d94d [InstCombine] Set disjoint flag when turning Add into Or. (#72702)
The disjoint flag was recently added to IR in #72583
2023-11-27 12:54:11 -08:00
Florian Hahn
17139f38e5 [LAA] Check HasSameSize before couldPreventStoreLoadForward.
After 9645267, TypeByteSize is 0 if both accesses do not have the same
size (i.e. HasSameSize will be false). This can cause an infinite loop
in couldPreventStoreLoadForward, if HasSameSize is not checked first.

So check HasSameSize first instead of after
couldPreventStoreLoadForward. Checking HasSameSize first is also
cheaper.
2023-11-27 10:10:41 +00:00
Florian Hahn
2fda8ca6da [LAA] Auto-generate checks for forward-loop-carried.ll
Auto-generate checks for forward-loop-carried.ll to make it easier to update
in a follow-on patch. As this test only checks the dependence, mark pointers
as noalias to avoid also checking various runtime pointer check groups.
2023-11-27 10:06:17 +00:00
Aiden Grossman
5eb85c052e [JumpThreading] Remove LVI printer flag (#73426)
This patch removes the -print-lvi-after-jump-threading flag now that we
can print everything in the LVI cache using the print<lazy-value-info>
pass.
2023-11-27 00:19:23 -08:00
Aiden Grossman
5a74805bd6 [LVI] Add NewPM printer pass (#73425)
This patch adds a NewPM printer pass for the LazyValueAnalysis.
2023-11-26 12:20:49 -08:00
Nikita Popov
88f7dc17eb [SCEV] Regenerate test checks (NFC)
There have been some minor but pervasive changes to the generated
CHECK lines, so regenerate all of them, to minimize future diffs.
2023-11-24 15:49:28 +01:00
Matthias Braun
331111277a Support BranchProbabilityInfo in update_analyze_test_checks.py (#72943)
- Change `BranchProbabilityPrinterPass` output to match expectations of `update_analyze_test_checks.py`.
- Add `Branch Probability Analysis` to list of supported analyses.
- Process `llvm/test/Analysis/BranchProbabilityInfo/basic.ll` with `update_analyze_test_checks.py` as proof of concept. Leaving the other tests unchanged to reduce the amount of churn.
2023-11-21 17:08:44 -08:00
Florian Hahn
2d39cb4983 [BasicAA] Don't use MinAbsVarIndex = 1. (#72993)
The current code incorrectly assumed that the absolute variable index
needs to be at least 1, if the variable is != 0. This is incorrect, in
case multiplying with Scale wraps.

The code below already checks for wrapping properly, so just remove the
incorrect assignment.

Fixes https://github.com/llvm/llvm-project/issues/72831.
2023-11-21 14:27:50 +00:00
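The wrapping problem described above can be seen with plain modular arithmetic: a non-zero index multiplied by a scale can wrap to an offset of zero. This is only an arithmetic illustration in Python at an assumed 8-bit index width, not the BasicAA logic; `wrapping_mul` and `nonzero_indices_with_zero_offset` are hypothetical names.

```python
BITS = 8                      # assumed index width for the illustration
MASK = (1 << BITS) - 1


def wrapping_mul(a: int, b: int) -> int:
    """Multiply modulo 2**BITS, like a fixed-width integer mul."""
    return (a * b) & MASK


def nonzero_indices_with_zero_offset(scale: int) -> list[int]:
    """Non-zero index values whose scaled offset wraps around to 0."""
    return [v for v in range(1, 1 << BITS) if wrapping_mul(scale, v) == 0]
```

With a scale of 64, the index 4 already gives 64 * 4 = 256, which is 0 modulo 2**8, so "index != 0" does not imply that the scaled offset is at least the scale. An odd scale has no such indices because it is invertible modulo a power of two.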
Florian Hahn
ad86d3e94f [BasicAA] Add wrapping test for #72831.
Add test with GEP where the index may wrap.
2023-11-21 13:38:57 +00:00
Aiden Grossman
523c0d3e49 [MemorySSA] Update test to use NewPM (#72915)
This test is the last holdout that still uses the legacy loop simplify
CFG pass. The issues originally pointed out in the test comments seem to
have been fixed now as there are no MemorySSA verification failures.
2023-11-20 14:45:01 -08:00
Florian Hahn
5d353423c9 [LAA] Add extra test for #70819 showing incorrect Forward dep.
Add an additional test case where we currently incorrectly identify a
dependence as Forward instead of ForwardButPreventsForwarding.

Also cleans up the names in the tests a bit to improve readability.
2023-11-20 11:18:13 +00:00
Simon Pilgrim
761a963dfc [DAG] narrowExtractedVectorBinOp - ensure we limit late node creation to LegalOperations only (#72130)
Avoids infinite issues in some upcoming patches to help D152928 - x86 sees a number of regressions that are addressed by extending SimplifyDemandedVectorEltsForTargetNode to cover more binop opcodes
2023-11-20 10:56:41 +00:00
Noah Goldstein
f112e4693a [InstCombine] Don't transform sub X, ~Y -> add X, -Y unless Y is actually negatable
This combine was previously adding instructions in some cases (see the
tests).

Closes #72767
2023-11-19 12:15:03 -06:00
Florian Hahn
1dbcaf2777 [LAA] Check if dependencies access loop-varying underlying objects.
This patch adds a new dependence kind UnsafeIndirect, for cases where at
least one of the memory access instructions may access a loop varying object,
e.g. the address of the underlying object is loaded inside the loop, like A[B[i]].
We cannot determine direction or distance in those cases, and also are unable
to generate any runtime checks.

This fixes a miscompile, if we attempt to generate runtime checks for
unknown dependencies.

Note that in most cases we do not attempt to generate runtime checks for
unknown dependences, except if FoundNonConstantDistanceDependence is
true.

Fixes https://github.com/llvm/llvm-project/issues/69744.
2023-11-15 21:58:57 +00:00
Acim-Maravic
f3138524db [AMDGPU] Generic lowering for rint and nearbyint (#69596)
There are three different rounding intrinsics that are lowered to the
same instruction.

Co-authored-by: Acim Maravic <acim.maravic@amd.com>
2023-11-14 18:49:21 +01:00
Nikita Popov
a3eeef82da [FileCheck] Avoid capturing group for {{regex}} (#72136)
For `{{regex}}` we don't really need a capturing group, and only add it
to properly handle cases like `{{foo|bar}}`. This is problematic,
because the use of capturing groups makes our regex implementation
slower (we have to go through the "dissect" stage, which can have
quadratic complexity).

Unfortunately, our regex implementation does not support non-capturing
groups like `(?:regex)`. So instead, avoid adding the group entirely if
the regex doesn't contain any alternations.

This causes a slight difference in escaping behavior, where previously
it was possible to write `{{{{}}` and get the same behavior as
`{{\{\{}}`. This will no longer work. I don't think this is a problem,
especially as we recently taught update_analyze_test_checks.py to emit
`{{\{\{}}`, so this shouldn't get introduced in any new tests.

For CodeGen/X86/vector-interleaved-store-i16-stride-7.ll (our slowest
X86 test) this drops FileCheck time from 6s to 5s (the remainder is
spent in a different regex issue). I expect similar speedups in other
tests using a lot of `{{}}`.
2023-11-14 09:03:54 +01:00
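The grouping rule described above can be sketched with Python's `re` module standing in for FileCheck's regex engine. The helpers `regex_fragment` and `build_pattern` are hypothetical and only model the idea: wrap the `{{regex}}` body in a group when it contains an alternation, and splice it in unchanged otherwise.

```python
import re


def regex_fragment(body: str) -> str:
    """Wrap the body in a group only when it contains an alternation.

    The '|' test is deliberately simple: an escaped '\\|' also triggers
    grouping, which is conservative but still correct.
    """
    return f"({body})" if "|" in body else body


def build_pattern(prefix: str, body: str, suffix: str) -> str:
    """Join literal text around a regex body, as a check line would."""
    return re.escape(prefix) + regex_fragment(body) + re.escape(suffix)
```

For the body `foo|bar` between the literals `a` and `b`, the ungrouped pattern `afoo|barb` would wrongly accept `afoo`, so the group is required there. For a body such as `[0-9]+` no group is emitted, and the compiled pattern has zero capturing groups.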
Florian Hahn
c491c93365 [LAA] Refine tests added in 9c535a3c2e.
Refine FIXMEs in added tests, the problematic case only materializes if
there's both a read and a write from an indirect address.
2023-11-13 19:19:57 +00:00
Florian Hahn
24839c3253 [UTC] Escape multiple {{ or }} in input for check lines. (#71790)
SCEV expressions may contain multiple {{ or }} in the debug output,
which needs escaping.

See
llvm/test/Analysis/LoopAccessAnalysis/loops-with-indirect-reads-and-writes.ll
for a test that needs escaping.
2023-11-09 17:18:11 +00:00
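A minimal sketch of this kind of escaping, assuming that every run of two or more identical braces is replaced by a `{{...}}` regex block matching the same braces literally. The name `escape_brace_runs` is hypothetical, the handling of closing runs is assumed to mirror the opening case, and this is not the script's actual code.

```python
import re

# A run of two or more '{' or two or more '}' characters.
BRACE_RUN = re.compile(r"\{\{+|\}\}+")


def escape_brace_runs(line: str) -> str:
    """Replace each brace run with a regex block that matches it literally."""
    def wrap(match: re.Match) -> str:
        # Backslash-escape every brace, then wrap the result in {{ }}.
        escaped = "".join("\\" + ch for ch in match.group(0))
        return "{{" + escaped + "}}"

    return BRACE_RUN.sub(wrap, line)
```

Single braces are left alone, so a nested SCEV expression such as `{{0,+,1}<%a>,+,1}<%b>` only has its leading `{{` rewritten.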
Florian Hahn
9c535a3c2e [LAA] Add tests for #69744.
Note that both loops in the tests are needed to incorrectly determine that
the loops are safe with runtime checks via FoundNonConstantDistanceDependence
handling code in LAA.
2023-11-09 09:59:48 +00:00
Jun Wang
54470176af [AMDGPU] Add inreg support for SGPR arguments (#67182)
Function parameters marked with inreg are supposed to be allocated to
SGPRs. However, for compute functions, this is ignored and function
parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in
AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg.

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2023-11-08 11:35:52 -08:00
Björn Pettersson
8fc0aca5d1 [SCEV] Support larger than 64-bit types in ashr(add(shl(x, n), c), m) (#71600)
In commit 5a9a02f67b scalar evolution got support for
computing SCEVs for (ashr(add(shl(x, n), c), m)) constructs. The code
however used APInt::getZExtValue without first checking that the APInt
would fit inside a uint64_t. When for example using 128-bit types we
ended up in assertion failures (or maybe miscompiles in non-assert
builds).
This patch simply avoids converting from APInt to uint64_t when creating
the truncated constant. We can just truncate the APInt instead.
2023-11-08 11:29:12 +01:00
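The difference between the two approaches can be modelled with Python integers standing in for arbitrary-width values. `fits_in_uint64` and `truncate` are hypothetical stand-ins for illustration, not the APInt API.

```python
def fits_in_uint64(value: int) -> bool:
    """True when a non-negative value is representable in 64 bits."""
    return 0 <= value < (1 << 64)


def truncate(value: int, new_bits: int) -> int:
    """Keep only the low new_bits bits, at any source width."""
    return value & ((1 << new_bits) - 1)
```

A 128-bit constant such as `(1 << 100) + 5` does not fit in 64 bits, so any detour through a 64-bit integer is invalid, while `truncate((1 << 100) + 5, 96)` works directly on the wide value and yields 5.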
Philip Reames
a7f35d54ee [SCEV] Extend isImpliedCondOperandsViaRanges to independent predicates (#71110)
As far as I can tell, there's nothing in this code which actually
assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred
RHS) are the same.

Noticed while investigating something else, this is purely an
opportunistic optimization while I'm looking at the code. Unfortunately,
this doesn't solve my original problem. :)
2023-11-07 07:25:47 -08:00
Philip Reames
5adf6ab7ff Revert "[IndVars] Generate zext nneg when locally obvious"
This reverts commit a6c8e27b3a.  It appears likely to have caused https://lab.llvm.org/buildbot/#/builders/57/builds/30988.
2023-11-03 11:19:14 -07:00
Philip Reames
a6c8e27b3a [IndVars] Generate zext nneg when locally obvious
zext nneg was recently added to the IR in #67982.  This patch teaches
SimplifyIndVars to prefer zext nneg over *both* sext and plain zext,
when a local SCEV query indicates the source is non-negative.

The choice to prefer zext nneg over sext looks slightly aggressive
here, but probably isn't so much in practice.  For cases where we'd
"remember" the range fact, instcombine would convert the sext into
a zext nneg anyways.  The only cases where this produces a different
result overall are when SCEV knows a non-local fact, and it doesn't
get materialized into the IR.  Those are exactly the cases where
using zext nneg is most useful.  We do run the risk of e.g. a
missing combine - since we haven't updated most of them yet - but
that seems like a manageable risk.

Note that there are much deeper algorithmic changes we could make
to this code to exploit zext nneg, but this seemed like a reasonable
and low risk starting point.
2023-11-03 09:20:59 -07:00
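The reason a non-negative source makes zext and sext interchangeable can be checked by brute force. This is a Python sketch with hypothetical `zext` and `sext` helpers working on bit patterns, extending from 8 to 16 bits; it is not IndVars code.

```python
FROM_BITS = 8
TO_BITS = 16


def zext(x: int) -> int:
    """Zero-extend a FROM_BITS-wide bit pattern to TO_BITS."""
    return x & ((1 << FROM_BITS) - 1)


def sext(x: int) -> int:
    """Sign-extend a FROM_BITS-wide bit pattern to TO_BITS."""
    sign = 1 << (FROM_BITS - 1)
    x &= (1 << FROM_BITS) - 1
    return ((x ^ sign) - sign) & ((1 << TO_BITS) - 1)


def extensions_agree_when_nonnegative() -> bool:
    """zext and sext give the same result whenever the sign bit is clear."""
    return all(zext(x) == sext(x) for x in range(1 << (FROM_BITS - 1)))
```

Once the sign bit is set the two extensions differ (0x80 becomes 0x0080 under zext and 0xFF80 under sext), which is why the choice between them depends on knowing the source is non-negative.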
Philip Reames
015c06ade0 Regenerate a couple scev/indvars tests [nfc]
Update to modern output to reduce spurious deltas in upcoming change.
2023-11-03 08:42:59 -07:00
Nikita Popov
a8ac6a9868 [SCEV] Remove newline after predicates in dump
update_analyze_test_checks.py will now insert check lines for
empty lines, which means that all the existing test coverage will
have a spurious change to check for the newline after "Predicates:".

I don't think we actually want to have that newline, so drop it
before it gets into more test coverage.
2023-11-03 15:43:30 +01:00
Nikita Popov
e4a4122eb6 [IR] Remove zext and sext constant expressions (#71040)
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.

There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-03 10:46:07 +01:00
Alexandros Lamprineas
7d21d7395c [LAA] Add a test case to show incorrect dependency classification (NFC). (#70473)
Currently the loop access analysis classifies this loop as unsafe to
vectorize because the memory dependencies are
'ForwardButPreventsForwarding'. However, the access pattern is
'write-after-read' with no subsequent read accessing the written memory
locations. I can't see how store-to-load forwarding is applicable here.

void vectorizable_Read_Write(int *A) {
  for (unsigned i = 1022; i >= 0; i--)
    A[i+1] = A[i] + 1;
}
2023-10-31 15:01:28 +00:00
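The write-after-read pattern of this loop can be simulated to see why processing several iterations at once does not change the result. The sketch below is in Python and assumes the intended iteration space is i = 1022 down to 0; `scalar_loop`, `chunked_loop` and the `vf` parameter are hypothetical names, and the chunked version only models "all loads of a chunk happen before its stores".

```python
def scalar_loop(a: list[int]) -> list[int]:
    """Run A[i+1] = A[i] + 1 one iteration at a time, i = 1022 .. 0."""
    a = list(a)
    for i in range(1022, -1, -1):
        a[i + 1] = a[i] + 1
    return a


def chunked_loop(a: list[int], vf: int = 4) -> list[int]:
    """Run the same loop in chunks of vf iterations, loads before stores."""
    a = list(a)
    i = 1022
    while i >= 0:
        lanes = list(range(i, max(i - vf, -1), -1))
        loaded = [a[j] for j in lanes]        # every load of the chunk first
        for j, value in zip(lanes, loaded):   # then every store
            a[j + 1] = value + 1
        i -= vf
    return a
```

Each element A[i] is read in iteration i and only overwritten in the later iteration i-1, so no load ever observes a value stored by the loop, and both versions produce the same array for any 1024-element input.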
Ramkumar Ramachandra
4c01a58008 update_analyze_test_checks: support output from LAA (#67584)
update_analyze_test_checks.py is an invaluable tool in updating tests.
Unfortunately, it only supports output from the CostModel,
ScalarEvolution, and LoopVectorize analyses. Many LoopAccessAnalysis
tests use hand-crafted CHECK lines, and it is moreover tedious to
generate these CHECK lines, as the output from the analysis is not
stable, and requires the test-writer to hand-craft FileCheck matches.
Alleviate this pain, and support output from:

  $ opt -passes='print<loop-accesses>'

This patch includes several non-trivial changes including:
- Preserving whitespace at the beginning of the line, so that the LAA
output can be properly indented.
- Regexes matching the unstable output, which is basically a pointer
address hex.
- Separating is_analyze from preserve_names clearly, as the former was
formerly used as an overload for the latter.

To demonstrate the utility of this patch, several tests in
LoopAccessAnalysis have been auto-generated by
update_analyze_test_checks.py.
2023-10-31 14:33:53 +00:00