4308 Commits

Author SHA1 Message Date
annamthomas
54a9f0007c [SCEV] Fix BinomialCoefficient Iteration to fit in W bits (#88010)
BinomialCoefficient computes the value of a W-bit IV at iteration It of a loop. When W is 1, we can call multiplicative inverse on 0, which has triggered an assert since 1b76120.

Since the arithmetic is supposed to wrap if It or K does not fit in W bits, do the truncation into W bits after the shift.

Fixes #87798
2024-04-10 09:02:23 -04:00
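Below is a standalone sketch of the computation the message describes. It assumes a 64-bit host integer, 1 <= W <= 32 and K <= 20, so that W + T fits in 64 bits, where T counts the factors of two in K!. The sketch forms It * (It-1) * ... * (It-K+1) in W + T bits, shifts out T, truncates to W bits only after the shift, and then multiplies by the inverse of the odd part of K! modulo 2^W. The function names are hypothetical; this is not the ScalarEvolution implementation.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `Bits` bits set (Bits in [0, 64]).
static std::uint64_t lowMask(unsigned Bits) {
  return Bits >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << Bits) - 1;
}

// Inverse of an odd value modulo 2^64 via Newton's iteration: X = A is
// correct to 3 bits (A * A == 1 mod 8 for odd A), and each step doubles
// the number of correct low bits.
static std::uint64_t inverseOdd(std::uint64_t A) {
  std::uint64_t X = A;
  for (int I = 0; I < 5; ++I)  // 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits
    X *= 2 - A * X;
  return X;
}

// C(N, K) modulo 2^W, for 1 <= W <= 32 and K <= 20 (so W + T <= 64).
std::uint64_t binomialModPow2(std::uint64_t N, unsigned K, unsigned W) {
  assert(W >= 1 && W <= 32 && K <= 20);
  unsigned T = 0;         // number of factors of two in K!
  std::uint64_t Odd = 1;  // odd part of K!
  for (unsigned I = 2; I <= K; ++I) {
    unsigned F = I;
    while (F % 2 == 0) {
      F /= 2;
      ++T;
    }
    Odd *= F;
  }
  // N * (N-1) * ... * (N-K+1), computed modulo 2^(W+T).
  std::uint64_t Prod = 1;
  for (unsigned I = 0; I < K; ++I)
    Prod *= N - I;
  Prod &= lowMask(W + T);
  // Divide out 2^T first; only then truncate to W bits.
  std::uint64_t Quot = (Prod >> T) & lowMask(W);
  // The odd part of K! is always invertible modulo 2^W, even when W == 1.
  return (Quot * inverseOdd(Odd)) & lowMask(W);
}
```

For example, binomialModPow2(10, 3, 8) yields 120. With W = 1 the inverse step is only ever applied to an odd value, so it never sees 0.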
Shih-Po Hung
3d985a6f1b [RISCV][TTI] Scale the cost of Select with LMUL (#88098)
Use the Val type to estimate the instruction cost for SelectInst.
2024-04-10 14:18:15 +08:00
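Several RISC-V commits in this log scale a cost "with LMUL", the RVV register-group multiplier. The sketch below shows the idea for fixed-length vectors. It assumes a VLEN of 128 bits and a cost equal to a per-register base cost times LMUL. lmulFor and scaledCost are hypothetical names, not RISCVTTIImpl functions.

```cpp
#include <cassert>

// Hypothetical helper: the LMUL a fixed-length vector of NumElts x EltBits
// would occupy with VLen-bit registers. LMUL is rounded up to a power of
// two, fractional LMULs count as 1, and splitting of types wider than
// LMUL = 8 is not modelled.
unsigned lmulFor(unsigned NumElts, unsigned EltBits, unsigned VLen = 128) {
  unsigned Bits = NumElts * EltBits;
  unsigned LMul = 1;
  while (LMul * VLen < Bits)
    LMul *= 2;
  return LMul;
}

// Hypothetical cost: a per-register base cost scaled by LMUL, so an
// LMUL = 4 operation costs four times an LMUL = 1 operation.
unsigned scaledCost(unsigned BaseCost, unsigned NumElts, unsigned EltBits,
                    unsigned VLen = 128) {
  return BaseCost * lmulFor(NumElts, EltBits, VLen);
}
```

Under these assumptions, a select on <16 x i32> (512 bits) costs four times as much as a select on <4 x i32>.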
Shih-Po Hung
ee52add6cb [RISCV][TTI] Implement cost of intrinsic active_lane_mask (#87931)
This patch uses the argument type to infer the LMUL cost for the index
generation, add, and comparison.
2024-04-10 10:08:33 +08:00
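As we understand llvm.get.active.lane.mask, lane i of the result is true when base + i < n, using an unsigned comparison. The sketch below models the three steps the message counts (index generation, add, and compare) on scalars. It assumes base + i does not wrap in 64 bits. activeLaneMask is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lane I is active when Base + I < N (unsigned). Assumes Base + I does not
// wrap in 64 bits.
std::vector<bool> activeLaneMask(std::uint64_t Base, std::uint64_t N,
                                 unsigned NumLanes) {
  std::vector<bool> Mask(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I) {
    std::uint64_t Index = I;             // step 1: index generation
    std::uint64_t Value = Base + Index;  // step 2: add the base
    Mask[I] = Value < N;                 // step 3: unsigned compare
  }
  return Mask;
}
```

For example, activeLaneMask(6, 8, 4) gives {true, true, false, false}: only lanes for elements 6 and 7 remain active.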
David Green
f0e79d9152 [AArch64] Add a cost for identity shuffles.
These are mostly handled at a higher level when costing shuffles, but some
masks can end up being identity or concat masks which we can treat as free.
2024-04-09 17:16:14 +01:00
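Here is a simplified sketch of what identity and concat masks look like, with -1 standing for an undef lane. LLVM has its own mask classifiers on ShuffleVectorInst. These two standalone helpers are hypothetical simplifications, not that API.

```cpp
#include <cassert>
#include <vector>

// Identity over one source of NumSrcElts elements: every defined lane I
// selects element I of that source.
bool isIdentityMask(const std::vector<int> &Mask, int NumSrcElts) {
  if (static_cast<int>(Mask.size()) != NumSrcElts)
    return false;
  for (int I = 0; I < NumSrcElts; ++I)
    if (Mask[I] != -1 && Mask[I] != I)
      return false;
  return true;
}

// Concat of two sources of NumSrcElts each: the mask is <0, 1, ..., 2N-1>
// over the combined index space of both sources.
bool isConcatMask(const std::vector<int> &Mask, int NumSrcElts) {
  if (static_cast<int>(Mask.size()) != 2 * NumSrcElts)
    return false;
  for (int I = 0; I < 2 * NumSrcElts; ++I)
    if (Mask[I] != -1 && Mask[I] != I)
      return false;
  return true;
}
```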
David Green
4ac2721e51 [AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in an ST3/ST4 instruction,
which is represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added; LD2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
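To make "store(interleaving shuffle)" concrete: ST3 writes three registers element by element, as a0 b0 c0 a1 b1 c1 and so on. The IR form is therefore a store of a shuffle over the concatenated sources, using a mask like the one built below. interleaveMask is a hypothetical helper for illustration.

```cpp
#include <cassert>
#include <vector>

// Mask that interleaves Factor source vectors of NumElts elements each,
// indexed as one concatenated space: element I of source J is J*NumElts+I.
std::vector<int> interleaveMask(int Factor, int NumElts) {
  std::vector<int> Mask;
  Mask.reserve(Factor * NumElts);
  for (int I = 0; I < NumElts; ++I)   // element position within a source
    for (int J = 0; J < Factor; ++J)  // which source vector
      Mask.push_back(J * NumElts + I);
  return Mask;
}
```

For Factor = 3 and two elements per source this gives <0, 2, 4, 1, 3, 5>.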
Simon Pilgrim
3bfd5c6424 [TTI] getCommonMaskedMemoryOpCost - consistently use getScalarizationOverhead instead of ExtractElement costs for address/mask extraction. (#87771)
These aren't unknown extraction indices; we will be extracting every address/mask element in sequence.
2024-04-09 15:42:51 +01:00
Florian Hahn
977c0a6d29 [LAA] Add tests with non-constant strides & distances.
Add a number of LAA test cases with both forward and backward
dependences with non-constant strides and dependence distances.

This includes test coverage for
https://github.com/llvm/llvm-project/issues/87336

Also includes a LoopLoadElimination test to make sure the pass does not
crash on non-constant dependence distances.
2024-04-08 19:18:38 +01:00
David Green
0bfea40101 [AArch64] More shuffle-store test cases. NFC
2024-04-08 09:19:47 +01:00
David Green
d57d094779 [AArch64] Add test for LD2/LD3/LD4 shuffle cost models. NFC
2024-04-07 18:18:32 +01:00
David Green
e4169f79ef [AArch64] Add extra zip and uzp shuffle cost tests. NFC
2024-04-05 19:33:22 +01:00
Simon Pilgrim
58187fad93 [CostModel][X86] Update masked load/store/gather/scatter tests to explicitly use variable masks
Using <X x i1> undef masks means they are treated as constants, which underestimates the scalar costs as it assumes that the masks/branches will fold away.
2024-04-05 11:15:46 +01:00
Simon Pilgrim
53fe94a0ce [CostModel][X86] Add costkinds test coverage for masked load/store/gather/scatter
Noticed while starting triage for #87640
2024-04-04 19:13:17 +01:00
Simon Pilgrim
ed41249498 [CostModel][X86] Update AVX1 sext v4i1 -> v4i64 cost based off worst case llvm-mca numbers
We were using raw instruction count which overestimated the costs for #67803
2024-04-04 17:17:55 +01:00
Simon Pilgrim
3871eaba6b [CostModel][X86] Update AVX1 sext v8i1 -> v8i32 cost based off worst case llvm-mca numbers
We were using raw instruction count which overestimated the costs for #67803
2024-04-04 12:26:35 +01:00
Shih-Po Hung
97523e5321 [RISCV][TTI] Scale the cost of intrinsic stepvector with LMUL (#87301)
Use the return type to measure the LMUL size for latency/throughput cost
2024-04-04 08:30:15 +08:00
Kevin P. Neal
9c9f94063c [FPEnv][CostModel] Correct strictfp test.
Correct strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

These tests needed the strictfp attribute added to some function
definitions.

Test changes verified with D146845.
2024-04-02 13:53:56 -04:00
Shih-Po Hung
d7a43a00fe [RISCV][TTI] Scale the cost of trunc/fptrunc/fpext with LMUL (#87101)
Use the destination data type to measure the LMUL size for
latency/throughput cost
2024-04-02 09:30:51 +08:00
Shih-Po Hung
84f24c2daf [RISCV][TTI] Scale the cost of intrinsic umin/umax/smin/smax with LMUL (#87245)
Use the return type to measure the LMUL size for throughput/latency cost
2024-04-02 09:26:27 +08:00
Shih-Po Hung
c7954ca312 Recommit "[RISCV] Refine cost on Min/Max reduction (#79402)" (#86480)
This is recommitted as the test and fix for
llvm.vector.reduce.fmaximum/fminimum are covered in #80553 and #80697
2024-04-01 14:44:10 +08:00
Vitaly Buka
37d6e5b7a5 [memoryssa] Exclude llvm.allow.{runtime,ubsan}.check() (#86066)
RFC:
https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
2024-03-31 22:50:02 -07:00
Vitaly Buka
0bc3781649 [Analysis] Exclude llvm.allow.{runtime,ubsan}.check() from AliasSetTracker (#86065)
RFC:
https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
2024-03-31 22:47:55 -07:00
Vitaly Buka
1e442ac4c3 [CostModel] No cost for llvm.allow.{runtime,ubsan}.check() (#86064)
These intrinsics will not be lowered to code.

RFC:
https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
2024-03-31 22:27:57 -07:00
ShihPo Hung
aa2d5d5413 Recommit "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617)"
Changes in Recommit:
  Add an additional check on sign/zero extend to the same type.

Original message:
  Use the destination data type to measure the LMUL size for
  latency/throughput cost
2024-03-26 23:41:16 -07:00
ShihPo Hung
da3e58e74a Revert "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617)"
This reverts commit 7545c63572 as it's
failing on the Linux bots.
2024-03-26 21:47:32 -07:00
Shih-Po Hung
7545c63572 [RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617)
Use the destination data type to measure the LMUL size for
latency/throughput cost
2024-03-27 10:58:17 +08:00
Changpeng Fang
350bda4419 AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313)
Rename the intrinsics to be closer to the instruction mnemonic names:
Use global_load_tr_b64 and global_load_tr_b128 instead of
global_load_tr.

This patch also removes f16/bf16 versions of builtins/intrinsics. To
simplify the design, we should avoid enumerating all possible types in
implementing builtins. We can always use bitcast.
2024-03-25 16:55:22 -07:00
Shih-Po Hung
3cb024198f [RISCV][CostModel] Estimate cost of llvm.vector.reduce.fmaximum/fminimum (#80697)
The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs
if any element of the vector is a NaN.
Following #79402, the patch adds the cost of the NaN check (vmfne + vcpop).
2024-03-25 17:17:36 +08:00
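This scalar sketch shows the reduction's semantics together with the NaN check the message prices. Counting lanes where x != x corresponds to vmfne on a register compared against itself, followed by vcpop. The sketch assumes a non-empty vector and the LLVM maximum convention that -0.0 orders below +0.0. reduceFMaximum is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Scalar model of llvm.vector.reduce.fmaximum for a non-empty vector.
double reduceFMaximum(const std::vector<double> &V) {
  std::size_t NaNCount = 0;
  for (double X : V)
    NaNCount += (X != X);  // like vmfne v, v: true only for NaN lanes
  if (NaNCount != 0)       // like vcpop > 0: propagate the NaN
    return std::numeric_limits<double>::quiet_NaN();
  double Max = V.front();
  for (double X : V)
    if (X > Max || (X == Max && std::signbit(Max) && !std::signbit(X)))
      Max = X;             // also prefers +0.0 over -0.0
  return Max;
}
```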
Vitaly Buka
5c95484061 [Analysis] Use implicit-check-not in test
2024-03-20 19:52:39 -07:00
Andreas Jonson
e66cfebb04 [ValueTracking] Handle range attributes (#85143)
Handle the range attribute in ValueTracking.
2024-03-20 12:43:00 +01:00
Nikita Popov
6872a64652 [ValueTracking] Handle vector range metadata in isKnownNonZero()
Nowadays !range can be placed on instructions with a vector-of-integer
return type. Support this case in isKnownNonZero().
2024-03-19 15:50:13 +01:00
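Both range changes come down to one question: does the range exclude zero? The minimal sketch below assumes a half-open [Lo, Hi) range over unsigned values that may wrap (Lo > Hi), as !range allows. Range and isKnownNonZeroFromRange are hypothetical stand-ins for LLVM's ConstantRange-based logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a range: [Lo, Hi), wrapping around when Lo > Hi.
// Lo == Hi is treated as the full set, the conservative choice here.
struct Range {
  std::uint64_t Lo;
  std::uint64_t Hi;
  bool contains(std::uint64_t V) const {
    if (Lo < Hi)
      return Lo <= V && V < Hi;
    return V >= Lo || V < Hi;  // wrapped (or full) range
  }
};

// Known non-zero if the range excludes 0. With vector range metadata the
// same range bounds every element, so this one check covers all lanes.
bool isKnownNonZeroFromRange(const Range &R) { return !R.contains(0); }
```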
Nikita Popov
0e76818672 [ValueTracking] Test isKnownNonZero() range metadata with vector (NFC)
2024-03-19 15:50:13 +01:00
Philip Reames
35db929b50 [RISCV] Add cost model coverage for fixed vector insert with known VLEN
2024-03-13 15:21:37 -07:00
Noah Goldstein
744a23f24b [ValueTracking] Use select condition to help infer bits of arms
If we have something like `(select (icmp ult x, 8), x, y)`, we can use
the `(icmp ult x, 8)` to help compute the knownbits of `x`.

Closes #84699
2024-03-13 14:27:05 -05:00
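Here is a 32-bit sketch of the reasoning. When (icmp ult x, 8) selects the true arm, x <= 7, so every bit of x above bit 2 is known zero. The select's result keeps only what both arms agree on. Known is a hypothetical stand-in for llvm::KnownBits, and the helpers are for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Bits set in Zero are known 0; bits set in One are known 1.
struct Known {
  std::uint32_t Zero;
  std::uint32_t One;
};

// If (icmp ult x, C) is true then x <= C - 1, so every bit above the
// highest set bit of C - 1 is zero. Requires C > 0.
Known knownFromUltTrue(std::uint32_t C) {
  assert(C > 0 && "icmp ult x, 0 is never true");
  std::uint32_t Max = C - 1;
  unsigned Width = 0;  // number of bits needed to represent Max
  while (Width < 32 && (Max >> Width) != 0)
    ++Width;
  Known K{0, 0};
  K.Zero = Width >= 32 ? 0 : ~((std::uint32_t{1} << Width) - 1);
  return K;
}

// A select's result is known only where both arms agree.
Known selectKnown(Known T, Known F) {
  Known K{T.Zero & F.Zero, T.One & F.One};
  return K;
}
```

For (select (icmp ult x, 8), x, 5), both arms are below 8, so bits 3 and up of the result are known zero.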
Noah Goldstein
882992a951 [ValueTracking] Add tests for inferring select arm bits from condition; NFC
2024-03-13 14:27:05 -05:00
mikaelholmen
2d62ce4beb [ValueTracking] Remove faulty dereference of "InsertBefore" (#85034)
In 2fe81edef6 ("[NFC][RemoveDIs] Insert instruction using iterators in Transforms/")
we changed
       if (*req_idx != *i)
         return FindInsertedValue(I->getAggregateOperand(), idx_range,
-                                 InsertBefore);
+                                 *InsertBefore);
     }
but there is no guarantee that InsertBefore is non-empty at that point,
which we can see e.g. in the added testcase.

Instead just pass on the optional InsertBefore in the recursive call to
FindInsertedValue, as we do at several other places already.
2024-03-13 09:58:47 +01:00
Florian Hahn
a3ad5faa32 [LAA] Fix typo IndidrectUnsafe -> IndirectUnsafe.
Fix typo in textual analysis output.
2024-03-12 14:44:04 +00:00
Florian Hahn
b274b23665 [ValueTracking] Treat phi as underlying obj when not decomposing further (#84339)
At the moment, getUnderlyingObjects simply continues for phis that do
not refer to the same underlying object in loops, without adding them to
the list of underlying objects, effectively ignoring those phis.

Instead of ignoring those phis, add them to the list of underlying
objects. This fixes a miscompile where LoopAccessAnalysis fails to
identify a memory dependence, because no underlying objects can be found
for a set of memory accesses.

Fixes https://github.com/llvm/llvm-project/issues/82665.

PR: https://github.com/llvm/llvm-project/pull/84339
2024-03-12 08:55:03 +00:00
Noah Goldstein
d81db0e5f5 [KnownBits] Implement knownbits lshr/ashr with exact flag
The exact flag allows us to set an upper bound on the shift amount
when we have a known 1 in `LHS`.

Typically we deduce exact using knownbits (on non-exact incoming
shifts), so this is not particularly impactful, but may be useful in some
circumstances.

Closes #84254
2024-03-11 15:51:07 -05:00
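The bound described above can be sketched as follows. An exact shift may not shift out a set bit, so the shift amount is at most the number of trailing zeros of LHS, and any bit of LHS known to be one caps that count. The helper name and the 32-bit mask representation are assumptions; this is not KnownBits::lshr.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the shift amount of `lshr exact` / `ashr exact`, given the
// bits of LHS known to be one. The amount is at most the trailing-zero count
// of LHS, which is at most the position of the lowest known-one bit.
unsigned maxExactShiftAmount(std::uint32_t KnownOne, unsigned BitWidth = 32) {
  assert(BitWidth >= 1 && BitWidth <= 32);
  for (unsigned P = 0; P < BitWidth; ++P)
    if (KnownOne & (std::uint32_t{1} << P))
      return P;         // lowest known-one bit caps the shift amount
  return BitWidth - 1;  // no known one: only the shift < BitWidth rule
}
```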
Noah Goldstein
f19d9e1617 [KnownBits] Add test for computing more information for lshr/ashr with exact flag; NFC
2024-03-11 15:51:06 -05:00
Dominik Steenken
718962f53b [SystemZ] Provide improved cost estimates (#83873)
This commit provides better cost estimates for
the llvm.vector.reduce.add intrinsic on SystemZ. These apply to all
vector lengths and integer types up to i128. For integer types larger
than i128, we fall back to the default cost estimate.

This has the effect of lowering the estimated costs of most common
instances of the intrinsic. The expected performance impact of this is
minimal with a tendency to slightly improve performance of some
benchmarks.

This commit also provides a test to check the proper computation of the
new estimates, as well as the fallback for types larger than i128.
2024-03-11 10:40:59 +01:00
Florian Hahn
4cfd4a7896 [LAA] Add test case for #82665.
Test case for https://github.com/llvm/llvm-project/issues/82665.
2024-03-07 13:53:03 +00:00
Simon Pilgrim
55304d0d90 [CostModel] getInstructionCost - improve estimation of costs for length changing shuffles (#84156)
Fix gap in the cost estimation for length changing shuffles, by adjusting the shuffle mask and either widening the shuffle inputs or extracting the lower elements of the result.

A small step towards moving some of this implementation inside improveShuffleKindFromMask and/or target getShuffleCost handlers (and reducing the diffs in cost estimation depending on whether the costs come from a ShuffleVectorInst or from the raw operands / mask components).
2024-03-07 10:46:27 +00:00
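One reading of the two adjustments is sketched below, with -1 marking undef lanes. When the result is longer than the inputs, treat the inputs as widened to the result length and renumber indices into the second operand. When the result is shorter, pad the mask to the input length and take the low elements of that same-length shuffle. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Result longer than the inputs: treat both NumSrc-element inputs as
// widened to the result length. Indices into the first input are kept;
// indices into the second shift up by NumDst - NumSrc; -1 stays -1.
std::vector<int> widenInputsMask(const std::vector<int> &Mask, int NumSrc) {
  const int NumDst = static_cast<int>(Mask.size());
  assert(NumDst > NumSrc);
  std::vector<int> Out;
  Out.reserve(Mask.size());
  for (int M : Mask)
    Out.push_back(M < NumSrc ? M : M - NumSrc + NumDst);
  return Out;
}

// Result shorter than the inputs: pad with undef lanes to the input length;
// the original result is then the low Mask.size() elements.
std::vector<int> padMaskToInputLength(const std::vector<int> &Mask,
                                      int NumSrc) {
  assert(static_cast<int>(Mask.size()) < NumSrc);
  std::vector<int> Out(Mask);
  Out.resize(NumSrc, -1);
  return Out;
}
```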
Philip Reames
1a37147af5 [SCEV] Match both (-1)b + a and a + (-1)b as a - b (#84247)
In our analysis of guarding conditions, we were converting a-b == 0 into
the alternate form a == b, but we were only checking for one of the two
forms for the sub. There's no requirement that the multiply be only on
the LHS of the add.
2024-03-06 15:57:34 -08:00
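Here is a toy illustration of matching both operand orders, with a two-term add represented as coefficient/variable pairs. Term and matchSub are hypothetical, and they are far simpler than SCEV's own pattern matching.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical term Coeff * Var, standing in for an add operand such as
// (-1 * b) or a.
struct Term {
  long Coeff;
  std::string Var;
};

// Recognise a two-operand add as A - B, whichever operand carries the -1
// multiplier. Returns {A, B} on success.
std::optional<std::pair<std::string, std::string>>
matchSub(const Term &Op0, const Term &Op1) {
  if (Op0.Coeff == -1 && Op1.Coeff == 1)  // (-1)b + a
    return std::make_pair(Op1.Var, Op0.Var);
  if (Op0.Coeff == 1 && Op1.Coeff == -1)  // a + (-1)b
    return std::make_pair(Op0.Var, Op1.Var);
  return std::nullopt;
}
```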
Philip Reames
5cd45e442e [SCEV] Precommit test for widened signed induction variables
These tests highlight that we have missed opportunities for proving
trip count bounds when our start/end values are sign extended
from smaller types and we have either a loop guard to relate our
start vs end, or a nsw/nuw fact to bound end.
2024-03-06 14:09:40 -08:00
Philip Reames
0d38f21e4a [SCEV] Extend type hint in analysis output to all backedge kinds
This extends the work from 7755c26 to all of the different backedge-taken
count kinds that we print for the SCEV analysis printer. As
before, the goal is to cut down on confusion, as i4 -1 is a very
different (unsigned) value from i32 -1.
2024-03-06 13:08:05 -08:00
Philip Reames
e946b5a87b [SCEV] Autogenerate more scev analysis check tests
2024-03-06 12:42:19 -08:00
Philip Reames
8b5b294ec2 [SCEV] Print predicate backedge count only if new information available
When printing the result of SCEV's analysis, we can avoid printing
the predicated backedge taken count and the predicates if the predicates
are empty and no new information is provided.  This helps to reduce the
verbosity of the output.
2024-03-06 10:24:32 -08:00
Philip Reames
7755c26195 [SCEV] Include type when printing constant max backedge taken count
When printing the result of the analysis, i8 -1 and i64 -1 are quite
different in terms of analysis quality. In a recent conversation with
a new contributor, we ran into exactly this confusion.

Adding the type for constant scevs more globally seems worthwhile, but
introduces a much larger test diff.  I'm splitting this off first since
it addresses the immediate need, and then going to do some further
changes to clarify a few related bits of analysis result output.
2024-03-06 08:48:25 -08:00
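The type matters because the constant -1 is the all-ones bit pattern. Read as an unsigned count, it equals 2^N - 1 for an iN type. allOnesAsUnsigned below is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// The constant -1 in an iN type is the all-ones bit pattern; read as an
// unsigned value it is 2^N - 1 (for 1 <= N <= 64).
std::uint64_t allOnesAsUnsigned(unsigned N) {
  assert(N >= 1 && N <= 64);
  return N == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << N) - 1;
}
```

So a constant max backedge-taken count printed as i4 -1 bounds the count by 15, while i32 -1 allows up to 4294967295.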
Philip Reames
987fe6fa50 [SCEV] Migrate a couple tests to be auto generated
A few notes:
* pr34538.ll has bitrotten. The original test printed the analysis after transforms in some cases, but this appears to have been lost during the migration to the new pass manager. Remove the now redundant pass invocations and simplify the test setup.
2024-03-05 18:04:30 -08:00
Philip Reames
31c304ba7b [SCEV] Migrate some tests to be autogenerated
This is in advance of a change which needs to update these. This batch contains
the "easy" ones; I'll be landing the harder set a few at a time for easier
review.
2024-03-05 17:41:58 -08:00