Commit Graph

1604 Commits

Author SHA1 Message Date
Shih-Po Hung
3d985a6f1b [RISCV][TTI] Scale the cost of Select with LMUL (#88098)
Use the Val type to estimate the instruction cost for SelectInst.
2024-04-10 14:18:15 +08:00
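The idea behind LMUL-based scaling can be illustrated with a minimal sketch. This is not the LLVM TTI API; the function names and the VLEN=128 default are illustrative assumptions. A select lowers to a merge per vector register group, so its cost tracks how many registers the *value* type occupies:

```python
# Illustrative sketch of LMUL-based cost scaling (hypothetical names,
# not the actual LLVM TargetTransformInfo interface).

def lmul_for_type(elt_bits: int, num_elts: int, vlen: int = 128) -> int:
    """Number of vector registers (LMUL) a vector type occupies,
    rounded up and clamped to at least 1. Assumes VLEN=128."""
    total_bits = elt_bits * num_elts
    return max(1, -(-total_bits // vlen))  # ceiling division

def select_cost(elt_bits: int, num_elts: int) -> int:
    """A vector select costs one merge per register group, so it
    scales with the LMUL of the value type, not the i1 condition."""
    return lmul_for_type(elt_bits, num_elts)
```

For example, selecting between two `<8 x i64>` values occupies 512 bits, i.e. four 128-bit registers, so the modeled cost is 4, while a `<2 x i32>` select fits one register and costs 1.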
Shih-Po Hung
ee52add6cb [RISCV][TTI] Implement cost of intrinsic active_lane_mask (#87931)
This patch uses the argument type to infer the LMUL cost for the index
generation, add, and comparison.
2024-04-10 10:08:33 +08:00
David Green
f0e79d9152 [AArch64] Add a cost for identity shuffles.
These are mostly handled at a higher level when costing shuffles, but some
masks can end up being identity or concat masks which we can treat as free.
2024-04-09 17:16:14 +01:00
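The check being described can be sketched as follows; this is a simplified model (a real shuffle-cost hook takes types and cost kinds), where undef lanes are encoded as -1 as in LLVM shuffle masks:

```python
def is_identity_mask(mask):
    """A shuffle mask is an identity if every non-undef (-1) lane
    selects the position it already occupies."""
    return all(m == -1 or m == i for i, m in enumerate(mask))

def shuffle_cost(mask):
    # Identity (and partially-undef identity) shuffles emit no
    # instructions, so treat them as free; charge 1 otherwise
    # as a stand-in for a real per-target estimate.
    return 0 if is_identity_mask(mask) else 1
```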
David Green
4ac2721e51 [AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. ST3 and ST4 are added; ST2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
Simon Pilgrim
3bfd5c6424 [TTI] getCommonMaskedMemoryOpCost - consistently use getScalarizationOverhead instead of ExtractElement costs for address/mask extraction. (#87771)
These aren't unknown extraction indices; we will be extracting every address/mask element in sequence.
2024-04-09 15:42:51 +01:00
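The scalarized cost this change models can be sketched as simple arithmetic. All names and unit costs below are illustrative assumptions, not LLVM's actual numbers: a scalarized masked load extracts every address and every mask bit in sequence, then does a conditional scalar load per lane:

```python
def scalarized_masked_load_cost(num_elts, extract=1, scalar_load=1, branch=1):
    """Known-index scalarization overhead: every address element and
    every mask element is extracted in sequence (no pessimistic
    unknown-index estimate), plus a guarded scalar load per lane."""
    addr_overhead = num_elts * extract   # extract each pointer lane
    mask_overhead = num_elts * extract   # extract each mask bit
    per_lane_work = num_elts * (scalar_load + branch)
    return addr_overhead + mask_overhead + per_lane_work
```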
David Green
0bfea40101 [AArch64] More shuffle-store test cases. NFC 2024-04-08 09:19:47 +01:00
David Green
d57d094779 [AArch64] Add test for LD2/LD3/LD4 shuffle cost models. NFC 2024-04-07 18:18:32 +01:00
David Green
e4169f79ef [AArch64] Add extra zip and uzp shuffle cost tests. NFC 2024-04-05 19:33:22 +01:00
Simon Pilgrim
58187fad93 [CostModel][X86] Update masked load/store/gather/scatter tests to explicitly use variable masks
Using <X x i1> undef masks means they are treated as constants, which underestimates the scalar costs as it assumes that the masks/branches will fold away.
2024-04-05 11:15:46 +01:00
Simon Pilgrim
53fe94a0ce [CostModel][X86] Add costkinds test coverage for masked load/store/gather/scatter
Noticed while starting triage for #87640
2024-04-04 19:13:17 +01:00
Simon Pilgrim
ed41249498 [CostModel][X86] Update AVX1 sext v4i1 -> v4i64 cost based off worst case llvm-mca numbers
We were using raw instruction count which overestimated the costs for #67803
2024-04-04 17:17:55 +01:00
Simon Pilgrim
3871eaba6b [CostModel][X86] Update AVX1 sext v8i1 -> v8i32 cost based off worst case llvm-mca numbers
We were using raw instruction count which overestimated the costs for #67803
2024-04-04 12:26:35 +01:00
Shih-Po Hung
97523e5321 [RISCV][TTI] Scale the cost of intrinsic stepvector with LMUL (#87301)
Use the return type to measure the LMUL size for latency/throughput cost
2024-04-04 08:30:15 +08:00
Kevin P. Neal
9c9f94063c [FPEnv][CostModel] Correct strictfp test.
Correct strictfp tests to follow the rules documented in the LangRef:
https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics

These tests needed the strictfp attribute added to some function
definitions.

Test changes verified with D146845.
2024-04-02 13:53:56 -04:00
Shih-Po Hung
d7a43a00fe [RISCV][TTI] Scale the cost of trunc/fptrunc/fpext with LMUL (#87101)
Use the destination data type to measure the LMUL size for
latency/throughput cost
2024-04-02 09:30:51 +08:00
Shih-Po Hung
84f24c2daf [RISCV][TTI] Scale the cost of intrinsic umin/umax/smin/smax with LMUL (#87245)
Use the return type to measure the LMUL size for throughput/latency cost
2024-04-02 09:26:27 +08:00
Shih-Po Hung
c7954ca312 Recommit "[RISCV] Refine cost on Min/Max reduction (#79402)" (#86480)
This is recommitted as the test and fix for
llvm.vector.reduce.fmaximum/fminimum are covered in #80553 and #80697
2024-04-01 14:44:10 +08:00
Vitaly Buka
1e442ac4c3 [CostModel] No cost for llvm.allow.{runtime,ubsan}.check() (#86064)
These intrinsics will not be lowered to code.

RFC:
https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
2024-03-31 22:27:57 -07:00
ShihPo Hung
aa2d5d5413 Recommit "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617)"
Changes in Recommit:
  Add an additional check on sign/zero extend to the same type.

Original message:
  Use the destination data type to measure the LMUL size for
  latency/throughput cost
2024-03-26 23:41:16 -07:00
ShihPo Hung
da3e58e74a Revert "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617)"
This reverts commit 7545c63572 as it's
failing on the Linux bots.
2024-03-26 21:47:32 -07:00
Shih-Po Hung
7545c63572 [RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617)
Use the destination data type to measure the LMUL size for
latency/throughput cost
2024-03-27 10:58:17 +08:00
Shih-Po Hung
3cb024198f [RISCV][CostModel] Estimate cost of llvm.vector.reduce.fmaximum/fminimum (#80697)
The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs
if any element of the vector is a NaN.
Following #79402, this patch adds the cost of the NaN check (vmfne + vcpop).
2024-03-25 17:17:36 +08:00
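The extra NaN-check term can be sketched as follows; the unit costs are illustrative assumptions (a compare scaling with LMUL, a mask population count at cost 1), not measured numbers:

```python
def fmax_reduce_cost(base_reduce_cost, lmul):
    """fmaximum/fminimum reductions must propagate NaNs, so on top of
    the plain reduction we model a whole-vector NaN test:
    vmfne (element compare, scales with LMUL) + vcpop (mask popcount)."""
    vmfne = lmul  # compare every element against itself
    vcpop = 1     # count set mask bits to detect any NaN
    return base_reduce_cost + vmfne + vcpop
```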
Philip Reames
35db929b50 [RISCV] Add cost model coverage for fixed vector insert with known VLEN 2024-03-13 15:21:37 -07:00
Dominik Steenken
718962f53b [SystemZ] Provide improved cost estimates (#83873)
This commit provides better cost estimates for
the llvm.vector.reduce.add intrinsic on SystemZ. These apply to all
vector lengths and integer types up to i128. For integer types larger
than i128, we fall back to the default cost estimate.

This has the effect of lowering the estimated costs of most common
instances of the intrinsic. The expected performance impact of this is
minimal with a tendency to slightly improve performance of some
benchmarks.

This commit also provides a test to check the proper computation of the
new estimates, as well as the fallback for types larger than i128.
2024-03-11 10:40:59 +01:00
Simon Pilgrim
55304d0d90 [CostModel] getInstructionCost - improve estimation of costs for length changing shuffles (#84156)
Fix gap in the cost estimation for length changing shuffles, by adjusting the shuffle mask and either widening the shuffle inputs or extracting the lower elements of the result.

A small step towards moving some of this implementation inside improveShuffleKindFromMask and/or target getShuffleCost handlers (and reducing the diffs in cost estimation depending on whether the cost comes from a ShuffleVectorInst or the raw operands / mask components).
2024-03-07 10:46:27 +00:00
Simon Pilgrim
3b84b6f176 [CostModel][X86] Add test coverage for 'concat subvector' style shuffles
Shows 2 major issues:
 - SSE should be free as it splits everything to 128-bit
 - Negative costs for 128 -> 512 concat shuffles
2024-03-05 16:21:10 +00:00
Graham Hunter
03f852f704 [AArch64] Improve cost model for legal subvec insert/extract (#81135)
Currently we model subvector inserts and extracts as shuffles,
potentially going as far as scalarizing. If the types are legal then
they can just be simple zip/unzip operations, or possibly even no-ops.
Change the cost to a relatively small one to ensure that simple loops
featuring such operations between fixed and scalable vector types that
are effectively the same at a given SVE width can be unrolled and
further optimized.
2024-03-04 16:17:01 +00:00
Chen Zheng
8d1046ae49 [PowerPC] adjust cost for extract i64 from vector on P9 and above (#82963)
https://godbolt.org/z/Ma347Tx1W
2024-03-04 09:37:11 +08:00
David Majnemer
3dd6750027 [AArch64] Add more complete support for BF16
We can use a small amount of integer arithmetic to round FP32 to BF16
and extend BF16 to FP32.

While a number of operations still require promotion, this can be
reduced for some rather simple operations like abs, copysign, fneg but
these can be done in a follow-up.

A few neat optimizations are implemented:
- round-inexact-to-odd is used for F64 to BF16 rounding.
- quieting signaling NaNs for f32 -> bf16 tries to detect if a prior
  operation makes it unnecessary.
2024-03-03 22:39:50 +00:00
Shih-Po Hung
fb67dce1cb [RISCV] Fix crash when unrolling loop containing vector instructions (#83384)
When MVT is not a vector type, TCK_CodeSize should return an invalid
cost. This patch adds a check in the beginning to make sure all cost
kinds return invalid costs consistently.

Before this patch, TCK_CodeSize returned a valid cost on scalar MVTs but
the other cost kinds didn't.

This fixes issue #83294, where a loop contains vector instructions
and the MVT is scalar after type legalization because the vector
extension is not enabled.
2024-03-02 12:33:55 +08:00
Shih-Po Hung
6ee9c8afbc [RISCV][CostModel] Updates reduction and shuffle cost (#77342)
- Make `andi` cost 1 in SK_Broadcast
- Query the cost of VID_V, VRSUB_VX/VRSUB_VI which would scale with LMUL
2024-02-29 15:41:19 +08:00
Paul Walker
900bea9b1c [LLVM][test] Convert remaining instances of ConstantExpr based splats to use splat().
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
2024-02-27 13:37:23 +00:00
Paschalis Mpeis
bbdc62e718 [AArch64][CostModel] Improve scalar frem cost (#80423)
In AArch64 the cost of scalar frem is the cost of a call to 'fmod'.
2024-02-23 09:29:45 +00:00
Simon Pilgrim
9978f6a10f [CostModel][X86] Reduce the extra costs for ICMP complex predicates when an operand is constant
In most cases, SETCC lowering will be able to simplify/commute the comparison by adjusting the constant.

TODO: We still need to adjust ExtraCost based on CostKind

Fixes #80122
2024-02-21 16:19:39 +00:00
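The kind of predicate simplification the lowering can do for free can be sketched like this. The helper is illustrative (it ignores wrap-around at the type's extremes and only handles the unsigned strict predicates), not LLVM's actual SETCC canonicalization:

```python
def canonicalize_unsigned_cmp(pred, c):
    """Rewrite a strict unsigned compare against a constant into its
    non-strict form by adjusting the constant, mirroring what SETCC
    lowering can do at no cost. Wrap-around at the extremes of the
    integer type is deliberately ignored in this sketch."""
    if pred == "ugt":
        return ("uge", c + 1)   # x >u C  <=>  x >=u C+1
    if pred == "ult":
        return ("ule", c - 1)   # x <u C  <=>  x <=u C-1
    return (pred, c)
```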
Simon Pilgrim
4beb4d5c72 [CostModel][X86] Add test coverage for icmp vs zero
This is really to test for icmp vs constant - some icmp unsigned could fold to simpler comparisons, but costmodel analysis won't do this
2024-02-21 16:19:39 +00:00
Philip Reames
f037e709ca [RISCV][TTI] Cost a subvector extract at a register boundary with exact vlen (#82405)
If we have exact vlen knowledge, we can figure out which indices
correspond to register boundaries. Our lowering uses this knowledge to
replace the vslidedown.vi with a sub-register extract. Our costs can
reflect that as well.

This is another piece split off
https://github.com/llvm/llvm-project/pull/80164

---------

Co-authored-by: Luke Lau <luke_lau@icloud.com>
2024-02-21 07:56:08 -08:00
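The boundary test being described reduces to a divisibility check. A minimal sketch, with illustrative names and a unit slide cost rather than LLVM's real numbers:

```python
def subvec_extract_cost(index, elts_per_reg, slide_cost=1):
    """With exact VLEN known, an extract whose starting index lies on a
    vector-register boundary is just a sub-register extract and is
    free; otherwise we pay for a vslidedown."""
    if index % elts_per_reg == 0:
        return 0
    return slide_cost
```

With four elements per register, extracting at index 0 or 4 is free, while index 2 pays for the slide.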
Simon Pilgrim
a0869b14cd [CostModel][X86] Fix expanded CTPOP i8 costs
Updated to match #79989 / 9410019ac9
2024-02-21 14:54:50 +00:00
Simon Pilgrim
6ba8ca8c16 [CostModel][X86] Don't use undef for icmp cost tests
Cleanup prior to #80122 fix - using undef means we think that the comparison is with a Constant
2024-02-21 14:54:50 +00:00
Graham Hunter
ad78e210bd [NFC][AArch64] Tests for guarding unrolling with scalable vec ins/ext (#81132) 2024-02-19 09:47:49 +00:00
Chen Zheng
80f3bb4cf2 [PowerPC] adjust cost for vector insert/extract with non const index (#79092)
P9 has vxform `Vector Extract Element Instructions` like `vextuwrx` and
P10 has vxform `Vector Insert Element instructions` like `vinsd`. Update
the instruction cost reflecting these instructions.

Fixes https://github.com/llvm/llvm-project/issues/50249
2024-02-19 09:57:49 +08:00
Philip Reames
2549c24142 Reapply "[RISCV][TTI] Extract subvector at index zero is free (#81751)"
This reverts commit 834d11c215 which was
a revert of my 3a626937b1.  I had failed
to rebase after new tests added overnight by
fc0b67e1d7.

Original commit message follows:

Extracting a subvector at index zero corresponds to a type conversion and
possibly a subregister operation. We will not emit a vslidedown. As such,
they are free.

As an aside, it looks like we're not passing an index in for cases where
the subvec type is scalable. For at least index zero, we probably should be.

2024-02-15 16:51:15 -08:00
Craig Topper
834d11c215 Revert "[RISCV][TTI] Extract subvector at index zero is free (#81751)"
This reverts commit 3a626937b1.

Causes tests added by fc0b67e1d7 to fail.
2024-02-15 12:51:23 -08:00
Philip Reames
3a626937b1 [RISCV][TTI] Extract subvector at index zero is free (#81751)
Extracting a subvector at index zero corresponds to a type conversion and
possibly a subregister operation. We will not emit a vslidedown. As
such, they are free.

As an aside, it looks like we're not passing an index in for cases where
the subvec type is scalable. For at least index zero, we probably should
be.
2024-02-15 07:43:50 -08:00
Luke Lau
fc0b67e1d7 [RISCV] Add cost model tests for llvm.vector.{insert,extract}. NFC
For llvm.vector.extract, this tests combinations of inserting at a zero and
non-zero index, and extracting from a fixed or scalable vector.

For llvm.vector.insert, this tests the same combinations as extracts but with
an additional configuration for an undef vector. This is because we can use a
subregister insert if the index is 0 and the vector is undef, which should be
free.
2024-02-15 12:17:27 +08:00
Fangrui Song
3d18c8cd26 [test] Replace aarch64-*-{eabi,gnueabi}{,hf} with aarch64
Similar to d39b4ce3ce
Using "eabi" or "gnueabi" for aarch64 targets is a common mistake and
warned by Clang Driver. We want to avoid them elsewhere as well. Just
use the common "aarch64" without other triple components.
2024-02-12 18:29:55 -08:00
Alexey Bataev
7bc079c852 [TTI]Fallback to SingleSrcPermute shuffle kind, if no direct estimation for
extract subvector.

Many targets do not have a cost for the extract-subvector shuffle kind,
but do have costs for single-source permutes. If there is no cost
estimation for extract-subvector, it is better to switch to the
single-source permute for a better cost estimate.

Reviewers: RKSimon, davemgreen, arsenm

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/79837
2024-02-12 07:09:49 -05:00
Jeremy Morse
66d4fe97d8 [DebugInfo][RemoveDIs] Final final test-maintenence patch (#80988)
This should be the final portion of shaping-up the test suite to be
ready for turning on non-intrinsic debug-info:
* Pin CostModel tests that expect to see intrinsics in their -debug
output to not use RemoveDIs. This is a spurious test output difference.
* Add 'tail' to a bunch of intrinsics in UpdateTestChecks. We're
canonicalising intrinsics to be printed with "tail" in RemoveDIs
conversion as dbg.values usually pick that up while being optimised.
This is another spurious output difference.
* The "DebugInfoDrop" pass used in the debugify unit-tests happens to
operate inside the pass manager, thus it sees non-intrinsic debug-info.
Update it to correctly drop it.
2024-02-07 14:31:52 +00:00
Shih-Po Hung
a826a0c234 [RISCV] Add tests for reduce.fmaximum/fminimum. NFC (#80553)
This is to add test coverage for crash report in #80340
2024-02-05 21:41:24 +08:00
Nikita Popov
1aee1e1f4c [Analysis] Convert tests to opaque pointers (NFC) 2024-02-05 12:04:39 +01:00
Philip Reames
b78b264518 [TTI] Add costing for vp.strided.load and vp.strided.store (#80360)
The primary motivation of this patch is to add testing infrastructure
atop the recently landed 8ad14b6d90, so
that we can separate the costing aspects of strided memory operations
from the SLP implementation details.

I want to be clear that I am *not* proposing that we use the
vp.strided.* forms as our canonical IR representation. I'm merely using
them as a testing vehicle to exercise the costing machinery. The
canonical IR form remains a masked.gather or masked.scatter. I do want
to explore adding a non-vp strided load/store intrinsic, but that's a
separate line of work.

There is one costing change included in this. As I wrote my test, I
discovered that the default implementation was scalarizing (if invoked
via generic routines such as getInstructionCost), and when adding the
call into the strided-specific costing I discovered that we hadn't
modeled the fallback to scalarization properly in the initial patch.
After fixing that, there is a minor difference in scalarization cost
reported for the unaligned case, but I believe that to be uninteresting.

For the record, I did confirm that vp.strided.store is lowered to a
strided store on RISCV. :)
2024-02-02 09:17:07 -08:00
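The scalarization fallback described above can be sketched as follows. Names and unit costs are illustrative assumptions, not LLVM's actual model:

```python
def strided_memop_cost(vf, has_strided_inst, strided_cost=1,
                       scalar_mem=1, addr_compute=1):
    """If the target has a native strided load/store, charge its cost
    directly; otherwise fall back to VF scalar memory ops, each with
    a per-lane address computation."""
    if has_strided_inst:
        return strided_cost
    return vf * (scalar_mem + addr_compute)
```

So at VF=4 a target without a strided instruction pays 8 (four loads plus four address computations), while one with the instruction pays the single strided-op cost.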