This adds costs for the shuffle in an ST3/ST4 instruction, which is
represented in LLVM IR as a store(interleaving shuffle). In order to
detect the store, a CxtI context instruction is added so the users of
the shuffle can be checked. LD3 and LD4 are added too; LD2 should be a
zip1 shuffle, which will be added in another patch.
It should help fix some of the regressions from #87510.
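As a minimal sketch (not taken from the patch; names and types are illustrative), this is the kind of store(interleaving shuffle) pattern that can lower to a single st3:

```llvm
define void @store_interleave3(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, ptr %p) {
  %ab = shufflevector <4 x i32> %a, <4 x i32> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %cc = shufflevector <4 x i32> %c, <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  ; Interleave a0,b0,c0,a1,b1,c1,... and store as one wide vector.
  %il = shufflevector <8 x i32> %ab, <8 x i32> %cc, <12 x i32> <i32 0, i32 4, i32 8, i32 1, i32 5, i32 9, i32 2, i32 6, i32 10, i32 3, i32 7, i32 11>
  store <12 x i32> %il, ptr %p
  ret void
}
```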
Using <X x i1> undef masks means they are treated as constants, which underestimates the scalar cost because it assumes that the masks/branches will fold away.
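For instance (a hypothetical case, not from the patch), costing this masked load with an undef mask treats the mask as a foldable constant:

```llvm
declare <4 x i32> @llvm.masked.load.v4i32.p0(ptr, i32, <4 x i1>, <4 x i32>)

define <4 x i32> @undef_mask(ptr %p) {
  ; With an undef mask, the scalarized cost assumes the per-lane
  ; branches fold away, so the reported cost is too low.
  %v = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %p, i32 4, <4 x i1> undef, <4 x i32> poison)
  ret <4 x i32> %v
}
```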
Changes in Recommit:
Add an additional check for a sign/zero extend to the same type.
Original message:
Use the destination data type to measure the LMUL size for the
latency/throughput cost.
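For example (types are illustrative), the extend below would be costed using the destination type's LMUL under the standard RISC-V scalable mapping:

```llvm
define <vscale x 4 x i64> @ext(<vscale x 4 x i16> %x) {
  ; Destination <vscale x 4 x i64> occupies four vector registers
  ; (LMUL=4); the source <vscale x 4 x i16> occupies one (LMUL=1).
  %e = sext <vscale x 4 x i16> %x to <vscale x 4 x i64>
  ret <vscale x 4 x i64> %e
}
```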
The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs
if any element of the vector is a NaN.
Following #79402, the patch adds the cost of the NaN check (vmfne + vcpop).
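E.g. for a reduction like the following (an illustrative case), the reported cost now includes the mask compare and population count used to detect NaN lanes:

```llvm
declare float @llvm.vector.reduce.fmaximum.v4f32(<4 x float>)

define float @fmax_reduce(<4 x float> %v) {
  ; Must return NaN if any lane is NaN, so the RISC-V cost adds a
  ; vmfne.vv + vcpop.m check on top of the reduction itself.
  %r = call float @llvm.vector.reduce.fmaximum.v4f32(<4 x float> %v)
  ret float %r
}
```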
This commit provides better cost estimates for
the llvm.vector.reduce.add intrinsic on SystemZ. These apply to all
vector lengths and integer types up to i128. For integer types larger
than i128, we fall back to the default cost estimate.
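For illustration (assuming "integer types" refers to the element type), the first reduction below gets the new SystemZ estimate while the second falls back:

```llvm
declare i64 @llvm.vector.reduce.add.v4i64(<4 x i64>)
declare i256 @llvm.vector.reduce.add.v1i256(<1 x i256>)

define i64 @covered(<4 x i64> %v) {
  ; i64 <= i128: costed by the new SystemZ-specific estimate.
  %r = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %v)
  ret i64 %r
}

define i256 @fallback(<1 x i256> %v) {
  ; Wider than i128: falls back to the default cost estimate.
  %r = call i256 @llvm.vector.reduce.add.v1i256(<1 x i256> %v)
  ret i256 %r
}
```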
This has the effect of lowering the estimated costs of the most common
instances of the intrinsic. The expected performance impact of this is
minimal, with a tendency to slightly improve the performance of some
benchmarks.
This commit also provides a test to check the proper computation of the
new estimates, as well as the fallback for types larger than i128.
Fix a gap in the cost estimation for length-changing shuffles by adjusting the shuffle mask and either widening the shuffle inputs or extracting the lower elements of the result.
A small step towards moving some of this implementation inside improveShuffleKindFromMask and/or the target getShuffleCost handlers (and reducing the differences in cost estimation depending on whether we come from a ShuffleVectorInst or from the raw operands / mask components).
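A sketch of the case being handled (illustrative types): a shuffle whose result length differs from its inputs, which can now be costed by widening the inputs to the result length:

```llvm
define <8 x i32> @widen(<4 x i32> %a, <4 x i32> %b) {
  ; Length-changing shuffle: <4 x i32> inputs, <8 x i32> result. The
  ; mask is adjusted so existing same-length shuffle costs apply.
  %s = shufflevector <4 x i32> %a, <4 x i32> %b, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
  ret <8 x i32> %s
}
```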
Currently we model subvector inserts and extracts as shuffles,
potentially going as far as scalarizing them. If the types are legal,
they can just be simple zip/unzip operations, or possibly even no-ops.
Change the cost to a relatively small one to ensure that simple loops
featuring such operations between fixed and scalable vector types that
are effectively the same at a given SVE width can be unrolled and
further optimized.
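As an illustrative sketch, at a 128-bit SVE width the operations below move between equivalently sized fixed and scalable types and should be costed as cheap, not as scalarized shuffles:

```llvm
declare <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32>, <4 x i32>, i64)
declare <4 x i32> @llvm.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32>, i64)

define <4 x i32> @roundtrip(<4 x i32> %f) {
  ; Effectively type conversions when vscale == 1 (128-bit SVE).
  %s = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> poison, <4 x i32> %f, i64 0)
  %e = call <4 x i32> @llvm.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> %s, i64 0)
  ret <4 x i32> %e
}
```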
We can use a small amount of integer arithmetic to round FP32 to BF16
and extend BF16 to FP32.
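A minimal sketch of the round-to-nearest-even trick in IR (it ignores the signaling-NaN quieting discussed below, and is not the patch's exact sequence):

```llvm
define i16 @round_f32_to_bf16(float %x) {
  %bits  = bitcast float %x to i32
  ; Round to nearest even: add 0x7FFF plus the low kept bit.
  %hi    = lshr i32 %bits, 16
  %odd   = and i32 %hi, 1
  %bias  = add i32 %odd, 32767
  %round = add i32 %bits, %bias
  %top   = lshr i32 %round, 16
  %bf    = trunc i32 %top to i16
  ret i16 %bf
}

define float @extend_bf16_to_f32(i16 %b) {
  ; Extension is just a 16-bit shift into the high half.
  %w = zext i16 %b to i32
  %s = shl i32 %w, 16
  %f = bitcast i32 %s to float
  ret float %f
}
```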
While a number of operations still require promotion, this could be
reduced for some rather simple operations like abs, copysign, and fneg,
but those can be done in a follow-up.
A few neat optimizations are implemented:
- round-inexact-to-odd is used for F64 to BF16 rounding.
- quieting signaling NaNs for f32 -> bf16 tries to detect if a prior
operation makes it unnecessary.
When the MVT is not a vector type, TCK_CodeSize should return an invalid
cost. This patch adds a check at the beginning to make sure all cost
kinds return invalid costs consistently.
Before this patch, TCK_CodeSize returned a valid cost on scalar MVTs but
the other cost kinds didn't.
This fixes issue #83294, where a loop contains vector instructions and
the MVT is scalar after type legalization because the vector extension
is not enabled.
This is mostly NFC, but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
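E.g. the checked output now looks like the first insert below rather than the second (an illustrative pair):

```llvm
define <4 x float> @example(float %x) {
  ; New form: insert into poison with an i64 index.
  %n = insertelement <4 x float> poison, float %x, i64 0
  ; Old form, for contrast: insert into undef with an i32 index.
  %o = insertelement <4 x float> undef, float %x, i32 0
  ret <4 x float> %n
}
```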
In most cases, SETCC lowering will be able to simplify/commute the comparison by adjusting the constant.
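For example (an illustrative pair, not from the patch), these two compares are equivalent, so lowering can pick whichever predicate is cheaper by bumping the constant:

```llvm
define <4 x i1> @adjust(<4 x i32> %x) {
  ; x > 7 is the same as x >= 8 for signed i32.
  %a = icmp sgt <4 x i32> %x, <i32 7, i32 7, i32 7, i32 7>
  %b = icmp sge <4 x i32> %x, <i32 8, i32 8, i32 8, i32 8>
  ret <4 x i1> %a
}
```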
TODO: We still need to adjust ExtraCost based on CostKind
Fixes #80122
If we have exact vlen knowledge, we can figure out which indices
correspond to register boundaries. Our lowering uses this knowledge to
replace the vslidedown.vi with a sub-register extract. Our costs can
reflect that as well.
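An illustrative case: if VLEN is known to be exactly 128, index 2 of the LMUL=2 source below is the start of its second register, so the extract becomes a sub-register extract rather than a vslidedown.vi:

```llvm
declare <2 x i64> @llvm.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64>, i64)

define <2 x i64> @second_reg(<vscale x 2 x i64> %v) {
  ; With VLEN=128, elements 2..3 occupy exactly the second register.
  %e = call <2 x i64> @llvm.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64> %v, i64 2)
  ret <2 x i64> %e
}
```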
This is another piece split off from
https://github.com/llvm/llvm-project/pull/80164
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
P9 has VX-form `Vector Extract Element` instructions like `vextuwrx`, and
P10 has VX-form `Vector Insert Element` instructions like `vinsd`. Update
the instruction costs to reflect these instructions.
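Roughly the operations these instructions cover (an illustrative sketch, not the patch's tests):

```llvm
define i32 @var_extract(<4 x i32> %v, i64 %i) {
  ; A variable-index word extract, which P9's vextuwrx can handle.
  %e = extractelement <4 x i32> %v, i64 %i
  ret i32 %e
}

define <2 x i64> @dword_insert(<2 x i64> %v, i64 %s) {
  ; A doubleword element insert, which P10's vinsd can handle.
  %w = insertelement <2 x i64> %v, i64 %s, i64 1
  ret <2 x i64> %w
}
```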
Fixes https://github.com/llvm/llvm-project/issues/50249
This reverts commit 834d11c215, which was a revert of my 3a626937b1;
I had failed to rebase after new tests were added overnight by
fc0b67e1d7.
Original commit message follows:
Extracting a subvector at index zero corresponds to a type conversion and
possibly a subregister operation. We will not emit a vslidedown. As such,
they are free.
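For instance (illustrative types), this extract needs no vslidedown and is now costed as free:

```llvm
declare <2 x i64> @llvm.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64>, i64)

define <2 x i64> @lo(<vscale x 2 x i64> %v) {
  ; Index 0: just a type conversion, possibly a subregister operation.
  %e = call <2 x i64> @llvm.vector.extract.v2i64.nxv2i64(<vscale x 2 x i64> %v, i64 0)
  ret <2 x i64> %e
}
```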
As an aside, it looks like we're not passing an index in for cases where
the subvec type is scalable. For at least index zero, we probably should be.
Revert "Revert "[RISCV][TTI] Extract subvector at index zero is free (#81751)""
Extracing a subvector at index zero corresponds to a type conversion and
possibly a subregister operation. We will not emit a vslidedown. As
such, they are free.
As an aside, it looks like we're not passing an index in for cases where
the subvec type is scalable. For at least index zero, we probably should
be.
For llvm.vector.extract, this tests combinations of extracting at a zero
and a non-zero index, and extracting from a fixed or scalable vector.
For llvm.vector.insert, this tests the same combinations as extracts but with
an additional configuration for an undef vector. This is because we can use a
subregister insert if the index is 0 and the vector is undef, which should be
free.
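A sketch of that extra configuration (illustrative types; not the literal test):

```llvm
declare <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32>, <4 x i32>, i64)

define <vscale x 4 x i32> @insert_undef_idx0(<4 x i32> %sv) {
  ; Index 0 into an undef vector: expressible as a plain subregister
  ; insert, so it should be free.
  %v = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> undef, <4 x i32> %sv, i64 0)
  ret <vscale x 4 x i32> %v
}
```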
Similar to d39b4ce3ce
Using "eabi" or "gnueabi" for aarch64 targets is a common mistake and
warned by Clang Driver. We want to avoid them elsewhere as well. Just
use the common "aarch64" without other triple components.
Many targets do not have a cost for the extract-subvector shuffle kind,
but do have costs for single-source permutes. If there is no cost
estimate for extract subvector, it is better to switch to a single
source permute for a better cost estimate.
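E.g. (illustrative) the shuffle below is an extract-subvector; if the target reports no cost for that kind, it can be costed as the equivalent single-source permute:

```llvm
define <4 x i32> @upper_half(<8 x i32> %v) {
  ; Extracts elements 4..7; also expressible as a single-source permute.
  %hi = shufflevector <8 x i32> %v, <8 x i32> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  ret <4 x i32> %hi
}
```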
Reviewers: RKSimon, davemgreen, arsenm
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/79837
This should be the final portion of shaping up the test suite to be
ready for turning on non-intrinsic debug-info:
* Pin CostModel tests that expect to see intrinsics in their -debug
output to not use RemoveDIs. This is a spurious test output difference.
* Add 'tail' to a bunch of intrinsic calls in UpdateTestChecks. We're
canonicalising intrinsics to be printed with "tail" in the RemoveDIs
conversion, as dbg.values usually pick that up while being optimised.
This is another spurious output difference.
* The "DebugInfoDrop" pass used in the debugify unit-tests happens to
operate inside the pass manager, thus it sees non-intrinsic debug-info.
Update it to correctly drop it.
The primary motivation of this patch is to add testing infrastructure
atop the recently landed 8ad14b6d90, so
that we can separate the costing aspects of strided memory operations
from the SLP implementation details.
I want to be clear that I am *not* proposing that we use the
vp.strided.* forms as our canonical IR representation. I'm merely using
them as a testing vehicle to exercise the costing machinery. The
canonical IR form remains a masked.gather or masked.scatter. I do want
to explore adding a non-vp strided load/store intrinsic, but that's a
separate line of work.
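The test vehicle looks roughly like this (illustrative types and names):

```llvm
declare <4 x i32> @llvm.experimental.vp.strided.load.v4i32.p0.i64(ptr, i64, <4 x i1>, i32)

define <4 x i32> @strided(ptr %p, i64 %stride) {
  ; Used purely to exercise the costing; the canonical IR form remains
  ; masked.gather / masked.scatter.
  %v = call <4 x i32> @llvm.experimental.vp.strided.load.v4i32.p0.i64(ptr %p, i64 %stride, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, i32 4)
  ret <4 x i32> %v
}
```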
There is one costing change included in this. As I wrote my test, I
discovered that the default implementation was scalarized (if invoked
via generic routines such as getInstructionCost), and when adding the
call into the strided-specific costing I discovered that we hadn't
modeled the fallback to scalarization properly in the initial patch.
After fixing that, there is a minor difference in the scalarization cost
reported for the unaligned case, but I believe that to be uninteresting.
For the record, I did confirm that vp.strided.store is lowered to a
strided store on RISCV. :)