In #111000 we removed promotion of fadd/fmul reductions for bf16 and f16
without zvfh, and marked the cost as invalid to prevent the vectorizers
from emitting them. However, it inadvertently left the cost for
ordered reductions unchanged, so this moves the check earlier to fix
that. This also uses BasicTTIImpl instead, which now assigns a valid
but expensive cost for fixed-length vectors, reflecting how codegen
will actually scalarize them.
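To illustrate (a minimal sketch; the function name is made up), the kind
of ordered reduction affected is a vector.reduce.fadd without the
reassoc flag:

declare half @llvm.vector.reduce.fadd.v4f16(half, <4 x half>)

define half @ordered_fadd(<4 x half> %v) {
  ; no reassoc flag, so this is an ordered (sequential) reduction
  %r = call half @llvm.vector.reduce.fadd.v4f16(half 0.0, <4 x half> %v)
  ret half %r
}

Without zvfh, this fixed-length form now gets a high scalarization cost
rather than an invalid one.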
We currently can't lower scalable vector lrint and llrint nodes for bf16
and f16, even with zvfh, and will crash.
Mark the cost as invalid for now to prevent the vectorizers from
emitting them.
Note that we can actually lower fixed-length vectors fine by scalarizing
them, but we were still undercosting those too, so I've included them as
well. I presume there's an opportunity to improve the codegen later on.
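For reference, a sketch of the scalable form that would crash if the
vectorizers emitted it (function name hypothetical):

declare <vscale x 1 x i64> @llvm.lrint.nxv1i64.nxv1f16(<vscale x 1 x half>)

define <vscale x 1 x i64> @lrint_nxv1f16(<vscale x 1 x half> %v) {
  ; scalable lrint on f16 currently has no lowering, hence the invalid cost
  %r = call <vscale x 1 x i64> @llvm.lrint.nxv1i64.nxv1f16(<vscale x 1 x half> %v)
  ret <vscale x 1 x i64> %r
}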
Many tests that were in test/Analysis/CostModel were actually
loop vectoriser tests. I've moved them as follows:
Analysis/CostModel/X86 -> Transforms/LoopVectorize/X86/CostModel
Analysis/CostModel/AArch64/arith-fp-frem.ll ->
Transforms/LoopVectorize/AArch64/arith-fp-frem-costs.ll
For regular floating-point types these nodes are marked as expanded on
scalable vectors so they're not legal in the cost model; this does the
same for f16 w/ zvfhmin and for bf16.
This patch contains the following changes to fix the vp intrinsics tests:
1. v*float -> v*f32, v*double -> v*f64 and v*half -> v*f16
2. Fix the order of the vp intrinsics.
It turns out that {s,u}int_to_fp nodes get their operation action from
their operand's type, not the result type, so we don't need to set it
for fp16 or bf16. vp_{s,u}int_to_fp uses the result type though so we
need to keep it.
This also means that we can lower int_to_fp for fixed length bf16
vectors already, so this adds tests for that.
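E.g. something like this (a hypothetical test case) can already be
lowered, since the operation action comes from the legal <4 x i32>
operand type:

define <4 x bfloat> @sitofp_v4bf16(<4 x i32> %x) {
  %r = sitofp <4 x i32> %x to <4 x bfloat>
  ret <4 x bfloat> %r
}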
The cost model test changes are due to BasicTTIImpl's getCastInstrCost
not taking into account that int_to_fp needs its legal type swapped.
This can be fixed in a later patch, but it's worth noting that the
affected types in the tests currently crash when lowered anyway (due to
them needing to be split at LMUL > 8).
When trying to add RISC-V fadd reduction cost model tests for bf16, I
noticed a crash when the vector was of <1 x bfloat>.
It turns out that this was being scalarized because, unlike f16/f32/f64,
there's no v1bf16 value type, and the existing cost model code assumed
that the legalized type would always be a vector.
This adds v1bf16 to bring bf16 in line with the other fp types.
It also adds some more RISC-V bf16 reduction tests which previously
crashed, including tests to ensure that SLP won't emit fadd/fmul
reductions for bf16 or f16 w/ zvfhmin after #111000.
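A sketch of the case that crashed (function name made up): the
<1 x bfloat> input used to be scalarized during legalization because
there was no v1bf16 MVT:

declare bfloat @llvm.vector.reduce.fadd.v1bf16(bfloat, <1 x bfloat>)

define bfloat @fadd_v1bf16(<1 x bfloat> %v) {
  %r = call reassoc bfloat @llvm.vector.reduce.fadd.v1bf16(bfloat 0.0, <1 x bfloat> %v)
  ret bfloat %r
}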
In https://reviews.llvm.org/D153848, promotion was added for a variety
of f16 ops with zvfhmin, including VP reductions.
However, I don't believe it's correct to promote f16 fadd or fmul
reductions to f32, since we need to round the intermediate results.
Today if we lower @llvm.vp.reduce.fadd.nxv1f16 on RISC-V, we'll get two
different results depending on whether we compiled with +zvfh or
+zvfhmin, for example with a 3-element reduction:
; v9 = [0.1563, 5.97e-8, 0.00006104]
; zvfh
vsetivli x0, 3, e16, m1, ta, ma
vmv.v.i v8, 0
vfredosum.vs v8, v9, v8
vfmv.f.s fa0, v8
; fa0 = 0.1563
; zvfhmin
vsetivli x0, 3, e16, m1, ta, ma
vfwcvt.f.f.v v10, v9
vsetivli x0, 3, e32, m1, ta, ma
vmv.v.i v8, 0
vfredosum.vs v8, v10, v8
vfmv.f.s fa0, v8
fcvt.h.s fa0, fa0
; fa0 = 0.1564
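For reference, the IR being lowered above is roughly (a sketch; the evl
of 3 matches the example):

declare half @llvm.vp.reduce.fadd.nxv1f16(half, <vscale x 1 x half>, <vscale x 1 x i1>, i32)

define half @vpreduce_fadd(<vscale x 1 x half> %v, <vscale x 1 x i1> %m) {
  ; no reassoc flag: this is the ordered form, lowered to vfredosum.vs
  %r = call half @llvm.vp.reduce.fadd.nxv1f16(half 0.0, <vscale x 1 x half> %v, <vscale x 1 x i1> %m, i32 3)
  ret half %r
}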
The divergence comes from rounding: with zvfh each intermediate f16
addition is rounded to f16, whereas with zvfhmin the intermediates are
kept in f32 and only the final result is rounded. This same thing
happens with reassociative reductions, e.g. vfredusum.vs, and it also
applies to bf16.
I couldn't find anything in the LangRef for reductions that suggests the
excess precision is allowed. There may be something we can do in Clang
with -fexcess-precision=fast, but I haven't looked into this yet.
I presume the same precision issue occurs with fmul, but not with
fmin/fmax/fminimum/fmaximum.
I can't think of another way of lowering these other than scalarizing,
and we can't scalarize scalable vectors, so this just removes the
promotion and adjusts the cost model to return an invalid cost. (It
looks like we also don't currently cost fmul reductions, so presumably
they also have an invalid cost?)
I think this should be enough to stop the loop vectorizer or SLP from
emitting these intrinsics.
When we're lowering to a split sequence, we only need one
materialization of the zero constant. Our codegen looks something like
this:
vmv.v.i v24, 0
vmerge.vim v8, v24, -1, v0
vmv1r.v v0, v16
vmerge.vim v16, v24, -1, v0
Note: I'm doing this specific case since it was pointed out in
https://github.com/llvm/llvm-project/pull/110164#discussion_r1778268391,
but it's worth noting that we have the same basic problem (overcosting
split operations with split-invariant terms) in multiple places
throughout this file.
Calling into BasicTTI is not always safe. In particular, BasicTTI does
not have a full legalization implementation (vector widening is
missing), and falls back on scalarization. The problem is that
scalarization for <N x i1> vectors is costed in terms of the cast API,
and we can end up in an infinite recursive cycle.
The "right" fix for this would be teach BasicTTI how to model the full
legalization state machine, but several attempts at doing so have
resulted in dead ends or undesirable cost changes for targets I don't
understand.
This patch instead papers over the issue by avoiding the call to the
base class when dealing with an i1 source or dest. This doesn't
necessarily produce correct costs, but it should at least return
something semi-sensible and not crash.
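As an illustration (not the exact reproducer from the issue), a cast
query of roughly this shape could previously recurse:

define <vscale x 16 x i1> @trunc_to_i1(<vscale x 16 x i8> %x) {
  ; costing this trunc could bounce between scalarization and the cast API
  %r = trunc <vscale x 16 x i8> %x to <vscale x 16 x i1>
  ret <vscale x 16 x i1> %r
}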
Fixes https://github.com/llvm/llvm-project/issues/108708
This follows in the spirit of 7d82c99403,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.
This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
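To sketch why operand information matters (function names made up), the
two selects below have very different total costs once constant
materialization is accounted for:

define <4 x i32> @cheap_sel(<4 x i1> %m, <4 x i32> %a) {
  ; zeroinitializer is essentially free to materialize
  %r = select <4 x i1> %m, <4 x i32> %a, <4 x i32> zeroinitializer
  ret <4 x i32> %r
}

define <4 x i32> @costly_sel(<4 x i1> %m, <4 x i32> %a) {
  ; an arbitrary constant vector may need a constant pool load
  %r = select <4 x i1> %m, <4 x i32> %a, <4 x i32> <i32 7, i32 -3, i32 42, i32 19>
  ret <4 x i32> %r
}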
Arithmetic half or bfloat ops on zvfhmin and zvfbfmin respectively will
be promoted and carried out in f32, so this updates
getArithmeticInstrCost to check for this.
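For example (a sketch), with only +zvfhmin this fadd is carried out by
widening to f32 and narrowing back, and its cost should account for the
extra converts:

define <vscale x 4 x half> @fadd_f16(<vscale x 4 x half> %a, <vscale x 4 x half> %b) {
  ; lowered roughly as two vfwcvt.f.f.v, a vfadd.vv in f32, then vfncvt.f.f.w
  %r = fadd <vscale x 4 x half> %a, %b
  ret <vscale x 4 x half> %r
}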
A fadd reduction with
1. the fast flag set, and
2. a power-of-two number of elements in the input vector
results in a series of faddp instructions. The faddp instruction has
latency/throughput identical to the fadd instruction, and hence we set a
relative cost of 1 for faddp as well.
The change didn't show any regression with SPEC17-FP (C/C++) or
llvm-test-suite on Neoverse-V2.
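A sketch of the pattern in question (function name made up):

declare float @llvm.vector.reduce.fadd.v4f32(float, <4 x float>)

define float @reduce_fast(<4 x float> %v) {
  ; fast allows reassociation, so this lowers to a faddp tree on AArch64
  %r = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %v)
  ret float %r
}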
This is a follow up to 7f6bbb3. When lowering a <N x i1> build_vector,
we currently choose to extend to i8, perform the build_vector there, and
then truncate back in vector. Our costing, on the other hand, accounts
for it as if we performed a vector extend, an insert, and a vector
extract for every element. This significantly overestimates the cost.
Note that we can likely do better in our build_vector lowering here by
packing the bits in scalar, and doing a build_vector of the packed bits.
Regardless, our costing should match our lowering.
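A sketch of the pattern being costed (names made up): an <8 x i1>
build_vector expressed as an insertelement chain:

define <8 x i1> @build_i1(i1 %a, i1 %b) {
  ; lowered by extending to i8, building there, then truncating in vector
  %v0 = insertelement <8 x i1> poison, i1 %a, i32 0
  %v1 = insertelement <8 x i1> %v0, i1 %b, i32 1
  ret <8 x i1> %v1
}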
These are currently scalarized, and I need something to exercise
<N x i1> scalarization costing. We should probably consider adding
a buildvector intrinsic for this purpose.
This change is actually two related changes, but they're very hard to
meaningfully separate as the second balances the first, and yet doesn't
do much good on its own.
First, we can reduce the cost of a build_vector pattern. Our current
costing for this defers to generic insertelement costing which isn't
unreasonable, but also isn't correct. While inserting N elements
requires N-1 slides and N vmv.s.x, doing the full build_vector only
requires N vslide1down. (Note there are other cases that our build
vector lowering can handle more cheaply; this is simply the easiest
upper bound, which appears to be "good enough" for SLP costing
purposes.)
Second, we need to tell SLP that calls don't preserve vector registers.
Without this, SLP will vectorize scalar code which performs e.g. 4 x
float @exp calls as two <2 x float> @exp intrinsic calls. Oddly, the
costing works out that this is in fact the optimal choice - except that
we don't actually have a <2 x float> @exp, and it gets unrolled during
DAG lowering. This would be fine (or at least cost neutral) except that
the libcall for the scalar @exp blows all vector registers. So the net
effect is we added a bunch of spills that SLP had no idea about.
Thankfully, AArch64 has a similar problem, and has taught SLP how to
reason about spill cost once the right TTI hook is implemented.
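A sketch of the problematic vectorized form (function name made up):

declare <2 x float> @llvm.exp.v2f32(<2 x float>)

define <2 x float> @vec_exp(<2 x float> %x) {
  ; no <2 x float> expf libcall exists, so this unrolls into two scalar
  ; expf calls during DAG lowering, each clobbering the vector registers
  %r = call <2 x float> @llvm.exp.v2f32(<2 x float> %x)
  ret <2 x float> %r
}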
Now, for some implications...
The SLP solution for spill costing has some inaccuracies. In particular,
it basically just guesses whether an intrinsic will be lowered to a call
or not, and can be wrong in both directions. It also has no mechanism to
differentiate on calling convention.
This has the effect of making partial vectorization (i.e. starting in
scalar) more profitable. In practice, the major effect of this is to
make it more likely that SLP will vectorize part of a tree in an
intersecting forest, and then vectorize the remaining tree once those
uses have been removed.
This also has the effect of biasing us slightly away from strided or
indexed loads during vectorization, because the scalar cost is more
accurately modeled and these instructions look relatively less
profitable.
It looks like they were added to prevent fixed-length vectors from
being expanded, but that's no longer the case today:
https://reviews.llvm.org/D121447#3376520
This adds zvfhmin test coverage for fceil, ffloor, fnearbyint, frint,
fround and froundeven and splits them at nxv32f16 to avoid crashing,
similarly to what we do for other nodes that we promote.
This also sets ftrunc to promote, which was previously missing. We
already promote the VP version of it, vp_froundtozero.
Marking it as promoted affects some of the cost model tests since
they're no longer expanded.
If we know an exact VLEN, then the index is effectively modulo the
number of elements in a single vector register. Our lowering performs
this subvector optimization.
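For example (a sketch, assuming VLEN is known to be exactly 128):
<vscale x 4 x i32> then occupies two registers of 4 elements each, so
the extract below only needs element 5 mod 4 = 1 of the second register:

define i32 @extract_elt(<vscale x 4 x i32> %v) {
  %e = extractelement <vscale x 4 x i32> %v, i32 5
  ret i32 %e
}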
A bit of context. This change may look a bit strange on its own given
we are currently *not* scaling insert/extract cost by LMUL. This costing
decision needs to change, but is very intertwined with SLP
profitability, and is thus a bit hard to adjust. I'm hoping that
https://github.com/llvm/llvm-project/pull/108419 will let me start to
untangle this. This change is basically a case of finding a subset I can
tackle before other dependencies are in place, and which does no real
harm in the meantime.
Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select,
under Neon we can use the arithmetic expansion to generate fewer
instructions. Notably it also prevents the scalarization of vselect
during vector-legalization.
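A sketch of the arithmetic expansion (function name made up), computing
scmp as (a > b) - (a < b) using compare masks:

define <4 x i32> @scmp_expand(<4 x i32> %a, <4 x i32> %b) {
  %gt = icmp sgt <4 x i32> %a, %b   ; cmgt a, b gives a 0 / -1 mask on Neon
  %lt = icmp slt <4 x i32> %a, %b   ; cmgt b, a
  %gts = sext <4 x i1> %gt to <4 x i32>
  %lts = sext <4 x i1> %lt to <4 x i32>
  ; (a < b ? -1 : 0) - (a > b ? -1 : 0) yields -1 / 0 / 1
  %r = sub <4 x i32> %lts, %gts
  ret <4 x i32> %r
}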
This adds test coverage for zvfhmin, and for half in general, in the
cost model tests.
Some existing half tests were split into separate functions so that if
the check prefixes diverge it won't affect the rest of the non-half
instructions.
Whilst we're here, also remove the redundant
-riscv-vector-bits-min=128 and declares.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.
This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is not to vectorize but to
scalarize or scatter/gather, then the cost would be the total cost
across all members. In those cases, assign the cost individually per
member, to more closely reflect the choice per instruction.
This fixes a divergence between the legacy and VPlan-based cost models.
Fixes https://github.com/llvm/llvm-project/issues/108098.