clang-p2996

Author	SHA1	Message	Date
annamthomas	54a9f0007c	[SCEV] Fix BinomialCoefficient Iteration to fit in W bits (#88010 ) BinomialCoefficient computes the value of W-bit IV at iteration It of a loop. When W is 1, we can call multiplicative inverse on 0 which triggers an assert since `1b76120`. Since the arithmetic is supposed to wrap if It or K does not fit in W bits, do the truncation into W bits after we do the shift. Fixes #87798	2024-04-10 09:02:23 -04:00
Shih-Po Hung	3d985a6f1b	[RISCV][TTI] Scale the cost of Select with LMUL (#88098 ) Use the Val type to estimate the instruction cost for SelectInst.	2024-04-10 14:18:15 +08:00
Shih-Po Hung	ee52add6cb	[RISCV][TTI] Implement cost of intrinsic active_lane_mask (#87931 ) This patch uses the argument type to infer the LMUL cost for the index generation, add, and comparison.	2024-04-10 10:08:33 +08:00
David Green	f0e79d9152	[AArch64] Add a cost for identity shuffles. These are mostly handled at a higher level when costing shuffles, but some masks can end up being identity or concat masks which we can treat as free.	2024-04-09 17:16:14 +01:00
David Green	4ac2721e51	[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934 ) This tries to add some costs for the shuffle in a ST3/ST4 instruction, which are represented in LLVM IR as store(interleaving shuffle). In order to detect the store, it needs to add a CxtI context instruction to check the users of the shuffle. LD3 and LD4 are added, LD2 should be a zip1 shuffle, which will be added in another patch. It should help fix some of the regressions from #87510.	2024-04-09 16:36:08 +01:00
Simon Pilgrim	3bfd5c6424	[TTI] getCommonMaskedMemoryOpCost - consistently use getScalarizationOverhead instead of ExtractElement costs for address/mask extraction. (#87771 ) These aren't unknown extraction indices, we will be extracting every address/mask element in sequence.	2024-04-09 15:42:51 +01:00
Florian Hahn	977c0a6d29	[LAA] Add tests with non-constant strides & distances. Add a number of LAA test cases with both forward and backward dependences with non-constant strides and dependence distances. This includes test coverage for https://github.com/llvm/llvm-project/issues/87336 Also includes a LoopLoadElimination test to make sure the pass does not crash on non-constant dependence distances.	2024-04-08 19:18:38 +01:00
David Green	0bfea40101	[AArch64] More shuffle-store test cases. NFC	2024-04-08 09:19:47 +01:00
David Green	d57d094779	[AArch64] Add test for LD2/LD3/LD4 shuffle cost models. NFC	2024-04-07 18:18:32 +01:00
David Green	e4169f79ef	[AArch64] Add extra zip and uzp shuffle cost tests. NFC	2024-04-05 19:33:22 +01:00
Simon Pilgrim	58187fad93	[CostModel][X86] Update masked load/store/gather/scatter tests to explicitly use variable masks Using <X x i1> undef masks means they are treated as constants, which underestimates the scalar costs as it assumes that the masks/branches will fold away.	2024-04-05 11:15:46 +01:00
Simon Pilgrim	53fe94a0ce	[CostModel][X86] Add costkinds test coverage for masked load/store/gather/scatter Noticed while starting triage for #87640	2024-04-04 19:13:17 +01:00
Simon Pilgrim	ed41249498	[CostModel][X86] Update AVX1 sext v4i1 -> v4i64 cost based off worst case llvm-mca numbers We were using raw instruction count which overestimated the costs for #67803	2024-04-04 17:17:55 +01:00
Simon Pilgrim	3871eaba6b	[CostModel][X86] Update AVX1 sext v8i1 -> v8i32 cost based off worst case llvm-mca numbers We were using raw instruction count which overestimated the costs for #67803	2024-04-04 12:26:35 +01:00
Shih-Po Hung	97523e5321	[RISCV][TTI] Scale the cost of intrinsic stepvector with LMUL (#87301 ) Use the return type to measure the LMUL size for latency/throughput cost	2024-04-04 08:30:15 +08:00
Kevin P. Neal	9c9f94063c	[FPEnv][CostModel] Correct strictfp test. Correct strictfp tests to follow the rules documented in the LangRef: https://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics These tests needed the strictfp attribute added to some function definitions. Test changes verified with D146845.	2024-04-02 13:53:56 -04:00
Shih-Po Hung	d7a43a00fe	[RISCV][TTI] Scale the cost of trunc/fptrunc/fpext with LMUL (#87101 ) Use the destination data type to measure the LMUL size for latency/throughput cost	2024-04-02 09:30:51 +08:00
Shih-Po Hung	84f24c2daf	[RISCV][TTI] Scale the cost of intrinsic umin/umax/smin/smax with LMUL (#87245 ) Use the return type to measure the LMUL size for throughput/latency cost	2024-04-02 09:26:27 +08:00
Shih-Po Hung	c7954ca312	Recommit "[RISCV] Refine cost on Min/Max reduction (#79402 )" (#86480 ) This is recommitted as the test and fix for llvm.vector.reduce.fmaximum/fminimum are covered in #80553 and #80697	2024-04-01 14:44:10 +08:00
Vitaly Buka	37d6e5b7a5	[memoryssa] Exclude llvm.allow.{runtime,ubsan}.check() (#86066 ) RFC: https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641	2024-03-31 22:50:02 -07:00
Vitaly Buka	0bc3781649	[Analysis] Exclude llvm.allow.{runtime,ubsan}.check() from AliasSetTracker (#86065 ) RFC: https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641	2024-03-31 22:47:55 -07:00
Vitaly Buka	1e442ac4c3	[CostModel] No cost for llvm.allow.{runtime,ubsan}.check() (#86064 ) These intrinsics will not be lowered to code. RFC: https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641	2024-03-31 22:27:57 -07:00
ShihPo Hung	aa2d5d5413	Recommit "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617 )" Changes in Recommit: Add an additional check on sign/zero extend to the same type. Original message: Use the destination data type to measure the LMUL size for latency/throughput cost	2024-03-26 23:41:16 -07:00
ShihPo Hung	da3e58e74a	Revert "[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617 )" This reverts commit `7545c63572` as it's failing on the Linux bots.	2024-03-26 21:47:32 -07:00
Shih-Po Hung	7545c63572	[RISCV][TTI] Scale the cost of the sext/zext with LMUL (#86617 ) Use the destination data type to measure the LMUL size for latency/throughput cost	2024-03-27 10:58:17 +08:00
Changpeng Fang	350bda4419	AMDGPU: Rename intrinsics and remove f16/bf16 versions for load transpose (#86313 ) Rename the intrinsics to close to the instruction mnemonic names: Use global_load_tr_b64 and global_load_tr_b128 instead of global_load_tr. This patch also removes f16/bf16 versions of builtins/intrinsics. To simplify the design, we should avoid enumerating all possible types in implementing builtins. We can always use bitcast.	2024-03-25 16:55:22 -07:00
Shih-Po Hung	3cb024198f	[RISCV][CostModel] Estimate cost of llvm.vector.reduce.fmaximum/fminimum (#80697 ) The ‘llvm.vector.reduce.fmaximum/fminimum.*’ intrinsics propagate NaNs if any element of the vector is a NaN. Following #79402, the patch adds the cost for NaN check (vmfne + vcpop)	2024-03-25 17:17:36 +08:00
Vitaly Buka	5c95484061	[Analysis] Use implicit-check-not in test	2024-03-20 19:52:39 -07:00
Andreas Jonson	e66cfebb04	[ValueTracking] Handle range attributes (#85143 ) Handle the range attribute in ValueTracking.	2024-03-20 12:43:00 +01:00
Nikita Popov	6872a64652	[ValueTracking] Handle vector range metadata in isKnownNonZero() Nowadays !range can be placed on instructions with vector of int return value. Support this case in isKnownNonZero().	2024-03-19 15:50:13 +01:00
Nikita Popov	0e76818672	[ValueTracking] Test isKnownNonZero() range metadata with vector (NFC)	2024-03-19 15:50:13 +01:00
Philip Reames	35db929b50	[RISCV] Add cost model coverage for fixed vector insert with known VLEN	2024-03-13 15:21:37 -07:00
Noah Goldstein	744a23f24b	[ValueTracking] Use select condition to help infer bits of arms If we have something like `(select (icmp ult x, 8), x, y)`, we can use the `(icmp ult x, 8)` to help compute the knownbits of `x`. Closes #84699	2024-03-13 14:27:05 -05:00
Noah Goldstein	882992a951	[ValueTracking] Add tests for inferring select arm bits from condition; NFC	2024-03-13 14:27:05 -05:00
mikaelholmen	2d62ce4beb	[ValueTracking] Remove faulty dereference of "InsertBefore" (#85034 ) In `2fe81edef6` [NFC][RemoveDIs] Insert instruction using iterators in Transforms/ we changed if (req_idx != i) return FindInsertedValue(I->getAggregateOperand(), idx_range, - InsertBefore); + *InsertBefore); } but there is no guarantee that InsertBefore is non-empty at that point, which we can see, e.g., in the added testcase. Instead just pass on the optional InsertBefore in the recursive call to FindInsertedValue, as we do at several other places already.	2024-03-13 09:58:47 +01:00
Florian Hahn	a3ad5faa32	[LAA] Fix typo IndidrectUnsafe -> IndirectUnsafe. Fix typo in textual analysis output.	2024-03-12 14:44:04 +00:00
Florian Hahn	b274b23665	[ValueTracking] Treat phi as underlying obj when not decomposing further (#84339 ) At the moment, getUnderlyingObjects simply continues for phis that do not refer to the same underlying object in loops, without adding them to the list of underlying objects, effectively ignoring those phis. Instead of ignoring those phis, add them to the list of underlying objects. This fixes a miscompile where LoopAccessAnalysis fails to identify a memory dependence, because no underlying objects can be found for a set of memory accesses. Fixes https://github.com/llvm/llvm-project/issues/82665. PR: https://github.com/llvm/llvm-project/pull/84339	2024-03-12 08:55:03 +00:00
Noah Goldstein	d81db0e5f5	[KnownBits] Implement knownbits `lshr`/`ashr` with exact flag The exact flag basically allows us to set an upper bound on shift amount when we have a known 1 in `LHS`. Typically we deduce exact using knownbits (on non-exact incoming shifts), so this isn't particularly impactful, but may be useful in some circumstances. Closes #84254	2024-03-11 15:51:07 -05:00
Noah Goldstein	f19d9e1617	[KnownBits] Add test for computing more information for `lshr`/`ashr` with `exact` flag; NFC	2024-03-11 15:51:06 -05:00
Dominik Steenken	718962f53b	[SystemZ] Provide improved cost estimates (#83873 ) This commit provides better cost estimates for the llvm.vector.reduce.add intrinsic on SystemZ. These apply to all vector lengths and integer types up to i128. For integer types larger than i128, we fall back to the default cost estimate. This has the effect of lowering the estimated costs of most common instances of the intrinsic. The expected performance impact of this is minimal with a tendency to slightly improve performance of some benchmarks. This commit also provides a test to check the proper computation of the new estimates, as well as the fallback for types larger than i128.	2024-03-11 10:40:59 +01:00
Florian Hahn	4cfd4a7896	[LAA] Add test case for #82665 . Test case for https://github.com/llvm/llvm-project/issues/82665.	2024-03-07 13:53:03 +00:00
Simon Pilgrim	55304d0d90	[CostModel] getInstructionCost - improve estimation of costs for length changing shuffles (#84156 ) Fix gap in the cost estimation for length changing shuffles, by adjusting the shuffle mask and either widening the shuffle inputs or extracting the lower elements of the result. A small step towards moving some of this implementation inside improveShuffleKindFromMask and/or target getShuffleCost handlers (and reduce the diffs in cost estimation depending on whether coming from a ShuffleVectorInst or the raw operands / mask components)	2024-03-07 10:46:27 +00:00
Philip Reames	1a37147af5	[SCEV] Match both (-1)b + a and a + (-1)b as a - b (#84247 ) In our analysis of guarding conditions, we were converting a-b == 0 into a == b alternate form, but we were only checking for one of the two forms for the sub. There's no requirement that the multiply only be on the LHS of the add.	2024-03-06 15:57:34 -08:00
Philip Reames	5cd45e442e	[SCEV] Precommit test for widened signed induction variables These tests highlight that we have missed opportunities to prove trip count bounds when our start/end values are sign extended from smaller types and we have either a loop guard to relate our start vs end, or an nsw/nuw fact to bound end.	2024-03-06 14:09:40 -08:00
Philip Reames	0d38f21e4a	[SCEV] Extend type hint in analysis output to all backedge kinds This extends the work from `7755c26` to all of the different backedge taken count kinds that we print for the scev analysis printer. As before, the goal is to cut down on confusion as i4 -1 is a very different (unsigned) value from i32 -1.	2024-03-06 13:08:05 -08:00
Philip Reames	e946b5a87b	[SCEV] Autogenerate more scev analysis check tests	2024-03-06 12:42:19 -08:00
Philip Reames	8b5b294ec2	[SCEV] Print predicate backedge count only if new information available When printing the result of SCEV's analysis, we can avoid printing the predicated backedge taken count and the predicates if the predicates are empty and no new information is provided. This helps to reduce the verbosity of the output.	2024-03-06 10:24:32 -08:00
Philip Reames	7755c26195	[SCEV] Include type when printing constant max backedge taken count When printing the result of the analysis, i8 -1 and i64 -1 are quite different in terms of analysis quality. In a recent conversation with a new contributor, we ran into exactly this confusion. Adding the type for constant scevs more globally seems worthwhile, but introduces a much larger test diff. I'm splitting this off first since it addresses the immediate need, and then going to do some further changes to clarify a few related bits of analysis result output.	2024-03-06 08:48:25 -08:00
Philip Reames	987fe6fa50	[SCEV] Migrate a couple tests to be auto generated A few notes: * pr34538.ll has bitrotten. The original test printed the analysis after transforms in some cases, but this appears to have been lost during migration to the new pass manager. Remove the now redundant pass invocations and simplify the test setup.	2024-03-05 18:04:30 -08:00
Philip Reames	31c304ba7b	[SCEV] Migrate some tests to be autogenerated In advance of a change which needs to update these. This batch was the "easy" ones; I'll be landing the harder set a few at a time for easier review.	2024-03-05 17:41:58 -08:00
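
Note on the SCEV printing entries (7755c26195, 0d38f21e4a): the messages argue that a constant printed as -1 is ambiguous without its type, because the unsigned value it stands for depends on the bit width. The following is a minimal sketch of that arithmetic in plain C++. The helper name allOnesAsUnsigned is hypothetical and is not an LLVM API; widths are assumed to be between 1 and 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of LLVM: returns the unsigned value of the
// W-bit constant whose signed reading is -1 (i.e. all W bits set).
// Assumes 1 <= width <= 64.
std::uint64_t allOnesAsUnsigned(unsigned width) {
    assert(width >= 1 && width <= 64);
    // Shifting a 64-bit value by 64 is undefined, so handle that case apart.
    if (width == 64)
        return ~std::uint64_t{0};
    return (std::uint64_t{1} << width) - 1;
}
```

Under these assumptions, allOnesAsUnsigned(4) is 15 while allOnesAsUnsigned(32) is 4294967295, which is why a max backedge taken count of "i4 -1" describes a far tighter bound than "i32 -1".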

4308 Commits