clang-p2996

Author	SHA1	Message	Date
Fangrui Song	21c4dc7997	std::optional::value => operator*/operator-> value() has undesired exception checking semantics and calls __throw_bad_optional_access in libc++. Moreover, the API is unavailable without _LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). This fixes clang.	2022-12-17 00:42:05 +00:00
Florian Hahn	08f16a8217	[VPlan] Use macro to define recipe classof implementation (NFC). Add a VP_CLASSOF_IMPL macro to define common classof implementations for recipes. This reduces duplication and also adds missing implementations to existing recipes.	2022-12-16 17:52:15 +00:00
Kazu Hirata	6eb0b0a045	Don't include Optional.h These files no longer use llvm::Optional.	2022-12-14 21:16:22 -08:00
Florian Hahn	e898479f2b	[VPlan] Sink non-uniform recipes for scalar plans. In scalar plans, replicate recipes will only generate a single value per UF, independent of whether they are uniform or not. So don't consider uniformity for plans with scalar VFs only. This allows us to handle a few additional cases in VPlan sinking instead of non-VPlan sinkScalarOperands. Depends on D133762. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D134218	2022-12-14 17:55:31 +00:00
Fangrui Song	d4b6fcb32e	[Analysis] llvm::Optional => std::optional	2022-12-14 07:32:24 +00:00
Alexey Bataev	ecac8192db	[SLP][NFC]Initial redesign of ShuffleInstructionBuilder, NFC. The patch redesigns ShuffleInstructionBuilder so it could later be used for reshuffling of the buildvector sequences and vectorized parts of externally used scalars. Also will allow to generalize cost model for the gathers/buildvectors. Part of D110978. Differential Revision: https://reviews.llvm.org/D139718	2022-12-13 09:37:18 -08:00
Fangrui Song	1ec11d2d48	[Transforms/Vectorize] llvm::Optional => std::optional	2022-12-12 08:56:35 +00:00
Fangrui Song	c178ed33bd	Transforms/Utils: llvm::Optional => std::optional	2022-12-12 08:29:05 +00:00
Florian Hahn	29e8de5de1	[VPlan] Summarize recipes used to model inductions (NFC). Document recipes used to model inductions after introducing VPDerivedIVRecipe in `0c5df7cd2f`. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D138748	2022-12-11 16:28:43 +00:00
Kazu Hirata	f7dffc28b3	Don't include None.h (NFC) I've converted all known uses of None to std::nullopt, so we no longer need to include None.h. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-10 11:24:26 -08:00
Philip Reames	b0f904b6da	[LV] Account for minimum vscale when rejecting scalable vectorization of short loops The vectorizer has code to reject scalable vectorization of loops with very short trip counts, and instead use fixed length vectors. The current code doesn't account for the minimum vscale value known, and thus underestimates the number of lanes in the scalable type for RISCV's default configuration. This results in use of predication and a trivially dead loop where a single straight line piece of code would suffice. Note that the code quality of the original scalable vectorization could (and probably should) be improved in other ways as well. This patch is solely about whether the scalable vectorization was the right choice to begin with. This bit of code - both with and without my change - does make the unchecked assumption that the target knows how to lower fixed length vectors whose length is provably less than the vector length. Differential Revision: https://reviews.llvm.org/D137285	2022-12-09 11:29:41 -08:00
Alexey Bataev	f4c6d7b813	[SLP][NFC]prepare isUndefVector function to be used for differently sized vectors as shuffle masks, NFC. Use use-mask instead of actual mask to speed up the process and make it possible to use for the cases where the mask is used for vector resizing.	2022-12-09 04:14:16 -08:00
Kazu Hirata	1f421b6d7e	[llvm] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-06 22:45:17 -08:00
Kazu Hirata	595f1a6aaf	[llvm] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 19:47:13 -08:00
Kazu Hirata	9f252e5567	[llvm] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 17:31:17 -08:00
Kazu Hirata	3c09ed006a	[llvm] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 17:12:44 -08:00
Florian Hahn	37809c867a	[VPlan] Support sinking VPScalarIVStepsRecipe. This patch extends VP-based sinking to also sink VPScalarIVStepsRecipe. This takes us a step closer towards retiring the IR based sinking. The main change is extending VPScalarIVStepsRecipe::execute to support executing in a replicate-region. Depends on D133758. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133760	2022-12-04 22:59:17 +00:00
Florian Hahn	3c5f07349f	[VPlan] Mark VPScalarIVStepsRecipe as not reading/writing memory. The recipe only computes the induction steps using its operands. It neither reads nor writes memory. Split off from D133760.	2022-12-04 12:58:46 +00:00
Kazu Hirata	343de6856e	[Transforms] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 21:11:37 -08:00
Krzysztof Parzyszek	26424c96c0	Attributes: convert Optional to std::optional	2022-12-02 08:15:45 -06:00
Florian Hahn	0c5df7cd2f	Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit `bf15f1e489`. The updated version fixes a crash by checking the induction kind instead of the opcode; for integer inductions, the step is always added, but the opcode might not be set.	2022-11-30 17:04:20 +00:00
Alexey Bataev	0cc15050a4	[SLP]Fix PR59230: Use actual vector factor when sorting entries. When we sort entries for attempting to reorder scalars, need to use actual vectorization factor, not the number of scalars. Otherwise the compiler crashes, if the scalars have to be reordered. Differential Revision: https://reviews.llvm.org/D138819	2022-11-29 06:46:06 -08:00
Stanislav Mekhanoshin	c46634554d	[LoadStoreVectorizer] Consider if operation is faster than before Compare a relative speed of misaligned accesses before and after vectorization, not just check the new instruction is not going to be slower. Since no target now returns anything but 0 or 1 for Fast argument of the allowsMisalignedMemoryAccesses this is still NFCI. The subsequent patch will tune actual values of Fast on AMDGPU. Differential Revision: https://reviews.llvm.org/D124218	2022-11-28 15:52:32 -08:00
Florian Hahn	bf15f1e489	Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe." This reverts commit `0fa666eced`. This triggers an assertion during AArch64 stage2 builds. Revert while I investigate. See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio	2022-11-28 22:43:11 +00:00
Florian Hahn	0fa666eced	[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe. This patch splits off the logic to transform the canonical IV to a value for an induction with a different start and step. This transformation only needs to be done once (independent of VF/UF) and enables sinking of VPScalarIVStepsRecipe as follow-up. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D133758	2022-11-28 16:32:31 +00:00
Qiongsi Wu	f946c70130	[SLPVectorizer] Do Not Move Loads/Stores Beyond Stacksave/Stackrestore Boundaries If left unchecked, the SLPVectorizer can move loads/stores below a stackrestore. The move can cause issues if the loads/stores have pointer operands from `alloca`s that are reset by the stackrestores. This patch adds the dependency check. The check is conservative, in that it does not check if the pointer operands of the loads/stores are actually from `alloca`s that may be reset. We did not observe any SPECCPU2017 performance degradation so this simple fix seems sufficient. The test could have been added to `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll`, but that test has not been updated to use opaque pointers. I am not inclined to add tests that still use typed pointers, or to refactor `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll` to use opaque pointers in this patch. If desired, I will open a different patch to refactor and consolidate the tests. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D138585	2022-11-28 10:00:29 -05:00
Kazu Hirata	41c638875e	[Vectorize] Use std::optional in VPlanSLP.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-26 18:11:32 -08:00
Kazu Hirata	5fc8f6c37c	[Vectorize] Use std::optional in SLPVectorizer.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-26 18:03:49 -08:00
Florian Hahn	bf0bd85f9d	[LV] Move trunc codegen to buildScalarSteps (NFCI). This moves the code to truncate step and IV into buildScalarSteps, closer to the place where they are actually used. Suggested in D133758.	2022-11-26 23:48:46 +00:00
Florian Hahn	12bb5535d2	[VPlan] Move cast codegen to emitTransformedIndex (NFCI). This reduces duplication a bit. Suggested as simplification in D133758.	2022-11-26 22:47:13 +00:00
Florian Hahn	ed2fdace89	[LV] Use separate index to access StoredValues in vectorizeInterleave. StoredValues only has entries for members of the interleave group. If there are gaps, then using the index i here will either access a wrong entry or be out-of-bounds. Instead use a dedicated index that only gets incremented for members of the interleave group. Fixes #59090.	2022-11-25 15:28:05 +00:00
Fangrui Song	fa36d72305	[LoopVectorize] Internalize some cl::opt	2022-11-23 23:03:02 -08:00
Matt Devereau	ee4d6c8bf0	[VectorCombine] Enable scalarizeBinopOrCmp for scalable vectors This reverts a change to exclude scalarizeBinopOrCmp in VectorCombine for scalable vectors which caused poor scalable Binop codegen. Differential Revision: https://reviews.llvm.org/D138545	2022-11-23 13:17:21 +00:00
Benjamin Kramer	f116107f2d	[VectorCombine] Don't touch instruction after foldSingleElementStore, it might be deleted Use after free found by asan.	2022-11-22 21:12:42 +01:00
Sanjay Patel	ede6d608f4	[VectorCombine] switch on opcode to compile faster This follows `87debdadaf` to further eliminate wasting time calling helper functions only to early return to the main run loop. Once again, this results in significant savings based on experimental data: https://llvm-compile-time-tracker.com/compare.php?from=01023bfcd33f922ed8c934ce563e54abe8bfe246&to=3dce4f70b73e48ccb045decb634c185e6b4c67d5&stat=instructions:u This is NFCI other than making the pass faster. The total cost of VectorCombine runs in an -O3 build appears to be well under 0.1% of compile-time now, so there's not much left to do AFAICT. There's a TODO about making the code cleaner, but it probably doesn't change timing much. I didn't include those changes here because it requires updating much more code.	2022-11-22 10:23:32 -05:00
Sanjay Patel	163bb6d64e	[Passes][VectorCombine] enable early run generally and try load folds An early run of VectorCombine was added with D102496 specifically to deal with unnecessary vector ops produced with the C matrix extension. This patch is proposing to try those folds in general and add a pair of load folds to the menu. The load transform will partly solve (see PhaseOrdering diffs) a longstanding vectorization perf bug by removing redundant loads via GVN: issue #17113 The main reason for not enabling the extra pass generally in the initial patch was compile-time cost. The cost of VectorCombine was significantly (surprisingly) improved with: `87debdadaf` https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u ...so the extra run is going to cost very little now - the total cost of the 2 runs should be less than the 1 run before that micro-optimization: https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u It may be possible to reduce the cost slightly more with a few more earlier-exits like that, but it's probably in the noise based on timing experiments. Differential Revision: https://reviews.llvm.org/D138353	2022-11-21 13:57:55 -05:00
Sanjay Patel	8f337f8ffe	[VectorCombine] generalize pass param name for early combines; NFC The option was added with https://reviews.llvm.org/D102496, and currently the name is accurate, but I am hoping to add a load transform that is not a scalarization. See issue #17113.	2022-11-21 13:57:55 -05:00
Alexey Bataev	ac93b61165	[SLP]Fix PR59098: check if the vector type is scalarized for extractelements. If the resulting type is going to be scalarized, no need to adjust the cost of removed extractelement and insert/extract subvector costs. Otherwise, the compiler can crash because of the wrong type sizes.	2022-11-21 10:26:01 -08:00
Bjorn Pettersson	1c308d6641	[LV] Clean up LoopVectorizationCostModel::calculateRegisterUsage. NFC Minor refactoring in LoopVectorizationCostModel::calculateRegisterUsage. Also adding some FIXMEs related to what appear to be shortcomings in how the register usage is calculated. Differential Revision: https://reviews.llvm.org/D138342	2022-11-20 20:52:13 +01:00
Sanjay Patel	87debdadaf	[VectorCombine] check instruction type before dispatching to folds No externally visible change is intended, but this appears to be a noticeable (surprising) improvement in compile-time based on: https://llvm-compile-time-tracker.com/compare.php?from=0f3e72e86c8c7c6bf0ec24bf1e2acd74b4123e7b&to=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&stat=instructions:u The early returns in the individual fold functions are not good enough to avoid the overhead of the many "fold*" calls, so this speeds up the main instruction loop enough to make a difference.	2022-11-18 16:03:18 -05:00
Alexey Bataev	07015e12f0	[SLP]Fix PR59053: trying to erase instruction with users. Need to count the reduced values, vectorized in the tree but not in the top node. Such scalars still must be extracted out of the vector node instead of the original scalar.	2022-11-17 17:23:48 -08:00
Stanislav Mekhanoshin	bcaf31ec3f	[AMDGPU] Allow finer grain control of an unaligned access speed A target can return if a misaligned access is 'fast' as defined by the target or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed. A target can still define it as it wants and the direct translation of the current code uses 0 and 1 for current false and true. This makes the change an NFC. A subsequent patch will start using an actual value of speed in the load/store vectorizer to check whether a vectorized access is going to be not just fast, but not slower than before. Differential Revision: https://reviews.llvm.org/D124217	2022-11-17 09:23:53 -08:00
Florian Hahn	55f56cdc33	[VPlan] Introduce VPValue::hasDefiningRecipe helper (NFC). This clarifies the intention of code that uses the helper. Suggested by @Ayal during review of D136068, thanks!	2022-11-16 23:12:40 +00:00
Florian Hahn	aa16689f82	[VPlan] Use recipe type to avoid getDefiningRecipe call (NFC). Suggested by @Ayal during review of D136068, thanks!	2022-11-16 23:03:34 +00:00
Florian Hahn	239b52d4b6	[VPlan] Update stale comment (NFC). Update comment to reflect current code, which also allows for VPScalarIVStepsRecipes to be uniform. Suggested by @Ayal during review of D136068, thanks!	2022-11-16 22:39:50 +00:00
Florian Hahn	bcc9c5d959	[LV] Replace unnecessary cast_or_null with cast (NFC). The existing code already unconditionally dereferences RepR, so cast_or_null can be replaced by just cast. Suggested by @Ayal during review of D136068, thanks!	2022-11-16 22:31:59 +00:00
Florian Hahn	32f1c5531b	[VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name(NFC) The return value of getDef is guaranteed to be a VPRecipeBase and all users can also accept a VPRecipeBase *. Most users actually cast to VPRecipeBase or a specific recipe before using it, so this change removes a number of redundant casts. Also rename it to getDefiningRecipe to make the name a bit clearer. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D136068	2022-11-16 22:12:08 +00:00
Alexey Bataev	9f9fdab9f1	[SLP]Fix PR58766: deleted value used after vectorization. If same instruction is reduced several times, but in one graph is part of buildvector sequence and in another it is vectorized, we may lose information that it was part of buildvector and must be extracted from later vectorized value.	2022-11-16 10:57:03 -08:00
Alexey Bataev	2f8f17c157	[SLP]Fix PR58956: fix insertpoint for reduced buildvector graphs. If the graph is only the buildvector node without main operation, need to inherit insertpoint from the reduction instruction. Otherwise the compiler crashes trying to insert instruction at the entry block.	2022-11-16 07:38:49 -08:00
Alexey Bataev	0a33ceee01	[SLP]Fix a crash on analysis of the vectorized node. Need to use advanced check for the same vectorized node to avoid possible compiler crash. We may have 2 similar nodes (vector one and gather) after graph nodes rotation, need to do extra checks for the exact match.	2022-11-15 13:40:28 -08:00

3493 Commits