clang-p2996

Author	SHA1	Message	Date
Alexey Bataev	7439e1b2de	[SLP]Fix incorrect reordering of clustered scalars. The new mask represents the order, not the mask itself. At first, need to treat as the order, convert to mask and only after that reorder gathered scalars to build correct clustered order. Differential Revision: https://reviews.llvm.org/D141161	2023-01-06 16:04:09 -08:00
Alexey Bataev	9b5f62685a	[SLP]Fix cost of the broadcast buildvector/gather. Need to include the cost of the initial insertelement to the cost of the broadcasts. Also, need to adjust the cost of the gather/buildvector if the element is inserted into poison/undef vector. Differential Revision: https://reviews.llvm.org/D140498	2023-01-06 09:25:05 -08:00
Valery N Dmitriev	6d677c0b3d	[SLP] Unify GEP cost modeling for load, store and GEP nodes. Make a separate routine for GEPs cost calculation and make the approach uniform across load, store and GEP tree nodes. Additional issue fixed is GEP cost savings were applied twice for ScatterVectorize nodes (aka gather load) making them look unrealistically profitable for vectorization. Differential Revision: https://reviews.llvm.org/D140789	2023-01-05 10:11:36 -08:00
Nikita Popov	b061159e79	[SLPVectorizer] Convert test to opaque pointers (NFC)	2023-01-05 12:32:44 +01:00
Alexey Bataev	a1b18946f9	[SLP]Fix incorrect shuffle results because of missing shuffle mask analysis. Missed the analysis of the shuffle mask when trying to analyze the operands of the shuffle instruction during peeking through shuffle instructions.	2023-01-04 13:10:40 -08:00
Alexey Bataev	352b660c1b	[SLP][NFC]Add a pass.	2023-01-04 10:30:48 -08:00
Alexey Bataev	53a858f7fc	[SLP][NFC]Add a test for incorrect skipping of shuffle instruction at peek-through-shuffles, NFC.	2023-01-04 10:17:03 -08:00
Nikita Popov	51ba34708d	[SLPVectorizer] Convert test to opaque pointers (NFC)	2023-01-04 16:39:51 +01:00
Nikita Popov	8383da1583	[SLPVectorizer] Name instructions in test (NFC)	2023-01-04 16:35:45 +01:00
Nikita Popov	a34ae06c20	[SLPVectorizer] Convert some tests to opaque pointers (NFC)	2023-01-04 16:34:39 +01:00
Valery N Dmitriev	6bb4b2d002	[NFC] Test case intended to cover SLP cost for chain with masked gather loads. SLP produces two gather loads (one feeds another). For the first set of scalar loads GEP indices are all constant. The result of the second load is then fed into reduction (as a seed). Differential Revision: https://reviews.llvm.org/D140785	2022-12-30 12:27:34 -08:00
Alexey Bataev	5dccea5a68	[SLP]Do not emit many extractelements, reuse the single one emitted. We do not need to emit many extractelements for each particular use, we can reuse the only one, just need to adjust it to make it dominate on all uses. Differential Revision: https://reviews.llvm.org/D140580	2022-12-30 06:38:06 -08:00
Alexey Bataev	ac01ae71f0	[SLP]Use ShuffleInstructionBuilder for vector shrinking. We can use ShuffleInstructionBuilder now for shrinking shuffle emission. It allows to remove extra shuffle from the emitted code and reuse original vector. Part of D110978 Differential Revision: https://reviews.llvm.org/D140499	2022-12-28 06:09:04 -08:00
Alexey Bataev	a9b052e2ef	[SLP]Fix PR59693: Do not crash trying to set insert point for buildvector of extractvalues. No need to get the last instruction only for vectorized extractvalues, for gathered(buildvector sequence) still need to get the insertion point.	2022-12-27 06:01:38 -08:00
Nikita Popov	580210a0c9	[SLP] Convert some tests to opaque pointers (NFC)	2022-12-23 10:02:57 +01:00
Alexey Bataev	2e972ea056	[SLP]Integrate looking through shuffles logic into ShuffleInstructionBuilder. Added BaseShuffleAnalysis as a base class for ShuffleInstructionBuilder and integrated shuffle logic from shuffles for externally used scalars into this class. This class is used as the main container that implements smart shuffle instruction builder logic. ShuffleInstructionBuilder uses this logic. ShuffleInstructionBuilder is also used in building of the shuffle for the externally used scalars instead of lambdas, which are now part of BaseShuffleAnalysis class. Differential Revision: https://reviews.llvm.org/D140100	2022-12-21 06:12:53 -08:00
Simon Pilgrim	90b02f6c63	[SLP][X86] slp-fma-loss.ll - add various targets with different FMA abilities Add targets with FMA3, FMA4 and no-FMA support Should help with D132872 testing	2022-12-09 11:46:06 +00:00
Bjorn Pettersson	3528e63d89	[test] Remove duplicate RUN lines in Transform tests	2022-12-08 11:47:16 +01:00
Roman Lebedev	59ffac7dd2	[NFC] Port all SLPVectorizer tests to `-passes=` syntax	2022-12-08 02:38:50 +03:00
Roman Lebedev	6697140ba1	[NFC] Port all SLPVectorizer tests to `-passes=` syntax	2022-12-07 21:44:09 +03:00
Alexey Bataev	0cc15050a4	[SLP]Fix PR59230: Use actual vector factor when sorting entries. When we sort entries for attempting to reorder scalars, need to use actual vectorization factor, not the number of scalars. Otherwise the compiler crashes, if the scalars have to be reordered. Differential Revision: https://reviews.llvm.org/D138819	2022-11-29 06:46:06 -08:00
Qiongsi Wu	f946c70130	[SLPVectorizer] Do Not Move Loads/Stores Beyond Stacksave/Stackrestore Boundaries If left unchecked, the SLPVectorizer can move loads/stores below a stackrestore. The move can cause issues if the loads/stores have pointer operands from `alloca`s that are reset by the stackrestores. This patch adds the dependency check. The check is conservative, in that it does not check if the pointer operands of the loads/stores are actually from `alloca`s that may be reset. We did not observe any SPECCPU2017 performance degradation so this simple fix seems sufficient. The test could have been added to `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll`, but that test has not been updated to use opaque pointers. I am not inclined to add tests that still use typed pointers, or to refactor `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll` to use opaque pointers in this patch. If desired, I will open a different patch to refactor and consolidate the tests. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D138585	2022-11-28 10:00:29 -05:00
Alexey Bataev	ac93b61165	[SLP]Fix PR59098: check if the vector type is scalarized for extractelements. If the resulting type is going to be scalarized, no need to adjust the cost of removed extractelement and insert/extract subvector costs. Otherwise, the compiler can crash because of the wrong type sizes.	2022-11-21 10:26:01 -08:00
Alexey Bataev	07015e12f0	[SLP]Fix PR59053: trying to erase instruction with users. Need to count the reduced values, vectorized in the tree but not in the top node. Such scalars still must be extracted out of the vector node instead of the original scalar.	2022-11-17 17:23:48 -08:00
Alexey Bataev	9f9fdab9f1	[SLP]Fix PR58766: deleted value used after vectorization. If same instruction is reduced several times, but in one graph is part of buildvector sequence and in another it is vectorized, we may lose information that it was part of buildvector and must be extracted from later vectorized value.	2022-11-16 10:57:03 -08:00
Alexey Bataev	0a33ceee01	[SLP]Fix a crash on analysis of the vectorized node. Need to use advanced check for the same vectorized node to avoid possible compiler crash. We may have 2 similar nodes (vector one and gather) after graph nodes rotation, need to do extra checks for the exact match.	2022-11-15 13:40:28 -08:00
Roman Lebedev	8e37b53360	[X86] Rewrite `getScalarizationOverhead()` All of our insert/extract ops work on 128-bit lanes. For `Insert`, we need to extract affected 128-bit lane, unless it's being fully overwritten (FIXME: do we need to be careful about legalization-induced padding that we obviously don't demand?), perform insertions, and then insert the 128-bit lane back. But hold on. If we are operating on a 256-bit legal vector, and thus have two 128-bit subvectors, and are fully overwriting them both, we don't actually need to insert both subvectors, only the second one, into the implicitly-widened first one. Also, `Insert` wasn't actually querying the costs, but just assuming them to be `1`. `getShuffleCost(TTI::SK_ExtractSubvector)` notes: ``` // Note that in general, the insertion starting at the beginning of a vector // isn't free, because we need to preserve the rest of the wide vector. ``` ... so as far as I can tell, we didn't account for that. I was hoping this would allow vectorization at a higher VF in one case I looked at, but the subvector insertion cost is still dis-advising that. The change for `Extract` is NFC, and is for consistency only, I wanted to get rid of that weird explicit discounting of insertion of 0'th element, since the general code should already deal with that. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D137913	2022-11-15 21:07:12 +03:00
Alexey Bataev	b505fd559d	[SLP]Redesign vectorization of the gather nodes. Gather nodes are vectorized as simply vector of the scalars instead of relying on the actual node. It leads to the fact that in some cases we may miss incorrect transformation (non-matching set of scalars is just ended as a gather node instead of possible vector/gather node). Better to rely on the actual nodes, it allows to improve stability and better detect missed cases. Differential Revision: https://reviews.llvm.org/D135174	2022-11-10 10:59:54 -08:00
Alexey Bataev	563d03d65e	[SLP][NFC]Add a test for vectorization with scheduling blocks order different than the instruction order, NFC.	2022-11-10 10:12:51 -08:00
Alexey Bataev	ecd0b5a532	Revert "[SLP]Redesign vectorization of the gather nodes." This reverts commit `8ddd1ccdf8` to fix buildbots failures reported in https://lab.llvm.org/buildbot#builders/74/builds/14839	2022-11-07 08:35:21 -08:00
Alexey Bataev	8ddd1ccdf8	[SLP]Redesign vectorization of the gather nodes. Gather nodes are vectorized as simply vector of the scalars instead of relying on the actual node. It leads to the fact that in some cases we may miss incorrect transformation (non-matching set of scalars is just ended as a gather node instead of possible vector/gather node). Better to rely on the actual nodes, it allows to improve stability and better detect missed cases. Differential Revision: https://reviews.llvm.org/D135174	2022-11-07 07:04:38 -08:00
Alexey Bataev	2ec51f1c75	[SLP]Improve analysis of same/alternate code ops and scheduling. Should improve compile time for analysis and vectorization. Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 6380.00 6378.00 -0.0% test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 6380.00 6378.00 -0.0% test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 2023.00 2022.00 -0.0% test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test 148.00 146.00 -1.4% Generated more vector instructions. Differential Revision: https://reviews.llvm.org/D127531	2022-10-27 16:29:16 -07:00
Alexey Bataev	8ce0c7b1c9	Revert "[SLP]Improve analysis of same/alternate code ops and scheduling." This reverts commit `dad64448c6` to fix a crash in https://lab.llvm.org/buildbot/#/builders/74/builds/14584	2022-10-27 15:21:35 -07:00
Alexey Bataev	dad64448c6	[SLP]Improve analysis of same/alternate code ops and scheduling. Should improve compile time for analysis and vectorization. Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 6380.00 6378.00 -0.0% test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 6380.00 6378.00 -0.0% test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test 2023.00 2022.00 -0.0% test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test 148.00 146.00 -1.4% Generated more vector instructions. Differential Revision: https://reviews.llvm.org/D127531	2022-10-27 11:31:18 -07:00
Alexey Bataev	456951dcd3	[SLP][NFC]Add a test for possible reordering gap in SLP, NFC.	2022-10-19 08:22:07 -07:00
Alexey Bataev	087dadfd37	[SLP]Generalize cost model. Generalized the cost model estimation. Improved cost model estimation for repeated scalars (no need to count their cost anymore), improved cost model for extractelement instructions. cpu2017 511.povray_r 0.57 520.omnetpp_r -0.98 521.wrf_r -0.01 525.x264_r 3.59 <+ 526.blender_r -0.12 531.deepsjeng_r -0.07 538.imagick_r -1.42 Geometric mean: 0.21 Differential Revision: https://reviews.llvm.org/D115757	2022-10-18 11:55:59 -07:00
Alexey Bataev	62267e8de0	Revert "[SLP]Generalize cost model." This reverts commit `f12fb91188` and `f5c747bfbe` to fix detected non-initialized var use.	2022-10-18 11:25:59 -07:00
Arthur Eubanks	df92b05f1b	[test] Remove redundant -passes flags	2022-10-18 09:57:06 -07:00
Alexey Bataev	f12fb91188	[SLP]Generalize cost model. Generalized the cost model estimation. Improved cost model estimation for repeated scalars (no need to count their cost anymore), improved cost model for extractelement instructions. cpu2017 511.povray_r 0.57 520.omnetpp_r -0.98 521.wrf_r -0.01 525.x264_r 3.59 <+ 526.blender_r -0.12 531.deepsjeng_r -0.07 538.imagick_r -1.42 Geometric mean: 0.21 Differential Revision: https://reviews.llvm.org/D115757	2022-10-18 08:49:32 -07:00
Alexey Bataev	c787986cdd	[SLP]Improve costs of vectorized loads/stores by analyzing GEPs. When generating masked gathers nodes, SLP vectorizer accounts the cost of the GEPs for loads as part of the scalar-vector transformation cost estimation. But it does not do it for vectorized loads/stores, while it may completely remove some of the GEPs completely. Because of this in some cases masked gather operation can be much more profitable rather than regular vectorization (masked-gather cost + vector GEP - scalar loads + GEPs comparing to vectorized loads - scalar loads). Added the analysis of the removed scalarGEPs for vectorized load/store nodes for better cost estimation. Differential Revision: https://reviews.llvm.org/D135282	2022-10-13 07:20:41 -07:00
Bjorn Pettersson	3be72f4029	[test][SLPVectorizer] Use -passes syntax in RUN lines. NFC	2022-10-13 10:44:38 +02:00
Alexey Bataev	d71ad41080	[SLP]Fix insertpoint of the extractelement instructions to avoid reshuffle crash. Need to set the insertpoint for extractelement to point to the first instruction in the node to avoid possible crash during external uses combine process. Without it we may end up with the incorrect transformation. Differential Revision: https://reviews.llvm.org/D135591	2022-10-12 08:18:30 -07:00
Alexey Bataev	1be3428ea0	[SLP]Fix PR58177: Improve isUndefVector function to avoid extra freeze. Freeze instruction in some cases makes codegen worse, so need to be very careful when emitting it. Instead improve analysis in isUndefVector function to generate mask of unused elements and use it in the analysis. Differential Revision: https://reviews.llvm.org/D135382	2022-10-12 07:32:54 -07:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Alexey Bataev	323ed2308a	[SLP]Improve/fix CSE analysis of the blocks/instructions. Added analysis for invariant extractelement instructions and improved detection of the CSE blocks for generated extractelement instructions. Differential Revision: https://reviews.llvm.org/D135279	2022-10-06 12:08:48 -07:00
Alexey Bataev	d7c85d7e34	[SLP][NFC]Add a test for CSE for extractelements.	2022-10-05 07:55:25 -07:00
Alexey Bataev	ab9a81f736	[SLP]Try to emit canonical shuffle with undef operand. In the canonical form of the shuffle the poison/undef operand is the second operand, the patch tries to emit canonical form for partial vectorization of the buildvector sequence. Also, this patch starts emitting freeze instruction for shuffles with undef indices if the second shuffle operand is undef, not poison. It is an initial step to D93818, where undef mask elements are treated as returning poison value. Differential Revision: https://reviews.llvm.org/D134377	2022-10-04 08:16:07 -07:00
Simon Pilgrim	5fc7bbfaa3	[SLP][X86] Add test coverage for Issue #58054	2022-09-30 13:26:31 +01:00
Simon Pilgrim	5849fcb635	Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions" Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI." Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds" Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.	2022-09-30 11:22:48 +01:00
Simon Pilgrim	19782a46f8	[SLP][X86] Add test case for crash reported on D134605	2022-09-30 11:07:54 +01:00


1086 Commits