Commit Graph

1618 Commits

Andrzej Warzyński
39ad84e4d1 [mlir][linalg] Split GenericPadOpVectorizationPattern into two patterns (#111349)
At the moment, `GenericPadOpVectorizationPattern` implements two
orthogonal transformations:
  1. Rewrites `tensor::PadOp` into a sequence of `tensor::EmptyOp`,
    `linalg::FillOp` and `tensor::InsertSliceOp`.
  2. Vectorizes (where possible) `tensor::InsertSliceOp` (see
    `tryVectorizeCopy`).
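
For illustration, here's a minimal sketch of the first rewrite (shapes
and the pad value are made up for this example):

```mlir
// Input: pad a 2x3 tensor to 4x5 with a constant value %pad_val.
%padded = tensor.pad %src low[0, 0] high[2, 2] {
^bb0(%i: index, %j: index):
  tensor.yield %pad_val : f32
} : tensor<2x3xf32> to tensor<4x5xf32>

// After the rewrite: create the destination, fill it with the pad
// value, then insert the original tensor.
%empty = tensor.empty() : tensor<4x5xf32>
%filled = linalg.fill ins(%pad_val : f32)
    outs(%empty : tensor<4x5xf32>) -> tensor<4x5xf32>
%res = tensor.insert_slice %src into %filled[0, 0] [2, 3] [1, 1]
    : tensor<2x3xf32> into tensor<4x5xf32>
```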

This patch splits `GenericPadOpVectorizationPattern` into two separate
patterns:
  1. `GeneralizePadOpPattern` for the first transformation (note that
    currently `GenericPadOpVectorizationPattern` inherits from
    `GeneralizePadOpPattern`).
  2. `InsertSliceVectorizePattern` to vectorize `tensor::InsertSliceOp`.

With this change, we gain the following:
  * a clear separation between pre-processing and vectorization
    transformations/stages,
  * a path to support masked vectorisation for `tensor.insert_slice`
    (with a dedicated pattern for vectorization, it is much easier to
    specify the input vector sizes used in masking),
  * more opportunities to vectorize `tensor.insert_slice` (see the sketch
    below).
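
For instance, a standalone `tensor.insert_slice` (static shapes; offsets
and values illustrative) can now be vectorized roughly as:

```mlir
// Input:
%r = tensor.insert_slice %src into %dest[0, 0] [2, 3] [1, 1]
    : tensor<2x3xf32> into tensor<4x5xf32>

// Vectorized form: read the source as a vector, then write it into the
// destination at the slice offsets.
%v = vector.transfer_read %src[%c0, %c0], %pad {in_bounds = [true, true]}
    : tensor<2x3xf32>, vector<2x3xf32>
%r2 = vector.transfer_write %v, %dest[%c0, %c0] {in_bounds = [true, true]}
    : vector<2x3xf32>, tensor<4x5xf32>
```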

Note for downstream users:
--------------------------

If you were using `populatePadOpVectorizationPatterns`, following this
change you will also have to add
`populateInsertSliceVectorizationPatterns`.

Finer implementation details:
-----------------------------

1.  The majority of changes in this patch are copy & paste + some edits.
  1.1. The only functional change is that the vectorization of
    `tensor.insert_slice` is now broadly available (as opposed to being
    constrained to the pad vectorization pattern:
    `GenericPadOpVectorizationPattern`).
  1.2. Following-on from the above, `@pad_and_insert_slice_dest` is
    updated. As expected, the input `tensor.insert_slice` Op is no
    longer "preserved" and instead gets vectorized successfully.

2. The `linalg.fill` case in `getConstantPadVal` works under the
   assumption that only _scalar_ source values can be used. That's
   consistent with the definition of the Op, but it's not tested at the
   moment. Hence a test case in Linalg/invalid.mlir is added.

3. The behaviour of the two TD vectorization Ops,
   `transform.structured.vectorize_children_and_apply_patterns` and
   `transform.structured.vectorize` is preserved.
2024-10-29 16:57:23 +00:00
Andrzej Warzyński
ac4bd74190 [mlir] Add apply_patterns.linalg.pad_vectorization TD Op (#112504)
This PR simply wraps `populatePadOpVectorizationPatterns` into a new
Transform Dialect Op: `apply_patterns.linalg.pad_vectorization`.

This change makes it possible to run (and test) the corresponding
patterns _without_:

  `transform.structured.vectorize_children_and_apply_patterns`.

Note that the Op above only supports non-masked vectorisation (i.e. when
the inputs are static), so, effectively, it only supports fixed-width
vectorisation (as opposed to scalable vectorisation). As such, this
change is required to construct vectorization pipelines for tensor.pad
targeting scalable vectors.
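
For reference, a minimal sketch of how the new Op slots into a transform
script (matching on `func.func` is illustrative):

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%root: !transform.any_op {transform.readonly}) {
    %f = transform.structured.match ops{["func.func"]} in %root
        : (!transform.any_op) -> !transform.any_op
    // Run the tensor.pad vectorization patterns in isolation.
    transform.apply_patterns to %f {
      transform.apply_patterns.linalg.pad_vectorization
    } : !transform.any_op
    transform.yield
  }
}
```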

To test the new Op and the corresponding patterns, I added
"vectorization-pad-patterns.mlir" - most tests have been extracted from
"vectorization-with-patterns.mlir".
2024-10-25 10:39:26 -07:00
Max191
2bff9d9ffe [mlir] Don't hoist transfers from potentially zero trip loops (#112752)
The hoistRedundantVectorTransfers function does no verification of loop
bounds when hoisting vector transfers. This is not safe in general,
since the loop may have a zero trip count. This PR uses ValueBounds to
verify that the lower bound is less than the upper bound of the loop
before hoisting. Trip count verification is currently behind an option
`verifyNonZeroTrip`, which is false by default.

Zero trip count loops can arise in GPU code generation, where a loop
bound can be dependent on a thread id. If not all threads execute the
loop body, then hoisting out of the loop can cause these threads to
execute the transfers when they are not supposed to.
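
A minimal sketch of the hazard (the thread-id-dependent lower bound is
illustrative):

```mlir
// If %tid may be >= %ub, the loop body may never execute.
scf.for %i = %tid to %ub step %c1 {
  %v = vector.transfer_read %mem[%c0], %pad {in_bounds = [true]}
      : memref<8xf32>, vector<8xf32>
  %v2 = arith.addf %v, %v : vector<8xf32>
  vector.transfer_write %v2, %mem[%c0] {in_bounds = [true]}
      : vector<8xf32>, memref<8xf32>
}
// Hoisting the read/write pair out of the loop would make a zero-trip
// thread execute them anyway; with `verifyNonZeroTrip` the hoist only
// fires when %tid < %ub can be proven.
```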

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
2024-10-18 16:11:21 -04:00
Andrzej Warzyński
0a3347dc63 [mlir][linalg] Fix idx comparison in the vectorizer (#112900)
Fixes loop comparison condition in the vectorizer.

As that logic is used specifically for vectorising `tensor.extract`, I
also added a test that violates the assumptions made inside
`getTrailingNonUnitLoopDimIdx`, namely that Linalg loops are non-empty.
Vectorizer pre-conditions will capture that much earlier, making sure
that `getTrailingNonUnitLoopDimIdx` is only run when all the assumptions
are actually met.

Thank you for pointing this out, @pfusik !
2024-10-18 15:27:43 +01:00
Andrzej Warzyński
f7f51f2afb [mlir][vector] Clarify the semantics of masking maps (nfc) (#111383)
We use the term "masking map" throughout the Linalg vectorization logic,
but we don't really define what it is and how it differs from Linalg
indexing maps. This PR clarifies the differences, makes sure that the new
terminology is used consistently, and improves code re-use.
2024-10-18 08:58:58 +01:00
Alexander Pivovarov
a24c468782 [MLIR] Fix assert expressions (#112474)
I noticed that several assertions in the MLIR codebase have issues with
operator precedence.

The issue with operator precedence in these assertions is due to the way
logical operators are evaluated. The `&&` operator has higher precedence
than the `||` operator, which means the assertion is currently
evaluated incorrectly, like this:
```
assert((resType.getNumDynamicDims() == dynOutDims.size()) ||
       (dynOutDims.empty() && "Either none or all output dynamic dims must be specified!"));
```

We should add parentheses around the entire expression involving
`dynOutDims.empty()` to ensure that the logical conditions are grouped
correctly. Here’s the corrected version:
```
assert(((resType.getNumDynamicDims() == dynOutDims.size()) || dynOutDims.empty()) &&
       "Either none or all output dynamic dims must be specified!");
```
2024-10-16 15:22:29 -07:00
Andrzej Warzyński
a758bcdbd9 [mlir][td] Rename pack_paddings in structured.pad (#111036)
The `pack_paddings` attribute in the `structured.pad` TD Op is used to set
the `nofold` attribute in the generated `tensor.pad` Op. The current name
is confusing and suggests that there's a relation with the `tensor.pack`
Op. This patch renames it to `nofold_flags` to better match the actual
usage.
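
A hypothetical usage sketch after the rename (attribute values and
result handles are illustrative):

```mlir
%padded, %pad, %copy = transform.structured.pad %matmul {
  padding_values = [0.0 : f32, 0.0 : f32, 0.0 : f32],
  padding_dimensions = [0, 1, 2],
  nofold_flags = [1, 1, 1]  // previously: pack_paddings = [1, 1, 1]
} : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)
```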
2024-10-15 19:24:43 +01:00
Longsheng Mou
4b31568e02 [mlir][linalg] Bugfix for InlineScalarOperands (#111534)
This PR fixes a bug where `scalarOperand` is a simple scalar and should
be used directly, rather than accessed via `tensor.extract`. Fixes
#111243.
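
A sketch of the kind of input involved (types illustrative): the `ins`
operand below is a plain scalar with an empty indexing map, so inlining
should use `%s` directly instead of generating a `tensor.extract`:

```mlir
#scalar = affine_map<(d0) -> ()>
#id = affine_map<(d0) -> (d0)>
%0 = linalg.generic {indexing_maps = [#scalar, #id],
                     iterator_types = ["parallel"]}
    ins(%s : f32) outs(%init : tensor<4xf32>) {
^bb0(%in: f32, %out: f32):
  %r = arith.addf %in, %out : f32
  linalg.yield %r : f32
} -> tensor<4xf32>
```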
2024-10-14 15:38:35 +08:00
Javed Absar
c13f806f17 [mlir][linalg] raise generic to named ops. (#110421)
Add support for specializing `linalg.broadcast` and `linalg.transpose` from generic. Also does some refactoring to reuse specialization checks, migrating some common uses to op interface methods.
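
For example, a sketch of the `linalg.broadcast` raising (shapes
illustrative):

```mlir
// A generic that broadcasts %src along d1 ...
%0 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0)>,
                     affine_map<(d0, d1) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel"]}
    ins(%src : tensor<8xf32>) outs(%init : tensor<8x16xf32>) {
^bb0(%in: f32, %out: f32):
  linalg.yield %in : f32
} -> tensor<8x16xf32>

// ... is raised to the named op:
%1 = linalg.broadcast ins(%src : tensor<8xf32>)
    outs(%init : tensor<8x16xf32>) dimensions = [1]
```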
2024-10-11 15:27:27 +01:00
Emilio Cota
1276ce9e97 Revert "[mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783)"
This reverts commit 03483737a7 and
99c8557, which is a fix-up on top of the former.

I'm reverting because this commit broke two tests:
  mlir/test/python/integration/dialects/linalg/opsrun.py
  mlir/test/python/integration/dialects/transform.py
See https://lab.llvm.org/buildbot/#/builders/138/builds/4872

I'm not familiar with the tests, so I'm leaving it to the original author
to either remove or adapt the broken tests, as discussed here:
  https://github.com/llvm/llvm-project/pull/104783#issuecomment-2406390905
2024-10-11 05:22:56 -04:00
Dmitriy Smirnov
bb4696ce30 [mlir][linalg] Fix for bias handling for Winograd (#110331)
This PR makes the winograd.output_transform op a destination-style op and
fixes the handling of pre-existing data in its output argument (i.e. data
possibly pre-initialized with a bias, which was previously discarded).

---------

Signed-off-by: Dmitriy Smirnov <dmitriy.smirnov@arm.com>
2024-10-11 09:39:19 +01:00
Md Asghar Ahmad Shahid
03483737a7 [mlir][linalg] Introduce transpose semantic to 'linalg.matmul' ops. (#104783)
The main goal of this patch is to extend the semantics of the
'linalg.matmul' named op to include per-operand transpose semantics,
while also laying out a way to move op definitions from OpDSL to
tablegen. Hence, it is implemented in tablegen. The transpose semantics
are as follows.

By default, 'linalg.matmul' behavior will remain as is. Transpose
semantics can be applied per input operand by specifying the optional
permutation attributes (namely 'permutationA' for the 1st input and
'permutationB' for the 2nd input) explicitly as needed. By default, no
transpose is mandated for either input operand.

Example:
```mlir
%val = linalg.matmul ins(%arg0, %arg1 : memref<5x3xf32>, memref<5x7xf32>)
                     outs(%arg2 : memref<3x7xf32>)
                     permutationA = [1, 0]
                     permutationB = [0, 1]
```
2024-10-10 17:00:58 +01:00
Andrzej Warzyński
f59b0c7603 [mlir][linalg][nfc] Delete references to args_in/args_out (#111517)
After the refactor in:
  * ed229132f1,

the `args_in` and `args_out` attributes are no longer used by
`linalg.generic`. This patch removes most of the remaining references.
I've left out BufferDeallocationInternals.md, which doesn't seem
maintained anymore and is quite out of sync with other bits of MLIR
(e.g. `test.generic` instead of `linalg.generic`).
2024-10-10 15:45:52 +01:00
BARRET
1666d13078 [CMake]: Remove unnecessary dependencies on LLVM/MLIR (#111255)
The previous PR, https://github.com/llvm/llvm-project/pull/110362
(reverted), caused breakage. This is the PR with the fix.

My build cmdline:

```
cmake ../llvm \
    -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=install \
    -DCMAKE_C_COMPILER=gcc-9 \
    -DCMAKE_CXX_COMPILER=g++-9 \
    -DCMAKE_CUDA_COMPILER=$(which nvcc) \
    -DLLVM_ENABLE_LLD=OFF \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_BUILD_EXAMPLES=ON \
    -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
    -DLLVM_CCACHE_BUILD=ON \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DBUILD_SHARED_LIBS=ON \
    -DLLVM_ENABLE_PROJECTS='llvm;mlir'
```
2024-10-07 15:52:43 +02:00
Andrzej Warzyński
d9d623310d [mlir][linalg] Add a new helper hook: hasVectorizationImpl (#110708)
The newly added hook simply returns `false` for Ops for which there's no
"vectorization logic" in the Linalg Vectorizer (i.e. the `vectorize()`
method). It's added so that the following two TD ops expose an identical
level of functionality (that's not the case ATM):

  * `transform.structured.vectorize_children_and_apply_patterns`
  * `transform.structured.vectorize`

Specifically, ATM, the former works only for Linalg Ops, while the
latter works for all Ops that the vectorizer supports (*). With this
change, I am making sure that both TD Ops behave consistently.

Note, this shouldn't affect any of the current uses of the vectorizer.

(*) This is implemented via the `vectorize()` method in
Vectorization.cpp.
2024-10-04 10:06:33 +01:00
Andrzej Warzyński
56d6b56739 [mlir][vector] Relax the requirements on broadcast dims (#99341)
NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.

This PR updates the semantics of the `in_bounds` attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:

```mlir
  %read = vector.transfer_read %A[%base1, %base2], %pad
      {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
      : memref<?x?xf32>, vector<9xf32>

  vector.transfer_write %vec, %A[%base1, %base2]
      {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
      : vector<9xf32>, memref<?x?xf32>
```

Note that the value `false` merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to `true` if all non-broadcast dims
are marked as "in bounds". 
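
For example (a sketch; the read below has no non-broadcast dims in its
permutation map, so the condition holds vacuously):

```mlir
// Broadcast dim marked "may be out-of-bounds" ...
%r = vector.transfer_read %A[%i, %j], %pad
    {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
    : memref<?x?xf32>, vector<9xf32>
// ... folds to in_bounds = [true]: all non-broadcast dims (here, none)
// are already marked "in bounds".
```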

Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making `in_bounds` mandatory and to work around the extra requirements
for broadcast dims (those requirements were removed in this PR). All
changes in tests are also reverts of changes from #97049.

For context, here's a PR in which "broadcast" dims were forced to
always be "in-bounds":
  * https://reviews.llvm.org/D102566

(*) See `foldTransferInBoundsAttribute`.
2024-10-04 07:41:20 +01:00
Rolf Morel
94cf80d6fd [MLIR][Linalg] Pattern to fold AddOp to accumulation via contraction op's dest (#110514)
Replaces a linalg.add whose one operand is the single user of a
contraction (one with a zero-filled, "identity-mapped" destination,
dominated by the `other` operand) with that contraction, using `other`
as its dest.

Benefits include elision of an elementwise op, namely the linalg.add,
and removing a tensor.empty as a destination which is likely to require
an allocation upon bufferization.
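
A rough before/after sketch (shapes and the zero-fill spelled out for
clarity; all names illustrative):

```mlir
// Before: the contraction accumulates into a zero-filled tensor.empty,
// then linalg.add combines the result with %other.
%empty = tensor.empty() : tensor<4x4xf32>
%zeroed = linalg.fill ins(%zero : f32)
    outs(%empty : tensor<4x4xf32>) -> tensor<4x4xf32>
%mm = linalg.matmul ins(%a, %b : tensor<4x2xf32>, tensor<2x4xf32>)
    outs(%zeroed : tensor<4x4xf32>) -> tensor<4x4xf32>
%sum = linalg.add ins(%mm, %other : tensor<4x4xf32>, tensor<4x4xf32>)
    outs(%empty2 : tensor<4x4xf32>) -> tensor<4x4xf32>

// After: the contraction accumulates directly into %other; the add, the
// fill and a tensor.empty are elided.
%sum = linalg.matmul ins(%a, %b : tensor<4x2xf32>, tensor<2x4xf32>)
    outs(%other : tensor<4x4xf32>) -> tensor<4x4xf32>
```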
2024-10-03 12:22:57 +02:00
Andrzej Warzyński
1c01bcb350 [mlir][tensor] Relax the logic to generalise tensor.pack (#110807)
Make sure that the logic to generalize tensor.pack (into e.g. tensor.pad
+ tensor.transpose) does indeed allow multiple dynamic tile sizes. This
was effectively already implemented in #109815 - in this PR I am merely
removing one `if` condition and adding a test.

I also took the liberty of renaming a few test functions - just to
better highlight the differences between the old and the new tests.

Follow-on for #109815.
2024-10-02 16:43:29 +01:00
Andrzej Warzyński
66f84c8b8a [mlir][tensor] Extend the logic to generalise tensor.pack (#109815)
Extends the logic to generalise tensor.pack (into e.g. tensor.pad +
tensor.transpose) so that it also works when one of the inner tile sizes
is scalable (i.e. a multiple of `vector.vscale`). For example:
```mlir
  %c8 = arith.constant 8 : index
  %vscale = vector.vscale
  %c8_vscale = arith.muli %vscale, %c8 : index
  %0 = tensor.pack %input
      padding_value(%pad : f32)
      inner_dims_pos = [0, 1]
      inner_tiles = [%c8_vscale, 2]
      into %output : tensor<5x1xf32> -> tensor<1x1x?x2xf32>
```
is generalised as:
```mlir
  %c8 = arith.constant 8 : index
  %vscale = vector.vscale
  %c8_vscale = arith.muli %vscale, %c8 : index
  %0 = affine.apply #map()[%c8_vscale, %c5]
  %padded = tensor.pad %arg0 low[0, 0] high[%0, 1] {
  ^bb0(%arg3: index, %arg4: index):
    tensor.yield %arg2 : f32
  } : tensor<5x1xf32> to tensor<?x2xf32>
```

At the Tensor level, we model scalability using dynamic shapes and this
change basically extends the relevant logic so that it also works for
dynamic shapes.
2024-10-02 09:44:13 +01:00
Mehdi Amini
8b47711e84 Revert "CMake: Remove unnecessary dependencies on LLVM/MLIR" (#110594)
Reverts llvm/llvm-project#110362

Multiple bots are broken.
2024-10-01 00:44:21 +02:00
BARRET
4980f2177e CMake: Remove unnecessary dependencies on LLVM/MLIR (#110362)
There are some spurious libraries which can be removed.

I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We're utilizing a CMake function to recursively collect
MLIR/LLVM-related dependencies. However, we identified certain library
dependencies as redundant and safe to remove.
2024-09-30 23:57:13 +02:00
Andrzej Warzyński
6d11494414 [mlir][Linalg] Refine how broadcast dims are treated (#99015)
This PR fixes how broadcast dims (identified as "zero" results in
permutation maps) corresponding to a reduction iterator are vectorised
in the case of generic Ops. Here's an example:

```mlir
  #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
  #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>

  func.func @generic_with_reduction_and_broadcast(%arg0: tensor<1x12x197x197xf32>) -> (tensor<1x12x197x1xf32>) {
    %0 = tensor.empty() : tensor<1x12x197x1xf32>

    %1 = linalg.generic {indexing_maps = [#map, #map1],
                        iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
      ins(%arg0 : tensor<1x12x197x197xf32>)
      outs(%0 : tensor<1x12x197x1xf32>) {

    ^bb0(%in: f32, %out: f32):
      %818 = arith.addf %in, %out : f32
      linalg.yield %818 : f32
    } -> tensor<1x12x197x1xf32>
    return %1 : tensor<1x12x197x1xf32>
  }
```

This is a perfectly valid Generic Op, but currently triggers two issues
in the vectoriser. The root cause is this map:

```mlir
  #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>
```

This map triggers an assert in `reindexIndexingMap` -  this hook
incorrectly assumes that every result in the input map is a `dim`
expression and that there are no constants. That's not the case in this
example. `reindexIndexingMap` is extended to allow maps like the one
above. For now, only constant "zero" results are allowed. This can be
extended in the future once a good motivating example is available.

Separately, the permutation map highlighted above "breaks" mask
calculation (ATM masks are always computed, even in the presence of
static shapes). When applying the following permutation:
```mlir
  (d0, d1, d2, d3) -> (d0, d1, d2, 0)
```

to these canonical shapes (corresponding to the example above):
```
  (1, 12, 197, 197)
```
we end up with the following error:
```bash
error: vector types must have positive constant sizes but got 1, 12, 197, 0
```

The error makes sense and indicates that we should update the
permutation map above to:
```
  (d0, d1, d2, d3) -> (d0, d1, d2)
```

This would correctly give the following vector type:
```
  vector<1x12x197xi1>
```

Fixes #97247
2024-09-26 16:17:15 +01:00
Hugo Trachino
28039055e5 [MLIR][Transform] Hoist Pad generates linalg.transpose (#109669)
For readability purposes, generate linalg named ops when possible.
For maintainability purposes, get rid of duplicated code.
2024-09-26 09:33:47 +01:00
Nirvedh Meshram
234193bae6 [mlir][linalg] Vectorization support for convolution of i1 type (#109480)
Normally, convolutions present with the following linalg op region:
```mlir
^bb0(%arg14: i4, %arg15: i4, %arg16: i4):
  %17 = arith.muli %arg14, %arg15 : i4
  %18 = arith.addi %arg16, %17 : i4
  linalg.yield %18 : i4
```
However, for i1, due to strength reduction we get something like:
```mlir
^bb0(%arg14: i1, %arg15: i1, %arg16: i1):
  %17 = arith.andi %arg14, %arg15 : i1
  %18 = arith.ori %arg16, %17 : i1
  linalg.yield %18 : i1
```
This PR updates the logic to support this region for i1 types.
2024-09-24 12:24:59 -05:00
Andrzej Warzyński
b47d1787b5 [mlir][vector] Refine vectorisation of tensor.extract (#109580)
This PR fixes a bug in `isLoopInvariantIdx`. It makes sure that the
following case is vectorised as `vector.gather` (as opposed to
attempting a contiguous load):
```mlir
  func.func @index_from_output_column_vector_gather_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
    %c0 = arith.constant 0 : index
    %0 = tensor.empty() : tensor<8x1xf32>
    %res = linalg.generic {
      indexing_maps = [#map],
      iterator_types = ["parallel", "parallel"]
    } outs(%0 : tensor<8x1xf32>) {
    ^bb0(%arg1: f32):
        %1 = linalg.index 0 : index
      %extracted = tensor.extract %src[%1, %c0] : tensor<8x128xf32>
        linalg.yield %extracted : f32
    } -> tensor<8x1xf32>
    return %res : tensor<8x1xf32>
  }
```

Specifically, when looking for loop-invariant indices in
`tensor.extract` Ops, any `linalg.index` Op that's used in address
calculation should only access loop dims that are == 1. In the example
above, the following does not meet that criteria:
```mlir
  %1 = linalg.index 0 : index
```

Note that this PR also effectively addresses the issue fixed in #107922,
i.e. exercised by:
  * `@vectorize_nd_tensor_extract_load_1d_column_vector_using_gather_load`

`getNonUnitLoopDim` introduced in #107922 is still valid though. In
fact, it is required to identify that the following case is a contiguous
load:
```mlir
  func.func @index_from_output_column_vector_contiguous_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
    %c0 = arith.constant 0 : index
    %0 = tensor.empty() : tensor<8x1xf32>
    %res = linalg.generic {
      indexing_maps = [#map],
      iterator_types = ["parallel", "parallel"]
    } outs(%0 : tensor<8x1xf32>) {
    ^bb0(%arg1: f32):
        %1 = linalg.index 0 : index
      %extracted = tensor.extract %src[%c0, %1] : tensor<8x128xf32>
        linalg.yield %extracted : f32
    } -> tensor<8x1xf32>
    return %res : tensor<8x1xf32>
  }
```
Some logic is still missing to lower the above to
`vector.transfer_read`, so it is conservatively lowered to
`vector.gather` instead (see TODO in
`getTensorExtractMemoryAccessPattern`).

There are a few additional changes:
  * `getNonUnitLoopDim` is simplified and renamed as
    `getTrailingNonUnitLoopDimIdx`, and additional comments are added (note
    that the functionality didn't change);
  * extra comments in a few places; variable names in comments were updated
    to use Markdown (which is the preferred approach in MLIR).

This is a follow-on for:
  * https://github.com/llvm/llvm-project/pull/107922
  * https://github.com/llvm/llvm-project/pull/102321
2024-09-24 14:03:30 +01:00
Andrzej Warzyński
c1826aeef3 [mlir][tensor] Add new helper hooks for RelayoutOp (#109642)
Implements two helper hooks for PackOp and UnPackOp, `getAllOuterDims`
and `getTiledOuterDims`, and adds them to RelayoutOp (which both PackOp
and UnPackOp inherit from).

This improves code re-use and also clarifies the meaning of "outer dims"
and "tiled outer dims".
2024-09-24 13:14:49 +01:00
Kazu Hirata
f264d9a9d5 [Linalg] Fix a warning
This patch fixes:

  mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:821:12: error:
  variable 'countNonUnitDim' set but not used
  [-Werror,-Wunused-but-set-variable]
2024-09-21 13:59:36 -07:00
Nirvedh Meshram
e45fc5140d [Linalg][Vectorization] Add support for linalg vectorization of a tensor.extract case (#107922)
In https://github.com/llvm/llvm-project/pull/102321 we relaxed the
vectorizer so that, when checking for contiguous loads, we don't always
require a trailing non-unit dim. For example, in the test case added we
have `tensor<8x1xf32>`, which is now a valid candidate for a contiguous
load. However, the logic to check for contiguous loads assumed that only
the trailing dim would be non-unit, so this PR updates that logic to find
the actual non-unit dim.
2024-09-21 15:12:51 -05:00
Andrzej Warzyński
315ba77406 [mlir][linalg] Vectorisation of tensor.extract - dynamic shapes (#100582)
This PR removes the assumption that reading from a dynamic tensor is
always a gather load:

```mlir
%extracted = tensor.extract %src[%c79, %3] : tensor<?x?xf32>
```

That assumption was originally introduced to simplify the implementation
and to reduce the number of cases to consider. Now that the
vectorisation of `tensor.extract` has been around for > 1 year and has
been quite stable, we can safely relax it.

This is a relatively small change - rather than using the parent linalg
Op to infer the target output shape (not possible with dynamic shapes),
the vectorizer will use the (previously constructed) output vector
shape instead.

As expected, the following test required updating (`vector.gather` ->
`vector.transfer_read`):
  * @masked_dynamic_vectorize_nd_tensor_extract_with_affine_apply_contiguous

Similar test for scalable vectors is also added.
2024-09-19 19:53:11 +01:00
Max191
08efa23083 [mlir] Allow multi-result ops in reshape fusion (#108576)
Fusion of reshapes by collapsing patterns was restricted to single-result
operations, but the implementation supports multi-result ops. This PR
removes the restriction, since it is not necessary.
2024-09-16 13:06:38 -04:00
Thomas Preud'homme
326287fd5b Add missing FillOp to winograd lowering (#108181)
Winograd lowering involves a number of matmul and batch_matmul ops which
are currently passed a tensor.empty result as their out parameter, and are
thereby undefined behaviour. This commit adds the necessary linalg.fill.
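
A minimal sketch of the fix (shapes illustrative):

```mlir
// Before: the matmul reads its uninitialized destination (UB).
%empty = tensor.empty() : tensor<4x4xf32>
%mm = linalg.matmul ins(%a, %b : tensor<4x2xf32>, tensor<2x4xf32>)
    outs(%empty : tensor<4x4xf32>) -> tensor<4x4xf32>

// After: zero-fill the destination first.
%zero = arith.constant 0.000000e+00 : f32
%filled = linalg.fill ins(%zero : f32)
    outs(%empty : tensor<4x4xf32>) -> tensor<4x4xf32>
%mm2 = linalg.matmul ins(%a, %b : tensor<4x2xf32>, tensor<2x4xf32>)
    outs(%filled : tensor<4x4xf32>) -> tensor<4x4xf32>
```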

---------

Co-authored-by: Max191 <44243577+Max191@users.noreply.github.com>
2024-09-13 15:48:17 +01:00
MaheshRavishankar
d5f0969c96 [mlir][TilingInterface] Avoid looking at operands for getting slices to continue tile + fuse. (#107882)
Current implementation of `scf::tileConsumerAndFuseProducerUsingSCF`
looks at operands of tiled/tiled+fused operations to see if they are
produced by `extract_slice` operations to populate the worklist used to
continue fusion. This implicit assumption does not always work. Instead
make the implementations of `getTiledImplementation` return the slices
to use to continue fusion.

This is a breaking change

- To continue to get the same behavior of
`scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree
implementations of `TilingInterface::getTiledImplementation` to return
the slices to continue fusion on. All in-tree implementations have been
adapted to this.
- This change touches parts that required a simplification to the
`ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a
`std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that
should be `std::nullopt` if fusion is not to be performed.

Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
2024-09-11 22:15:43 -07:00
Max191
e982d7fd7c [mlir] Reuse pack dest in tensor.pack decomposition (#108025)
In the `lowerPack` transform, there is a special case for lowering into
a simple `tensor.pad` + `tensor.insert_slice`, but the destination
becomes a newly created `tensor.empty`. This PR fixes the transform to
reuse the original destination of the `tensor.pack`.
2024-09-10 10:45:08 -04:00
Longsheng Mou
f3b4e47b34 [mlir][linalg][NFC] Drop redundant rankReductionStrategy (#107875)
This patch drops the redundant rankReductionStrategy in
`populateFoldUnitExtentDimsViaSlicesPatterns` and fixes comment typos.
2024-09-10 09:19:22 +08:00
Adrian Kuegel
b7981a78f0 [mlir] Apply ClangTidyPerformance finding (NFC).
Use const reference for loop variable.
2024-08-29 08:14:55 +00:00
Christopher Bate
8bf69ceb00 Reapply "[mlir] NFC: fix dependence of (Tensor|Linalg|MemRef|Complex) dialects on LLVM Dialect and LLVM Core in CMake build (#104832)" (#105703)
Reapply the commit 43b5085667 with
additional fixes for building with
BUILD_SHARED_LIBS=ON.
2024-08-28 22:34:14 -06:00
MaheshRavishankar
4dbaef6d5e [mlir][Linalg] Avoid doing op replacement in linalg::dropUnitDims. (#105749)
It is better to do the replacement in the caller. This avoids the
footgun if the caller needs the original operation. Instead return the
produced operation and replacement values.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2024-08-23 13:43:33 -07:00
Jie Fu
d8b6df2e8b [mlir] Fix -Wunused-result in ElementwiseOpFusion.cpp (NFC)
/llvm-project/mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp:124:7:
error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
      opOperandsToIgnore.pop_back_val();
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
2024-08-21 09:04:24 +08:00
DanielLevi6
4a4b233f35 [mlir][linalg] Improve getPreservedProducerResults estimation in ElementwiseOpFusion (#104409)
This commit changes the getPreservedProducerResults function so that it
takes the consumer into account along with the producer, in order to
predict which of the producer’s outputs can be dropped during the fusion
process. It provides a more accurate prediction, considering that the
fusion process also depends on the consumer.
2024-08-20 16:14:01 -07:00
Christopher Bate
06fd808654 Revert "[mlir] NFC: fix dependence of (Tensor|Linalg|MemRef|Complex) dialects on LLVM Dialect and LLVM Core in CMake build (#104832)"
This reverts commit 43b5085667 since it
caused the build to break with BUILD_SHARED_LIBS=ON.
2024-08-20 03:46:29 +00:00
Christopher Bate
43b5085667 [mlir] NFC: fix dependence of (Tensor|Linalg|MemRef|Complex) dialects on LLVM Dialect and LLVM Core in CMake build (#104832)
This change removes dependencies declared as either 'LINK_LIBS' or
'LINK_COMPONENTS' across several MLIR libraries. The removed dependencies
appear to be incorrect and may have been required in older versions of
the project. These dependencies cause many high-level dialects to have a
transitive dependence on the LLVM dialect and the LLVM 'Core' library
('llvm/lib/IR').

Note that if using the 'Ninja' CMake generator, one can inspect the
dependencies (including all transitive libraries) of any given MLIR
target by using the command `ninja -C <build dir> -t browse` and
navigating to the library of interest in a web browser.
2024-08-19 18:49:22 -06:00
Hsiangkai Wang
c4bf949171 [mlir][linalg] Implement TilingInterface for winograd operators (#96184)
In order to support arbitrary-size input data for conv2d, implement
TilingInterface for winograd operations. Before converting winograd
operations into nested loops with matrix multiplies, first tile the
input of conv2d into a supported size.

Add a transform operation structured.decompose_winograd_op to decompose
winograd operations. Before applying the transform op, use
tile_using_for to tile the input data into a supported size. The test
case shows how to tile and decompose winograd operations.
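
A hypothetical transform-script sketch of that flow (the matched op name
and tile sizes are illustrative):

```mlir
// Tile a winograd op to a supported size, then decompose it.
%0 = transform.structured.match ops{["linalg.winograd_filter_transform"]}
    in %root : (!transform.any_op) -> !transform.any_op
%tiled, %loop = transform.structured.tile_using_for %0 tile_sizes [1]
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
%decomposed = transform.structured.decompose_winograd_op %tiled
    : (!transform.any_op) -> (!transform.any_op)
```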
2024-08-16 16:22:02 +01:00
Ian Wood
a95ad2da36 [mlir] Add bubbling patterns for non intersecting reshapes (#103401)
Refactored @Max191's PR https://github.com/llvm/llvm-project/pull/94637
to move it to `Tensor`

From the original PR
>This PR adds fusion by expansion patterns to push a tensor.expand_shape
up through a tensor.collapse_shape with non-intersecting reassociations.
Sometimes parallel collapse_shape ops like this can block propagation of
expand_shape ops, so this allows them to pass through each other.

I'm not sure if I put the code/tests in the right places, so let me know
where those go if they aren't.

cc @MaheshRavishankar @hanhanW

---------

Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
2024-08-14 13:58:35 -07:00
Frank Schlimbach
baabcb2898 [mlir][mesh] Shardingcontrol (#102598)
This is a fixed copy of #98145 (necessary after it got reverted).

@sogartar @yaochengji
This PR adds the following to #98145:
- `UpdateHaloOp` accepts a `memref` (instead of a tensor) and no longer
returns a result, to clarify its in-place semantics
- `UpdateHaloOp` accepts `split_axis` to allow multiple mesh-axes per
tensor/memref-axis (similar to `mesh.sharding`)
- The implementation of `ShardingInterface` for tensor operations
(`tensor.empty` for now) moved from the tensor library to the mesh
interface library. `spmdize` uses features from the `mesh` dialect.
@rengolin agreed that `tensor` should not depend on `mesh` so this
functionality cannot live in a `tensor`s lib. The unfulfilled dependency
caused the issues leading to reverting #98145. Such cases are generally
possible and might lead to re-considering the current structure (like
for tosa ops).
- rebased onto latest main
--------------------------
Replacing `#mesh.sharding` attribute with operation `mesh.sharding`
- extended semantics now allow providing optional `halo_sizes` and
`sharded_dims_sizes`
- internally a sharding is represented as a non-IR class
`mesh::MeshSharding`

What previously was
```mlir
%sharded0 = mesh.shard %arg0 <@mesh0, [[0]]> : tensor<4x8xf32>
%sharded1 = mesh.shard %arg1 <@mesh0, [[0]]> annotate_for_users : tensor<16x8xf32>
```
is now
```mlir
%sharding = mesh.sharding @mesh0, [[0]] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding : tensor<4x8xf32>
%1 = mesh.shard %arg1 to %sharding annotate_for_users : tensor<16x8xf32>
```
and allows additional annotations to control the shard sizes:
```mlir
mesh.mesh @mesh0 (shape = 4)
%sharding0 = mesh.sharding @mesh0, [[0]] halo_sizes = [1, 2] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding0 : tensor<4x8xf32>
%sharding1 = mesh.sharding @mesh0, [[0]] sharded_dims_sizes = [3, 5, 5, 3] : !mesh.sharding
%1 = mesh.shard %arg1 to %sharding1 annotate_for_users : tensor<16x8xf32>
```
- `mesh.shard` op accepts additional optional attribute `force`, useful
for halo updates
- Some initial spmdization support for the new semantics
- Support for `tensor.empty` reacting on `sharded_dims_sizes` and
`halo_sizes` in the sharding
- New collective operation `mesh.update_halo` as a spmdized target for
shardings with `halo_sizes`

---------

Co-authored-by: frank.schlimbach <fschlimb@smtp.igk.intel.com>
Co-authored-by: Jie Fu <jiefu@tencent.com>
2024-08-12 12:20:58 +01:00
Kazu Hirata
165f45354a [mlir] Use llvm::is_contained (NFC) (#102714) 2024-08-09 21:42:19 -07:00
Andrzej Warzyński
62e5032c9a Reapply "[mlir][linalg] Relax tensor.extract vectorization" (#102321)
[This reverts commit 6662523d6b2ca0198141c94ee80ebbb41601df9f]

Simplifies the vectorization of tensor.extract so that:
* all cases that read into a genuinely multi-dim vector (*) are
  considered a gather load,
* all other cases are considered as potential contiguous loads.

This change means that the following extraction from a "column" tensor
is correctly identified as a scalar load followed by a broadcast (rather
than a gather load).

```mlir
func.func @vectorize_scalar_broadcast_column_tensor(%in: tensor<1x1x4xi32>) -> tensor<1x1x4xi32> {
  %c4 = arith.constant 4 : index
  %c0 = arith.constant 0 : index
  %cst = arith.constant dense<[...]> : tensor<15x1xi32>

  %out = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
    iterator_types = ["parallel", "parallel", "parallel"]}
    outs(%in : tensor<1x1x4xi32>) {

  ^bb0(%out: i32):
    %8 = linalg.index 0 : index
    %idx_0 = linalg.index 0 : index
    %extracted = tensor.extract %cst[%idx_0, %c0] : tensor<15x1xi32>
    linalg.yield %extracted : i32
  } -> tensor<1x1x4xi32>

  return %out:tensor<1x1x4xi32>
}
```

Overview of the delta compared to the original submission (#99299):
  * removed an assert representing a condition that is being relaxed
    here,
  * added a test (reading from a column tensor) based on a repro from
    @hanhanW.

(*) `vector<1x4x1xf32>` is considered as 1D vector in this context.
2024-08-08 09:36:10 +01:00
Renato Golin
3968942f10 Revert "[mlir][mesh] adding shard-size control (#98145)"
This reverts commit fca69838ca.

Also reverts the fixup: "[mlir] Fix -Wunused-variable in MeshOps.cpp (NFC)"

This reverts commit fc737368fe.
2024-08-07 15:12:37 +01:00
Frank Schlimbach
fca69838ca [mlir][mesh] adding shard-size control (#98145)
- Replacing `#mesh.sharding` attribute with operation `mesh.sharding`
- extended semantics now allow providing optional `halo_sizes` and
`sharded_dims_sizes`
- internally a sharding is represented as a non-IR class
`mesh::MeshSharding`

What previously was
```mlir
%sharded0 = mesh.shard %arg0 <@mesh0, [[0]]> : tensor<4x8xf32>
%sharded1 = mesh.shard %arg1 <@mesh0, [[0]]> annotate_for_users : tensor<16x8xf32>
```
is now
```mlir
%sharding = mesh.sharding @mesh0, [[0]] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding : tensor<4x8xf32>
%1 = mesh.shard %arg1 to %sharding annotate_for_users : tensor<16x8xf32>
```
and allows additional annotations to control the shard sizes:
```mlir
mesh.mesh @mesh0 (shape = 4)
%sharding0 = mesh.sharding @mesh0, [[0]] halo_sizes = [1, 2] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding0 : tensor<4x8xf32>
%sharding1 = mesh.sharding @mesh0, [[0]] sharded_dims_sizes = [3, 5, 5, 3] : !mesh.sharding
%1 = mesh.shard %arg1 to %sharding1 annotate_for_users : tensor<16x8xf32>
```
- `mesh.shard` op accepts additional optional attribute `force`, useful
for halo updates
- Some initial spmdization support for the new semantics
- Support for `tensor.empty` reacting on `sharded_dims_sizes` and
`halo_sizes` in the sharding
- New collective operation `mesh.update_halo` as a spmdized target for
shardings with `halo_sizes`

@sogartar @yaochengji
2024-08-07 13:34:57 +01:00
Han-Chung Wang
28fa83f8d4 Revert "[mlir][linalg] Relax tensor.extract vectorization" (#102232)
Reverts llvm/llvm-project#99299 because it breaks the lowering. To
repro: `mlir-opt -transform-interpreter ~/repro.mlir`

```mlir
#map = affine_map<(d0, d1) -> (d0)>
#map1 = affine_map<(d0, d1) -> (d1)>
#map2 = affine_map<(d0, d1) -> (d0, d1)>
#map3 = affine_map<(d0, d1) -> (d0 + d1)>
module {
  func.func @foo(%arg0: index, %arg1: tensor<2xf32>, %arg2: tensor<4xf32>, %arg3: tensor<1xf32>) -> tensor<4x1xf32> {
    %c0 = arith.constant 0 : index
    %cst = arith.constant 1.000000e+00 : f32
    %cst_0 = arith.constant 0.000000e+00 : f32
    %0 = tensor.empty() : tensor<4x1xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel"]} ins(%arg2, %arg3 : tensor<4xf32>, tensor<1xf32>) outs(%0 : tensor<4x1xf32>) {
    ^bb0(%in: f32, %in_1: f32, %out: f32):
      %2 = linalg.index 0 : index
      %3 = linalg.index 1 : index
      %4 = affine.apply #map3(%3, %arg0)
      %extracted = tensor.extract %arg1[%c0] : tensor<2xf32>
      %5 = arith.cmpi eq, %2, %c0 : index
      %6 = arith.cmpi ult, %2, %c0 : index
      %7 = arith.select %5, %cst, %in : f32
      %8 = arith.select %6, %cst_0, %7 : f32
      %9 = arith.cmpi eq, %4, %c0 : index
      %10 = arith.cmpi ult, %4, %c0 : index
      %11 = arith.select %9, %cst, %in_1 : f32
      %12 = arith.select %10, %cst_0, %11 : f32
      %13 = arith.mulf %8, %12 : f32
      %14 = arith.mulf %13, %extracted : f32
      %15 = arith.cmpi eq, %2, %4 : index
      %16 = arith.select %15, %cst, %cst_0 : f32
      %17 = arith.subf %16, %14 : f32
      linalg.yield %17 : f32
    } -> tensor<4x1xf32>
    return %1 : tensor<4x1xf32>
  }
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
    transform.structured.vectorize %0 : !transform.any_op
    transform.yield
  }
}
```
2024-08-06 14:35:27 -07:00
Andrzej Warzyński
8868c02cda [mlir][linalg] Relax tensor.extract vectorization (#99299)
Simplifies the vectorization of tensor.extract so that:
* all cases that read into a genuinely multi-dim vector (*) are
  considered a gather load,
* all other cases are considered as potential contiguous loads.

This change means that the following extraction from a "column" tensor
will be correctly identified as a scalar load followed by a broadcast (rather
than a gather load).

```mlir
func.func @vectorize_scalar_broadcast_column_tensor(%in: tensor<1x1x4xi32>) -> tensor<1x1x4xi32> {
  %c4 = arith.constant 4 : index
  %c0 = arith.constant 0 : index
  %cst = arith.constant dense<[...]> : tensor<15x1xi32>

  %out = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
    iterator_types = ["parallel", "parallel", "parallel"]}
    outs(%in : tensor<1x1x4xi32>) {

  ^bb0(%out: i32):
    %idx_0 = linalg.index 0 : index
    %extracted = tensor.extract %cst[%idx_0, %c0] : tensor<15x1xi32>
    linalg.yield %extracted : i32
  } -> tensor<1x1x4xi32>

  return %out:tensor<1x1x4xi32>
}
```

(*) `vector<1x4x1xf32>` is considered as 1D vector in this context.
2024-08-06 10:57:10 +01:00