Currently, `GeneralizePadOpPattern` is grouped under
`populatePadOpVectorizationPatterns`. However, as noted in #111349, this
transformation "decomposes" rather than "vectorizes" `tensor.pad`. As
such, it functions as:
* a vectorization _pre-processing_ transformation, not
* a vectorization transformation itself.
To clarify its purpose, this PR turns `GeneralizePadOpPattern` into a
standalone transformation by:
* introducing a dedicated `populateDecomposePadPatterns` method,
* adding an `apply_patterns.linalg.decompose_pad` Transform Dialect Op,
* removing it from `populatePadOpVectorizationPatterns`.
In addition, to better reflect its role, it is renamed as "decomposition"
rather than "generalization". This is in line with the recent renaming of
similar Ops, i.e. the tensor.pack/tensor.unpack Ops in #116439.
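For illustration, the new Transform Dialect Op could be applied roughly as
follows (a sketch; the surrounding transform sequence and the `%module`/`%func`
handles are assumptions):
```mlir
// Match the payload function and apply the pad-decomposition patterns to it.
%func = transform.structured.match ops{["func.func"]} in %module
  : (!transform.any_op) -> !transform.any_op
transform.apply_patterns to %func {
  transform.apply_patterns.linalg.decompose_pad
} : !transform.any_op
```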
The earlier PR (https://github.com/llvm/llvm-project/pull/104783), which
introduced transpose and broadcast semantics to linalg.matmul, was reverted
due to two failing OpDSL tests for linalg.matmul.
Since linalg.matmul is now defined using TableGen ODS instead of Python-based
OpDSL, these tests started failing and need to be removed/updated.
This commit removes/updates the failing obsolete tests in the files below. All
other files were part of the earlier PR and are simply cherry-picked.
"mlir/test/python/integration/dialects/linalg/opsrun.py"
"mlir/test/python/integration/dialects/transform.py"
---------
Co-authored-by: Renato Golin <rengolin@systemcall.eu>
At the moment, `GenericPadOpVectorizationPattern` implements two
orthogonal transformations:
1. Rewrites `tensor::PadOp` into a sequence of `tensor::EmptyOp`,
`linalg::FillOp` and `tensor::InsertSliceOp`.
2. Vectorizes (where possible) `tensor::InsertSliceOp` (see
`tryVectorizeCopy`).
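For reference, transformation (1) rewrites a pad roughly as follows (an
illustrative sketch; the shapes and the padding value `%pad_val` are made up):
```mlir
// Input: pad a 6x5 tensor to 8x8 with a constant value.
%padded = tensor.pad %src low[0, 0] high[2, 3] {
^bb0(%i: index, %j: index):
  tensor.yield %pad_val : f32
} : tensor<6x5xf32> to tensor<8x8xf32>

// After the rewrite: create the destination, fill it with the padding
// value, then insert the original data.
%empty = tensor.empty() : tensor<8x8xf32>
%fill = linalg.fill ins(%pad_val : f32) outs(%empty : tensor<8x8xf32>) -> tensor<8x8xf32>
%res = tensor.insert_slice %src into %fill[0, 0] [6, 5] [1, 1]
  : tensor<6x5xf32> into tensor<8x8xf32>
```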
This patch splits `GenericPadOpVectorizationPattern` into two separate
patterns:
1. `GeneralizePadOpPattern` for the first transformation (note that
currently `GenericPadOpVectorizationPattern` inherits from
`GeneralizePadOpPattern`).
2. `InsertSliceVectorizePattern` to vectorize `tensor::InsertSliceOp`.
With this change, we gain the following:
* a clear separation between pre-processing and vectorization
transformations/stages,
* a path to support masked vectorisation for `tensor.insert_slice`
(with a dedicated pattern for vectorization, it is much easier to
specify the input vector sizes used in masking),
* more opportunities to vectorize `tensor.insert_slice`.
Note for downstream users:
--------------------------
If you were using `populatePadOpVectorizationPatterns`, following this
change you will also have to add
`populateInsertSliceVectorizationPatterns`.
Finer implementation details:
-----------------------------
1. The majority of changes in this patch are copy & paste + some edits.
1.1. The only functional change is that the vectorization of
`tensor.insert_slice` is now broadly available (as opposed to being
constrained to the pad vectorization pattern:
`GenericPadOpVectorizationPattern`).
1.2. Following on from the above, `@pad_and_insert_slice_dest` is
updated. As expected, the input `tensor.insert_slice` Op is no
longer "preserved" and instead gets vectorized successfully.
2. The `linalg.fill` case in `getConstantPadVal` works under the
assumption that only _scalar_ source values can be used. That's
consistent with the definition of the Op, but it's not tested at the
moment. Hence a test case in Linalg/invalid.mlir is added.
3. The behaviour of the two TD vectorization Ops,
`transform.structured.vectorize_children_and_apply_patterns` and
`transform.structured.vectorize` is preserved.
This PR simply wraps `populatePadOpVectorizationPatterns` into a new
Transform Dialect Op: `apply_patterns.linalg.pad_vectorization`.
This change makes it possible to run (and test) the corresponding
patterns _without_:
`transform.structured.vectorize_children_and_apply_patterns`.
Note that the Op above only supports non-masked vectorisation (i.e. when
the inputs are static), so, effectively, only fixed-width vectorisation
(as opposed to scalable vectorisation). As such, this change is required
to construct vectorization pipelines for tensor.pad targeting scalable
vectors.
To test the new Op and the corresponding patterns, I added
"vectorization-pad-patterns.mlir" - most tests have been extracted from
"vectorization-with-patterns.mlir".
Fixes a loop comparison condition in the vectorizer.
As that logic is used specifically for vectorising `tensor.extract`, I
also added a test that violates the assumptions made inside
`getTrailingNonUnitLoopDimIdx`, namely that Linalg loops are non-empty.
The vectorizer pre-conditions will capture that much earlier, making sure
that `getTrailingNonUnitLoopDimIdx` is only run when all the assumptions
are actually met.
Thank you for pointing this out, @pfusik !
We use the term "masking map" throughout the Linalg vectorization logic,
but we don't really define what it is and how it differs from Linalg
indexing maps. This PR clarifies the differences, makes sure that the new
terminology is used consistently, and improves code re-use.
I noticed that several assertions in the MLIR codebase have issues with
operator precedence.
The problem is due to the way logical operators are evaluated. The `&&`
operator has higher precedence than the `||` operator, which means the
assertion is currently evaluated like this:
```
assert((resType.getNumDynamicDims() == dynOutDims.size()) ||
       (dynOutDims.empty() && "Either none or all output dynamic dims must be specified!"));
```
We should add parentheses around the entire expression involving
`dynOutDims.empty()` to ensure that the logical conditions are grouped
correctly. Here’s the corrected version:
```
assert(((resType.getNumDynamicDims() == dynOutDims.size()) || dynOutDims.empty()) &&
       "Either none or all output dynamic dims must be specified!");
```
The main goal of this patch is to extend the semantics of the 'linalg.matmul'
named op to include per-operand transpose semantics, while also laying out a
way to move op definitions from OpDSL to TableGen. Hence, it is implemented in
TableGen. The transpose semantics are as follows.
By default, the 'linalg.matmul' behaviour remains as is. Transpose semantics
can be applied per input operand by specifying the optional permutation
attributes (namely 'permutationA' for the 1st input and 'permutationB' for the
2nd input) for each operand explicitly as needed. By default, no transpose is
mandated for any of the input operands.
Example:
```
%val = linalg.matmul ins(%arg0, %arg1 : memref<5x3xf32>, memref<5x7xf32>)
                     outs(%arg2 : memref<3x7xf32>)
                     permutationA = [1, 0]
                     permutationB = [0, 1]
```
The newly added hook simply returns `false` for Ops for which there's no
"vectorization logic" in the Linalg Vectorizer (i.e. the `vectorize()`
method). It's added so that the following two TD Ops expose an identical
level of functionality (that's not the case ATM):
* `transform.structured.vectorize_children_and_apply_patterns`
* `transform.structured.vectorize`
Specifically, ATM, the former works only for Linalg Ops, while the
latter works for all Ops that the vectorizer supports (*). With this
change, I am making sure that both TD Ops behave consistently.
Note, this shouldn't affect any of the current uses of the vectorizer.
(*) This is implemented via the `vectorize()` method in
Vectorization.cpp.
NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.
This PR updates the semantics of the `in_bounds` attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:
```mlir
%read = vector.transfer_read %A[%base1, %base2], %pad
    {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
    : memref<?x?xf32>, vector<9xf32>

vector.transfer_write %vec, %A[%base1, %base2]
    {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
    : vector<9xf32>, memref<?x?xf32>
```
Note that the value `false` merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to `true` if all non-broadcast dims
are marked as "in bounds".
Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making `in_bounds` mandatory and to work around the extra requirements
for broadcast dims (those requirements were removed in this PR). All
changes in tests are also reverts of changes from #97049.
For context, here's a PR in which "broadcast" dims were forced to
always be "in-bounds":
* https://reviews.llvm.org/D102566
(*) See `foldTransferInBoundsAttribute`.
This PR fixes how broadcast dims (identified as "zero" results in
permutation maps) corresponding to a reduction iterator are vectorised
in the case of generic Ops. Here's an example:
```mlir
#map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>
func.func @generic_with_reduction_and_broadcast(%arg0: tensor<1x12x197x197xf32>) -> (tensor<1x12x197x1xf32>) {
%0 = tensor.empty() : tensor<1x12x197x1xf32>
%1 = linalg.generic {indexing_maps = [#map, #map1],
iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
ins(%arg0 : tensor<1x12x197x197xf32>)
outs(%0 : tensor<1x12x197x1xf32>) {
^bb0(%in: f32, %out: f32):
%818 = arith.addf %in, %out : f32
linalg.yield %818 : f32
} -> tensor<1x12x197x1xf32>
return %1 : tensor<1x12x197x1xf32>
}
```
This is a perfectly valid Generic Op, but currently triggers two issues
in the vectoriser. The root cause is this map:
```mlir
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>
```
This map triggers an assert in `reindexIndexingMap` - this hook
incorrectly assumes that every result in the input map is a `dim`
expression and that there are no constants. That's not the case in this
example. `reindexIndexingMap` is extended to allow maps like the one
above. For now, only constant "zero" results are allowed. This can be
extended in the future once a good motivating example is available.
Separately, the permutation map highlighted above "breaks" mask
calculation (ATM masks are always computed, even in the presence of
static shapes). When applying the following permutation:
```mlir
(d0, d1, d2, d3) -> (d0, d1, d2, 0)
```
to these canonical shapes (corresponding to the example above):
```
(1, 12, 197, 197)
```
we end up with the following error:
```bash
error: vector types must have positive constant sizes but got 1, 12, 197, 0
```
The error makes sense and indicates that we should update the
permutation map above to:
```
(d0, d1, d2, d3) -> (d0, d1, d2)
```
This would correctly give the following vector type:
```
vector<1x12x197xi1>
```
Fixes #97247
Normally, convolutions present with the following linalg op region:
```
^bb0(%arg14: i4, %arg15: i4, %arg16: i4):
%17 = arith.muli %arg14, %arg15 : i4
%18 = arith.addi %arg16, %17 : i4
linalg.yield %18 : i4
```
However, for i1, due to strength reduction, we get something like:
```
^bb0(%arg14: i1, %arg15: i1, %arg16: i1):
%17 = arith.andi %arg14, %arg15 : i1
%18 = arith.ori %arg16, %17 : i1
linalg.yield %18 : i1
```
This PR updates the logic to support this region for i1 types.
This PR fixes a bug in `isLoopInvariantIdx`. It makes sure that the
following case is vectorised as `vector.gather` (as opposed to
attempting a contiguous load):
```mlir
func.func @index_from_output_column_vector_gather_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
%c0 = arith.constant 0 : index
%0 = tensor.empty() : tensor<8x1xf32>
%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel", "parallel"]
} outs(%0 : tensor<8x1xf32>) {
^bb0(%arg1: f32):
%1 = linalg.index 0 : index
%extracted = tensor.extract %src[%1, %c0] : tensor<8x128xf32>
linalg.yield %extracted : f32
} -> tensor<8x1xf32>
return %res : tensor<8x1xf32>
}
```
Specifically, when looking for loop-invariant indices in
`tensor.extract` Ops, any `linalg.index` Op that's used in address
calculation should only access loop dims that are equal to 1. In the example
above, the following does not meet that criterion:
```mlir
%1 = linalg.index 0 : index
```
Note that this PR also effectively addresses the issue fixed in #107922,
i.e. exercised by:
* `@vectorize_nd_tensor_extract_load_1d_column_vector_using_gather_load`
`getNonUnitLoopDim` introduced in #107922 is still valid though. In
fact, it is required to identify that the following case is a contiguous
load:
```mlir
func.func @index_from_output_column_vector_contiguous_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
%c0 = arith.constant 0 : index
%0 = tensor.empty() : tensor<8x1xf32>
%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel", "parallel"]
} outs(%0 : tensor<8x1xf32>) {
^bb0(%arg1: f32):
%1 = linalg.index 0 : index
%extracted = tensor.extract %src[%c0, %1] : tensor<8x128xf32>
linalg.yield %extracted : f32
} -> tensor<8x1xf32>
return %res : tensor<8x1xf32>
}
```
Some logic is still missing to lower the above to
`vector.transfer_read`, so it is conservatively lowered to
`vector.gather` instead (see TODO in
`getTensorExtractMemoryAccessPattern`).
There are a few additional changes:
* `getNonUnitLoopDim` is simplified and renamed as
`getTrailingNonUnitLoopDimIdx`; additional comments are added (note
that the functionality didn't change);
* extra comments in a few places; variable names in comments are updated to
use Markdown (which is the preferred approach in MLIR).
This is a follow-on for:
* https://github.com/llvm/llvm-project/pull/107922
* https://github.com/llvm/llvm-project/pull/102321
This patch fixes:
```
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:821:12: error:
variable 'countNonUnitDim' set but not used [-Werror,-Wunused-but-set-variable]
```
In https://github.com/llvm/llvm-project/pull/102321 we relaxed the
vectorizer so that, when checking for contiguous loads, we don't always
require a trailing non-unit dim. For example, in the test case added we have
`tensor<8x1xf32>`, which is now a valid candidate for a contiguous load.
However, the logic that checks for contiguous loads assumed that only the
trailing dim can be non-unit, so this PR updates that logic to find the
actual non-unit dim.
This PR removes the assumption that reading from a dynamic tensor is
always a gather load:
```mlir
%extracted = tensor.extract %src[%c79, %3] : tensor<?x?xf32>
```
That assumption was originally introduced to simplify the implementation
and to reduce the number of cases to consider. Now that the
vectorisation of `tensor.extract` has been around for > 1 year and has
been quite stable, we can safely relax it.
This is a relatively small change - rather than using the parent linalg
Op to infer the target output shape (not possible with dynamic shapes),
the vectorizer will use the (previously constructed) output vector
shape instead.
As expected, the following test required updating (`vector.gather` ->
`vector.transfer_read`):
* @masked_dynamic_vectorize_nd_tensor_extract_with_affine_apply_contiguous
Similar test for scalable vectors is also added.
[This reverts commit 6662523d6b2ca0198141c94ee80ebbb41601df9f]
Simplifies the vectorization of tensor.extract so that:
* all cases that read into a genuinely multi-dim vector (*) are
considered a gather load,
* all other cases are considered as potential contiguous loads.
This change means that the following extraction from a "column" tensor
is correctly identified as a scalar load followed by a broadcast (rather
than a gather load).
```mlir
func.func @vectorize_scalar_broadcast_column_tensor(%in: tensor<1x1x4xi32>) -> tensor<1x1x4xi32> {
%c4 = arith.constant 4 : index
%c0 = arith.constant 0 : index
%cst = arith.constant dense<[...]> : tensor<15x1xi32>
%out = linalg.generic {
indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
iterator_types = ["parallel", "parallel", "parallel"]}
outs(%in : tensor<1x1x4xi32>) {
^bb0(%out: i32):
%8 = linalg.index 0 : index
%idx_0 = linalg.index 0 : index
%extracted = tensor.extract %cst[%idx_0, %c0] : tensor<15x1xi32>
linalg.yield %extracted : i32
} -> tensor<1x1x4xi32>
return %out : tensor<1x1x4xi32>
}
```
Overview of the delta compared to the original submission (#99299):
* removed an assert representing a condition that is being relaxed
here,
* added a test (reading from a column tensor) based on a repro from
@hanhanW.
(*) `vector<1x4x1xf32>` is considered as 1D vector in this context.
Simplifies the vectorization of tensor.extract so that:
* all cases that read into a genuinely multi-dim vector (*) are
considered a gather load,
* all other cases are considered as potential contiguous loads.
This change means that the following extraction from a "column" tensor
will be correctly identified as a scalar load followed by a broadcast (rather
than a gather load).
```mlir
func.func @vectorize_scalar_broadcast_column_tensor(%in: tensor<1x1x4xi32>) -> tensor<1x1x4xi32> {
%c4 = arith.constant 4 : index
%c0 = arith.constant 0 : index
%cst = arith.constant dense<[...]> : tensor<15x1xi32>
%out = linalg.generic {
indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
iterator_types = ["parallel", "parallel", "parallel"]}
outs(%in : tensor<1x1x4xi32>) {
^bb0(%out: i32):
%idx_0 = linalg.index 0 : index
%extracted = tensor.extract %cst[%idx_0, %c0] : tensor<15x1xi32>
linalg.yield %extracted : i32
} -> tensor<1x1x4xi32>
return %out : tensor<1x1x4xi32>
}
```
(*) `vector<1x4x1xf32>` is considered as 1D vector in this context.
This PR fixes one very specific aspect of vectorising `tensor.extract`
Ops when targeting scalable vectors. Namely, it makes sure that the
scalable flag is correctly propagated when creating
`vector::ShapeCastOp`.
BEFORE:
```mlir
vector.shape_cast %idx_vec : vector<1x1x[4]xindex> to vector<4xindex>
```
AFTER:
```mlir
vector.shape_cast %idx_vec : vector<1x1x[4]xindex> to vector<[4]xindex>
```
This particular ShapeCastOp is created when generating an index for
`vector.transfer_read` operations. Strictly speaking, casting is not
really required. However, it makes the subsequent address calculation
much simpler (*).
The following test is updated to demonstrate the use of
`vector.shape_cast` by the vectoriser:
* @masked_static_vectorize_nd_tensor_extract_with_affine_apply_contiguous
Similar test with scalable vectors is also added.
(*) At this point in the vectoriser it is known that all leading dims in
the index vector are "1".
Allow scalable vectorization of linalg::reduce and of linalg::generic Ops
that have reduction iterator(s), with two restrictions:
1. The reduction dim is the last (innermost) dim of the op; and
2. Only the reduction dim is requested for scalable vectorization.
One exception: scalable vectorization of the reduction dim in Matmul-like
ops is not supported even when the above restrictions are met.
Allowed combinations of scalable flags and iterator types:
Matmul:
  Iterators:      ["parallel", "parallel", "reduction"]
  Scalable Flags: ["true", "true", "false"]
                  ["false", "true", "false"]
Matvec:
  Iterators:      ["parallel", "reduction"]
  Scalable Flags: ["false", "true"]
                  ["true", "false"]
Updates `vectorizeScalableVectorPrecondition` so that scalable
vectorisation is only applied in well-understood and tested scenarios.
It's unlikely that we would ever want an arbitrary dimension to be
scalable. While the Linalg vectoriser should be flexible enough to
handle all possibilities:
* in more "exotic" cases, we are likely to struggle with lowerings
further down the compilation stack,
* it would be impractical given the limitations of LLVM (which usually
reflect the limitations of actual hardware) - e.g. no support for
"scalable" arrays of scalable or fixed width vectors (*).
Ultimately, the goal of this patch is to better document what's
currently supported. While this PR adds some new restrictions, no
existing tests are affected.
(*) At MLIR vector level that would correspond to e.g.
`vector<[4]x8xf32>`.
At the moment, the in_bounds attribute has two confusing/contradicting
properties:
1. It is both optional _and_ has an effective default-value.
2. The default value is "out-of-bounds" for non-broadcast dims, and
"in-bounds" for broadcast dims.
(see the `isDimInBounds` vector interface method for an example of this
"default" behaviour [1]).
This PR aims to clarify the logic surrounding the `in_bounds` attribute
by:
* making the attribute mandatory (i.e. it is always present),
* always setting the default value to "out of bounds" (that's
consistent with the current behaviour for the most common cases).
#### Broadcast dimensions in tests
As per [2], broadcast dimensions require the corresponding
`in_bounds` attribute to be `true`:
```
vector.transfer_read op requires broadcast dimensions to be in-bounds
```
The changes in this PR mean that we can no longer rely on the
default value in cases like the following (dim 0 is a broadcast dim):
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
Instead, the broadcast dimension has to be explicitly marked as "in
bounds":
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
All tests with broadcast dims are updated accordingly.
#### Changes in "SuperVectorize.cpp" and "Vectorization.cpp"
The following patterns in "Vectorization.cpp" are updated to explicitly
set the `in_bounds` attribute to `false`:
* `LinalgCopyVTRForwardingPattern` and `LinalgCopyVTWForwardingPattern`
Also, `vectorizeAffineLoad` (from "SuperVectorize.cpp") and
`vectorizeAsLinalgGeneric` (from "Vectorization.cpp") are updated to
make sure that xfer Ops created by these hooks set the dimension
corresponding to broadcast dims as "in bounds". Otherwise, the Op
verifier would complain.
Note that there is no mechanism to verify whether the corresponding
memory accesses are indeed in bounds. Still, this is consistent with the
current behaviour where the broadcast dim would be implicitly assumed
to be "in bounds".
[1]
4145ad2bac/mlir/include/mlir/Interfaces/VectorInterfaces.td (L243-L246)
[2]
https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop
The vectorization of linalg.index operations doesn't support scalable
vectors when computing the index vector. This patch fixes this with the
vector.step operation.
Depends on #96776
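A minimal sketch of the difference (assuming the fixed-width path materialises
the index vector as a constant, which is not possible for scalable vectors):
```mlir
// Fixed-width index vector - can be materialised as a constant.
%idx = arith.constant dense<[0, 1, 2, 3]> : vector<4xindex>
// Scalable index vector - a constant is not possible, so vector.step is used.
%idx_scalable = vector.step : vector<[4]xindex>
```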
If this is not set, the fact that the dynamic channel dim is in-bounds
cannot be inferred automatically (as it can be for static sizes), which
eventually leads to it being marked as out-of-bounds (and that prevents
some rewrites).
Made the `createReadOrMaskedRead` and `isValidMaskedInputVector` utility
functions accessible outside of their compilation unit (CU). This is needed
by the new IREE TopK implementation.
…se of tensor pack
When the vector sizes are not passed as inputs to the vectorization
transform operation, they are queried from the static result shape in the
case of the tensor.pack op.
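For illustration, a pack with a fully static result shape can then be
vectorised without specifying `vector_sizes` (a sketch; the shapes and the
`%pack` handle are assumptions):
```mlir
// The result shape is fully static, so the vector sizes can be inferred from it.
%packed = tensor.pack %src inner_dims_pos = [0, 1] inner_tiles = [8, 32]
  into %dest : tensor<128x256xf32> -> tensor<16x8x8x32xf32>

// No vector_sizes are passed - they are queried from the static result shape.
transform.structured.vectorize %pack : !transform.any_op
```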
This patch adds support for masked vectorisation of depthwise 1D WC
convolutions, i.e. `linalg.depthwise_conv_1d_nwc_wc`. This is implemented
by adding masking support for this op.
Two major assumptions are made:
* only the channel dimension can be dynamic/scalable (i.e. the
trailing dim),
* when specifying vector sizes to use in the vectoriser, only the size
corresponding to the channel dim is effectively used (other dims are
inferred from the context).
In terms of scalable vectorisation, this should be sufficient to cover
all practical cases (i.e. making arbitrary dim scalable wouldn't make
much sense). As for more generic cases with dynamic shapes (e.g. W or N
dims being dynamic), more work would be needed. In particular, one would
have to consider the filter and input/output tensors separately.
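An example of the kind of op this covers, with only the channel dim dynamic
(an illustrative sketch; the remaining sizes are made up):
```mlir
// N and W are static; only the trailing (channel) dim is dynamic.
%res = linalg.depthwise_conv_1d_nwc_wc
  {dilations = dense<1> : vector<1xi64>, strides = dense<1> : vector<1xi64>}
  ins(%input, %filter : tensor<1x8x?xf32>, tensor<1x?xf32>)
  outs(%output : tensor<1x8x?xf32>) -> tensor<1x8x?xf32>
```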
The reverse op was treated as a VectorMemoryAccessKind::Contiguous load.
While it is a contiguous slice, we would need to compute the indices
differently and apply a reverse at the vector level, which takes
non-trivial effort. This revision flips the case to use vector.gather
instead; otherwise there are functional issues. E.g., the example below
loaded `2, 3, 4` (which is a bug), whereas what we want is `2, 1, 0`.
Before vectorization:
```mlir
func.func @vectorize_reverse_like_tensor_extract(%arg0: tensor<1x2x3xf32>, %arg1: tensor<1x1x3xf32>, %arg2: index) -> tensor<1x1x3xf32> {
%c1 = arith.constant 1 : index
%c0 = arith.constant 0 : index
%c2 = arith.constant 2 : index
%0 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel", "parallel", "parallel"]} outs(%arg1 : tensor<1x1x3xf32>) {
^bb0(%out: f32):
%1 = linalg.index 1 : index
%2 = linalg.index 0 : index
%3 = affine.apply #map1(%1, %2, %arg2)
%4 = linalg.index 2 : index
%5 = arith.subi %c2, %4 : index
%extracted = tensor.extract %arg0[%c0, %3, %5] : tensor<1x2x3xf32>
linalg.yield %extracted : f32
} -> tensor<1x1x3xf32>
return %0 : tensor<1x1x3xf32>
}
```
Partial IR after vectorization:
```
%5 = vector.constant_mask [1, 1, 3] : vector<1x1x4xi1>
%6 = vector.broadcast %arg0 : index to vector<1x1x4xindex>
%7 = vector.shape_cast %6 : vector<1x1x4xindex> to vector<4xindex>
%8 = vector.extractelement %7[%c0_i32 : i32] : vector<4xindex>
%9 = vector.transfer_read %3[%c0, %8, %c2], %cst, %5 {in_bounds = [true, true, true]} : tensor<1x2x3xf32>, vector<1x1x4xf32>
```
Added support for vectorizing tensor.unpack. The unpack Op is split into a
`vector.transfer_read`, `vector.transpose`, `vector.shape_cast` and a
`vector.transfer_write`.
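Conceptually, the lowering produces a sequence along these lines (a sketch
with made-up shapes and indices; masking and padding details are omitted):
```mlir
// For an unpack from tensor<2x4x16x2xf32> (inner_dims_pos = [0, 1],
// inner_tiles = [16, 2]) into tensor<32x8xf32>:
%read = vector.transfer_read %src[%c0, %c0, %c0, %c0], %pad
  {in_bounds = [true, true, true, true]}
  : tensor<2x4x16x2xf32>, vector<2x4x16x2xf32>
%tr = vector.transpose %read, [0, 2, 1, 3]
  : vector<2x4x16x2xf32> to vector<2x16x4x2xf32>
%sc = vector.shape_cast %tr : vector<2x16x4x2xf32> to vector<32x8xf32>
%wr = vector.transfer_write %sc, %dest[%c0, %c0]
  {in_bounds = [true, true]} : vector<32x8xf32>, tensor<32x8xf32>
```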
This PR adds a direct vectorization lowering of `tensor.pack` into
`mask(vector.transfer_read)`->`vector.shape_cast`->`vector.transpose`->`vector.transfer_write`.
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
Rename interface functions as follows:
* `hasTensorSemantics` -> `hasPureTensorSemantics`
* `hasBufferSemantics` -> `hasPureBufferSemantics`
These two functions return "true" if the op has tensor/buffer operands
but not buffer/tensor operands.
Also drop the "ranked" part from the interface, i.e., do not distinguish
between ranked/unranked types.
The new function names describe the functions more accurately. They also
align their semantics with the notion of "tensor semantics" with the
bufferization framework. (An op is supposed to be bufferized if it has
tensor operands, and we don't care if it also has memref operands.)
This change is in preparation of #75273, which adds
`BufferizableOpInterface::hasTensorSemantics`. By renaming the functions
in the `DestinationStyleOpInterface`, we can avoid name clashes between
the two interfaces.
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.
Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.
Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.
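For example (illustrative only; the kind spelling follows the corresponding
arith op name):
```mlir
// Old: %m = vector.reduction <maxf>, %v : vector<4xf32> into f32
// New: the kind name maps 1:1 to the corresponding arith op (arith.maxnumf here).
%m = vector.reduction <maxnumf>, %v : vector<4xf32> into f32
```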
Issue: https://github.com/llvm/llvm-project/issues/72354
Updates the vectorisation of 1D depthwise convolution when flattening
the channel dimension (introduced in #71918). In particular - how the
convolution filter is "flattened". ATM, the vectoriser will use
`vector.shape_cast`:
```mlir
%b_filter = vector.broadcast %filter : vector<4xf32> to vector<3x2x4xf32>
%sc_filter = vector.shape_cast %b_filter : vector<3x2x4xf32> to vector<3x8xf32>
```
This lowering is not ideal - `vector.shape_cast` can be convenient when
it's folded away, but that's not happening in this case. Instead, this
patch updates the vectoriser to use `vector.shuffle` (the overall result
is identical):
```mlir
%sh_filter = vector.shuffle %filter, %filter
[0, 1, 2, 3, 0, 1, 2, 3] : vector<4xf32>, vector<4xf32>
%b_filter = vector.broadcast %sh_filter : vector<8xf32> to vector<3x8xf32>
```
The current vectorization of 1D depthwise convolutions in Linalg is
_sub-optimal_ for tensors with a low number of channels, e.g.:
```mlir
linalg.depthwise_conv_1d_nwc_wc
{dilations = dense<1> : vector<1xi64>,
strides = dense<1> : vector<1xi64>}
ins(%input, %filter : tensor<1x8x3xi8>, tensor<1x3xi8>)
outs(%output : tensor<1x8x3xi8>) -> tensor<1x8x3xi8>
```
That's due to the fact that ultimately (i.e. at LLVM level),
vectorization happens along the trailing dimension (i.e. the channel
dimension). In this case it leads to vectors with 3 elements (or worse,
if there's e.g. only 1 channel). For comparison, a 128-bit wide vector
register can hold 16 x i8.
Instead, this patch adds an option to flatten/collapse the channel
dimension into the width dimension of the input/filter/output using
`vector.shape_cast` operation:
```mlir
%sc_input = vector.shape_cast %input : vector<1x8x3xi8> to vector<1x24xi8>
%sc_output = vector.shape_cast %output : vector<1x8x3xi8> to vector<1x24xi8>
%b_filter = vector.broadcast %filter : vector<3xi8> to vector<1x8x3xi8>
%sc_filter = vector.shape_cast %b_filter : vector<1x8x3xi8> to vector<1x24xi8>
```
This new vectorization mode is implemented in `depthwiseConv` by
inserting `vector.shape_cast` Ops before and after
`depthwiseConv1dSliceAsMulAcc` is invoked. It can be selected through
e.g. a transform dialect attribute:
```mlir
transform.structured.vectorize_children_and_apply_patterns %conv {flatten_1d_depthwise_conv}
```
A forthcoming patch will implement a strategy to automatically switch
between the two implementations, depending on the shape of the input
tensors.
Co-authored by: Bradley Smith <bradley.smith@arm.com>