clang-p2996

Author	SHA1	Message	Date
Manupa Karunaratne	a6e72f9392	[MLIR][Vector] Add Lowering for vector.step (#113655 ) Currently, the lowering for vector.step lives under a folder. This is not ideal if we want to do transformation on it and defer the materizaliztion of the constants much later. This commits adds a rewrite pattern that could be used by using `transform.structured.vectorize_children_and_apply_patterns` transform dialect operation. Moreover, the rewriter of vector.step is also now used in -convert-vector-to-llvm pass where it handles scalable and non-scalable types as LLVM expects it. As a consequence of removing the vector.step lowering as its folder, linalg vectorization will keep vector.step intact.	2024-11-01 16:38:36 +00:00
Andrzej Warzyński	39ad84e4d1	[mlir][linalg] Split GenericPadOpVectorizationPattern into two patterns (#111349 ) At the moment, `GenericPadOpVectorizationPattern` implements two orthogonal transformations: 1. Rewrites `tensor::PadOp` into a sequence of `tensor::EmptyOp`, `linalg::FillOp` and `tensor::InsertSliceOp`. 2. Vectorizes (where possible) `tensor::InsertSliceOp` (see `tryVectorizeCopy`). This patch splits `GenericPadOpVectorizationPattern` into two separate patterns: 1. `GeneralizePadOpPattern` for the first transformation (note that currently `GenericPadOpVectorizationPattern` inherits from `GeneralizePadOpPattern`). 2. `InsertSliceVectorizePattern` to vectorize `tensor::InsertSliceOp`. With this change, we gain the following: * a clear separation between pre-processing and vectorization transformations/stages, * a path to support masked vectorisation for `tensor.insert_slice` (with a dedicated pattern for vectorization, it is much easier to specify the input vector sizes used in masking), * more opportunities to vectorize `tensor.insert_slice`. Note for downstream users: -------------------------- If you were using `populatePadOpVectorizationPatterns`, following this change you will also have to add `populateInsertSliceVectorizationPatterns`. Finer implementation details: ----------------------------- 1. The majority of changes in this patch are copy & paste + some edits. 1.1. The only functional change is that the vectorization of `tensor.insert_slice` is now broadly available (as opposed to being constrained to the pad vectorization pattern: `GenericPadOpVectorizationPattern`). 1.2. Following-on from the above, `@pad_and_insert_slice_dest` is updated. As expected, the input `tensor.insert_slice` Op is no longer "preserved" and instead gets vectorized successfully. 2. The `linalg.fill` case in `getConstantPadVal` works under the assumption that only _scalar_ source values can be used. That's consistent with the definition of the Op, but it's not tested at the moment. Hence a test case in Linalg/invalid.mlir is added. 3. The behaviour of the two TD vectorization Ops, `transform.structured.vectorize_children_and_apply_patterns` and `transform.structured.vectorize` is preserved.	2024-10-29 16:57:23 +00:00
Andrzej Warzyński	ac4bd74190	[mlir] Add apply_patterns.linalg.pad_vectorization TD Op (#112504 ) This PR simply wraps `populatePadOpVectorizationPatterns` into a new Transform Dialect Op: `apply_patterns.linalg.pad_vectorization`. This change makes it possible to run (and test) the corresponding patterns _without_: `transform.structured.vectorize_children_and_apply_patterns`. Note that the Op above only supports non-masked vectorisation (i.e. when the inputs are static), so, effectively, only fixed-width vectorisation (as opposed to scalable vectorisation). As such, this change is required to construct vectorization pipelines for tensor.pad targeting scalable vectors. To test the new Op and the corresponding patterns, I added "vectorization-pad-patterns.mlir" - most tests have been extracted from "vectorization-with-patterns.mlir".	2024-10-25 10:39:26 -07:00
Max191	2bff9d9ffe	[mlir] Don't hoist transfers from potentially zero trip loops (#112752 ) The hoistRedundantVectorTransfers function does not verification of loop bounds when hoisting vector transfers. This is not safe in general, since it is possible that the loop will have zero trip count. This PR uses ValueBounds to verify that the lower bound is less than the upper bound of the loop before hoisting. Trip count verification is currently behind an option `verifyNonZeroTrip`, which is false by default. Zero trip count loops can arise in GPU code generation, where a loop bound can be dependent on a thread id. If not all threads execute the loop body, then hoisting out of the loop can cause these threads to execute the transfers when they are not supposed to. --------- Signed-off-by: Max Dawkins <max.dawkins@gmail.com>	2024-10-18 16:11:21 -04:00
Andrzej Warzyński	a758bcdbd9	[mlir][td] Rename pack_paddings in structured.pad (#111036 ) The pack_paddings attribute in the structure.pad TD Op is used to set the `nofold` attribute in the generated tensor.pad Op. The current name is confusing and suggests that there's a relation with the tensor.pack Op. This patch renames it as `nofold_flags` to better match the actual usage.	2024-10-15 19:24:43 +01:00
Quinn Dawkins	9144fed31b	[mlir] Add option for a cleanup pattern set to SCF tiling helper (#109554 ) The SCF helper for tiling an operation implementing the TilingInterface and greedily fusing consumers requires an uninterrupted chain of operations implementing the tiling interface to succeed. There can be cases with intermediate ops that don't implement the interface but have producers that could be fused if various canonicalization/simplification patterns could run in between fusion steps. This adds an option to SCFTileAndFuseOptions for a pattern set to run between fusion steps to the ops that result from fusion/tiling. Removed and newly inserted slices are tracked for continued fusion applications. See this RFC for more discussion: https://discourse.llvm.org/t/rfc-split-fusion-portions-of-the-tilinginterface-into-a-new-interface/81155	2024-10-04 14:42:55 -04:00
Andrzej Warzyński	d9d623310d	[mlir][linalg] Add a new helper hook: `hasVectorizationImpl` (#110708 ) The newly added hook simply returns `false` for Ops for which there's no "vectorization logic" in the Linalg Vectorizer (i.e. the `vectorize()` method). It's added so that the following two TD ops expose identical level of functionality (that's not the case ATM): * `transform.structured.vectorize_children_and_apply_patterns` * `transform.structured.vectorize` Specifically, ATM, the former works only for Linalg Ops, while the latter works for all Ops that the vectorizer supports (). With this change, I am making sure that both TD will behave consistently. Note, this shouldn't affect any of the current uses of the vectorizer. () This is implemented via the `vectorize()` method in Vectorization.cpp.	2024-10-04 10:06:33 +01:00
Rolf Morel	94cf80d6fd	[MLIR][Linalg] Pattern to fold AddOp to accumulation via contraction op's dest (#110514 ) Replaces a linalg.add with one operand the single user of a contraction, which has a zero-filled, "identity-mapped" destination and is dominated by the `other` operand, by the contraction with `other` as its dest. Benefits include elision of an elementwise op, namely the linalg.add, and removing a tensor.empty as a destination which is likely to require an allocation upon bufferization.	2024-10-03 12:22:57 +02:00
Kazu Hirata	b52885bc23	[mlir] Use std::optional::value_or (NFC) (#109893 )	2024-09-26 09:53:43 -07:00
Hugo Trachino	28039055e5	[MLIR][Transform] Hoist Pad generates linalg.transpose (#109669 ) For readability purpose, generate linalg named ops when possible. For maintainability purpose, get rid of duplicated code.	2024-09-26 09:33:47 +01:00
JOE1994	884221eddb	[mlir] Tidy uses of llvm::raw_stream_ostream (NFC) As specified in the docs, 1) raw_string_ostream is always unbuffered and 2) the underlying buffer may be used directly ( `65b13610a5` for further reference ) * Don't call raw_string_ostream::flush(), which is essentially a no-op. * Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.	2024-09-16 23:23:25 -04:00
Andrzej Warzyński	42944da5ba	[mlir][vector] Group re-order patterns together (#102856 ) Group all patterns that re-order vector.transpose and vector.broadcast Ops () under `populateSinkVectorOpsPatterns`. These patterns are normally used to "sink" redundant Vector Ops, hence grouping together. Example: ```mlir %at = vector.transpose %a, [1, 0]: vector<4x2xf32> to vector<2x4xf32> %bt = vector.transpose %b, [1, 0]: vector<4x2xf32> to vector<2x4xf32> %r = arith.addf %at, %bt : vector<2x4xf32> ``` would get converted to: ```mlir %0 = arith.addf %a, %b : vector<4x2xf32> %r = vector.transpose %0, [1, 0] : vector<2x4xf32> ``` This patch also moves all tests for these patterns so that all of them are: run under one test-flag: `test-vector-sink-patterns`, * located in one file: "vector-sink.mlir". To facilitate this change: * `-test-sink-vector-broadcast` is renamed as `test-vector-sink-patterns`, * "sink-vector-broadcast.mlir" is renamed as "vector-sink.mlir", * tests for `ReorderCastOpsOnBroadcast` and `ReorderElementwiseOpsOnTranspose` patterns are moved from "vector-reduce-to-contract.mlir" to "vector-sink.mlir", * `ReorderElementwiseOpsOnTranspose` patterns are removed from `populateVectorReductionToContractPatterns` and added to (newly created) `populateSinkVectorOpsPatterns`, * `ReorderCastOpsOnBroadcast` patterns are removed from `populateVectorReductionToContractPatterns` - these are already present in `populateSinkVectorOpsPatterns`. This should allow us better layering and more straightforward testing. For the latter, the goal is to be able to easily identify which pattern a particular test is exercising (especially when it's a specific pattern). NOTES FOR DOWNSTREAM USERS In order to preserve the current functionality, please make sure to add * `populateSinkVectorOpsPatterns`, wherever you are using `populateVectorReductionToContractPatterns`. Also, rename `populateSinkVectorBroadcastPatterns` as `populateSinkVectorOpsPatterns`. (*) I didn't notice any other re-order patterns.	2024-08-16 16:53:53 +01:00
Hsiangkai Wang	c4bf949171	[mlir][linalg] Implement TilingInterface for winograd operators (#96184 ) In order to support arbitrary size input data of conv2d, implement TilingInterface for winograd operations. Before converting winograd operations into nested loops with matrix multiply, tile the input of conv2d into the supported size first. Add a transform operation structured.decompose_winograd_op to decompose winograd operations. Before applying the transform op, use tile_using_for to tile the input data into supported size. The test case shows how to tile and decompose winograd operations.	2024-08-16 16:22:02 +01:00
Nikhil Kalra	84cc1865ef	[mlir] Support DialectRegistry extension comparison (#101119 ) `PassManager::run` loads the dependent dialects for each pass into the current context prior to invoking the individual passes. If the dependent dialect is already loaded into the context, this should be a no-op. However, if there are extensions registered in the `DialectRegistry`, the dependent dialects are unconditionally registered into the context. This poses a problem for dynamic pass pipelines, however, because they will likely be executing while the context is in an immutable state (because of the parent pass pipeline being run). To solve this, we'll update the extension registration API on `DialectRegistry` to require a type ID for each extension that is registered. Then, instead of unconditionally registered dialects into a context if extensions are present, we'll check against the extension type IDs already present in the context's internal `DialectRegistry`. The context will only be marked as dirty if there are net-new extension types present in the `DialectRegistry` populated by `PassManager::getDependentDialects`. Note: this PR removes the `addExtension` overload that utilizes `std::function` as the parameter. This is because `std::function` is copyable and potentially allocates memory for the contained function so we can't use the function pointer as the unique type ID for the extension. Downstream changes required: - Existing `DialectExtension` subclasses will need a type ID to be registered for each subclass. More details on how to register a type ID can be found here: `8b68e06731/mlir/include/mlir/Support/TypeID.h (L30)` - Existing uses of the `std::function` overload of `addExtension` will need to be refactored into dedicated `DialectExtension` classes with associated type IDs. The attached `std::function` can either be inlined into or called directly from `DialectExtension::apply`. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-08-06 01:32:36 +02:00
Kazu Hirata	5262865aac	[mlir] Construct SmallVector with ArrayRef (NFC) (#101896 )	2024-08-04 11:43:05 -07:00
MaheshRavishankar	6740d701bd	[mlir][Linalg] Deprecate `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` (#91878 ) The implementation of these methods are legacy and they are removed in favor of using the `scf::tileUsingSCF` methods as replacements. To get the latter on par with requirements of the deprecated methods, the tiling allows one to specify the maximum number of tiles to use instead of specifying the tile sizes. When tiling to `scf.forall` this specification is used to generate the `num_threads` version of the operation. A slight deviation from previous implementation is that the deprecated method always generated the `num_threads` variant of the `scf.forall` operation. Instead now this is driven by the tiling options specified. This reduces the indexing math generated when the tile sizes are specified. Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF` ``` OpBuilder b; TilingInterface op; ArrayRef<OpFoldResult> numThreads; ArrayAttr mapping; FailureOr<ForallTilingResult> result =linalg::tileToForallOp(b, op, numThreads, mapping); ``` can be replaced by ``` scf::SCFTilingOptions options; options.setNumThreads(numThreads); options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp); options.setMapping(mapping.getValue()); /note the difference that setMapping takes an ArrayRef<Attribute> / FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options); ``` This generates the `numThreads` version of the `scf.forall` for the inter-tile loops, i.e. ``` ... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...) ``` Moving from `linalg::tileToForallOpUsingTileSizes` to `scf::tileUsingSCF` ``` OpBuilder b; TilingInterface op; ArrayRef<OpFoldResult> tileSizes; ArrayAttr mapping; FailureOr<ForallTilingResult> result =linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping); ``` can be replaced by ``` scf::SCFTilingOptions options; options.setTileSizes(tileSizes); options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp); options.setMapping(mapping.getValue()); /note the difference that setMapping takes an ArrayRef<Attribute> / FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options); ``` Also note that `linalg::tileToForallOpUsingTileSizes` would effectively call the `linalg::tileToForallOp` by computing the `numThreads` from the `op` and `tileSizes` and generate the `numThreads` version of the `scf.forall`. That is not the case anymore. Instead this will directly generate the `tileSizes` version of the `scf.forall` op ``` ... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...) ``` If you actually want to use the `numThreads` version, it is upto the caller to compute the `numThreads` and set `options.setNumThreads` instead of `options.setTileSizes`. Note that there is a slight difference in the num threads version and tile size version. The former requires an additional `affine.max` on the tile size to ensure non-negative tile sizes. When lowering to `numThreads` version this `affine.max` is not needed since by construction the tile sizes are non-negative. In previous implementations, the `numThreads` version generated when using the `linalg::tileToForallOpUsingTileSizes` method would avoid generating the `affine.max` operation. To get the same state, downstream users will have to additionally normalize the `scf.forall` operation. Changes to `transform.structured.tile_using_forall` The transform dialect op that called into `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` have been modified to call `scf::tileUsingSCF`. The transform dialect op always generates the `numThreads` version of the `scf.forall` op. So when `tile_sizes` are specified for the transform dialect op, first the `tile_sizes` version of the `scf.forall` is generated by the `scf::tileUsingSCF` method which is then further normalized to get back to the same state. So there is no functional change to `transform.structured.tile_using_forall`. It always generates the `numThreads` version of the `scf.forall` op (as it did before this change). --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2024-07-31 12:32:07 -07:00
Felix Schneider	ef8819e256	[mlir] Extend `tile_using_for` verifier to fix a crash (#98366 ) This patch adds a check for the correct number of `loops` results of the `transform.structured.tile_using_for` Op to the verifier, fixing a crash. Fix https://github.com/llvm/llvm-project/issues/98008	2024-07-12 09:05:06 +02:00
Hsiangkai Wang	d9c26b9d56	[mlir][linalg] Add transform operator for Winograd Conv2D algorithm (#96182 ) Add a transform operation structured.winograd_conv2d to convert linalg.conv_2d_nhwc_fhwc to Linalg winograd operations. Reviewers: ftynse, Max191, GeorgeARM, nicolasvasilache, MaheshRavishankar, dcaballe, rengolin Reviewed By: ftynse, Max191 Pull Request: https://github.com/llvm/llvm-project/pull/96182	2024-07-11 14:45:36 +01:00
Matthias Springer	f2d3d829b9	[mlir][linalg][Transform] Fix use-after-free in `SplitOp::apply` (#96390 ) Detected with ASAN. `Operation::getLoc()` was called after erasing the operation. Reverts `48cf6b6bbe`, which attempted to fix the use-after-free. (But the use-after-free is still there when the `hasFailed` branch is taken.)	2024-06-24 21:35:58 +02:00
Christian Sigg	48cf6b6bbe	[mlir] Fix use-after-free introduced in `a9efcbf490`.	2024-06-23 18:56:23 +02:00
muneebkhan85	a9efcbf490	[MLIR] Add continuous tiling to transform dialect (#82792 ) This patch enables continuous tiling of a target structured op using diminishing tile sizes. In cases where the tensor dimensions are not exactly divisible by the tile size, we are left with leftover tensor chunks that are irregularly tiled. This approach enables tiling of the leftover chunk with a smaller tile size and repeats this process recursively using exponentially diminishing tile sizes. This eventually generates a chain of loops that apply tiling using diminishing tile sizes. Adds `continuous_tile_sizes` op to the transform dialect. This op, when given a tile size and a dimension, computes a series of diminishing tile sizes that can be used to tile the target along the given dimension. Additionally, this op also generates a series of chunk sizes that the corresponding tile sizes should be applied to along the given dimension. Adds `multiway` attribute to `transform.structured.split` that enables multiway splitting of a single target op along the given dimension, as specified in a list enumerating the chunk sizes.	2024-06-21 16:39:43 +02:00
donald chen	2c1ae801e1	[mlir][side effect] refactor(*): Include more precise side effects (#94213 ) This patch adds more precise side effects to the current ops with memory effects, allowing us to determine which OpOperand/OpResult/BlockArgument the operation reads or writes, rather than just recording the reading and writing of values. This allows for convenient use of precise side effects to achieve analysis and optimization. Related discussions: https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243	2024-06-19 22:10:34 +08:00
MaheshRavishankar	b99d0b3440	[mlir][TilingInterface] Update `PartialReductionOpInterface` to get it more in line with `TilingInterface`. (#95460 ) The `TilingInterface` methods have return values that allow the interface implementation to return multiple operations, and also return tiled values explicitly. This is to avoid the assumption that the interface needs to return a single operation and this operations result are the expected tiled values. Make the `PartialReductionOpInterface::tileToPartialReduction` return `TilingResult` as well for the same reason. Similarly make the `PartialReductionOpInterface::mergeReductions` also return a list of generated operations and values to use as replacements. This is just a refactoring to allow for deprecation of `linalg::tileReductionUsingForall` with `scf::tileReductionUsingSCF` method.	2024-06-18 09:07:29 -07:00
Ramkumar Ramachandra	0fb216fb2f	mlir/MathExtras: consolidate with llvm/MathExtras (#95087 ) This patch is part of a project to move the Presburger library into LLVM.	2024-06-11 23:00:02 +01:00
Kunwar Grover	9329b20d5d	[mlir][TilingInterface] Allow multiple results in PartialReductionOpInterface (#92624 ) This patch adds support for reducing operations with multiple results using PartialReductionOpInterface. Also adds an implementation of PartialReductionOpInterface for multiple results for linalg.generic.	2024-05-22 19:21:20 +01:00
srcarroll	2c1c67674c	[mlir][transform] Consistent `linalg` `transform` op syntax for dynamic index lists (#90897 ) This patch is a first pass at making consistent syntax across the `LinalgTransformOp`s that use dynamic index lists for size parameters. Previously, there were two different forms: inline types in the list, or place them in the functional style tuple. This patch goes for the latter. In order to do this, the `printPackedOrDynamicIndexList`, `printDynamicIndexList` and their `parse` counterparts were modified so that the types can be optionally provided to the corresponding custom directives. All affected ops now use tablegen `assemblyFormat`, so custom `parse`/`print` functions have been removed. There are a couple ops that will likely add dynamic size support, and once that happens it should be made sure that the assembly remains consistent with the changes in this patch. The affected ops are as follows: `pack`, `pack_greedily`, `tile_using_forall`. The `tile_using_for` and `vectorize` ops already used this syntax, but their custom assembly was removed. --------- Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>	2024-05-08 09:11:53 -05:00
Kazu Hirata	64ee821fad	[mlir] Fix a warning This patch fixes: llvm-project/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp:1715:28: error: unused variable 'kPadToMultipleOfKeyword' [-Werror,-Wunused-const-variable]	2024-05-04 16:53:48 -07:00
srcarroll	f2f65eddc5	[mlir][transform] Add support for transform.param pad multiples in `PadOp` (#90755 ) This patch modifies the definition of `PadOp` to take transform params and handles for the `pad_to_multiple_of` operand. --------- Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>	2024-05-04 17:34:40 -05:00
Cullen Rhodes	be1c72d2ba	[mlir][linalg] Move transpose_matmul to targeted transform op (#89717 ) More targeted than a blanket "apply everywhere" pattern. Follow up to #89075 to address @ftynse's feedback.	2024-04-23 10:52:50 +01:00
Cullen Rhodes	7922534974	[mlir][linalg] Add patterns to convert matmul to transposed variants (#89075 ) This adds patterns to convert from the Linalg matmul and batch_matmul ops to the transposed variants. By default the LHS matrix is transposed. Our work enabling a lowering path from linalg.matmul to ArmSME has revealed the current lowering results in non-contiguous memory accesses for the A matrix and very poor performance. These patterns provide a simple option to fix this.	2024-04-23 07:21:06 +01:00
Steven Varoumas	35b292efc6	[mlir][Hoisting] Hoisting vector.extract/vector.broadcast pairs (#86108 ) This transformation, inspired by what is done in hoist_redundant_transfers, hoists pairs of extract/broadcast operations out of scf.for loops. It changes a loop of the form: ``` %res = scf.for _ = _ to _ step _ iter_args(%iarg = %v) -> (t1) { %e = vector.extract %iarg : t1 to t2 %u = "some_use"(%e) : (t2) -> t2 %b = vector.broadcast %u : t2 to t1 scf.yield %b : t1 } ``` into the following: ``` %e = vector.extract %v: t1 to t2 %res' = scf.for _ = _ to _ step _ iter_args(%iarg = %e) -> (t2) { %u' = "some_use"(%iarg) : (t2) -> t2 scf.yield %u' : t2 } %res = vector.broadcast %res' : t2 to t1 ```	2024-04-22 14:54:01 +02:00
Christian Sigg	a5757c5b65	Switch member calls to `isa/dyn_cast/cast/...` to free function calls. (#89356 ) This change cleans up call sites. Next step is to mark the member functions deprecated. See https://mlir.llvm.org/deprecation and https://discourse.llvm.org/t/preferred-casting-style-going-forward.	2024-04-19 15:58:27 +02:00
srcarroll	b79db39659	[mlir][linalg] Support `ParamType` in `vector_sizes` option of `VectorizeOp` transform (#87557 )	2024-04-09 15:52:40 -05:00
Oleksandr "Alex" Zinenko	91856b34e3	[mlir] move MatchOpInterface under Transform/Interfaces (#86899 ) This is similar to the TransformOpInterface move.	2024-03-28 14:00:22 +01:00
srcarroll	bbcfe6f431	[mlir][transform] Emit error message with `emitSilenceableFailure` (#86146 ) The previous implementation used a `notifyMatchFailure` to emit failure message inappropriately and then used the `emitDefaultSilenceableFailure`. This patch changes this to use the more appropriate `emitSilenceableFailure` with error message. Additionally a failure test has been added.	2024-03-22 12:37:39 -05:00
srcarroll	df9ed9cf52	[mlir][transform] Fix failure in flattening already flattened linalg ops (#86037 ) The previous implementation was doing an early successful return on `rank <= 1` without adding the original op to transform results. This resulted in errors about number of returns. This patch fixes this by adding the original op to results. Additionally, we first check if op is elementwise and return a slienceable failure early if not.	2024-03-21 00:25:07 -05:00
Oleksandr "Alex" Zinenko	5a9bdd85ee	[mlir] split transform interfaces into a separate library (#85221 ) Transform interfaces are implemented, direction or via extensions, in libraries belonging to multiple other dialects. Those dialects don't need to depend on the non-interface part of the transform dialect, which includes the growing number of ops and transitive dependency footprint. Split out the interfaces into a separate library. This in turn requires flipping the dependency from the interface on the dialect that has crept in because both co-existed in one library. The interface shouldn't depend on the transform dialect either. As a consequence of splitting, the capability of the interpreter to automatically walk the payload IR to identify payload ops of a certain kind based on the type used for the entry point symbol argument is disabled. This is a good move by itself as it simplifies the interpreter logic. This functionality can be trivially replaced by a `transform.structured.match` operation.	2024-03-20 22:15:17 +01:00
lhunloh	47bc565ca7	[MLIR] [Transforms] Let `transform.structured.convert_to_loops` return handles to loops (#83984 ) This lets `transform.structured.convert_to_loops` return handles to the generated loops, making this transformation more useful to use for (transformation-)nesting purposes. This is modelled after SCFs `transform.loop.forall_to_for` which returns handles to loops. Introduced in commit `aa2a96a24a`, with a note that they might move out of the `Linalg`-Dialect, but no reason given for the non-return of handles. As far as I can see, this transform always returns loops.	2024-03-06 17:07:30 -05:00
Congcong Cai	0597644a64	[mlir][transform] replace original op to loop ops (#83537 )	2024-03-05 03:58:12 +08:00
srcarroll	b6f4dd9ee8	[mlir][transform] Implement `FlattenElementwiseLinalgOp` transform op (#81431 ) A `transform.structured.flatten_elementwise` op is implemented for flattening the iteration space and (applicable) operands/results to a single dimension.	2024-02-28 11:19:06 -06:00
Balaji V. Iyer	adf838daee	[mlir][Vectorizer] Added support to Vectorize tensor.unpack (#76087 ) Added support to vectorized tensor.unpack. The unpack Op is split into a `vector.transfer_read`, `vector.transpose`, `vector.shape_cast` and a `vector.transfer_write`.	2024-02-20 16:10:14 -06:00
Matthias Springer	914e607487	[mlir][IR][NFC] Rename `notifyRemoved` to `notifyErased` (#82253 ) Rename listener callback names: * `notifyOperationRemoved` -> `notifyOperationErased` * `notifyBlockRemoved` -> `notifyBlockErased` The current callback names are misnomers. The callbacks are triggered when an operation/block is erased, not when it is removed (unlinked). E.g.: ```c++ /// Notify the listener that the specified operation is about to be erased. /// At this point, the operation has zero uses. /// /// Note: This notification is not triggered when unlinking an operation. virtual void notifyOperationErased(Operation *op) {} ``` This change is in preparation of adding listener support to the dialect conversion. The dialect conversion internally unlinks IR before erasing it at a later point of time. There is an important difference between "remove" and "erase". Lister callback names should be accurate to avoid confusion.	2024-02-20 09:08:19 +01:00
Max191	7880b2c858	[mlir] Add direct vectorization lowering for `tensor.pack` ops (#78660 ) This PR adds a direct vectorization lowering of `tensor.pack` into `mask(vector.transfer_read)`->`vector.shape_cast`->`vector.transpose`->`vector.transfer_write`.	2024-02-07 14:11:11 -05:00
srcarroll	488f88b844	[mlir][transform] Add elementwise criteria to `match.structured.body` (#79626 ) As far as I am aware, there is no simple way to match on elementwise ops. I propose to add an `elementwise` criteria to the `match.structured.body` op. Although my only hesitation is that elementwise is not only determined by the body, but also the indexing maps. So if others find this too awkward, I can implement a separate match op instead.	2024-01-31 10:12:33 +01:00
jinchen62	d439f3640b	Add support of param type for transform.structured.tile_using_forall (#72097 ) Make transform.structured.tile_using_forall be able to take param type tile sizes. Examples: ``` %tile_sizes = transform.param.constant 16 : i64 -> !transform.param<i64> transform.structured.tile_using_forall %matmul tile_sizes [%tile_sizes : !transform.param<i64>, 32] ( mapping = [#gpu.block<x>, #gpu.block<y>] ) : (!transform.any_op) -> (!transform.any_op, !transform.any_op) ``` ``` %c10 = transform.param.constant 10 : i64 -> !transform.any_param %c20 = transform.param.constant 20 : i64 -> !transform.any_param %tile_sizes = transform.merge_handles %c10, %c20 : !transform.any_param transform.structured.tile_using_forall %matmul tile_sizes *(%tile_sizes : !transform.any_param) ( mapping = [#gpu.block<x>, #gpu.block<y>] ) : (!transform.any_op) -> (!transform.any_op, !transform.any_op) ```	2024-01-31 10:02:39 +01:00
MaheshRavishankar	76ead96c1d	[mlir][TilingInterface] Use `LoopLikeOpInterface` in tiling using SCF to unify tiling with `scf.for` and `scf.forall`. (#77874 ) Using `LoopLikeOpInterface` as the basis for the implementation unifies all the tiling logic for both `scf.for` and `scf.forall`. The only difference is the actual loop generation. This is a follow up to https://github.com/llvm/llvm-project/pull/72178 Instead of many entry points for each loop type, the loop type is now passed as part of the options passed to the tiling method. This is a breaking change with the following changes 1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF` 2) The `scf::tileUsingSCFForallOp` is deprecated. The same functionality is obtained by using `scf::tileUsingSCF` and setting the loop type in `scf::SCFTilingOptions` passed into this method to `scf::SCFTilingOptions::LoopType::ForallOp` (using the `setLoopType` method). 3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing any strategy with the default callback implemeting the greedy fusion. 4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use `SmallVector<LoopLikeOpInterface>`. 5) To make `scf::ForallOp` implement the parts of `LoopLikeOpInterface` needed, the `getOutputBlockArguments()` method is replaced with `getRegionIterArgs()` These changes now bring the tiling and fusion capabilities using `scf.forall` on par with what was already supported by `scf.for`	2024-01-25 21:26:23 -08:00
Matthias Springer	5cc0f76d34	[mlir][IR] Add rewriter API for moving operations (#78988 ) The pattern rewriter documentation states that "all IR mutations [...] are required to be performed via the `PatternRewriter`." This commit adds two functions that were missing from the rewriter API: `moveOpBefore` and `moveOpAfter`. After an operation was moved, the `notifyOperationInserted` callback is triggered. This allows listeners such as the greedy pattern rewrite driver to react to IR changes. This commit narrows the discrepancy between the kind of IR modification that can be performed and the kind of IR modifications that can be listened to.	2024-01-25 11:01:28 +01:00
Mehdi Amini	de5cedefab	Apply clang-tidy fixes for llvm-qualified-auto in LinalgTransformOps.cpp (NFC)	2024-01-18 16:39:20 -08:00
Mehdi Amini	42427d805a	Apply clang-tidy fixes for bugprone-macro-parentheses in LinalgTransformOps.cpp (NFC)	2024-01-18 16:39:20 -08:00
Quinn Dawkins	5caab8bbc0	[mlir][transform] Add transform.get_operand op (#78397 ) Similar to `transform.get_result`, except it returns a handle to the operand indicated by a positional specification, same as is defined for the linalg match ops. Additionally updates `get_result` to take the same positional specification. This makes the use case of wanting to get all of the results of an operation easier by no longer requiring the user to reconstruct the list of results one-by-one.	2024-01-18 09:33:14 -05:00

1 2 3 4 5 ...

307 Commits