clang-p2996

Author	SHA1	Message	Date
Aviad Cohen	ccc02563f4	[mlir][linalg]: Fixed possible memory leak in cloneToCollapsedOp (#87595 ) * Direct call to `clone` function leads to memory leak. Instead, we should use the `RewriterBase` clone function.	2024-04-07 08:23:16 +03:00
Matthias Springer	5e4a44380e	[mlir][Interfaces][NFC] `ValueBoundsConstraintSet`: Pass stop condition in the constructor (#86099 ) This commit changes the API of `ValueBoundsConstraintSet`: the stop condition is now passed to the constructor instead of `processWorklist`. That makes it easier to add items to the worklist multiple times and process them in a consistent manner. The current `ValueBoundsConstraintSet` is passed as a reference to the stop function, so that the stop function can be defined before the `ValueBoundsConstraintSet` is constructed. This change is in preparation for adding support for branches.	2024-04-04 17:05:47 +09:00
Matthias Springer	a27d886ce4	[mlir][linalg][bufferize] Fix element-wise access optimization for sparse tensors (#87305 ) `linalg.generic` ops with sparse tensors do not necessarily bufferize to element-wise access, because insertions into a sparse tensor may change the layout of (or reallocate) the underlying sparse data structures.	2024-04-03 09:57:25 +09:00
Jakub Kuderski	971b852546	[mlir][NFC] Simplify type checks with isa predicates (#87183 ) For more context on isa predicates, see: https://github.com/llvm/llvm-project/pull/83753.	2024-04-01 11:40:09 -04:00
Jerry Wu	0c1c0d5393	[MLIR] Add patterns to bubble-up pack and push-down unpack through collapse/expand shape ops (#85297 ) Add DataLayoutPropagation patterns to bubble-up pack and push-down unpack through collapse/expand shape ops. --------- Co-authored-by: Quinn Dawkins <quinn.dawkins@gmail.com>	2024-03-27 21:32:27 -04:00
Pablo Antonio Martinez	c41286af3f	[mlir][linalg] Emit a warning when tile_using_forall generates non thread-safe code (#80813 ) Description The documentation of `transform.structured.tile_using_forall` says: _"It is the user’s responsibility to ensure that num_threads/tile_sizes is a valid tiling specification (i.e. that only tiles parallel dimensions, e.g. in the Linalg case)."_ In other words, tiling a non-parallel dimension would generate code with data races which is not safe to parallelize. For example, consider this case (included in the tests in this PR): ``` func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> { %0 = scf.forall (%arg2) in (8) shared_outs(%arg3 = %arg1) -> (tensor<300x8xf32>) { %1 = affine.min #map(%arg2) %2 = affine.max #map1(%1) %3 = affine.apply #map2(%arg2) %extracted_slice = tensor.extract_slice %arg0[%3, 0, 0] [%2, 300, 8] [1, 1, 1] : tensor<100x300x8xf32> to tensor<?x300x8xf32> %4 = linalg.generic {indexing_maps = [#map3, #map4], iterator_types = ["reduction", "parallel", "parallel"]} ins(%extracted_slice : tensor<?x300x8xf32>) outs(%arg3 : tensor<300x8xf32>) { ^bb0(%in: f32, %out: f32): %5 = arith.addf %in, %out : f32 linalg.yield %5 : f32 } -> tensor<300x8xf32> scf.forall.in_parallel { tensor.parallel_insert_slice %4 into %arg3[0, 0] [300, 8] [1, 1] : tensor<300x8xf32> into tensor<300x8xf32> } } return %0 : tensor<300x8xf32> } ``` We can easily see that this is not safe to parallelize because all threads would be writing to the same position in `%arg3` (in the `scf.forall.in_parallel`). This PR detects whether it's safe to `tile_using_forall` and emits a warning in the case it is not. Brief explanation It first generates a vector of affine expressions representing the tile values and stores it in `dimExprs`. These affine expressions are compared with the affine expressions coming from the results of the affine map of each output in the linalg op. 
So going back to the previous example, the original transform is: ``` #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)> #map1 = affine_map<(d0, d1, d2) -> (d1, d2)> func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> { // expected-warning@+1 {{tiling is not thread safe at axis #0}} %0 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction", "parallel", "parallel"]} ins(%arg0 : tensor<100x300x8xf32>) outs(%arg1 : tensor<300x8xf32>) { ^bb0(%in: f32, %out: f32): %1 = arith.addf %in, %out : f32 linalg.yield %1 : f32 } -> tensor<300x8xf32> return %0 : tensor<300x8xf32> } module attributes {transform.with_named_sequence} { transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) { %0 = transform.structured.match ops{["linalg.generic"]} in %arg0 : (!transform.any_op) -> !transform.any_op %forall, %tiled_generic = transform.structured.tile_using_forall %0 num_threads [8] : (!transform.any_op) -> (!transform.any_op, !transform.any_op) transform.yield } } ``` The `num_threads` attribute would be represented as `(d0)`. Because the linalg op has only one output (`arg1`) it would only check against the results of `#map1`, which are `(d1, d2)`. The idea is to check that all affine expressions in `dimExprs` are present in the output affine map. In this example, `d0` is not in `(d1, d2)`, so tiling that axis is considered not thread safe.	2024-03-22 12:53:29 +01:00
Andrzej Warzyński	c56bd7ab79	[mlir][linalg] Enable masked vectorisation for depthwise convolutions (#81625 ) This patch adds support for masked vectorisation of depthwise 1D WC convolutions, `linalg.depthwise_conv_1d_nwc_wc`. This is implemented by adding support for masking. Two major assumptions are made: * only the channel dimension can be dynamic/scalable (i.e. the trailing dim), * when specifying vector sizes to use in the vectoriser, only the size corresponding to the channel dim is effectively used (other dims are inferred from the context). In terms of scalable vectorisation, this should be sufficient to cover all practical cases (i.e. making arbitrary dim scalable wouldn't make much sense). As for more generic cases with dynamic shapes (e.g. W or N dims being dynamic), more work would be needed. In particular, one would have to consider the filter and input/output tensors separately.	2024-03-14 20:19:46 +00:00
Quinn Dawkins	60e562d11a	[mlir][linalg] Add unit dim folding pattern for tensor.pad (#84684 ) Unit extent dims that are not padded by a tensor.pad can be folded away. When folding unit extent dims of surrounding linalg ops, this increases the chance that the iteration space of the linalg op will align with nearby pad ops, improving fusion opportunities.	2024-03-11 18:24:23 -04:00
Matthias Springer	f1aa783788	[mlir][IR] Fix overload resolution on MSVC build (#84589 ) #82629 added additional overloads to `replaceAllUsesWith` and `replaceUsesWithIf`. This caused a build breakage with MSVC when called with ops that can implicitly convert to `Value`. ``` external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(881): error C2666: 'mlir::RewriterBase::replaceAllUsesWith': 2 overloads have similar conversions external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(631): note: could be 'void mlir::RewriterBase::replaceAllUsesWith(mlir::Operation *,mlir::ValueRange)' external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(626): note: or 'void mlir::RewriterBase::replaceAllUsesWith(mlir::ValueRange,mlir::ValueRange)' external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(616): note: or 'void mlir::RewriterBase::replaceAllUsesWith(mlir::Value,mlir::Value)' external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(882): note: while trying to match the argument list '(mlir::tensor::ExtractSliceOp, T)' with [ T=mlir::Value ] ``` Note: The LLVM build bots (Linux and Windows) did not break, this seems to be an issue with `Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe`. This change renames the newly added overloads to `replaceAllOpUsesWith` and `replaceOpUsesWithIf`.	2024-03-11 17:36:18 +09:00
Justin Lebar	fab2bb8bfd	Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678 ) For some reason this was missing from STLExtras.	2024-03-10 20:00:13 -07:00
Jie Fu	474a73d979	[mlir] Fix build failure in MeshShardingInterfaceImpl.cpp (NFC) llvm-project/mlir/lib/Dialect/Linalg/Transforms/MeshShardingInterfaceImpl.cpp:96:8: error: unused variable 'resultElementType' [-Werror,-Wunused-variable] Type resultElementType = ^ llvm-project/mlir/lib/Dialect/Linalg/Transforms/MeshShardingInterfaceImpl.cpp:122:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type] } ^ 2 errors generated.	2024-03-08 09:38:29 +08:00
Boian Petkantchin	fb582b6ace	[mlir] Implement Mesh's ShardingInterface for Linalg ops (#82284 ) Allows linalg structured operations to be handled during spmdization and sharding propagation. There is only support for projected permutation indexing maps.	2024-03-07 17:05:44 -08:00
Matthias Springer	59a92019fb	[mlir][IR] Make `replaceOp` / `replaceAllUsesWith` API consistent (#82629 ) * `replaceOp` replaces all uses of the original op and erases the old op. * `replaceAllUsesWith` replaces all uses of the original op/value/block. It does not erase any IR. This commit renames `replaceOpWithIf` to `replaceUsesWithIf`. `replaceOpWithIf` was a misnomer because the function never erases the original op. Similarly, `replaceOpWithinBlock` is renamed to `replaceUsesWithinBlock`. (No "operation replaced" is sent because the op is not erased.) Also improve comments.	2024-03-07 10:26:22 +09:00
Thomas Preud'homme	a9304edf20	Fix remaining build failures with GCC 8.3 (#83266 ) When compiling for GCC 8.x (< 8.4), SFINAE is disabled for iterator_range constructor causing ambiguous resolution to construct an OperandRange from a MutableOperandRange, even in the presence of a static_cast<OperandRange>. This adds an explicit conversion method to lift the ambiguity. Tested with a full MLIR build with GCC 8.3.	2024-03-05 19:32:27 +00:00
Quinn Dawkins	3f18f6a2cf	[mlir][linalg] Enable fusion by expansion of reduction and named ops (#83473 ) This adds support for expansion of named linalg ops and linalg ops with reduction iterators. This improves the ability to make fusion decisions WRT reduction operations. To recover the previous behavior, users of the patterns can add a control function to restrict propagation of reshape by expansion through linalg ops with reduction iterators. For named linalg ops, this always converts the named op into a generic.	2024-03-03 01:54:03 -05:00
Han-Chung Wang	46bd65a050	[mlir][LinAlg] Vectorize reverse-like ops using vector.gather ops. (#83205 ) The reverse op is treated as a VectorMemoryAccessKind::Contiguous load. It is a contiguous slice, but we'll need to compute indices differently and apply a reverse at vector level. That approach takes non-trivial effort. The revision flips the case to use vector.gather. Otherwise there are functionality issues. E.g., the below example loaded `2, 3, 4` (which is a bug), but what we want is `2, 1, 0`. Before vectorization: ```mlir func.func @vectorize_reverse_like_tensor_extract(%arg0: tensor<1x2x3xf32>, %arg1: tensor<1x1x3xf32>, %arg2: index) -> tensor<1x1x3xf32> { %c1 = arith.constant 1 : index %c0 = arith.constant 0 : index %c2 = arith.constant 2 : index %0 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel", "parallel", "parallel"]} outs(%arg1 : tensor<1x1x3xf32>) { ^bb0(%out: f32): %1 = linalg.index 1 : index %2 = linalg.index 0 : index %3 = affine.apply #map1(%1, %2, %arg2) %4 = linalg.index 2 : index %5 = arith.subi %c2, %4 : index %extracted = tensor.extract %arg0[%c0, %3, %5] : tensor<1x2x3xf32> linalg.yield %extracted : f32 } -> tensor<1x1x3xf32> return %0 : tensor<1x1x3xf32> } ``` Partial IR after vectorization: ``` %5 = vector.constant_mask [1, 1, 3] : vector<1x1x4xi1> %6 = vector.broadcast %arg0 : index to vector<1x1x4xindex> %7 = vector.shape_cast %6 : vector<1x1x4xindex> to vector<4xindex> %8 = vector.extractelement %7[%c0_i32 : i32] : vector<4xindex> %9 = vector.transfer_read %3[%c0, %8, %c2], %cst, %5 {in_bounds = [true, true, true]} : tensor<1x2x3xf32>, vector<1x1x4xf32> ```	2024-02-28 09:45:09 -08:00
srcarroll	b6f4dd9ee8	[mlir][transform] Implement `FlattenElementwiseLinalgOp` transform op (#81431 ) A `transform.structured.flatten_elementwise` op is implemented for flattening the iteration space and (applicable) operands/results to a single dimension.	2024-02-28 11:19:06 -06:00
Quinn Dawkins	1e98d4883d	[mlir][linalg] NFC: Use tablegen macro for pass constructors (#82892 ) This uses the tablegen macros for generating pass constructors, exposing pass options for fold-unit-extent-dims and linalg-detensorize. Additionally aligns some of the pass namings to their text counterpart. This includes an API change: createLinalgGeneralizationPass -> createLinalgGeneralizeNamedOpsPass	2024-02-24 14:35:39 -05:00
Matthias Springer	91d5653e3a	[mlir] Use `OpBuilder::createBlock` in op builders and patterns (#82770 ) When creating a new block in (conversion) rewrite patterns, `OpBuilder::createBlock` must be used. Otherwise, no `notifyBlockInserted` notification is sent to the listener. Note: The dialect conversion relies on listener notifications to keep track of IR modifications. Creating blocks without the builder API can lead to memory leaks during rollback.	2024-02-24 09:10:07 +01:00
Balaji V. Iyer	adf838daee	[mlir][Vectorizer] Added support to Vectorize tensor.unpack (#76087 ) Added support to vectorize tensor.unpack. The unpack Op is split into a `vector.transfer_read`, `vector.transpose`, `vector.shape_cast` and a `vector.transfer_write`.	2024-02-20 16:10:14 -06:00
srcarroll	9466c4e629	[MLIR][tensor] Improve `tensor.pack` verifier to catch more cases with unconditional runtime errors (#77217 ) Previously, the `tensor.pack` verifier detected unconditional runtime errors only when tile sizes were static. Now, dynamic tiles are considered and we only require that the input and either corresponding tile or output size are static to determine if it will unconditionally produce errors at runtime.	2024-02-19 12:27:24 -06:00
Quinn Dawkins	886294a2fe	[mlir][linalg] Add pattern to propagate pack up through tensor.pad (#82035 ) This mirrors the existing pattern for pushing unpack down through padding, restricting to cases where the padded dimensions aren't tiled by the pack. Additionally reformats the propagation test to make it easier to read.	2024-02-18 11:20:15 -05:00
Javed Absar	7c4c274643	[MLIR][NFC] Fix some comments in padding transform. (#81741 )	2024-02-14 17:00:42 +00:00
Quinn Dawkins	2c3ba9f622	[mlir][Linalg] Unrestrict redundant transfer hoisting from func.func (#79516 ) All the hoistRedundantVectorTransfers op does is walk the target operation, which does not have to be restricted to func.func.	2024-02-10 23:01:14 -05:00
Uday Bondhugula	fe8a62c463	[MLIR] Fix crash in AffineMap::replace for zero result maps (#80930 ) Fix obvious bug in AffineMap::replace for the case of zero result maps. Extend/complete inferExprsFromList to work with empty expression lists.	2024-02-08 19:16:29 +05:30
Max191	7880b2c858	[mlir] Add direct vectorization lowering for `tensor.pack` ops (#78660 ) This PR adds a direct vectorization lowering of `tensor.pack` into `mask(vector.transfer_read)`->`vector.shape_cast`->`vector.transpose`->`vector.transfer_write`.	2024-02-07 14:11:11 -05:00
Han-Chung Wang	1b7b40bf5d	[mlir][Linalg] Support lowerUnPack for identity out_dims_perm cases. (#79594 )	2024-01-26 05:46:53 -08:00
MaheshRavishankar	76ead96c1d	[mlir][TilingInterface] Use `LoopLikeOpInterface` in tiling using SCF to unify tiling with `scf.for` and `scf.forall`. (#77874 ) Using `LoopLikeOpInterface` as the basis for the implementation unifies all the tiling logic for both `scf.for` and `scf.forall`. The only difference is the actual loop generation. This is a follow up to https://github.com/llvm/llvm-project/pull/72178 Instead of many entry points for each loop type, the loop type is now passed as part of the options passed to the tiling method. This is a breaking change with the following changes 1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF` 2) The `scf::tileUsingSCFForallOp` is deprecated. The same functionality is obtained by using `scf::tileUsingSCF` and setting the loop type in `scf::SCFTilingOptions` passed into this method to `scf::SCFTilingOptions::LoopType::ForallOp` (using the `setLoopType` method). 3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing any strategy with the default callback implementing the greedy fusion. 4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use `SmallVector<LoopLikeOpInterface>`. 5) To make `scf::ForallOp` implement the parts of `LoopLikeOpInterface` needed, the `getOutputBlockArguments()` method is replaced with `getRegionIterArgs()` These changes now bring the tiling and fusion capabilities using `scf.forall` on par with what was already supported by `scf.for`	2024-01-25 21:26:23 -08:00
Mehdi Amini	2e0909025e	Apply clang-tidy fixes for readability-simplify-boolean-expr in Vectorization.cpp (NFC)	2024-01-22 17:34:55 -08:00
Mehdi Amini	3af5ab21b8	Apply clang-tidy fixes for readability-identifier-naming in Transforms.cpp (NFC)	2024-01-22 17:34:55 -08:00
Mehdi Amini	c0fe2b8963	Apply clang-tidy fixes for modernize-loop-convert in Transforms.cpp (NFC)	2024-01-22 17:34:55 -08:00
Mehdi Amini	b1d4265a5f	Apply clang-tidy fixes for llvm-qualified-auto in Promotion.cpp (NFC)	2024-01-19 17:58:15 -08:00
Mehdi Amini	197a73f019	Apply clang-tidy fixes for llvm-include-order in Fusion.cpp (NFC)	2024-01-19 17:58:15 -08:00
Mehdi Amini	46ce993dd4	Apply clang-tidy fixes for llvm-else-after-return in ElementwiseOpFusion.cpp (NFC)	2024-01-19 17:58:14 -08:00
Mehdi Amini	f19f213974	Apply clang-tidy fixes for llvm-else-after-return in DropUnitDims.cpp (NFC)	2024-01-19 17:58:14 -08:00
Mehdi Amini	3b61f5a1bc	Apply clang-tidy fixes for performance-unnecessary-value-param in DataLayoutPropagation.cpp (NFC)	2024-01-19 17:58:14 -08:00
Mehdi Amini	60caa8ef74	Apply clang-tidy fixes for performance-unnecessary-value-param in ConvertConv2DToImg2Col.cpp (NFC)	2024-01-18 16:39:20 -08:00
Matthias Springer	5fcf907b34	[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260 ) This commit renames 4 pattern rewriter API functions: * `updateRootInPlace` -> `modifyOpInPlace` * `startRootUpdate` -> `startOpModification` * `finalizeRootUpdate` -> `finalizeOpModification` * `cancelRootUpdate` -> `cancelOpModification` The term "root" is a misnomer. The root is the op that a rewrite pattern matches against (https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional). A rewriter must be notified of all in-place op modifications, not just in-place modifications of the root (https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old function names were confusing and have contributed to various broken rewrite patterns. Note: The new function names use the term "modify" instead of "update" for consistency with the `RewriterBase::Listener` terminology (`notifyOperationModified`).	2024-01-17 11:08:59 +01:00
Kazu Hirata	8e8bbbd48e	[mlir] Use llvm::is_contained (NFC)	2024-01-12 22:08:29 -08:00
Matthias Springer	0a8e3dd432	[mlir][Interfaces] `DestinationStyleOpInterface`: Rename `hasTensor/BufferSemantics` (#77574 ) Rename interface functions as follows: * `hasTensorSemantics` -> `hasPureTensorSemantics` * `hasBufferSemantics` -> `hasPureBufferSemantics` These two functions return "true" if the op has tensor/buffer operands but not buffer/tensor operands. Also drop the "ranked" part from the interface, i.e., do not distinguish between ranked/unranked types. The new function names describe the functions more accurately. They also align their semantics with the notion of "tensor semantics" with the bufferization framework. (An op is supposed to be bufferized if it has tensor operands, and we don't care if it also has memref operands.) This change is in preparation of #75273, which adds `BufferizableOpInterface::hasTensorSemantics`. By renaming the functions in the `DestinationStyleOpInterface`, we can avoid name clashes between the two interfaces.	2024-01-12 10:02:54 +01:00
Matthias Springer	bb6d5c2200	[mlir][Transforms] `GreedyPatternRewriteDriver`: Do not CSE constants during iterations (#75897 ) The `GreedyPatternRewriteDriver` tries to iteratively fold ops and apply rewrite patterns to ops. It has special handling for constants: they are CSE'd and sometimes moved to parent regions to allow for additional CSE'ing. This happens in `OperationFolder`. To allow for efficient CSE'ing, `OperationFolder` maintains an internal lookup data structure to find the existing constant ops with the same value for each `IsolatedFromAbove` region: ```c++ /// A mapping between an insertion region and the constants that have been /// created within it. DenseMap<Region *, ConstantMap> foldScopes; ``` Rewrite patterns are allowed to modify operations. In particular, they may move operations (including constants) from one region to another one. Such an IR rewrite can make the above lookup data structure inconsistent. We encountered such a bug in a downstream project. This bug materialized in the form of an op that uses the result of a constant op from a different `IsolatedFromAbove` region (that is not accessible). This commit changes the behavior of the `GreedyPatternRewriteDriver` such that `OperationFolder` is used to CSE constants at the beginning of each iteration (as the worklist is populated), but no longer during an iteration. `OperationFolder` is no longer used after populating the worklist, so we do not have to care about inconsistent state in the `OperationFolder` due to IR rewrites. The `GreedyPatternRewriteDriver` now performs the op folding by itself instead of calling `OperationFolder::tryToFold`. This change changes the order of constant ops in test cases, but not the region in which they appear. All broken test cases were fixed by turning `CHECK` into `CHECK-DAG`. Alternatives considered: The state of `OperationFolder` could be partially invalidated with every `notifyOperationModified` notification. 
That is more fragile than the solution in this commit because incorrect rewriter API usage can lead to missing notifications and hard-to-debug `IsolatedFromAbove` violations. (It did not fix the above-mentioned bug in a downstream project, which could be due to incorrect rewriter API usage or due to another conceptual problem that I missed.) Moreover, ops are frequently getting modified during a greedy pattern rewrite, so we would likely keep invalidating large parts of the state of `OperationFolder` over and over. Migration guide: Turn `CHECK` into `CHECK-DAG` in test cases. Constant ops are no longer folded during a greedy pattern rewrite. If you rely on folding (and rematerialization) of constant ops during a greedy pattern rewrite, turn the folder into a pattern.	2024-01-05 09:22:18 +01:00
Andrzej Warzyński	db9a16eaed	[mlir][nfc] Update comments in the Linalg vectoriser (#76797 )	2024-01-04 17:24:22 +00:00
Spenser Bauman	6b65d79fbb	[mlir][linalg] Fix for invalid IR in eliminate_empty_tensors (#73513 ) The transform.structured.eliminate_empty_tensors can produce mis-typed IR when traversing use-def chains past tensor reshaping operations for sharing candidates. This results in Linalg operations whose output types do not match their 'outs' arguments. This patch filters out candidate tensor.empty operations when their types do not match the candidate input operand.	2024-01-01 17:12:40 +00:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Matthias Springer	3a087c1592	[mlir][linalg] Fix invalid IR in Linalg op fusion (#74425 ) Linalg op fusion (`Linalg/Transforms/Fusion.cpp`) used to generate invalid fused producer ops: ``` error: 'linalg.conv_2d_nhwc_hwcf' op expected type of operand #2 ('tensor<1x8x16x4xf32>') to match type of corresponding result ('tensor<?x?x?x?xf32>') note: see current operation: %24 = "linalg.conv_2d_nhwc_hwcf"(%21, %22, %23) <{dilations = dense<1> : tensor<2xi64>, operandSegmentSizes = array<i32: 2, 1>, strides = dense<2> : tensor<2xi64>}> ({ ^bb0(%arg9: f32, %arg10: f32, %arg11: f32): %28 = "arith.mulf"(%arg9, %arg10) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 %29 = "arith.addf"(%arg11, %28) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 "linalg.yield"(%29) : (f32) -> () }) {linalg.memoized_indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 * 2 + d4, d2 * 2 + d5, d6)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>]} : (tensor<1x?x?x3xf32>, tensor<3x3x3x4xf32>, tensor<1x8x16x4xf32>) -> tensor<?x?x?x?xf32> ``` This is a problem because the input IR to greedy pattern rewriter during `-test-linalg-greedy-fusion` is invalid. This commit fixes tests such as `mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir` when verifying the IR after each pattern application (#74270).	2023-12-19 14:17:10 +09:00
srcarroll	b26ee97537	[MLIR][Linalg] Support dynamic sizes in `lower_unpack` (#75494 )	2023-12-18 19:02:04 +01:00
Quinn Dawkins	82ab0f7f36	[mlir][linalg] Fix rank-reduced cases for extract/insert slice in DropUnitDims (#74723 ) Inferring the reshape reassociation indices for extract/insert slice ops based on the read sizes of the original slicing op will generate an invalid expand/collapse shape op for already rank-reduced cases. Instead just infer from the shape of the slice. Ported from Differential Revision: https://reviews.llvm.org/D147488	2023-12-16 10:08:51 -05:00
Andrzej Warzyński	f11bda78c8	[mlir][linalg] Use vector.shuffle to flatten conv filter (#75038 ) Updates the vectorisation of 1D depthwise convolution when flattening the channel dimension (introduced in #71918). In particular - how the convolution filter is "flattened". ATM, the vectoriser will use `vector.shape_cast`: ```mlir %b_filter = vector.broadcast %filter : vector<4xf32> to vector<3x2x4xf32> %sc_filter = vector.shape_cast %b_filter : vector<3x2x4xf32> to vector<3x8xf32> ``` This lowering is not ideal - `vector.shape_cast` can be convenient when it's folded away, but that's not happening in this case. Instead, this patch updates the vectoriser to use `vector.shuffle` (the overall result is identical): ```mlir %sh_filter = vector.shuffle %filter, %filter [0, 1, 2, 3, 0, 1, 2, 3] : vector<4xf32>, vector<4xf32> %b_filter = vector.broadcast %sh_filter : vector<8xf32> to vector<3x8xf32> ```	2023-12-15 17:56:59 +00:00
Amir Bishara	cf2d625a5d	[mlir][linalg] Expose getPreservedProducerResults method from ElementwiseOpFusion file (#73850 ) Declare `getPreservedProducerResults` function which helps to get the preserved results of the producer linalg generic operation as a result of elementwise fusion.	2023-12-08 11:50:33 +02:00
Matthias Springer	986287e7f3	[mlir][SparseTensor] Fix invalid API usage in patterns (#74690 ) Rewrite patterns must return `success` if the IR was modified. This commit fixes sparse tensor tests such as `SparseTensor/sparse_fusion.mlir`, `SparseTensor/CPU/sparse_reduce_custom.mlir`, `SparseTensor/CPU/sparse_semiring_select.mlir` when running with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.	2023-12-07 12:05:20 +09:00

1504 Commits