Add pattern that converts a `tensor.expand_shape` op to a more static
form.
This matches the `tensor.cast` -> `tensor.expand_shape` pattern when the
`tensor.cast` is foldable and some of the `output_shape` operands of the
`tensor.expand_shape` are constant-foldable. This makes the
`tensor.expand_shape` more static and allows the static information to be
propagated further down the program.
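For illustration, a rough sketch of the kind of rewrite this enables
(operand names and shapes are hypothetical):
```mlir
// Before: the cast hides the static source shape and the output_shape
// operand %c2 is constant-foldable.
%c2 = arith.constant 2 : index
%cast = tensor.cast %src : tensor<2x6xf32> to tensor<?x6xf32>
%expand = tensor.expand_shape %cast [[0], [1, 2]] output_shape [%c2, 2, 3]
    : tensor<?x6xf32> into tensor<?x2x3xf32>

// After: the expand_shape consumes %src directly with a fully static shape.
%expand = tensor.expand_shape %src [[0], [1, 2]] output_shape [2, 2, 3]
    : tensor<2x6xf32> into tensor<2x2x3xf32>
```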
Restricts the verifier for tensor.pack and tensor.unpack Ops so that the
following is no longer allowed:
```mlir
%c8 = arith.constant 8 : index
%0 = tensor.pack %input inner_dims_pos = [0, 1] inner_tiles = [8, %c8] into %output : tensor<?x?xf32> -> tensor<?x?x8x8xf32>
```
Specifically, in line with other Tensor Ops, require:
* a dynamic dimension for each (dynamic) SSA value,
* a static dimension for each static size (attribute).
In the example above, a static dimension (8) is mixed with a dynamic
size (%c8).
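Under the new rule, the op would instead use a dynamic result dimension
for the dynamic tile (a hedged sketch, reusing the example above):
```mlir
%c8 = arith.constant 8 : index
%0 = tensor.pack %input inner_dims_pos = [0, 1] inner_tiles = [8, %c8]
    into %output : tensor<?x?xf32> -> tensor<?x?x8x?xf32>
```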
Note that this is mostly deleting existing code - that's because this
change simplifies the logic in the verifier.
For more context:
* https://discourse.llvm.org/t/tensor-ops-with-dynamic-sizes-which-behaviour-is-more-correct
This is more efficient as it avoids creating a clone that is immediately
removed. Also, only insert a cast on the result when the destination type
has changed.
Updates the return type of `getNumDynamicDims` and `getNumScalableDims`
from `int64_t` to `size_t`. This is for consistency with other
helpers/methods that return "size" and to reduce the number of
`static_cast`s in various places.
Implements two helper hooks for PackOp and UnPackOp, `getAllOuterDims`
and `getTiledOuterDims`, and adds them to RelayoutOp (which both PackOp
and UnPackOp inherit from).
This improves code re-use and also clarifies the meaning of "outer dims"
and "tiled outer dims".
`tensor.pad(tensor.pad)` with the same constant padding value can be
combined into a single pad that pads to the sum of the high and low
padding amounts.
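A minimal sketch of the fold (values and shapes are hypothetical):
```mlir
%cst = arith.constant 0.0 : f32
%0 = tensor.pad %src low[1, 2] high[3, 0] {
^bb0(%i: index, %j: index):
  tensor.yield %cst : f32
} : tensor<8x8xf32> to tensor<12x10xf32>
%1 = tensor.pad %0 low[0, 1] high[2, 1] {
^bb0(%i: index, %j: index):
  tensor.yield %cst : f32
} : tensor<12x10xf32> to tensor<14x12xf32>
// Both pads use %cst, so they combine into a single
// tensor.pad %src low[1, 3] high[5, 1] : tensor<8x8xf32> to tensor<14x12xf32>.
```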
This patch adds a check on the indices of `tensor.gather` and
`tensor.scatter`: the length of gather_dims/scatter_dims must match the
size of the last dimension of the indices. Fixes #94901.
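A hedged sketch of the constraint on `tensor.gather` (shapes are
hypothetical; `tensor.scatter` is analogous):
```mlir
// Valid: the last dimension of the indices (3) matches the number of
// gather_dims.
%ok = tensor.gather %source[%indices] gather_dims([0, 1, 2])
    : (tensor<4x5x6xf32>, tensor<1x2x3xindex>) -> tensor<1x2x1x1x1xf32>

// Rejected: the indices' last dimension is 2, but there are 3 gather_dims.
%bad = tensor.gather %source[%bad_indices] gather_dims([0, 1, 2])
    : (tensor<4x5x6xf32>, tensor<1x2x2xindex>) -> tensor<1x2x1x1x1xf32>
```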
The `getDroppedDims` utility function does not follow the convention of
dropping outermost unit dimensions first when inferring a rank reduction
mask for a slice. This PR updates the implementation to match this
convention.
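For illustration, a sketch of the ambiguous case (hypothetical shapes):
```mlir
// Either unit dim (0 or 2) could explain the rank reduction from 3-D to 2-D.
// Following the convention, the outermost unit dim is dropped first, so the
// inferred dropped-dims mask is {0}.
%0 = tensor.extract_slice %t[0, 0, 0] [1, 8, 1] [1, 1, 1]
    : tensor<1x8x1xf32> to tensor<8x1xf32>
```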
For ops with results in addition to the dpsInits, this pattern fails.
E.g.:
```
%13:2 = iree_codegen.ukernel.generic "iree_uk_unpack"
    ins(%extracted_slice : tensor<?x1x16x16xf32>)
    outs(%11 : tensor<?x?xf32>) ... -> tensor<?x?xf32>, i32
```
The above op has results other than its dpsInit and hence fails. The PR
assumes that the results are ordered as dpsInits followed by nonDpsInits.
Implement folding and rewrite logic to eliminate no-op tensor and memref
operations. This handles two specific cases:
1. tensor.insert_slice operations where the size of the inserted slice
is known to be 0.
2. memref.copy operations where either the source or target memrefs are
known to be empty (see the sketch below).
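A minimal sketch of the two cases (names and shapes are hypothetical):
```mlir
// Case 1: the inserted slice has a zero-sized dimension, so the insert is a
// no-op and folds to %dest.
%0 = tensor.insert_slice %slice into %dest[0, 0] [0, 4] [1, 1]
    : tensor<0x4xf32> into tensor<8x4xf32>

// Case 2: copying between empty memrefs moves no data, so the copy is erased.
memref.copy %src, %dst : memref<0x16xf32> to memref<0x16xf32>
```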
Co-authored-by: Spenser Bauman <sabauma@fastmail>
Extends pack/unpack perm attribute checker to account for cases when the
optional outer_dims_perm attribute might be missing in one operation and
the other one has an explicit identity permutation. This enables the
canonicalizer to fold more unpack(pack(x)) variants.
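For example (a hedged sketch), the following pair can now fold to
`%arg0` even though only one op spells out the identity `outer_dims_perm`:
```mlir
%packed = tensor.pack %arg0 inner_dims_pos = [0, 1] inner_tiles = [8, 8]
    into %dest0 : tensor<16x16xf32> -> tensor<2x2x8x8xf32>
%unpacked = tensor.unpack %packed outer_dims_perm = [0, 1]
    inner_dims_pos = [0, 1] inner_tiles = [8, 8]
    into %dest1 : tensor<2x2x8x8xf32> -> tensor<16x16xf32>
```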
Previously this was only populated later, in the create method. This
resolves some of the invalid builder paths. It may also mean that type
inference functions no longer have to consider whether property
conversion has happened (but that hasn't been verified yet).
This also makes the Attributes corresponding to Properties optional in
the set-from-attributes method. Today that is in effect what happens with
Property value initialization, and folks use it to define custom C++
types whose default initialization is what they want. This is the
behavior users get if they use properties directly. Propagating
Attributes without allowing partial setting would require iterating over
the dictionary attribute while considering the properties of the op type
that will be created. This could also have been an additional generated
method or optional behavior on the set method, but doing it consistently
seems better. In terms of what's lost, it doesn't seem like anything
compared to the pure Property path, where a Property is
default-value-initialized and then partially overwritten (this doesn't
seem to buy anything else verification-wise).
Default-valued Properties (as specified on the ODS side rather than the
C++ side) triggered an error because the containing class was not yet
complete but referenced a nested class, so we couldn't have a default
initializer for them in the parent class. Added an additional forwarding
builder to avoid needing to update call sites. This could be split out
into a separate change.
Inlined a templated function in a unit test that was only used once.
Moved initialization earlier where noticed.
Building `mlir` on Windows with Visual Studio (19.36.32538 for x64) using
the following command:
`cmake.exe -GNinja -DCMAKE_BUILD_TYPE=Release
-DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=1
-DLLVM_TARGETS_TO_BUILD=host ../llvm`
leads to a crash when running canonicalization on
`tensor.pack`/`tensor.unpack` ops (`mlir-opt --canonicalize input.mlir`),
where `input.mlir` is as follows (taken from one of the filecheck tests
for `tensor.pack`):
```
func.func @pack_unpack(%arg0: tensor<128x256xf32>) -> tensor<128x256xf32> {
%pack_dest = tensor.empty() : tensor<8x16x8x32xf32>
%unpack_dest = tensor.empty() : tensor<128x256xf32>
%tp = tensor.pack %arg0 outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %pack_dest : tensor<128x256xf32> -> tensor<8x16x8x32xf32>
%tup = tensor.unpack %tp outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, 32] into %unpack_dest : tensor<8x16x8x32xf32> -> tensor<128x256xf32>
return %tup : tensor<128x256xf32>
}
```
The crash seemingly comes from an invalid memory access while iterating
over `innerDimsPos` within `getPackOpResultTypeShape`.
This crash is also causing the following tests to fail:
```
MLIR :: Dialect/Linalg/canonicalize.mlir
MLIR :: Dialect/Linalg/data-layout-propagation.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-pack.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack-tile.mlir
MLIR :: Dialect/Linalg/generalize-tensor-unpack.mlir
MLIR :: Dialect/Linalg/transform-lower-pack.mlir
MLIR :: Dialect/Linalg/transform-op-fuse.mlir
MLIR :: Dialect/Linalg/transform-op-pack.mlir
MLIR :: Dialect/Linalg/transform-pack-greedily.mlir
MLIR :: Dialect/Tensor/canonicalize.mlir
MLIR :: Dialect/Tensor/fold-into-pack-and-unpack.mlir
MLIR :: Dialect/Tensor/invalid.mlir
MLIR :: Dialect/Tensor/ops.mlir
MLIR :: Dialect/Tensor/simplify-pack-unpack.mlir
MLIR :: Dialect/Tensor/tiling.mlir
```
A general question is: is it possible to support hooks here to infer the
encoding? E.g., when the extracted tensor slice is rank-reduced, the
encoding needs to be updated accordingly as well.
In some cases this pattern may ignore static information due to dynamic
values among the insert_slice size operands, e.g.:
```
%0 = tensor.cast %arg0 : tensor<1x?xf32> to tensor<?x?xf32>
%1 = tensor.insert_slice %0 into %arg1[...] [%s0, %s1] [...]
: tensor<?x?xf32> into tensor<?x?xf32>
```
This can be rewritten into:
```
%1 = tensor.insert_slice %arg0 into %arg1[...] [1, %s1] [...]
: tensor<1x?xf32> into tensor<?x?xf32>
```
This PR updates the matching in the pattern to allow rewrites like this.
Unblocks a downstream integrate where an expected-to-fail test was
expecting this to be a runtime verifier error, not a compiler crash:
https://github.com/llvm/torch-mlir/pull/3279.
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
---------
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
Canonicalization defaulted to replacement when the input dims were from
an unknown source. This is obviously incorrect. Tweaked the pattern and
included a test to prevent future issues.
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
If a `tensor.reshape` occurs with `d0, d1, d2, ...` for the dimensions,
we know that the reshape is a no-op. Checking for this case lets us fold
away the computation.
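A minimal sketch of the folded case (names are hypothetical):
```mlir
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%d0 = tensor.dim %src, %c0 : tensor<?x?xf32>
%d1 = tensor.dim %src, %c1 : tensor<?x?xf32>
%shape = tensor.from_elements %d0, %d1 : tensor<2xindex>
// The requested shape is exactly %src's own dims, so the reshape folds to %src.
%r = tensor.reshape %src(%shape)
    : (tensor<?x?xf32>, tensor<2xindex>) -> tensor<?x?xf32>
```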
This fixes an "infinite" loop bug, where the incoming IR was repeatedly
rewritten while adding identical cast operations. The test for
compatible types should include the notion of an encoding. If the
encoding differs, then a naive fusion into the consumer is invalid.
The current canonicalization of `memref.dim` operating on the result of
`memref.reshape` into `memref.load` is incorrect as it doesn't check
whether the `index` operand of `memref.dim` dominates the source
`memref.reshape` op. It always introduces `memref.load` right after
`memref.reshape` to ensure the `memref` is not mutated before the
`memref.load` call. As a result, the following error is observed:
```
$> mlir-opt --canonicalize input.mlir
func.func @reshape_dim(%arg0: memref<*xf32>, %arg1: memref<?xindex>, %arg2: index) -> index {
%c4 = arith.constant 4 : index
%reshape = memref.reshape %arg0(%arg1) : (memref<*xf32>, memref<?xindex>) -> memref<*xf32>
%0 = arith.muli %arg2, %c4 : index
%dim = memref.dim %reshape, %0 : memref<*xf32>
return %dim : index
}
```
results in:
```
dominator.mlir:22:12: error: operand #1 does not dominate this use
%dim = memref.dim %reshape, %0 : memref<*xf32>
^
dominator.mlir:22:12: note: see current operation: %1 = "memref.load"(%arg1, %2) <{nontemporal = false}> : (memref<?xindex>, index) -> index
dominator.mlir:21:10: note: operand defined here (op in the same block)
%0 = arith.muli %arg2, %c4 : index
```
Properly fixing this issue requires a dominance analysis, which is
expensive to run within a canonicalization pattern. So, this patch fixes
the canonicalization pattern by being more strict/conservative about the
legality condition under which the canonicalization is performed.
The more general pattern is also added to `tensor.dim`. Since tensors are
immutable we don't need to worry about where to introduce the
`tensor.extract` call after canonicalization.
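A hedged sketch of the tensor-side rewrite (names are hypothetical):
```mlir
// Before:
%reshape = tensor.reshape %t(%shape)
    : (tensor<?x?xf32>, tensor<2xindex>) -> tensor<?x?xf32>
%dim = tensor.dim %reshape, %idx : tensor<?x?xf32>

// After: read the requested entry of the shape tensor directly; since tensors
// are immutable, there is no constraint on where the extract is placed.
%dim = tensor.extract %shape[%idx] : tensor<2xindex>
```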
Before: op verifiers failed if the input and output ranks were the same
(i.e. no expansion or collapse). This behavior requires users of these
shape ops to verify manually that they are not creating identity
versions of these ops every time they build them -- problematic. This PR
removes this strict verification and introduces folders for the identity
cases.
The PR also removes the special-case handling of rank-0 tensors for
expand_shape and collapse_shape; there doesn't seem to be any reason to
treat them differently.
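For instance (a hedged sketch), the following same-rank op is no longer a
verifier error and simply folds away:
```mlir
%0 = tensor.collapse_shape %arg0 [[0], [1]]
    : tensor<4x8xf32> into tensor<4x8xf32>
```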
The revision does not refactor inferStaticShape for pack and unpack ops
because they can diverge quickly: more dimensions can be inferred (i.e.,
with inner_tile_sizes) if the pack op does not have a padding value.
This is a follow-up of https://github.com/llvm/llvm-project/pull/80848
Previously, the `tensor.pack` verifier detected unconditional runtime
errors only when tile sizes were static. Now, dynamic tiles are
considered, and we only require that the input size and either the
corresponding tile or output size are static to determine whether the op
will unconditionally produce errors at runtime.
Previously, `InsertSliceOpSourceCastInserter` was incorrectly applied to
a case when tensor types have an encoding attribute attached to them.
The type `newSrcType` was missing that attribute from the old `srcType`,
which made the expression `srcType == newSrcType` false, since
`tensor<2x2xf32, "foo">` is not equal to `tensor<2x2xf32>`. That led to
an endless back and forth between `InsertSliceOpSourceCastInserter` that
would introduce a cast and `InsertSliceOpCastFolder` that would remove
it right after.
The canonicalization incrementally converts foldable dynamic hi/lo
padding to static hi/lo values. During this canonicalization, the values
that have become static should be removed from the dynamic value list.
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
The revision moves pack/unpack related patterns to
PackAndUnpackPatterns.cpp. This follows the convention used for other
tensor ops.
It also renames `populateSimplifyTensorPack` to
`populateSimplifyPackAndUnpackPatterns` and adds a TODO item for
tensor.unpack op.
This feature had been marked as `TODO` in the `tensor.splat`
documentation for a while. This MR includes:
- Support for dynamically shaped tensors in the return type of
`tensor.splat` with the syntax suggested in the `TODO` comment.
- Updated op documentation.
- Bufferization support.
- Updates in op folders affected by the new feature.
- Unit tests for valid/invalid syntax, valid/invalid folding, and
lowering through bufferization.
- Additional op builders resembling those available in `tensor.empty`.
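For reference, a hedged sketch of the extended syntax (following the form
suggested in the `TODO`; the bracketed sizes bind to the dynamic dims of
the result type in order, and all names are hypothetical):
```mlir
// %m and %n provide the two dynamic extents of the result type.
%t = tensor.splat %val[%m, %n] : tensor<?x8x?xf32>
```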
When the concatenated dim is statically sized but the inputs are
dynamically sized, reifyResultShapes must return the static shape. Fixes
the implementation of the interface for tensor.concat in such cases.
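A hedged sketch of such a case (shapes are hypothetical):
```mlir
// Dim 0 is dynamic on both inputs but static (8) on the result;
// reifyResultShapes should report the constant 8 rather than a dynamic value.
%0 = tensor.concat dim(0) %a, %b
    : (tensor<?x4xf32>, tensor<?x4xf32>) -> tensor<8x4xf32>
```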
The folder for `tensor.extract` is not operating correctly when it is
consuming the result of a `tensor.from_elements` operation.
The existing unit test named `@extract_from_tensor.from_elements_3d` in
`mlir/test/Dialect/Tensor/canonicalize.mlir` seems to be an attempt to
stress this code. However, this unit test creates a `tensor.from_elements` op
exclusively from constants, which gets folded away into a single
constant tensor. Therefore, the buggy code was never executed in unit
tests.
I have added a new unit test named
`@extract_from_tensor.from_elements_variable_3d` that makes sure the
`tensor.from_elements` op is not folded away by having its input
operands come directly from function arguments. The original folder code
would have made this test fail.
This bug was notably affecting the lowering of the `tosa.pad` op in the
`tosa-to-tensor` pass, where the generated code is likely to contain a
`tensor.from_elements` + `tensor.extract` op sequence.
Op verifiers should verify only local properties of an op. The dynamic
sizes of a `tensor.generate` op should not be verified. Dynamic sizes
that have a negative constant value should not prevent the
`tensor.generate` op from verifying.
Also share some code between the `tensor.empty` and `tensor.generate`
"dynamic dim -> static dim" canonicalization patterns.
Remove the `invalid-canonicalize.mlir` file and move the test case to
`canonicalize.mlir`. Canonicalization no longer produces IR that does
not verify (and leaves the op as is).