This feature had been marked as `TODO` in the `tensor.splat`
documentation for a while. This MR includes:
- Support for dynamically shaped tensors in the return type of
`tensor.splat`, with the syntax suggested in the `TODO` comment (see the
example below).
- Updated op documentation.
- Bufferization support.
- Updates in op folders affected by the new feature.
- Unit tests for valid/invalid syntax, valid/invalid folding, and
lowering through bufferization.
- Additional op builders resembling those available in `tensor.empty`.
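A minimal sketch of the new syntax (value names are illustrative):
```mlir
// One index operand per dynamic dimension, as suggested in the TODO.
%t = tensor.splat %s[%m, %n] : tensor<?x?xf32>
```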
When the concatenated dimension is statically sized but the inputs are
dynamically sized, `reifyResultShapes` must return the static shape. This
change fixes the implementation of the interface for `tensor.concat` in
such cases.
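For example (shapes are illustrative):
```mlir
// Result dim 0 is static even though both inputs are dynamically sized;
// reifyResultShapes must report the constant 8, not an SSA-value sum.
%0 = tensor.concat dim(0) %a, %b
    : (tensor<?x4xf32>, tensor<?x4xf32>) -> tensor<8x4xf32>
```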
The folder for `tensor.extract` does not operate correctly when consuming
the result of a `tensor.from_elements` operation.
The existing unit test named `@extract_from_tensor.from_elements_3d` in
`mlir/test/Dialect/Tensor/canonicalize.mlir` appears to be an attempt to
stress this code. However, that unit test creates a `tensor.from_elements`
op exclusively from constants, which gets folded away into a single
constant tensor. Therefore, the buggy code was never exercised in unit
tests.
I have added a new unit test named
`@extract_from_tensor.from_elements_variable_3d` that makes sure the
`tensor.from_elements` op is not folded away by having its input
operands come directly from function arguments. The original folder code
would have made this test fail.
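A minimal sketch of the pattern the new test exercises (names and shapes
are illustrative):
```mlir
// %t cannot be constant-folded because its elements are function
// arguments, so the tensor.extract folder itself must map the constant
// index back to the matching tensor.from_elements operand.
func.func @sketch(%a: f32, %b: f32) -> f32 {
  %c1 = arith.constant 1 : index
  %t = tensor.from_elements %a, %b : tensor<2xf32>
  %e = tensor.extract %t[%c1] : tensor<2xf32>
  return %e : f32  // must fold to %b
}
```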
This bug was notably affecting the lowering of the `tosa.pad` op in the
`tosa-to-tensor` pass, where the generated code is likely to contain a
`tensor.from_elements` + `tensor.extract` op sequence.
Op verifiers should verify only local properties of an op. The dynamic
sizes of a `tensor.generate` op should not be verified. Dynamic sizes
that have a negative constant value should not prevent the
`tensor.generate` op from verifying.
Also share some code between the `tensor.empty` and `tensor.generate`
"dynamic dim -> static dim" canonicalization patterns.
Remove the `invalid-canonicalize.mlir` file and move the test case to
`canonicalize.mlir`. Canonicalization no longer produces IR that fails to
verify; it now leaves the op as is.
Without folding the result of the initial `tensor.dim`, the
`ReifyResultShapes` implementation would be incorrect because it would
return a dynamic shape for a static result shape.
This adds an operation for concatenating ranked tensors along a static
dimension, as well as a decomposition mirroring the existing lowering
from TOSA to Tensor. This offers a convergence point for "input"-like
dialects that include various lowerings for concatenation operations,
easing later analysis. In the future, this op can implement the
necessary interfaces for tiling, as well as potentially add conversions
to some kind of linalg and/or memref counterpart.
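A sketch of the op and its decomposition (shapes are illustrative):
```mlir
// The new op:
%concat = tensor.concat dim(1) %a, %b
    : (tensor<4x4xf32>, tensor<4x8xf32>) -> tensor<4x12xf32>
// Decomposed form, mirroring the TOSA-to-Tensor lowering:
%empty = tensor.empty() : tensor<4x12xf32>
%0 = tensor.insert_slice %a into %empty[0, 0] [4, 4] [1, 1]
    : tensor<4x4xf32> into tensor<4x12xf32>
%1 = tensor.insert_slice %b into %0[0, 4] [4, 8] [1, 1]
    : tensor<4x8xf32> into tensor<4x12xf32>
```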
This patch adds the op, the decomposition, and some basic
folding/canonicalization. Replacing lowerings with the op (such as the
TOSA lowering) will come as a follow up.
See
https://discourse.llvm.org/t/rfc-tensor-add-a-tensor-concatenate-operation/74858
This commit fixes a crash in the canonicalizer when there are slice ops
with offset/size SSA values that have a negative constant value. Such
ops are invalid if they are reachable, and their offsets/sizes should not
be folded to static integer values. (But such ops may appear in
unreachable blocks.)
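A sketch of the failure mode (illustrative IR):
```mlir
func.func @f(%t: tensor<10xf32>) -> tensor<4xf32> {
  %c0 = arith.constant 0 : index
  %r = tensor.extract_slice %t[%c0] [4] [1] : tensor<10xf32> to tensor<4xf32>
  return %r : tensor<4xf32>
^bb1:  // No predecessors; never executed.
  %neg = arith.constant -1 : index
  // Folding %neg into a static offset would produce an op that fails to
  // verify; the canonicalizer must leave this op as is.
  %s = tensor.extract_slice %t[%neg] [4] [1] : tensor<10xf32> to tensor<4xf32>
  return %s : tensor<4xf32>
}
```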
This commit fixes #71150.
The current implementation of tiling using `scf.for` is convoluted to
make sure that the destination passing style of the untiled program is
preserved. The addition of support to tile using `scf.forall` (adapted
from the transform operation in Linalg) in
https://github.com/llvm/llvm-project/pull/67083 used cloning of the
tiled operations to better streamline the implementation. This PR adapts
the other tiling methods to use a similar approach, making the
transformations (and handling destination passing style semantics) more
systematic.
---------
Co-authored-by: Abhishek-Varma <avarma094@gmail.com>
Fixes https://github.com/llvm/llvm-project/issues/60656.
This patch implements a basic fold for various reshape/resize tensor
operations. Specifically, the folding removes tensor reshape/resize ops
when they are applied to a constant tensor. For example, the following
function:
```mlir
func.func @main(%dest : tensor<8x16x8x32xf32>) -> tensor<8x16x8x32xf32> {
%cst = arith.constant dense<1.000000e-01> : tensor<64x128xf32>
%0 = tensor.pack %cst outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
inner_tiles = [8, 32] into %dest : tensor<64x128xf32> -> tensor<8x16x8x32xf32>
return %0 : tensor<8x16x8x32xf32>
}
```
will be changed into the following with `mlir-opt -canonicalize`:
```mlir
func.func @main(%arg0: tensor<8x16x8x32xf32>) -> tensor<8x16x8x32xf32> {
%cst = arith.constant dense<1.000000e-01> : tensor<8x16x8x32xf32>
return %cst : tensor<8x16x8x32xf32>
}
```
As a side-note, this patch is essentially an extension of
f79f430d4b.
The destination operand of the `tensor.unpack` operation is only needed
to carry shape information. So if the producer of the destination
operand implements the `DestinationStyleOpInterface`, it can be folded
into the `tensor.unpack` operation by replacing the destination operand
with the producer's own destination.
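A sketch of the fold (producer op and shapes are illustrative):
```mlir
// Before: %fill only defines the shape of the destination.
%fill = linalg.fill ins(%cst : f32) outs(%init : tensor<32x32xf32>) -> tensor<32x32xf32>
%u = tensor.unpack %src inner_dims_pos = [0, 1] inner_tiles = [8, 8]
    into %fill : tensor<4x4x8x8xf32> -> tensor<32x32xf32>
// After: the fill is bypassed and its destination is used directly.
%u2 = tensor.unpack %src inner_dims_pos = [0, 1] inner_tiles = [8, 8]
    into %init : tensor<4x4x8x8xf32> -> tensor<32x32xf32>
```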
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical: rewriting an unnamed sequence into
a named one and wrapping the transform IR into a module when necessary.
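A minimal example of the rewritten form:
```mlir
// The interpreter pass looks up this named entry point by default.
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    // ... transform ops operating on %root ...
    transform.yield
  }
}
```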
Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.
Only the tests in the transform dialect proper and the examples have not
been updated yet. These will be updated separately, after a more careful
consideration of the testing coverage of the transform interpreter logic.
A recent change modified the parameter `tileSize` from `Value` to
`OpFoldResult`. Therefore, we should call `getAsOpFoldResult` before
passing on the `tileSize`.
A test is adjusted to account for this new behavior.
* `tensor.collapse_shape` may bufferize to a memory read because the op
may have to reallocate the source buffer.
* `tensor.reshape` should not use `bufferization.clone` for
reallocation. This op has requirements with respect to the order of
buffer writes/reads. Use `memref.alloc` and `memref.copy` instead (see
the sketch below). Also fix a bug where the memory space of the source
buffer was not propagated to the reallocated buffer.
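A sketch of the reallocation now emitted for `tensor.reshape` (shapes are
illustrative):
```mlir
// An explicit alloc (in the source's memory space) plus a copy replaces
// bufferization.clone, preserving the required write/read order.
%alloc = memref.alloc() : memref<2x50xf32>
memref.copy %src, %alloc : memref<2x50xf32> to memref<2x50xf32>
%r = memref.reshape %alloc(%shape)
    : (memref<2x50xf32>, memref<1xindex>) -> memref<100xf32>
```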
Make `tensor.empty` bufferizable, so that the
`-empty-tensor-to-alloc-tensor` pass becomes optional. This makes the
bufferization easier to use. `tensor.empty` used to be non-bufferizable,
so that there were two separate ops: one that can be optimized away
(`tensor.empty`) and one that is guaranteed to bufferize to an
allocation (`bufferization.alloc_tensor`). With the recent improvements
to "empty tensor elimination", this is no longer needed and
`bufferization.alloc_tensor` can be phased out.
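A minimal sketch:
```mlir
// tensor.empty now bufferizes directly to an allocation (unless empty
// tensor elimination removes it first).
%0 = tensor.empty() : tensor<4x8xf32>
// ~> after bufferization:
%alloc = memref.alloc() : memref<4x8xf32>
```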
Rename and restructure tiling-related transform ops from the structured
extension to be more homogeneous. In particular, all ops now follow a
consistent naming scheme:
- `transform.structured.tile_using_for`;
- `transform.structured.tile_using_forall`;
- `transform.structured.tile_reduction_using_for`;
- `transform.structured.tile_reduction_using_forall`.
This drops the "_op" naming artifact from `tile_to_forall_op` that
shouldn't have been included in the first place, consistently specifies
the name of the control flow op to be produced for loops (instead of
`tile_reduction_using_scf` since `scf.forall` also belongs to `scf`),
and opts for the `using` connector to avoid ambiguity.
The loops produced by tiling are now systematically placed as *trailing*
results of the transform op. While this required changing 3 out of 4 ops
(except for `tile_using_for`), this is the only choice that makes sense
when producing multiple `scf.for` ops that can be associated with a
variadic number of handles. This choice is also most consistent with
*other* transform ops from the structured extension, in particular with
fusion ops, that produce the structured op as the leading result and the
loop as the trailing result.
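An illustrative use (the exact printed syntax may differ between
versions); note that the loop handle trails the tiled-op handle:
```mlir
// Tile a previously matched op with tile size 8; the handle to the
// generated scf.for comes last.
%tiled, %loop = transform.structured.tile_using_for %matmul [8]
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
```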
Bufferization of `tensor.reshape` generates a `memref.reshape` operation,
and `memref.reshape` requires the source memref to have an identity
layout. The bufferization process may produce a source memref with a
non-identity layout, resulting in a verification failure.
This change makes the bufferization interface for `tensor.reshape` copy
the source memref to a new buffer when the source has a non-identity
layout.
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs, as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option now defaults to true and
`create-deallocs` defaults to false; both options, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases
such as returning memrefs, which this series of commits aims to add
support for as well.
This first commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs, as this should be entirely handled by the
buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option now defaults to true and
`create-deallocs` defaults to false; both options, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed.
The documentation w.r.t. these pass option changes is also updated in
this commit.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156662
This patch addresses a crash that occurs when negative dynamic sizes are
provided to `tensor.empty` by adding a check to ensure that dynamic
sizes are non-negative.
Fixes #64064.
Both `TileOp` and `TileToScfForOp` use the tiling interface and the
`tileUsingSCFForOp` method. This duplication was introduced in
44cfea0279
as a way to retire `linalg::tileLinalgOp`. There is no more need for
this duplication, and `TileOp` has the more recent changes, so retire
`TileToScfForOp`.
Fixes an issue where `isCastLikeExtractSliceOp` did not account for the fact
that `tensor.extract_slice` may drop non-unit dimensions. This change makes
the utility function behave in line with its name/description. The only user
of this function is the `FindPayloadReplacementOpInterface` implementation
for `tensor::ExtractSliceOp`. This can potentially cause downstream projects
to see more "listener could not find replacement op" errors when interpreting
transform IR, but the behavior is in line with the documented conservative
behavior of the Transform dialect's TrackingListener.
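For illustration (shapes are hypothetical):
```mlir
// Cast-like: only a unit dim is dropped, so all elements are extracted.
%0 = tensor.extract_slice %t[0, 0] [1, 4] [1, 1]
    : tensor<1x4xf32> to tensor<4xf32>
// Not cast-like: a non-unit dim (size 2) is dropped, so some elements
// of the source are discarded.
%1 = tensor.extract_slice %u[0, 0] [1, 4] [1, 1]
    : tensor<2x4xf32> to tensor<4xf32>
```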
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158635
`reifyResultShapes` should return an `Attribute` if and only if the respective dimension is static.
This fixes#64256.
Differential Revision: https://reviews.llvm.org/D158166
To keep the pass simple, users should apply cleanup passes manually when necessary. In particular, `-cse -canonicalize` are often desirable to fold away self-copies that are created by the bufferization.
This addresses a comment in D120191.
Differential Revision: https://reviews.llvm.org/D155923
In https://reviews.llvm.org/D151611, a check was added to the tensor verifier to
emit an error on negative tensor dimensions. This check allowed dynamic
dimensions, so negative dimensions were still able to get through the verifier.
This is a problem in situations such as #60558, where a dynamic dimension is
converted to a static (and possibly negative) dimension by another pass in the
compiler. This patch fixes that by doing another check during the
`StaticTensorGenerate` conversion and returning a failure if the dimension is
negative.
As a side-note, I have to admit that I do not know why returning a failure in
`StaticTensorGenerate` gives a nice "tensor dimensions must be non-negative"
error. I suspect that the verifier runs again when `return failure()` is called,
but I am not sure.
Fixes#60558.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D155728
Remove patterns that fold tensor subset ops into vector transfer ops from the vector dialect. These patterns already exist in the tensor dialect.
Differential Revision: https://reviews.llvm.org/D154932
As a convenience to the user, top-level sequence ops can optionally be used as matchers: the op type is specified by the type of the block argument.
This is similar to how pass pipeline targets can be specified on the command line (`-pass-pipeline='builtin.module(func.func(...))'`).
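For example (illustrative):
```mlir
// The typed block argument makes this top-level sequence apply to every
// func.func in the payload instead of the root op.
transform.sequence failures(propagate) {
^bb0(%func: !transform.op<"func.func">):
  // ... transform ops applied per function ...
  transform.yield
}
```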
Differential Revision: https://reviews.llvm.org/D153121
Add an extra check to make sure that the transform IR is not getting modified by this op while it is being interpreted. This is generally dangerous, and we may want to enforce this for all transform ops that modify the payload in the future.
Users should generally try to apply patterns only to the piece of IR where they are needed (e.g., a matched function) and not to the entire module (which may contain the transform IR).
This revision is in response to a crash in a downstream compiler that was caused by a dead `transform.structured.match` op that was removed by the GreedyPatternRewriteDriver's DCE while the enclosing sequence was being interpreted.
Differential Revision: https://reviews.llvm.org/D153113
This is useful for transformations such as bufferization, which is looking for tensor.extract_slice/insert_slice pairs.
Also fix the documentation of the corresponding transform op.
Differential Revision: https://reviews.llvm.org/D152455
Certain ExtractSliceOps that extract all elements from the source are treated like casts when looking for replacement ops. Such ExtractSliceOps are typically rank reductions.
Differential Revision: https://reviews.llvm.org/D151804
I believe that the previous implementation did not work on any input. It
called `getMemRefType` with `layout = {}`, presumably with the intention
of creating a `MemRefType` with identity layout. However, the
implementation of that function returns a `MemRefType` with *unknown*
layout if it is provided with a default-constructed layout attribute.
This patch uses `getMemRefTypeWithStaticIdentityLayout` instead, which
has identical behavior except for the case of a default-constructed
layout, which it passes on as-is to the `MemRefType`.
This problem did not surface in the tests because `tensor.reshape` was
not tested with `-one-shot-bufferize`. This patch introduces a test
copied from the tests for `-tensor-bufferize` and adapted as follows:
since the test is run with `bufferize-function-boundaries`, a tensor
that is passed into the function is bufferized into a memref with
unknown layout, which would not be a valid input for `memref.reshape`,
so the test now uses a tensor constructed with `arith.constant` inside
the function.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D151544
When looking for payload op replacements, rank-expanding InsertSliceOps of dynamically-typed tensors are now supported.
Differential Revision: https://reviews.llvm.org/D151444
Add a helper function that computes whether two SSA values have the same value, utilizing the `ValueBoundsOpInterface` infrastructure. Two SSA values have the same value if an equality bound of 0 can be derived for their difference.
The helper function can also be used to determine if two tensor dimension sizes are equal.
Differential Revision: https://reviews.llvm.org/D151443
Certain InsertSliceOps that do not use any elements from the destination are treated like casts when looking for replacement ops. Such InsertSliceOps are typically rank expansions.
Tensors with dynamic shape are not supported at the moment.
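For illustration (shapes are hypothetical):
```mlir
// Cast-like rank expansion: the slice overwrites all of %dest, so no
// elements of the destination are actually used.
%0 = tensor.insert_slice %t into %dest[0, 0] [1, 4] [1, 1]
    : tensor<4xf32> into tensor<1x4xf32>
```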
Also adds test cases for the TrackingListener.
Differential Revision: https://reviews.llvm.org/D151422
The op bufferizes similarly to `tensor.generate`: it is lowered to a `linalg.map`, which may then lower to a loop nest that fills the buffer.
Differential Revision: https://reviews.llvm.org/D150952
Update operations in Transform dialect extensions defined in the Affine,
GPU, MemRef and Tensor dialects to use the more generic
`TransformHandleTypeInterface` type constraint instead of hardcoding
`PDL_Operation`. See
https://discourse.llvm.org/t/rfc-type-system-for-the-transform-dialect/65702
for motivation.
Remove the dependency on PDLDialect from these extensions.
Update tests to use `!transform.any_op` instead of `!pdl.operation`.
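An illustrative before/after, using `transform.structured.match` as the
example op:
```mlir
// Before: hardcoded PDL handle type.
%0 = transform.structured.match ops{["linalg.matmul"]} in %arg0
    : (!pdl.operation) -> !pdl.operation
// After: generic transform handle type.
%1 = transform.structured.match ops{["linalg.matmul"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
```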
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D150781
Types have been introduced a while ago and provide for better
readability and transform-time verification. Use them in the ops from
the structured transform dialect extension.
In most cases, the types are appended as trailing functional types or a
derived format of the functional type that allows for an empty
right-hand side without the annoying `-> ()` syntax (similarly to a
`func.func` declaration that may omit the arrow). When handles are used
inside mixed static/dynamic lists, such as tile sizes, the types of
those handles follow them immediately, as in
`sizes [%0 : !transform.any_value, 42]`. This allows for better
readability than matching the trailing type.
Update code to remove hardcoded PDL dependencies and expunge PDL from
structured transform op code.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D144515
Add a `tensor.bitcast` operator to bitcast between two tensors of
compatible shape and the same bit width. This can be used to reinterpret
an unsigned integer as a signed integer, or vice versa.
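For example:
```mlir
// Reinterpret unsigned values as signed without changing any bits.
%signed = tensor.bitcast %unsigned : tensor<4xui32> to tensor<4xi32>
```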
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D149608
The terminator of this op is special: it does not just yield a value,
but bufferizes to a memcpy. This requires special treatment to make sure
that deallocs are placed after the memcpy. (By default, deallocs are
placed right before the terminator.)
Differential Revision: https://reviews.llvm.org/D148408
These old patterns are not in use in either MLIR or downstream projects except for one test.
Additionally, this is redundant with logic in the `tensor.pad` tiling implementation.
Drop SplitPaddingPatterns to reduce entropy.
Differential Revision: https://reviews.llvm.org/D148207
Add a helper function that computes a constant (`int64_t`) bound. The `stopCondition` is optional: if none is provided, the traversal continues until a constant bound can be computed.
Differential Revision: https://reviews.llvm.org/D146296