Add support for vectorizing `tensor.unpack`. The unpack op is split into
a `vector.transfer_read`, `vector.transpose`, `vector.shape_cast` and a
`vector.transfer_write`.
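As a rough illustration of that read/transpose/shape_cast/write
sequence, here is a minimal sketch in plain C++ (not MLIR). It assumes a
2-D destination, row-major storage and no outer-dim permutation;
`unpack2D` and all of its parameters are hypothetical names used only
for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: unpack a row-major [O0][O1][T0][T1] buffer into a
// row-major [rows][cols] destination, dropping any padding.
std::vector<int> unpack2D(const std::vector<int> &packed, std::size_t O0,
                          std::size_t O1, std::size_t T0, std::size_t T1,
                          std::size_t rows, std::size_t cols) {
  assert(packed.size() == O0 * O1 * T0 * T1 && "packed buffer size mismatch");
  assert(rows <= O0 * T0 && cols <= O1 * T1 && "destination too large");

  // "transfer_read" + "transpose": [O0][O1][T0][T1] -> [O0][T0][O1][T1].
  std::vector<int> transposed(packed.size());
  for (std::size_t o0 = 0; o0 < O0; ++o0)
    for (std::size_t o1 = 0; o1 < O1; ++o1)
      for (std::size_t t0 = 0; t0 < T0; ++t0)
        for (std::size_t t1 = 0; t1 < T1; ++t1)
          transposed[((o0 * T0 + t0) * O1 + o1) * T1 + t1] =
              packed[((o0 * O1 + o1) * T0 + t0) * T1 + t1];

  // "shape_cast": the same data is now viewed as [O0*T0][O1*T1].
  const std::size_t paddedCols = O1 * T1;

  // "transfer_write": copy only the in-bounds part into the destination.
  std::vector<int> dest(rows * cols);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      dest[r * cols + c] = transposed[r * paddedCols + c];
  return dest;
}
```

The shape_cast step moves no data in this model; only the transpose
and the final clipped write do.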
Previously, the `tensor.pack` verifier detected unconditional runtime
errors only when tile sizes were static. Now, dynamic tiles are also
considered: we only require that the input size and either the
corresponding tile size or the output size are static to determine
whether the op will unconditionally produce errors at runtime.
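The rule above can be sketched for a single tiled dimension in plain
C++. This is a model, not the verifier's code: `TiledDim` and
`unconditionallyFails` are hypothetical names, `std::nullopt` stands for
a dynamic size, and the dynamic-tile branch assumes that, without
padding, the input size must equal output size times tile size.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of one tiled dimension; std::nullopt means "dynamic".
struct TiledDim {
  std::optional<int64_t> inputSize;
  std::optional<int64_t> tileSize;
  std::optional<int64_t> outputSize; // outer size of the packed result
};

// Returns true if packing this dimension without a padding value is known
// statically to fail, i.e. the input cannot be split into full tiles.
bool unconditionallyFails(const TiledDim &dim) {
  if (!dim.inputSize)
    return false; // Dynamic input: nothing can be proven statically.
  if (dim.tileSize) {
    assert(*dim.tileSize > 0 && "tile size must be positive");
    return *dim.inputSize % *dim.tileSize != 0;
  }
  if (dim.outputSize) {
    assert(*dim.outputSize > 0 && "output size must be positive");
    // Dynamic tile: input == output * tile needs output to divide input.
    return *dim.inputSize % *dim.outputSize != 0;
  }
  return false; // Neither tile nor output is static: cannot decide.
}
```

For example, an input of 13 with a static tile of 8 is flagged, and so
is an input of 13 with a dynamic tile and a static output size of 2.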
This PR adds a direct vectorization lowering of `tensor.pack` into
`mask(vector.transfer_read)`->`vector.shape_cast`->`vector.transpose`->`vector.transfer_write`.
Rename interface functions as follows:
* `hasTensorSemantics` -> `hasPureTensorSemantics`
* `hasBufferSemantics` -> `hasPureBufferSemantics`
These two functions return "true" if the op has tensor/buffer operands
but not buffer/tensor operands.
Also drop the "ranked" part from the interface, i.e., do not distinguish
between ranked/unranked types.
The new function names describe the functions more accurately. They also
align their semantics with the notion of "tensor semantics" in the
bufferization framework. (An op is supposed to be bufferized if it has
tensor operands, and we don't care if it also has memref operands.)
This change is in preparation for #75273, which adds
`BufferizableOpInterface::hasTensorSemantics`. By renaming the functions
in the `DestinationStyleOpInterface`, we can avoid name clashes between
the two interfaces.
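To make the "pure" semantics concrete, here is a small sketch in plain
C++ that models operands by kind only. `OperandKind`, `hasKind` and
`demoPureSemantics` are hypothetical; the two predicates mirror the
description above rather than the actual interface implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the type of an operand.
enum class OperandKind { Tensor, Buffer, Scalar };

static bool hasKind(const std::vector<OperandKind> &operands,
                    OperandKind kind) {
  return std::find(operands.begin(), operands.end(), kind) != operands.end();
}

// True if there is at least one tensor operand and no buffer operand.
bool hasPureTensorSemantics(const std::vector<OperandKind> &operands) {
  return hasKind(operands, OperandKind::Tensor) &&
         !hasKind(operands, OperandKind::Buffer);
}

// True if there is at least one buffer operand and no tensor operand.
bool hasPureBufferSemantics(const std::vector<OperandKind> &operands) {
  return hasKind(operands, OperandKind::Buffer) &&
         !hasKind(operands, OperandKind::Tensor);
}

// Usage: an op mixing tensors and buffers is neither "pure" variant.
void demoPureSemantics() {
  std::vector<OperandKind> mixed = {OperandKind::Tensor, OperandKind::Buffer};
  assert(!hasPureTensorSemantics(mixed));
  assert(!hasPureBufferSemantics(mixed));
}
```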
When lowering `tensor.unpack`, we need to use the sizes of the
destination tensor in the final `tensor.extract_slice` operation. Prior
to this patch, when the destination tensor had dynamic dimensions, we
would compute them from the result of the `tensor.unpack` operation
instead of its destination argument.
This would produce invalid IR because the `tensor.dim` operations would
need to appear before the `tensor.extract_slice` operation, but the
input of the `tensor.dim` operations would consume the final result of
the lowering of `tensor.unpack`, which happens after the
`tensor.extract_slice` operation. In other words, the definition
wouldn't dominate its uses.
I.e., we were generating:
```
%dynDim = tensor.dim %defLater, ... <-- %defLater defined below
%res = tensor.extract_slice ..., %dynDim, ...
%defLater = linalg.copy (ins %res)
```
Note: I checked the implementation of `lower_pack` and the code is
correct as far as I can tell.
Prior to this patch, `GeneralizeOuterUnitDimsUnPackOpPattern` would
assert that we cannot create a `tensor.empty` operation with dynamic
shapes.
The problem stems from the fact that we were not using the right builder
for the `tensor.empty` operation. Indeed, each dynamic dim needs to be
specified by an input variable.
Simply provide the dynamic dimensions to the `tensor.empty` builder to
fix that.
* "init" operands are specified with `MutableOperandRange` (which gives
access to the underlying `OpOperand *`). No more magic numbers.
* Remove most interface methods and make them helper functions. Only
`getInitsMutable` should be implemented.
* Provide separate helper functions for accessing mutable/immutable
operands (`OpOperand`/`Value`, in line with #66515): `getInitsMutable`
and `getInits` (same naming convention as auto-generated op accessors).
`getInputOperands` was not renamed because this function cannot return a
`MutableOperandRange` (because the operands are not necessarily
consecutive). `OpOperandVector` is no longer needed.
* The new `getDpsInits`/`getDpsInitsMutable` is more efficient than the
old `getDpsInitOperands` because no `SmallVector` is created. The new
functions return a range of operands.
* Fix a bug in `getDpsInputOperands`: out-of-bounds operands were
potentially returned.
Tensor pack operations are optimistically lowered to pad + insert_slice
when the pack operation only pads the input tensor. The existing
lowering emits insert_slice operations which do not meet the
rank-reducibility requirements of insert_slice.
This change updates the logic in linalg::lowerPack to first check the
rank-reducibility requirement. When the requirement is not met, the
lowering will emit the full sequence of pad + expand + transpose.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D159382
`tensor.unpack` implements the DPS (Destination Passing Style) interface
and expects the result to be "stored" in the `outs` operand, but this is
not the case with the current decomposition as the final operation is a
`tensor.extract_slice` that does not implement DPS. Add a `linalg.copy`
to fix the problem.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158393
If we deal with statically known tensors and tiles and a given tile
perfectly divides a given dimension, we can omit the padding attribute.
As a bonus point, we can now run pack and unpack propagation
(currently, we bail out during propagation if we have the padding
attribute).
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D154607
Also remove `LinalgPaddingPattern`, which has no uses. (There is a transform dialect op that is used for testing instead.)
Differential Revision: https://reviews.llvm.org/D153512
There is another transform that lowers tensor.pad to tensor.empty + linalg.fill + tensor.insert_slice: `transform.structured.rewrite_in_destination_passing_style`. Delete the other transform.
Differential Revision: https://reviews.llvm.org/D153429
* Remove duplicate functions. `tensor::getMixedSize` and `tensor::getMixedSizes` should be used.
* Use `tensor::getMixedSize` instead of `createOrFold<tensor::DimOp>`. This is more efficient. `createOrFold` will create an op and immediately try to fold it. In case of a static dimension size, an attribute can be used directly.
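The efficiency argument can be illustrated with a toy model in plain
C++. Nothing here is the MLIR API: `Value`, `MixedSize`, `Builder`,
`kDynamic` and this `getMixedSize` are hypothetical stand-ins, with a
"mixed size" modelled as either a static integer or an SSA value name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

using Value = std::string;                      // hypothetical SSA value name
using MixedSize = std::variant<int64_t, Value>; // static attribute or value

constexpr int64_t kDynamic = -1; // dynamic-dim marker for this sketch only

// Records the ops that would have been created.
struct Builder {
  std::vector<std::string> createdOps;
};

// Static dims are returned directly; only dynamic dims create a "dim" op.
MixedSize getMixedSize(Builder &b, const std::vector<int64_t> &shape,
                       const Value &tensor, std::size_t dim) {
  assert(dim < shape.size() && "dimension out of range");
  if (shape[dim] != kDynamic)
    return MixedSize(shape[dim]); // No op is created at all.
  b.createdOps.push_back("tensor.dim " + tensor + ", " + std::to_string(dim));
  return MixedSize(Value("%dim" + std::to_string(b.createdOps.size() - 1)));
}
```

In this model a static dimension never touches the builder, whereas a
create-then-fold approach would build an op first and discard it after
folding.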
Differential Revision: https://reviews.llvm.org/D153332
The implementation is based on `ValueBoundsOpInterface` to compute upper bounds for tensor dim sizes. It is not necessary to skip over certain ops and reify shape dims; `ValueBoundsOpInterface` already takes care of that.
Differential Revision: https://reviews.llvm.org/D152256
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call, which has been decided on as the more
consistent style.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions' \
               -header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D151542
The padded sizes should be derived from the destination tensor, not the
source tensor. There could be more than one incomplete tile in the
padding domain.
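A small arithmetic sketch in plain C++ shows the difference for one
dimension. Both helper names are hypothetical, and the source-derived
variant is only an assumed reading of the previous behaviour, shown for
contrast.

```cpp
#include <cassert>
#include <cstdint>

// High padding for one dimension, derived from the destination: the padded
// size is the destination's outer size times the tile size.
int64_t highPadFromDest(int64_t srcSize, int64_t destOuterSize, int64_t tile) {
  int64_t paddedSize = destOuterSize * tile;
  assert(paddedSize >= srcSize && "destination cannot hold the source");
  return paddedSize - srcSize;
}

// For contrast (assumption): padding derived from the source alone, i.e.
// just enough to complete the last partially filled tile.
int64_t highPadFromSource(int64_t srcSize, int64_t tile) {
  assert(tile > 0 && "tile size must be positive");
  int64_t numTiles = (srcSize + tile - 1) / tile;
  return numTiles * tile - srcSize;
}
```

With a source size of 5, a tile of 8 and a destination outer size of 3,
the destination-derived padding is 19 while the source-derived padding
is 3: two of the three destination tiles consist entirely of padding.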
Reviewed By: qedawkins
Differential Revision: https://reviews.llvm.org/D150726
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call, which has been decided on as the more
consistent style.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions' \
               -header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp \
            mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp \
            mlir/lib/**/IR/ \
            mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp \
            mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp \
            mlir/test/lib/Dialect/Test/TestTypes.cpp \
            mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp \
            mlir/test/lib/Dialect/Test/TestAttributes.cpp \
            mlir/unittests/TableGen/EnumsGenTest.cpp \
            mlir/test/python/lib/PythonTestCAPI.cpp \
            mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
The revision adds support for tensor.pack op decomposition when all
inner tile sizes are static. The generated tensor.expand_shape op is
still valid because only one of the expanding dimensions is dynamic.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D150233
The information is not tied to the tensor.empty and tensor.extract_slice
ops. We can infer the smallest static bounding box for the pad transform
if the defining ops implement ReifyRankedShapedTypeOpInterface. The revision extends
the usability for downstream projects. No tests are added because the
existing tests cover the change, and most of MLIR
ReifyRankedShapedTypeOpInterface ops are covered in the tests, except
tensor.generate and bufferization.alloc_tensor ops.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D150227
Don't choke on `outs` arguments that are not produced by `tensor.empty` or
`tensor.extract_slice`.
When the `outs` argument has a static shape we have all the necessary
information to proceed with the padding.
This makes `transform.structured.pad` a little bit more resilient.
Differential Revision: https://reviews.llvm.org/D150112
Extends the pack/unpack generalization patterns to work for any packing
op with only full tiles. This produces a combination of rank-reduced
insert/extract slice ops paired with a transpose on the reduced shape,
similar to what the pattern currently produces for fully tiled
pack/unpacks. Note that only the outer dims are rank-reduced in this
pattern, leaving the shape of the inner tile intact.
Differential Revision: https://reviews.llvm.org/D147555
Currently conversions to interfaces may happen implicitly (e.g.
`Attribute -> TypedAttr`), failing a runtime assert if the interface
isn't actually implemented. This change marks the `Interface(ValueT)`
constructor as explicit so that a cast is required.
Where it was straightforward, I adjusted the code to not require casts;
otherwise I just made them explicit.
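The effect of marking the constructor explicit can be shown with a
self-contained C++ sketch. `Attr`, `TypedIfaceImplicit` and
`TypedIfaceExplicit` are hypothetical stand-ins for an attribute and an
interface wrapper; the assert models the runtime check mentioned above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical attribute that may or may not implement the interface.
struct Attr {
  bool implementsTyped = false;
};

// Before: the converting constructor is implicit, so an Attr silently
// converts and the assert only fires at runtime.
struct TypedIfaceImplicit {
  TypedIfaceImplicit(Attr a) : attr(a) {
    assert(a.implementsTyped && "interface not implemented");
  }
  Attr attr;
};

// After: the constructor is explicit, so every conversion is visible at
// the call site as a cast or direct construction.
struct TypedIfaceExplicit {
  explicit TypedIfaceExplicit(Attr a) : attr(a) {
    assert(a.implementsTyped && "interface not implemented");
  }
  Attr attr;
};

static_assert(std::is_convertible<Attr, TypedIfaceImplicit>::value,
              "implicit conversion is allowed before the change");
static_assert(!std::is_convertible<Attr, TypedIfaceExplicit>::value,
              "implicit conversion is rejected after the change");
static_assert(std::is_constructible<TypedIfaceExplicit, Attr>::value,
              "explicit construction still works");
```

The runtime check is unchanged in this sketch; what changes is that the
compiler now rejects conversions that were not written out.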
Depends on D148491, D148492
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D148493
Use `reifyValueBound` instead, which is more general and not hard-coded to a specific list of supported ops.
Also add a `closedUB` parameter to the ValueBoundsOpInterface API.
Differential Revision: https://reviews.llvm.org/D146356
Currently, `getTiledImplementation` and `generateResultTileValue`
return just `SmallVector<Operation *>` and `FailureOr<Value>`,
respectively.
- For `getTiledImplementation`, returning an empty vector implies that
  tiling wasn't done. There is also an implicit assumption that the
  tiled operation results correspond to the tiled values of the result
  of the original operation. This cannot handle cases where the tiled
  implementation might use multiple operations to compute the tiled
  value for the results of the untiled operation. Sometimes, the tiled
  operation might not directly give the tiled values, and might require
  casts, etc. to get a replacement.
- For `generateResultTileValue`, it is assumed that the op defining
  the returned `Value` is the operation that represents the tiled
  computation. Again, the presence of casts, etc. violates this.
Instead make these methods return
```
struct TilingResult {
  SmallVector<Operation *> tiledOps;
  SmallVector<Value> tiledValues;
};
```
The `tiledOps` represent the operations generated that are relevant
for subsequent transformations. The `tiledValues` represent the tiled
values for the results of the original operation. This better
transmits the state of the transformed IR.
As a consequence, the following methods also return `FailureOr<TilingResult>`:
- `tensor::replaceExtractSliceWithTiledProducer`
- `tensor::bubbleUpPadSlice`
Differential Revision: https://reviews.llvm.org/D145133
This change adds a new helper function `mlir::reifyResultShapes` that calls the corresponding interface method and also checks the result produced by the implementation when running in debug mode. Bugs due to incorrect interface implementations can be difficult to debug.
This helper function also reduces the amount of code needed at call sites: the cast to `ReifyRankedShapedTypeOpInterface` is done in the helper function.
Differential Revision: https://reviews.llvm.org/D145777
Decompose conv_2d -> conv_1d.
This MR follows a similar approach to https://reviews.llvm.org/D112928.
This patch adds support to convert conv_2D operation with either unit height or unit width to conv_1D operation.
This is useful when 2D convolution is tiled to have a single dimension for either height or width and then can be vectorized once it is decomposed into 1D convolution.
The patch https://reviews.llvm.org/D145160 adds vector support for the linalg.conv_1d operation, thereby allowing us to vectorize linalg.conv_2d operations after proper tiling.
This missing feature is reported here: https://discourse.llvm.org/t/vectorization-of-convolution-op/60458.
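The equivalence that this decomposition relies on can be checked with a
small sketch in plain C++ (single channel, stride 1, no padding, no
kernel flip). `conv2d` and `conv1d` here are hypothetical reference
loops, not the linalg ops themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Reference 2-D convolution: out[y][x] += in[y+ky][x+kx] * k[ky][kx].
Matrix conv2d(const Matrix &in, const Matrix &k) {
  assert(!in.empty() && !k.empty() && "empty input or kernel");
  assert(in.size() >= k.size() && in[0].size() >= k[0].size());
  const std::size_t oh = in.size() - k.size() + 1;
  const std::size_t ow = in[0].size() - k[0].size() + 1;
  Matrix out(oh, std::vector<int>(ow, 0));
  for (std::size_t y = 0; y < oh; ++y)
    for (std::size_t x = 0; x < ow; ++x)
      for (std::size_t ky = 0; ky < k.size(); ++ky)
        for (std::size_t kx = 0; kx < k[0].size(); ++kx)
          out[y][x] += in[y + ky][x + kx] * k[ky][kx];
  return out;
}

// Reference 1-D convolution: out[x] += in[x+kx] * k[kx].
std::vector<int> conv1d(const std::vector<int> &in,
                        const std::vector<int> &k) {
  assert(!k.empty() && in.size() >= k.size());
  std::vector<int> out(in.size() - k.size() + 1, 0);
  for (std::size_t x = 0; x < out.size(); ++x)
    for (std::size_t kx = 0; kx < k.size(); ++kx)
      out[x] += in[x + kx] * k[kx];
  return out;
}
```

When both the input and the kernel have height 1, the `ky` loop runs
once and the only output row of `conv2d` equals `conv1d` applied to the
single input row and kernel row.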
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D145162
`reifyResultShapes` now returns `OpFoldResult`s instead of `Value`s. This is often more efficient because many transformations immediately attempt to extract a constant from the reified values.
Differential Revision: https://reviews.llvm.org/D145250
The generalization pattern for tensor.pack was inverting the
innerDimsPos permutation when normalizing. Thus, the transpose op
produced by the generalization would be incorrect.
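A short plain C++ sketch shows why using an inverted permutation is
easy to miss in tests. `invert` and `applyPermutation` are hypothetical
helpers, with the convention `result[i] = input[perm[i]]` chosen for
this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns inv such that inv[perm[i]] == i.
std::vector<std::size_t> invert(const std::vector<std::size_t> &perm) {
  std::vector<std::size_t> inv(perm.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    assert(perm[i] < perm.size() && "not a permutation");
    inv[perm[i]] = i;
  }
  return inv;
}

// Applies a permutation to a shape: result[i] = input[perm[i]].
std::vector<int> applyPermutation(const std::vector<int> &input,
                                  const std::vector<std::size_t> &perm) {
  assert(input.size() == perm.size() && "rank mismatch");
  std::vector<int> result(input.size());
  for (std::size_t i = 0; i < perm.size(); ++i)
    result[i] = input[perm[i]];
  return result;
}
```

A swap such as `{1, 0}` is its own inverse, so tests with two tiled
dimensions cannot tell the two apart. A rotation such as `{1, 2, 0}`
has the inverse `{2, 0, 1}`, and applying the wrong one to the shape
`{2, 3, 4}` gives `{4, 2, 3}` instead of `{3, 4, 2}`.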
Differential Revision: https://reviews.llvm.org/D144425
This revision introduces `transform.structured.lower_pack` which allows
rewriting a `tensor.pack` to `tensor.pad` + `tensor.expand_shape` + `linalg.transpose`.
The implementation is currently limited to static pack ops that do not have outer_dims permutations.
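The pad + expand_shape + transpose sequence can be modelled on a
row-major 2-D buffer in plain C++. This is a sketch under simplifying
assumptions (2-D source, both dimensions tiled, no outer-dim
permutation); `pack2D` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: pack a row-major [rows][cols] buffer into a
// row-major [O0][O1][T0][T1] buffer, padding incomplete tiles.
std::vector<int> pack2D(const std::vector<int> &src, std::size_t rows,
                        std::size_t cols, std::size_t T0, std::size_t T1,
                        int padValue) {
  assert(src.size() == rows * cols && "source size mismatch");
  assert(T0 > 0 && T1 > 0 && "tile sizes must be positive");
  const std::size_t O0 = (rows + T0 - 1) / T0;
  const std::size_t O1 = (cols + T1 - 1) / T1;
  const std::size_t paddedCols = O1 * T1;

  // "tensor.pad": grow the source to [O0*T0][O1*T1].
  std::vector<int> padded(O0 * T0 * paddedCols, padValue);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      padded[r * paddedCols + c] = src[r * cols + c];

  // "tensor.expand_shape": view the padded buffer as [O0][T0][O1][T1].
  // No data moves; only the indexing below changes.

  // "linalg.transpose": [O0][T0][O1][T1] -> [O0][O1][T0][T1].
  std::vector<int> packed(padded.size());
  for (std::size_t o0 = 0; o0 < O0; ++o0)
    for (std::size_t o1 = 0; o1 < O1; ++o1)
      for (std::size_t t0 = 0; t0 < T0; ++t0)
        for (std::size_t t1 = 0; t1 < T1; ++t1)
          packed[((o0 * O1 + o1) * T0 + t0) * T1 + t1] =
              padded[((o0 * T0 + t0) * O1 + o1) * T1 + t1];
  return packed;
}
```

Packing a 3x3 source with 2x2 tiles in this model yields four tiles of
four elements each, with the padding value filling the positions that
fall outside the source.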
Differential Revision: https://reviews.llvm.org/D142881
The patch adds operations to `BlockAndValueMapping` and renames it to `IRMapping`. When operations are cloned, old operations are mapped to the cloned operations. This allows mapping from an operation to a cloned operation. Example:
```
Operation *opWithRegion = ...
Operation *opInsideRegion = &opWithRegion->getRegion(0).front().front();
IRMapping map;
Operation *newOpWithRegion = opWithRegion->clone(map);
Operation *newOpInsideRegion = map.lookupOrNull(opInsideRegion);
```
Migration instructions:
All includes to `mlir/IR/BlockAndValueMapping.h` should be replaced with `mlir/IR/IRMapping.h`. All uses of `BlockAndValueMapping` need to be renamed to `IRMapping`.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D139665