Generalize `extractFromI64ArrayAttr` to `extractFromIntegerArrayAttr`, so that arbitrary integer/bool types can be extracted.
Differential Revision: https://reviews.llvm.org/D154974
Add a new option that allows users to specify a memcpy op: "memref.tensor_store", "memref.copy" or "linalg.copy".
Differential Revision: https://reviews.llvm.org/D154968
Return all ops that were generated as part of the bufferization, so that users do not have to match them in the enclosing op.
Differential Revision: https://reviews.llvm.org/D154966
Remove patterns that fold tensor subset ops into vector transfer ops from the vector dialect. These patterns already exist in the tensor dialect.
Differential Revision: https://reviews.llvm.org/D154932
This revision adds a new transformation to map a copy operation to a gpu grid of threads.
It implements a first heuristic that allows trading off coalesced accesses vs predication and occupancy.
Differential Revision: https://reviews.llvm.org/D154836
The TileOp builders did not set `scalable_sizes`, which produces invalid ops. `scalable_sizes` must contain as any booleans as there are sizes.
Differential Revision: https://reviews.llvm.org/D154585
This change lifts the limitation that only the trailing dimensions/sizes
in dynamic index lists can be scalable. It allows us to extend
`MaskedVectorizeOp` and `TileOp` from the Transform dialect so that the
following is allowed:
%1, %loops:3 = transform.structured.tile %0 [4, [4], [4]]
This is also a follow up for https://reviews.llvm.org/D153372
that will enable the following (middle vector dimension is scalable):
transform.structured.masked_vectorize %0 vector_sizes [2, [4], 8]
To facilate this change, the hooks for parsing and printing dynamic
index lists are updated accordingly (`printDynamicIndexList` and
`parseDynamicIndexList`, respectively). `MaskedVectorizeOp` and `TileOp`
are updated to include an array of attribute of bools that captures
whether the corresponding vector dimension/tile size, respectively, are
scalable or not.
NOTE 1: I am re-landing this after the initial version was reverted. To
fix the regression and in addition to the original patch, this revision
updates the Python bindings for the transform dialect
NOTE 2: This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
This relands 048764f23a with fixes.
Differential Revision: https://reviews.llvm.org/D154336
The `bufferize_to_allocation` transform op now operates on payload ops, not payload values. Only ops can be bufferized, not values.
Also remove the `replacement` result from the transform op.
Differential Revision: https://reviews.llvm.org/D153970
"transform.structured.pad" now returns all `tensor::PadOp` in addition to the padded ops.
Also add a test case that shows how to force an allocation for "tensor.pad" ops with a custom memory space.
Differential Revision: https://reviews.llvm.org/D153555
This change lifts the limitation that only the trailing dimensions/sizes
in dynamic index lists can be scalable. It allows us to extend
`MaskedVectorizeOp` and `TileOp` from the Transform dialect so that the
following is allowed:
%1, %loops:3 = transform.structured.tile %0 [[4], [4], 4]
This is also a follow up for https://reviews.llvm.org/D153372
that will enable the following (middle vector dimension is scalable):
transform.structured.masked_vectorize %0 vector_sizes [2, [4], 8]
To facilate this change, the hooks for parsing and printing dynamic
index lists are updated accordingly (`printDynamicIndexList` and
`parseDynamicIndexList`, respectively). `MaskedVectorizeOp` and `TileOp`
are updated to include an array of attribute of bools that captures
whether the corresponding vector dimension/tile size, respectively, are
scalable or not.
This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
Differential Revision: https://reviews.llvm.org/D154336
This revision adds a pre-bufferization transform that can reduce the number of allocation. It is similar to `bufferization.eliminate_empty_tensors`, but specific to LinalgOp.
The transform looks for `tensor.empty` ops where the SSA use-def chain ends in an "ins" operand of a `LinalgOp`. If the same `LinalgOp` has an unused "outs" operand (and some other conditions are met), this "outs" operand can be used instead of the `tensor.empty` and the "ins" operand can be turned into an "outs" operand.
Differential Revision: https://reviews.llvm.org/D153952
This is almost NFC except for the fact that:
- when multiple candidates are available we now return them in sorted order vs undetermined order previously
- the type of the transform return is relaxed an a test is added for the case where the transform does not apply
Differential Revision: https://reviews.llvm.org/D153941
Copy back the padded result to the original destination of the computation. This is important for bufferization, to ensure that the result of the computation does not suddenly materialize in a different buffer due to padding.
A `bufferization.copy_tensor` is inserted for every (unpadded) result. Such ops bufferize to memcpys, but they fold away, should the padding fold away.
Differential Revision: https://reviews.llvm.org/D153554
Add an additional result handle to the op. This new handle is mapped to the newly allocated buffer.
Differential Revision: https://reviews.llvm.org/D153514
* Use LinalgPaddingOptions instead of passing many parameters.
* Split function into two parts.
Differential Revision: https://reviews.llvm.org/D153853
This function allows users to update payload op mappings in cases where such replacements cannot be performed automatically by the rewriter/listener interface.
Differential Revision: https://reviews.llvm.org/D153764
All `apply` functions now have a `TransformRewriter &` parameter. This rewriter should be used to modify the IR. It has a `TrackingListener` attached and updates the internal handle-payload mappings based on rewrites.
Implementations no longer need to create their own `TrackingListener` and `IRRewriter`. Error checking is integrated into `applyTransform`. Tracking listener errors are reported only for ops with the `ReportTrackingListenerFailuresOpTrait` trait attached, allowing for a gradual migration. Furthermore, errors can be silenced with an op attribute.
Additional API will be added to `TransformRewriter` in subsequent revisions. This revision just adds an "empty" `TransformRewriter` class and updates all `apply` implementations.
Differential Revision: https://reviews.llvm.org/D152427
The new pattern will replace elementwise(broadcast) with
broadcast(elementwise) when safe.
This change affects tests for vectorising nD-extract. In one case
("vectorize_nd_tensor_extract_with_tensor_extract") I just trimmed the
test and only preserved the key parts (scalar and contiguous load from
the original Op). We could do the same with some other tests if that
helps maintainability.
Differential Revision: https://reviews.llvm.org/D152812
For now, only elementwise operations are supported. Operations that perform any
kind of data permutation require changes in the representation of scalable
dimensions in VectorType.
Differential Revision: https://reviews.llvm.org/D152599
This patch enables specifying scalable vector sizes when using the
Transform dialect to drive vectorisation, e.g.:
```
transform.structured.masked_vectorize %0 vector_sizes [8, 16, [4]]
```
This is implemented by extending the MaskedVectorizeOp with a dedicated
attribute for "scalability" and by overloading `parseDynamicIndexList`
so that MaskedVectorizeOp can continue using the auto-generated parser
and printer.
At the moment, only the trailing vec size can be scalable. The following
is not yet supported:
```
transform.structured.masked_vectorize %0 vector_sizes [8, [16], [4]]
```
As the vectoriser does not support scalable vectorisation just yet, a
warning is issues when scalable vector sizes are used. You can also use
the debug output, `--debug-only=linalg-vectorization`, to check whether
scalable vectorisation has been switched on.
This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
Similar patch for tiling: https://reviews.llvm.org/D150944
Differential Revision: https://reviews.llvm.org/D151892
* Remove `transform::PatternRegistry`.
* Add a new op for each currently registered pattern set.
* Change names of vector dialect pattern selector ops, so that they are consistent with the remaining code base.
* Remove redundant `transform.vector.extract_address_computations` op.
Differential Revision: https://reviews.llvm.org/D152249
This patch enables specifying scalable tile sizes when using the
Transform dialect to drive tiling, e.g.:
```
%1, %loop = transform.structured.tile %0 [[4]]
```
This is implemented by extending the TileOp with a dedicated attribute
for "scalability" and by updating various parsing hooks. At the moment,
only the trailing tile size can be scalable. The following is not yet
supported:
```
%1, %loop = transform.structured.tile %0 [[4], [4]]
```
This change is a part of larger effort to enable scalable vectorisation
in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
Differential Revision: https://reviews.llvm.org/D150944
This partially undoes the intent of https://reviews.llvm.org/D151418 by
cheating its way to keep the "containing op" (aka loop) handle read-only
in fusion. It is crucial to do so for composability of tiling and
fusion. Specfically, after the "containing op" handle started being
consumed, it became impossible to perform additional tiling after fusion
except tiling the last-fused op:
%tiled1, %loop1 = tile %op
%producer1, %loop2 = fuse %producer into %loop1
// invalid, because %tiled1 is invalidated by consuming %loop1
// that points to its parent
tile %tiled1
or
%tiled1, %loop1 = tile %op
%tiled2, %loop2 = tile %tiled1
%p2 = fuse %producer into %loop1
// invalid, because %loop2 is invalidated by consuming %loop1
// that points to its parent
fuse %p2 into %loop2
The approach here makes creative use of the state extension mechanism to
update the payload operation associted with the operand handle. Further
investigation is necessary to understand if is consistent with the
overall execution model of the transform dialect, but it is crucial to
restore composability ASAP.
Reviewed By: springerm, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D151555
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D151542
Since the scf.forall is now consumed by the fuse into
containing op, we need to return a handle to the new scf.forall.
This patch does that and also ensures that the new bbArg
added to the scf.forall is used in its body.
Differential Revision: https://reviews.llvm.org/D151418
In the tile and fuse of the first extract use, we add support
for scenarios where the results of the tiled op have uses
that are dominated by the scf.for_all. Specifically, we replace
the scf.for_all with a new scf.for_all that has an additional
shared_out and add the appropriate parallel insert slice op.
Differential Revision: https://reviews.llvm.org/D151275
It breaks the logic of maskedVectorize (on tensor.pad ops) into
precondition checks and vectorization implementation; unifies the
interface.
The revision also rename`s vectorizeLinalgOpPrecondition` to
`vectorizeOpPrecondition` because we can vectorize ops other
than LinalgOps.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D150495
These patterns touches the structure generated from tiling so it
affects later steps like bufferization and vector hoisting.
Instead of putting them in canonicalization, this commit creates
separate entry points for them to be called explicitly.
This is NFC regarding the functionality and tests of those patterns.
It also addresses two TODO items in the codebase.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D150702
Types have been introduced a while ago and provide for better
readability and transform-time verification. Use them in the ops from
the structured transform dialect extension.
In most cases, the types are appended as trailing functional types or a
derived format of the functional type that allows for an empty right
hand size without the annoying `-> ()` syntax (similarly to `func.func`
declaration that may omit the arrow). When handles are used inside mixed
static/dynamic lists, such as tile sizes, types of those handles follow
them immediately as in `sizes [%0 : !transform.any_value, 42]`. This
allows for better readability than matching the trailing type.
Update code to remove hardcoded PDL dependencies and expunge PDL from
structured transform op code.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D144515
Structured fusion proceeds by iteratively finding the next suitable
producer to be fused into the loop. Therefore, it shouldn't matter if
the same producer is listed multiple times (e.g., it is used as multiple
operands). Adjust the implementation of the transform op to support this
case.
Also fix the checking code in the interpreter to actually respect the
TransformOpInterface indication that repeated payload is allowed, it
seems to have been accidentally dropped in one of the refactorings.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D150561
Instead of returning an `ArrayRef<Operation *>`, return at iterator that skips ops that were erased/replaced while iterating over the payload ops.
This fixes an issue in conjuction with TrackingListener, where a tracked op was erased during the iteration. Elements may not be removed from an array while iterating over it; this invalidates the iterator.
When ops are erased/removed via `replacePayloadOp`, they are not immediately removed from the mappings data structure. Instead, they are set to `nullptr`. `nullptr`s are not enumerated by `getPayloadOps`. At the end of each transformation, `nullptr`s are removed from the mapping data structure.
Differential Revision: https://reviews.llvm.org/D149847
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
`tileToForallOpImpl` takes only one target instead of a list of all targets.
This is in preparation of D149847.
Differential Revision: https://reviews.llvm.org/D149929
Mirror the separation between LinalgTransformOps and LinalgMatchOps in
headers. Create a separate pair of files for the extension.
Depends on D148017
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D148075
The TrackingListener removes ops from the internal transform dialect state when they are erased. At the same time, the `apply` interface method is iterating over all payload ops. Elements may not be removed from the state while iterating over the state.
Differential Revision: https://reviews.llvm.org/D148771