`SimplifyClones` used to generate an invalid op:
```
error: 'memref.cast' op operand type 'memref<*xf32>' and result type 'memref<*xf32>' are cast incompatible
%2 = bufferization.clone %1 : memref<*xf32> to memref<*xf32>
```
This commit fixes tests such as
`mlir/test/Dialect/Bufferization/canonicalize.mlir` when verifying the
IR after each pattern (#74270).
`bufferization.materialize_in_destination` should be used instead. Both
ops bufferize to a memcpy. This change also conceptually cleans up the
memref dialect a bit: the memref dialect no longer contains ops that
operate on tensor values.
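The removed memref op is not named here; assuming it was
`memref.tensor_store` (the memref op that operated on tensor values), a
minimal sketch of the migration:
```mlir
// Before: a tensor value is copied into a buffer via the memref dialect.
memref.tensor_store %t, %m : memref<10xf32>

// After: the same guaranteed copy, expressed in the bufferization dialect.
bufferization.materialize_in_destination %t in writable %m
    : (tensor<10xf32>, memref<10xf32>) -> ()
```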
There is currently an op interface for subset insertion ops
(`SubsetInsertionOpInterface`), but not for subset extraction ops. This
commit adds `SubsetExtractionOpInterface` to `mlir/Interfaces`, as well
as a common dependent op interface: `SubsetOpInterface`.
- `SubsetOpInterface` is for ops that operate on tensor subsets. It
provides interface methods to check if two subset ops operate on
equivalent or disjoint subsets. Ops that implement this interface must
implement either `SubsetExtractionOpInterface` or
`SubsetInsertionOpInterface`.
- `SubsetExtractionOpInterface` is for ops that extract from a tensor at
a subset. E.g., `tensor.extract_slice`, `tensor.gather`,
`vector.transfer_read`. Currently implemented only on
`tensor.extract_slice`.
- `SubsetInsertionOpInterface` is for ops that insert into a destination
tensor at a subset. E.g., `tensor.insert_slice`,
`tensor.parallel_insert_slice`, `tensor.scatter`,
`vector.transfer_write`. Currently implemented only on
`tensor.insert_slice` and `tensor.parallel_insert_slice`. (Both
interfaces are illustrated in the sketch after this list.)
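For illustration, a sketch of the subset relationships on existing
`tensor` ops (the equivalence/disjointness queries are the interface
methods described above):
```mlir
// The extraction and the first insertion operate on the equivalent
// subset [0..4] of %t.
%s = tensor.extract_slice %t[0] [5] [1] : tensor<10xf32> to tensor<5xf32>
%r = tensor.insert_slice %u into %t[0] [5] [1]
    : tensor<5xf32> into tensor<10xf32>

// This insertion operates on [5..9], which is disjoint from [0..4].
%q = tensor.insert_slice %v into %t[5] [5] [1]
    : tensor<5xf32> into tensor<10xf32>
```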
Other changes:
- Rename `SubsetInsertionOpInterface.td` to `SubsetOpInterface.td`.
- Add helper functions to `ValueBoundsOpInterface.cpp` for checking
whether two slices are disjoint.
The new interfaces will be utilized by a new "loop-invariant subset
hoisting" transformation. (This new transform is roughly what
`Linalg/Transforms/SubsetHoisting.cpp` is doing, but in a generic and
interface-driven way.)
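A sketch of the kind of rewrite this enables (the hypothetical
`test.compute` op stands in for an arbitrary payload):
```mlir
// Before: the same subset is extracted and re-inserted in every iteration.
%r = scf.for %iv = %lb to %ub step %c1 iter_args(%t = %init) -> (tensor<10xf32>) {
  %s = tensor.extract_slice %t[0] [5] [1] : tensor<10xf32> to tensor<5xf32>
  %u = "test.compute"(%s) : (tensor<5xf32>) -> tensor<5xf32>
  %i = tensor.insert_slice %u into %t[0] [5] [1]
      : tensor<5xf32> into tensor<10xf32>
  scf.yield %i : tensor<10xf32>
}

// After hoisting: extract once, iterate on the subset, insert once.
%s0 = tensor.extract_slice %init[0] [5] [1] : tensor<10xf32> to tensor<5xf32>
%r0 = scf.for %iv = %lb to %ub step %c1 iter_args(%s = %s0) -> (tensor<5xf32>) {
  %u = "test.compute"(%s) : (tensor<5xf32>) -> tensor<5xf32>
  scf.yield %u : tensor<5xf32>
}
%r1 = tensor.insert_slice %r0 into %init[0] [5] [1]
    : tensor<5xf32> into tensor<10xf32>
```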
Two `OpOperand`s are the same if they belong to the same owner and have
the same operand number. There are currently no comparison operators
defined on `OpOperand` and we work around this in multiple places by
comparing pointers.
Note: `OpOperand`s are stored in an op, so it is valid to compare their
pointers to determine if they are the same operand. E.g.,
`getOperandNumber` is also implemented via pointer arithmetic.
Empty tensor elimination looks for
`bufferization.materialize_in_destination` ops with a `tensor.empty`
source. It replaces the `tensor.empty` with a `bufferization.to_tensor
restrict` of the memref destination. As part of this rewrite, the
`restrict` keyword must be removed from the
`materialize_in_destination` op, so that no second `to_tensor restrict`
op will be inserted for the same buffer. Such IR would be invalid.
`bufferization.materialize_in_destination` ops with a memref
destination but without the `restrict` attribute are ignored by empty
tensor elimination.
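A sketch of the rewrite (the hypothetical `test.compute` op stands in
for an arbitrary tensor computation):
```mlir
// Before empty tensor elimination:
%t = tensor.empty() : tensor<10xf32>
%c = "test.compute"(%t) : (tensor<10xf32>) -> tensor<10xf32>
bufferization.materialize_in_destination %c in restrict writable %out
    : (tensor<10xf32>, memref<10xf32>) -> ()

// After: the tensor.empty is replaced with a `to_tensor restrict writable`
// of the destination buffer, and `restrict` is dropped from the
// materialize_in_destination op.
%t2 = bufferization.to_tensor %out restrict writable : memref<10xf32>
%c2 = "test.compute"(%t2) : (tensor<10xf32>) -> tensor<10xf32>
bufferization.materialize_in_destination %c2 in writable %out
    : (tensor<10xf32>, memref<10xf32>) -> ()
```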
Also relax the verifier of `materialize_in_destination`. The `restrict`
keyword is not generally needed because the op does not expose the
buffer as a tensor.
Extend `bufferization.materialize_in_destination` to support memref
destinations. This op can now be used to indicate that a tensor
computation should materialize in a given buffer (that may have been
allocated by another component/runtime). The op still participates in
"empty tensor elimination".
Example:
```mlir
func.func @test(%out: memref<10xf32>) {
  %t = tensor.empty() : tensor<10xf32>
  %c = linalg.generic ... outs(%t : tensor<10xf32>) -> tensor<10xf32>
  bufferization.materialize_in_destination %c in restrict writable %out
      : (tensor<10xf32>, memref<10xf32>) -> ()
  return
}
```
After "empty tensor elimination", the above IR can bufferize without an
allocation:
```mlir
func.func @test(%out: memref<10xf32>) {
  linalg.generic ... outs(%out : memref<10xf32>)
  return
}
```
This change also clarifies the meaning of the `restrict` unit attribute
on `bufferization.to_tensor` ops.
The TableGen code generator now generates C++ code that returns a single
`OpOperand &` for `get...Mutable` of operands that are not variadic and
not optional. `OpOperand::set`/`assign` can be used to set a value (same
as `MutableOperandRange::assign`). This is safer than
`MutableOperandRange` because only single values (and no longer
`ValueRange`) can be assigned.
E.g.:
```
// Assignment of multiple values to non-variadic operand.
// Before: Compiles, but produces invalid op.
// After: Compilation error.
extractSliceOp.getSourceMutable().assign({v1, v2});
```
One-Shot Bufferize no longer deallocates buffers, so `deallocationFn`
can be removed.
Note: There is a `bufferization.dealloc_tensor` op that now always
bufferizes to `memref.dealloc`. This op will be phased out soon.
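For reference, a sketch of the op and its bufferized form:
```mlir
// Tensor level:
bufferization.dealloc_tensor %t : tensor<10xf32>

// After bufferization (always a plain deallocation of the buffer):
memref.dealloc %m : memref<10xf32>
```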
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs, as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means that
the `allow-return-allocs` pass option now defaults to true and
`create-deallocs` defaults to false. Both options, as well as the
escape attribute indicating whether a memref escapes the current
region, will be removed. A new `allow-return-allocs-from-loops` option
is added as a temporary workaround for some bufferization limitations.
This revision adds support for empty tensor elimination to
"bufferization.materialize_in_destination" by implementing the
`SubsetInsertionOpInterface`.
Furthermore, the One-Shot Bufferize conflict detection is improved for
"bufferization.materialize_in_destination".
`operator[]` returns `OpOperand &` instead of `Value`.
* This allows users to get OpOperands by name instead of by a "magic" number.
E.g., `extractSliceOp->getOpOperand(0)` can be written as
`extractSliceOp.getSourceMutable()[0]`.
* `OperandRange` provides a read-only API to operands: `operator[]`
returns `Value`. `MutableOperandRange` now provides a mutable API:
`operator[]` returns `OpOperand &`, which can be used to set operands.
Note: The TableGen code generator could be changed to return `OpOperand
&` (instead of `MutableOperandRange`) for non-variadic and non-optional
arguments in a subsequent change. Then the `[0]` part in the above
example would no longer be necessary.
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported
cases, such as returning memrefs, which this series of commits also
aims to support.
This first commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs, as this should be entirely handled by the
buffer-deallocation pass going forward. This means that the
allow-return-allocs pass option now defaults to true and
create-deallocs defaults to false. Both options, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed.
The documentation is also updated in this commit to reflect these pass
option changes.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156662
Deallocation operations where the allocated value appears in both the
'memref' and 'retained' lists are currently not supported. This is
because when values are in the retained list, they typically have a
use-site at a later point, and another deallocation op exists at that
later point to free the memref then. There already exists a
canonicalization pattern in the buffer deallocation simplification pass
that removes the allocated value from the earlier dealloc, because it
will never actually be deallocated in that case and thus does not have
to be considered in this new pattern.
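A sketch of the case in question, using the `bufferization.dealloc`
syntax described further below (D155467 entry):
```mlir
// %alloc appears in both the dealloc list and the retain list. The new
// pattern does not handle this form; the existing simplification first
// drops %alloc from the dealloc list, since a retained memref is never
// actually deallocated by this op.
%0 = bufferization.dealloc (%alloc : memref<2xf32>) if (%cond)
       retain (%alloc : memref<2xf32>)
```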
Differential Revision: https://reviews.llvm.org/D158740
`getBufferType` computes the bufferized type of an SSA value without bufferizing any IR. This is useful for predicting the bufferized type of iter_args of a loop.
To avoid endless recursion (e.g., in the case of "scf.for", the type of the iter_arg depends on the type of the init_arg and the type of the yielded value; the type of the yielded value depends on the type of the iter_arg again), `fixedTypes` was used to fall back to a "fixed" type. A simpler way is to maintain an "invocation stack". `getBufferType` implementations can then inspect the invocation stack to detect repetitive computations (typically when computing the bufferized type of a block argument).
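The cycle being broken, sketched on "scf.for" (the hypothetical
`test.compute` op stands in for the loop body):
```mlir
// The buffer type of %iter depends on the types of %init and of the
// yielded %next; the type of %next depends on %iter again. When
// getBufferType(%iter) finds itself on the invocation stack, it stops
// recursing and falls back to a conservative (fully dynamic) layout.
%r = scf.for %iv = %lb to %ub step %c1 iter_args(%iter = %init) -> (tensor<?xf32>) {
  %next = "test.compute"(%iter) : (tensor<?xf32>) -> tensor<?xf32>
  scf.yield %next : tensor<?xf32>
}
```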
Also improve error messages in case of inconsistent memory spaces inside of a loop.
Differential Revision: https://reviews.llvm.org/D158060
This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops.
This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block.
If an op does not implement the `BufferizableOpInterface`, the analysis is conservative. It previously assumed that an OpOperand may alias with each OpResult. It now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block.
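The `cf.br` case, sketched:
```mlir
// After bufferization, %t and %arg0 share a buffer: the branch operand
// aliases the corresponding argument of the destination block.
cf.br ^bb1(%t : tensor<5xf32>)
^bb1(%arg0: tensor<5xf32>):
  "test.use"(%arg0) : (tensor<5xf32>) -> ()
```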
Differential Revision: https://reviews.llvm.org/D157957
Adds a pass that can be run after buffer deallocation to simplify the deallocation operations.
In particular, there are patterns that need alias information and thus cannot be added as a regular canonicalization pattern.
This initial commit moves an incorrect canonicalization pattern over to this new pass and fixes it by querying the alias analysis for the additional information it needs to be correct (there must not be any potentially aliasing memref in the retain list other than the currently matched one).
It also improves this pattern by considering the `extract_strided_metadata` operation, which is inserted by the deallocation pass by default.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157398
The `extract_strided_metadata` will be heavily used by the new buffer deallocation pass to get the base memref and pass it to the deallocation operation. This commit factors out some simplification logic of the pass into a canonicalization pattern.
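For reference, a sketch of the op on a 1-D strided memref:
```mlir
// Recover the base buffer (plus offset, size and stride) of a view-like
// memref; the base can then be passed to the deallocation operation.
%base, %offset, %size, %stride = memref.extract_strided_metadata %view
    : memref<?xf32, strided<[?], offset: ?>>
    -> memref<f32>, index, index, index
```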
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157255
This change allows supporting operations for which we don't get precise aliasing information without the need to insert clone operations. E.g., `arith.select`.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156992
This simplifies the op and avoids unnecessary alias checks introduced during the lowering to memref.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156807
Since memrefs in the retained list will never be deallocated, we can remove them from the list of memrefs to be deallocated. If the list of memrefs to deallocate becomes empty, we can just delete the dealloc operation.
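Sketch of the simplification (result conditions correspond to the
dealloc list, as described in the D155467 entry below):
```mlir
// Before: %r is both deallocated and retained.
%0:2 = bufferization.dealloc (%a, %r : memref<2xf32>, memref<2xf32>)
         if (%c1, %c2) retain (%r : memref<2xf32>)

// After: %r can never actually be freed here, so it is dropped. If the
// dealloc list became empty, the whole op would be deleted.
%1 = bufferization.dealloc (%a : memref<2xf32>) if (%c1)
       retain (%r : memref<2xf32>)
```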
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156186
Duplicate values in the retained list can simply be removed. For duplicates in the list of memrefs to deallocate, however, we also need to check the conditions: if they don't match, we need to compute the OR of the conditions so as not to miss a case and leak memory.
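Sketch of the duplicate handling in the dealloc list:
```mlir
// Before: %a appears twice, guarded by different conditions.
%0:2 = bufferization.dealloc (%a, %a : memref<2xf32>, memref<2xf32>)
         if (%c1, %c2)

// After: a single entry guarded by the disjunction of both conditions.
%or = arith.ori %c1, %c2 : i1
%1 = bufferization.dealloc (%a : memref<2xf32>) if (%or)
```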
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156157
The dealloc operation deallocates each of the given memrefs if there is no alias
to that memref in the list of retained memrefs and the corresponding
condition value is set. This condition can be used to indicate and pass on
ownership of memref values (or in other words, the responsibility of
deallocating that memref). If two memrefs alias each other, only one will be
deallocated to avoid double free situations.
The memrefs to be deallocated must be the originally allocated memrefs,
however, the memrefs to be retained may be arbitrary memrefs.
The op returns a list of conditions corresponding to the list of
memrefs, indicating the new ownerships: if a memref was deallocated,
its ownership is dropped (set to 'false'); otherwise it is the same as
the input condition.
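A sketch of the op as described (one result condition per memref in the
dealloc list):
```mlir
// Deallocate %a0 (if %cond0 is set) and %a1 (if %cond1 is set), unless
// they alias the retained memref %ret; return the updated ownerships.
%own:2 = bufferization.dealloc (%a0, %a1 : memref<2xf32>, memref<4xi32>)
           if (%cond0, %cond1) retain (%ret : memref<2xf32>)
```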
Differential Revision: https://reviews.llvm.org/D155467
It also unifies the computation of StridedLayoutAttr: if the stride is
a statically known value, we can just use it.
Differential Revision: https://reviews.llvm.org/D155017
* Use `create` instead of `createOrFold` for constant ops. Constants cannot be folded any further.
* Use `create` instead of `createOrFold` for ops that do not have a folder.
* Use C++ op builders that take an `int` instead of creating a `ConstantIndexOp`.
* Create `tensor::DimOp` instead of `linalg::createOrFoldDimOp` when it is certain that the operand is a tensor.
Differential Revision: https://reviews.llvm.org/D154196
This operation is a "copy" operation on tensors. It is guaranteed to bufferize to a memcpy. This is different from "tensor.insert_slice", which may fold away.
Note: There is a symmetry between certain tensor, bufferization and memref ops:
* `tensor.empty`, `bufferization.alloc_tensor`, `memref.alloc`
* (none), `bufferization.dealloc_tensor`, `memref.dealloc`
* `tensor.insert_slice`, `bufferization.copy_tensor`, `memref.copy`
Tensor ops can generally canonicalize/fold away, while bufferization dialect ops can be used when a certain side effect is expected to materialize; so they do not fold away.
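A sketch of the op, assuming it follows destination-passing style with
a source, a destination and a returned result tensor:
```mlir
// Guaranteed to bufferize to a memcpy from the buffer of %src into the
// buffer of %dest; unlike tensor.insert_slice, it never folds away.
%copied = bufferization.copy_tensor %src, %dest : tensor<5xf32>
```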
Differential Revision: https://reviews.llvm.org/D153552
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D151542
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
* https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
* Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This follows a previous patch that updated calls from `op.cast<T>()`
to `cast<T>(op)`. However, some cases could not handle an
unprefixed `cast` call due to occurrences of variables named cast, or
occurring inside of class definitions which would resolve to the method.
All C++ files that did not work automatically with `cast<T>()` are
updated here to `llvm::cast` and similar with the intention that they
can be easily updated after the methods are removed through a
find-replace.
See https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
for the clang-tidy check that is used; printed occurrences of the
function are updated to include the `llvm::` prefix.
One can then run the following:
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-export-fixes /tmp/cast/casts.yaml mlir/*\
-header-filter=mlir/ -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D150348
`reifyResultShapes` now returns `OpFoldResult`s instead of `Value`s. This is often more efficient because many transformations immediately attempt to extract a constant from the reified values.
Differential Revision: https://reviews.llvm.org/D145250
`restrict` is similar to the C++ restrict keyword. Results of `to_tensor` that have the `restrict` attribute are guaranteed to not alias any other `to_tensor` result (after bufferization).
Note: Since `to_memref` ops are not supported by One-Shot Bufferize and all bufferizable ops follow DPS rules (i.e., the buffer of the result is the buffer of an operand or an alias thereof), the buffer of a `to_tensor` op that has the `restrict` attribute is always an entirely "new" buffer that is not aliasing with the future buffer of any tensor value in the entire program. This makes such `to_tensor` ops "safe" from a bufferization perspective; they cannot cause RaW conflicts.
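A sketch of the attribute on the op:
```mlir
// The result of this to_tensor is guaranteed not to alias the result
// of any other `to_tensor restrict` op after bufferization.
%t = bufferization.to_tensor %m restrict : memref<4xf32>
```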
Differential Revision: https://reviews.llvm.org/D144021
* `getAliasingOpOperand` => `getAliasingOpOperands`
* `getAliasingOpResult` => `getAliasingOpResults`
Also a few minor code cleanups and better documentation.
Differential Revision: https://reviews.llvm.org/D142979
The name of the method was confusing. It is bufferizesToMemoryWrite, but from the perspective of OpResults.
`bufferizesToMemoryWrite(OpResult)` now supports ops with regions that do not have aliasing OpOperands (such as `scf.if`). These ops no longer need to implement `isMemoryWrite`.
Differential Revision: https://reviews.llvm.org/D141684
This is part of an effort to migrate from llvm::Optional to
std::optional. 22426110c5 changed the way mlir-tblgen generates .inc
files, emitting std::optional when an Optional attribute is specified in
a .td file. It also changed several .td files hard-coding llvm::Optional
to use std::optional. However, the patch excluded a few .td files in
SPIRV and Bufferization hard-coding llvm::Optional. This patch fixes
that defect, and after this patch, references to llvm::Optional in .cpp
and .h files can be replaced mechanically.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D140329
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
D138330 updated the deprecated `getMemorySpaceAsInt` uses to `getMemorySpace`. A few uses were missed.
Differential Revision: https://reviews.llvm.org/D139526
MemRef has been accepting a general Attribute as memory space for
a long time. This commit updates the bufferization side to catch up,
which allows downstream users to plug in customized symbolic memory
spaces. This also eliminates quite a few calls to the deprecated
`getMemorySpaceAsInt`.
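For illustration (the `#my_dialect` attribute is hypothetical): with a
general `Attribute` memory space, both the classic integer space and a
downstream symbolic space are valid:
```mlir
// Integer memory space (the previously supported form).
%0 = memref.alloc() : memref<8xf32, 1>

// Arbitrary attribute as a customized symbolic memory space.
%1 = memref.alloc() : memref<8xf32, #my_dialect.special_space>
```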
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D138330
Even though iter_arg and init_arg of an scf.for loop may have the same tensor type, their bufferized memref types are not necessarily equal. It is sometimes necessary to insert a cast in case of differing layout maps.
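A sketch of such a cast (on the buffer of the init_arg, assuming the
loop body yields a buffer with a fully dynamic strided layout):
```mlir
// Both values have tensor type tensor<?xf32>, but the init buffer has
// an identity layout while the iter_arg buffer does not. A cast
// reconciles the two memref types.
%m = memref.cast %init_buffer
    : memref<?xf32> to memref<?xf32, strided<[?], offset: ?>>
```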
Differential Revision: https://reviews.llvm.org/D132860
This change generalizes getBufferType. This function can be used to predict the buffer type of any tensor value (not just BlockArguments) without changing any IR. It also subsumes getMemorySpace. This is useful for loop bufferization, where the precise buffer type of an iter_arg cannot be known without examining the loop body.
Differential Revision: https://reviews.llvm.org/D132859