The buffer deallocation pass checks the IR ("operation preconditions")
to make sure that there is no IR that is unsupported. In such a case,
the pass signals a failure.
The pass now rejects all ops with unknown memory effects. We do not know
whether such an op allocates memory or not. Therefore, the buffer
deallocation pass does not know whether a deallocation op should be
inserted or not.
Memory effects are queried from the `MemoryEffectOpInterface` interface.
Ops that do not implement this interface but have the
`RecursiveMemoryEffects` trait do not have any side effects (apart from
the ones that their nested ops may have).
Unregistered ops are now rejected by the pass because they do not
implement the `MemoryEffectOpInterface` and neither do we know if they
have `RecursiveMemoryEffects` or not. All test cases that currently have
unregistered ops are updated to use registered ops.
This commit simplifies a helper function in the ownership-based buffer
deallocation pass. Fixes a potential double-free (depending on the
scheduling of patterns).
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
Add a new interface method to `BufferizableOpInterface`:
`hasTensorSemantics`. This method returns "true" if the op has tensor
semantics and should be bufferized.
Until now, we assumed that an op has tensor semantics if it has tensor
operands and/or tensor op results. However, there are ops like
`ml_program.global` that do not have any results/operands but must still
be bufferized (#75103). The new interface method can return "true" for
such ops.
This change also decouples `bufferization::bufferizeOp` a bit from the
func dialect.
`BufferPlacementTransformationBase::isLoop` checks if there a loop in
the region branching graph of an operation. This algorithm is similar to
`isRegionReachable` in the `RegionBranchOpInterface`. To avoid duplicate
code, `isRegionReachable` is generalized, so that it can be used to
detect region loops. A helper function
`RegionBranchOpInterface::hasLoop` is added.
This change also turns a recursive implementation into an iterative one,
which is the preferred implementation strategy in LLVM.
Also move the `isLoop` to `BufferOptimizations.cpp`, so that we can
gradually retire `BufferPlacementTransformationBase`. (This is so that
proper error handling can be added to `BufferViewFlowAnalysis`.)
This change makes block/region walkers consistent with operation
walkers. An operation walk enumerates the current operation. Similarly,
block/region walks should enumerate the current block/region.
Example:
```
// Current behavior:
op1->walk([](Operation *op2) { /* op1 is enumerated */ });
block1->walk([](Block *block2) { /* block1 is NOT enumerated */ });
region1->walk([](Block *block) { /* blocks of region1 are NOT enumerated */ });
region1->walk([](Region *region2) { /* region1 is NOT enumerated });
// New behavior:
op1->walk([](Operation *op2) { /* op1 is enumerated */ });
block1->walk([](Block *block2) { /* block1 IS enumerated */ });
region1->walk([](Block *block) { /* blocks of region1 ARE enumerated */ });
region1->walk([](Region *region2) { /* region1 IS enumerated });
```
Fixes a bug in `SplitDeallocWhenNotAliasingAnyOther`. This pattern used
to generate invalid IR (op dominance error). We never noticed this bug
in existing test cases because other patterns and/or foldings were
applied afterwards and those rewrites "fixed up" the IR again. (The bug
is visible when running `mlir-opt -debug`.) Also add additional comments
to the implementation and simplify the code a bit.
Apart from the fixed dominance error, this change is NFC. Without this
change, buffer deallocation tests will fail when running with #74270.
There is currently an op interface for subset insertion ops
(`SubsetInsertionOpInterface`), but not for subset extraction ops. This
commit adds `SubsetExtractionOpInterface` to `mlir/Interfaces`, as well
as a common dependent op interface: `SubsetOpInterface`.
- `SubsetOpInterface` is for ops that operate on tensor subsets. It
provides interface methods to check if two subset ops operate on
equivalent or disjoint subsets. Ops that implement this interface must
implement either `SubsetExtractionOpInterface` or
`SubsetInsertionOpInterface`.
- `SubsetExtractionOpInterface` is for ops that extract from a tensor at
a subset. E.g., `tensor.extract_slice`, `tensor.gather`,
`vector.transfer_read`. Current implemented only on
`tensor.extract_slice`.
- `SubsetInsertionOpInterface` is for ops that insert into a destination
tensor at a subset. E.g., `tensor.insert_slice`,
`tensor.parallel_insert_slice`, `tensor.scatter`,
`vector.transfer_write`. Currently only implemented on
`tensor.insert_slice`, `tensor.parallel_insert_slice`.
Other changes:
- Rename `SubsetInsertionOpInterface.td` to `SubsetOpInterface.td`.
- Add helper functions to `ValueBoundsOpInterface.cpp` for checking
whether two slices are disjoint.
The new interfaces will be utilized by a new "loop-invariant subset
hoisting"
transformation. (This new transform is roughly
what `Linalg/Transforms/SubsetHoisting.cpp` is doing, but in a generic
and interface-driven way.)
`SubsetInsertionOpInterface` is an interface for ops that insert into a
destination tensor at a subset. It is currently used by the
bufferization framework to support efficient
`tensor.extract_slice/insert_slice` bufferization and to drive "empty
tensor elimination".
This commit moves the interface to `mlir/Interfaces`. This is in
preparation of adding a new "loop-invariant subset hoisting"
transformation to
`mlir/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp`, which will
utilize `SubsetInsertionOpInterface`. (This new transform is roughly
what `Linalg/Transforms/SubsetHoisting.cpp` is doing, but in a generic
and interface-driven way.)
Add a new attribute `bufferization.manual_deallocation` that can be
attached to allocation and deallocation ops. Buffers that are allocated
with this attribute are assigned an ownership of "false". Such buffers
can be deallocated manually (e.g., with `memref.dealloc`) if the
deallocation op also has the attribute set. Previously, the
ownership-based buffer deallocation pass used to reject IR with existing
deallocation ops. This is no longer the case if such ops have this new
attribute.
This change is useful for the sparse compiler, which currently
deallocates the sparse tensor buffers by itself.
Cyclic function call graphs are generally not supported by One-Shot
Bufferize. However, they can be allowed when a function does not have
tensor arguments or results. This is because it is then no longer
necessary that the callee will be bufferized before the caller.
The empty tensor elimination pass semantics have changed recently: when
applied to a module, the One-Shot Module Analysis is run. Otherwise, the
regular One-Shot Analysis is run. The latter one is slightly different
because it ignores function boundaries and treats function block
arguments as "read-only".
This commit updates the transform dialect op to behave in the same way.
* Fixes#67977, a crash in `empty-tensor-elimination`.
* Also improves `linalg.copy` canonicalization.
* Also improves indentation indentation in `mlir-linalg-ods-yaml-gen.cpp`.
Values that are the result of buffer allocation ops are guaranteed to
*not* be the same allocation as block arguments of containing blocks.
This fact can be used to allow for more aggressive simplification of
`bufferization.dealloc` ops.
Inserting clones requires a lot of assumptions to hold on the input IR, e.g., all writes to a buffer need to dominate all reads. This is not guaranteed by one-shot bufferization and isn't easy to verify, thus it could quickly lead to incorrect results that are hard to debug. This commit changes the mechanism of how an ownership indicator is materialized when there is not already a unique ownership present. Additionally, we don't create copies of returned memrefs anymore when we don't have ownership. Instead, we insert assert operations to make sure we have ownership at runtime, or otherwise report to the user that correctness could not be guaranteed.
The buffer deallocation pipeline now works on modules and functions.
Also add extra test cases that run the buffer deallocation pipeline on
modules and functions. (Test cases that insert a helper function.)
* Properly handle the case where an op is deleted and thus no other
interfaces should be processed anymore.
* Don't add ownership indicator arguments and results to function
declarations
Remove the yielded tensor analysis. This analysis was used to detect
cases where One-Shot Bufferize cannot deallocate buffers. Deallocation
has recently been removed from One-Shot Bufferize. Buffers are now
deallocated by the buffer deallocation pass. This analysis is no longer
needed.
This is necessary to support deallocation of IR with gpu.launch
operations because it does not implement the RegionBranchOpInterface.
Implementing the interface would require it to support regions with
unstructured control flow and produced arguments/results.
When cloning an op, the `notifyOperationInserted` callback is triggered
for all nested ops. Similarly, the `notifyOperationRemoved` callback
should be triggered for all nested ops when removing an op.
Listeners may inspect the IR during a `notifyOperationRemoved` callback.
Therefore, when multiple ops are removed in a single
`RewriterBase::eraseOp` call, the notifications must be triggered in an
order in which the ops could have been removed one-by-one:
* Op removals must be interleaved with `notifyOperationRemoved`
callbacks. A callback is triggered right before the respective op is
removed.
* Ops are removed post-order and in reverse order. Other traversal
orders could delete an op that still has uses. (This is not avoidable in
graph regions and with cyclic block graphs.)
Differential Revision: Imported from https://reviews.llvm.org/D144193.
In line with #66515, change `MutableArrayRange::begin`/`end` to
enumerate `OpOperand &` instead of `Value`. Also remove
`ForOp::getIterOpOperands`/`setIterArg`, which are now redundant.
Note: `MutableOperandRange` cannot be made a derived class of
`indexed_accessor_range_base` (like `OperandRange`), because
`MutableOperandRange::assign` can change the number of operands in the
range.
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
Since ownership based buffer deallocation requires a few passes to be run in a somewhat fixed sequence, it makes sense to have a pipeline for convenience (and to reduce the number of transform ops to represent default deallocation).
Add a method to the BufferDeallocationOpInterface that allows operations to implement the interface and provide custom logic to compute the ownership indicators of values it defines. As a demonstrating example, this new method is implemented by the `arith.select` operation.
This new interface allows operations to implement custom handling of ownership values and insertion of dealloc operations which is useful when an op cannot implement the interfaces supported by default by the buffer deallocation pass (e.g., because they are not exactly compatible or because there are some additional semantics to it that would render the default implementations in buffer deallocation invalid, or because no interfaces exist for this
kind of behavior and it's not worth introducing one plus a default implementation in buffer deallocation). Additionally, it can also be used to provide more efficient handling for a specific op than the interface based default
implementations can.
Add a new Buffer Deallocation pass with the intend to replace the old
one. For now it is added as a separate pass alongside in order to allow
downstream users to migrate over gradually. This new pass has the goal
of inserting fewer clone operations and supporting additional use-cases.
Please refer to the Buffer Deallocation section in the updated
Bufferization.md file for more information on how this new pass works.
This commit generalizes empty tensor elimination to operate on subset
ops. No new test cases are added because all current subset ops were
already supported previously. From this perspective, this change is NFC.
A new interface method (and a helper method) are added to
`SubsetInsertionOpInterface` to build the subset of the destination
tensor.
This commit generalizes the special
tensor.extract_slice/tensor.insert_slice bufferization rules to tensor
subset ops.
Ops that insert a tensor into a tensor at a specified subset (e.g.,
tensor.insert_slice, tensor.scatter) can implement the
`SubsetInsertionOpInterface`.
Apart from adding a new op interface (extending the API), this change is
NFC. The only ops that currently implement the new interface are
tensor.insert_slice and tensor.parallel_insert_slice, and those ops were
are supported by One-Shot Bufferize.
Since buffer deallocation requires a few passes to be run in a somewhat fixed
sequence, it makes sense to have a pipeline for convenience (and to reduce the
number of transform ops to represent default deallocation).
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D159432
Add a method to the BufferDeallocationOpInterface that allows operations to
implement the interface and provide custom logic to compute the ownership
indicators of values it defines. As a demonstrating example, this new method is
implemented by the `arith.select` operation.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158828
This new interface allows operations to implement custom handling of ownership
values and insertion of dealloc operations which is useful when an op cannot
implement the interfaces supported by default by the buffer deallocation pass
(e.g., because they are not exactly compatible or because there are some
additional semantics to it that would render the default implementations in
buffer deallocation invalid, or because no interfaces exist for this kind of
behavior and it's not worth introducing one plus a default implementation in
buffer deallocation). Additionally, it can also be used to provide more
efficient handling for a specific op than the interface based default
implementations can.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158756
Add a new Buffer Deallocation pass replacing the old one with the goal of
inserting fewer clone operations and supporting additional use-cases.
Please refer to the Buffer Deallocation section in the updated
Bufferization.md file for more information on how this new pass works.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158421