As described in issue llvm/llvm-project#91518, a previous PR
llvm/llvm-project#78484 introduced the `defaultMemorySpaceFn` into
bufferization options, allowing one to inform OneShotBufferize that it
should use a specified function to derive the memory space attribute
from the encoding attribute attached to tensor types.
However, introducing this feature exposed unhandled edge cases,
examples of which are introduced by this change in the new test under
`test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`.
Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is
pretty simple. This change:
- Updates the `bufferization.to_memref` and `bufferization.to_tensor`
operations to explicitly include operand and destination types,
whereas previously they relied on type inference to deduce the
tensor types. Since the type inference cannot recover the correct
tensor encoding/memory space, the operand and result types must be
explicitly included. This is a small assembly format change, but it
touches a large number of test files.
- Makes minor updates to other bufferization functions to handle the
changes in building the above ops.
- Updates bufferization of `tensor.from_elements` to handle memory
space.
Integration/upgrade guide:
In downstream projects, if you have tests or MLIR files that explicitly
use
`bufferization.to_tensor` or `bufferization.to_memref`, then update
them to the new assembly format as follows:
```
%1 = bufferization.to_memref %0 : memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32>
```
becomes
```
%1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32>
%2 = bufferization.to_tensor %0 : memref<10xf32> to tensor<10xf32>
```
Currently the implementation is within a pattern that cannot be used
without a pattern rewriter. Move the decomposition as a method of the
operation to make it usable outside of pattern rewrites.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
In the insert_slice bufferization interface implementation, the
destination tensor is not considered read if the full tensor is
overwritten by the slice. This PR adds the same check for
tensor.parallel_insert_slice.
Adds two new StaticValueUtils:
- `isAllConstantIntValue` checks if an array of `OpFoldResult` are all
equal to a passed `int64_t` value.
- `areConstantIntValues` checks if an array of `OpFoldResult` are all
equal to a passed array of `int64_t` values.
fixes https://github.com/llvm/llvm-project/issues/112435
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Resolves#101708
The updated logic now correctly checks if `transfer_write` completely
overwrites `insert_slice` and only then applies the rewrite for this
pattern.
This check currently covers static sizes, for dynamic sizes
value bounds analysis is needed (see `TODO:`).
There are some spurious libraries which can be removed.
I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We're utilizing cmake function to recursively collect
MLIR/LLVM related dependencies. However, we identified certain library
dependencies as redundant and safe for removal.
Just directly create the empty tensor of appropriate shape instead of
relying on `UnPackOp::createDestinationTensor` which is trying to infer
the destination shape, which isn't possible in general with the set of
paramters that it is taking.
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Refactored @Max191's PR https://github.com/llvm/llvm-project/pull/94637
to move it to `Tensor`
From the original PR
>This PR adds fusion by expansion patterns to push a tensor.expand_shape
up through a tensor.collapse_shape with non-intersecting reassociations.
Sometimes parallel collapse_shape ops like this can block propagation of
expand_shape ops, so this allows them to pass through each other.
I'm not sure if I put the code/tests in the right places, so let me know
where those go if they aren't.
cc @MaheshRavishankar @hanhanW
---------
Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
A concatenation of empty tensors can be replaced by a single empty
tensor of the concatenated shape. Add this pattern to
`populateFoldTensorEmptyPatterns`.
tensor.parallel_insert_slice op has implicit inplace behavior. In the
"copy-before-write" bufferize mode, the resolveConflict function will
generate bufferize.copy, making the result incorrect. This patch fixes
this issue.
This PR adds transpose + pack/unpack folding support for transpose ops
in the form of `linalg.generic` ops. There were also some bugs with the
permutation composing in the previous patterns, so this PR fixes these
bugs and adds tests for them as well.
This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse consumer to a producer within
scf.for/scf.forall loop.
To support this two new methods are added to the `TilingInterface`
- `getIterationDomainTileFromOperandTile`
- `getTiledImplementationFromOperandTile`.
Consumer operations that implement this method can be used to be fused with tiled producer operands in a manner similar to (but essentially the inverse of) the fusion of an untiled producer with a tiled consumer.
Note that this only does one `tiled producer` -> `consumer` fusion. This could be called repeatedly for fusing multiple consumers. The current implementation also is conservative in when this kicks in (like single use of the value returned by the inter-tile loops that surround the tiled producer, etc.) These can be relaxed over time.
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
---------
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
Co-authored-by: cxy <chenxunyu1993@gmail.com>
These passes have been depreciated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.
Relands https://github.com/llvm/llvm-project/pull/93488 which failed on
buildbot. Fixes the failure by updating integration tests to use
one-shot-bufferize instead.
These passes have been depreciated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.
Previously if the producer tensor.unpack op had "unpadding" semantics,
the folding pattern would construct a destination that does not match
with the result type of the transpose. Because both ops are DPS we can
just reuse the destination of the transpose.
Additionally cleans up a bunch of trailing whitespace in the test file.
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
---------
Signed-off-by: Gaurav Shukla<gaurav.shukla@amd.com>
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
This commit generalizes and cleans up the `ValueBoundsConstraintSet`
API. The API used to provide function overloads for comparing/computing
bounds of:
- index-typed SSA value
- dimension of shaped value
- affine map + operands
This commit removes all overloads. There is now a single entry point for
each `compare` variant and each `computeBound` variant. These functions
now take a `Variable`, which is internally represented as an affine map
and map operands.
This commit also adds support for computing bounds for an affine map +
operands. There was previously no public API for that.
…d viceversa
-- Adds folding of producer linalg transpose op with consumer unpack op,
also adds folding of producer unpack op and consumer transpose op.
-- Minor bug fixes w.r.t. to the test cases.
This patch fixes:
mlir/lib/Dialect/Tensor/Transforms/MergeConsecutiveInsertExtractSlicePatterns.cpp:158:17:
error: 'matchAndRewrite' overrides a member function but is not
marked 'override' [-Werror,-Wsuggest-override]
Fold the `tensor.insert_slice` of `tensor.extract_slice` into
`tensor_extract_slice` when the `insert_slice` simply expand some unit
dims dropped by the `extract_slice`.
Collection of changes with the goal of being able to convert `encoding`
to `memorySpace` during bufferization
- new API for encoder to allow implementation to select destination
memory space
- update existing bufferization implementations to support the new
interface
The pattern rewriter documentation states that "*all* IR mutations [...]
are required to be performed via the `PatternRewriter`." This commit
adds two functions that were missing from the rewriter API:
`moveOpBefore` and `moveOpAfter`.
After an operation was moved, the `notifyOperationInserted` callback is
triggered. This allows listeners such as the greedy pattern rewrite
driver to react to IR changes.
This commit narrows the discrepancy between the kind of IR modification
that can be performed and the kind of IR modifications that can be
listened to.
They can be simplified to reshape ops if outer_dims_perm is an identity
permutation. The revision adds a `isIdentityPermutation` method to
IndexingUtils.
A tensor.pack op can be rewritten to a tensor.expand_shape op if the
packing only happens on inner most dimension.
This also formats the lit checks better.
The revision moves pack/unpack related patterns to
PackAndUnpackPatterns.cpp. This follows the convention like other tensor
ops.
It also renames `populateSimplifyTensorPack` to
`populateSimplifyPackAndUnpackPatterns` and adds a TODO item for
tensor.unpack op.
This adds an operation for concatenating ranked tensors along a static
dimension, as well as a decomposition mirroring the existing lowering
from TOSA to Tensor. This offers a convergence point for "input" like
dialects that include various lowerings for concatenation operations,
easing later analysis. In the future, this op can implement the
necessary interfaces for tiling, as well as potentially add conversions
to some kind of linalg and/or memref counterpart.
This patch adds the op, the decomposition, and some basic
folding/canonicalization. Replacing lowerings with the op (such as the
TOSA lowering) will come as a follow up.
See
https://discourse.llvm.org/t/rfc-tensor-add-a-tensor-concatenate-operation/74858
This patch fixes two checks where a `SmallBitVector` containing the
potential dropped dims of a SubView/ExtractSlice operation was queried
via `empty()` instead of `none()`.
The majority of subset ops operate on hyperrectangular subsets. This
commit adds a new optional interface method
(`getAccessedHyperrectangularSlice`) that can be implemented by such
subset ops. If implemented, the other `operatesOn...` interface methods
of the `SubsetOpInterface` do not have to be implemented anymore.
The comparison logic for hyperrectangular subsets (is
disjoint/equivalent) is implemented with `ValueBoundsOpInterface`. This
makes the subset hoisting more powerful: simple cases where two
different SSA values always have the same runtime value can now be
supported.
There is currently an op interface for subset insertion ops
(`SubsetInsertionOpInterface`), but not for subset extraction ops. This
commit adds `SubsetExtractionOpInterface` to `mlir/Interfaces`, as well
as a common dependent op interface: `SubsetOpInterface`.
- `SubsetOpInterface` is for ops that operate on tensor subsets. It
provides interface methods to check if two subset ops operate on
equivalent or disjoint subsets. Ops that implement this interface must
implement either `SubsetExtractionOpInterface` or
`SubsetInsertionOpInterface`.
- `SubsetExtractionOpInterface` is for ops that extract from a tensor at
a subset. E.g., `tensor.extract_slice`, `tensor.gather`,
`vector.transfer_read`. Current implemented only on
`tensor.extract_slice`.
- `SubsetInsertionOpInterface` is for ops that insert into a destination
tensor at a subset. E.g., `tensor.insert_slice`,
`tensor.parallel_insert_slice`, `tensor.scatter`,
`vector.transfer_write`. Currently only implemented on
`tensor.insert_slice`, `tensor.parallel_insert_slice`.
Other changes:
- Rename `SubsetInsertionOpInterface.td` to `SubsetOpInterface.td`.
- Add helper functions to `ValueBoundsOpInterface.cpp` for checking
whether two slices are disjoint.
The new interfaces will be utilized by a new "loop-invariant subset
hoisting"
transformation. (This new transform is roughly
what `Linalg/Transforms/SubsetHoisting.cpp` is doing, but in a generic
and interface-driven way.)
`SubsetInsertionOpInterface` is an interface for ops that insert into a
destination tensor at a subset. It is currently used by the
bufferization framework to support efficient
`tensor.extract_slice/insert_slice` bufferization and to drive "empty
tensor elimination".
This commit moves the interface to `mlir/Interfaces`. This is in
preparation of adding a new "loop-invariant subset hoisting"
transformation to
`mlir/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp`, which will
utilize `SubsetInsertionOpInterface`. (This new transform is roughly
what `Linalg/Transforms/SubsetHoisting.cpp` is doing, but in a generic
and interface-driven way.)
Two `OpOperand`s are the same if they belong to the same owner and have
the same operand number. There are currently no comparison operators
defined on `OpOperand` and we work around this in multiple places by
comparing pointers.
Note: `OpOperand`s are stored in an op, so it is valid to compare their
pointers to determine if they are the same operand. E.g.,
`getOperandNumber` is also implemented via pointer arithmetics.
* `tensor.collapse_shape` may bufferize to a memory read because the op
may have to reallocate the source buffer.
* `tensor.reshape` should not use `bufferization.clone` for
reallocation. This op has requirements wrt. the order of buffer
writes/reads. Use `memref.alloc` and `memref.copy` instead. Also fix a
bug where the memory space of the source buffer was not propagated to
the reallocated buffer.