The current canonicalization of `memref.dim` operating on the result of
`memref.reshape` into `memref.load` is incorrect: it always inserts the
`memref.load` right after the `memref.reshape` (to ensure the shape
`memref` is not mutated before it is read), but it does not check whether
the `index` operand of `memref.dim` dominates that insertion point. As a
result, canonicalizing the following input:
```
$> mlir-opt --canonicalize input.mlir
func.func @reshape_dim(%arg0: memref<*xf32>, %arg1: memref<?xindex>, %arg2: index) -> index {
%c4 = arith.constant 4 : index
%reshape = memref.reshape %arg0(%arg1) : (memref<*xf32>, memref<?xindex>) -> memref<*xf32>
%0 = arith.muli %arg2, %c4 : index
%dim = memref.dim %reshape, %0 : memref<*xf32>
return %dim : index
}
```
results in:
```
dominator.mlir:22:12: error: operand #1 does not dominate this use
%dim = memref.dim %reshape, %0 : memref<*xf32>
^
dominator.mlir:22:12: note: see current operation: %1 = "memref.load"(%arg1, %2) <{nontemporal = false}> : (memref<?xindex>, index) -> index
dominator.mlir:21:10: note: operand defined here (op in the same block)
%0 = arith.muli %arg2, %c4 : index
```
Properly fixing this issue would require a dominance analysis, which is
too expensive to run within a canonicalization pattern. Instead, this
patch makes the pattern more conservative about the legality conditions
under which the canonicalization is performed.
The more general pattern is also added to `tensor.dim`. Since tensors are
immutable, we don't need to worry about where to introduce the
`tensor.extract` call after canonicalization.
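A sketch of the tensor-side canonicalization (SSA names illustrative):
```mlir
// Before: dim of a reshape result.
%dim = tensor.dim %reshape, %idx : tensor<*xf32>
// After: read the shape operand of the reshape directly; safe anywhere
// because tensors are immutable.
%dim = tensor.extract %shape[%idx] : tensor<?xindex>
```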
Before this change, the `expand_shape` and `collapse_shape` verifiers
failed if the input and output ranks were the same (i.e., no expansion or
collapse takes place). This behavior forced users of these shape ops to
manually verify that they were not creating identity versions every time
they built them, which is problematic. This PR removes that strict
verification and introduces folders for the identity cases.
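For example, an identity reshape like the following (a sketch) is now accepted by the verifier and folds to its operand:
```mlir
// Same rank on both sides; folds to %m.
%0 = memref.collapse_shape %m [[0], [1]] : memref<4x8xf32> into memref<4x8xf32>
```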
The PR also removes the special-case handling of rank-0 tensors for
`expand_shape` and `collapse_shape`; there doesn't seem to be any reason
to treat them differently.
When creating a new block in (conversion) rewrite patterns,
`OpBuilder::createBlock` must be used. Otherwise, no
`notifyBlockInserted` notification is sent to the listener.
Note: The dialect conversion relies on listener notifications to keep
track of IR modifications. Creating blocks without the builder API can
lead to memory leaks during rollback.
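A minimal sketch of the correct usage inside a pattern (op and type names hypothetical):
```cpp
// Using the builder API sends notifyBlockInserted to the listener, which the
// dialect conversion needs in order to roll back the IR on failure.
Region &region = newOp.getRegion();
Block *block = rewriter.createBlock(&region, region.end(),
                                    TypeRange{indexType},
                                    /*argLocs=*/{loc});
// Wrong: `region.push_back(new Block())` bypasses the listener entirely.
```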
The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases.
Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit.
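For illustration, the kind of constraint that is now checked (types hypothetical): the result type must carry the strides implied by the source layout and the subview strides.
```mlir
// Source strides are [8, 1]; with subview strides [1, 1] the result strides
// must also be [8, 1], and the offset is 2 * 8 + 2 * 1 = 18.
%0 = memref.subview %src[2, 2] [4, 4] [1, 1]
    : memref<8x8xf32> to memref<4x4xf32, strided<[8, 1], offset: 18>>
```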
Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
* Fix comment and add additional assert to `ExpandStridedMetadata.cpp`, to make sure that the memref.subview verifier is in sync with the memref.subview -> memref.reinterpret_cast lowering.
Note: This change is identical to #79865, but with a fixed comment and an additional assert in `ExpandStridedMetadata.cpp`. (I reverted #79865 in #80116, but the implementation was actually correct, just the comment in `ExpandStridedMetadata.cpp` was confusing.)
Reverts llvm/llvm-project#79865
I think there is a bug in the stride computation in
`SubViewOp::inferResultType`. (Was already there before this change.)
Reverting this commit for now and updating the original pull request
with a fix and more test cases.
The `memref.subview` verifier currently checks result shape, element
type, memory space and offset of the result type. However, the strides
of the result type are currently not verified. This commit adds
verification of result strides for non-rank reducing ops and fixes
invalid IR in test cases.
Verification of result strides for ops with rank reductions is more
complex (and there could be multiple possible result types). That is
left for a separate commit.
Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped
dimensions, there must be something wrong with the op. Return
`FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the
op has rank reductions or not. Inline the implementation into the
verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
This folded casts into `memref.transpose` without updating the result
type of the transpose op, which resulted in IR that failed to verify for
statically sized memrefs.
For example:
```mlir
%cast = memref.cast %0 : memref<?x4xf32> to memref<?x?xf32>
%transpose = memref.transpose %cast : memref<?x?xf32> to memref<?x?xf32>
```
would fold to:
```mlir
// Fails verification: the source type was updated but the result type was not.
%transpose = memref.transpose %0 : memref<?x4xf32> to memref<?x?xf32>
```
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
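A sketch of the renamed API in use:
```cpp
// Wraps an in-place modification so the listener receives
// notifyOperationModified; required for *any* op, not just the root.
rewriter.modifyOpInPlace(op, [&] { op->setAttr("foo", rewriter.getUnitAttr()); });

// Equivalent explicit form:
rewriter.startOpModification(op);
op->setAttr("foo", rewriter.getUnitAttr());
rewriter.finalizeOpModification(op);
// ...or rewriter.cancelOpModification(op) to abandon the modification.
```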
Currently, the `memref.transpose` verifier forces the result type of the
Op to have an explicit `StridedLayoutAttr` via the method
`inferTransposeResultType`. This means that the example Op
given in the documentation is actually invalid because it uses an `AffineMap`
to specify the layout.
It also means that we can't "un-transpose" a transposed memref back to
the implicit layout form, because the verifier will always enforce the
explicit strided layout.
This patch makes the following changes:
1. The verifier now checks whether the canonicalized strided layout of the
result type is identical to the canonicalized inferred result layout. This
way, the two types only need to have the same strided layout, not
necessarily the same representation of it.
2. The folder is extended to fold away the trivial case of an identity
permutation and to fold one transposition into another by composing the
permutation maps, as sketched below.
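A sketch of the two folds (layouts abbreviated):
```mlir
// (1) Identity permutation folds to the operand:
%0 = memref.transpose %m (d0, d1) -> (d0, d1) : memref<4x8xf32> to memref<4x8xf32>
// (2) transpose(transpose(%m)) folds by composing the permutations:
%1 = memref.transpose %m (d0, d1) -> (d1, d0)
    : memref<4x8xf32> to memref<8x4xf32, strided<[1, 8]>>
%2 = memref.transpose %1 (d0, d1) -> (d1, d0)
    : memref<8x4xf32, strided<[1, 8]>> to memref<4x8xf32>
// Both %0 and %2 fold to %m.
```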
Fixes https://github.com/llvm/llvm-project/issues/71326.
The cause of the issue was that a new `LoadOp` was created which looked
something like:
```mlir
func.func @main(%arg1 : index, %arg2 : index) {
%alloca_0 = memref.alloca() : memref<vector<1x32xi1>>
%1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>>
%2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>>
return
}
```
which crashed inside `LoadOp::verify`. Note that `%alloca_0` is
0-dimensional, `%1` has one dimension, but `memref.load` tries to index
`%1` with two indices.
This is now fixed by using the fact that `unpackOneDim` always unpacks
exactly one dimension, so the `LoadOp` should index only one dimension;
see 1bce61e6b0/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (L897-L903).
---------
Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
This adds an operation for concatenating ranked tensors along a static
dimension, as well as a decomposition mirroring the existing lowering
from TOSA to Tensor. This offers a convergence point for "input"-like
dialects that include various lowerings for concatenation operations,
easing later analysis. In the future, this op can implement the
necessary interfaces for tiling, as well as potentially add conversions
to some kind of linalg and/or memref counterpart.
This patch adds the op, the decomposition, and some basic
folding/canonicalization. Replacing lowerings with the op (such as the
TOSA lowering) will come as a follow up.
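A sketch of the op (shapes illustrative):
```mlir
// Concatenation along static dimension 0; all other dimensions must match.
%0 = tensor.concat dim(0) %a, %b
    : (tensor<3x4xf32>, tensor<5x4xf32>) -> tensor<8x4xf32>
```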
See
https://discourse.llvm.org/t/rfc-tensor-add-a-tensor-concatenate-operation/74858
In #71153, the `memref.subview` canonicalizer crashes due to a negative
`size` being passed as an operand. During `SubViewOp::verify` this
negative `size` is not yet detectable since it is dynamic and only
available after constant folding, which happens during the
canonicalization passes. As discussed in
<https://discourse.llvm.org/t/rfc-more-opfoldresult-and-mixed-indices-in-ops-that-deal-with-shaped-values/72510>,
the verifier should not be extended as it should "only verify local
aspects of an operation".
This patch fixes #71153 by not folding in the aforementioned situation.
Also, this patch adds a basic offset and size check in the
`OffsetSizeAndStrideOpInterface` verifier.
Note: only `offset` and `size` are checked because `stride` is allowed
to be negative
(54d81e49e3).
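A reduced sketch of the problematic fold (names hypothetical):
```mlir
%size = arith.constant -1 : index
// Folding %size into the subview as a static size of -1 would produce an op
// that no longer verifies; the fold is now skipped for negative values.
%sv = memref.subview %m[0] [%size] [1]
    : memref<8xf32> to memref<?xf32, strided<[1]>>
```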
This patch fixes two checks where a `SmallBitVector` containing the
potential dropped dims of a SubView/ExtractSlice operation was queried
via `empty()` instead of `none()`.
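The two methods answer different questions, as this minimal illustration shows:
```cpp
#include "llvm/ADT/SmallBitVector.h"
#include <cassert>

int main() {
  llvm::SmallBitVector droppedDims(4); // tracks 4 dims, no bit set
  assert(!droppedDims.empty());        // empty(): is the vector size 0?
  assert(droppedDims.none());          // none(): is no bit set?
}
```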
The current implementation is not very ergonomic or descriptive: it uses `std::optional<unsigned>`, where `std::nullopt` represents the parent op and the `unsigned` is the region number.
This doesn't give us any useful methods specific to region control flow, and it makes the code fragile because it relies on region numbers.
This patch introduces a new type called `RegionBranchPoint`, replacing all uses of `std::optional<unsigned>` in the interface. It can be implicitly constructed from a region or a `RegionSuccessor`, can be compared with a region, adds `isParent` to check whether we are coming from the parent op, and adds `RegionBranchPoint::parent` as a descriptive way to indicate branching from the parent.
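A sketch of the resulting API (as described above):
```cpp
// Query the regions reachable when entering the op from the outside:
SmallVector<RegionSuccessor> successors;
branchOp.getSuccessorRegions(RegionBranchPoint::parent(), successors);

// Given some RegionBranchPoint `point` (e.g., received in
// getSuccessorRegions), check whether it refers to the parent op:
if (point.isParent()) {
  // Control flow comes from the parent op itself, not from a region.
}
```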
Differential Revision: https://reviews.llvm.org/D159116
`SubViewReturnTypeCanonicalizer` is used by `OpWithOffsetSizesAndStridesConstantArgumentFolder`, which folds dynamic sizes given as constant SSA values into static sizes. The previous implementation crashed when a dynamic size was folded into a static `1` dimension, which was then mistaken for a rank reduction.
Differential Revision: https://reviews.llvm.org/D158721
The `RegionBranchOpInterface` had a few fundamental issues caused by the API design of `getSuccessorRegions`.
It always required passing values for the `operands` parameter. This is problematic as the operands parameter actually changes meaning depending on which predecessor `index` is referring to. If coming from a region, you'd have to find a `RegionBranchTerminatorOpInterface` in that region, get its operand count, and then create a `SmallVector` of that size.
This is not only inconvenient, but also error-prone, and it has led to a bug in the implementation of a previously existing `getSuccessorRegions` overload.
Additionally, this made the method dual-use, trying to serve two different use-cases: 1) Trying to determine possible control flow edges between regions and 2) Trying to determine the region being branched to based on constant operands.
This patch fixes these issues by changing the interface methods and adding new ones:
* The `operands` argument of `getSuccessorRegions` has been removed. The method is now only responsible for returning possible control flow edges between regions.
* An optional `getEntrySuccessorRegions` method has been added. This is used to determine which regions are branched to from the parent op based on constant operands of the parent op. By default, it calls `getSuccessorRegions`. This is analogous to `getSuccessorForOperands` from `BranchOpInterface`.
* Add `getSuccessorRegions` to `RegionBranchTerminatorOpInterface`. This is used to get the possible successors of the terminator based on constant operands. By default, it calls the containing `RegionBranchOpInterface`'s `getSuccessorRegions` method.
* `getSuccessorEntryOperands` was renamed to `getEntrySuccessorOperands` for consistency.
Differential Revision: https://reviews.llvm.org/D157506
Allow authoring `inferReturnTypes` methods with the op adaptor by using `InferTypeOpAdaptor`.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D155115
It also unifies the computation of `StridedLayoutAttr`: if the stride is a
statically known value, we can use it directly.
Differential Revision: https://reviews.llvm.org/D155017
Support extra concrete class declarations and definitions under `NativeTrait` that get injected into the class that specifies the trait. Extra declarations and definitions can be passed in as template arguments for `NativeOpTrait`, `NativeAttrTrait`, and `NativeTypeTrait`.
Usage examples of this feature include:
- Creating a wrapper Trait for authoring inferReturnTypes with the OpAdaptor by specifying necessary Op specific declarations and definitions directly in the trait
- Refactoring the InferTensorType trait
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D154731
* Add `memref::getMixedSize` (same as in the tensor dialect); see the sketch below.
* Simplify in-bounds check in `VectorTransferSplitRewritePatterns.cpp` and fix off-by-one error in the static in-bounds check.
* Use "memref::DimOp" instead of `createOrFoldDimOp` when possible.
Differential Revision: https://reviews.llvm.org/D154218
`RewriterBase::Listener::notifyOperationReplaced` notifies observers that an op is about to be replaced with a range of values. This notification is not very useful for ops without results, because it does not specify the replacement op (and it cannot be deduced from the replacement values). It provides no additional information over the `notifyOperationRemoved` notification.
This revision adds an additional notification when a rewriter replaces an op with another op. By default, this notification triggers the original "op replaced with values" notification, so there is no functional change for existing code.
This new API is useful for the transform dialect, which needs to track op replacements. (Updated in a subsequent revision.)
Also includes minor documentation improvements.
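A sketch of a listener observing the new notification:
```cpp
struct MyListener : public RewriterBase::Listener {
  // New: an op was replaced with another op.
  void notifyOperationReplaced(Operation *op, Operation *replacement) override {
    // e.g., record the op -> replacement mapping for the transform dialect.
  }
  // Existing: an op was replaced with a range of values.
  void notifyOperationReplaced(Operation *op, ValueRange replacement) override {}
};
```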
Differential Revision: https://reviews.llvm.org/D152814
The MLIR classes Type/Attribute/Operation/Op/Value support
`cast`/`dyn_cast`/`isa`/`dyn_cast_or_null` functionality through LLVM's
`doCast` machinery, in addition to defining methods with the same names.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
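For illustration, the mechanical change looks like this:
```cpp
Type t = value.getType();
// Deprecated member-function style:
auto memrefTy = t.cast<MemRefType>();
bool shaped = t.isa<ShapedType>();
// Preferred free-function style:
auto memrefTy2 = llvm::cast<MemRefType>(t);
bool shaped2 = llvm::isa<ShapedType>(t);
```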
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D151542
The MLIR classes Type/Attribute/Operation/Op/Value support
`cast`/`dyn_cast`/`isa`/`dyn_cast_or_null` functionality through LLVM's
`doCast` machinery, in addition to defining methods with the same names.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
* https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
* Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This follows a previous patch that updated calls from `op.cast<T>()` to
`cast<T>(op)`. However, some cases could not handle an unprefixed `cast`
call due to occurrences of variables named `cast`, or due to uses inside
class definitions that would resolve to the method.
All C++ files that did not work automatically with `cast<T>()` are
updated here to `llvm::cast` and similar with the intention that they
can be easily updated after the methods are removed through a
find-replace.
See https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
for the clang-tidy check that was used; printed occurrences of the
functions were then updated to include the `llvm::` prefix.
One can then run the following:
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-export-fixes /tmp/cast/casts.yaml mlir/*\
-header-filter=mlir/ -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D150348
A new forwarding constructor is introduced on the adaptor that directly
takes an `OpaqueProperties` object and performs the nullptr checking and
casting, avoiding boilerplate at call sites.
Differential Revision: https://reviews.llvm.org/D150003
This new feature enables dedicating custom storage inline within operations.
This storage can be used as an alternative to attributes to store data that is
specific to an operation. Attributes can also be stored inside the properties
storage if desired, but any kind of data can be present as well. This offers
a way to store and mutate data without uniquing in the MLIRContext, unlike
attributes.
See `OpPropertiesTest.cpp` for an example where a struct with a
`std::vector<>` is attached to an operation and mutated in place:

    struct TestProperties {
      int a = -1;
      float b = -1.;
      std::vector<int64_t> array = {-33};
    };
More complex schemes (including reference counting) are also possible.
The only constraint to enable storing a C++ object as "properties" on an
operation is to implement three functions:
- convert from the candidate object to an Attribute
- convert from the Attribute to the candidate object
- hash the object
Optionally, parsing and printing can also be customized with two extra
functions.
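A sketch of what these hooks could look like for `TestProperties` (signatures illustrative, not the exact interface):
```cpp
// Convert the properties into an attribute, e.g. for generic printing.
Attribute convertToAttribute(MLIRContext *ctx, const TestProperties &props);
// Rebuild the properties from an attribute, diagnosing malformed input.
LogicalResult convertFromAttribute(TestProperties &props, Attribute attr,
                                   InFlightDiagnostic *diagnostic);
// Hash the properties, used for operation equivalence.
llvm::hash_code computeHash(const TestProperties &props);
```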
A new option is introduced in ODS to allow dialects to specify:

    let usePropertiesForAttributes = 1;
When set to true, the inherent attributes for all ops in the dialect are
stored in properties instead of alongside the discardable attributes.
The TestDialect showcases this feature.
Another change is that we introduce new APIs on the Operation class
to access the inherent attributes separately from the discardable ones.
We envision deprecating and removing `getAttr()`, `getAttrsDictionary()`,
and other similar methods that don't make the distinction explicit, leading
to an entirely separate namespace for discardable attributes.
Recommit d572cd1b06 after fixing python bindings build.
Differential Revision: https://reviews.llvm.org/D141742
This new feature enables dedicating custom storage inline within operations.
This storage can be used as an alternative to attributes to store data that is
specific to an operation. Attributes can also be stored inside the properties
storage if desired, but any kind of data can be present as well. This offers
a way to store and mutate data without uniquing in the MLIRContext, unlike
attributes.
See `OpPropertiesTest.cpp` for an example where a struct with a
`std::vector<>` is attached to an operation and mutated in place:

    struct TestProperties {
      int a = -1;
      float b = -1.;
      std::vector<int64_t> array = {-33};
    };
More complex schemes (including reference counting) are also possible.
The only constraint to enable storing a C++ object as "properties" on an
operation is to implement three functions:
- convert from the candidate object to an Attribute
- convert from the Attribute to the candidate object
- hash the object
Optionally, parsing and printing can also be customized with two extra
functions.
A new option is introduced in ODS to allow dialects to specify:

    let usePropertiesForAttributes = 1;
When set to true, the inherent attributes for all ops in the dialect are
stored in properties instead of alongside the discardable attributes.
The TestDialect showcases this feature.
Another change is that we introduce new APIs on the Operation class
to access the inherent attributes separately from the discardable ones.
We envision deprecating and removing `getAttr()`, `getAttrsDictionary()`,
and other similar methods that don't make the distinction explicit, leading
to an entirely separate namespace for discardable attributes.
Differential Revision: https://reviews.llvm.org/D141742
This adds `arith::ConstantOp::materialize`, which builds a constant from
an attribute and type only if it would result in a valid op. This is
useful for dialect `materializeConstant` hooks, and allows for removing
the previous `Attribute, Type` builder which was only used during
materialization.
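A sketch of a `materializeConstant` hook using it (dialect name hypothetical):
```cpp
// Returns a valid op, or nullptr if (value, type) would not form one, so the
// hook never has to build an op that fails verification.
Operation *MyDialect::materializeConstant(OpBuilder &builder, Attribute value,
                                          Type type, Location loc) {
  return arith::ConstantOp::materialize(builder, value, type, loc);
}
```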
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D148491
The error message displayed when the offset in the `MemRefType` of the
op's result is unexpected was the wrong way around, mixing up the
expected and actual offsets.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D148009
Enumerating over multiple ranges does not work by a mere composition of
`enumerate` and `zip_equal`, because C++17 does not allow recursive
expansion of structured bindings.
This implementation uses `zippy` to manage the iteratees and adds the
stream of indices as the first zipped range. Because we have an upfront
assertion that all input ranges are of the same length, we only need to
check if the second range has ended during iteration.
As a consequence of using `zippy`, `enumerate` will now follow the
reference and lifetime semantics of the `zip*` family of functions. The
main difference is that `enumerate` exposes each tuple of references
through a new tuple-like type `enumerate_result`, with the familiar
`.index()` and `.value()` member functions.
Because the `enumerate_result` returned on dereference is a
temporary, the enumeration result can no longer be bound through an
lvalue reference.
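A sketch of the resulting usage (ranges illustrative):
```cpp
#include "llvm/ADT/STLExtras.h"
#include <vector>

void example() {
  std::vector<int> vals = {10, 20, 30};
  std::vector<char> tags = {'a', 'b', 'c'};
  // enumerate_result is tuple-like, so structured bindings work; an upfront
  // assertion checks that all ranges have the same length.
  for (auto [index, val, tag] : llvm::enumerate(vals, tags)) {
    (void)index; (void)val; (void)tag;
  }
}
```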
Reviewed By: dblaikie, zero9178
Differential Revision: https://reviews.llvm.org/D144503
* `RewriterBase::mergeBlocks` is simplified: it is implemented in terms of `mergeBlockBefore`.
* The signature of `mergeBlockBefore` is consistent with other API (such as `inlineRegionBefore`): an overload for a `Block::iterator` is added.
* Additional safety checks are added to `mergeBlockBefore`: detect cases where the resulting IR could be invalid (no more `dropAllUses`) or partly unreachable (likely a case of incorrect API usage).
* Rename `mergeBlockBefore` to `inlineBlockBefore` (usage sketched below).
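A sketch of the renamed API (block, op, and value names hypothetical):
```cpp
// Inline all ops of `source` before `op`, replacing the block's arguments
// with `replacementValues`; `source` is left empty and can be erased.
rewriter.inlineBlockBefore(source, op, /*argValues=*/replacementValues);
```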
Differential Revision: https://reviews.llvm.org/D144969
Although specifying an index that is out of bounds for both `memref.dim`
and `tensor.dim` produces undefined behavior, it is still valid IR.
In particular, optimizations can expose an out-of-bounds index, as
demonstrated in https://github.com/llvm/llvm-project/issues/60295, and
this shouldn't cause the compiler to abort.
This patch removes the overzealous verifier checks and properly handles
out-of-bounds indices (as in, the compiler no longer crashes, but the IR
still has UB).
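For example, the following is valid IR that the verifier now accepts (a sketch):
```mlir
// Index 5 is out of bounds for a rank-2 memref: UB at runtime, but the
// compiler must not abort.
%c5 = arith.constant 5 : index
%d = memref.dim %m, %c5 : memref<2x3xf32>
```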
This fixes https://github.com/llvm/llvm-project/issues/60295.
Note that `shape.dim` has a similar problem, but we're not supposed to
produce UB in that case. Instead, we're supposed to propagate an error in
the resulting value, and I don't know how to do that at the moment; hence
I left that part out of this patch.
Differential Revision: https://reviews.llvm.org/D143999
Address space casts are present in common MLIR targets (LLVM, SPIRV).
Some planned rewrites (such as one of the potential fixes to the fact
that the AMDGPU backend requires alloca() to live in address space 5 /
the GPU private memory space) may require such casts to be inserted
into MLIR code, where those address spaces could be represented by
arbitrary memory space attributes.
Therefore, we define memref.memory_space_cast and its lowerings.
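A sketch of the new op (memory spaces illustrative):
```mlir
// Cast from the GPU private memory space (5) to the default memory space.
%0 = memref.memory_space_cast %alloca : memref<4xf32, 5> to memref<4xf32>
```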
Depends on D141293
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D141148