This commit relaxes the assumption of type consistency for LLVM dialect
load and store operations in SROA. Instead, there is now a check that
loads and stores are in the bounds specified by the sub-slot they
access.
This commit additionally removes the corresponding patterns from the
type consistency pass, as they are no longer necessary.
Note: It will be necessary to extend Mem2Reg with the logic for
differently sized accesses as well. This is non-the-less a strict
upgrade for productive flows, as the type consistency pass can produce
invalid IR for some odd cases.
This commit expends the Mem2Reg and SROA interface methods with passed
in handles to a `DataLayout` structure. This is done to avoid
superfluous retreiving of data layouts during each conversion of
intrinsics.
This change, additionally, enables subsequent changes to make the LLVM
dialect implementation of these interfaces type agnostic.
This PR adds promised interface declarations for all interfaces declared
in `InitAllDialects.h`.
Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
The current canonicalization of `memref.dim` operating on the result of
`memref.reshape` into `memref.load` is incorrect as it doesn't check
whether the `index` operand of `memref.dim` dominates the source
`memref.reshape` op. It always introduces `memref.load` right after
`memref.reshape` to ensure the `memref` is not mutated before the
`memref.load` call. As a result, the following error is observed:
```
$> mlir-opt --canonicalize input.mlir
func.func @reshape_dim(%arg0: memref<*xf32>, %arg1: memref<?xindex>, %arg2: index) -> index {
%c4 = arith.constant 4 : index
%reshape = memref.reshape %arg0(%arg1) : (memref<*xf32>, memref<?xindex>) -> memref<*xf32>
%0 = arith.muli %arg2, %c4 : index
%dim = memref.dim %reshape, %0 : memref<*xf32>
return %dim : index
}
```
results in:
```
dominator.mlir:22:12: error: operand #1 does not dominate this use
%dim = memref.dim %reshape, %0 : memref<*xf32>
^
dominator.mlir:22:12: note: see current operation: %1 = "memref.load"(%arg1, %2) <{nontemporal = false}> : (memref<?xindex>, index) -> index
dominator.mlir:21:10: note: operand defined here (op in the same block)
%0 = arith.muli %arg2, %c4 : index
```
Properly fixing this issue requires a dominator analysis which is
expensive to run within a canonicalization pattern. So, this patch fixes
the canonicalization pattern by being more strict/conservative about the
legality condition in which we perform this canonicalization.
The more general pattern is also added to `tensor.dim`. Since tensors are
immutable we don't need to worry about where to introduce the
`tensor.extract` call after canonicalization.
Before: op verifiers failed if the input and output ranks were the same
(i.e. no expansion or collapse). This behavior requires users of these
shape ops to verify manually that they are not creating identity
versions of these ops every time they build them -- problematic. This PR
removes this strict verification, and introduces folders for the the
identity cases.
The PR also removes the special case handling of rank-0 tensors for
expand_shape and collapse_shape, there doesn't seem to be any reason to
treat them differently.
When creating a new block in (conversion) rewrite patterns,
`OpBuilder::createBlock` must be used. Otherwise, no
`notifyBlockInserted` notification is sent to the listener.
Note: The dialect conversion relies on listener notifications to keep
track of IR modifications. Creating blocks without the builder API can
lead to memory leaks during rollback.
The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases.
Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit.
Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
* Fix comment and add additional assert to `ExpandStridedMetadata.cpp`, to make sure that the memref.subview verifier is in sync with the memref.subview -> memref.reinterpret_cast lowering.
Note: This change is identical to #79865, but with a fixed comment and an additional assert in `ExpandStridedMetadata.cpp`. (I reverted #79865 in #80116, but the implementation was actually correct, just the comment in `ExpandStridedMetadata.cpp` was confusing.)
Reverts llvm/llvm-project#79865
I think there is a bug in the stride computation in
`SubViewOp::inferResultType`. (Was already there before this change.)
Reverting this commit for now and updating the original pull request
with a fix and more test cases.
The `memref.subview` verifier currently checks result shape, element
type, memory space and offset of the result type. However, the strides
of the result type are currently not verified. This commit adds
verification of result strides for non-rank reducing ops and fixes
invalid IR in test cases.
Verification of result strides for ops with rank reductions is more
complex (and there could be multiple possible result types). That is
left for a separate commit.
Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped
dimensions, there must be something wrong with the op. Return
`FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the
op has rank reductions or not. Inline the implementation into the
verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
This folded casts into `memref.transpose` without updating the result
type of the transpose op, which resulted in IR that failed to verify for
statically sized memrefs.
i.e.
```mlir
%cast = memref.cast %0 : memref<?x4xf32> to memref<?x?xf32>
%transpose = memref.transpose %cast : memref<?x?xf32> to memref<?x?xf32>
```
would fold to:
```mlir
// Fails verification:
%transpose = memref.transpose %cast : memref<?x4xf32> to memref<?x?xf32>
```
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
Currently, the `memref.transpose` verifier forces the result type of the
Op to have an explicit `StridedLayoutAttr` via the method
`inferTransposeResultType`. This means that the example Op
given in the documentation is actually invalid because it uses an `AffineMap`
to specify the layout.
It also means that we can't "un-transpose" a transposed memref back to
the implicit layout form, because the verifier will always enforce the
explicit strided layout.
This patch makes the following changes:
1. The verifier checks whether the canonicalized strided layout of the
result Type is identitcal to the canonicalized infered result type
layout. This way, it's only important that the two Types have the same
strided layout, not necessarily the same representation of it.
2. The folder is extended to support folding away the trivial case of
identity permutation and to fold one transposition into another by
composing the permutation maps.
This PR adds promised interface declarations for
`ConvertToLLVMPatternInterface` in all the dialects that support the
`ConvertToLLVM` dialect extension.
Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
Fixes https://github.com/llvm/llvm-project/issues/71326.
The cause of the issue was that a new `LoadOp` was created which looked
something like:
```mlir
%arg4 =
func.func main(%arg1 : index, %arg2 : index) {
%alloca_0 = memref.alloca() : memref<vector<1x32xi1>>
%1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>>
%2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>>
return
}
```
which crashed inside the `LoadOp::verify`. Note here that `%alloca_0` is
0 dimensional, `%1` has one dimension, but `memref.load` tries to index
`%1` with two indices.
This is now fixed by using the fact that `unpackOneDim` always unpacks
one dim
1bce61e6b0/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (L897-L903)
and so the `loadOp` should just index only one dimension.
---------
Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
This adds an operation for concatenating ranked tensors along a static
dimension, as well as a decomposition mirroring the existing lowering
from TOSA to Tensor. This offers a convergence point for "input" like
dialects that include various lowerings for concatenation operations,
easing later analysis. In the future, this op can implement the
necessary interfaces for tiling, as well as potentially add conversions
to some kind of linalg and/or memref counterpart.
This patch adds the op, the decomposition, and some basic
folding/canonicalization. Replacing lowerings with the op (such as the
TOSA lowering) will come as a follow up.
See
https://discourse.llvm.org/t/rfc-tensor-add-a-tensor-concatenate-operation/74858
In #71153, the `memref.subview` canonicalizer crashes due to a negative
`size` being passed as an operand. During `SubViewOp::verify` this
negative `size` is not yet detectable since it is dynamic and only
available after constant folding, which happens during the
canonicalization passes. As discussed in
<https://discourse.llvm.org/t/rfc-more-opfoldresult-and-mixed-indices-in-ops-that-deal-with-shaped-values/72510>,
the verifier should not be extended as it should "only verify local
aspects of an operation".
This patch fixes#71153 by not folding in aforementioned situation.
Also, this patch adds a basic offset and size check in the
`OffsetSizeAndStrideOpInterface` verifier.
Note: only `offset` and `size` are checked because `stride` is allowed
to be negative
(54d81e49e3).
This patch fixes two checks where a `SmallBitVector` containing the
potential dropped dims of a SubView/ExtractSlice operation was queried
via `empty()` instead of `none()`.
Fixes#70902.
The out of bounds check in the SROA implementation for MemRef was not
actually testing anything because it only operated on a store op which
does not trigger the logic by itself. It is now checked for real and the
underlying bug is fixed.
I checked the LLVM implementation just in case but this should not
happen as out-of-bound checks happen in GEP's verifier there.
The current implementation is not very ergonomic or descriptive: It uses `std::optional<unsigned>` where `std::nullopt` represents the parent op and `unsigned` is the region number.
This doesn't give us any useful methods specific to region control flow and makes the code fragile to changes due to now taking the region number into account.
This patch introduces a new type called `RegionBranchPoint`, replacing all uses of `std::optional<unsigned>` in the interface. It can be implicitly constructed from a region or a `RegionSuccessor`, can be compared with a region to check whether the branch point is branching from the parent, adds `isParent` to check whether we are coming from a parent op and adds `RegionSuccessor::parent` as a descriptive way to indicate branching from the parent.
Differential Revision: https://reviews.llvm.org/D159116
`SubViewReturnTypeCanonicalizer` is used by `OpWithOffsetSizesAndStridesConstantArgumentFolder`, which folds constant SSA value (dynamic) sizes into static sizes. The previous implementation crashed when a dynamic size was folded into a static `1` dimension, which was then mistaken as a rank reduction.
Differential Revision: https://reviews.llvm.org/D158721
The `RegionBranchOpInterface` had a few fundamental issues caused by the API design of `getSuccessorRegions`.
It always required passing values for the `operands` parameter. This is problematic as the operands parameter actually changes meaning depending on which predecessor `index` is referring to. If coming from a region, you'd have to find a `RegionBranchTerminatorOpInterface` in that region, get its operand count, and then create a `SmallVector` of that size.
This is not only inconvenient, but also error-prone, which has lead to a bug in the implementation of a previously existing `getSuccessorRegions` overload.
Additionally, this made the method dual-use, trying to serve two different use-cases: 1) Trying to determine possible control flow edges between regions and 2) Trying to determine the region being branched to based on constant operands.
This patch fixes these issues by changing the interface methods and adding new ones:
* The `operands` argument of `getSuccessorRegions` has been removed. The method is now only responsible for returning possible control flow edges between regions.
* An optional `getEntrySuccessorRegions` method has been added. This is used to determine which regions are branched to from the parent op based on constant operands of the parent op. By default, it calls `getSuccessorRegions`. This is analogous to `getSuccessorForOperands` from `BranchOpInterface`.
* Add `getSuccessorRegions` to `RegionBranchTerminatorOpInterface`. This is used to get the possible successors of the terminator based on constant operands. By default, it calls the containing `RegionBranchOpInterface`s `getSuccessorRegions` method.
* `getSuccessorEntryOperands` was renamed to `getEntrySuccessorOperands` for consistency.
Differential Revision: https://reviews.llvm.org/D157506
Author inferReturnTypes methods with the Op Adaptor by using the InferTypeOpAdaptor.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D155115
It also unifies the computation of StridedLayoutAttr. If the stride is
static known value, we can just use it.
Differential Revision: https://reviews.llvm.org/D155017
Support extra concrete class declarations and definitions under NativeTrait that get injected into the class that specifies the trait. Extra declarations and definitions can be passed in as template arguments for NativeOpTraitNativeAttrTrait and NativeTypeTrait.
Usage examples of this feature include:
- Creating a wrapper Trait for authoring inferReturnTypes with the OpAdaptor by specifying necessary Op specific declarations and definitions directly in the trait
- Refactoring the InferTensorType trait
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D154731
This diff makes findDealloc return nullopt if the list of users contains "realloc".
Fixes https://github.com/llvm/llvm-project/issues/60726
(In particular, in SimplifyClones (uses findDealloc) treating "realloc" as "dealloc"
breaks the assumption that "dealloc" has no users)
Test plan: ninja check-mlir check-all
Differential revision: https://reviews.llvm.org/D154892
* Add `memref::getMixedSize` (same as in the tensor dialect).
* Simplify in-bounds check in `VectorTransferSplitRewritePatterns.cpp` and fix off-by-one error in the static in-bounds check.
* Use "memref::DimOp" instead of `createOrFoldDimOp` when possible.
Differential Revision: https://reviews.llvm.org/D154218
This revision introduces support for memset intrinsics in SROA and
mem2reg for the LLVM dialect. This is achieved for SROA by breaking
memsets of aggregates into multiple memsets of scalars, and for mem2reg
by promoting memsets of single integer slots into the value the memset
operation would yield.
The SROA logic supports breaking memsets of static size operating at the
start of a memory slot. The intended most common case is for memsets
covering the entirety of a struct, most often as a way to initialize it
to 0.
The mem2reg logic supports dynamic values and static sizes as input to
promotable memsets. This is achieved by lowering memsets into
`ceil(log_2(n))` LeftShift operations, `ceil(log_2(n))` Or operations
and up to one ZExt operation (for n the byte width of the integer),
computing in registers the integer value the memset would create. Only
byte-aligned integers are supported, more types could easily be added
afterwards.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D152367
`RewriterBase::Listener::notifyOperationReplaced` notifies observers that an op is about to be replaced with a range of values. This notification is not very useful for ops without results, because it does not specify the replacement op (and it cannot be deduced from the replacement values). It provides no additional information over the `notifyOperationRemoved` notification.
This revision adds an additional notification when a rewriter replaces an op with another op. By default, this notification triggers the original "op replaced with values" notification, so there is no functional change for existing code.
This new API is useful for the transform dialect, which needs to track op replacements. (Updated in a subsequent revision.)
Also includes minor documentation improvements.
Differential Revision: https://reviews.llvm.org/D152814
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D151542
This patch implements SROA interfaces for MemRef, up to a given fixed
size.
Reviewed By: gysit, Dinistro
Differential Revision: https://reviews.llvm.org/D151102
This revision modifies the mem2reg interfaces and algorithm to be more
omfortable to use as a pattern. The motivation behind this is that
currently the pattern needs to be applied to the scope op of the region
in which allocators should be promoted. However, a more natural way to
apply the pattern would be to apply it on the allocator directly. This
is not only clearer but easier to parallelize.
This revision changes the mem2reg pattern to operate this way. This
required restraining the interfaces to only mutate IR using
RewriterBase, as the previously used escape hatch is not granular enough
to match on the region that is modified only. This has the unfortunate
cost of preventing batching allocator promotion and making the block
argument adding logic more complex. Because batching no longer made any
sense, I made the internal analyzer/promoter decoupling private again.
This also adds statistics to the mem2reg infrastructure.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D150432
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
* https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
* Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This follows a previous patch that updated calls
`op.cast<T>()-> cast<T>(op)`. However some cases could not handle an
unprefixed `cast` call due to occurrences of variables named cast, or
occurring inside of class definitions which would resolve to the method.
All C++ files that did not work automatically with `cast<T>()` are
updated here to `llvm::cast` and similar with the intention that they
can be easily updated after the methods are removed through a
find-replace.
See https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
for the clang-tidy check that is used and then update printed
occurrences of the function to include `llvm::` before.
One can then run the following:
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-export-fixes /tmp/cast/casts.yaml mlir/*\
-header-filter=mlir/ -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D150348
This patch refactors the Mem2Reg infrastructure. It decouples
analysis from promotion, allowing for more control over the execution of
the logic. It also adjusts the interfaces to be less coupled to mem2reg
and more general. This will be useful for an upcoming revision
introducing generic SROA.
This commit reverts f333977eb2 and relands 91cff8a718.
The original commit was reverted accidentally due to a misinterpretation
of a bazel build bot failure.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D149825