This commit simplifies the implementation of the
`ValueBoundsOpInterface` for `scf.for` based on the newly added
`ValueBoundsConstraintSet::compare` API and adds additional
documentation.
Previously, the interface implementation created a new constraint set
just to check if the yielded value and iter_arg are equal. This was
inefficient because constraints were added multiple times (to two
different constraint sets) for ops that are inside the loop.
Note: This is a re-upload of #86239.
This commit adds support for `scf.if` to `ValueBoundsConstraintSet`.
Example:
```
%0 = scf.if ... -> index {
scf.yield %a : index
} else {
scf.yield %b : index
}
```
The following constraints hold for %0:
* %0 >= min(%a, %b)
* %0 <= max(%a, %b)
Such constraints cannot be added to the constraint set; min/max is not
supported by `IntegerRelation`. However, if we know which one of %a and
%b is larger, we can add constraints for %0. E.g., if %a <= %b:
* %0 >= %a
* %0 <= %b
This commit required a few minor changes to the
`ValueBoundsConstraintSet` infrastructure, so that values can be
compared while we are still in the process of traversing the IR/adding
constraints.
Note: This is a re-upload of #85895, which was reverted. The bug that
caused the failure was fixed in #87859.
This commit simplifies the implementation of the
`ValueBoundsOpInterface` for `scf.for` based on the newly added
`ValueBoundsConstraintSet::compare` API and adds additional
documentation.
Previously, the interface implementation created a new constraint set
just to check if the yielded value and iter_arg are equal. This was
inefficient because constraints were added multiple times (to two
different constraint sets) for ops that are inside the loop.
This commit adds support for `scf.if` to `ValueBoundsConstraintSet`.
Example:
```
%0 = scf.if ... -> index {
scf.yield %a : index
} else {
scf.yield %b : index
}
```
The following constraints hold for %0:
* %0 >= min(%a, %b)
* %0 <= max(%a, %b)
Such constraints cannot be added to the constraint set; min/max is not
supported by `IntegerRelation`. However, if we know which one of %a and
%b is larger, we can add constraints for %0. E.g., if %a <= %b:
* %0 >= %a
* %0 <= %b
This commit required a few minor changes to the
`ValueBoundsConstraintSet` infrastructure, so that values can be
compared while we are still in the process of traversing the IR/adding
constraints.
This commit changes the API of `ValueBoundsConstraintSet`: the stop
condition is now passed to the constructor instead of `processWorklist`.
That makes it easier to add items to the worklist multiple times and
process them in a consistent manner. The current
`ValueBoundsConstraintSet` is passed as a reference to the stop
function, so that the stop function can be defined before the the
`ValueBoundsConstraintSet` is constructed.
This change is in preparation of adding support for branches.
If `before` block args are directly forwarded to `scf.condition` make
sure they are passed in the same order.
This is needed for `scf.while` uplifting
https://github.com/llvm/llvm-project/pull/76108
This PR adds promised interface declarations for all interfaces declared
in `InitAllDialects.h`.
Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
This commit fixes memory leaks that were introduced by #81759. The way
ops and blocks are erased changed slightly.
The leaks were caused by an incorrect implementation of op builders:
blocks must be created with the supplied builder object. Otherwise, they
are not properly tracked by the dialect conversion and can leak during
rollback.
Using `LoopLikeOpInterface` as the basis for the implementation unifies
all the tiling logic for both `scf.for` and `scf.forall`. The only
difference is the actual loop generation. This is a follow up to
https://github.com/llvm/llvm-project/pull/72178
Instead of many entry points for each loop type, the loop type is now
passed as part of the options passed to the tiling method.
This is a breaking change with the following changes
1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF`
2) The `scf::tileUsingSCFForallOp` is deprecated. The same
functionality is obtained by using `scf::tileUsingSCF` and setting
the loop type in `scf::SCFTilingOptions` passed into this method to
`scf::SCFTilingOptions::LoopType::ForallOp` (using the
`setLoopType` method).
3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is
renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of
the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing
any strategy with the default callback implemeting the greedy fusion.
4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use
`SmallVector<LoopLikeOpInterface>`.
5) To make `scf::ForallOp` implement the parts of
`LoopLikeOpInterface` needed, the `getOutputBlockArguments()`
method is replaced with `getRegionIterArgs()`
These changes now bring the tiling and fusion capabilities using
`scf.forall` on par with what was already supported by `scf.for`
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
An op verifier should verify only local properties. This commit removes
the verification of `scf.for` step sizes. (Verifiers can check
attributes but should not follow SSA values.) This verification could
reject IR that is actually valid, e.g.:
```mlir
scf.if %always_false {
// Branch is never entered.
scf.for ... step %c0 { ... }
}
```
This commit fixes `for-loop-peeling.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`:
```
within split at llvm-project/mlir/test/Dialect/SCF/for-loop-peeling.mlir:293 offset :9:3: note: see current operation:
"scf.for"(%0, %3, %2) ({
^bb0(%arg1: index):
%4 = "arith.index_cast"(%arg1) : (index) -> i64
"memref.store"(%4, %arg0) : (i64, memref<i64>) -> ()
"scf.yield"() : () -> ()
}) {__peeled_loop__} : (index, index, index) -> ()
LLVM ERROR: IR failed to verify after folding
```
Note: `%2` is `arith.constant 0 : index`.
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.
Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
-> f32, f32 {
%elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
%elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.addf %lhs, %rhs : f32
scf.reduce.return %res : f32
}, {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.mulf %lhs, %rhs : f32
scf.reduce.return %res : f32
}
}
```
`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
The partial bufferization framework has been replaced with One-Shot
Bufferize. SCF-specific canonicalization patterns for
`to_memref`/`to_tensor` are no longer needed.
`ParallelOpSingleOrZeroIterationDimsFolder` used to produce invalid IR:
```
within split at mlir/test/Dialect/SCF/canonicalize.mlir:1 offset :11:3: error: 'scf.parallel' op expects region #0 to have 0 or 1 blocks
scf.parallel (%i0, %i1, %i2) = (%c0, %c3, %c7) to (%c1, %c6, %c10) step (%c1, %c2, %c3) {
^
within split at mlir/test/Dialect/SCF/canonicalize.mlir:1 offset :11:3: note: see current operation:
"scf.parallel"(%4, %5, %3) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({
^bb0(%arg1: index):
"memref.store"(%0, %arg0, %1, %arg1, %6) : (i32, memref<?x?x?xi32>, index, index, index) -> ()
"scf.yield"() : () -> ()
^bb1(%8: index): // no predecessors
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
```
Together with #74551, this commit fixes
`mlir/test/Dialect/SCF/canonicalize.mlir` when verifying the IR after
each pattern application (#74270).
MLIR can't really be const-correct (it would need a `ConstValue` class
alongside the `Value` class really, like `ArrayRef` and
`MutableArrayRef`). This is however making is more consistent: method
that are directly modifying the Value shouldn't be marked const.
Expose loop results, which correspond to the region iter_arg values that
are returned from the loop when there are no more iterations. Exposing
loop results is optional because some loops (e.g., `scf.while`) do not
have a 1-to-1 mapping between region iter_args and op results.
Also add additional helper functions to query tied
results/iter_args/inits.
The `LoopLikeOpInterface` already has interface methods to query inits
and iter_args. This commit adds helper functions to query tied
init/iter_arg pairs and removes the corresponding functions for
`scf::ForOp`.
Expose a `MutableArrayRef<OpOperand>` instead of
`ValueRange`/`OperandRange`. This allows users of this interface to
change the yielded values and the init values. The names of the
interface methods are the same as the auto-generated op accessor names
(`get...()` returns `OperandRange`, `get...Mutable()` returns
`MutableOperandRange`).
Note: The interface methods return a `MutableArrayRef` instead of a
`MutableOperandRange` because a loop op may not implement
`getYieldedValuesMutable` etc. and there is no safe way to return an
"empty" range with a `MutableOperandRange`.
This adds implementations for `getSingleIterationVar`,
`getSingleLowerBound`, `getSingleUpperBound`, `getSingleStep` of
`LoopLikeOpInterface` to `scf::ParallelOp`. Until now, the
implementations for these methods defaulted to returning `std::nullopt`,
even in the special case where the parallel Op only has one dimension.
Related: https://github.com/llvm/llvm-project/pull/67883
The `getSingle(IterationVar|UpperBound|LowerBound|Step)` methods of
`LoopLikeOpInterface` are useful to quickly query the iteration space of
unidimensional loops. Until now, `scf::ForallOp` always fell back to the
default implementation of these methods, returning `std::nullopt`.
This patch implements those methods, returning the respective bounds
or steps in the special case of `rank == 1`.
Add a new interface method that returns the yielded values.
Also add a verifier that checks the number of inits/iter_args/yielded
values. Most of the checked invariants (but not all of them) are already
covered by the `RegionBranchOpInterface`, but the `LoopLikeOpInterface`
now provides (additional) error messages that are easier to read.
`affine::replaceForOpWithNewYields` and `replaceLoopWithNewYields` (for
"scf.for") are now interface methods and additional loop-carried
variables can now be added to "scf.for"/"affine.for" uniformly. (No more
`TypeSwitch` needed.)
Note: `scf.while` and other loops with loop-carried variables can
implement `replaceWithAdditionalYields`, but to keep this commit small,
that is not done in this commit.
This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.
`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.
Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false"
for some values that are defined outside of the loop (in a nested op, in
such a way that the value does not dominate the loop). This interface is
currently only used for LICM and there is no way to trigger this bug, so
no test is added.
In line with #66515, change `MutableArrayRange::begin`/`end` to
enumerate `OpOperand &` instead of `Value`. Also remove
`ForOp::getIterOpOperands`/`setIterArg`, which are now redundant.
Note: `MutableOperandRange` cannot be made a derived class of
`indexed_accessor_range_base` (like `OperandRange`), because
`MutableOperandRange::assign` can change the number of operands in the
range.
* Always use the auto-generated `getInitArgs` function. Remove the
hand-written `getInitOperands` duplicate.
* Remove `hasIterOperands` and `getNumIterOperands`. The names were
inconsistent because the "arg" is called `initArgs` in TableGen. Use
`getInitArgs().size()` instead.
* Fix verification around ops with no results.
Functions are always callable operations and thus every operation
implementing the `FunctionOpInterface` also implements the
`CallableOpInterface`. The only exception was the FuncOp in the toy
example. To make implementation of the `FunctionOpInterface` easier,
this commit lets `FunctionOpInterface` inherit from
`CallableOpInterface` and merges some of their methods. More precisely,
the `CallableOpInterface` has methods to get the argument and result
attributes and a method to get the result types of the callable region.
These methods are always implemented the same way as their analogues in
`FunctionOpInterface` and thus this commit moves all the argument and
result attribute handling methods to the callable interface as well as
the methods to get the argument and result types. The
`FuntionOpInterface` then does not have to declare them as well, but
just inherits them from the `CallableOpInterface`.
Adding the inheritance relation also required to move the
`FunctionOpInterface` from the IR directory to the Interfaces directory
since IR should not depend on Interfaces.
Reviewed By: jpienaar, springerm
Differential Revision: https://reviews.llvm.org/D157988
The current implementation is not very ergonomic or descriptive: It uses `std::optional<unsigned>` where `std::nullopt` represents the parent op and `unsigned` is the region number.
This doesn't give us any useful methods specific to region control flow and makes the code fragile to changes due to now taking the region number into account.
This patch introduces a new type called `RegionBranchPoint`, replacing all uses of `std::optional<unsigned>` in the interface. It can be implicitly constructed from a region or a `RegionSuccessor`, can be compared with a region to check whether the branch point is branching from the parent, adds `isParent` to check whether we are coming from a parent op and adds `RegionSuccessor::parent` as a descriptive way to indicate branching from the parent.
Differential Revision: https://reviews.llvm.org/D159116
Add two new helper functions `getBeforeBody` and `getAfterBody` to be consistent with "scf.for" (`getBody`) and to show in the API that both regions have exactly one block. Also simplify some code that assumed that there can be more than one block in a region.
Differential Revision: https://reviews.llvm.org/D157860
In essentially all occurrences of adaptor constructions in the codebase, an instance of the op is available and only a different value range is being used. Nevertheless, one had to perform the ritual of calling and pass `getAttrDictionary()`, `getProperties` and `getRegions` manually.
This patch changes that by teaching TableGen to generate a new constructor in the adaptor that is constructable using `GenericAdaptor(valueRange, op)`. The (discardable) attr dictionary, properties and the regions are then taken directly from the passed op, with only the value range being taken from the first parameter.
This simplifies a lot of code and also guarantees that all the various getters of the adaptor work in all scenarios.
Differential Revision: https://reviews.llvm.org/D157516
The verifier incorrectly passed the region number of the predecessor region instead of the successor region to `getSuccessorOperands`. This went unnoticed since all upstream `RegionBranchTerminatorOpInterface` implementations did not make use of the `index` parameter.
Adding an assert to e.g. `scf.condition` to make sure the index is valid or adding a region terminator that passes different operands to different successors immediately causes the verifier to fail as it suddenly gets incorrect types.
This patch fixes the implementation to correctly pass the successor region index.
Differential Revision: https://reviews.llvm.org/D157507
The `RegionBranchOpInterface` had a few fundamental issues caused by the API design of `getSuccessorRegions`.
It always required passing values for the `operands` parameter. This is problematic as the operands parameter actually changes meaning depending on which predecessor `index` is referring to. If coming from a region, you'd have to find a `RegionBranchTerminatorOpInterface` in that region, get its operand count, and then create a `SmallVector` of that size.
This is not only inconvenient, but also error-prone, which has lead to a bug in the implementation of a previously existing `getSuccessorRegions` overload.
Additionally, this made the method dual-use, trying to serve two different use-cases: 1) Trying to determine possible control flow edges between regions and 2) Trying to determine the region being branched to based on constant operands.
This patch fixes these issues by changing the interface methods and adding new ones:
* The `operands` argument of `getSuccessorRegions` has been removed. The method is now only responsible for returning possible control flow edges between regions.
* An optional `getEntrySuccessorRegions` method has been added. This is used to determine which regions are branched to from the parent op based on constant operands of the parent op. By default, it calls `getSuccessorRegions`. This is analogous to `getSuccessorForOperands` from `BranchOpInterface`.
* Add `getSuccessorRegions` to `RegionBranchTerminatorOpInterface`. This is used to get the possible successors of the terminator based on constant operands. By default, it calls the containing `RegionBranchOpInterface`s `getSuccessorRegions` method.
* `getSuccessorEntryOperands` was renamed to `getEntrySuccessorOperands` for consistency.
Differential Revision: https://reviews.llvm.org/D157506
This renaming started with the native ODS support for properties, this is completing it.
A mass automated textual rename seems safe for most codebases.
Drop also the ods prefix to keep the accessors the same as they were before
this change:
properties.odsOperandSegmentSizes
reverts back to:
properties.operandSegementSizes
The ODS prefix was creating divergence between all the places and make it harder to
be consistent.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D157173
* Move `foldDynamicIndexList` to `DialectUtils` and simplify function.
* Move `OpWithOffsetSizesAndStridesConstantArgumentFolder` to `ViewLikeInterface` and add documentation.
Differential Revision: https://reviews.llvm.org/D156581
RegionBranchOpInterface did not allow the operation with regions to
specify itself as successors. Therefore, this implied that the control
is always transferred to a region before being transferred back to the
parent op. Since the region can only transfer the control back to the
parent op from a terminator, this transitively implied that the first
block of any region with a RegionBranchOpInterface is always executed
until the terminator can transfer the control flow back. This is
trivially false for any conditional-like operation that may or may not
execute the region, as well as for loop-like operations that may not
execute the body.
Remove the restriction from the interface description and update the
only transform that relied on it.
See
https://discourse.llvm.org/t/rfc-region-control-flow-interfaces-should-encode-region-not-executed-correctly/72103.
Depends On: https://reviews.llvm.org/D155757
Reviewed By: Mogball, springerm
Differential Revision: https://reviews.llvm.org/D155822
Author inferReturnTypes methods with the Op Adaptor by using the InferTypeOpAdaptor.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D155115
This change lifts the limitation that only the trailing dimensions/sizes
in dynamic index lists can be scalable. It allows us to extend
`MaskedVectorizeOp` and `TileOp` from the Transform dialect so that the
following is allowed:
%1, %loops:3 = transform.structured.tile %0 [4, [4], [4]]
This is also a follow up for https://reviews.llvm.org/D153372
that will enable the following (middle vector dimension is scalable):
transform.structured.masked_vectorize %0 vector_sizes [2, [4], 8]
To facilate this change, the hooks for parsing and printing dynamic
index lists are updated accordingly (`printDynamicIndexList` and
`parseDynamicIndexList`, respectively). `MaskedVectorizeOp` and `TileOp`
are updated to include an array of attribute of bools that captures
whether the corresponding vector dimension/tile size, respectively, are
scalable or not.
NOTE 1: I am re-landing this after the initial version was reverted. To
fix the regression and in addition to the original patch, this revision
updates the Python bindings for the transform dialect
NOTE 2: This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
This relands 048764f23a with fixes.
Differential Revision: https://reviews.llvm.org/D154336
`getConstantIntValue` extracts constant values from all constant-like ops, not just `arith::ConstantIndexOp`.
Differential Revision: https://reviews.llvm.org/D154356
This change lifts the limitation that only the trailing dimensions/sizes
in dynamic index lists can be scalable. It allows us to extend
`MaskedVectorizeOp` and `TileOp` from the Transform dialect so that the
following is allowed:
%1, %loops:3 = transform.structured.tile %0 [[4], [4], 4]
This is also a follow up for https://reviews.llvm.org/D153372
that will enable the following (middle vector dimension is scalable):
transform.structured.masked_vectorize %0 vector_sizes [2, [4], 8]
To facilate this change, the hooks for parsing and printing dynamic
index lists are updated accordingly (`printDynamicIndexList` and
`parseDynamicIndexList`, respectively). `MaskedVectorizeOp` and `TileOp`
are updated to include an array of attribute of bools that captures
whether the corresponding vector dimension/tile size, respectively, are
scalable or not.
This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
Differential Revision: https://reviews.llvm.org/D154336
There are existing implementations for `scf.for`, `scf.forall` and `affine.for`. This revision adds an interface method to the `LoopLikeOpInterface`.
* `scf.forall` now implements the `LoopLikeOpInterface`.
* The implementations of `scf.for` and `scf.forall` become interface method implementations. `affine.for` remains as is for the moment. (The implementation of `promoteIfSingleIteration` depepends on helper functions from `MLIRAffineAnalysis`, which cannot be used from `MLIRAffineDialect`, where the interface is currently implemented.)
* More efficient implementations of `promoteIfSingleIteration`. In particular, the `scf.forall` operation now inlines operations instead of cloning them. This also preserves handles when used from the transform dialect.
Differential Revision: https://reviews.llvm.org/D154343