Commit Graph

567 Commits

Author SHA1 Message Date
Matthias Springer
0ba3e96be1 [mlir][SCF][NFC] ValueBoundsConstraintSet: Simplify scf.for implementation (#87862)
This commit simplifies the implementation of the
`ValueBoundsOpInterface` for `scf.for` based on the newly added
`ValueBoundsConstraintSet::compare` API and adds additional
documentation.

Previously, the interface implementation created a new constraint set
just to check if the yielded value and iter_arg are equal. This was
inefficient because constraints were added multiple times (to two
different constraint sets) for ops that are inside the loop.

Note: This is a re-upload of #86239.
2024-04-06 15:30:26 +09:00
Matthias Springer
76435f2dca [mlir][SCF] ValueBoundsConstraintSet: Support scf.if (branches) (#87860)
This commit adds support for `scf.if` to `ValueBoundsConstraintSet`.

Example:
```
%0 = scf.if ... -> index {
  scf.yield %a : index
} else {
  scf.yield %b : index
}
```

The following constraints hold for %0:
* %0 >= min(%a, %b)
* %0 <= max(%a, %b)

Such constraints cannot be added to the constraint set; min/max is not
supported by `IntegerRelation`. However, if we know which one of %a and
%b is larger, we can add constraints for %0. E.g., if %a <= %b:
* %0 >= %a
* %0 <= %b

This commit required a few minor changes to the
`ValueBoundsConstraintSet` infrastructure, so that values can be
compared while we are still in the process of traversing the IR/adding
constraints.

Note: This is a re-upload of #85895, which was reverted. The bug that
caused the failure was fixed in #87859.
2024-04-06 13:04:49 +09:00
Mehdi Amini
8487e05967 Revert "[mlir][SCF] ValueBoundsConstraintSet: Support scf.if (branches) (#85895)"
This reverts commit 6b30ffef28.

gcc7 bot is broken
2024-04-05 03:00:35 -07:00
Mehdi Amini
e5e1bc0ad8 Revert "[mlir][SCF][NFC] ValueBoundsConstraintSet: Simplify scf.for implementation (#86239)"
This reverts commit 24e4429980.

gcc7 bot is broken
2024-04-05 03:00:29 -07:00
Matthias Springer
24e4429980 [mlir][SCF][NFC] ValueBoundsConstraintSet: Simplify scf.for implementation (#86239)
This commit simplifies the implementation of the
`ValueBoundsOpInterface` for `scf.for` based on the newly added
`ValueBoundsConstraintSet::compare` API and adds additional
documentation.

Previously, the interface implementation created a new constraint set
just to check if the yielded value and iter_arg are equal. This was
inefficient because constraints were added multiple times (to two
different constraint sets) for ops that are inside the loop.
2024-04-05 13:27:50 +09:00
Matthias Springer
6b30ffef28 [mlir][SCF] ValueBoundsConstraintSet: Support scf.if (branches) (#85895)
This commit adds support for `scf.if` to `ValueBoundsConstraintSet`.

Example:
```
%0 = scf.if ... -> index {
  scf.yield %a : index
} else {
  scf.yield %b : index
}
```

The following constraints hold for %0:
* %0 >= min(%a, %b)
* %0 <= max(%a, %b)

Such constraints cannot be added to the constraint set; min/max is not
supported by `IntegerRelation`. However, if we know which one of %a and
%b is larger, we can add constraints for %0. E.g., if %a <= %b:
* %0 >= %a
* %0 <= %b

This commit required a few minor changes to the
`ValueBoundsConstraintSet` infrastructure, so that values can be
compared while we are still in the process of traversing the IR/adding
constraints.
2024-04-05 13:14:00 +09:00
MaheshRavishankar
5aeb604c7c [mlir][SCF] Modernize coalesceLoops method to handle scf.for loops with iter_args (#87019)
As part of this extension this change also does some general cleanup

1) Make all the methods take `RewriterBase` as arguments instead of
   creating their own builders that tend to crash when used within
   pattern rewrites
2) Split `coalesePerfectlyNestedLoops` into two separate methods, one
   for `scf.for` and other for `affine.for`. The templatization didnt
   seem to be buying much there.

Also general clean up of tests.
2024-04-04 13:44:24 -07:00
Matthias Springer
5e4a44380e [mlir][Interfaces][NFC] ValueBoundsConstraintSet: Pass stop condition in the constructor (#86099)
This commit changes the API of `ValueBoundsConstraintSet`: the stop
condition is now passed to the constructor instead of `processWorklist`.
That makes it easier to add items to the worklist multiple times and
process them in a consistent manner. The current
`ValueBoundsConstraintSet` is passed as a reference to the stop
function, so that the stop function can be defined before the the
`ValueBoundsConstraintSet` is constructed.

This change is in preparation of adding support for branches.
2024-04-04 17:05:47 +09:00
Ivan Butygin
a6d932bca8 [mlir][scf] Align scf.while before block args in canonicalizer (#76195)
If `before` block args are directly forwarded to `scf.condition` make
sure they are passed in the same order.
This is needed for `scf.while` uplifting
https://github.com/llvm/llvm-project/pull/76108
2024-04-02 17:54:06 +03:00
Rolf Morel
eacda36c7d [SCF][Transform] Add support for scf.for in LoopFuseSibling op (#81495)
Adds support for fusing two scf.for loops occurring in the same block.
Uses the rudimentary checks already in place for scf.forall (like the
target loop's operands being dominated by the source loop).

- Fixes a bug in the dominance check whereby it was checked that values
in the target loop themselves dominated the source loop rather than the
ops that define these operands.
- Renames the LoopFuseSibling op to LoopFuseSiblingOp.
- Updates LoopFuseSiblingOp's description.
- Adds tests for using LoopFuseSiblingOp on scf.for loops, including one
which fails without the fix for the dominance check.
- Adds tests checking the different failure modes of the dominance
checker.
- Adds test for case whereby scf.yield is automatically generated when
there are no loop-carried variables.
2024-03-28 14:13:08 +01:00
Justin Fargnoli
35d55f2894 [NFC][mlir] Reorder declarePromisedInterface() operands (#86628)
Reorder the template operands of `declarePromisedInterface()` to match
`declarePromisedInterfaces()`.
2024-03-27 10:30:17 -07:00
Vivian
a9d1fead96 Fix the condition for peeling the first iteration (#86350)
This PR fixes the condition used in loop peeling of the first iteration.
Using ceilDiv instead of floorDiv when calculating the loop counts, so
that the first iteration gets peeled as needed.
2024-03-25 09:53:57 -07:00
Oleksandr "Alex" Zinenko
5a9bdd85ee [mlir] split transform interfaces into a separate library (#85221)
Transform interfaces are implemented, direction or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.

Split out the interfaces into a separate library. This in turn requires
flipping the dependency from the interface on the dialect that has crept
in because both co-existed in one library. The interface shouldn't
depend on the transform dialect either.

As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
2024-03-20 22:15:17 +01:00
Justin Fargnoli
513cdb8222 [mlir] Declare promised interfaces for all dialects (#78368)
This PR adds promised interface declarations for all interfaces declared
in `InitAllDialects.h`.

Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
2024-03-15 20:23:20 -07:00
Matthias Springer
492e8ba038 [mlir] Fix memory leaks after #81759 (#82762)
This commit fixes memory leaks that were introduced by #81759. The way
ops and blocks are erased changed slightly.

The leaks were caused by an incorrect implementation of op builders:
blocks must be created with the supplied builder object. Otherwise, they
are not properly tracked by the dialect conversion and can leak during
rollback.
2024-02-23 14:28:57 +01:00
mlevesquedion
d4fd20258f [mlir] Use arith max or min ops instead of cmp + select (#82178)
I believe the semantics should be the same, but this saves 1 op and simplifies the code.

For example, the following two instructions:

```
%2 = cmp sgt %0, %1
%3 = select %2, %0, %1
```

Are equivalent to:

```
%2 = maxsi %0 %1
```
2024-02-21 12:28:05 -08:00
Mehdi Amini
e546a6e95d Apply clang-tidy fixes for modernize-loop-convert in Utils.cpp (NFC) 2024-02-12 13:27:49 -08:00
Mehdi Amini
8189db978f Apply clang-tidy fixes for performance-unnecessary-value-param in BufferizableOpInterfaceImpl.cpp (NFC) 2024-02-12 13:27:48 -08:00
Mehdi Amini
5370425772 Apply clang-tidy fixes for bugprone-argument-comment in SCF.cpp (NFC) 2024-02-12 13:27:48 -08:00
Jerry Wu
f7201505a6 [mlir] Add transformation to wrap scf::while in zero-trip-check (#81050)
Add `scf::wrapWhileLoopInZeroTripCheck` to wrap scf while loop in
zero-trip-check.
2024-02-08 17:52:09 -05:00
Ivan Butygin
6050cf2884 [mlir][scf] Add reductions support to scf.parallel fusion (#75955)
Properly handle fusion of loops with reductions:
* Check there are no first loop results users between loops
* Create new loop op with merged reduction init values
* Update `scf.reduce` op to contain reductions from both loops
* Update loops users with new loop results
2024-02-01 18:37:19 +03:00
Hsiangkai Wang
c3eb2978a6 [mlir][scf] Considering defining operators of indices when fusing scf::ParallelOp (#80145)
When checking the load indices of the second loop coincide with the
store indices of the first loop, it only considers the index values are
the same or not. However, there are some cases the index values defined
by other operators. In these cases, it will treat them as different even
the results of defining operators are the same.

We already check if the iteration space is the same in isFusionLegal().
When checking operands of defining operators, we only need to consider
the operands come from the same induction variables. If so, we know the
results of defining operators are the same.
2024-02-01 13:57:31 +00:00
fabrizio-indirli
d17b005e46 [mlir][scf] Relax requirements for loops fusion (#79187)
Enable the fusion of parallel loops also when the 1st loop contains
multiple write accesses to the same buffer, if the accesses are always
on the same indices.
Fix LIT test cases whose loops were not being fused.

Signed-off-by: Fabrizio Indirli <Fabrizio.Indirli@arm.com>
2024-01-30 20:17:06 +03:00
MaheshRavishankar
76ead96c1d [mlir][TilingInterface] Use LoopLikeOpInterface in tiling using SCF to unify tiling with scf.for and scf.forall. (#77874)
Using `LoopLikeOpInterface` as the basis for the implementation unifies
all the tiling logic for both `scf.for` and `scf.forall`. The only
difference is the actual loop generation. This is a follow up to
https://github.com/llvm/llvm-project/pull/72178
Instead of many entry points for each loop type, the loop type is now
passed as part of the options passed to the tiling method.

This is a breaking change with the following changes

1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF`
2) The `scf::tileUsingSCFForallOp` is deprecated. The same
   functionality is obtained by using `scf::tileUsingSCF` and setting
   the loop type in `scf::SCFTilingOptions` passed into this method to
   `scf::SCFTilingOptions::LoopType::ForallOp` (using the
   `setLoopType` method).
3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is
   renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of
   the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing
   any strategy with the default callback implemeting the greedy fusion.
4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use
   `SmallVector<LoopLikeOpInterface>`.
5) To make `scf::ForallOp` implement the parts of
   `LoopLikeOpInterface` needed, the `getOutputBlockArguments()`
   method is replaced with `getRegionIterArgs()`

These changes now bring the tiling and fusion capabilities using
`scf.forall` on par with what was already supported by `scf.for`
2024-01-25 21:26:23 -08:00
Matthias Springer
5cc0f76d34 [mlir][IR] Add rewriter API for moving operations (#78988)
The pattern rewriter documentation states that "*all* IR mutations [...]
are required to be performed via the `PatternRewriter`." This commit
adds two functions that were missing from the rewriter API:
`moveOpBefore` and `moveOpAfter`.

After an operation was moved, the `notifyOperationInserted` callback is
triggered. This allows listeners such as the greedy pattern rewrite
driver to react to IR changes.

This commit narrows the discrepancy between the kind of IR modification
that can be performed and the kind of IR modifications that can be
listened to.
2024-01-25 11:01:28 +01:00
Matthias Springer
5fcf907b34 [mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.

Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
2024-01-17 11:08:59 +01:00
Matthias Springer
c1730f4221 [mlir][SCF] Do not verify step size of scf.for (#78141)
An op verifier should verify only local properties. This commit removes
the verification of `scf.for` step sizes. (Verifiers can check
attributes but should not follow SSA values.) This verification could
reject IR that is actually valid, e.g.:

```mlir
scf.if %always_false {
  // Branch is never entered.
  scf.for ... step %c0 { ... }
}
```

This commit fixes `for-loop-peeling.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`:
```
within split at llvm-project/mlir/test/Dialect/SCF/for-loop-peeling.mlir:293 offset :9:3: note: see current operation:
"scf.for"(%0, %3, %2) ({
^bb0(%arg1: index):
  %4 = "arith.index_cast"(%arg1) : (index) -> i64
  "memref.store"(%4, %arg0) : (i64, memref<i64>) -> ()
  "scf.yield"() : () -> ()
}) {__peeled_loop__} : (index, index, index) -> ()
LLVM ERROR: IR failed to verify after folding
```

Note: `%2` is `arith.constant 0 : index`.
2024-01-15 17:46:54 +01:00
Felix Schneider
f6f1ab9d90 [mlir][scf] Fix for-loop-peeling crash (#77697)
Before applying the peeling patterns, it can happen that the `ForOp`
gets a step of zero during folding. This leads to a division-by-zero
down the line.

This patch adds an additional check for a constant-zero step and a
 test.

Fix https://github.com/llvm/llvm-project/issues/75758
2024-01-12 19:08:16 +01:00
MaheshRavishankar
aa2a96a24a [mlir][TilingInterface] Move TilingInterface tests to use transform dialect ops. (#77204)
In the process a couple of test transform dialect ops are added just
for testing. These operations are not intended to use as full flushed
out of transformation ops, but are rather operations added for testing.

A separate operation is added to `LinalgTransformOps.td` to convert a
`TilingInterface` operation to loops using the
`generateScalarImplementation` method implemented by the
operation. Eventually this and other operations related to tiling
using the `TilingInterface` need to move to a better place (i.e. out
of `Linalg` dialect)
2024-01-11 21:31:03 -08:00
Thomas Raoux
c933bd8185 [MLIR][SCF] Add checks to verify that the pipeliner schedule is correct. (#77083)
Add a check to validate that the schedule passed to the pipeliner
transformation is valid and won't cause the pipeliner to break SSA.

This checks that the for each operation in the loop operations are
scheduled after their operands.
2024-01-10 04:25:57 -08:00
MaheshRavishankar
4435ced949 [mlir][TilingInterface] Allow controlling what fusion is done within tile and fuse (#76871)
Currently the `tileConsumerAndFuseProducerGreedilyUsingSCFFor` method
greedily fuses through all slices that are generated during the tile and
fuse flow. That is not the normal use case. Ideally the caller would
like to control which slices get fused and which dont. This patch
introduces a new field to the `SCFTileAndFuseOptions` to specify this
control.

The contol function also allows the caller to specify if the replacement
for the fused producer needs to be yielded from within the tiled
computation. This allows replacing the fused producers in case they have
other uses. Without this the original producers still survive negating
the utility of the fusion.

The change here also means that the name of the function
`tileConsumerAndFuseProducerGreedily...` can be updated. Defering that
to a later stage to reduce the churn of API changes.
2024-01-08 13:26:10 -08:00
Arseniy Obolenskiy
59569eb756 [mlir] Fix support for loop normalization with integer indices (#76566)
Choose correct type for updated loop boundaries after scf loop
normalization, do not force chosen type to IndexType
2024-01-05 17:49:21 +03:00
Matthias Springer
10056c821a [mlir][SCF] scf.parallel: Make reductions part of the terminator (#75314)
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
2023-12-20 11:06:27 +09:00
Han-Chung Wang
899c2bed9e [mlir][TilingInterface] Early return cloned ops if tile sizes are zeros. (#75410)
It is a trivial early-return case. If the cloned ops are not returned,
it will generate `extract_slice` op that extracts the whole slice.
However, it is not folded away. Early-return to avoid the case.

E.g.,

```mlir
func.func @matmul_tensors(
  %arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>)
    -> tensor<?x?xf32> {
  %0 = linalg.matmul  ins(%arg0, %arg1: tensor<?x?xf32>, tensor<?x?xf32>)
                     outs(%arg2: tensor<?x?xf32>)
    -> tensor<?x?xf32>
  return %0 : tensor<?x?xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!transform.any_op) -> !transform.any_op
    %1 = transform.structured.tile_using_for %0 [0, 0, 0] : (!transform.any_op) -> (!transform.any_op)
    transform.yield
  }
}
```

Apply the transforms and canonicalize the IR:

```
mlir-opt --transform-interpreter -canonicalize input.mlir
```

we will get

```mlir
module {
  func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %dim = tensor.dim %arg0, %c0 : tensor<?x?xf32>
    %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
    %dim_1 = tensor.dim %arg1, %c1 : tensor<?x?xf32>
    %extracted_slice = tensor.extract_slice %arg0[0, 0] [%dim, %dim_0] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
    %extracted_slice_2 = tensor.extract_slice %arg1[0, 0] [%dim_0, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
    %extracted_slice_3 = tensor.extract_slice %arg2[0, 0] [%dim, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
    %0 = linalg.matmul ins(%extracted_slice, %extracted_slice_2 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%extracted_slice_3 : tensor<?x?xf32>) -> tensor<?x?xf32>
    return %0 : tensor<?x?xf32>
  }
}
```

The revision early-return the case so we can get:

```mlir
func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %0 = linalg.matmul ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2 : tensor<?x?xf32>) -> tensor<?x?xf32>
  return %0 : tensor<?x?xf32>
}
```
2023-12-19 09:14:43 -08:00
Ivan Butygin
c0d2ea9d42 [mlir][scf] Improve scf.parallel fusion pass (#75852)
Abort fusion if memref load may alias write, but not the exact alias. 
Add alias check hook to `naivelyFuseParallelOps`, so user can customize
alias checking.
Use builtin alias analysis in `ParallelLoopFusion` pass.
2023-12-19 18:07:46 +03:00
Vivian
bd6a2452ae [mlir][SCF] Add support for peeling the first iteration out of the loop (#74015)
There is a use case that we need to peel the first iteration out of the
for loop so that the peeled forOp can be canonicalized away and the
fillOp can be fused into the inner forall loop. For example, we have
nested loops as below

```
  linalg.fill ins(...) outs(...)
  scf.for %arg = %lb to %ub step %step
    scf.forall ...
```

After the peeling transform, it is expected to be

```
  scf.forall ...
    linalg.fill ins(...) outs(...)
  scf.for %arg = %(lb + step) to %ub step %step
    scf.forall ...
```

This patch makes the most use of the existing peeling functions and adds
support for peeling the first iteration out of the loop.
2023-12-14 17:03:52 -08:00
Keren Zhou
e66f97e8a8 [mlir] Fix loop pipelining when the operand of yield is not defined in the loop body (#75423) 2023-12-13 19:19:13 -08:00
Thomas Raoux
ef112833e1 [MLIR][SCF] Add support for pipelining dynamic loops (#74350)
Support loops without static boundaries. Since the number of iteration
is not known we need to predicate prologue and epilogue in case the
number of iterations is smaller than the number of stages.

This patch includes work from @chengjunlu
2023-12-10 22:32:11 -08:00
Matthias Springer
77f5b33c46 [mlir][SCF] Retire SCF-specific to_memref/to_tensor canonicalization patterns (#74551)
The partial bufferization framework has been replaced with One-Shot
Bufferize. SCF-specific canonicalization patterns for
`to_memref`/`to_tensor` are no longer needed.
2023-12-07 08:24:17 +09:00
Matthias Springer
df7545e4be [mlir][SCF] Fix invalid IR in ParallelOpSingleOrZeroIterationDimsFolder pattern (#74552)
`ParallelOpSingleOrZeroIterationDimsFolder` used to produce invalid IR:
```
within split at mlir/test/Dialect/SCF/canonicalize.mlir:1 offset :11:3: error: 'scf.parallel' op expects region #0 to have 0 or 1 blocks
  scf.parallel (%i0, %i1, %i2) = (%c0, %c3, %c7) to (%c1, %c6, %c10) step (%c1, %c2, %c3) {
  ^
within split at mlir/test/Dialect/SCF/canonicalize.mlir:1 offset :11:3: note: see current operation:
"scf.parallel"(%4, %5, %3) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({
^bb0(%arg1: index):
  "memref.store"(%0, %arg0, %1, %arg1, %6) : (i32, memref<?x?x?xi32>, index, index, index) -> ()
  "scf.yield"() : () -> ()
^bb1(%8: index):  // no predecessors
  "scf.yield"() : () -> ()
}) : (index, index, index) -> ()
```

Together with #74551, this commit fixes
`mlir/test/Dialect/SCF/canonicalize.mlir` when verifying the IR after
each pattern application (#74270).
2023-12-06 15:14:29 +09:00
Thomas Raoux
19e068b048 [MLIR][SCF] Handle more cases in pipelining transform (#74007)
-Fix case where an op is scheduled in stage 0 and used with a distance
of 1
-Fix case where we don't peel the epilogue and a value not part of the
last stage is used outside the loop.
2023-12-01 21:28:21 -08:00
MaheshRavishankar
ec1086f2a0 Fix build error from #72178 (#72905) 2023-11-20 23:09:59 -08:00
Mehdi Amini
26a0b27736 Make MLIR Value more consistent in terms of const "correctness" (NFC) (#72765)
MLIR can't really be const-correct (it would need a `ConstValue` class
alongside the `Value` class really, like `ArrayRef` and
`MutableArrayRef`). This is however making is more consistent: method
that are directly modifying the Value shouldn't be marked const.
2023-11-20 20:52:15 -08:00
Jie Fu
3e6ae77950 [mlir] Non-void lambda does not return a value in all control paths in yieldReplacementForFusedProducer (NFC)
/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp:703:5: error: non-void lambda does not return a value in all
 control paths [-Werror,-Wreturn-type]
    };
    ^
1 error generated.
2023-11-21 09:11:50 +08:00
MaheshRavishankar
4a020018ce [NFC] Simplify the tiling implementation using cloning. (#72178)
The current implementation of tiling using `scf.for` is convoluted to
make sure that the destination passing style of the untiled program is
preserved. The addition of support to tile using `scf.forall` (adapted
from the transform operation in Linalg) in
https://github.com/llvm/llvm-project/pull/67083 used cloning of the
tiled operations to better streamline the implementation. This PR adapts
the other tiling methods to use a similar approach, making the
transformations (and handling destination passing style semantics) more
systematic.

---------

Co-authored-by: Abhishek-Varma <avarma094@gmail.com>
2023-11-20 09:05:48 -08:00
Matthias Springer
96901f1b02 [mlir][SCF] Do not peel already peeled loops (#71900)
Loop peeling is not beneficial if the step size already divides "ub -
lb". There are currently some simple checks to prevent peeling in such
cases when lb, ub, step are constants. This commit adds support for IR
that is the result of loop peeling in the general case; i.e., lb, ub,
step do not necessarily have to be constants.

This change adds a new affine_map simplification rule for semi-affine
maps that appear during loop peeling and are guaranteed to evaluate to a
constant zero. Affine maps such as:
```
(1) affine_map<()[ub, step] -> ((ub - ub mod step) mod step)
(2) affine_map<()[ub, lb, step] -> ((ub - (ub - lb) mod step - lb) mod step)
(3)                                         ^ may contain additional summands
```
Other affine maps with modulo expressions are not supported by the new
simplification rule.

This fixes #71469.
2023-11-16 11:47:57 +09:00
long.chen
1609f1c2a5 [mlir][affine][nfc] cleanup deprecated T.cast style functions (#71269)
detail see the docment: https://mlir.llvm.org/deprecation/

Not all changes are made manually, most of them are made through a clang
tool I wrote https://github.com/lipracer/cpp-refactor.
2023-11-14 13:01:19 +08:00
Nicolas Vasilache
644cd0724d [mlir][SCF] Add folding for IndexSwitchOp (#70924) 2023-11-01 14:30:42 +01:00
Matthias Springer
98a6edd38f [mlir][Interfaces] LoopLikeOpInterface: Expose tied loop results (#70535)
Expose loop results, which correspond to the region iter_arg values that
are returned from the loop when there are no more iterations. Exposing
loop results is optional because some loops (e.g., `scf.while`) do not
have a 1-to-1 mapping between region iter_args and op results.

Also add additional helper functions to query tied
results/iter_args/inits.
2023-11-01 08:34:14 +09:00
Matthias Springer
04736c7f7a [mlir][SCF] Use transform.get_parent_op instead of transform.loop.get_parent_for (#70757)
Add a new attribute to `get_parent_op` to get the n-th parent. Remove
`transform.loop.get_parent_for`, which is no longer needed.
2023-10-31 18:36:40 +09:00