Commit Graph

810 Commits

Author SHA1 Message Date
MaheshRavishankar
5aeb604c7c [mlir][SCF] Modernize coalesceLoops method to handle scf.for loops with iter_args (#87019)
As part of this extension this change also does some general cleanup

1) Make all the methods take `RewriterBase` as arguments instead of
   creating their own builders that tend to crash when used within
   pattern rewrites
2) Split `coalesePerfectlyNestedLoops` into two separate methods, one
   for `scf.for` and other for `affine.for`. The templatization didnt
   seem to be buying much there.

Also general clean up of tests.
2024-04-04 13:44:24 -07:00
Ivan Butygin
5b66b6a32a [mlir][pass] Add composite pass utility (#87166)
Composite pass allows to run sequence of passes in the loop until fixed
point or maximum number of iterations is reached. The usual candidates
are canonicalize+CSE as canonicalize can open more opportunities for CSE
and vice-versa.
2024-04-02 13:30:45 +03:00
long.chen
631e54aa1a [mlir][arith] fix wrong floordivsi fold (#83248)
Fixs https://github.com/llvm/llvm-project/issues/83079
2024-03-22 23:52:47 +08:00
Matthias Gehre
e6048b728d [MLIR][Bufferization] BufferResultsToOutParams: Add option to add attribute to output arguments (#84320)
Adds a new pass option `add-result-attr` that will make the pass add the
attribute `{bufferize.result}` to each argument that was converted from
a result.
This is important e.g. when later using the python bindings / execution
engine to understand which arguments are actually results.

To be able to test this, the pass option was added to the tablegen. To
avoid collisions with the existing, manually defined option struct
`BufferResultsToOutParamsOptions`, that one was renamed to
`BufferResultsToOutParamsOpts`.
2024-03-14 07:50:16 +01:00
Christian Sigg
bb893fa23f [mlir] Fix inlining-threshold.mlir test for NDEBUG builds. 2024-03-13 17:26:50 +01:00
Slava Zakharin
732f5368cd [RFC][mlir] Add profitability callback to the Inliner. (#84258)
Discussion at https://discourse.llvm.org/t/inliner-cost-model/2992

This change adds a callback that reports whether inlining
of the particular call site (communicated via ResolvedCall argument)
is profitable or not. The default MLIR inliner pass behavior
is unchanged, i.e. the callback always returns true.
This callback may be used to customize the inliner behavior
based on the target specifics (like target instructions costs),
profitability of the inlining for further optimizations
(e.g. if inlining may enable loop optimizations or scalar optimizations
due to object shape propagation), optimization levels (e.g. -Os inlining
may be quite different from -Ofast inlining), etc.

One of the questions is whether the ResolvedCall entity represents
enough of the context for the custom inlining models to come up with
the profitability decision. I think we can start with this and
extend it as necessary.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-03-13 08:23:10 -07:00
Congcong Cai
ad23127222 [mlir][inline] avoid inline self-recursive function (#83092) 2024-03-12 06:49:09 +08:00
Matthias Springer
60a20bd697 [mlir][Transforms] Add listener support to dialect conversion (#83425)
This commit adds listener support to the dialect conversion. Similarly
to the greedy pattern rewrite driver, an optional listener can be
specified in the configuration object.

Listeners are notified only if the dialect conversion succeeds. In case
of a failure, where some IR changes are first performed and then rolled
back, no notifications are sent.

Due to the fact that some kinds of rewrite are reflected in the IR
immediately and some in a delayed fashion, there are certain limitations
when attaching a listener; these are documented in `ConversionConfig`.
To summarize, users are always notified about all rewrites that
happened, but the notifications are sent all at once at the very end,
and not interleaved with the actual IR changes.

This change is in preparation improvements to
`transform.apply_conversion_patterns`, which currently invalidates all
handles. In the future, it can use a listener to update handles
accordingly, similar to `transform.apply_patterns`.
2024-03-08 10:34:45 +09:00
Congcong Cai
46f65e45e0 [mlir]use correct iterator when eraseOp (#83444)
#66771 introduce `llvm::post_order(&r.front())` which is equal to
`r.front().getSuccessor(...)`.
It will visit the succ block of current block. But actually here need to
visit all block of region in reverse order.
Fixes: #77420.
2024-03-05 15:33:49 +08:00
Matthias Springer
9606655fbb [mlir][Transforms] Fix use-after-free when accessing replaced block args (#83646)
This commit fixes a bug in a dialect conversion. Currently, when a block
is replaced via a signature conversion, the block is erased during the
"commit" phase. This is problematic because the block arguments may
still be referenced internal data structures of the dialect conversion
(`mapping`). Blocks should be treated same as ops: they should be erased
during the "cleanup" phase.

Note: The test case fails without this fix when running with ASAN, but
may pass when running without ASAN.
2024-03-04 11:09:39 +09:00
mlevesquedion
d4fd20258f [mlir] Use arith max or min ops instead of cmp + select (#82178)
I believe the semantics should be the same, but this saves 1 op and simplifies the code.

For example, the following two instructions:

```
%2 = cmp sgt %0, %1
%3 = select %2, %0, %1
```

Are equivalent to:

```
%2 = maxsi %0 %1
```
2024-02-21 12:28:05 -08:00
Matthias Springer
3a70335bae [mlir][Transforms] Support rolling back properties in dialect conversion (#82474)
The dialect conversion rolls back in-place op modifications upon
failure. Rolling back modifications of attributes is already supported,
but there was no support for properties until now.
2024-02-21 16:41:45 +01:00
Matthias Springer
914e607487 [mlir][IR][NFC] Rename notify*Removed to notify*Erased (#82253)
Rename listener callback names:
* `notifyOperationRemoved` -> `notifyOperationErased`
* `notifyBlockRemoved` -> `notifyBlockErased`

The current callback names are misnomers. The callbacks are triggered
when an operation/block is erased, not when it is removed (unlinked).

E.g.:
```c++
/// Notify the listener that the specified operation is about to be erased.
/// At this point, the operation has zero uses.
///
/// Note: This notification is not triggered when unlinking an operation.
virtual void notifyOperationErased(Operation *op) {}
```

This change is in preparation of adding listener support to the dialect
conversion. The dialect conversion internally unlinks IR before erasing
it at a later point of time. There is an important difference between
"remove" and "erase". Lister callback names should be accurate to avoid
confusion.
2024-02-20 09:08:19 +01:00
Artem Tyurin
3bef17eac6 [mlir] Handle cycles and back edges in --view-op-graph (#82002)
Fixes #62128.
2024-02-17 13:37:49 -08:00
Matthias Springer
8f4cd2c7e3 [mlir][Transforms] Support moveOpBefore/After in dialect conversion (#81240)
Add a new rewrite class for "operation movements". This rewrite class
can roll back `moveOpBefore` and `moveOpAfter`.

`RewriterBase::moveOpBefore` and `RewriterBase::moveOpAfter` is no
longer virtual. (The dialect conversion can gather all required
information for rollbacks from listener notifications.)
2024-02-14 17:39:59 +01:00
lonely eagle
2ecf608829 [mlir]Fix compose subview (#80551)
I found a bug in `test-compose-subview`,You can see the example I gave.
```
#map = affine_map<() -> ()>
module {
  func.func private @fun(%arg0: memref<10x10xf32>, %arg1: memref<5x5xf32>) -> memref<5x5xf32> {
    %c0 = arith.constant 0 : index
    %c5 = arith.constant 5 : index
    %c1 = arith.constant 1 : index
    %subview = memref.subview %arg0[0, 0] [5, 5] [1, 1] : memref<10x10xf32> to memref<5x5xf32, strided<[10, 1]>>
    %alloc = memref.alloc() : memref<5x5xf32>
    scf.for %arg2 = %c0 to %c5 step %c1 {
      scf.for %arg3 = %c0 to %c5 step %c1 {
        %subview_0 = memref.subview %subview[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32, strided<[10, 1]>> to memref<f32, strided<[], offset: ?>>
        %subview_1 = memref.subview %arg1[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>>
        %alloc_2 = memref.alloc() : memref<f32>
        linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []} ins(%subview_0, %subview_1 : memref<f32, strided<[], offset: ?>>, memref<f32, strided<[], offset: ?>>) outs(%alloc_2 : memref<f32>) {
        ^bb0(%in: f32, %in_4: f32, %out: f32):
          %0 = arith.addf %in, %in_4 : f32
          linalg.yield %0 : f32
        }
        %subview_3 = memref.subview %alloc[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>>
        memref.copy %alloc_2, %subview_3 : memref<f32> to memref<f32, strided<[], offset: ?>>
      }
    }
    return %alloc : memref<5x5xf32>
  }
  func.func @test(%arg0: memref<10x10xf32>, %arg1: memref<5x5xf32>) -> memref<5x5xf32> {
    %0 = call @fun(%arg0, %arg1) : (memref<10x10xf32>, memref<5x5xf32>) -> memref<5x5xf32>
    return %0 : memref<5x5xf32>
  }
}
```
When I run `mlir-opt test.mlir ---test-compose-subview`.
```
test.mlir:14:9: error: 'linalg.generic' op expected operand rank (2) to match the result rank of indexing_map #0 (0)
        linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []} ins(%subview_0, %subview_1 : memref<f32, strided<[], offset: ?>>, memref<f32, strided<[], offset: ?>>) outs(%alloc_2 : memref<f32>) {
        ^
test1.mlir:14:9: note: see current operation: 
"linalg.generic"(%4, %5, %6) <{indexing_maps = [affine_map<() -> ()>, affine_map<() -> ()>, affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 2, 1>}> ({
^bb0(%arg4: f32, %arg5: f32, %arg6: f32):
  %8 = "arith.addf"(%arg4, %arg5) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  "linalg.yield"(%8) : (f32) -> ()
}) : (memref<1x1xf32, strided<[10, 1], offset: ?>>, memref<f32, strided<[], offset: ?>>, memref<f32>) -> ()
```
This PR fixes that.In the meantime I've extended this PR to handle cases
where stride is greater than 1.
```
func.func private @Unknown0(%arg0: memref<10x10xf32>, %arg1: memref<5x5xf32>) -> memref<5x5xf32> {
  %c0 = arith.constant 0 : index
  %c5 = arith.constant 5 : index
  %c1 = arith.constant 1 : index
  %subview = memref.subview %arg0[0, 0] [5, 5] [2, 2] : memref<10x10xf32> to memref<5x5xf32, strided<[20, 2]>>
  %alloc = memref.alloc() : memref<5x5xf32>
  scf.for %arg2 = %c0 to %c5 step %c1 {
    scf.for %arg3 = %c0 to %c5 step %c1 {
      %subview_0 = memref.subview %subview[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32, strided<[20, 2]>> to memref<f32, strided<[], offset: ?>>
      %subview_1 = memref.subview %arg1[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>>
      %alloc_2 = memref.alloc() : memref<f32>
      linalg.generic {indexing_maps = [affine_map<() -> ()>, affine_map<() -> ()>, affine_map<() -> ()>], iterator_types = []} ins(%subview_0, %subview_1 : memref<f32, strided<[], offset: ?>>, memref<f32, strided<[], offset: ?>>) outs(%alloc_2 : memref<f32>) {
      ^bb0(%in: f32, %in_4: f32, %out: f32):
        %0 = arith.addf %in, %in_4 : f32
        linalg.yield %0 : f32
      }
      %subview_3 = memref.subview %alloc[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>>
      memref.copy %alloc_2, %subview_3 : memref<f32> to memref<f32, strided<[], offset: ?>>
    }
  }
  return %alloc : memref<5x5xf32>
}
$ mlir-opt test.mlir -test-compose-subview
#map = affine_map<()[s0] -> (s0 * 2)>
#map1 = affine_map<() -> ()>
module {
  func.func private @Unknown0(%arg0: memref<10x10xf32>, %arg1: memref<5x5xf32>) -> memref<5x5xf32>  {
    %c0 = arith.constant 0 : index
    %c5 = arith.constant 5 : index
    %c1 = arith.constant 1 : index
    %alloc = memref.alloc() : memref<5x5xf32>
    scf.for %arg2 = %c0 to %c5 step %c1 {
      scf.for %arg3 = %c0 to %c5 step %c1 {
        %0 = affine.apply #map()[%arg2]
        %1 = affine.apply #map()[%arg3]
        %subview = memref.subview %arg0[%0, %1] [1, 1] [2, 2] : memref<10x10xf32> to memref<f32, strided<[], offset: ?>>
        %subview_0 = memref.subview %arg1[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>>
        %alloc_1 = memref.alloc() : memref<f32>
        linalg.generic {indexing_maps = [#map1, #map1, #map1], iterator_types = []} ins(%subview, %subview_0 : memref<f32, strided<[], offset: ?>>, memref<f32, strided<[], offset: ?>>) outs(%alloc_1 : memref<f32>) {
        ^bb0(%in: f32, %in_3: f32, %out: f32):
          %2 = arith.addf %in, %in_3 : f32
          linalg.yield %2 : f32
        }
        %subview_2 = memref.subview %alloc[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>>
        memref.copy %alloc_1, %subview_2 : memref<f32> to memref<f32, strided<[], offset: ?>>
      }
    }
    return %alloc : memref<5x5xf32>
  }
}
```
2024-02-07 20:49:27 +01:00
Joshua Cao
7d055af14b [mlir][Symbol] Add verification that symbol's parent is a SymbolTable (#80590)
Following the discussion in
https://discourse.llvm.org/t/symboltable-and-symbol-parent-child-relationship/75446,
we should enforce that a symbol's immediate parent is a symbol table.

I changed some tests to pass the verification. In most cases, we can
wrap the func with a module, change the func to another op with regions
i.e. scf.if, or change the expected error message.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-02-05 22:59:03 -08:00
Matthias Springer
b840d29683 [mlir][IR] Send notifications for cloneRegionBefore (#66871)
Similar to `OpBuilder::clone`, operation/block insertion notifications
should be sent when cloning the contents of a region. E.g., this is to
ensure that the newly created operations are put on the worklist of the
greedy pattern rewriter driver.

Also move `cloneRegionBefore` from `RewriterBase` to `OpBuilder`. It
only creates new IR, so it should be part of the builder API (like
`clone(Operation &)`). The function does not have to be virtual. Now
that notifications are properly sent, the override in the dialect
conversion is no longer needed.
2024-02-02 10:06:10 +01:00
Matthias Springer
237a799e93 [mlir][IR] Notify about block insertion when cloning an op (#80262)
`OpBuilder::clone(Operation &)` should trigger not only
`notifyOperationInserted` but also `notifyBlockInserted` (for all block
contained in `op`).
2024-02-02 09:48:32 +01:00
Matthias Springer
c2675ba91a [mlir][IR] Send missing notification when splitting a block (#79597)
When a block is split with `RewriterBase::splitBlock`, a
`notifyBlockInserted` notification, followed by
`notifyOperationInserted` notifications (for moving over the operations
into the new block) should be sent. This commit adds those
notifications.
2024-01-31 14:56:26 +01:00
Matthias Springer
c672b342c3 [mlir][IR] Send missing notifications when inlining a block (#79593)
When a block is inlined into another block, the nested operations are
moved into another block and the `notifyOperationInserted` callback
should be triggered. This commit adds the missing notifications for:
* `RewriterBase::inlineBlockBefore`
* `RewriterBase::mergeBlocks`
2024-01-31 14:40:38 +01:00
Matthias Springer
da784a2555 [mlir][IR] Add RewriterBase::moveBlockBefore and fix bug in moveOpBefore (#79579)
This commit adds a new method to the rewriter API: `moveBlockBefore`.
This op is utilized by `inlineRegionBefore` and covered by dialect
conversion test cases.

Also fixes a bug in `moveOpBefore`, where the previous op location was
not passed correctly. Adds a test case to
`test-strict-pattern-driver.mlir`.
2024-01-31 11:25:11 +01:00
Dmitriy Smirnov
65fd05517f [MLIR] Added check for IsTerminator trait (#79317)
This PR adds a check for IsTerminator trait to prevent deletion of ops
like gpu.terminator as a "simple op" by RemoveDeadValues pass.
2024-01-26 14:14:54 +01:00
Okwan Kwon
7cc9ae9551 [mlir] allow inlining complex ops (#77514)
Complex ops are pure ops just like the arithmetic ops so they can be
inlined.
2024-01-10 09:23:36 -08:00
Billy Zhu
eb42868f25 [MLIR] Handle materializeConstant failure in GreedyPatternRewriteDriver (#77258)
Make GreedyPatternRewriteDriver handle failures of `materializeConstant`
gracefully. Previously it was not checking whether the returned op was
null and crashing. This PR handles it similarly to how OperationFolder
does it.
2024-01-08 10:29:32 -08:00
Billy Zhu
34a65980d7 [MLIR] Erase location of folded constants (#75415)
Follow up to the discussion from #75258, and serves as an alternate
solution for #74670.

Set the location to Unknown for deduplicated / moved / materialized
constants by OperationFolder. This makes sure that the folded constants
don't end up with an arbitrary location of one of the original ops that
became it, and that hoisted ops don't confuse the stepping order.
2023-12-21 09:54:48 -08:00
Matthias Springer
f10302e3fa [mlir] Require folders to produce Values of same type (#75887)
This commit adds extra assertions to `OperationFolder` and `OpBuilder`
to ensure that the types of the folded SSA values match with the result
types of the op. There used to be checks that discard the folded results
if the types do not match. This commit makes these checks stricter and
turns them into assertions.

Discarding folded results with the wrong type (without failing
explicitly) can hide bugs in op folders. Two such bugs became apparent
in MLIR (and some more in downstream projects) and are fixed with this
change.

Note: The existing type checks were introduced in
https://reviews.llvm.org/D95991.

Migration guide: If you see failing assertions (`folder produced value
of incorrect type`; make sure to run with assertions enabled!), run with
`-debug` or dump the operation right before the failing assertion. This
will point you to the op that has the broken folder. A common mistake is
a mismatch between static/dynamic dimensions (e.g., input has a static
dimension but folded result has a dynamic dimension).
2023-12-20 14:39:22 +09:00
Matthias Springer
10056c821a [mlir][SCF] scf.parallel: Make reductions part of the terminator (#75314)
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
2023-12-20 11:06:27 +09:00
Fangrui Song
2a9d8caf29 Revert "[MLIR] Fuse locations of merged constants (#74670)"
This reverts commit 87e2e89019.
and its follow-ups 0d1490f09f (#75218)
and 6fe3cd5467 (#75312).

We observed significant OOM/timeout issues due to #74670 to quite a few
services including google-research/swirl-lm. The follow-up #75218 and
 #75312 do not address the issue. Perhaps this is worth more
investigation.
2023-12-13 13:49:03 -08:00
Benjamin Chetioui
0d1490f09f [MLIR] Flatten fused locations when merging constants. (#75218)
[PR 74670](https://github.com/llvm/llvm-project/pull/74670) added
support for merging locations at constant folding time. We have
discovered that in some cases, the number of locations grows so big as
to cause a compilation process to OOM. In that case, many of the
locations end up appearing several times in nested fused locations.

We add here a helper that always flattens fused locations in order to
eliminate duplicates in the case of nested fused locations.
2023-12-12 22:00:23 +01:00
Billy Zhu
87e2e89019 [MLIR] Fuse locations of merged constants (#74670)
When merging constants by the operation folder, the location of the op
that remains should be updated to track the new meaning of this op. This
way we do not lose track of all possible source locations that the
constant op came from, and the final location of the op is less reliant
on the order of folding. This will also help debuggers understand how to
step these instructions.

This PR introduces a helper for operation folder to fuse another
location into the location of an op. When an op is deduplicated, fuse
the location of the op to be removed into the op that is retained. The
retained op now represents both original ops.

The FusedLoc will have a string metadata to help understand the reason
for the location fusion (motivated by the
[example](71be8f3c23/mlir/include/mlir/IR/BuiltinLocationAttributes.td (L130))
in the docstring of FusedLoc).
2023-12-11 19:31:54 -08:00
Rik Huijzer
c9c1b3c37f [mlir][memref] Fix an invalid dim loop motion crash (#74204)
Fixes https://github.com/llvm/llvm-project/issues/73382.

This PR suggests to replace two assertions that were introduced in
adabce4118
(https://reviews.llvm.org/D135748). According to the enum definition of
`NotSpeculatable`, an op that invokes undefined behavior is
`NotSpeculatable`.

0c06e8745f/mlir/include/mlir/Interfaces/SideEffectInterfaces.h (L248-L258)

and both `tensor.dim` and `memref.dim` state that "If the dimension
index is out of bounds, the behavior is undefined."

So therefore it seems to me that `DimOp::getSpeculatability()` should
return `NotSpeculatable` if the dimension index is out of bounds.

The added test is just a simplified version of
https://github.com/llvm/llvm-project/issues/73382.
2023-12-04 08:57:59 +01:00
Christian Ulmann
610c761714 [MLIR][FuncToLLVM] Remove typed pointers from call conversion test pass (#71107)
This commit removes typed pointers from the Func to LLVM test pass.
Typed pointers have been deprecated for a while now and it's planned to
soon remove them from the LLVM dialect.

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-11-03 08:08:12 +01:00
Matthias Springer
1df6504ac2 [mlir][vector] LISH: Implement SubsetOpInterface for transfer_read/write (#70629)
- Implement `SubsetOpInterface`, `SubsetExtractionOpInterface`,
`SubsetInsertionOpInterface` for `vector.transfer_read` and
`vector.transfer_write`.
- Move all tensor subset hoisting test cases from `Linalg` to
`loop-invariant-subset-hoisting.mlir`. (Removing 1 duplicate test case.)
2023-11-01 12:19:30 +09:00
Matthias Springer
ff614a5729 [mlir][Interfaces] LISH: Add helpers for hyperrectangular subsets (#70628)
The majority of subset ops operate on hyperrectangular subsets. This
commit adds a new optional interface method
(`getAccessedHyperrectangularSlice`) that can be implemented by such
subset ops. If implemented, the other `operatesOn...` interface methods
of the `SubsetOpInterface` do not have to be implemented anymore.

The comparison logic for hyperrectangular subsets (is
disjoint/equivalent) is implemented with `ValueBoundsOpInterface`. This
makes the subset hoisting more powerful: simple cases where two
different SSA values always have the same runtime value can now be
supported.
2023-11-01 11:29:00 +09:00
Matthias Springer
7ea1c395cc [mlir][Transforms] LISH: Improve bypass analysis for loop-like ops (#70623)
Improve the bypass analysis for loop-like ops. Until now, loop-like ops
were treated like any other non-subset ops: they prevent hoisting of any
sort because the analysis does not know which parts of a tensor init
operand are accessed by the loop-like op. With this change, the analysis
can look into loop-like ops and analyze which subset they are operating
on.
2023-11-01 11:14:10 +09:00
Matthias Springer
2164a449dc [mlir][Transforms] Add loop-invariant subset hoisting (LISH) transformation (#70619)
Add a loop-invariant subset hoisting pass to `mlir/Interfaces`. This
pass hoist loop-invariant tensor subsets (subset extraction and subset
insertion ops) from loop-like ops. Extraction ops are moved before the
loop. Insertion ops are moved after the loop. The loop body operates on
newly added region iter_args (one per extraction-insertion pair).

This new pass will be improved in subsequent commits (to support more
cases/ops) and will eventually replace
`Linalg/Transforms/SubsetHoisting.cpp`. In contrast to the existing
Linalg subset hoisting, the new pass is op interface-based
(`SubsetOpInterface` and `LoopLikeOpInterface`).
2023-11-01 10:57:17 +09:00
Uday Bondhugula
da0ce32cc3 [MLIR] NFC. Move remaining affine test cases to its dialect dir (#67921)
NFC. Move remaining affine test cases to its dialect dir.
2023-10-06 08:32:49 +05:30
Matthias Springer
63086d6aa0 [mlir][Interfaces] LoopLikeOpInterface: Add replaceWithAdditionalYields (#67121)
`affine::replaceForOpWithNewYields` and `replaceLoopWithNewYields` (for
"scf.for") are now interface methods and additional loop-carried
variables can now be added to "scf.for"/"affine.for" uniformly. (No more
`TypeSwitch` needed.)

Note: `scf.while` and other loops with loop-carried variables can
implement `replaceWithAdditionalYields`, but to keep this commit small,
that is not done in this commit.
2023-09-27 07:53:39 +02:00
Tobias Gysi
85175edd4e [mlir][llvm] Replace NullOp by ZeroOp (#67183)
This revision replaces the LLVM dialect NullOp by the recently
introduced ZeroOp. The ZeroOp is more generic in the sense that it
represents zero values of any LLVM type rather than null pointers only.

This is a follow to https://github.com/llvm/llvm-project/pull/65508
2023-09-25 11:11:52 +02:00
Matthias Springer
695a5a6a66 [mlir][IR] Trigger notifyOperationRemoved callback for nested ops (#66771)
When cloning an op, the `notifyOperationInserted` callback is triggered
for all nested ops. Similarly, the `notifyOperationRemoved` callback
should be triggered for all nested ops when removing an op.

Listeners may inspect the IR during a `notifyOperationRemoved` callback.
Therefore, when multiple ops are removed in a single
`RewriterBase::eraseOp` call, the notifications must be triggered in an
order in which the ops could have been removed one-by-one:

* Op removals must be interleaved with `notifyOperationRemoved`
callbacks. A callback is triggered right before the respective op is
removed.
* Ops are removed post-order and in reverse order. Other traversal
orders could delete an op that still has uses. (This is not avoidable in
graph regions and with cyclic block graphs.)

Differential Revision: Imported from https://reviews.llvm.org/D144193.
2023-09-20 08:45:46 +02:00
Matthias Springer
9b5ef2bea8 [mlir][Interfaces] LoopLikeOpInterface: Support ops with multiple regions (#66754)
This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.

`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.

Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false"
for some values that are defined outside of the loop (in a nested op, in
such a way that the value does not dominate the loop). This interface is
currently only used for LICM and there is no way to trigger this bug, so
no test is added.
2023-09-19 17:35:38 +02:00
Matthias Springer
4fdc019a89 [mlir][SCF] Add SingleBlock op trait to "scf.while"
This trait is needed so that unstructured control flow is not inlined into "scf.while" ops.

Note: The two regions of "scf.while" are already defined as `SizedRegion<1>`. `SingleBlock` can be queried from C++, `SizedRegion<n>` not.

Fixes #64976.

Differential Revision: https://reviews.llvm.org/D159199
2023-08-31 08:56:31 +02:00
Matthias Springer
8dd8c4adba [mlir][Transforms] Inliner: Extra checks for unstructured control flow
Do not inline IR with multiple blocks into ops that may not support unstructured control flow.

This fixes #64978.

Differential Revision: https://reviews.llvm.org/D159072
2023-08-30 15:28:29 +02:00
Srishti Srivastava
b6bab6db9b [MLIR][transforms] Fix cloneInto() error in RemoveDeadValues pass
This commit fixes an error in the `RemoveDeadValues` pass that is
associated with its incorrect usage of the `cloneInto()` function.

The `setOperands()` function that is used by the `cloneInto()` function
requires all operands to not be null. But, that is not possible in this
pass because we drop uses of dead values, thus making them null. It is
only at the end of the pass that we are assured that such null values
won't exist but during the execution of the pass, there could be null
values.

To fix this, we replace the usage of the `cloneInto()` function to copy
a region with `moveBlock()` to move each block of the region one by one.
This function does not require the presence of non-null values and is
thus the right choice here. This implementation is also more opttimized
because we are moving things instead of copying them. The goal was
always moving.

Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com>

Reviewed By: srishti-pm

Differential Revision: https://reviews.llvm.org/D158941
2023-08-26 19:50:24 +00:00
Srishti Srivastava
0e98fb9fad [MLIR][transforms] Add an optimization pass to remove dead values
Large deep learning models rely on heavy computations. However, not
every computation is necessary. And, even when a computation is
necessary, it helps if the values needed for the computation are
available in registers (which have low-latency) rather than being in
memory (which has high-latency).

Compilers can use liveness analysis to:-
(1) Remove extraneous computations from a program before it executes on
hardware, and,
(2) Optimize register allocation.

Both these tasks help achieve one very important goal: reducing runtime.

Recently, liveness analysis was added to MLIR. Thus, this commit uses
the recently added liveness analysis utility to try to accomplish task
(1).

It adds a pass called `remove-dead-values` whose goal is
optimization (reducing runtime) by removing unnecessary instructions.
Unlike other passes that rely on local information gathered from
patterns to accomplish optimization, this pass uses a full analysis of
the IR, specifically, liveness analysis, and is thus more powerful.

Currently, this pass performs the following optimizations:
(A) Removes function arguments that are not live,
(B) Removes function return values that are not live across all callers of
the function,
(C) Removes unneccesary operands, results, region arguments, region
terminator operands of region branch ops, and,
(D) Removes simple and region branch ops that have all non-live results and
don't affect memory in any way,

iff

the IR doesn't have any non-function symbol ops, non-call symbol user ops
and branch ops.

Here, a "simple op" refers to an op that isn't a symbol op, symbol-user op,
region branch op, branch op, region branch terminator op, or return-like.

It is noteworthy that we do not refer to non-live values as "dead" in this
file to avoid confusing it with dead code analysis's "dead", which refers to
unreachable code (code that never executes on hardware) while "non-live"
refers to code that executes on hardware but is unnecessary. Thus, while the
removal of dead code helps little in reducing runtime, removing non-live
values should theoretically have significant impact (depending on the amount
removed).

It is also important to note that unlike other passes (like `canonicalize`)
that apply op-specific optimizations through patterns, this pass uses
different interfaces to handle various types of ops and tries to cover all
existing ops through these interfaces.

It is because of its reliance on (a) liveness analysis and (b) interfaces
that makes it so powerful that it can optimize ops that don't have a
canonicalizer and even when an op does have a canonicalizer, it can perform
more aggressive optimizations, as observed in the test files associated with
this pass.

Example of optimization (A):-

```
int add_2_to_y(int x, int y) {
  return 2 + y
}

print(add_2_to_y(3, 4))
print(add_2_to_y(5, 6))
```

becomes

```
int add_2_to_y(int y) {
  return 2 + y
}

print(add_2_to_y(4))
print(add_2_to_y(6))
```

Example of optimization (B):-

```
int, int get_incremented_values(int y) {
  store y somewhere in memory
  return y + 1, y + 2
}

y1, y2 = get_incremented_values(4)
y3, y4 = get_incremented_values(6)
print(y2)
```

becomes

```
int get_incremented_values(int y) {
  store y somewhere in memory
  return y + 2
}

y2 = get_incremented_values(4)
y4 = get_incremented_values(6)
print(y2)
```

Example of optimization (C):-

Assume only `%result1` is live here. Then,

```
%result1, %result2, %result3 = scf.while (%arg1 = %operand1, %arg2 = %operand2) {
  %terminator_operand2 = add %arg2, %arg2
  %terminator_operand3 = mul %arg2, %arg2
  %terminator_operand4 = add %arg1, %arg1
  scf.condition(%terminator_operand1) %terminator_operand2, %terminator_operand3, %terminator_operand4
} do {
^bb0(%arg3, %arg4, %arg5):
  %terminator_operand6 = add %arg4, %arg4
  %terminator_operand5 = add %arg5, %arg5
  scf.yield %terminator_operand5, %terminator_operand6
}
```

becomes

```
%result1, %result2 = scf.while (%arg2 = %operand2) {
  %terminator_operand2 = add %arg2, %arg2
  %terminator_operand3 = mul %arg2, %arg2
  scf.condition(%terminator_operand1) %terminator_operand2, %terminator_operand3
} do {
^bb0(%arg3, %arg4):
  %terminator_operand6 = add %arg4, %arg4
  scf.yield %terminator_operand6
}
```

It is interesting to see that `%result2` won't be removed even though it is
not live because `%terminator_operand3` forwards to it and cannot be
removed. And, that is because it also forwards to `%arg4`, which is live.

Example of optimization (D):-

```
int square_and_double_of_y(int y) {
  square = y ^ 2
  double = y * 2
  return square, double
}

sq, do = square_and_double_of_y(5)
print(do)
```

becomes

```
int square_and_double_of_y(int y) {
  double = y * 2
  return double
}

do = square_and_double_of_y(5)
print(do)
```

Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com>

Reviewed By: matthiaskramm, Mogball, jcai19

Differential Revision: https://reviews.llvm.org/D157049
2023-08-23 23:54:44 +00:00
Tom Eccles
dea33c80d3 [mlir][Transforms] teach CSE about recursive memory effects
Add support for reasoning about operations with recursive memory effects
to CSE. The recursive effects are gathered by a helper function. I
decided to allow returning duplicates from the helper function because
there's no benefit to spending the computation time to remove them in
the existing use case.

Differential Revision: https://reviews.llvm.org/D156805
2023-08-10 09:40:01 +00:00
Mehdi Amini
363b655920 Finish renaming getOperandSegmentSizeAttr() from operand_segment_sizes to operandSegmentSizes
This renaming started with the native ODS support for properties, this is completing it.

A mass automated textual rename seems safe for most codebases.
Drop also the ods prefix to keep the accessors the same as they were before
this change:
 properties.odsOperandSegmentSizes
reverts back to:
 properties.operandSegementSizes

The ODS prefix was creating divergence between all the places and make it harder to
be consistent.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D157173
2023-08-09 19:37:01 -07:00
Uday Bondhugula
b36de52c98 NFC. Move remaining affine/memref test cases into respective dialect dirs
Move a bunch of lingering test cases from test/Transforms/ into
test/Dialect/Affine and MemRef.

Differential Revision: https://reviews.llvm.org/D155855
2023-07-21 22:36:01 +05:30
tomnatan
2109587cee [MLIR] Don't sort operand of commutative ops when comparing two ops as there is a correctness issue
This feature was introduced in `D123492`.

Doing equivalence on pointers to sort operands of commutative operations is incorrect when checking equivalence of ops in separate regions (where the lhs and rhs operands are marked as equivalent but are not the same value).

It was also discussed in `D123492` and `D129480` that the correct solution would be to stable sort the operands in canonicalization (based on some numbering in the region maybe), but until that lands, reverting this change will unblock us and other users.

An example of a pass that might not work properly because of this is `DuplicateFunctionEliminationPass`.

Reviewed By: mehdi_amini, jpienaar

Differential Revision: https://reviews.llvm.org/D154699
2023-07-14 16:11:54 -07:00