clang-p2996

Author	SHA1	Message	Date
MaheshRavishankar	5aeb604c7c	[mlir][SCF] Modernize `coalesceLoops` method to handle `scf.for` loops with iter_args (#87019 ) As part of this extension this change also does some general cleanup 1) Make all the methods take `RewriterBase` as arguments instead of creating their own builders that tend to crash when used within pattern rewrites 2) Split `coalescePerfectlyNestedLoops` into two separate methods, one for `scf.for` and the other for `affine.for`. The templatization didn't seem to be buying much there. Also general clean up of tests.	2024-04-04 13:44:24 -07:00
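As a rough illustration of what coalescing a perfectly nested loop with a loop-carried value means, the following sketch collapses two loops into one and recovers the original induction variables from the single index. It is a plain-Python model, not MLIR code: the function names are hypothetical, `acc` stands in for an iter_arg, and the table is assumed to be non-empty and rectangular.

```python
def nested_accumulate(table):
    """Reference: a two-level perfectly nested loop carrying one value."""
    acc = 0  # plays the role of an iter_arg
    for i in range(len(table)):
        for j in range(len(table[0])):
            acc = acc * 10 + table[i][j]  # order-sensitive on purpose
    return acc


def coalesced_accumulate(table):
    """Same computation with both loops collapsed into a single loop."""
    rows, cols = len(table), len(table[0])
    acc = 0
    for k in range(rows * cols):
        i, j = divmod(k, cols)  # recover the original induction variables
        acc = acc * 10 + table[i][j]
    return acc
```

Because the accumulation is order-sensitive, equal results also show that the coalesced loop visits the elements in the same order as the nest.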
Ivan Butygin	5b66b6a32a	[mlir][pass] Add composite pass utility (#87166 ) Composite pass allows to run sequence of passes in the loop until fixed point or maximum number of iterations is reached. The usual candidates are canonicalize+CSE as canonicalize can open more opportunities for CSE and vice-versa.	2024-04-02 13:30:45 +03:00
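The fixed-point loop described in this commit can be sketched as follows. This is a toy model under stated assumptions, not the MLIR utility: `run_composite`, `halve_evens` and `dedupe` are hypothetical names, the "IR" is just a tuple of integers, and a "pass" is any function from IR to IR.

```python
def run_composite(passes, ir, max_iterations=10):
    """Apply `passes` in order, repeating until the IR stops changing
    or the iteration cap is hit. Returns (ir, reached_fixed_point)."""
    for _ in range(max_iterations):
        before = ir
        for run_pass in passes:
            ir = run_pass(ir)
        if ir == before:
            return ir, True
    return ir, False


def halve_evens(ir):
    """Toy 'canonicalize': one simplification step per run."""
    return tuple(n // 2 if n > 1 and n % 2 == 0 else n for n in ir)


def dedupe(ir):
    """Toy 'CSE': drop duplicate values, keeping first occurrences."""
    return tuple(dict.fromkeys(ir))
```

Each toy pass can expose new work for the other (halving creates duplicates, deduplication shrinks the input), which is the situation the commit describes for canonicalize and CSE.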
long.chen	631e54aa1a	[mlir][arith] fix wrong floordivsi fold (#83248 ) Fixes https://github.com/llvm/llvm-project/issues/83079	2024-03-22 23:52:47 +08:00
Matthias Gehre	e6048b728d	[MLIR][Bufferization] BufferResultsToOutParams: Add option to add attribute to output arguments (#84320 ) Adds a new pass option `add-result-attr` that will make the pass add the attribute `{bufferize.result}` to each argument that was converted from a result. This is important e.g. when later using the python bindings / execution engine to understand which arguments are actually results. To be able to test this, the pass option was added to the tablegen. To avoid collisions with the existing, manually defined option struct `BufferResultsToOutParamsOptions`, that one was renamed to `BufferResultsToOutParamsOpts`.	2024-03-14 07:50:16 +01:00
Christian Sigg	bb893fa23f	[mlir] Fix inlining-threshold.mlir test for NDEBUG builds.	2024-03-13 17:26:50 +01:00
Slava Zakharin	732f5368cd	[RFC][mlir] Add profitability callback to the Inliner. (#84258 ) Discussion at https://discourse.llvm.org/t/inliner-cost-model/2992 This change adds a callback that reports whether inlining of the particular call site (communicated via ResolvedCall argument) is profitable or not. The default MLIR inliner pass behavior is unchanged, i.e. the callback always returns true. This callback may be used to customize the inliner behavior based on the target specifics (like target instructions costs), profitability of the inlining for further optimizations (e.g. if inlining may enable loop optimizations or scalar optimizations due to object shape propagation), optimization levels (e.g. -Os inlining may be quite different from -Ofast inlining), etc. One of the questions is whether the ResolvedCall entity represents enough of the context for the custom inlining models to come up with the profitability decision. I think we can start with this and extend it as necessary. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-03-13 08:23:10 -07:00
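A minimal sketch of the callback idea, assuming a toy representation in which a call site is a dictionary. The names `inline_calls`, `is_profitable` and `callee_size` are hypothetical and are not the MLIR inliner API; the only property taken from the commit message is that the default callback always returns true.

```python
def inline_calls(call_sites, is_profitable=lambda call: True):
    """Return the call sites that would be inlined.

    `is_profitable` is consulted once per call site; the default
    accepts every call site, so the default behavior is unchanged.
    """
    return [call for call in call_sites if is_profitable(call)]


def small_callees_only(call):
    """Example policy (an assumption): inline only small callees."""
    return call["callee_size"] <= 5
```

A target-specific or optimization-level-specific policy would replace `small_callees_only` with its own cost model.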
Congcong Cai	ad23127222	[mlir][inline] avoid inline self-recursive function (#83092 )	2024-03-12 06:49:09 +08:00
Matthias Springer	60a20bd697	[mlir][Transforms] Add listener support to dialect conversion (#83425 ) This commit adds listener support to the dialect conversion. Similarly to the greedy pattern rewrite driver, an optional listener can be specified in the configuration object. Listeners are notified only if the dialect conversion succeeds. In case of a failure, where some IR changes are first performed and then rolled back, no notifications are sent. Due to the fact that some kinds of rewrite are reflected in the IR immediately and some in a delayed fashion, there are certain limitations when attaching a listener; these are documented in `ConversionConfig`. To summarize, users are always notified about all rewrites that happened, but the notifications are sent all at once at the very end, and not interleaved with the actual IR changes. This change is in preparation for improvements to `transform.apply_conversion_patterns`, which currently invalidates all handles. In the future, it can use a listener to update handles accordingly, similar to `transform.apply_patterns`.	2024-03-08 10:34:45 +09:00
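The notification behavior summarized above (nothing on failure, everything at once at the end on success) can be modeled roughly as below. This is a sketch only: `BufferedConversion`, `record` and `finish` are hypothetical names and do not correspond to the dialect conversion implementation.

```python
class BufferedConversion:
    """Toy model: rewrites are recorded as they happen, but the listener
    only hears about them if the whole conversion succeeds."""

    def __init__(self, listener=None):
        self.listener = listener  # optional callable taking one event
        self.pending = []

    def record(self, event):
        """Remember a rewrite without notifying anyone yet."""
        self.pending.append(event)

    def finish(self, succeeded):
        """On success, replay all events in order; on failure, drop them."""
        events, self.pending = self.pending, []
        if succeeded and self.listener is not None:
            for event in events:
                self.listener(event)
        return succeeded
```

With `log = []` and `BufferedConversion(log.append)`, the log stays empty until `finish(True)` is called, which mirrors the "not interleaved with the actual IR changes" limitation.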
Congcong Cai	46f65e45e0	[mlir]use correct iterator when eraseOp (#83444 ) #66771 introduced `llvm::post_order(&r.front())`, which is equal to `r.front().getSuccessor(...)`. It visits the successor blocks of the current block. But what is actually needed here is to visit all blocks of the region in reverse order. Fixes: #77420.	2024-03-05 15:33:49 +08:00
Matthias Springer	9606655fbb	[mlir][Transforms] Fix use-after-free when accessing replaced block args (#83646 ) This commit fixes a bug in a dialect conversion. Currently, when a block is replaced via a signature conversion, the block is erased during the "commit" phase. This is problematic because the block arguments may still be referenced by internal data structures of the dialect conversion (`mapping`). Blocks should be treated the same as ops: they should be erased during the "cleanup" phase. Note: The test case fails without this fix when running with ASAN, but may pass when running without ASAN.	2024-03-04 11:09:39 +09:00
mlevesquedion	d4fd20258f	[mlir] Use arith max or min ops instead of cmp + select (#82178 ) I believe the semantics should be the same, but this saves 1 op and simplifies the code. For example, the following two instructions: ``` %2 = cmp sgt %0, %1 %3 = select %2, %0, %1 ``` Are equivalent to: ``` %2 = maxsi %0 %1 ```	2024-02-21 12:28:05 -08:00
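The claimed equivalence between compare-plus-select and a signed max can be checked on plain integers. This is a Python model of the two forms, not MLIR; the function names are hypothetical.

```python
def cmp_select_max(a, b):
    """Two-step form: compare (signed greater-than), then select."""
    is_greater = a > b              # %2 = cmp sgt %0, %1
    return a if is_greater else b   # %3 = select %2, %0, %1


def maxsi(a, b):
    """Single-op form: signed integer maximum."""
    return max(a, b)
```

Checking all pairs in a small signed range, including equal operands and mixed signs, gives the same result for both forms.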
Matthias Springer	3a70335bae	[mlir][Transforms] Support rolling back properties in dialect conversion (#82474 ) The dialect conversion rolls back in-place op modifications upon failure. Rolling back modifications of attributes is already supported, but there was no support for properties until now.	2024-02-21 16:41:45 +01:00
Matthias Springer	914e607487	[mlir][IR][NFC] Rename `notifyRemoved` to `notifyErased` (#82253 ) Rename listener callback names: * `notifyOperationRemoved` -> `notifyOperationErased` * `notifyBlockRemoved` -> `notifyBlockErased` The current callback names are misnomers. The callbacks are triggered when an operation/block is erased, not when it is removed (unlinked). E.g.: ```c++ /// Notify the listener that the specified operation is about to be erased. /// At this point, the operation has zero uses. /// /// Note: This notification is not triggered when unlinking an operation. virtual void notifyOperationErased(Operation *op) {} ``` This change is in preparation of adding listener support to the dialect conversion. The dialect conversion internally unlinks IR before erasing it at a later point of time. There is an important difference between "remove" and "erase". Listener callback names should be accurate to avoid confusion.	2024-02-20 09:08:19 +01:00
Artem Tyurin	3bef17eac6	[mlir] Handle cycles and back edges in --view-op-graph (#82002 ) Fixes #62128.	2024-02-17 13:37:49 -08:00
Matthias Springer	8f4cd2c7e3	[mlir][Transforms] Support `moveOpBefore`/`After` in dialect conversion (#81240 ) Add a new rewrite class for "operation movements". This rewrite class can roll back `moveOpBefore` and `moveOpAfter`. `RewriterBase::moveOpBefore` and `RewriterBase::moveOpAfter` are no longer virtual. (The dialect conversion can gather all required information for rollbacks from listener notifications.)	2024-02-14 17:39:59 +01:00
lonely eagle	2ecf608829	[mlir]Fix compose subview (#80551 ) I found a bug in `test-compose-subview`,You can see the example I gave. ``` #map = affine_map<() -> ()> module { func.func private @fun(%arg0: memref<10x10xf32>, %arg1: memref<5x5xf32>) -> memref<5x5xf32> { %c0 = arith.constant 0 : index %c5 = arith.constant 5 : index %c1 = arith.constant 1 : index %subview = memref.subview %arg0[0, 0] [5, 5] [1, 1] : memref<10x10xf32> to memref<5x5xf32, strided<[10, 1]>> %alloc = memref.alloc() : memref<5x5xf32> scf.for %arg2 = %c0 to %c5 step %c1 { scf.for %arg3 = %c0 to %c5 step %c1 { %subview_0 = memref.subview %subview[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32, strided<[10, 1]>> to memref<f32, strided<[], offset: ?>> %subview_1 = memref.subview %arg1[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>> %alloc_2 = memref.alloc() : memref<f32> linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []} ins(%subview_0, %subview_1 : memref<f32, strided<[], offset: ?>>, memref<f32, strided<[], offset: ?>>) outs(%alloc_2 : memref<f32>) { ^bb0(%in: f32, %in_4: f32, %out: f32): %0 = arith.addf %in, %in_4 : f32 linalg.yield %0 : f32 } %subview_3 = memref.subview %alloc[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>> memref.copy %alloc_2, %subview_3 : memref<f32> to memref<f32, strided<[], offset: ?>> } } return %alloc : memref<5x5xf32> } func.func @test(%arg0: memref<10x10xf32>, %arg1: memref<5x5xf32>) -> memref<5x5xf32> { %0 = call @fun(%arg0, %arg1) : (memref<10x10xf32>, memref<5x5xf32>) -> memref<5x5xf32> return %0 : memref<5x5xf32> } } ``` When I run `mlir-opt test.mlir ---test-compose-subview`. 
``` test.mlir:14:9: error: 'linalg.generic' op expected operand rank (2) to match the result rank of indexing_map #0 (0) linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []} ins(%subview_0, %subview_1 : memref<f32, strided<[], offset: ?>>, memref<f32, strided<[], offset: ?>>) outs(%alloc_2 : memref<f32>) { ^ test1.mlir:14:9: note: see current operation: "linalg.generic"(%4, %5, %6) <{indexing_maps = [affine_map<() -> ()>, affine_map<() -> ()>, affine_map<() -> ()>], iterator_types = [], operandSegmentSizes = array<i32: 2, 1>}> ({ ^bb0(%arg4: f32, %arg5: f32, %arg6: f32): %8 = "arith.addf"(%arg4, %arg5) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 "linalg.yield"(%8) : (f32) -> () }) : (memref<1x1xf32, strided<[10, 1], offset: ?>>, memref<f32, strided<[], offset: ?>>, memref<f32>) -> () ``` This PR fixes that.In the meantime I've extended this PR to handle cases where stride is greater than 1. ``` func.func private @Unknown0(%arg0: memref<10x10xf32>, %arg1: memref<5x5xf32>) -> memref<5x5xf32> { %c0 = arith.constant 0 : index %c5 = arith.constant 5 : index %c1 = arith.constant 1 : index %subview = memref.subview %arg0[0, 0] [5, 5] [2, 2] : memref<10x10xf32> to memref<5x5xf32, strided<[20, 2]>> %alloc = memref.alloc() : memref<5x5xf32> scf.for %arg2 = %c0 to %c5 step %c1 { scf.for %arg3 = %c0 to %c5 step %c1 { %subview_0 = memref.subview %subview[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32, strided<[20, 2]>> to memref<f32, strided<[], offset: ?>> %subview_1 = memref.subview %arg1[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>> %alloc_2 = memref.alloc() : memref<f32> linalg.generic {indexing_maps = [affine_map<() -> ()>, affine_map<() -> ()>, affine_map<() -> ()>], iterator_types = []} ins(%subview_0, %subview_1 : memref<f32, strided<[], offset: ?>>, memref<f32, strided<[], offset: ?>>) outs(%alloc_2 : memref<f32>) { ^bb0(%in: f32, %in_4: f32, %out: f32): %0 = arith.addf %in, %in_4 : f32 linalg.yield 
%0 : f32 } %subview_3 = memref.subview %alloc[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>> memref.copy %alloc_2, %subview_3 : memref<f32> to memref<f32, strided<[], offset: ?>> } } return %alloc : memref<5x5xf32> } $ mlir-opt test.mlir -test-compose-subview #map = affine_map<()[s0] -> (s0 * 2)> #map1 = affine_map<() -> ()> module { func.func private @Unknown0(%arg0: memref<10x10xf32>, %arg1: memref<5x5xf32>) -> memref<5x5xf32> { %c0 = arith.constant 0 : index %c5 = arith.constant 5 : index %c1 = arith.constant 1 : index %alloc = memref.alloc() : memref<5x5xf32> scf.for %arg2 = %c0 to %c5 step %c1 { scf.for %arg3 = %c0 to %c5 step %c1 { %0 = affine.apply #map()[%arg2] %1 = affine.apply #map()[%arg3] %subview = memref.subview %arg0[%0, %1] [1, 1] [2, 2] : memref<10x10xf32> to memref<f32, strided<[], offset: ?>> %subview_0 = memref.subview %arg1[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>> %alloc_1 = memref.alloc() : memref<f32> linalg.generic {indexing_maps = [#map1, #map1, #map1], iterator_types = []} ins(%subview, %subview_0 : memref<f32, strided<[], offset: ?>>, memref<f32, strided<[], offset: ?>>) outs(%alloc_1 : memref<f32>) { ^bb0(%in: f32, %in_3: f32, %out: f32): %2 = arith.addf %in, %in_3 : f32 linalg.yield %2 : f32 } %subview_2 = memref.subview %alloc[%arg2, %arg3] [1, 1] [1, 1] : memref<5x5xf32> to memref<f32, strided<[], offset: ?>> memref.copy %alloc_1, %subview_2 : memref<f32> to memref<f32, strided<[], offset: ?>> } } return %alloc : memref<5x5xf32> } } ```	2024-02-07 20:49:27 +01:00
Joshua Cao	7d055af14b	[mlir][Symbol] Add verification that symbol's parent is a SymbolTable (#80590 ) Following the discussion in https://discourse.llvm.org/t/symboltable-and-symbol-parent-child-relationship/75446, we should enforce that a symbol's immediate parent is a symbol table. I changed some tests to pass the verification. In most cases, we can wrap the func with a module, change the func to another op with regions i.e. scf.if, or change the expected error message. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-02-05 22:59:03 -08:00
Matthias Springer	b840d29683	[mlir][IR] Send notifications for `cloneRegionBefore` (#66871 ) Similar to `OpBuilder::clone`, operation/block insertion notifications should be sent when cloning the contents of a region. E.g., this is to ensure that the newly created operations are put on the worklist of the greedy pattern rewriter driver. Also move `cloneRegionBefore` from `RewriterBase` to `OpBuilder`. It only creates new IR, so it should be part of the builder API (like `clone(Operation &)`). The function does not have to be virtual. Now that notifications are properly sent, the override in the dialect conversion is no longer needed.	2024-02-02 10:06:10 +01:00
Matthias Springer	237a799e93	[mlir][IR] Notify about block insertion when cloning an op (#80262 ) `OpBuilder::clone(Operation &)` should trigger not only `notifyOperationInserted` but also `notifyBlockInserted` (for all blocks contained in `op`).	2024-02-02 09:48:32 +01:00
Matthias Springer	c2675ba91a	[mlir][IR] Send missing notification when splitting a block (#79597 ) When a block is split with `RewriterBase::splitBlock`, a `notifyBlockInserted` notification, followed by `notifyOperationInserted` notifications (for moving over the operations into the new block) should be sent. This commit adds those notifications.	2024-01-31 14:56:26 +01:00
Matthias Springer	c672b342c3	[mlir][IR] Send missing notifications when inlining a block (#79593 ) When a block is inlined into another block, the nested operations are moved into another block and the `notifyOperationInserted` callback should be triggered. This commit adds the missing notifications for: * `RewriterBase::inlineBlockBefore` * `RewriterBase::mergeBlocks`	2024-01-31 14:40:38 +01:00
Matthias Springer	da784a2555	[mlir][IR] Add `RewriterBase::moveBlockBefore` and fix bug in `moveOpBefore` (#79579 ) This commit adds a new method to the rewriter API: `moveBlockBefore`. This op is utilized by `inlineRegionBefore` and covered by dialect conversion test cases. Also fixes a bug in `moveOpBefore`, where the previous op location was not passed correctly. Adds a test case to `test-strict-pattern-driver.mlir`.	2024-01-31 11:25:11 +01:00
Dmitriy Smirnov	65fd05517f	[MLIR] Added check for IsTerminator trait (#79317 ) This PR adds a check for IsTerminator trait to prevent deletion of ops like gpu.terminator as a "simple op" by RemoveDeadValues pass.	2024-01-26 14:14:54 +01:00
Okwan Kwon	7cc9ae9551	[mlir] allow inlining complex ops (#77514 ) Complex ops are pure ops just like the arithmetic ops so they can be inlined.	2024-01-10 09:23:36 -08:00
Billy Zhu	eb42868f25	[MLIR] Handle materializeConstant failure in GreedyPatternRewriteDriver (#77258 ) Make GreedyPatternRewriteDriver handle failures of `materializeConstant` gracefully. Previously it was not checking whether the returned op was null and crashing. This PR handles it similarly to how OperationFolder does it.	2024-01-08 10:29:32 -08:00
Billy Zhu	34a65980d7	[MLIR] Erase location of folded constants (#75415 ) Follow up to the discussion from #75258, and serves as an alternate solution for #74670. Set the location to Unknown for deduplicated / moved / materialized constants by OperationFolder. This makes sure that the folded constants don't end up with an arbitrary location of one of the original ops that became it, and that hoisted ops don't confuse the stepping order.	2023-12-21 09:54:48 -08:00
Matthias Springer	f10302e3fa	[mlir] Require folders to produce Values of same type (#75887 ) This commit adds extra assertions to `OperationFolder` and `OpBuilder` to ensure that the types of the folded SSA values match with the result types of the op. There used to be checks that discard the folded results if the types do not match. This commit makes these checks stricter and turns them into assertions. Discarding folded results with the wrong type (without failing explicitly) can hide bugs in op folders. Two such bugs became apparent in MLIR (and some more in downstream projects) and are fixed with this change. Note: The existing type checks were introduced in https://reviews.llvm.org/D95991. Migration guide: If you see failing assertions (`folder produced value of incorrect type`; make sure to run with assertions enabled!), run with `-debug` or dump the operation right before the failing assertion. This will point you to the op that has the broken folder. A common mistake is a mismatch between static/dynamic dimensions (e.g., input has a static dimension but folded result has a dynamic dimension).	2023-12-20 14:39:22 +09:00
Matthias Springer	10056c821a	[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314 ) This commit makes reductions part of the terminator. Instead of `scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops. `scf.reduce` may contain an arbitrary number of reductions, with one region per reduction. Example: ```mlir %init = arith.constant 0.0 : f32 %r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init) -> f32, f32 { %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32> %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32> scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) { ^bb0(%lhs : f32, %rhs: f32): %res = arith.addf %lhs, %rhs : f32 scf.reduce.return %res : f32 }, { ^bb0(%lhs : f32, %rhs: f32): %res = arith.mulf %lhs, %rhs : f32 scf.reduce.return %res : f32 } } ``` `scf.reduce` operations can no longer be interleaved with other ops in the body of `scf.parallel`. This simplifies the op and makes it possible to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was not possible before because the op was not a terminator, causing the op to be DCE'd.)	2023-12-20 11:06:27 +09:00
Fangrui Song	2a9d8caf29	Revert "[MLIR] Fuse locations of merged constants (#74670 )" This reverts commit `87e2e89019`. and its follow-ups `0d1490f09f` (#75218) and `6fe3cd5467` (#75312). We observed significant OOM/timeout issues due to #74670 to quite a few services including google-research/swirl-lm. The follow-up #75218 and #75312 do not address the issue. Perhaps this is worth more investigation.	2023-12-13 13:49:03 -08:00
Benjamin Chetioui	0d1490f09f	[MLIR] Flatten fused locations when merging constants. (#75218 ) [PR 74670](https://github.com/llvm/llvm-project/pull/74670) added support for merging locations at constant folding time. We have discovered that in some cases, the number of locations grows so big as to cause a compilation process to OOM. In that case, many of the locations end up appearing several times in nested fused locations. We add here a helper that always flattens fused locations in order to eliminate duplicates in the case of nested fused locations.	2023-12-12 22:00:23 +01:00
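A small model of the flattening step described in this commit, assuming a toy encoding where a leaf location is a string and a fused location is a tuple of locations. `flatten_fused` is a hypothetical helper name and does not reflect the actual MLIR implementation.

```python
def flatten_fused(loc):
    """Flatten nested fused locations into one level, dropping duplicates
    while keeping the order of first appearance."""
    seen = {}  # dict used as an insertion-ordered set

    def visit(node):
        if isinstance(node, tuple):  # fused location: recurse into children
            for child in node:
                visit(child)
        else:                        # leaf location
            seen.setdefault(node, None)

    visit(loc)
    return tuple(seen)
```

In this model a location that appears several times inside nested fused locations is kept only once, which is the growth problem the commit message describes.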
Billy Zhu	87e2e89019	[MLIR] Fuse locations of merged constants (#74670 ) When merging constants by the operation folder, the location of the op that remains should be updated to track the new meaning of this op. This way we do not lose track of all possible source locations that the constant op came from, and the final location of the op is less reliant on the order of folding. This will also help debuggers understand how to step these instructions. This PR introduces a helper for operation folder to fuse another location into the location of an op. When an op is deduplicated, fuse the location of the op to be removed into the op that is retained. The retained op now represents both original ops. The FusedLoc will have a string metadata to help understand the reason for the location fusion (motivated by the [example](`71be8f3c23/mlir/include/mlir/IR/BuiltinLocationAttributes.td (L130)`) in the docstring of FusedLoc).	2023-12-11 19:31:54 -08:00
Rik Huijzer	c9c1b3c37f	[mlir][memref] Fix an invalid dim loop motion crash (#74204 ) Fixes https://github.com/llvm/llvm-project/issues/73382. This PR suggests to replace two assertions that were introduced in `adabce4118` (https://reviews.llvm.org/D135748). According to the enum definition of `NotSpeculatable`, an op that invokes undefined behavior is `NotSpeculatable`. `0c06e8745f/mlir/include/mlir/Interfaces/SideEffectInterfaces.h (L248-L258)` and both `tensor.dim` and `memref.dim` state that "If the dimension index is out of bounds, the behavior is undefined." So therefore it seems to me that `DimOp::getSpeculatability()` should return `NotSpeculatable` if the dimension index is out of bounds. The added test is just a simplified version of https://github.com/llvm/llvm-project/issues/73382.	2023-12-04 08:57:59 +01:00
Christian Ulmann	610c761714	[MLIR][FuncToLLVM] Remove typed pointers from call conversion test pass (#71107 ) This commit removes typed pointers from the Func to LLVM test pass. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502	2023-11-03 08:08:12 +01:00
Matthias Springer	1df6504ac2	[mlir][vector] LISH: Implement `SubsetOpInterface` for transfer_read/write (#70629 ) - Implement `SubsetOpInterface`, `SubsetExtractionOpInterface`, `SubsetInsertionOpInterface` for `vector.transfer_read` and `vector.transfer_write`. - Move all tensor subset hoisting test cases from `Linalg` to `loop-invariant-subset-hoisting.mlir`. (Removing 1 duplicate test case.)	2023-11-01 12:19:30 +09:00
Matthias Springer	ff614a5729	[mlir][Interfaces] LISH: Add helpers for hyperrectangular subsets (#70628 ) The majority of subset ops operate on hyperrectangular subsets. This commit adds a new optional interface method (`getAccessedHyperrectangularSlice`) that can be implemented by such subset ops. If implemented, the other `operatesOn...` interface methods of the `SubsetOpInterface` do not have to be implemented anymore. The comparison logic for hyperrectangular subsets (is disjoint/equivalent) is implemented with `ValueBoundsOpInterface`. This makes the subset hoisting more powerful: simple cases where two different SSA values always have the same runtime value can now be supported.	2023-11-01 11:29:00 +09:00
Matthias Springer	7ea1c395cc	[mlir][Transforms] LISH: Improve bypass analysis for loop-like ops (#70623 ) Improve the bypass analysis for loop-like ops. Until now, loop-like ops were treated like any other non-subset ops: they prevent hoisting of any sort because the analysis does not know which parts of a tensor init operand are accessed by the loop-like op. With this change, the analysis can look into loop-like ops and analyze which subset they are operating on.	2023-11-01 11:14:10 +09:00
Matthias Springer	2164a449dc	[mlir][Transforms] Add loop-invariant subset hoisting (LISH) transformation (#70619 ) Add a loop-invariant subset hoisting pass to `mlir/Interfaces`. This pass hoists loop-invariant tensor subsets (subset extraction and subset insertion ops) from loop-like ops. Extraction ops are moved before the loop. Insertion ops are moved after the loop. The loop body operates on newly added region iter_args (one per extraction-insertion pair). This new pass will be improved in subsequent commits (to support more cases/ops) and will eventually replace `Linalg/Transforms/SubsetHoisting.cpp`. In contrast to the existing Linalg subset hoisting, the new pass is op interface-based (`SubsetOpInterface` and `LoopLikeOpInterface`).	2023-11-01 10:57:17 +09:00
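The effect of the hoisting can be pictured with list slices standing in for tensor subsets. This is a plain-Python analogy under stated assumptions (a fixed slice `[lo:hi]`, a body that only touches that slice); the function names are hypothetical and nothing here is MLIR API.

```python
def before_hoisting(tensor, lo, hi, steps):
    """Every iteration extracts the same subset and inserts it back."""
    tensor = list(tensor)
    for _ in range(steps):
        window = tensor[lo:hi]             # subset extraction inside the loop
        window = [v + 1 for v in window]   # loop body works on the subset
        tensor[lo:hi] = window             # subset insertion inside the loop
    return tensor


def after_hoisting(tensor, lo, hi, steps):
    """Extraction moved before the loop, insertion moved after it."""
    tensor = list(tensor)
    window = tensor[lo:hi]                 # hoisted extraction
    for _ in range(steps):
        window = [v + 1 for v in window]   # `window` acts like an iter_arg
    tensor[lo:hi] = window                 # hoisted insertion
    return tensor
```

Both versions compute the same result; the second one touches the full tensor only twice instead of twice per iteration.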
Uday Bondhugula	da0ce32cc3	[MLIR] NFC. Move remaining affine test cases to its dialect dir (#67921 ) NFC. Move remaining affine test cases to its dialect dir.	2023-10-06 08:32:49 +05:30
Matthias Springer	63086d6aa0	[mlir][Interfaces] `LoopLikeOpInterface`: Add `replaceWithAdditionalYields` (#67121 ) `affine::replaceForOpWithNewYields` and `replaceLoopWithNewYields` (for "scf.for") are now interface methods and additional loop-carried variables can now be added to "scf.for"/"affine.for" uniformly. (No more `TypeSwitch` needed.) Note: `scf.while` and other loops with loop-carried variables can implement `replaceWithAdditionalYields`, but to keep this commit small, that is not done in this commit.	2023-09-27 07:53:39 +02:00
Tobias Gysi	85175edd4e	[mlir][llvm] Replace NullOp by ZeroOp (#67183 ) This revision replaces the LLVM dialect NullOp by the recently introduced ZeroOp. The ZeroOp is more generic in the sense that it represents zero values of any LLVM type rather than null pointers only. This is a follow-up to https://github.com/llvm/llvm-project/pull/65508	2023-09-25 11:11:52 +02:00
Matthias Springer	695a5a6a66	[mlir][IR] Trigger `notifyOperationRemoved` callback for nested ops (#66771 ) When cloning an op, the `notifyOperationInserted` callback is triggered for all nested ops. Similarly, the `notifyOperationRemoved` callback should be triggered for all nested ops when removing an op. Listeners may inspect the IR during a `notifyOperationRemoved` callback. Therefore, when multiple ops are removed in a single `RewriterBase::eraseOp` call, the notifications must be triggered in an order in which the ops could have been removed one-by-one: * Op removals must be interleaved with `notifyOperationRemoved` callbacks. A callback is triggered right before the respective op is removed. * Ops are removed post-order and in reverse order. Other traversal orders could delete an op that still has uses. (This is not avoidable in graph regions and with cyclic block graphs.) Differential Revision: Imported from https://reviews.llvm.org/D144193.	2023-09-20 08:45:46 +02:00
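The erasure order described above (nested ops first, later ops before earlier ones, one callback right before each removal) can be sketched on a toy op tree. `Op` and `erase` are hypothetical names for this model; blocks and regions are omitted and nested ops are kept in a flat list.

```python
class Op:
    """Toy operation: a name plus an ordered list of nested ops."""

    def __init__(self, name, nested=()):
        self.name = name
        self.nested = list(nested)


def erase(op, notify):
    """Erase `op` and everything nested in it.

    Nested ops go first (post-order), last one first (reverse order),
    and `notify` runs right before each individual removal.
    """
    while op.nested:
        erase(op.nested[-1], notify)  # notifies, then returns
        op.nested.pop()               # the nested op is removed here
    notify(op)                        # caller removes `op` afterwards
```

Erasing in reverse order means an op is removed only after the ops that appear later, which are the ones that could still be using its results in a non-graph region.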
Matthias Springer	9b5ef2bea8	[mlir][Interfaces] `LoopLikeOpInterface`: Support ops with multiple regions (#66754 ) This commit implements `LoopLikeOpInterface` on `scf.while`. This enables LICM (and potentially other transforms) on `scf.while`. `LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and can now return multiple regions. Also fix a bug in the default implementation of `LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false" for some values that are defined outside of the loop (in a nested op, in such a way that the value does not dominate the loop). This interface is currently only used for LICM and there is no way to trigger this bug, so no test is added.	2023-09-19 17:35:38 +02:00
Matthias Springer	4fdc019a89	[mlir][SCF] Add `SingleBlock` op trait to "scf.while" This trait is needed so that unstructured control flow is not inlined into "scf.while" ops. Note: The two regions of "scf.while" are already defined as `SizedRegion<1>`. `SingleBlock` can be queried from C++, `SizedRegion<n>` not. Fixes #64976. Differential Revision: https://reviews.llvm.org/D159199	2023-08-31 08:56:31 +02:00
Matthias Springer	8dd8c4adba	[mlir][Transforms] Inliner: Extra checks for unstructured control flow Do not inline IR with multiple blocks into ops that may not support unstructured control flow. This fixes #64978. Differential Revision: https://reviews.llvm.org/D159072	2023-08-30 15:28:29 +02:00
Srishti Srivastava	b6bab6db9b	[MLIR][transforms] Fix `cloneInto()` error in `RemoveDeadValues` pass This commit fixes an error in the `RemoveDeadValues` pass that is associated with its incorrect usage of the `cloneInto()` function. The `setOperands()` function that is used by the `cloneInto()` function requires all operands to not be null. But, that is not possible in this pass because we drop uses of dead values, thus making them null. It is only at the end of the pass that we are assured that such null values won't exist but during the execution of the pass, there could be null values. To fix this, we replace the usage of the `cloneInto()` function to copy a region with `moveBlock()` to move each block of the region one by one. This function does not require the presence of non-null values and is thus the right choice here. This implementation is also more optimized because we are moving things instead of copying them. The goal was always moving. Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com> Reviewed By: srishti-pm Differential Revision: https://reviews.llvm.org/D158941	2023-08-26 19:50:24 +00:00
Srishti Srivastava	0e98fb9fad	[MLIR][transforms] Add an optimization pass to remove dead values Large deep learning models rely on heavy computations. However, not every computation is necessary. And, even when a computation is necessary, it helps if the values needed for the computation are available in registers (which have low-latency) rather than being in memory (which has high-latency). Compilers can use liveness analysis to:- (1) Remove extraneous computations from a program before it executes on hardware, and, (2) Optimize register allocation. Both these tasks help achieve one very important goal: reducing runtime. Recently, liveness analysis was added to MLIR. Thus, this commit uses the recently added liveness analysis utility to try to accomplish task (1). It adds a pass called `remove-dead-values` whose goal is optimization (reducing runtime) by removing unnecessary instructions. Unlike other passes that rely on local information gathered from patterns to accomplish optimization, this pass uses a full analysis of the IR, specifically, liveness analysis, and is thus more powerful. Currently, this pass performs the following optimizations: (A) Removes function arguments that are not live, (B) Removes function return values that are not live across all callers of the function, (C) Removes unnecessary operands, results, region arguments, region terminator operands of region branch ops, and, (D) Removes simple and region branch ops that have all non-live results and don't affect memory in any way, iff the IR doesn't have any non-function symbol ops, non-call symbol user ops and branch ops. Here, a "simple op" refers to an op that isn't a symbol op, symbol-user op, region branch op, branch op, region branch terminator op, or return-like. 
It is noteworthy that we do not refer to non-live values as "dead" in this file to avoid confusing it with dead code analysis's "dead", which refers to unreachable code (code that never executes on hardware) while "non-live" refers to code that executes on hardware but is unnecessary. Thus, while the removal of dead code helps little in reducing runtime, removing non-live values should theoretically have significant impact (depending on the amount removed). It is also important to note that unlike other passes (like `canonicalize`) that apply op-specific optimizations through patterns, this pass uses different interfaces to handle various types of ops and tries to cover all existing ops through these interfaces. It is because of its reliance on (a) liveness analysis and (b) interfaces that makes it so powerful that it can optimize ops that don't have a canonicalizer and even when an op does have a canonicalizer, it can perform more aggressive optimizations, as observed in the test files associated with this pass. Example of optimization (A):- ``` int add_2_to_y(int x, int y) { return 2 + y } print(add_2_to_y(3, 4)) print(add_2_to_y(5, 6)) ``` becomes ``` int add_2_to_y(int y) { return 2 + y } print(add_2_to_y(4)) print(add_2_to_y(6)) ``` Example of optimization (B):- ``` int, int get_incremented_values(int y) { store y somewhere in memory return y + 1, y + 2 } y1, y2 = get_incremented_values(4) y3, y4 = get_incremented_values(6) print(y2) ``` becomes ``` int get_incremented_values(int y) { store y somewhere in memory return y + 2 } y2 = get_incremented_values(4) y4 = get_incremented_values(6) print(y2) ``` Example of optimization (C):- Assume only `%result1` is live here. 
Then, ``` %result1, %result2, %result3 = scf.while (%arg1 = %operand1, %arg2 = %operand2) { %terminator_operand2 = add %arg2, %arg2 %terminator_operand3 = mul %arg2, %arg2 %terminator_operand4 = add %arg1, %arg1 scf.condition(%terminator_operand1) %terminator_operand2, %terminator_operand3, %terminator_operand4 } do { ^bb0(%arg3, %arg4, %arg5): %terminator_operand6 = add %arg4, %arg4 %terminator_operand5 = add %arg5, %arg5 scf.yield %terminator_operand5, %terminator_operand6 } ``` becomes ``` %result1, %result2 = scf.while (%arg2 = %operand2) { %terminator_operand2 = add %arg2, %arg2 %terminator_operand3 = mul %arg2, %arg2 scf.condition(%terminator_operand1) %terminator_operand2, %terminator_operand3 } do { ^bb0(%arg3, %arg4): %terminator_operand6 = add %arg4, %arg4 scf.yield %terminator_operand6 } ``` It is interesting to see that `%result2` won't be removed even though it is not live because `%terminator_operand3` forwards to it and cannot be removed. And, that is because it also forwards to `%arg4`, which is live. Example of optimization (D):- ``` int square_and_double_of_y(int y) { square = y ^ 2 double = y * 2 return square, double } sq, do = square_and_double_of_y(5) print(do) ``` becomes ``` int square_and_double_of_y(int y) { double = y * 2 return double } do = square_and_double_of_y(5) print(do) ``` Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com> Reviewed By: matthiaskramm, Mogball, jcai19 Differential Revision: https://reviews.llvm.org/D157049	2023-08-23 23:54:44 +00:00
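Optimization (A) from the `remove-dead-values` entry above can be illustrated with a small Python sketch. This is not MLIR code and does not reflect how the pass is implemented: the `Function` and `Call` classes and the `remove_non_live_args` helper are hypothetical names invented here, and liveness is simplified to "the parameter name is in a precomputed `used` set".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Function:
    """Toy function: parameter names plus the names its body reads."""
    name: str
    params: list[str]
    used: set[str]  # assumed to be the result of some liveness analysis


@dataclass
class Call:
    """Toy call site: callee name and positional argument values."""
    callee: str
    args: list[int]


def remove_non_live_args(fn: Function, calls: list[Call]) -> tuple[Function, list[Call]]:
    """Drop parameters the body never reads, and the matching call operands."""
    keep = [i for i, p in enumerate(fn.params) if p in fn.used]
    new_fn = Function(fn.name, [fn.params[i] for i in keep], set(fn.used))
    new_calls = [
        Call(c.callee, [c.args[i] for i in keep]) if c.callee == fn.name else c
        for c in calls
    ]
    return new_fn, new_calls


# Mirrors the add_2_to_y example: `x` is never read, so it is removed
# from the signature and from every call site.
fn = Function("add_2_to_y", ["x", "y"], used={"y"})
calls = [Call("add_2_to_y", [3, 4]), Call("add_2_to_y", [5, 6])]
new_fn, new_calls = remove_non_live_args(fn, calls)
print(new_fn.params)                # ['y']
print([c.args for c in new_calls])  # [[4], [6]]
```

The sketch rewrites the signature and all call sites together, which is why the commit message ties the function-level optimizations to what holds across all callers.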
Tom Eccles	dea33c80d3	[mlir][Transforms] teach CSE about recursive memory effects Add support for reasoning about operations with recursive memory effects to CSE. The recursive effects are gathered by a helper function. I decided to allow returning duplicates from the helper function because there's no benefit to spending the computation time to remove them in the existing use case. Differential Revision: https://reviews.llvm.org/D156805	2023-08-10 09:40:01 +00:00
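The idea in the CSE entry above, gathering the effects of an op whose memory behavior is defined by the ops nested inside it, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the MLIR helper itself: the `Op` class, `gather_effects`, and `is_read_only` are hypothetical names, and effects are reduced to the strings `"read"` and `"write"`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Op:
    """Toy operation: its own effects plus the ops nested in its regions."""
    name: str
    effects: list[str] = field(default_factory=list)  # "read" or "write"
    nested: list[Op] = field(default_factory=list)
    recursive: bool = False  # True: effects come from the nested ops


def gather_effects(op: Op) -> list[str]:
    """Collect the effects of `op`, descending into nested ops if recursive.

    Duplicates are kept on purpose: the caller below only asks whether
    any effect is a write, so deduplicating would be wasted work.
    """
    out = list(op.effects)
    if op.recursive:
        for child in op.nested:
            out.extend(gather_effects(child))
    return out


def is_read_only(op: Op) -> bool:
    """Toy criterion (an assumption): every gathered effect is a read."""
    return all(e == "read" for e in gather_effects(op))


loop = Op("loop", recursive=True,
          nested=[Op("load_a", ["read"]), Op("load_b", ["read"])])
print(gather_effects(loop))  # ['read', 'read']
print(is_read_only(loop))    # True

loop.nested.append(Op("store", ["write"]))
print(is_read_only(loop))    # False
```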
Mehdi Amini	363b655920	Finish renaming getOperandSegmentSizeAttr() from `operand_segment_sizes` to `operandSegmentSizes` This renaming started with the native ODS support for properties; this change completes it. A mass automated textual rename seems safe for most codebases. Also drop the ods prefix to keep the accessors the same as they were before this change: properties.odsOperandSegmentSizes reverts back to properties.operandSegmentSizes. The ODS prefix was creating divergence between all the places and made it harder to be consistent. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D157173	2023-08-09 19:37:01 -07:00
Uday Bondhugula	b36de52c98	NFC. Move remaining affine/memref test cases into respective dialect dirs Move a bunch of lingering test cases from test/Transforms/ into test/Dialect/Affine and MemRef. Differential Revision: https://reviews.llvm.org/D155855	2023-07-21 22:36:01 +05:30
tomnatan	2109587cee	[MLIR] Don't sort operands of commutative ops when comparing two ops as there is a correctness issue This feature was introduced in `D123492`. Sorting the operands of commutative operations by pointer value is incorrect when checking equivalence of ops in separate regions (where the lhs and rhs operands are marked as equivalent but are not the same value). It was also discussed in `D123492` and `D129480` that the correct solution would be to stable sort the operands in canonicalization (based on some numbering in the region maybe), but until that lands, reverting this change will unblock us and other users. An example of a pass that might not work properly because of this is `DuplicateFunctionEliminationPass`. Reviewed By: mehdi_amini, jpienaar Differential Revision: https://reviews.llvm.org/D154699	2023-07-14 16:11:54 -07:00
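One way the problem described in the entry above can show up is sketched below in Python. It is a toy model, not the MLIR comparison code: `Value`, its `addr` field (a stand-in for a pointer value), and `ops_equivalent` are hypothetical, and the addresses are chosen by hand so that the two regions happen to order their operands differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    name: str
    addr: int  # stand-in for a pointer; arbitrary per allocation


def ops_equivalent(lhs: list[Value], rhs: list[Value],
                   equiv: dict[Value, Value], sort_by_addr: bool) -> bool:
    """Compare operand lists pairwise, optionally sorting each by address."""
    if sort_by_addr:
        lhs = sorted(lhs, key=lambda v: v.addr)
        rhs = sorted(rhs, key=lambda v: v.addr)
    return len(lhs) == len(rhs) and all(
        equiv.get(a) == b for a, b in zip(lhs, rhs)
    )


# Two regions with corresponding values: a1 <-> a2 and b1 <-> b2.
a1, b1 = Value("a1", 10), Value("b1", 20)
a2, b2 = Value("a2", 40), Value("b2", 30)  # allocated in the opposite order
equiv = {a1: a2, b1: b2}

# add(a1, b1) vs add(a2, b2): identical structure in both regions.
print(ops_equivalent([a1, b1], [a2, b2], equiv, sort_by_addr=False))  # True
print(ops_equivalent([a1, b1], [a2, b2], equiv, sort_by_addr=True))   # False
```

With sorting enabled, `a1` is paired with `b2` because address order carries no meaning across regions, so two structurally identical ops compare as different. A stable ordering based on position within the region, as the commit message suggests, would not depend on allocation order.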


810 Commits