clang-p2996

Author	SHA1	Message	Date
MaheshRavishankar	a1bc979aa8	[mlir][Bufferization] Do not have read semantics for destination of `tensor.parallel_insert_slice`. (#134169 ) `tensor.insert_slice` needs to have read semantics on its destination operand. Since it has a return value, its semantics are - Copy dest to result - Copy source to subview of destination. `tensor.parallel_insert_slice` though has no result. So it does not need to have read semantics. The op description [here](`a3ac318e5f/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td (L1524)`) also says that it is expected to lower to a `memref.subview`, which does not have read semantics on the destination (it's just a view). This patch drops the read semantics for destination of `tensor.parallel_insert_slice` but also makes the `shared_outs` operands of `scf.forall` have read semantics. Earlier it would rely indirectly on read semantics of destination operand of `tensor.parallel_insert_slice` to propagate the read semantics for `shared_outs`. Now that is specified more directly. Fixes #133964 --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-04-03 09:47:36 -07:00
Kazu Hirata	d66af9c69b	[mlir] Use SetVector::insert_range (NFC) (#133595 )	2025-03-29 14:27:10 -07:00
Michael Liao	52975d5c9f	[mlir][scf] Allow different forwarding ordering in uplift - Allow 'before' arguments to be forwarded in a different order to the 'after' body when uplifting `scf.while` to `scf.for`.	2025-03-27 18:09:07 -04:00
Kazu Hirata	1cc07a0865	[mlir] Use Set::insert_range (NFC) (#133043 ) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(Elem); down to: Set.insert_range(Range); In some cases, we can further fold that into the set declaration.	2025-03-26 07:47:02 -07:00
MaheshRavishankar	e4172196a7	[mlir][TilingInterface] Make `tileAndFuseConsumerOfSlice` take surrounding loops as an argument. (#132082 ) This gets the consumer fusion method in sync with the corresponding producer fusion method `tileAndFuseProducerOfSlice`. Not taking this as input required the use of complicated analysis to retrieve the surrounding loops, which is very fragile. Just like the producer fusion method, the loops need to be taken in as an argument, with typically the loops being created by the tiling methods. Some utilities are added to check that the loops passed in are perfectly nested (in the case of an `scf.for` loop nest). This is change 1 of N to simplify the implementation of tile and fuse consumers. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2025-03-24 11:41:26 -07:00
Longsheng Mou	ead2724600	[mlir][scf] Fix a div-by-zero bug when step of `scf.for` is zero (#131079 ) Fixes #130095.	2025-03-20 09:23:59 +08:00
lorenzo chelini	57dc71352c	[MLIR][Bufferization] Retire `enforce-aliasing-invariants` (#130929 ) Why? This option can lead to incorrect IR if used in isolation, for example, consider the IR below: ```mlir func.func @loop_with_aliasing(%arg0: tensor<5xf32>, %arg1: index, %arg2: index) -> tensor<5xf32> { %c1 = arith.constant 1 : index %cst = arith.constant 1.000000e+00 : f32 %0 = tensor.empty() : tensor<5xf32> %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<5xf32>) -> tensor<5xf32> // The BufferizableOpInterface says that %2 alias with %arg0 or be a newly // allocated buffer %2 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (tensor<5xf32>) { scf.yield %1 : tensor<5xf32> } %cst_0 = arith.constant 1.000000e+00 : f32 %inserted = tensor.insert %cst_0 into %1[%c1] : tensor<5xf32> return %2 : tensor<5xf32> } ``` If we bufferize with: enforce-aliasing-invariants=false, we get: ``` func.func @loop_with_aliasing(%arg0: memref<5xf32, strided<[?], offset: ?>>, %arg1: index, %arg2: index) -> memref<5xf32, strided<[?], offset: ?>> { %c1 = arith.constant 1 : index %cst = arith.constant 1.000000e+00 : f32 %alloc = memref.alloc() {alignment = 64 : i64} : memref<5xf32> linalg.fill ins(%cst : f32) outs(%alloc : memref<5xf32>) %0 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (memref<5xf32, strided<[?], offset: ?>>) { %cast = memref.cast %alloc : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>> scf.yield %cast : memref<5xf32, strided<[?], offset: ?>> } %cst_0 = arith.constant 1.000000e+00 : f32 memref.store %cst_0, %alloc[%c1] : memref<5xf32> return %0 : memref<5xf32, strided<[?], offset: ?>> } ``` Which is not correct IR since the loop yields the allocation. I am using this option. What do I need to do now? If you are using this option in isolation, you are possibly generating incorrect IR, so you need to revisit your bufferization strategy. 
If you are using it together with `copyBeforeWrite`, you simply need to retire the `enforceAliasingInvariants` option. Co-authored-by: Matthias Springer <mspringer@nvidia.com>	2025-03-18 08:42:43 +01:00
Matthias Springer	6c867e27a7	[mlir] Use `getSingleElement`/`hasSingleElement` in various places (#131460 ) This is a code cleanup. Update a few places in MLIR that should use `hasSingleElement`/`getSingleElement`. Note: `hasSingleElement` is faster than `.getSize() == 1` when it is used with linked lists etc. Depends on #131508.	2025-03-17 07:43:18 +01:00
Amir Bishara	ac8b5a9e47	[mlir][scf]-Fix reverse iterator overflow in loop traversal (#128421 ) Fix a bug in method `getUntiledProducerFromSliceSource` where address sanitizer fails compilation on heap buffer overflow for accessing value out of the iteration range. This PR fixes the issue and adds a lit test to reproduce it.	2025-03-02 14:08:10 +02:00
Kunwar Grover	91bbebc7e1	[mlir][scf] Add getPartialResultTilePosition to PartialReductionOpInterface (#120465 ) This PR adds a new interface method to PartialReductionOpInterface which allows it to query the result tile position for the partial result. Previously, tiling the reduction dimension with SplitReductionOuterReduction when the result has transposed parallel dimensions would produce wrong results. Other fixes that were needed to make this PR work: - Instead of ad-hoc logic to decide where to place the new reduction dimensions in the partial result based on the iteration space, the reduction dimensions are always appended to the partial result tensor. - Remove usage of PartialReductionOpInterface in Mesh dialect. The implementation was trying to just get a neutral element, but ended up trying to use PartialReductionOpInterface for it, which is not right. It was also passing the wrong sizes to it.	2024-12-27 16:52:34 +00:00
Kunwar Grover	6e3631d0e3	[mlir][scf] Track replacements using a listener in TileAndFuse (#120999 ) This PR makes TileAndFuse explicitly track replacements using a listener instead of assuming that the results always come from the outer most tiling loop. scf::tileUsingInterface can introduce merge operations whose results are the actual replacements to use, instead of the outer most loop results.	2024-12-24 18:01:41 +00:00
Jacques Pienaar	09dfc5713d	[mlir] Enable decoupling two kinds of greedy behavior. (#104649 ) The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greediness, and combining them leads to inefficiency, e.g., in cases where one needs to create different phases in lowering and is required to apply patterns in a specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a commonly requested feature that folks keep on working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated function should just be a find and replace (e.g., of the `find ./ -type f -exec sed -i 's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety), as the API arguments haven't changed between the two.	2024-12-20 08:15:48 -08:00
Kunwar Grover	4b56345895	[mlir][SCF] Unify tileUsingFor and tileReductionUsingFor implementation (#120115 ) This patch unifies the tiling implementation for tileUsingFor and tileReductionUsingFor. This is done by passing an additional option to SCFTilingOptions, allowing it to set how reduction dimensions should be tiled. Currently, there are 3 different options for reduction tiling: FullReduction (old tileUsingFor), PartialReductionOuterReduction (old tileReductionUsingFor) and PartialReductionOuterParallel (linalg::tileReductionUsingForall, this isn't implemented in this patch). The patch makes tileReductionUsingFor use the tileUsingFor implementation with the new reduction tiling options. There are no test changes because the implementation was doing almost exactly the same thing. This was also tested in IREE (which uses both these APIs heavily) and there were no test changes.	2024-12-18 13:24:47 +00:00
Matthias Springer	9df63b2651	[mlir][Transforms] Add 1:N `matchAndRewrite` overload (#116470 ) This commit adds a new `matchAndRewrite` overload to `ConversionPattern` to support 1:N replacements. This is the first of two main PRs that merge the 1:1 and 1:N dialect conversion drivers. The existing `matchAndRewrite` function supports only 1:1 replacements, as can be seen from the `ArrayRef<Value>` parameter. ```c++ LogicalResult ConversionPattern::matchAndRewrite( Operation *op, ArrayRef<Value> operands /*adaptor values*/, ConversionPatternRewriter &rewriter) const; ``` This commit adds a `matchAndRewrite` overload that is called by the dialect conversion driver. By default, this new overload dispatches to the original 1:1 `matchAndRewrite` implementation. Existing `ConversionPattern`s do not need to be changed as long as there are no 1:N type conversions or value replacements. ```c++ LogicalResult ConversionPattern::matchAndRewrite( Operation *op, ArrayRef<ValueRange> operands /*adaptor values*/, ConversionPatternRewriter &rewriter) const { // Note: getOneToOneAdaptorOperands produces a fatal error if at least one // ValueRange has 0 or more than 1 value. return matchAndRewrite(op, getOneToOneAdaptorOperands(operands), rewriter); } ``` The `ConversionValueMapping`, which keeps track of value replacements and materializations, still does not support 1:N replacements. We still rely on argument materializations to convert N replacement values back into a single value. The `ConversionValueMapping` will be generalized to 1:N mappings in the second main PR. Before handing the adaptor values to a `ConversionPattern`, all argument materializations are "unpacked". The `ConversionPattern` receives N replacement values and does not see any argument materializations. This implementation strategy allows us to use the 1:N infrastructure/API in `ConversionPattern`s even though some functionality is still missing in the driver. 
This strategy was chosen to keep the sizes of the PRs smaller and to make it easier for downstream users to adapt to API changes. This commit also updates the "decompose call graphs" transformation and the "sparse tensor codegen" transformation to use the new 1:N `ConversionPattern` API. Note for LLVM conversion: If you are using a type converter with 1:N type conversion rules or if your patterns are performing 1:N replacements (via `replaceOpWithMultiple` or `applySignatureConversion`), conversion pattern applications will start failing (fatal LLVM error) with this error message: `pattern 'name' does not support 1:N conversion`. The name of the failing pattern is shown in the error message. These patterns must be updated to the new 1:N `matchAndRewrite` API.	2024-11-30 09:27:47 +09:00
Christopher Bate	ced2fc7819	[mlir][bufferization] Fix OneShotBufferize when `defaultMemorySpaceFn` is used (#91524 ) As described in issue llvm/llvm-project#91518, a previous PR llvm/llvm-project#78484 introduced the `defaultMemorySpaceFn` into bufferization options, allowing one to inform OneShotBufferize that it should use a specified function to derive the memory space attribute from the encoding attribute attached to tensor types. However, introducing this feature exposed unhandled edge cases, examples of which are introduced by this change in the new test under `test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`. Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is pretty simple. This change: - Updates the `bufferization.to_memref` and `bufferization.to_tensor` operations to explicitly include operand and destination types, whereas previously they relied on type inference to deduce the tensor types. Since the type inference cannot recover the correct tensor encoding/memory space, the operand and result types must be explicitly included. This is a small assembly format change, but it touches a large number of test files. - Makes minor updates to other bufferization functions to handle the changes in building the above ops. - Updates bufferization of `tensor.from_elements` to handle memory space. Integration/upgrade guide: In downstream projects, if you have tests or MLIR files that explicitly use `bufferization.to_tensor` or `bufferization.to_memref`, then update them to the new assembly format as follows: ``` %1 = bufferization.to_memref %0 : memref<10xf32> %2 = bufferization.to_tensor %1 : memref<10xf32> ``` becomes ``` %1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32> %2 = bufferization.to_tensor %1 : memref<10xf32> to tensor<10xf32> ```	2024-11-26 09:45:57 -07:00
Max191	8cc616bc71	[mlir] Clamp UnPackOp tiling sizes from operand tile (#112429 ) The `getIterationDomainTileFromOperandTile` implementation for tensor.unpack did not clamp sizes when the unpack op had extract_slice semantics. This PR fixes the bug. The PR also makes a minor change to `tileAndFuseConsumerOfSlice`. When replacing DPS inits, the iteration domain is needed, and it is computed from the tiled version of the operation after the initial tiling transformation. This can result in some extra indexing computation, so the PR changes it to use the original full sized cloned consumer op. --------- Signed-off-by: Max Dawkins <max.dawkins@gmail.com>	2024-11-13 09:49:19 -05:00
Quinn Dawkins	54ae9e7bba	[mlir][SCF] Fix condition for fusability in consumer fusion API (#115768 ) It was previously allowing either a tilable or dps op to be fused. Both are required for consumer fusion.	2024-11-11 16:44:24 -05:00
Yun-Fly	9bc3102bea	[mlir][scf] Extend consumer fusion to multiple tilable users (#111955 ) Previously, consumer fusion expected a single use (with any other uses being terminator ops). This patch supports fusing multiple tilable consumers. E.g. ``` %0 = scf.for { ... %p = tiledProducer ... } %1 = tilableConsumer1 ins(%0 : ...) %2 = tilableConsumer2 ins(%0 : ...) ``` ===> ``` %0:3 = scf.for { ... %p = tiledProducer %1 = tiledConsumer1 ins(%p : ...) %2 = tiledConsumer2 ins(%p : ...) ... } ``` The key step is ensuring that the first user of the loop does not dominate any definition of the consumer operand(s).	2024-11-06 10:03:23 +08:00
Hugo Trachino	a9c417c28a	[MLIR][SCF] Fix LoopPeelOp documentation (NFC) (#113179 ) As an example, I added annotations to the peel_front unit test. ``` func.func @loop_peel_first_iter_op() { // CHECK: %[[C0:.+]] = arith.constant 0 // CHECK: %[[C41:.+]] = arith.constant 41 // CHECK: %[[C5:.+]] = arith.constant 5 // CHECK: %[[C5_0:.+]] = arith.constant 5 // CHECK: scf.for %{{.+}} = %[[C0]] to %[[C5_0]] step %[[C5]] // CHECK: arith.addi // CHECK: scf.for %{{.+}} = %[[C5_0]] to %[[C41]] step %[[C5]] // CHECK: arith.addi %0 = arith.constant 0 : index %1 = arith.constant 41 : index %2 = arith.constant 5 : index scf.for %i = %0 to %1 step %2 { arith.addi %i, %i : index } return } module attributes {transform.with_named_sequence} { transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) { %0 = transform.structured.match ops{["arith.addi"]} in %arg1 : (!transform.any_op) -> !transform.any_op %1 = transform.get_parent_op %0 {op_name = "scf.for"} : (!transform.any_op) -> !transform.op<"scf.for"> %main_loop, %remainder = transform.loop.peel %1 {peel_front = true} : (!transform.op<"scf.for">) -> (!transform.op<"scf.for">, !transform.op<"scf.for">) transform.annotate %main_loop "main_loop" : !transform.op<"scf.for"> transform.annotate %remainder "remainder" : !transform.op<"scf.for"> transform.yield } } ``` Gives : ``` func.func @loop_peel_first_iter_op() { %c0 = arith.constant 0 : index %c41 = arith.constant 41 : index %c5 = arith.constant 5 : index %c5_0 = arith.constant 5 : index scf.for %arg0 = %c0 to %c5_0 step %c5 { %0 = arith.addi %arg0, %arg0 : index } {remainder} // The first iteration loop (second result) has been annotated remainder scf.for %arg0 = %c5_0 to %c41 step %c5 { %0 = arith.addi %arg0, %arg0 : index } {main_loop} // The main loop (first result) has been annotated main_loop return } ``` --------- Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>	2024-10-29 15:47:13 +00:00
Matthias Springer	1549a0c183	[mlir][SCF] Remove `scf-bufferize` pass (#113840 ) The dialect conversion-based bufferization passes were migrated to One-Shot Bufferize about two years ago. To clean up the code base, this commit removes the `scf-bufferize` pass, one of the few remaining parts of the old infrastructure. Most bufferization passes have already been removed. Note for LLVM integration: If you depend on this pass, migrate to One-Shot Bufferize or copy the pass to your codebase.	2024-10-29 09:10:30 +09:00
SJW	8da5aa16f6	[mlir][SCF] Fix dynamic loop pipeline peeling for num_stages > total_iters (#112418 ) When pipelining an `scf.for` with dynamic loop bounds, the epilogue ramp-down must align with the prologue when num_stages > total_iterations. For example: ``` scf.for (0..ub) { load(i) add(i) store(i) } ``` When num_stages=3 the pipeline follows: ``` load(0) - add(0) - scf.for (0..ub-2) - store(ub-2) load(1) - - add(ub-1) - store(ub-1) ``` The trailing `store(ub-2)`, `i=ub-2`, must align with the ramp-up for `i=0` when `ub < num_stages-1`, so the index `i` should be `max(0, ub-2)` and each subsequent index is an increment. The predicate must also handle this scenario, so it becomes `predicate[0] = total_iterations > epilogue_stage`.	2024-10-15 13:13:49 -07:00
Sasha Lopoukhine	36a405519b	[mlir][SCF] Multiply lower bound in loop range folding (#111875 ) Fixes #83482	2024-10-14 20:15:12 +02:00
BARRET	1666d13078	[CMake]: Remove unnecessary dependencies on LLVM/MLIR (#111255 ) Previous https://github.com/llvm/llvm-project/pull/110362 (reverted) caused breakage. Here is the PR with fix. My build cmdline: ``` cmake ../llvm \ -G Ninja \ -DCMAKE_BUILD_TYPE=Release \ -DCMAKE_INSTALL_PREFIX=install \ -DCMAKE_C_COMPILER=gcc-9 \ -DCMAKE_CXX_COMPILER=g++-9 \ -DCMAKE_CUDA_COMPILER=$(which nvcc) \ -DLLVM_ENABLE_LLD=OFF \ -DLLVM_ENABLE_ASSERTIONS=ON \ -DLLVM_BUILD_EXAMPLES=ON \ -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \ -DLLVM_CCACHE_BUILD=ON \ -DMLIR_ENABLE_BINDINGS_PYTHON=ON \ -DBUILD_SHARED_LIBS=ON \ -DLLVM_ENABLE_PROJECTS='llvm;mlir' ```	2024-10-07 15:52:43 +02:00
Matthias Springer	206fad0e21	[mlir][NFC] Mark type converter in `populate...` functions as `const` (#111250 ) This commit marks the type converter in `populate...` functions as `const`. This is useful for debugging. Patterns already take a `const` type converter. However, some `populate...` functions do not only add new patterns, but also add additional type conversion rules. That makes it difficult to find the place where a type conversion was added in the code base. With this change, all `populate...` functions that only populate patterns now have a `const` type converter. Programmers can then conclude from the function signature that these functions do not register any new type conversion rules. Also some minor cleanups around the 1:N dialect conversion infrastructure, which did not always pass the type converter as a `const` object internally.	2024-10-05 21:32:40 +02:00
Quinn Dawkins	9144fed31b	[mlir] Add option for a cleanup pattern set to SCF tiling helper (#109554 ) The SCF helper for tiling an operation implementing the TilingInterface and greedily fusing consumers requires an uninterrupted chain of operations implementing the tiling interface to succeed. There can be cases with intermediate ops that don't implement the interface but have producers that could be fused if various canonicalization/simplification patterns could run in between fusion steps. This adds an option to SCFTileAndFuseOptions for a pattern set to run between fusion steps to the ops that result from fusion/tiling. Removed and newly inserted slices are tracked for continued fusion applications. See this RFC for more discussion: https://discourse.llvm.org/t/rfc-split-fusion-portions-of-the-tilinginterface-into-a-new-interface/81155	2024-10-04 14:42:55 -04:00
Mehdi Amini	8b47711e84	Revert "CMake: Remove unnecessary dependencies on LLVM/MLIR" (#110594 ) Reverts llvm/llvm-project#110362 Multiple bots are broken.	2024-10-01 00:44:21 +02:00
BARRET	4980f2177e	CMake: Remove unnecessary dependencies on LLVM/MLIR (#110362 ) There are some spurious libraries which can be removed. I'm trying to bundle MLIR/LLVM library dependencies for our own libraries. We're utilizing cmake function to recursively collect MLIR/LLVM related dependencies. However, we identified certain library dependencies as redundant and safe for removal.	2024-09-30 23:57:13 +02:00
Abhishek Varma	b8c974f093	[MLIR][TilingInterface] Extend consumer fusion for multi-use of producer shared by terminator ops (#110105 ) This commit extends consumer fusion to take place even if the producer has multiple uses. Besides the consumer op in question, the only other uses of the producer that are allowed are in: 1. scf.yield 2. tensor.parallel_insert_slice Signed-off-by: Abhishek Varma <abhvarma@amd.com>	2024-09-30 14:51:06 +05:30
MaheshRavishankar	cca32174fe	[mlir][SCF] Use Affine ops for indexing math. (#108450 ) For index type of induction variable, the indexing math is better represented using affine ops such as `affine.delinearize_index`. This also further demonstrates that some of these `affine` ops might need to move to a different dialect. For one, these ops only support `IndexType` when they should be able to work with any integer type. This change also includes some canonicalization patterns for the `affine.delinearize_index` operation to 1) Drop unit `basis` values 2) Remove the `delinearize_index` op when the `linear_index` is a loop induction variable of a normalized loop and the `basis` is of size 1 and is also the upper bound of the normalized loop. --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2024-09-27 18:25:41 -07:00
SJW	7645d9c77d	[mlir][scf] Fix loop iteration calculation for negative step in LoopPipelining (#110035 ) This fixes loop iteration count calculation if the step is a negative value, where we should adjust the added delta from `step-1` to `step+1` when doing the ceil div.	2024-09-25 13:32:12 -07:00
SJW	fa089b014b	[SCF] Fixed epilogue predicates in loop pipelining (#108964 ) The computed loop iteration is zero based, so only check it is less than zero. This fixes the case when lower bound is not zero.	2024-09-23 22:06:19 -07:00
Adrian Kuegel	b3a2208c56	[mlir] Apply ClangTidy fixes. - Prefer to check empty() instead of size() == 0. - Remove unused using declarations.	2024-09-17 11:02:20 +00:00
MaheshRavishankar	d5f0969c96	[mlir][TilingInterface] Avoid looking at operands for getting slices to continue tile + fuse. (#107882 ) The current implementation of `scf::tileConsumerAndFuseProducerUsingSCF` looks at operands of tiled/tiled+fused operations to see if they are produced by `extract_slice` operations to populate the worklist used to continue fusion. This implicit assumption does not always work. Instead make the implementations of `getTiledImplementation` return the slices to use to continue fusion. This is a breaking change - To continue to get the same behavior of `scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree implementations of `TilingInterface::getTiledImplementation` to return the slices to continue fusion on. All in-tree implementations have been adapted to this. - This change touches parts that required a simplification to the `ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a `std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that should be `std::nullopt` if fusion is not to be performed. Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>	2024-09-11 22:15:43 -07:00
Yun-Fly	a9ba1b6dd5	[mlir][scf] Extend consumer fuse to single nested `scf.for` (#108318 ) Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g. ``` %0 = scf.for() { %1 = scf.for() { tiledProducer } yield %1 } %2 = consumer ins(%0) ``` Compared with #94190, this PR fixes the build failure by making C++17 happy.	2024-09-12 12:01:23 +08:00
Kazu Hirata	335538c271	Revert "[mlir][scf] Extend consumer fuse to single nested `scf.for` (#94190 )" This reverts commit `2d4bdfba96`. A build breakage is reported at: https://lab.llvm.org/buildbot/#/builders/138/builds/3524	2024-09-11 19:18:37 -07:00
Yun-Fly	2d4bdfba96	[mlir][scf] Extend consumer fuse to single nested `scf.for` (#94190 ) Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g. ``` %0 = scf.for() { %1 = scf.for() { tiledProducer } yield %1 } %2 = consumer ins(%0) ```	2024-09-12 10:02:57 +08:00
SJW	18926666f5	[MLIR][SCF] Loop pipelining fails on failed predication (no assert) (#107442 ) The SCFLoopPipelining allows predication on peeled or loop ops. When the predicationFn returns a nullptr this signifies the op type is unsupported and the pipeliner fails except in `emitPrologue` where it asserts. This patch fixes handling in the prologue to gracefully fail.	2024-09-05 11:46:18 -07:00
SJW	ebf0599314	[MLIR][SCF] Add support for loop pipeline peeling for dynamic loops. (#106436 ) Allow speculative execution and predicate results per stage.	2024-09-04 12:24:58 -07:00
pawelszczerbuk	7c9008115a	[SCF][PIPELINE] Handle the case when values from the peeled prologue may escape out of the loop (#105755 ) Previously the values in the peeled prologue that weren't treated with the `predicateFn` were passed to the loop body without any other predication. If those values are later used outside of the loop body, they may be incorrect if the num iterations is smaller than num stages - 1. We need similar masking for those, as is done in the main loop body, using already existing predicates.	2024-08-23 08:23:11 -07:00
MaheshRavishankar	6740d701bd	[mlir][Linalg] Deprecate `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` (#91878 ) The implementations of these methods are legacy and they are removed in favor of using the `scf::tileUsingSCF` methods as replacements. To get the latter on par with requirements of the deprecated methods, the tiling allows one to specify the maximum number of tiles to use instead of specifying the tile sizes. When tiling to `scf.forall` this specification is used to generate the `num_threads` version of the operation. A slight deviation from the previous implementation is that the deprecated method always generated the `num_threads` variant of the `scf.forall` operation. Instead now this is driven by the tiling options specified. This reduces the indexing math generated when the tile sizes are specified. Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF` ``` OpBuilder b; TilingInterface op; ArrayRef<OpFoldResult> numThreads; ArrayAttr mapping; FailureOr<ForallTilingResult> result = linalg::tileToForallOp(b, op, numThreads, mapping); ``` can be replaced by ``` scf::SCFTilingOptions options; options.setNumThreads(numThreads); options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp); options.setMapping(mapping.getValue()); /* note the difference that setMapping takes an ArrayRef<Attribute> */ FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options); ``` This generates the `numThreads` version of the `scf.forall` for the inter-tile loops, i.e. ``` ... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...) 
``` Moving from `linalg::tileToForallOpUsingTileSizes` to `scf::tileUsingSCF` ``` OpBuilder b; TilingInterface op; ArrayRef<OpFoldResult> tileSizes; ArrayAttr mapping; FailureOr<ForallTilingResult> result = linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping); ``` can be replaced by ``` scf::SCFTilingOptions options; options.setTileSizes(tileSizes); options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp); options.setMapping(mapping.getValue()); /* note the difference that setMapping takes an ArrayRef<Attribute> */ FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options); ``` Also note that `linalg::tileToForallOpUsingTileSizes` would effectively call the `linalg::tileToForallOp` by computing the `numThreads` from the `op` and `tileSizes` and generate the `numThreads` version of the `scf.forall`. That is not the case anymore. Instead this will directly generate the `tileSizes` version of the `scf.forall` op ``` ... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...) ``` If you actually want to use the `numThreads` version, it is up to the caller to compute the `numThreads` and set `options.setNumThreads` instead of `options.setTileSizes`. Note that there is a slight difference in the num threads version and tile size version. The former requires an additional `affine.max` on the tile size to ensure non-negative tile sizes. When lowering to `numThreads` version this `affine.max` is not needed since by construction the tile sizes are non-negative. In previous implementations, the `numThreads` version generated when using the `linalg::tileToForallOpUsingTileSizes` method would avoid generating the `affine.max` operation. To get the same state, downstream users will have to additionally normalize the `scf.forall` operation. 
Changes to `transform.structured.tile_using_forall` The transform dialect op that called into `linalg::tileToForallOp` and `linalg::tileToForallOpUsingTileSizes` has been modified to call `scf::tileUsingSCF`. The transform dialect op always generates the `numThreads` version of the `scf.forall` op. So when `tile_sizes` are specified for the transform dialect op, first the `tile_sizes` version of the `scf.forall` is generated by the `scf::tileUsingSCF` method, which is then further normalized to get back to the same state. So there is no functional change to `transform.structured.tile_using_forall`. It always generates the `numThreads` version of the `scf.forall` op (as it did before this change). --------- Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>	2024-07-31 12:32:07 -07:00
Victor Perez	e8f07cdb57	[MLIR][SCF] Define `-scf-rotate-while` pass (#99850 ) Define SCF dialect patterns rotating `scf.while` loops leveraging existing `mlir::scf::wrapWhileLoopInZeroTripCheck`. `forceCreateCheck` is always `false` as the pattern would lead to an infinite recursion otherwise. This pattern rotates `scf.while` ops, mutating them from "while" loops to "do-while" loops. A guard checking the condition for the first iteration is inserted. Note this guard can be optimized away if the compiler can prove the loop will be executed at least once. Using this pattern, the following while loop: ```mlir scf.while (%arg0 = %init) : (i32) -> i64 { %val = .., %arg0 : i64 %cond = arith.cmpi .., %arg0 : i32 scf.condition(%cond) %val : i64 } do { ^bb0(%arg1: i64): %next = .., %arg1 : i32 scf.yield %next : i32 } ``` Can be transformed into: ``` mlir %pre_val = .., %init : i64 %pre_cond = arith.cmpi .., %init : i32 scf.if %pre_cond -> i64 { %res = scf.while (%arg1 = %va0) : (i64) -> i64 { // Original after block %next = .., %arg1 : i32 // Original before block %val = .., %next : i64 %cond = arith.cmpi .., %next : i32 scf.condition(%cond) %val : i64 } do { ^bb0(%arg2: i64): %scf.yield %arg2 : i32 } scf.yield %res : i64 } else { scf.yield %pre_val : i64 } ``` The test pass for `wrapWhileLoopInZeroTripCheck` has been modified to use the new pattern when `forceCreateCheck=false`. --------- Signed-off-by: Victor Perez <victor.perez@codeplay.com>	2024-07-30 10:06:01 +02:00
Alexander Belyaev	97a2bd8415	Revert "[mlir][loops] Reland Refactor LoopFuseSiblingOp and support parallel fusion #94391 (#97607 )" This reverts commit `edbc0e30a9`. Reason for rollback. ASAN complains about this PR: ==4320==ERROR: AddressSanitizer: heap-use-after-free on address 0x502000006cd8 at pc 0x55e2978d63cf bp 0x7ffe6431c2b0 sp 0x7ffe6431c2a8 READ of size 8 at 0x502000006cd8 thread T0 #0 0x55e2978d63ce in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr> mlir/include/mlir/IR/IRMapping.h:40:11 #1 0x55e2978d63ce in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:156:11 #2 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43 #3 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17 #4 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const, mlir::Operation, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56 #5 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14 #6 0x55e297494a60 in 
mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48 #7 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15 #8 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10 #9 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const, mlir::Operation, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56 #10 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14 #11 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48 #12 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16 #13 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10 #14 
0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16 #15 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17 #16 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass, mlir::Operation, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12 #17 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12 #18 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7 #19 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass, mlir::Operation, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21 #20 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16 #21 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10 #22 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation) mlir/lib/Pass/Pass.cpp:884:60 #23 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17 #24 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9 #25 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12 #26 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12 
#27 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12 #28 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16 #29 0x55e297b1c9c5 in interleave<const llvm::StringRef , (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3 #30 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3 #31 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3 #32 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10 #33 0x55e291eb115c in mlir::MlirOptMain(int, char, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14 #34 0x55e291eb15f8 in mlir::MlirOptMain(int, char, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10 #35 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33 #36 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) 
(BuildId: 9a996398ce14a94560b0c642eb4f6e94) #37 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120 0x502000006cd8 is located 8 bytes inside of 16-byte region [0x502000006cd0,0x502000006ce0) freed by thread T0 here: #0 0x55e29130b7e2 in operator delete(void, unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:155:3 #1 0x55e2979eb657 in __libcpp_operator_delete<void , unsigned long> #2 0x55e2979eb657 in __do_deallocate_handle_size<> #3 0x55e2979eb657 in __libcpp_deallocate #4 0x55e2979eb657 in deallocate #5 0x55e2979eb657 in deallocate #6 0x55e2979eb657 in operator() #7 0x55e2979eb657 in ~vector #8 0x55e2979eb657 in mlir::Block::~Block() mlir/lib/IR/Block.cpp:24:1 #9 0x55e2979ebc17 in deleteNode llvm/include/llvm/ADT/ilist.h:42:39 #10 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:205:5 #11 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:209:39 #12 0x55e2979ebc17 in mlir::Block::erase() mlir/lib/IR/Block.cpp:67:28 #13 0x55e297aef978 in mlir::RewriterBase::eraseBlock(mlir::Block) mlir/lib/IR/PatternMatch.cpp:245:10 #14 0x55e297af0563 in mlir::RewriterBase::inlineBlockBefore(mlir::Block, mlir::Block, llvm::ilist_iterator<llvm::ilist_detail::node_options<mlir::Operation, false, false, void, false, void>, false, false>, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:331:3 #15 0x55e297af06d8 in mlir::RewriterBase::mergeBlocks(mlir::Block, mlir::Block, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:341:3 #16 0x55e297036608 in mlir::scf::ForOp::replaceWithAdditionalYields(mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) mlir/lib/Dialect/SCF/IR/SCF.cpp:575:12 #17 0x55e2970673ca in mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Model<mlir::scf::ForOp>::replaceWithAdditionalYields(mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Concept const, mlir::Operation, mlir::RewriterBase&, 
mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.h.inc:658:56 #18 0x55e2978d5feb in replaceWithAdditionalYields blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.cpp.inc:105:14 #19 0x55e2978d5feb in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:135:14 #20 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43 #21 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17 #22 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const, mlir::Operation, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56 #23 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14 #24 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48 #25 0x55e294646a8d in applySequenceBlock(mlir::Block&, 
mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15 #26 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10 #27 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const, mlir::Operation, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56 #28 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14 #29 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48 #30 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16 #31 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10 #32 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16 #33 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17 #34 
0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass, mlir::Operation, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12 #35 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12 #36 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7 #37 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass, mlir::Operation, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21 #38 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16 #39 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10 #40 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation) mlir/lib/Pass/Pass.cpp:884:60 #41 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17 #42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9 #43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12 #44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12 #45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12 #46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, 
std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16 #47 0x55e297b1c9c5 in interleave<const llvm::StringRef , (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3 #48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3 #49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3 #50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10 #51 0x55e291eb115c in mlir::MlirOptMain(int, char, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14 previously allocated by thread T0 here: #0 0x55e29130ab5d in operator new(unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:86:3 #1 0x55e2979ed5d4 in __libcpp_operator_new<unsigned long> #2 0x55e2979ed5d4 in __libcpp_allocate #3 0x55e2979ed5d4 in allocate #4 0x55e2979ed5d4 in __allocate_at_least<std::__u::allocator<mlir::BlockArgument> > #5 0x55e2979ed5d4 in __split_buffer #6 0x55e2979ed5d4 in mlir::BlockArgument std::__u::vector<mlir::BlockArgument, 
std::__u::allocator<mlir::BlockArgument>>::__push_back_slow_path<mlir::BlockArgument const&>(mlir::BlockArgument const&) #7 0x55e2979ec0f2 in push_back #8 0x55e2979ec0f2 in mlir::Block::addArgument(mlir::Type, mlir::Location) mlir/lib/IR/Block.cpp:154:13 #9 0x55e29796e457 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2172:34 #10 0x55e29796e457 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7 #11 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16 #12 0x55e297035742 in mlir::scf::ForOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/SCF/IR/SCF.cpp:521:14 #13 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult ()(mlir::OpAsmParser&, mlir::OperationState&)>(void, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12 #14 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12 #15 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12 #16 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12 #17 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9 #18 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19 #19 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10 #20 0x55e297971d20 in parseBlockBody mlir/lib/AsmParser/Parser.cpp:2296:9 #21 0x55e297971d20 in (anonymous namespace)::OperationParser::parseBlock(mlir::Block&) mlir/lib/AsmParser/Parser.cpp:2226:12 #22 0x55e29796e4f5 in parseRegionBody 
mlir/lib/AsmParser/Parser.cpp:2184:7 #23 0x55e29796e4f5 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7 #24 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16 #25 0x55e29796b2cf in (anonymous namespace)::CustomOpAsmParser::parseOptionalRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1796:12 #26 0x55e2978d89ff in mlir::function_interface_impl::parseFunctionOp(mlir::OpAsmParser&, mlir::OperationState&, bool, mlir::StringAttr, llvm::function_ref<mlir::Type (mlir::Builder&, llvm::ArrayRef<mlir::Type>, llvm::ArrayRef<mlir::Type>, mlir::function_interface_impl::VariadicFlag, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>&)>, mlir::StringAttr, mlir::StringAttr) mlir/lib/Interfaces/FunctionImplementation.cpp:232:14 #27 0x55e2969ba41d in mlir::func::FuncOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/Func/IR/FuncOps.cpp:203:10 #28 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult ()(mlir::OpAsmParser&, mlir::OperationState&)>(void, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12 #29 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12 #30 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12 #31 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12 #32 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9 #33 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19 #34 0x55e29795bea3 in 
(anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10 #35 0x55e297959b78 in parse mlir/lib/AsmParser/Parser.cpp:2725:20 #36 0x55e297959b78 in mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block, mlir::ParserConfig const&, mlir::AsmParserState, mlir::AsmParserCodeCompleteContext) mlir/lib/AsmParser/Parser.cpp:2785:41 #37 0x55e29790d5c2 in mlir::parseSourceFile(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::Block, mlir::ParserConfig const&, mlir::LocationAttr) mlir/lib/Parser/Parser.cpp:46:10 #38 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp, const std::__u::shared_ptr<llvm::SourceMgr> &> mlir/include/mlir/Parser/Parser.h:159:14 #39 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp> mlir/include/mlir/Parser/Parser.h:189:10 #40 0x55e291ebbfe2 in mlir::parseSourceFileForTool(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::ParserConfig const&, bool) mlir/include/mlir/Tools/ParseUtilities.h:31:12 #41 0x55e291ebb263 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:383:33 #42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9 #43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12 #44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12 #45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12 #46 0x55e297b1cffe in 
mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16 #47 0x55e297b1c9c5 in interleave<const llvm::StringRef , (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3 #48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3 #49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3 #50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10 #51 0x55e291eb115c in mlir::MlirOptMain(int, char, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14 #52 0x55e291eb15f8 in mlir::MlirOptMain(int, char, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10 #53 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33 #54 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94) #55 0x55e2912365a9 in _start 
/usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120 SUMMARY: AddressSanitizer: heap-use-after-free mlir/include/mlir/IR/IRMapping.h:40:11 in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr> Shadow bytes around the buggy address: 0x502000006a00: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa 0x502000006a80: fa fa 00 fa fa fa 00 00 fa fa 00 00 fa fa 00 00 0x502000006b00: fa fa 00 00 fa fa 00 00 fa fa 00 fa fa fa 00 fa 0x502000006b80: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 00 0x502000006c00: fa fa 00 00 fa fa 00 00 fa fa 00 00 fa fa fd fa =>0x502000006c80: fa fa fd fa fa fa fd fd fa fa fd[fd]fa fa fd fd 0x502000006d00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa 0x502000006d80: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa 0x502000006e00: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 fa 0x502000006e80: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa 0x502000006f00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==4320==ABORTING	2024-07-04 09:24:23 +02:00
srcarroll	edbc0e30a9	[mlir][loops] Reland Refactor LoopFuseSiblingOp and support parallel fusion #94391 (#97607 ) The refactor had a bug where the fused loop was inserted in an incorrect location. This patch fixes the bug and relands the original PR https://github.com/llvm/llvm-project/pull/94391. This patch refactors code related to the LoopFuseSiblingOp transform in an attempt to reduce duplicated common code. The aim is to refactor as much as possible into functions on LoopLikeOpInterfaces, but this is still a work in progress. A full refactor will require more additions to the LoopLikeOpInterface. In addition, scf.parallel fusion support has been added.	2024-07-03 14:03:54 -05:00
srcarroll	4e78d3a6b1	Revert "Refactor LoopFuseSiblingOp and support parallel fusion (#94391 )" (#97523 ) This reverts commit `6820b08718`.	2024-07-03 01:27:19 -05:00
srcarroll	6820b08718	Refactor LoopFuseSiblingOp and support parallel fusion (#94391 ) This patch refactors code related to the `LoopFuseSiblingOp` transform in an attempt to reduce duplicated common code. The aim is to refactor as much as possible into functions on `LoopLikeOpInterface`s, but this is still a work in progress. A full refactor will require more additions to the `LoopLikeOpInterface`. In addition, `scf.parallel` fusion support has been added.	2024-07-02 11:12:51 -05:00
Yun-Fly	7ef08eacd5	[mlir][scf] Extend option to yield replacement for multiple results case (#93144 ) This patch extends the yield-replacement functionality to the multiple-results case and adds another optional argument called `yieldResultNumber` indicating which result(s) need to be yielded. If not given, all results are yielded by default.	2024-06-28 20:43:52 +08:00
MaheshRavishankar	b99d0b3440	[mlir][TilingInterface] Update `PartialReductionOpInterface` to get it more in line with `TilingInterface`. (#95460 ) The `TilingInterface` methods have return values that allow the interface implementation to return multiple operations, and also return tiled values explicitly. This is to avoid the assumption that the interface needs to return a single operation and that this operation's results are the expected tiled values. Make the `PartialReductionOpInterface::tileToPartialReduction` return `TilingResult` as well for the same reason. Similarly make the `PartialReductionOpInterface::mergeReductions` also return a list of generated operations and values to use as replacements. This is just a refactoring to allow for deprecation of `linalg::tileReductionUsingForall` in favor of the `scf::tileReductionUsingSCF` method.	2024-06-18 09:07:29 -07:00
Ramkumar Ramachandra	0fb216fb2f	mlir/MathExtras: consolidate with llvm/MathExtras (#95087 ) This patch is part of a project to move the Presburger library into LLVM.	2024-06-11 23:00:02 +01:00
srcarroll	6b4c122847	[mlir][loops] Add getters for multi dim loop variables in `LoopLikeOpInterface` (#94516 ) This patch adds `getLoopInductionVars`, `getLoopLowerBounds`, `getLoopBounds`, `getLoopSteps` interface methods to `LoopLikeOpInterface`. The corresponding single value versions have been moved to the shared class declaration and have been implemented based on the new interface methods.	2024-06-07 18:25:43 -05:00
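One way to picture single-value getters built on top of the multi-dimensional ones is the sketch below. `LoopModel` and its method names are hypothetical stand-ins, and the "return a value only when there is exactly one" rule is an assumption about how such a wrapper would behave, not a quote of the interface.

```python
from typing import Optional, Sequence


def _single(values: Optional[Sequence]) -> Optional[object]:
    """Return the only element, or None if there is not exactly one."""
    if values is not None and len(values) == 1:
        return values[0]
    return None


class LoopModel:
    """Hypothetical stand-in for a loop op with per-dimension data."""

    def __init__(self, induction_vars, lower_bounds, steps):
        self._ivs = list(induction_vars)
        self._lbs = list(lower_bounds)
        self._steps = list(steps)

    # Multi-dimensional getters: one entry per loop dimension.
    def get_loop_induction_vars(self):
        return self._ivs

    def get_loop_lower_bounds(self):
        return self._lbs

    def get_loop_steps(self):
        return self._steps

    # Single-value getters, written once in terms of the getters above.
    def get_single_induction_var(self):
        return _single(self.get_loop_induction_vars())

    def get_single_lower_bound(self):
        return _single(self.get_loop_lower_bounds())

    def get_single_step(self):
        return _single(self.get_loop_steps())
```

With this layering, a loop kind only has to supply the multi-dimensional getters; the single-value forms come for free.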
Spenser Bauman	0b665c3dd2	[mlir][scf] Implement conversion from scf.forall to scf.parallel (#94109 ) There is currently no path to lower scf.forall to scf.parallel with the goal of targeting the OpenMP dialect. In the SCF->ControlFlow conversion, scf.forall is briefly converted to scf.parallel, but the scf.parallel is lowered directly to a sequential loop. This makes experimenting with scf.forall for CPU execution difficult. This change factors out the rewrite in the SCF->ControlFlow pass into a utility function that can then be used in the SCF->ControlFlow lowering and via a separate -scf-forall-to-parallel pass. --------- Co-authored-by: Spenser Bauman <sabauma@fastmail>	2024-06-04 15:41:09 -04:00

359 Commits