clang-p2996

Author	SHA1	Message	Date
Matthias Springer	10056c821a	[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314 ) This commit makes reductions part of the terminator. Instead of `scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops. `scf.reduce` may contain an arbitrary number of reductions, with one region per reduction. Example: ```mlir %init = arith.constant 0.0 : f32 %r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init) -> f32, f32 { %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32> %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32> scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) { ^bb0(%lhs : f32, %rhs: f32): %res = arith.addf %lhs, %rhs : f32 scf.reduce.return %res : f32 }, { ^bb0(%lhs : f32, %rhs: f32): %res = arith.mulf %lhs, %rhs : f32 scf.reduce.return %res : f32 } } ``` `scf.reduce` operations can no longer be interleaved with other ops in the body of `scf.parallel`. This simplifies the op and makes it possible to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was not possible before because the op was not a terminator, causing the op to be DCE'd.)	2023-12-20 11:06:27 +09:00
Han-Chung Wang	899c2bed9e	[mlir][TilingInterface] Early return cloned ops if tile sizes are zeros. (#75410) It is a trivial early-return case. If the cloned ops are not returned, tiling generates `extract_slice` ops that extract the whole slice, and these are not folded away. Early-return to avoid this. E.g., ```mlir func.func @matmul_tensors( %arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %0 = linalg.matmul ins(%arg0, %arg1: tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2: tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } module attributes {transform.with_named_sequence} { transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) { %0 = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!transform.any_op) -> !transform.any_op %1 = transform.structured.tile_using_for %0 [0, 0, 0] : (!transform.any_op) -> (!transform.any_op) transform.yield } } ``` Apply the transforms and canonicalize the IR: ``` mlir-opt --transform-interpreter -canonicalize input.mlir ``` we will get ```mlir module { func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %c1 = arith.constant 1 : index %c0 = arith.constant 0 : index %dim = tensor.dim %arg0, %c0 : tensor<?x?xf32> %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32> %dim_1 = tensor.dim %arg1, %c1 : tensor<?x?xf32> %extracted_slice = tensor.extract_slice %arg0[0, 0] [%dim, %dim_0] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %extracted_slice_2 = tensor.extract_slice %arg1[0, 0] [%dim_0, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %extracted_slice_3 = tensor.extract_slice %arg2[0, 0] [%dim, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %0 = linalg.matmul ins(%extracted_slice, %extracted_slice_2 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%extracted_slice_3 : tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } } ``` The revision early-returns in this case, so we get: ```mlir func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %0 = linalg.matmul ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2 : tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } ```	2023-12-19 09:14:43 -08:00
Ivan Butygin	c0d2ea9d42	[mlir][scf] Improve `scf.parallel` fusion pass (#75852 ) Abort fusion if memref load may alias write, but not the exact alias. Add alias check hook to `naivelyFuseParallelOps`, so user can customize alias checking. Use builtin alias analysis in `ParallelLoopFusion` pass.	2023-12-19 18:07:46 +03:00
Vivian	bd6a2452ae	[mlir][SCF] Add support for peeling the first iteration out of the loop (#74015 ) There is a use case that we need to peel the first iteration out of the for loop so that the peeled forOp can be canonicalized away and the fillOp can be fused into the inner forall loop. For example, we have nested loops as below ``` linalg.fill ins(...) outs(...) scf.for %arg = %lb to %ub step %step scf.forall ... ``` After the peeling transform, it is expected to be ``` scf.forall ... linalg.fill ins(...) outs(...) scf.for %arg = %(lb + step) to %ub step %step scf.forall ... ``` This patch makes the most use of the existing peeling functions and adds support for peeling the first iteration out of the loop.	2023-12-14 17:03:52 -08:00
Keren Zhou	e66f97e8a8	[mlir] Fix loop pipelining when the operand of `yield` is not defined in the loop body (#75423 )	2023-12-13 19:19:13 -08:00
Thomas Raoux	ef112833e1	[MLIR][SCF] Add support for pipelining dynamic loops (#74350) Support loops without static boundaries. Since the number of iterations is not known, we need to predicate the prologue and epilogue in case the number of iterations is smaller than the number of stages. This patch includes work from @chengjunlu	2023-12-10 22:32:11 -08:00
Thomas Raoux	19e068b048	[MLIR][SCF] Handle more cases in pipelining transform (#74007 ) -Fix case where an op is scheduled in stage 0 and used with a distance of 1 -Fix case where we don't peel the epilogue and a value not part of the last stage is used outside the loop.	2023-12-01 21:28:21 -08:00
MaheshRavishankar	ec1086f2a0	Fix build error from #72178 (#72905 )	2023-11-20 23:09:59 -08:00
Jie Fu	3e6ae77950	[mlir] Non-void lambda does not return a value in all control paths in yieldReplacementForFusedProducer (NFC) /llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp:703:5: error: non-void lambda does not return a value in all control paths [-Werror,-Wreturn-type] }; ^ 1 error generated.	2023-11-21 09:11:50 +08:00
MaheshRavishankar	4a020018ce	[NFC] Simplify the tiling implementation using cloning. (#72178 ) The current implementation of tiling using `scf.for` is convoluted to make sure that the destination passing style of the untiled program is preserved. The addition of support to tile using `scf.forall` (adapted from the transform operation in Linalg) in https://github.com/llvm/llvm-project/pull/67083 used cloning of the tiled operations to better streamline the implementation. This PR adapts the other tiling methods to use a similar approach, making the transformations (and handling destination passing style semantics) more systematic. --------- Co-authored-by: Abhishek-Varma <avarma094@gmail.com>	2023-11-20 09:05:48 -08:00
Matthias Springer	96901f1b02	[mlir][SCF] Do not peel already peeled loops (#71900 ) Loop peeling is not beneficial if the step size already divides "ub - lb". There are currently some simple checks to prevent peeling in such cases when lb, ub, step are constants. This commit adds support for IR that is the result of loop peeling in the general case; i.e., lb, ub, step do not necessarily have to be constants. This change adds a new affine_map simplification rule for semi-affine maps that appear during loop peeling and are guaranteed to evaluate to a constant zero. Affine maps such as: ``` (1) affine_map<()[ub, step] -> ((ub - ub mod step) mod step) (2) affine_map<()[ub, lb, step] -> ((ub - (ub - lb) mod step - lb) mod step) (3) ^ may contain additional summands ``` Other affine maps with modulo expressions are not supported by the new simplification rule. This fixes #71469.	2023-11-16 11:47:57 +09:00
long.chen	1609f1c2a5	[mlir][affine][nfc] cleanup deprecated T.cast style functions (#71269) For details, see the document: https://mlir.llvm.org/deprecation/ Not all changes were made manually; most of them were made with a clang tool I wrote: https://github.com/lipracer/cpp-refactor.	2023-11-14 13:01:19 +08:00
Matthias Springer	98a6edd38f	[mlir][Interfaces] `LoopLikeOpInterface`: Expose tied loop results (#70535 ) Expose loop results, which correspond to the region iter_arg values that are returned from the loop when there are no more iterations. Exposing loop results is optional because some loops (e.g., `scf.while`) do not have a 1-to-1 mapping between region iter_args and op results. Also add additional helper functions to query tied results/iter_args/inits.	2023-11-01 08:34:14 +09:00
Matthias Springer	3cd2a0bc1a	[mlir][Interfaces] `LoopLikeOpInterface`: Add helpers to query tied inits/iter_args (#70408 ) The `LoopLikeOpInterface` already has interface methods to query inits and iter_args. This commit adds helper functions to query tied init/iter_arg pairs and removes the corresponding functions for `scf::ForOp`.	2023-10-28 12:10:36 +09:00
Justin Fargnoli	e38c8bdca8	[MLIR][scf.parallel] Don't allow a tile size of 0 (#68762 ) Fix a crash reported in #64331. The crash is described in the following comment: > It looks like the bug is being caused by the command line argument --scf-parallel-loop-tiling=parallel-loop-tile-sizes=0. More specifically, --scf-parallel-loop-tiling=parallel-loop-tile-sizes sets the tileSize variable to 0 on [this line](`7cc1bfaf37/mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp (L67)`). tileSize is then used on [this line](`7cc1bfaf37/mlir/lib/Dialect/SCF/Transforms/ParallelLoopTiling.cpp (L117)`) causing a divide by zero exception. This PR will: 1. Call `signalPassFail()` when 0 is passed as a tile size. 2. Avoid the divide by zero that causes the crash. Note: This is my first PR for MLIR, so please liberally critique it.	2023-10-23 14:28:44 -07:00
Adrian Kuegel	1c27899e24	[mlir][SCF] Pass result of getAsOpFoldResult to getBoundedTileSize. A recent change modified the parameter tileSize from Value to OpFoldResult. Therefore we should call getAsOpFoldResult before passing on the tileSize. Adjust a test regarding this new behavior.	2023-10-20 10:25:32 +00:00
MaheshRavishankar	d871daea81	[mlir][TilingInterface] Add scf::tileUsingSCFForallOp method to tile using the interface to generate `scf::forall`. (#67083 ) Similar to `scf::tileUsingSCFForOp` that is a method that tiles operations that implement the `TilingInterface`, using `scf.for` operations, this method introduces tiling of operations using `scf.forall`. Most of this implementation is derived from `linalg::tileToForallOp` method. Eventually that method will either be deprecated or moved to use the method introduced here.	2023-10-19 23:21:45 -07:00
Matthias Springer	ab737a8699	[mlir][Interfaces] `LoopLikeOpInterface`: Add helper to get yielded values (#67305 ) Add a new interface method that returns the yielded values. Also add a verifier that checks the number of inits/iter_args/yielded values. Most of the checked invariants (but not all of them) are already covered by the `RegionBranchOpInterface`, but the `LoopLikeOpInterface` now provides (additional) error messages that are easier to read.	2023-10-16 08:45:48 +09:00
Matthias Springer	8823e961f6	[mlir][ODS] Change `get...Mutable` to return `OpOperand &` for single operands (#66519 ) The TableGen code generator now generates C++ code that returns a single `OpOperand &` for `get...Mutable` of operands that are not variadic and not optional. `OpOperand::set`/`assign` can be used to set a value (same as `MutableOperandRange::assign`). This is safer than `MutableOperandRange` because only single values (and no longer `ValueRange`) can be assigned. E.g.: ``` // Assignment of multiple values to non-variadic operand. // Before: Compiles, but produces invalid op. // After: Compilation error. extractSliceOp.getSourceMutable().assign({v1, v2}); ```	2023-10-04 08:35:40 +02:00
Matthias Springer	173fd67a12	[mlir][scf][bufferize] Improve bufferization of allocs yielded from `scf.for` (#68089 ) The `BufferizableOpInterface` implementation of `scf.for` currently assumes that an OpResult does not alias with any tensor apart from the corresponding init OpOperand. Newly allocated buffers (inside of the loop) are also allowed. The current implementation checks whether the respective init_arg and yielded value are equivalent. This is overly strict and causes extra buffer allocations/copies when yielding a new buffer allocation from a loop.	2023-10-03 16:08:50 +02:00
Matthias Springer	e52899ea52	[mlir][SCF] Bufferize scf.index_switch (#67666 ) Add the `BufferizableOpInterface` implementation of `scf.index_switch`.	2023-09-28 19:05:14 +02:00
Adrian Kuegel	d2b7a8e83e	[mlir] Partial revert of `93c42299bd` This part of the change was not NFC.	2023-09-27 06:27:04 +00:00
Matthias Springer	63086d6aa0	[mlir][Interfaces] `LoopLikeOpInterface`: Add `replaceWithAdditionalYields` (#67121 ) `affine::replaceForOpWithNewYields` and `replaceLoopWithNewYields` (for "scf.for") are now interface methods and additional loop-carried variables can now be added to "scf.for"/"affine.for" uniformly. (No more `TypeSwitch` needed.) Note: `scf.while` and other loops with loop-carried variables can implement `replaceWithAdditionalYields`, but to keep this commit small, that is not done in this commit.	2023-09-27 07:53:39 +02:00
MaheshRavishankar	93c42299bd	[mlir][TilingInterface] NFC code changes separated out from introduction of `scf::tileUsingSCFForallop`. (#67081 ) This patch contains NFC changes that are precursor to the introduction of `scf::tileUsingSCFForallOp` method introduced in https://github.com/llvm/llvm-project/pull/67083.	2023-09-26 13:42:27 -07:00
Matthias Springer	0b2197b0cf	[mlir][Interfaces] Clean up `DestinationStyleOpInterface` (#67015 ) * "init" operands are specified with `MutableOperandRange` (which gives access to the underlying `OpOperand `). No more magic numbers. Remove most interface methods and make them helper functions. Only `getInitsMutable` should be implemented. * Provide separate helper functions for accessing mutable/immutable operands (`OpOperand`/`Value`, in line with #66515): `getInitsMutable` and `getInits` (same naming convention as auto-generated op accessors). `getInputOperands` was not renamed because this function cannot return a `MutableOperandRange` (because the operands are not necessarily consecutive). `OpOperandVector` is no longer needed. * The new `getDpsInits`/`getDpsInitsMutable` is more efficient than the old `getDpsInitOperands` because no `SmallVector` is created. The new functions return a range of operands. * Fix a bug in `getDpsInputOperands`: out-of-bounds operands were potentially returned.	2023-09-21 18:04:08 +02:00
Martin Erhart	ba727ac219	[mlir][bufferization][scf] Implement BufferDeallocationOpInterface for scf.reduce.return (#66886 ) This is necessary to run the new buffer deallocation pipeline as part of the sparse compiler pipeline.	2023-09-20 14:19:13 +02:00
Martin Erhart	522c1d0eea	[mlir][gpu][bufferization] Implement BufferDeallocationOpInterface for gpu.terminator (#66880 ) This is necessary to support deallocation of IR with gpu.launch operations because it does not implement the RegionBranchOpInterface. Implementing the interface would require it to support regions with unstructured control flow and produced arguments/results.	2023-09-20 12:28:28 +02:00
Matthias Springer	9b5ef2bea8	[mlir][Interfaces] `LoopLikeOpInterface`: Support ops with multiple regions (#66754 ) This commit implements `LoopLikeOpInterface` on `scf.while`. This enables LICM (and potentially other transforms) on `scf.while`. `LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and can now return multiple regions. Also fix a bug in the default implementation of `LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false" for some values that are defined outside of the loop (in a nested op, in such a way that the value does not dominate the loop). This interface is currently only used for LICM and there is no way to trigger this bug, so no test is added.	2023-09-19 17:35:38 +02:00
Matthias Springer	d69293c1c8	[mlir][SCF] `ForOp`: Remove `getIterArgNumberForOpOperand` (#66629 ) This function was inconsistent with the remaining API because it accepted `OpOperand &` that do not belong to the op. All the other functions assert. This helper function is also not really necessary, as the iter_arg number is identical to the result number.	2023-09-19 17:33:40 +02:00
Matthias Springer	6923a31542	[mlir][IR] Change `MutableOperandRange` to enumerate `OpOperand &` (#66622) In line with #66515, change `MutableOperandRange::begin`/`end` to enumerate `OpOperand &` instead of `Value`. Also remove `ForOp::getIterOpOperands`/`setIterArg`, which are now redundant. Note: `MutableOperandRange` cannot be made a derived class of `indexed_accessor_range_base` (like `OperandRange`), because `MutableOperandRange::assign` can change the number of operands in the range.	2023-09-19 09:09:21 +02:00
MaheshRavishankar	170a25a793	[mlir][TilingInterface] Make the tiling set tile sizes function use `OpFoldResult`. (#66566 )	2023-09-18 17:18:51 -07:00
Martin Erhart	6bf043e743	[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute (#66619 ) This commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs as this should be entirely handled by the ownership-based-buffer-deallocation pass going forward. This means the `allow-return-allocs` pass option will default to true now, `create-deallocs` defaults to false and they, as well as the escape attribute indicating whether a memref escapes the current region, will be removed. A new `allow-return-allocs-from-loops` option is added as a temporary workaround for some bufferization limitations.	2023-09-18 16:44:48 +02:00
Matthias Springer	0f952cfe24	[mlir][IR] Change `MutableOperandRange::operator[]` to return an `OpOperand &` (#66515 ) `operator[]` returns `OpOperand &` instead of `Value`. * This allows users to get OpOperands by name instead of "magic" number. E.g., `extractSliceOp->getOpOperand(0)` can be written as `extractSliceOp.getSourceMutable()[0]`. * `OperandRange` provides a read-only API to operands: `operator[]` returns `Value`. `MutableOperandRange` now provides a mutable API: `operator[]` returns `OpOperand &`, which can be used to set operands. Note: The TableGen code generator could be changed to return `OpOperand &` (instead of `MutableOperandRange`) for non-variadic and non-optional arguments in a subsequent change. Then the `[0]` part in the above example would no longer be necessary.	2023-09-18 09:43:03 +02:00
Matthias Springer	5cf714bb2f	[mlir][SCF] scf.for: Consistent API around `initArgs` (#66512 ) * Always use the auto-generated `getInitArgs` function. Remove the hand-written `getInitOperands` duplicate. * Remove `hasIterOperands` and `getNumIterOperands`. The names were inconsistent because the "arg" is called `initArgs` in TableGen. Use `getInitArgs().size()` instead. * Fix verification around ops with no results.	2023-09-18 09:13:43 +02:00
Christopher Bate	e2d39f799b	[mlir][Transform] Add `updateConversionTarget` to `ConversionPatternDescriptorOpInterface` This change adds a method to modify the ConversionTarget used during `transform.apply_conversion_patterns` to the `ConversionPatternDescriptorOpInterface`. This is needed when the TypeConverter is used to dictate the dynamic legality of operations, as in "structural" conversion patterns present in, for example, the SCF and func dialects. As a first use case/test, this change also adds a `transform.apply_patterns.scf.structural_conversions` operation to the SCF dialect. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D158672	2023-09-14 11:39:47 -06:00
Martin Erhart	66aa9a2517	[mlir][bufferization] Implement BufferDeallocationOpInterface for scf.forall.in_parallel (#66351) The scf.forall.in_parallel terminator operation has a nested graph region with the NoTerminator trait. Such regions are not supported by the default implementations. Therefore, this commit adds a specialized implementation for this operation which only covers the case where the nested region is empty. This is because after bufferization, ops like tensor.parallel_insert_slice were already converted to memref operations residing in the scf.forall only, and the nested region of scf.forall.in_parallel ends up empty.	2023-09-14 16:20:24 +02:00
Martin Erhart	c199f7dc62	Revert "[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute" This reverts commit `6a91dfedeb`. This caused problems in downstream projects. We are reverting to give them more time for integration.	2023-09-13 13:53:48 +00:00
Martin Erhart	ccb16acd46	Revert "[mlir][bufferization] Implement BufferDeallocationOpInterface for scf.forall.in_parallel" This reverts commit `1356e853d4`. This caused problems in downstream projects. We are reverting to give them more time for integration.	2023-09-13 13:53:47 +00:00
Martin Erhart	1356e853d4	[mlir][bufferization] Implement BufferDeallocationOpInterface for scf.forall.in_parallel The scf.forall.in_parallel terminator operation has a nested graph region with the NoTerminator trait. Such regions are not supported by the default implementations. Therefore, this commit adds a specialized implementation for this operation which only covers the case where the nested region is empty. This is because after bufferization, ops like tensor.parallel_insert_slice were already converted to memref operations residing in the scf.forall only, and the nested region of scf.forall.in_parallel ends up empty. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D158979	2023-09-13 09:30:24 +00:00
Martin Erhart	6a91dfedeb	[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute This is the first commit in a series with the goal to rework the BufferDeallocation pass. Currently, this pass heavily relies on copies to perform correct deallocations, which leads to very slow code and potentially high memory usage. Additionally, there are unsupported cases such as returning memrefs, which this series of commits aims to add support for as well. This first commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs as this should be entirely handled by the buffer-deallocation pass going forward. This means the allow-return-allocs pass option will default to true now, create-deallocs defaults to false and they, as well as the escape attribute indicating whether a memref escapes the current region, will be removed. The documentation w.r.t. these pass option changes should also be updated in this commit. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D156662	2023-09-13 09:30:22 +00:00
Matthias Springer	1e1a3112f1	[mlir][bufferization] Privatize buffers for parallel regions One-Shot Bufferize correctly handles RaW conflicts around repetitive regions (loops). Special handling is needed for parallel regions. These are a special kind of repetitive regions that can have additional RaW conflicts that would not be present if the regions were executed sequentially. Example: ``` %0 = bufferization.alloc_tensor() scf.forall ... { %1 = linalg.fill ins(...) outs(%0) ... scf.forall.in_parallel { tensor.parallel_insert_slice %1 into ... } } ``` A separate (private) buffer must be allocated for each iteration of the `scf.forall` loop. This change adds a new interface method to `BufferizableOpInterface` to detect parallel regions. By default, regions are assumed to be sequential. A buffer is privatized if an OpOperand bufferizes to a memory read inside a parallel region that is different from the parallel region where the operand's value is defined. Differential Revision: https://reviews.llvm.org/D159286	2023-09-06 14:28:43 +02:00
Matthias Springer	6ecebb496c	[mlir][bufferization] Support unstructured control flow This revision adds support for unstructured control flow to the bufferization infrastructure. In particular: regions with multiple blocks, `cf.br`, `cf.cond_br`. Two helper templates are added to `BufferizableOpInterface.h`, which can be implemented by ops that support unstructured control flow in their regions (e.g., `func.func`) and ops that branch to another block (e.g., `cf.br`). A block signature is always bufferized together with the op that owns the block. Differential Revision: https://reviews.llvm.org/D158094	2023-08-31 12:55:53 +02:00
Groverkss	2cc5f5d43c	[mlir][Linalg] Implement tileReductionUsingScf for multiple reductions This patch improves the reduction tiling for linalg to support multiple reduction dimensions. Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D158005	2023-08-17 02:17:03 +05:30
Matthias Springer	878950b82c	[mlir][bufferization] Simplify `getBufferType` `getBufferType` computes the bufferized type of an SSA value without bufferizing any IR. This is useful for predicting the bufferized type of iter_args of a loop. To avoid endless recursion (e.g., in the case of "scf.for", the type of the iter_arg depends on the type of init_arg and the type of the yielded value; the type of the yielded value depends on the type of the iter_arg again), `fixedTypes` was used to fall back to "fixed" type. A simpler way is to maintain an "invocation stack". `getBufferType` implementations can then inspect the invocation stack to detect repetitive computations (typically when computing the bufferized type of a block argument). Also improve error messages in case of inconsistent memory spaces inside of a loop. Differential Revision: https://reviews.llvm.org/D158060	2023-08-16 15:02:07 +02:00
Matthias Springer	a02ad6c177	[mlir][bufferization] Generalize getAliasingOpResults to getAliasingValues This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops. This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block. If an op does not implement the `BufferizableOpInterface`, the analysis is conservative. It previously assumed that an OpOperand may alias with each OpResult. It now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block. Differential Revision: https://reviews.llvm.org/D157957	2023-08-15 15:02:47 +02:00
Matthias Springer	7c74a2507c	[mlir][SCF][NFC] Add helper functions to get body of scf.while Add two new helper functions `getBeforeBody` and `getAfterBody` to be consistent with "scf.for" (`getBody`) and to show in the API that both regions have exactly one block. Also simplify some code that assumed that there can be more than one block in a region. Differential Revision: https://reviews.llvm.org/D157860	2023-08-14 14:57:09 +02:00
Alex Zinenko	4a6b31b8d8	[mlir] NFC: untangle SCF Patterns.h and Transforms.h These two headers both contained a strange mix of definitions related to both patterns and non-pattern transforms. Put patterns and "populate" functions into Patterns.h and standalone transforms into Transforms.h. Depends On: D155223 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D155454	2023-07-18 11:27:36 +00:00
Alex Zinenko	371366ce27	[mlir][nvgpu] add simple pipelining for shared memory copies Add a simple transform operation to the NVGPU extension that performs software pipelining of copies to shared memory. The functionality is extremely minimalistic in this version and only supports copies from global to shared memory inside an `scf.for` loop with either `vector.transfer` or `nvgpu.device_async_copy` operations when pipelining preconditions are already satisfied in the IR. This is the minimally useful version that uses the more general loop pipeliner in an NVGPU-specific way. Further extensions and orthogonalizations will be necessary. This required a change to the loop pipeliner itself to properly propagate errors should the predicate generator fail. This is loosely inspired by the version in IREE, but has fewer unsafe assumptions and a more principled way of communicating decisions. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D155223	2023-07-17 14:29:12 +00:00
Spenser Bauman	272cf8f7b2	[mlir] Implement one-to-n structural conversion for ForOp Add the missing one-to-n structural type conversion pattern for the scf.for operation. Reviewed By: ingomueller-net Differential Revision: https://reviews.llvm.org/D154299	2023-07-07 15:50:54 +00:00
Ingo Müller	c000b403fc	[mlir] Avoid unnecessary copies in SCF's OneToNTypeConversions. (NFC) In two places, a ResultRange was copied into a SmallVector just to be passed as a ValueRange argument. With this patch, the ResultRanges are passed directly, avoiding a copy. Reviewed By: ingomueller-net Differential Revision: https://reviews.llvm.org/D154685	2023-07-07 09:15:30 +00:00
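
Note on the `96901f1b02` entry (affine-map simplification for already peeled loops): the two map shapes listed there can be checked numerically. The sketch below is only an illustration, not code from the commit. The function names are hypothetical, and it assumes that Python's `%` agrees with the affine `mod` for a positive `step` (both give a non-negative remainder).

```python
# Illustrative check only; names are hypothetical and not taken from MLIR.

def peeled_upper_bound(lb: int, ub: int, step: int) -> int:
    """Split point produced by peeling: ub minus the leftover partial step."""
    return ub - (ub - lb) % step

def map1(ub: int, step: int) -> int:
    """Shape (1): (ub - ub mod step) mod step."""
    return (ub - ub % step) % step

def map2(ub: int, lb: int, step: int) -> int:
    """Shape (2): (ub - (ub - lb) mod step - lb) mod step."""
    return (ub - (ub - lb) % step - lb) % step

def all_zero(max_step: int = 8, lo: int = -6, hi: int = 30) -> bool:
    """Brute-force both shapes over a small range of lb, ub and positive step."""
    for step in range(1, max_step + 1):
        for lb in range(lo, hi):
            for ub in range(lo, hi):
                if map1(ub, step) != 0 or map2(ub, lb, step) != 0:
                    return False
    return True

if __name__ == "__main__":
    # Both shapes evaluate to zero for every sampled input.
    print(all_zero())
```

Both expressions subtract exactly the remainder that makes the trip count non-divisible, so what is left is a multiple of `step`. This is why a second peeling attempt on the main loop has nothing left to peel.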

287 Commits