clang-p2996

Author	SHA1	Message	Date
Kunwar Grover	1004865f1c	[mlir][Vector] Support 0-d vectors natively in TransferOpReduceRank (#112907 ) Since `ddf2d62c7d` , 0-d vectors are supported in VectorType. This patch removes 0-d vector handling with scalars for the TransferOpReduceRank pattern. This pattern specifically introduces tensor.extract_slice during vectorization, causing vectorization to not fold transfer_read/transfer_write slices properly. The changes in vectorization test files reflect this. There are other places where lowering patterns are still side-stepping from handling 0-d vectors properly, by turning them into scalars, but this patch only focuses on the vector.transfer_x patterns.	2024-10-22 15:50:16 +01:00
Andrzej Warzyński	56d6b56739	[mlir][vector] Relax the requirements on broadcast dims (#99341 ) NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute was made mandatory. This PR updates the semantics of the `in_bounds` attribute so that broadcast dimensions are no longer required to be "in bounds". Specifically, these xfer_read/xfer_write Ops become valid after this change: ```mlir %read = vector.transfer_read %A[%base1, %base2], %pad {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>} {permutation_map = affine_map<(d0, d1) -> (0)>} : memref<?x?xf32>, vector<9xf32> vector.transfer_write %vec, %A[%base1, %base2], {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>} {permutation_map = affine_map<(d0, d1) -> (0)>} : vector<9xf32>, memref<?x?xf32> ``` Note that the value `false` merely means "may run out-of-bounds", i.e., the corresponding access can still be "in bounds". In fact, the folder for xfer Ops is also updated () and will update the attribute value corresponding to broadcast dims to `true` if all non-broadcast dims are marked as "in bounds". Note that this PR doesn't change any of the lowerings. The changes in "SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple reverts of recent changes in #97049. Those were only meant to facilitate making `in_bounds` mandatory and to work around the extra requirements for broadcast dims (those requirements ere removed in this PR). All changes in tests are also reverts of changes from #97049. For context, here's a PR in which "broadcast" dims where forced to always be "in-bounds": https://reviews.llvm.org/D102566 (*) See `foldTransferInBoundsAttribute`.	2024-10-04 07:41:20 +01:00
Maciej Gabka	95d2d1cba0	Move stepvector intrinsic out of experimental namespace (#98043 ) This patch is moving out stepvector intrinsic from the experimental namespace. This intrinsic exists in LLVM for several years now, and is widely used.	2024-08-28 12:48:20 +01:00
Benjamin Maxwell	27a713f5b0	[mlir][vector] Add scalable lowering for `transfer_write(transpose)` (#101353 ) This specifically handles the case of a transpose from a vector type like `vector<8x[4]xf32>` to `vector<[4]x8xf32>`. Such transposes occur fairly frequently when scalably vectorizing `linalg.generic`s. There is no direct lowering for these (as types like `vector<[4]x8xf32>` cannot be represented in LLVM-IR). However, if the only use of the transpose is a write, then it is possible to lower the `transfer_write(transpose)` as a VLA loop. Example: ```mlir %transpose = vector.transpose %vec, [1, 0] : vector<4x[4]xf32> to vector<[4]x4xf32> vector.transfer_write %transpose, %dest[%i, %j] {in_bounds = [true, true]} : vector<[4]x4xf32>, memref<?x?xf32> ``` Becomes: ```mlir %c1 = arith.constant 1 : index %c4 = arith.constant 4 : index %c0 = arith.constant 0 : index %0 = vector.extract %arg0[0] : vector<[4]xf32> from vector<4x[4]xf32> %1 = vector.extract %arg0[1] : vector<[4]xf32> from vector<4x[4]xf32> %2 = vector.extract %arg0[2] : vector<[4]xf32> from vector<4x[4]xf32> %3 = vector.extract %arg0[3] : vector<[4]xf32> from vector<4x[4]xf32> %vscale = vector.vscale %c4_vscale = arith.muli %vscale, %c4 : index scf.for %idx = %c0 to %c4_vscale step %c1 { %4 = vector.extract %0[%idx] : f32 from vector<[4]xf32> %5 = vector.extract %1[%idx] : f32 from vector<[4]xf32> %6 = vector.extract %2[%idx] : f32 from vector<[4]xf32> %7 = vector.extract %3[%idx] : f32 from vector<[4]xf32> %slice_i = affine.apply #map(%idx)[%i] %slice = vector.from_elements %4, %5, %6, %7 : vector<4xf32> vector.transfer_write %slice, %arg1[%slice_i, %j] {in_bounds = [true]} : vector<4xf32>, memref<?x?xf32> } ```	2024-08-12 16:31:03 +01:00
Andrzej Warzyński	2ee5586ac7	[mlir][vector] Make the in_bounds attribute mandatory (#97049 ) At the moment, the in_bounds attribute has two confusing/contradicting properties: 1. It is both optional _and_ has an effective default-value. 2. The default value is "out-of-bounds" for non-broadcast dims, and "in-bounds" for broadcast dims. (see the `isDimInBounds` vector interface method for an example of this "default" behaviour [1]). This PR aims to clarify the logic surrounding the `in_bounds` attribute by: * making the attribute mandatory (i.e. it is always present), * always setting the default value to "out of bounds" (that's consistent with the current behaviour for the most common cases). #### Broadcast dimensions in tests As per [2], the broadcast dimensions requires the corresponding `in_bounds` attribute to be `true`: ``` vector.transfer_read op requires broadcast dimensions to be in-bounds ``` The changes in this PR mean that we can no longer rely on the default value in cases like the following (dim 0 is a broadcast dim): ```mlir %read = vector.transfer_read %A[%base1, %base2], %f, %mask {permutation_map = affine_map<(d0, d1) -> (0, d1)>} : memref<?x?xf32>, vector<4x9xf32> ``` Instead, the broadcast dimension has to explicitly be marked as "in bounds: ```mlir %read = vector.transfer_read %A[%base1, %base2], %f, %mask {in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>} : memref<?x?xf32>, vector<4x9xf32> ``` All tests with broadcast dims are updated accordingly. #### Changes in "SuperVectorize.cpp" and "Vectorization.cpp" The following patterns in "Vectorization.cpp" are updated to explicitly set the `in_bounds` attribute to `false`: * `LinalgCopyVTRForwardingPattern` and `LinalgCopyVTWForwardingPattern` Also, `vectorizeAffineLoad` (from "SuperVectorize.cpp") and `vectorizeAsLinalgGeneric` (from "Vectorization.cpp") are updated to make sure that xfer Ops created by these hooks set the dimension corresponding to broadcast dims as "in bounds". Otherwise, the Op verifier would complain Note that there is no mechanism to verify whether the corresponding memory access are indeed in bounds. Still, this is consistent with the current behaviour where the broadcast dim would be implicitly assumed to be "in bounds". [1] `4145ad2bac/mlir/include/mlir/Interfaces/VectorInterfaces.td (L243-L246)` [2] https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop	2024-07-16 16:49:52 +01:00
Matthias Springer	bb6d5c2200	[mlir][Transforms] `GreedyPatternRewriteDriver`: Do not CSE constants during iterations (#75897 ) The `GreedyPatternRewriteDriver` tries to iteratively fold ops and apply rewrite patterns to ops. It has special handling for constants: they are CSE'd and sometimes moved to parent regions to allow for additional CSE'ing. This happens in `OperationFolder`. To allow for efficient CSE'ing, `OperationFolder` maintains an internal lookup data structure to find the existing constant ops with the same value for each `IsolatedFromAbove` region: ```c++ /// A mapping between an insertion region and the constants that have been /// created within it. DenseMap<Region *, ConstantMap> foldScopes; ``` Rewrite patterns are allowed to modify operations. In particular, they may move operations (including constants) from one region to another one. Such an IR rewrite can make the above lookup data structure inconsistent. We encountered such a bug in a downstream project. This bug materialized in the form of an op that uses the result of a constant op from a different `IsolatedFromAbove` region (that is not accessible). This commit changes the behavior of the `GreedyPatternRewriteDriver` such that `OperationFolder` is used to CSE constants at the beginning of each iteration (as the worklist is populated), but no longer during an iteration. `OperationFolder` is no longer used after populating the worklist, so we do not have to care about inconsistent state in the `OperationFolder` due to IR rewrites. The `GreedyPatternRewriteDriver` now performs the op folding by itself instead of calling `OperationFolder::tryToFold`. This change changes the order of constant ops in test cases, but not the region in which they appear. All broken test cases were fixed by turning `CHECK` into `CHECK-DAG`. Alternatives considered: The state of `OperationFolder` could be partially invalidated with every `notifyOperationModified` notification. That is more fragile than the solution in this commit because incorrect rewriter API usage can lead to missing notifications and hard-to-debug `IsolatedFromAbove` violations. (It did not fix the above mention bug in a downstream project, which could be due to incorrect rewriter API usage or due to another conceptual problem that I missed.) Moreover, ops are frequently getting modified during a greedy pattern rewrite, so we would likely keep invalidating large parts of the state of `OperationFolder` over and over. Migration guide: Turn `CHECK` into `CHECK-DAG` in test cases. Constant ops are no longer folded during a greedy pattern rewrite. If you rely on folding (and rematerialization) of constant ops during a greedy pattern rewrite, turn the folder into a pattern.	2024-01-05 09:22:18 +01:00
Rik Huijzer	6b21948f26	[mlir][vector] Fix invalid `LoadOp` indices being created (#76292 ) Fixes https://github.com/llvm/llvm-project/issues/71326. This is the second PR. The first PR at https://github.com/llvm/llvm-project/pull/75519 was reverted because an integration test failed. The failed integration test was simplified and added to the core MLIR tests. Compared to the first PR, the current PR uses a more reliable approach. In summary, the current PR determines the mask indices by looking up the _mask_ buffer load indices from the previous iteration, whereas `main` looks up the indices for the _data_ buffer. The mask and data indices can differ when using a `permutation_map`. The cause of the issue was that a new `LoadOp` was created which looked something like: ```mlir func.func main(%arg1 : index, %arg2 : index) { %alloca_0 = memref.alloca() : memref<vector<1x32xi1>> %1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>> %2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>> return } ``` which crashed inside the `LoadOp::verify`. Note here that `%alloca_0` is the mask as can be seen from the `i1` element type and note it is 0 dimensional. Next, `%1` has one dimension, but `memref.load` tries to index it with two indices. This issue occured in the following code (a simplified version of the bug report): ```mlir #map1 = affine_map<(d0, d1, d2, d3) -> (d0, 0, 0, d3)> func.func @main(%subview: memref<1x1x1x1xi32>, %mask: vector<1x1xi1>) -> vector<1x1x1x1xi32> { %c0 = arith.constant 0 : index %c0_i32 = arith.constant 0 : i32 %3 = vector.transfer_read %subview[%c0, %c0, %c0, %c0], %c0_i32, %mask {permutation_map = #map1} : memref<1x1x1x1xi32>, vector<1x1x1x1xi32> return %3 : vector<1x1x1x1xi32> } ``` After this patch, it is lowered to the following by `-convert-vector-to-scf`: ```mlir func.func @main(%arg0: memref<1x1x1x1xi32>, %arg1: vector<1x1xi1>) -> vector<1x1x1x1xi32> { %c0_i32 = arith.constant 0 : i32 %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %alloca = memref.alloca() : memref<vector<1x1x1x1xi32>> %alloca_0 = memref.alloca() : memref<vector<1x1xi1>> memref.store %arg1, %alloca_0[] : memref<vector<1x1xi1>> %0 = vector.type_cast %alloca : memref<vector<1x1x1x1xi32>> to memref<1xvector<1x1x1xi32>> %1 = vector.type_cast %alloca_0 : memref<vector<1x1xi1>> to memref<1xvector<1xi1>> scf.for %arg2 = %c0 to %c1 step %c1 { %3 = vector.type_cast %0 : memref<1xvector<1x1x1xi32>> to memref<1x1xvector<1x1xi32>> scf.for %arg3 = %c0 to %c1 step %c1 { %4 = vector.type_cast %3 : memref<1x1xvector<1x1xi32>> to memref<1x1x1xvector<1xi32>> scf.for %arg4 = %c0 to %c1 step %c1 { %5 = memref.load %1[%arg2] : memref<1xvector<1xi1>> %6 = vector.transfer_read %arg0[%arg2, %c0, %c0, %c0], %c0_i32, %5 {in_bounds = [true]} : memref<1x1x1x1xi32>, vector<1xi32> memref.store %6, %4[%arg2, %arg3, %arg4] : memref<1x1x1xvector<1xi32>> } } } %2 = memref.load %alloca[] : memref<vector<1x1x1x1xi32>> return %2 : vector<1x1x1x1xi32> } ``` What was causing the problems is that one dimension of the data buffer `%alloca` (eltype `i32`) is unpacked (`vector.type_cast`) inside the outmost loop (loop with index variable `%arg2`) and the nested loop (loop with index variable `%arg3`), whereas the mask buffer `%alloca_0` (eltype `i1`) is not unpacked in these loops. Before this patch, the load indices would be determined by looking up the load indices for the data buffer load op. However, as shown in the specific example, when a permutation map is specified then the load indices from the data buffer load op start to differ from the indices for the mask op. To fix this, this patch ensures that the load indices for the mask buffer are used instead. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-01-03 13:46:52 +01:00
Rik Huijzer	9f5afc3de9	Revert "[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 )" This reverts commit `3a1ae2f46d`.	2023-12-17 12:34:17 +01:00
Rik Huijzer	3a1ae2f46d	[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 ) Fixes https://github.com/llvm/llvm-project/issues/71326. The cause of the issue was that a new `LoadOp` was created which looked something like: ```mlir %arg4 = func.func main(%arg1 : index, %arg2 : index) { %alloca_0 = memref.alloca() : memref<vector<1x32xi1>> %1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>> %2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>> return } ``` which crashed inside the `LoadOp::verify`. Note here that `%alloca_0` is 0 dimensional, `%1` has one dimension, but `memref.load` tries to index `%1` with two indices. This is now fixed by using the fact that `unpackOneDim` always unpacks one dim `1bce61e6b0/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (L897-L903)` and so the `loadOp` should just index only one dimension. --------- Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>	2023-12-17 11:42:35 +01:00
Rik Huijzer	c84061fd34	[mlir][vector] Fix a `target-rank=0` unrolling (#73365 ) Fixes https://github.com/llvm/llvm-project/issues/64269. With this patch, calling `mlir-opt "-convert-vector-to-scf=full-unroll target-rank=0"` on ```mlir func.func @main(%vec : vector<2xi32>) { %alloc = memref.alloc() : memref<4xi32> %c0 = arith.constant 0 : index vector.transfer_write %vec, %alloc[%c0] : vector<2xi32>, memref<4xi32> return } ``` will result in ```mlir module { func.func @main(%arg0: vector<2xi32>) { %c0 = arith.constant 0 : index %c1 = arith.constant 1 : index %alloc = memref.alloc() : memref<4xi32> %0 = vector.extract %arg0[0] : i32 from vector<2xi32> %1 = vector.broadcast %0 : i32 to vector<i32> vector.transfer_write %1, %alloc[%c0] : vector<i32>, memref<4xi32> %2 = vector.extract %arg0[1] : i32 from vector<2xi32> %3 = vector.broadcast %2 : i32 to vector<i32> vector.transfer_write %3, %alloc[%c1] : vector<i32>, memref<4xi32> return } } ``` I've also tried to proactively find other `target-rank=0` bugs, but couldn't find any. `options.targetRank` is only used 8 times throughout the `mlir` folder, all inside `VectorToSCF.cpp`. None of the other uses look like they could cause a crash. I've also tried ```mlir func.func @main(%vec : vector<2xi32>) -> vector<2xi32> { %alloc = memref.alloc() : memref<4xindex> %c0 = arith.constant 0 : index %out = vector.transfer_read %alloc[%c0], %c0 : memref<4xindex>, vector<2xi32> return %out : vector<2xi32> } ``` with `"--convert-vector-to-scf=full-unroll target-rank=0"` and that also didn't crash. (Maybe obvious. I have to admit that I'm not very familiar with these ops.)	2023-11-30 13:29:09 +01:00
Benjamin Maxwell	ced97ffd08	[mlir][Vector] Don't fully unroll transfer_writes of n-D scalable vectors (#71924 ) It is not possible to unroll a scalable vector at compile time. This currently prevents transfer_writes from being lowered to arm_sme.tile_writes (downstream).	2023-11-13 11:22:55 +00:00
Cullen Rhodes	9816edc9f3	[mlir][vector] add result type to vector.extract assembly format (#66499 ) The vector.extract assembly format currently only contains the source type, for example: %1 = vector.extract %0[1] : vector<3x7x8xf32> it's not immediately obvious if this is the source or result type. This patch improves the assembly format to make this clearer, so the above becomes: %1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>	2023-09-28 11:11:16 +01:00
Benjamin Maxwell	2a82dfd704	[mlir][VectorOps] Don't drop scalable dims when lowering transfer_reads/writes (in VectorToSCF) This allows the lowering of > rank 1 transfer_reads/writes to equivalent lower-rank ones when the trailing dimension is scalable. The resulting ops still cannot be completely lowered as they depend on arrays of scalable vectors being enabled, and a few related fixes (see D158517). This patch also explicitly disables lowering transfer_reads/writes with a leading scalable dimension, as more changes would be needed to handle that correctly and it is unclear if it is required. Examples of ops that can now be further lowered: %vec = vector.transfer_read %arg0[%c0, %c0], %cst, %mask {in_bounds = [true, true]} : memref<3x?xf32>, vector<3x[4]xf32> vector.transfer_write %vec, %arg0[%c0, %c0], %mask {in_bounds = [true, true]} : vector<3x[4]xf32>, memref<3x?xf32> Reviewed By: c-rhodes, awarzynski, dcaballe Differential Revision: https://reviews.llvm.org/D158753	2023-09-08 09:43:17 +00:00
Benjamin Maxwell	f36e909da0	[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors Reland of the original patch after updating the Python binding tests, a few CUDA/GPU MLIR tests, and ensuring the assembly format is round-trippable. This patch splits the lowering of vector.print into first converting an n-D print into a loop of scalar prints of the elements, then a second pass that converts those scalar prints into the runtime calls. The former is done in VectorToSCF and the latter in VectorToLLVM. The main reason for this is to allow printing scalable vector types, which are not possible to fully unroll at compile time, though this also avoids fully unrolling very large vectors. To allow VectorToSCF to add the necessary punctuation between vectors and elements, a "punctuation" attribute has been added to vector.print. This abstracts calling the runtime functions such as printNewline(), without leaking the LLVM details into the higher abstraction levels. For example: vector.print punctuation <comma> lowers to llvm.call @printComma() : () -> () The output format and runtime functions remain the same, which avoids the need to alter a large number of tests (aside from the pipelines). Reviewed By: awarzynski, c-rhodes, aartbik Differential Revision: https://reviews.llvm.org/D156519	2023-08-11 09:29:54 +00:00
Mehdi Amini	1b272d21c8	Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors" This reverts commit `490dae26cb`. Bot is broken, seems like there is a problem of ambiguity in the parser.	2023-08-09 19:37:01 -07:00
Benjamin Maxwell	490dae26cb	[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors Reland of the original patch after updating the Python binding tests and a few CUDA/GPU MLIR tests. This patch splits the lowering of vector.print into first converting an n-D print into a loop of scalar prints of the elements, then a second pass that converts those scalar prints into the runtime calls. The former is done in VectorToSCF and the latter in VectorToLLVM. The main reason for this is to allow printing scalable vector types, which are not possible to fully unroll at compile time, though this also avoids fully unrolling very large vectors. To allow VectorToSCF to add the necessary punctuation between vectors and elements, a "punctuation" attribute has been added to vector.print. This abstracts calling the runtime functions such as printNewline(), without leaking the LLVM details into the higher abstraction levels. For example: vector.print <comma> lowers to llvm.call @printComma() : () -> () The output format and runtime functions remain the same, which avoids the need to alter a large number of tests (aside from the pipelines). Reviewed By: awarzynski, c-rhodes, aartbik Differential Revision: https://reviews.llvm.org/D156519	2023-08-09 11:47:18 +00:00
Benjamin Maxwell	b160442dd2	Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors" This reverts commit `3875804a07`. This caused some test failures for the MLIR python bindings. Reverting until those are addressed.	2023-08-09 09:54:05 +00:00
Benjamin Maxwell	3875804a07	[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors This patch splits the lowering of vector.print into first converting an n-D print into a loop of scalar prints of the elements, then a second pass that converts those scalar prints into the runtime calls. The former is done in VectorToSCF and the latter in VectorToLLVM. The main reason for this is to allow printing scalable vector types, which are not possible to fully unroll at compile time, though this also avoids fully unrolling very large vectors. To allow VectorToSCF to add the necessary punctuation between vectors and elements, a "punctuation" attribute has been added to vector.print. This abstracts calling the runtime functions such as printNewline(), without leaking the LLVM details into the higher abstraction levels. For example: vector.print <comma> lowers to llvm.call @printComma() : () -> () The output format and runtime functions remain the same, which avoids the need to alter a large number of tests (aside from the pipelines). Reviewed By: awarzynski, c-rhodes, aartbik Differential Revision: https://reviews.llvm.org/D156519	2023-08-09 09:38:05 +00:00
Matthias Springer	6040044f2f	[mlir][vector] VectorToSCF: Omit redundant out-of-bounds check There was a bug in `TransferWriteNonPermutationLowering`, a pattern that extends the permutation map of a TransferWriteOp with leading transfer dimensions of size ones. These newly added transfer dimensions are always in-bounds, because the starting point of any dimension is in-bounds. VectorToSCF inserts out-of-bounds checks based on the "in_bounds" attribute and dims that are marked as out-of-bounds but that are actually always in-bounds lead to unnecessary "scf.if" ops. Differential Revision: https://reviews.llvm.org/D155196	2023-07-14 09:50:37 +02:00
Andrzej Warzynski	5cebffc276	[mlir][Vector] Update the lowering of `vector.transfer_write` to SCF This change updates the lowering of `vector.transfer_write` to SCF when scalable vectors are used. Specifically, when lowering `vector.transfer_write` to a loop of `vector.extractelement` ops, make sure that the upper bound of the generated loop is scaled by `vector.vscale`: ``` %10 = vector.vscale %11 = arith.muli %10, %c16 : index scf.for %arg2 = %c0 to %11 step %c1 ``` For reference, this is the current version (i.e. before this change): ``` scf.for %arg2 = %c0 to %c16 step %c1 ``` Note that this only valid for fixed-width vectors. Differential Revision: https://reviews.llvm.org/D154226	2023-06-30 20:14:47 +01:00
Kai Sasaki	a795955f52	[mlir] Register tensor dialect for transfer_read conversion Make sure to register tensor dialect as tensor.transfer_read can be dependent on its parameter. It resolves the issue causing the unregisterd dialect error to convert vector to SCF reported https://github.com/llvm/llvm-project/issues/60197. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D142866	2023-02-01 09:17:08 +09:00
Thomas Raoux	435905ecf2	[mlir][vector] Add extra lowering for more transfer_write maps Add pattern to lower transfer_write with permutation map that are not permutation of minor identity map. Differential Revision: https://reviews.llvm.org/D141815	2023-01-17 17:06:00 +00:00
Diego Caballero	eb7e2998d1	Reland "[mlir][Vector] Re-define masking semantics in vector.transfer ops"" This relands commit `847b5f82a4`. Differential Revision: https://reviews.llvm.org/D138079	2022-11-29 03:36:54 +00:00
Diego Caballero	f6d90055fd	[mlir][Vector] Remove 'lower-permutation-maps' option from VectorToSCF This patch is part of a larger simplification effort of vector transfer operations. It removes the flag `lower-permutation-maps` from VectorToSCF conversion and enables the lowering of permutation maps by default. This means that VectorToSCF will always lower permutation maps to independent broadcast/transpose operations before lowering vector operations to SCF. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D138742	2022-11-28 23:56:43 +00:00
Diego Caballero	847b5f82a4	Revert "[mlir][Vector] Re-define masking semantics in vector.transfer ops" This reverts commit `6c59c5cd08`.	2022-11-18 01:18:11 +00:00
Diego Caballero	6c59c5cd08	[mlir][Vector] Re-define masking semantics in vector.transfer ops Masking hasn't been widely used in vector transfer ops and the semantics of the mask operand were a bit loose. This patch states that the mask operand in a vector transfer op is applied to the read/write part of the operation and, therefore, its shape should match the shape of the elements read/written from/into the memref/tensor regardless of any permutation/broadcasting also applied by the transfer operation. Reviewers: nicolasvasilache Differential Revision: https://reviews.llvm.org/D138079	2022-11-18 01:05:42 +00:00
rkayaith	13bd410962	[mlir][Pass] Include anchor op in -pass-pipeline In D134622 the printed form of a pass manager is changed to include the name of the op that the pass manager is anchored on. This updates the `-pass-pipeline` argument format to include the anchor op as well, so that the printed form of a pipeline can be directly passed to `-pass-pipeline`. In most cases this requires updating `-pass-pipeline='pipeline'` to `-pass-pipeline='builtin.module(pipeline)'`. This also fixes an outdated assert that prevented running a `PassManager` anchored on `'any'`. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D134900	2022-11-03 11:36:12 -04:00
Nicolas Vasilache	435debea69	[mlir][test] NFC - Fix some worst offenders "capture by SSA name" tests Many tests still depend on specific names of SSA values (!!). This commit is a best effort cleanup that will set the stage for adding some pretty SSA result names.	2022-09-30 08:24:13 -07:00
River Riddle	0fd3a1ce60	[mlir][NFC] Update remaining textual references of un-namespaced `func` operations The special case parsing of operations in the `func` dialect is being removed, and operations will require the dialect namespace prefix.	2022-04-20 22:17:31 -07:00
River Riddle	3028bf740e	[mlir][NFC] Update textual references of `func` to `func.func` in Conversion/ tests The special case parsing of `func` operations is being removed.	2022-04-20 22:17:27 -07:00
River Riddle	af371f9f98	Reland [GreedPatternRewriter] Preprocess constants while building worklist when not processing top down Reland Note: Adds a fix to properly mark a commutative operation as folded if we change the order of its operands. This was uncovered by the fact that we no longer re-process constants. This avoids accidentally reversing the order of constants during successive application, e.g. when running the canonicalizer. This helps reduce the number of iterations, and also avoids unnecessary changes to input IR. Fixes #51892 Differential Revision: https://reviews.llvm.org/D122692	2022-04-07 11:31:42 -07:00
Mehdi Amini	ba43d6f85c	Revert "[GreedPatternRewriter] Preprocess constants while building worklist when not processing top down" This reverts commit `59bbc7a085`. This exposes an issue breaking the contract of `applyPatternsAndFoldGreedily` where we "converge" without applying remaining patterns.	2022-04-01 06:16:55 +00:00
River Riddle	59bbc7a085	[GreedPatternRewriter] Preprocess constants while building worklist when not processing top down This avoids accidentally reversing the order of constants during successive application, e.g. when running the canonicalizer. This helps reduce the number of iterations, and also avoids unnecessary changes to input IR. Fixes #51892 Differential Revision: https://reviews.llvm.org/D122692	2022-03-31 12:08:55 -07:00
River Riddle	3655069234	[mlir] Move the Builtin FuncOp to the Func dialect This commit moves FuncOp out of the builtin dialect, and into the Func dialect. This move has been planned in some capacity from the moment we made FuncOp an operation (years ago). This commit handles the functional aspects of the move, but various aspects are left untouched to ease migration: func::FuncOp is re-exported into mlir to reduce the actual API churn, the assembly format still accepts the unqualified `func`. These temporary measures will remain for a little while to simplify migration before being removed. Differential Revision: https://reviews.llvm.org/D121266	2022-03-16 17:07:03 -07:00
River Riddle	47f175b09b	[mlir] Update FuncOp conversion passes to Pass/InterfacePass<FunctionOpInterface> These passes generally don't rely on any special aspects of FuncOp, and moving allows for these passes to be used in many more situations. The passes that obviously weren't relying on invariants guaranteed by a "function" were updated to be generic pass, the rest were updated to be FunctionOpinterface InterfacePasses. The test updates are NFC switching from implicit nesting (-pass -pass2) form to the -pass-pipeline form (generic passes do not implicitly nest as op-specific passes do). Differential Revision: https://reviews.llvm.org/D121190	2022-03-08 12:25:32 -08:00
William S. Moses	670aeece51	[MLIR][OpenMP][SCF] Mark parallel regions as allocation scopes MLIR has the notion of allocation scopes which specify that stack allocations (e.g. memref.alloca, llvm.alloca) should be freed or equivalently aren't available at the end of the corresponding region. Currently neither OpenMP parallel nor SCF parallel regions have the notion of such a scope. This clearly makes sense for an OpenMP parallel as this is implemented in with a new function which outlines the region, and clearly any allocations in that newly outlined function have a lifetime that ends at the return of the function, by definition. While SCF.parallel doesn't have a guaranteed runtime which it is implemented with, this similarly makes sense for SCF.parallel since otherwise an allocation within an SCF.parallel will needlessly continue to allocate stack memory that isn't cleaned up until the function (or other allocation scope op) which contains the SCF.parallel returns. This means that it is impossible to represent thread or iteration-local memory without causing a stack blow-up. In the case that this stack-blow-up behavior is intended, this can be equivalently represented with an allocation outside of the SCF.parallel with a size equal to the number of iterations. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D119743	2022-02-18 11:06:32 -05:00
Nicolas Vasilache	3c3810e72e	[mlir][vector] Avoid hoisting alloca'ed temporary buffers across AutomaticAllocationScope This revision avoids incorrect hoisting of alloca'd buffers across an AutomaticAllocationScope boundary. In the more general case, we will probably need a ParallelScope-like interface. Differential Revision: https://reviews.llvm.org/D118768	2022-02-02 06:00:42 -05:00
Nicolas Vasilache	c537a94334	[mlir][Vector] Thread 0-d vectors through vector.transfer ops This revision adds 0-d vector support to vector.transfer ops. In the process, numerous cleanups are applied, in particular around normalizing and reducing the number of builders. Reviewed By: ThomasRaoux, springerm Differential Revision: https://reviews.llvm.org/D114803	2021-12-01 16:49:43 +00:00
Mogball	7c5ecc8b7e	[mlir][vector] Insert/extract element can accept index `vector::InsertElementOp` and `vector::ExtractElementOp` have had their `position` operand changed to accept `AnySignlessIntegerOrIndex` for better operability with operations that use `index`, such as affine loops. LLVM's `extractelement` and `insertelement` can also accept `i64`, so lowering directly to these operations without explicitly inserting casts is allowed. SPIRV's equivalent ops can also accept `i64`. Reviewed By: nicolasvasilache, jpienaar Differential Revision: https://reviews.llvm.org/D114139	2021-11-18 22:40:29 +00:00
Mogball	a54f4eae0e	[MLIR] Replace std ops with arith dialect ops Precursor: https://reviews.llvm.org/D110200 Removed redundant ops from the standard dialect that were moved to the `arith` or `math` dialects. Renamed all instances of operations in the codebase and in tests. Reviewed By: rriddle, jpienaar Differential Revision: https://reviews.llvm.org/D110797	2021-10-13 03:07:03 +00:00
Nicolas Vasilache	0a7f81a451	mlir][Vector] Fix spuriously disabled test.	2021-10-12 12:56:40 +00:00
Nicolas Vasilache	0c74b12a2e	[mlir][Vector] NFC - Add test to exercise lowering of vector.transfer to scf This revision also renames and moves some tests around. Differential Revision: https://reviews.llvm.org/D111606	2021-10-12 12:38:33 +00:00
Matthias Springer	bd20756d2c	[mlir] Support tensor types in unrolled VectorToSCF Differential Revision: https://reviews.llvm.org/D102668	2021-06-02 10:44:04 +09:00
Matthias Springer	558e740170	[mlir] Support tensor types in non-unrolled VectorToSCF Support for tensor types in the unrolled version will follow in a separate commit. Add a new pass option to activate lowering of transfer ops with tensor types (default: deactivated). Differential Revision: https://reviews.llvm.org/D102666	2021-06-02 10:37:58 +09:00
Matthias Springer	0f24163870	[mlir] Replace vector-to-scf with progressive-vector-to-scf Depends On D102388 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D102101	2021-05-13 23:27:31 +09:00
Matthias Springer	d020dd2b21	[mlir] Migrate vector-to-loops.mlir to ProgressiveVectorToSCF Create a copy of vector-to-loops.mlir and adapt the test for ProgressiveVectorToSCF. Fix a small bug in getExtractOp() triggered by this test. Differential Revision: https://reviews.llvm.org/D102388	2021-05-13 22:48:20 +09:00
Matthias Springer	9b77be5583	[mlir] Unrolled progressive-vector-to-scf. Instead of an SCF for loop, these pattern generate fully unrolled loops with no temporary buffer allocations. Differential Revision: https://reviews.llvm.org/D101981	2021-05-13 13:08:48 +09:00
Tobias Gysi	33ce6f02ca	[mlir][linalg] fixing hard-coded variable names in a test (NFC) The patch fixes hard-coded variable names in the vector-to-loops test.	2021-04-12 10:34:49 +00:00
Matthias Springer	95f8135043	[mlir] Change vector.transfer_read/write "masked" attribute to "in_bounds". This is in preparation for adding a new "mask" operand. The existing "masked" attribute was used to specify dimensions that may be out-of-bounds. Such transfers can be lowered to masked load/stores. The new "in_bounds" attribute is used to specify dimensions that are guaranteed to be within bounds. (Semantics is inverted.) Differential Revision: https://reviews.llvm.org/D99639	2021-03-31 18:04:22 +09:00
Uday Bondhugula	0b20413ef6	Revert "[Canonicalizer] Process regions top-down instead of bottom up & reuse existing constants." This reverts commit `361b7d125b` by Chris Lattner <clattner@nondot.org> dated Fri Mar 19 21:22:15 2021 -0700. The change to the greedy rewriter driver picking a different order was made without adequate analysis of the trade-offs and experimentation. A change like this has far reaching consequences on transformation pipelines, and a major impact upstream and downstream. For eg., one can’t be sure that it doesn’t slow down a large number of cases by small amounts or create other issues. More discussion here: https://llvm.discourse.group/t/speeding-up-canonicalize/3015/25 Reverting this so that improvements to the traversal order can be made on a clean slate, in bigger steps, and higher bar. Differential Revision: https://reviews.llvm.org/D99329	2021-03-25 22:17:26 +05:30

1 2

76 Commits