clang-p2996

Author	SHA1	Message	Date
Matthias Springer	c99670ba51	[mlir][vector] `LoadOp`/`StoreOp`: Allow 0-D vectors (#76134 ) Similar to `vector.transfer_read`/`vector.transfer_write`, allow 0-D vectors. This commit fixes `mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir` when verifying the IR after each pattern (#74270). That test produces a temporary 0-D load/store op.	2023-12-22 11:12:58 +09:00
Andrzej Warzyński	e6f5762879	[mlir][vector][nfc] Add a test case for scalable vectors (#76138 ) Extends fold-arith-extf-into-vector-contract.mlir by adding a test case for scalable vectors.	2023-12-21 18:45:00 +00:00
Billy Zhu	34a65980d7	[MLIR] Erase location of folded constants (#75415 ) Follow up to the discussion from #75258, and serves as an alternate solution for #74670. Set the location to Unknown for deduplicated / moved / materialized constants by OperationFolder. This makes sure that the folded constants don't end up with an arbitrary location of one of the original ops that became it, and that hoisted ops don't confuse the stepping order.	2023-12-21 09:54:48 -08:00
Finn Plummer	88151dd428	[mlir][spirv] Add folding for SNegate, [Logical]Not (#74992 ) Add missing constant propogation folder for SNegate, [Logical]Not. Implement additional folding when !(!x) for all ops. This helps for readability of lowered code into SPIR-V. Part of work for #70704	2023-12-21 18:24:01 +01:00
Jakub Kuderski	72003adf6b	[mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015 ) Each vector element is reduced independently, which is a form of multi-reduction. The plan is to allow for gradual lowering of multi-reduction that results in fewer `gpu.shuffle` ops at the end: 1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d `gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32 For example we can perform 2 independent f16 reductions with a series of `gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.	2023-12-21 11:55:43 -05:00
Andrzej Warzyński	17afa5befb	[mlir][nfc] Update tests for Contract -> Op transforms (#76054 ) Updates two tests for vector.contract -> vector.outerproduct transformations: 1. Rename "vector-contract-to-outerproduct-transforms.mlir" as "vector-contract-to-outerproduct-matmul-transforms.mlir". The new name more accurate captures what's being tested. it is also consistent with "vector-contract-to-outerproduct-matvec-transforms.mlir", which covers vector matvec operations and makes finding relevant tests easier. 2. For matmul tests, move the traits definining the iteration spaces to the top of the file. This is consistent with how matvec tests are defined and also makes it easy to quickly identify what cases are covered. 3. For matmul tests, use more meaningful names for function arguments. This helps keep things consistent across the file (i.e. function definitions wih check lines and comments). 4. For matvec test, move a few tests around so that the most basic case (without masking) is first. 5. Update comments.	2023-12-21 13:20:16 +00:00
Tobias Gysi	9971b9ab19	[mlir][llvm] Improve alloca handling during inlining (#75961 ) This revision changes the alloca handling in the LLVM inliner. It ensures that alloca operations, even those nested within a region operation, can be relocated to the entry block of the function, or the closest ancestor region that is marked with either the isolated from above or automatic allocation scope trait. While the LLVM dialect does not have any region operations, the inlining interface may be used on IR that mixes different dialects.	2023-12-21 08:11:17 +01:00
Valentin Clement	a25da1a921	[mlir][openacc] Add device_type support for compute operations (#75864 ) Re-land PR after being reverted because of buildbot failures. This patch adds representation for `device_type` clause information on compute construct (parallel, kernels, serial). The `device_type` clause on compute construct impacts clauses that appear after it. The values impacted by `device_type` are now tied with an attribute array that represent the device_type associated with them. `DeviceType::None` is used to represent the value produced by a clause before any `device_type`. The operands and the attribute information are parser/printed together. This is an example with `vector_length` clause. The first value (64) is not impacted by `device_type` so it will be represented with DeviceType::None. None is not printed. The second value (128) is tied with the `device_type(multicore)` clause. ``` !$acc parallel vector_length(64) device_type(multicore) vector_length(256) ``` ``` acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) { } ``` When multiple values can be produced for a single clause like `num_gangs` and `wait`, an extra attribute describe the number of values belonging to each `device_type`. Values and attributes are parsed/printed together. ``` acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>]) ``` While preparing this patch I noticed that the wait devnum is not part of the operations and is not lowered. It will be added in a follow up patch.	2023-12-20 20:36:09 -08:00
Valentin Clement	553748356c	Revert "[mlir][openacc] Add device_type support for compute operations (#75864 )" This reverts commit `8b885eb90f`.	2023-12-20 16:08:10 -08:00
Valentin Clement (バレンタインクレメン)	8b885eb90f	[mlir][openacc] Add device_type support for compute operations (#75864 ) This patch adds representation for `device_type` clause information on compute construct (parallel, kernels, serial). The `device_type` clause on compute construct impacts clauses that appear after it. The values impacted by `device_type` are now tied with an attribute array that represent the device_type associated with them. `DeviceType::None` is used to represent the value produced by a clause before any `device_type`. The operands and the attribute information are parser/printed together. This is an example with `vector_length` clause. The first value (64) is not impacted by `device_type` so it will be represented with DeviceType::None. None is not printed. The second value (128) is tied with the `device_type(multicore)` clause. ``` !$acc parallel vector_length(64) device_type(multicore) vector_length(256) ``` ``` acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) { } ``` When multiple values can be produced for a single clause like `num_gangs` and `wait`, an extra attribute describe the number of values belonging to each `device_type`. Values and attributes are parsed/printed together. ``` acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>]) ``` While preparing this patch I noticed that the wait devnum is not part of the operations and is not lowered. It will be added in a follow up patch.	2023-12-20 13:45:47 -08:00
Han-Chung Wang	b33a131c82	[mlir][arith] Add support for expanding arith.maxnumf/minnumf ops. (#75989 ) The maxnum/minnum semantics can be found at https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic. The revision also updates function names in lit tests to match op name. Take arith.maxnumf as example: ``` func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 { %result = arith.maxnumf %lhs, %rhs : f32 return %result : f32 } ``` will be expanded to ``` func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 { %0 = arith.cmpf ugt, %lhs, %rhs : f32 %1 = arith.select %0, %lhs, %rhs : f32 %2 = arith.cmpf uno, %lhs, %lhs : f32 %3 = arith.select %2, %rhs, %1 : f32 return %3 : f32 } ``` Case 1: Both LHS and RHS are not NaN; LHS > RHS In this case, `%1` is LHS. `%3` and `%1` have the same value, so `%3` is LHS. Case 2: LHS is NaN and RHS is not NaN In this case, `%2` is true, so `%3` is always RHS. Case 3: LHS is not NaN and RHS is NaN In this case, `%0` is true and `%1` is LHS. `%2` is false, so `%3` and `%1` have the same value, which is LHS. Case 4: Both LHS and RHS are NaN: `%1` and RHS are all NaN, so the result is still NaN.	2023-12-20 10:35:12 -08:00
Gil Rapaport	d9803841f2	[mlir][emitc] Add op modelling C expressions (#71631 ) Add an emitc.expression operation that models C expressions, and provide transforms to form and fold expressions. The translator emits the body of emitc.expression ops as a single C expression. This expression is emitted by default as the RHS of an EmitC SSA value, but if possible, expressions with a single use that is not another expression are instead inlined. Specific expression's inlining can be fine tuned by lowering passes and transforms.	2023-12-20 15:04:46 +02:00
Andrzej Warzyński	354adb44c9	[mlir][vector] Extend `CreateMaskFolder` (#75842 ) Extends `CreateMaskFolder` pattern so that the following: ```mlir %c8 = arith.constant 8 : index %c16 = arith.constant 16 : index %0 = vector.vscale %1 = arith.muli %0, %c16 : index %10 = vector.create_mask %c8, %1 : vector<8x[16]xi1> ``` is folded as: ```mlir %0 = vector.constant_mask [8, 16] : vector<8x[16]xi1> ```	2023-12-20 11:08:54 +00:00
Andrzej Warzyński	d5abd8a1a9	[mlir][vector][nfc] Move tests for scalable outer-product (#76035 ) Tests for vector.outerproduct for scalable vectors from "vector-scalable-outerproduct.mlir" are moved to: * ops.mlir and invalid.mlir. These files are effectively used to document what Ops are supported and That's basically what the original file was testing (but specifically for scalable vectors).	2023-12-20 10:53:00 +00:00
Finn Plummer	4c83c27c91	[mlir][spirv] Add folding for [I\|Logical][Not]Equal (#74194 )	2023-12-20 11:00:28 +01:00
Matthias Springer	f7096428b4	[mlir][GPU] Add `RecursiveMemoryEffects` to `gpu.launch` (#75315 ) Infer the side effects of `gpu.launch` from its body.	2023-12-20 15:25:25 +09:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Matthias Springer	10056c821a	[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314 ) This commit makes reductions part of the terminator. Instead of `scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops. `scf.reduce` may contain an arbitrary number of reductions, with one region per reduction. Example: ```mlir %init = arith.constant 0.0 : f32 %r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init) -> f32, f32 { %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32> %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32> scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) { ^bb0(%lhs : f32, %rhs: f32): %res = arith.addf %lhs, %rhs : f32 scf.reduce.return %res : f32 }, { ^bb0(%lhs : f32, %rhs: f32): %res = arith.mulf %lhs, %rhs : f32 scf.reduce.return %res : f32 } } ``` `scf.reduce` operations can no longer be interleaved with other ops in the body of `scf.parallel`. This simplifies the op and makes it possible to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was not possible before because the op was not a terminator, causing the op to be DCE'd.)	2023-12-20 11:06:27 +09:00
Kunwar Grover	282d501476	[mlir][Transform] Fix crash with invalid ir for transform libraries (#75649 ) This patch fixes a crash caused when the transform library interpreter is given an IR that fails to parse.	2023-12-19 23:16:19 +05:30
Han-Chung Wang	899c2bed9e	[mlir][TilingInterface] Early return cloned ops if tile sizes are zeros. (#75410 ) It is a trivial early-return case. If the cloned ops are not returned, it will generate `extract_slice` op that extracts the whole slice. However, it is not folded away. Early-return to avoid the case. E.g., ```mlir func.func @matmul_tensors( %arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %0 = linalg.matmul ins(%arg0, %arg1: tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2: tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } module attributes {transform.with_named_sequence} { transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) { %0 = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!transform.any_op) -> !transform.any_op %1 = transform.structured.tile_using_for %0 [0, 0, 0] : (!transform.any_op) -> (!transform.any_op) transform.yield } } ``` Apply the transforms and canonicalize the IR: ``` mlir-opt --transform-interpreter -canonicalize input.mlir ``` we will get ```mlir module { func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %c1 = arith.constant 1 : index %c0 = arith.constant 0 : index %dim = tensor.dim %arg0, %c0 : tensor<?x?xf32> %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32> %dim_1 = tensor.dim %arg1, %c1 : tensor<?x?xf32> %extracted_slice = tensor.extract_slice %arg0[0, 0] [%dim, %dim_0] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %extracted_slice_2 = tensor.extract_slice %arg1[0, 0] [%dim_0, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %extracted_slice_3 = tensor.extract_slice %arg2[0, 0] [%dim, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %0 = linalg.matmul ins(%extracted_slice, %extracted_slice_2 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%extracted_slice_3 : tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } } ``` The revision early-return the case so we can get: ```mlir func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %0 = linalg.matmul ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2 : tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } ```	2023-12-19 09:14:43 -08:00
Ivan Butygin	c0d2ea9d42	[mlir][scf] Improve `scf.parallel` fusion pass (#75852 ) Abort fusion if memref load may alias write, but not the exact alias. Add alias check hook to `naivelyFuseParallelOps`, so user can customize alias checking. Use builtin alias analysis in `ParallelLoopFusion` pass.	2023-12-19 18:07:46 +03:00
Jakub Kuderski	07677113ff	[mlir][vector] Add pattern to break down reductions into arith ops (#75727 ) The number of vector elements considered 'small' enough to extract is parameterized. This is to avoid going into specialized reduction lowering when a single/couple of arith ops can do. Targets without dedicated reduction intrinsics can use that as an emulation path too. Depends on https://github.com/llvm/llvm-project/pull/75846.	2023-12-18 17:54:54 -05:00
Jakub Kuderski	a528cee224	[mlir][vector] Improve `makeArithReduction` expansion (#75846 ) Propagate fast math flags. Distinguish `minf`/`maxf` and `minimumf`/`maximumf`. Required for future patterns in https://github.com/llvm/llvm-project/pull/75727.	2023-12-18 17:47:46 -05:00
srcarroll	b26ee97537	[MLIR][Linalg] Support dynamic sizes in `lower_unpack` (#75494 )	2023-12-18 19:02:04 +01:00
Rik Huijzer	672f1a036a	[mlir][memref] Make `LoadOp::verify` error more clear (#75831 ) While debugging https://github.com/llvm/llvm-project/issues/71326, the `LoadOp::verify` code and error were very confusing. This PR improves that. This code was a part from the reverted PR https://github.com/llvm/llvm-project/pull/75519. Fixing the `-convert-vector-to-scf` issue is going to take a bit longer and this code was out of scope anyway. Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>	2023-12-18 18:41:05 +01:00
Rik Huijzer	9f5afc3de9	Revert "[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 )" This reverts commit `3a1ae2f46d`.	2023-12-17 12:34:17 +01:00
Rik Huijzer	3a1ae2f46d	[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 ) Fixes https://github.com/llvm/llvm-project/issues/71326. The cause of the issue was that a new `LoadOp` was created which looked something like: ```mlir %arg4 = func.func main(%arg1 : index, %arg2 : index) { %alloca_0 = memref.alloca() : memref<vector<1x32xi1>> %1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>> %2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>> return } ``` which crashed inside the `LoadOp::verify`. Note here that `%alloca_0` is 0 dimensional, `%1` has one dimension, but `memref.load` tries to index `%1` with two indices. This is now fixed by using the fact that `unpackOneDim` always unpacks one dim `1bce61e6b0/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (L897-L903)` and so the `loadOp` should just index only one dimension. --------- Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>	2023-12-17 11:42:35 +01:00
Quinn Dawkins	82ab0f7f36	[mlir][linalg] Fix rank-reduced cases for extract/insert slice in DropUnitDims (#74723 ) Inferring the reshape reassociation indices for extract/insert slice ops based on the read sizes of the original slicing op will generate an invalid expand/collapse shape op for already rank-reduced cases. Instead just infer from the shape of the slice. Ported from Differential Revision: https://reviews.llvm.org/D147488	2023-12-16 10:08:51 -05:00
Peiming Liu	6c06bde7c4	[mlir][sparse] support loop range query using SparseTensorLevel. (#75670 )	2023-12-15 16:33:31 -08:00
Andrzej Warzyński	f11bda78c8	[mlir][linalg] Use vector.shuffle to flatten conv filter (#75038 ) Updates the vectorisation of 1D depthwise convolution when flattening the channel dimension (introduced in #71918). In particular - how the convolution filter is "flattened". ATM, the vectoriser will use `vector.shape_cast`: ```mlir %b_filter = vector.broadcast %filter : vector<4xf32> to vector<3x2x4xf32> %sc_filter = vector.shape_cast %b_filter : vector<3x2x4xf32> to vector<3x8xf32> ``` This lowering is not ideal - `vector.shape_cast` can be convenient when it's folded away, but that's not happening in this case. Instead, this patch updates the vectoriser to use `vector.shuffle` (the overall result is identical): ```mlir %sh_filter = vector.shuffle %filter, %filter [0, 1, 2, 3, 0, 1, 2, 3] : vector<4xf32>, vector<4xf32> %b_filter = vector.broadcast %sh_filter : vector<8xf32> to vector<3x8xf32> ```	2023-12-15 17:56:59 +00:00
Boian Petkantchin	5e29112719	[mlir][mesh] Add verification and canonicalization for some collectives (#74905 ) Add verification and canonicalization for broadcast, gather, recv, reduce, scatter, send and shift. The canonicalizations only remove trivial collectives with empty mesh_axes attrubutes.	2023-12-15 06:41:10 -08:00
Rafael Ubal	214d32ccd2	Support for dynamic dimensions in 'tensor.splat' (#74626 ) This feature had been marked as `TODO` in the `tensor.splat` documentation for a while. This MR includes: - Support for dynamically shaped tensors in the return type of `tensor.splat` with the syntax suggested in the `TODO` comment. - Updated op documentation. - Bufferization support. - Updates in op folders affected by the new feature. - Unit tests for valid/invalid syntax, valid/invalid folding, and lowering through bufferization. - Additional op builders resembling those available in `tensor.empty`.	2023-12-15 13:54:45 +00:00
Quinn Dawkins	fcd54b368e	[mlir][tensor] Fix tensor.concat reifyResultShapes for static result dims (#75558 ) When the concatenated dim is statically sized but the inputs are dynamically sized, reifyResultShapes must return the static shape. Fixes the implementation of the interface for tensor.concat in such cases.	2023-12-15 08:43:58 -05:00
Hsiangkai Wang	f643eec892	[mlir][vector] Add emulation patterns for vector masked load/store (#74834 ) In this patch, it will convert ``` vector.maskedload %base[%idx_0, %idx_1], %mask, %pass_thru ``` to ``` %ivalue = %pass_thru %m = vector.extract %mask[0] %result0 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1] %combined = vector.insert %v, %ivalue[0] scf.yield %combined } else { scf.yield %ivalue } %m = vector.extract %mask[1] %result1 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1 + 1] %combined = vector.insert %v, %result0[1] scf.yield %combined } else { scf.yield %result0 } ... ``` It will convert ``` vector.maskedstore %base[%idx_0, %idx_1], %mask, %value ``` to ``` %m = vector.extract %mask[0] scf.if %m { %extracted = vector.extract %value[0] memref.store %extracted, %base[%idx_0, %idx_1] } %m = vector.extract %mask[1] scf.if %m { %extracted = vector.extract %value[1] memref.store %extracted, %base[%idx_0, %idx_1 + 1] } ... ```	2023-12-15 11:35:48 +00:00
Felix Schneider	8190369e83	[mlir][tosa] Add verifier for `tosa.transpose` (#75376 ) This patch adds a verifier to `tosa.transpose` which fixes a crash. Related: https://github.com/llvm/llvm-project/pull/74367 Fix https://github.com/llvm/llvm-project/issues/74479	2023-12-15 07:22:32 +01:00
Vivian	bd6a2452ae	[mlir][SCF] Add support for peeling the first iteration out of the loop (#74015 ) There is a use case that we need to peel the first iteration out of the for loop so that the peeled forOp can be canonicalized away and the fillOp can be fused into the inner forall loop. For example, we have nested loops as below ``` linalg.fill ins(...) outs(...) scf.for %arg = %lb to %ub step %step scf.forall ... ``` After the peeling transform, it is expected to be ``` scf.forall ... linalg.fill ins(...) outs(...) scf.for %arg = %(lb + step) to %ub step %step scf.forall ... ``` This patch makes the most use of the existing peeling functions and adds support for peeling the first iteration out of the loop.	2023-12-14 17:03:52 -08:00
Fabian Mora	419c45a325	[mlir][gpu] Fix crash in `gpu-module-to-binary` (#75477 ) This patch fixes the error in issue #75434. The crash was being caused by not checking for a lack of target attributes in a GPU module. It's now considered an error to invoke the pass with a GPU module with no target attributes.	2023-12-14 14:03:10 -05:00
Jerry Wu	2c9ba9c34a	[mlir] Fix type transformation in DropUnitDimFromElementwiseOps (#75430 ) Use operand and result types to build the corresponding new types in `DropUnitDimFromElementwiseOps`.	2023-12-14 12:20:54 -05:00
Tobias Gysi	25d942403c	[mlir][llvm] Add invariant intrinsics (#75354 ) This commit implements the LLVM IR invariant intrinsics in LLVM dialect. These intrinsics can be used to mark a program regions in which the contents of a specific memory object will not change. The LLVM dialect implementation also implements the PromotableOpInterface to ensure Mem2Reg & SROA are able to promote pointers that are marked using the invariant intrinsics.	2023-12-14 14:58:45 +01:00
Kareem Ergawy	2ab926d959	[flang][MLIR][OpenMP] Add support for `target update` directive. (#75047 ) Add an op in the OMP dialect to model the `target update` direcive. This change reuses the `MapInfoOp` used by other device directive to model `map` clauses but verifies that the restrictions imposed by the `target update` directive are respected.	2023-12-14 12:48:45 +01:00
Cullen Rhodes	f0ce23509a	[mlir][ArmSME][NFC] Move conversion tests (#75446 ) * Move -vector-to-arm-sme tests to mlir/test/Conversion/VectorToArmSME * Move -arm-sme-to-llvm tests to mlir/test/Conversion/ArmSMEToLLVM * Separate unsupported tests.	2023-12-14 10:52:02 +00:00
Keren Zhou	e66f97e8a8	[mlir] Fix loop pipelining when the operand of `yield` is not defined in the loop body (#75423 )	2023-12-13 19:19:13 -08:00
Prathamesh Tagore	f397bdf5ae	[mlir][tensor] Fold consumer linalg transpose with producer tensor pack (#74206 ) Partial fix to https://github.com/openxla/iree/issues/15367	2023-12-13 14:26:19 -08:00
Andrzej Warzyński	c02d07fdf0	[mlir][vector] Add pattern to drop unit dim from elementwise(a, b)) (#74817 ) For vectors with either leading or trailing unit dim, replaces: elementwise(a, b) with: sc_a = shape_cast(a) sc_b = shape_cast(b) res = elementwise(sc_a, sc_b) return shape_cast(res) The newly inserted shape_cast Ops fold (before elementwise Op) and then restore (after elementwise Op) the unit dim. Vectors `a` and `b` are required to be rank > 1. Example: ```mlir %mul = arith.mulf %B_row, %A_row : vector<1x[4]xf32> %cast = vector.shape_cast %mul : vector<1x[4]xf32> to vector<[4]xf32> ``` gets converted to: ```mlir %B_row_sc = vector.shape_cast %B_row : vector<1x[4]xf32> to vector<[4]xf32> %A_row_sc = vector.shape_cast %A_row : vector<1x[4]xf32> to vector<[4]xf32> %mul = arith.mulf %B_row_sc, %A_row_sc : vector<[4]xf32> %mul_sc = vector.shape_cast %mul : vector<[4]xf32> to vector<1x[4]xf32> %cast = vector.shape_cast %mul_sc : vector<1x[4]xf32> to vector<[4]xf32> ``` In practice, the bottom 2 shape_cast(s) will be folded away.	2023-12-13 20:29:12 +00:00
Tom Eccles	79524ba527	[mlir][ArmSME] Add sve streaming compatible attribute (#75222 ) Following the same path already used for ArmStreaming and ArmLocallyStreaming. This should correspond to clang's __arm_streaming_compatible attribute.	2023-12-13 13:53:01 +00:00
Yinying Li	31b72b0742	[mlir][sparse]Make isBlockSparsity more robust (#75113 ) 1. A single dimension can either be blocked (with floordiv and mod pair) or non-blocked. Mixing them would be invalid. 2. Block size should be non-zero value.	2023-12-12 13:43:03 -05:00
Boian Petkantchin	4b3446771f	[mlir][mesh] Add endomorphism simplification for all-reduce (#73150 ) Does transformations like all_reduce(x) + all_reduce(y) -> all_reduce(x + y) max(all_reduce(x), all_reduce(y)) -> all_reduce(max(x, y)) when the all_reduce element-wise op is max. Added general rewrite pattern HomomorphismSimplification and EndomorphismSimplification that encapsulate the general algorithm. Made specialization for all-reduce with respect to addf, addi, minsi, maxsi, minimumf and maximumf in the Arithmetic dialect.	2023-12-12 10:21:52 -08:00
Jakub Kuderski	8063622721	[mlir][vector] Allow vector distribution with multiple written elements (#75122 ) Add a configuration option to allow vector distribution with multiple elements written by a single lane. This is so that we can perform vector multi-reduction with multiple results per workgroup.	2023-12-12 13:15:17 -05:00
Rafael Ubal	a8f3860bcb	[mlir][tensor] Fix bug in `tensor.extract(tensor.from_elements)` folder (#75109 ) The folder for `tensor.extract` is not operating correctly when it is consuming the result of a `tensor.from_elements` operation. The existing unit test named `@extract_from_tensor.from_elements_3d` in `mlir/test/Dialect/Tensor/canonicalize.mlir` seems an attempt to stress this code. However, this unit tests creates a `tensor.from_elements` op exclusively from constants, which gets folded away into a single constant tensor. Therefore, the buggy code was never executed in unit tests. I have added a new unit test named `@extract_from_tensor.from_elements_variable_3d` that makes sure the `tensor.from_elements` op is not folded away by having its input operands come directly from function arguments. The original folder code would have made this test fail. This bug was notably affecting the lowering of the `tosa.pad` op in the `tosa-to-tensor` pass, where the generated code is likely to contain a `tensor.from_elements` + `tensor.extract` op sequence.	2023-12-12 15:36:52 +00:00
lorenzo chelini	06c4f78b07	[MLIR][Linalg] improve silenceable failure msg for `lower_pack` (NFC) (#75053 ) Adjust the silenceable failure message as we lower `tensor.unpack` as a combination of `linalg.transpose` + `tensor.collapse_shape` and `tensor.extract_slice`.	2023-12-12 13:06:17 +01:00

1 2 3 4 5 ...

4968 Commits