clang-p2996

Author	SHA1	Message	Date
Adam Paszke	85b2327192	[mlir][nvvm] Fix the PTX lowering of wgmma.mma_async (#76150 )	2023-12-22 14:46:34 +01:00
Ryan Holt	847a6f8f0a	[mlir][MemRef] Add runtime bounds checking (#75817 ) This change adds (runtime) bounds checks for `memref` ops using the existing `RuntimeVerifiableOpInterface`. For `memref.load` and `memref.store`, we check that the indices are in-bounds of the memref's index space. For `memref.reinterpret_cast` and `memref.subview` we check that the resulting address space is in-bounds of the input memref's address space.	2023-12-22 11:49:15 +09:00
Finn Plummer	88151dd428	[mlir][spirv] Add folding for SNegate, [Logical]Not (#74992 ) Add missing constant propagation folders for SNegate and [Logical]Not. Implement additional folding of !(!x) for all of these ops. This improves the readability of code lowered to SPIR-V. Part of work for #70704	2023-12-21 18:24:01 +01:00
Jakub Kuderski	72003adf6b	[mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015 ) Each vector element is reduced independently, which is a form of multi-reduction. The plan is to allow for gradual lowering of multi-reduction that results in fewer `gpu.shuffle` ops at the end: 1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d `gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32 For example we can perform 2 independent f16 reductions with a series of `gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.	2023-12-21 11:55:43 -05:00
Matthias Springer	db8a119e8f	[mlir][ArmSME] Fix invalid rewriter API usage (#76123 ) When operations are modified in-place, the rewriter must be notified. This commit fixes `mlir/test/Conversion/ArmSMEToLLVM/unsupported.mlir`, `mlir/test/Dialect/ArmSME/tile-zero-masks.mlir` and `mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir` when running with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` enabled.	2023-12-21 17:39:36 +09:00
Tobias Gysi	9971b9ab19	[mlir][llvm] Improve alloca handling during inlining (#75961 ) This revision changes the alloca handling in the LLVM inliner. It ensures that alloca operations, even those nested within a region operation, can be relocated to the entry block of the function, or the closest ancestor region that is marked with either the isolated from above or automatic allocation scope trait. While the LLVM dialect does not have any region operations, the inlining interface may be used on IR that mixes different dialects.	2023-12-21 08:11:17 +01:00
Matthias Springer	d8d09296ed	[mlir][EmitC] Fix invalid rewriter API usage (#76124 ) When operations are modified in-place, the rewriter must be notified. This commit fixes `mlir/test/Dialect/EmitC/transforms.mlir` when running with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` enabled.	2023-12-21 16:00:18 +09:00
Valentin Clement	a25da1a921	[mlir][openacc] Add device_type support for compute operations (#75864 ) Re-land PR after being reverted because of buildbot failures. This patch adds representation for `device_type` clause information on compute constructs (parallel, kernels, serial). The `device_type` clause on a compute construct impacts clauses that appear after it. The values impacted by `device_type` are now tied with an attribute array that represents the device_type associated with them. `DeviceType::None` is used to represent the value produced by a clause before any `device_type`. The operands and the attribute information are parsed/printed together. This is an example with the `vector_length` clause. The first value (64) is not impacted by `device_type`, so it will be represented with DeviceType::None. None is not printed. The second value (128) is tied with the `device_type(multicore)` clause. ``` !$acc parallel vector_length(64) device_type(multicore) vector_length(128) ``` ``` acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) { } ``` When multiple values can be produced for a single clause like `num_gangs` and `wait`, an extra attribute describes the number of values belonging to each `device_type`. Values and attributes are parsed/printed together. ``` acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>]) ``` While preparing this patch I noticed that the wait devnum is not part of the operations and is not lowered. It will be added in a follow-up patch.	2023-12-20 20:36:09 -08:00
Han-Chung Wang	bffdde8b8e	[mlir][tensor][NFC] Fix a typo in pack simplification pattern. (#76109 )	2023-12-20 17:03:55 -08:00
Valentin Clement	553748356c	Revert "[mlir][openacc] Add device_type support for compute operations (#75864 )" This reverts commit `8b885eb90f`.	2023-12-20 16:08:10 -08:00
Peiming Liu	cf4dd91165	[mlir][sparse] initialize slice-driven loop-related fields in one place (#76099 )	2023-12-20 14:20:57 -08:00
Valentin Clement (バレンタインクレメン)	8b885eb90f	[mlir][openacc] Add device_type support for compute operations (#75864 ) This patch adds representation for `device_type` clause information on compute constructs (parallel, kernels, serial). The `device_type` clause on a compute construct impacts clauses that appear after it. The values impacted by `device_type` are now tied with an attribute array that represents the device_type associated with them. `DeviceType::None` is used to represent the value produced by a clause before any `device_type`. The operands and the attribute information are parsed/printed together. This is an example with the `vector_length` clause. The first value (64) is not impacted by `device_type`, so it will be represented with DeviceType::None. None is not printed. The second value (128) is tied with the `device_type(multicore)` clause. ``` !$acc parallel vector_length(64) device_type(multicore) vector_length(128) ``` ``` acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) { } ``` When multiple values can be produced for a single clause like `num_gangs` and `wait`, an extra attribute describes the number of values belonging to each `device_type`. Values and attributes are parsed/printed together. ``` acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>]) ``` While preparing this patch I noticed that the wait devnum is not part of the operations and is not lowered. It will be added in a follow-up patch.	2023-12-20 13:45:47 -08:00
Krzysztof Parzyszek	8b231d73bd	[mlir] Fix build break with shared libraries When project components are built as separate shared libraries, a lot of errors appear about undefined symbols, e.g. ``` /usr/bin/ld: CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToNVVMPipeline.cpp.o: in function `(anonymous namespace)::buildCommonPassPipeline(mlir::OpPassManager&, (anonymous namespace)::GPUToNVVMPipelineOptions const&)': GPUToNVVMPipeline.cpp:(.text._ZN12_GLOBAL__N_123buildCommonPassPipelineERN4mlir13OpPassManagerERKNS_24GPUToNVVMPipelineOptionsE+0xa5): undefined reference to `mlir::createConvertLinalgToLoopsPass()' ``` Add the necessary dependencies to Dialect/GPU/Pipelines/CMakeLists.txt	2023-12-20 12:49:25 -06:00
Han-Chung Wang	b33a131c82	[mlir][arith] Add support for expanding arith.maxnumf/minnumf ops. (#75989 ) The maxnum/minnum semantics can be found at https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic. The revision also updates function names in lit tests to match the op names. Take arith.maxnumf as an example: ``` func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 { %result = arith.maxnumf %lhs, %rhs : f32 return %result : f32 } ``` will be expanded to ``` func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 { %0 = arith.cmpf ugt, %lhs, %rhs : f32 %1 = arith.select %0, %lhs, %rhs : f32 %2 = arith.cmpf uno, %lhs, %lhs : f32 %3 = arith.select %2, %rhs, %1 : f32 return %3 : f32 } ``` Case 1: Neither LHS nor RHS is NaN, and LHS > RHS. In this case, `%1` is LHS. `%3` and `%1` have the same value, so `%3` is LHS. (If LHS <= RHS instead, `%1` and therefore `%3` are RHS.) Case 2: LHS is NaN and RHS is not NaN. In this case, `%2` is true, so `%3` is always RHS. Case 3: LHS is not NaN and RHS is NaN. In this case, `%0` is true and `%1` is LHS. `%2` is false, so `%3` and `%1` have the same value, which is LHS. Case 4: Both LHS and RHS are NaN. `%1` and RHS are both NaN, so the result is still NaN.	2023-12-20 10:35:12 -08:00
Razvan Lupusoru	a711b042fd	[acc] Initial implementation of MemoryEffects on `acc` operations (#75970 ) The `acc` dialect operations now implement MemoryEffects interfaces in the following ways: - Data entry operations which may read host memory via `varPtr` are now marked as such. The majority of them do NOT actually read the host memory. For example, `acc.present` works on the basis of the presence of the pointer and not necessarily what the data points to - so they are not marked as reading the host memory. They still use `varPtr`, but this dependency is reflected through SSA. - Data clause operations which may mutate the data pointed to by `accPtr` are marked as doing so. - Data clause operations which update required structured or dynamic runtime counters are marked as reading and writing the newly defined `RuntimeCounters` resource. Some operations, like `acc.getdeviceptr`, do not actually use the runtime counters - but are marked as reading them since the address obtained depends on the mapping operations which do update the runtime counters. Namely, `acc.getdeviceptr` cannot be moved across other mapping operations. - Constructs are marked as writing to the `ConstructResource`. This may be too strict but is needed for the following reasons: 1) Structured constructs may not use `accPtr` and instead use `varPtr` - when this is the case, data actions may be removed even when used. 2) Unstructured constructs are currently used to aggregate multiple data actions. We do not want such constructs removed or moved for now. - Terminators are marked as `Pure` as in other dialects. The current approach has the following limitations which may require further improvements: - Subsequent `acc.copyin` operations on the same data do not actually read host memory pointed to by `varPtr` but are still marked as such. - Two `acc.delete` operations on the same data may not mutate `accPtr` until the runtime counters are zero (but are still marked as mutating). - The `varPtrPtr` argument, when present, points to the location that holds `varPtr`. When mapping to the target device, an `accPtrPtr` needs to be computed and this memory is mutated. This effect is not captured since the current operations do not produce `accPtrPtr`. - Runtime counter effects are imprecise since two operations with differing `varPtr` increment/decrement different counters. Additionally, operations with `varPtrPtr` mutate attachment counters. - The `ConstructResource` is too strict and likely can be relaxed with better modeling.	2023-12-20 07:11:19 -08:00
Gil Rapaport	d9803841f2	[mlir][emitc] Add op modelling C expressions (#71631 ) Add an emitc.expression operation that models C expressions, and provide transforms to form and fold expressions. The translator emits the body of emitc.expression ops as a single C expression. This expression is emitted by default as the RHS of an EmitC SSA value, but if possible, expressions with a single use that is not another expression are instead inlined. The inlining of specific expressions can be fine-tuned by lowering passes and transforms.	2023-12-20 15:04:46 +02:00
Andrzej Warzyński	354adb44c9	[mlir][vector] Extend `CreateMaskFolder` (#75842 ) Extends `CreateMaskFolder` pattern so that the following: ```mlir %c8 = arith.constant 8 : index %c16 = arith.constant 16 : index %0 = vector.vscale %1 = arith.muli %0, %c16 : index %10 = vector.create_mask %c8, %1 : vector<8x[16]xi1> ``` is folded as: ```mlir %0 = vector.constant_mask [8, 16] : vector<8x[16]xi1> ```	2023-12-20 11:08:54 +00:00
Finn Plummer	4c83c27c91	[mlir][spirv] Add folding for [I|Logical][Not]Equal (#74194 )	2023-12-20 11:00:28 +01:00
Matthias Springer	c4457e10fe	[mlir][IR] Change block/region walkers to enumerate `this` block/region (#75020 ) This change makes block/region walkers consistent with operation walkers. An operation walk enumerates the current operation. Similarly, block/region walks should enumerate the current block/region. Example: ``` // Current behavior: op1->walk([](Operation *op2) { /* op1 is enumerated */ }); block1->walk([](Block *block2) { /* block1 is NOT enumerated */ }); region1->walk([](Block *block) { /* blocks of region1 are NOT enumerated */ }); region1->walk([](Region *region2) { /* region1 is NOT enumerated */ }); // New behavior: op1->walk([](Operation *op2) { /* op1 is enumerated */ }); block1->walk([](Block *block2) { /* block1 IS enumerated */ }); region1->walk([](Block *block) { /* blocks of region1 ARE enumerated */ }); region1->walk([](Region *region2) { /* region1 IS enumerated */ }); ```	2023-12-20 14:51:45 +09:00
Matthias Springer	f10302e3fa	[mlir] Require folders to produce Values of same type (#75887 ) This commit adds extra assertions to `OperationFolder` and `OpBuilder` to ensure that the types of the folded SSA values match the result types of the op. There used to be checks that discard the folded results if the types do not match. This commit makes these checks stricter and turns them into assertions. Discarding folded results with the wrong type (without failing explicitly) can hide bugs in op folders. Two such bugs became apparent in MLIR (and some more in downstream projects) and are fixed with this change. Note: The existing type checks were introduced in https://reviews.llvm.org/D95991. Migration guide: If you see failing assertions (`folder produced value of incorrect type`; make sure to run with assertions enabled!), run with `-debug` or dump the operation right before the failing assertion. This will point you to the op that has the broken folder. A common mistake is a mismatch between static/dynamic dimensions (e.g., input has a static dimension but folded result has a dynamic dimension).	2023-12-20 14:39:22 +09:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Matthias Springer	10056c821a	[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314 ) This commit makes reductions part of the terminator. Instead of `scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops. `scf.reduce` may contain an arbitrary number of reductions, with one region per reduction. Example: ```mlir %init = arith.constant 0.0 : f32 %r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init) -> f32, f32 { %elem_to_reduce1 = memref.load %buffer1[%iv] : memref<100xf32> %elem_to_reduce2 = memref.load %buffer2[%iv] : memref<100xf32> scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) { ^bb0(%lhs : f32, %rhs: f32): %res = arith.addf %lhs, %rhs : f32 scf.reduce.return %res : f32 }, { ^bb0(%lhs : f32, %rhs: f32): %res = arith.mulf %lhs, %rhs : f32 scf.reduce.return %res : f32 } } ``` `scf.reduce` operations can no longer be interleaved with other ops in the body of `scf.parallel`. This simplifies the op and makes it possible to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was not possible before because the op was not a terminator, causing the op to be DCE'd.)	2023-12-20 11:06:27 +09:00
Jakub Kuderski	9f74e6e615	[mlir][vector][gpu] Use `makeArithReduction` in lowering patterns. NFC. (#75952 ) Use the `vector::makeArithReduction` helper as the source-of-truth of reduction to arith ops lowering.	2023-12-19 19:04:27 -05:00
Kunwar Grover	282d501476	[mlir][Transform] Fix crash with invalid ir for transform libraries (#75649 ) This patch fixes a crash caused when the transform library interpreter is given an IR that fails to parse.	2023-12-19 23:16:19 +05:30
Han-Chung Wang	899c2bed9e	[mlir][TilingInterface] Early return cloned ops if tile sizes are zeros. (#75410 ) This is a trivial early-return case. If the cloned ops are not returned, tiling generates `extract_slice` ops that extract the whole tensor, and these are not folded away. Return early to avoid this. E.g., ```mlir func.func @matmul_tensors( %arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %0 = linalg.matmul ins(%arg0, %arg1: tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2: tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } module attributes {transform.with_named_sequence} { transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) { %0 = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!transform.any_op) -> !transform.any_op %1 = transform.structured.tile_using_for %0 [0, 0, 0] : (!transform.any_op) -> (!transform.any_op) transform.yield } } ``` Apply the transforms and canonicalize the IR: ``` mlir-opt --transform-interpreter -canonicalize input.mlir ``` and we get ```mlir module { func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %c1 = arith.constant 1 : index %c0 = arith.constant 0 : index %dim = tensor.dim %arg0, %c0 : tensor<?x?xf32> %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32> %dim_1 = tensor.dim %arg1, %c1 : tensor<?x?xf32> %extracted_slice = tensor.extract_slice %arg0[0, 0] [%dim, %dim_0] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %extracted_slice_2 = tensor.extract_slice %arg1[0, 0] [%dim_0, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %extracted_slice_3 = tensor.extract_slice %arg2[0, 0] [%dim, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %0 = linalg.matmul ins(%extracted_slice, %extracted_slice_2 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%extracted_slice_3 : tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } } ``` With this revision, the case returns early, so we get: ```mlir func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %0 = linalg.matmul ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2 : tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } ```	2023-12-19 09:14:43 -08:00
Ivan Butygin	c0d2ea9d42	[mlir][scf] Improve `scf.parallel` fusion pass (#75852 ) Abort fusion if a memref load may alias a write but is not an exact alias. Add an alias check hook to `naivelyFuseParallelOps`, so users can customize alias checking. Use builtin alias analysis in the `ParallelLoopFusion` pass.	2023-12-19 18:07:46 +03:00
Guray Ozen	5caae72d1a	[mlir][gpu] Productize `test-lower-to-nvvm` as `gpu-lower-to-nvvm` (#75775 ) The `test-lower-to-nvvm` pipeline serves as the common and proper pipeline for nvvm+host compilation, and it's used across our CUDA integration tests. This PR renames the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm` and moves it within `InitAllPasses.h`. The aim is to be able to call it from Python and to have a standardized compilation process for nvvm.	2023-12-19 08:40:46 +01:00
Matthias Springer	9b21866fea	[mlir][linalg] Fix invalid IR in `FoldInsertPadIntoFill` (#74418 ) `FoldInsertPadIntoFill` used to generate an invalid `tensor.insert_slice` op: ``` error: expected type to be 'tensor<?x?x?xf32>' or a rank-reduced version. (size mismatch) ``` This commit fixes tests such as `mlir/test/Dialect/Linalg/canonicalize.mlir` when verifying the IR after each pattern application (#74270).	2023-12-19 14:17:54 +09:00
Matthias Springer	3a087c1592	[mlir][linalg] Fix invalid IR in Linalg op fusion (#74425 ) Linalg op fusion (`Linalg/Transforms/Fusion.cpp`) used to generate invalid fused producer ops: ``` error: 'linalg.conv_2d_nhwc_hwcf' op expected type of operand #2 ('tensor<1x8x16x4xf32>') to match type of corresponding result ('tensor<?x?x?x?xf32>') note: see current operation: %24 = "linalg.conv_2d_nhwc_hwcf"(%21, %22, %23) <{dilations = dense<1> : tensor<2xi64>, operandSegmentSizes = array<i32: 2, 1>, strides = dense<2> : tensor<2xi64>}> ({ ^bb0(%arg9: f32, %arg10: f32, %arg11: f32): %28 = "arith.mulf"(%arg9, %arg10) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 %29 = "arith.addf"(%arg11, %28) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 "linalg.yield"(%29) : (f32) -> () }) {linalg.memoized_indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 * 2 + d4, d2 * 2 + d5, d6)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>]} : (tensor<1x?x?x3xf32>, tensor<3x3x3x4xf32>, tensor<1x8x16x4xf32>) -> tensor<?x?x?x?xf32> ``` This is a problem because the input IR to the greedy pattern rewriter during `-test-linalg-greedy-fusion` is invalid. This commit fixes tests such as `mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir` when verifying the IR after each pattern application (#74270).	2023-12-19 14:17:10 +09:00
Jakub Kuderski	07677113ff	[mlir][vector] Add pattern to break down reductions into arith ops (#75727 ) The number of vector elements considered 'small' enough to extract is parameterized. This is to avoid going into specialized reduction lowering when one or two arith ops would suffice. Targets without dedicated reduction intrinsics can use that as an emulation path too. Depends on https://github.com/llvm/llvm-project/pull/75846.	2023-12-18 17:54:54 -05:00
Jakub Kuderski	a528cee224	[mlir][vector] Improve `makeArithReduction` expansion (#75846 ) Propagate fast math flags. Distinguish `minf`/`maxf` and `minimumf`/`maximumf`. Required for future patterns in https://github.com/llvm/llvm-project/pull/75727.	2023-12-18 17:47:46 -05:00
srcarroll	b26ee97537	[MLIR][Linalg] Support dynamic sizes in `lower_unpack` (#75494 )	2023-12-18 19:02:04 +01:00
Rik Huijzer	672f1a036a	[mlir][memref] Make `LoadOp::verify` error more clear (#75831 ) While debugging https://github.com/llvm/llvm-project/issues/71326, the `LoadOp::verify` code and error were very confusing. This PR improves that. This code was part of the reverted PR https://github.com/llvm/llvm-project/pull/75519. Fixing the `-convert-vector-to-scf` issue is going to take a bit longer and this code was out of scope anyway. Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>	2023-12-18 18:41:05 +01:00
Kazu Hirata	6655581038	[Dialect] Use llvm::is_contained (NFC)	2023-12-17 09:41:22 -08:00
Rik Huijzer	9f5afc3de9	Revert "[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 )" This reverts commit `3a1ae2f46d`.	2023-12-17 12:34:17 +01:00
Rik Huijzer	3a1ae2f46d	[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 ) Fixes https://github.com/llvm/llvm-project/issues/71326. The cause of the issue was that a new `LoadOp` was created which looked something like: ```mlir func.func @main(%arg1 : index, %arg2 : index) { %alloca_0 = memref.alloca() : memref<vector<1x32xi1>> %1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>> %2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>> return } ``` which crashed inside `LoadOp::verify`. Note here that `%alloca_0` is 0-dimensional, `%1` has one dimension, but `memref.load` tries to index `%1` with two indices. This is now fixed by using the fact that `unpackOneDim` always unpacks one dim `1bce61e6b0/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (L897-L903)` and so the `loadOp` should index only one dimension. --------- Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>	2023-12-17 11:42:35 +01:00
Matthias Springer	ea979b24b0	[mlir][SparseTensor][NFC] Remove `isNestedIn` helper function (#75729 ) Use `Region::findAncestorBlockInRegion` instead of a custom IR traversal.	2023-12-17 13:19:27 +09:00
Quinn Dawkins	82ab0f7f36	[mlir][linalg] Fix rank-reduced cases for extract/insert slice in DropUnitDims (#74723 ) Inferring the reshape reassociation indices for extract/insert slice ops based on the read sizes of the original slicing op will generate an invalid expand/collapse shape op for already rank-reduced cases. Instead just infer from the shape of the slice. Ported from Differential Revision: https://reviews.llvm.org/D147488	2023-12-16 10:08:51 -05:00
Peiming Liu	6c06bde7c4	[mlir][sparse] support loop range query using SparseTensorLevel. (#75670 )	2023-12-15 16:33:31 -08:00
Peiming Liu	21edad7d07	[mlir][sparse] set up the skeleton for SparseTensorLevel abstraction. (#75645 ) Note that, at the moment, the newly introduced `SparseTensorLevel` classes are far from complete; we plan to migrate code generation related to accessing sparse tensor levels to these classes in the near future to simplify `LoopEmitter`.	2023-12-15 13:34:34 -08:00
Rob Suderman	aa165edca8	[mlir][math] Added `math.sinh` with expansions to `math.exp` (#75517 ) Includes end-to-end tests for the CPU runner, folders using `libm`, and lowerings to the corresponding `libm` operations.	2023-12-15 11:35:40 -08:00
Andrzej Warzyński	f11bda78c8	[mlir][linalg] Use vector.shuffle to flatten conv filter (#75038 ) Updates the vectorisation of 1D depthwise convolution when flattening the channel dimension (introduced in #71918). In particular - how the convolution filter is "flattened". ATM, the vectoriser will use `vector.shape_cast`: ```mlir %b_filter = vector.broadcast %filter : vector<4xf32> to vector<3x2x4xf32> %sc_filter = vector.shape_cast %b_filter : vector<3x2x4xf32> to vector<3x8xf32> ``` This lowering is not ideal - `vector.shape_cast` can be convenient when it's folded away, but that's not happening in this case. Instead, this patch updates the vectoriser to use `vector.shuffle` (the overall result is identical): ```mlir %sh_filter = vector.shuffle %filter, %filter [0, 1, 2, 3, 0, 1, 2, 3] : vector<4xf32>, vector<4xf32> %b_filter = vector.broadcast %sh_filter : vector<8xf32> to vector<3x8xf32> ```	2023-12-15 17:56:59 +00:00
Peiming Liu	4a72a4ef12	[NFC][mlir][sparse] remove redundant parameter. (#75551 )	2023-12-15 09:29:22 -08:00
Boian Petkantchin	5e29112719	[mlir][mesh] Add verification and canonicalization for some collectives (#74905 ) Add verification and canonicalization for broadcast, gather, recv, reduce, scatter, send and shift. The canonicalizations only remove trivial collectives with empty mesh_axes attributes.	2023-12-15 06:41:10 -08:00
Rafael Ubal	214d32ccd2	Support for dynamic dimensions in 'tensor.splat' (#74626 ) This feature had been marked as `TODO` in the `tensor.splat` documentation for a while. This MR includes: - Support for dynamically shaped tensors in the return type of `tensor.splat` with the syntax suggested in the `TODO` comment. - Updated op documentation. - Bufferization support. - Updates in op folders affected by the new feature. - Unit tests for valid/invalid syntax, valid/invalid folding, and lowering through bufferization. - Additional op builders resembling those available in `tensor.empty`.	2023-12-15 13:54:45 +00:00
Quinn Dawkins	fcd54b368e	[mlir][tensor] Fix tensor.concat reifyResultShapes for static result dims (#75558 ) When the concatenated dim is statically sized but the inputs are dynamically sized, reifyResultShapes must return the static shape. Fixes the implementation of the interface for tensor.concat in such cases.	2023-12-15 08:43:58 -05:00
Hsiangkai Wang	f643eec892	[mlir][vector] Add emulation patterns for vector masked load/store (#74834 ) This patch converts ``` vector.maskedload %base[%idx_0, %idx_1], %mask, %pass_thru ``` to ``` %ivalue = %pass_thru %m = vector.extract %mask[0] %result0 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1] %combined = vector.insert %v, %ivalue[0] scf.yield %combined } else { scf.yield %ivalue } %m = vector.extract %mask[1] %result1 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1 + 1] %combined = vector.insert %v, %result0[1] scf.yield %combined } else { scf.yield %result0 } ... ``` It also converts ``` vector.maskedstore %base[%idx_0, %idx_1], %mask, %value ``` to ``` %m = vector.extract %mask[0] scf.if %m { %extracted = vector.extract %value[0] memref.store %extracted, %base[%idx_0, %idx_1] } %m = vector.extract %mask[1] scf.if %m { %extracted = vector.extract %value[1] memref.store %extracted, %base[%idx_0, %idx_1 + 1] } ... ```	2023-12-15 11:35:48 +00:00
Felix Schneider	8190369e83	[mlir][tosa] Add verifier for `tosa.transpose` (#75376 ) This patch adds a verifier to `tosa.transpose` which fixes a crash. Related: https://github.com/llvm/llvm-project/pull/74367 Fix https://github.com/llvm/llvm-project/issues/74479	2023-12-15 07:22:32 +01:00
Vivian	bd6a2452ae	[mlir][SCF] Add support for peeling the first iteration out of the loop (#74015 ) There is a use case in which we need to peel the first iteration out of the for loop so that the peeled forOp can be canonicalized away and the fillOp can be fused into the inner forall loop. For example, given the nested loops below ``` linalg.fill ins(...) outs(...) scf.for %arg = %lb to %ub step %step scf.forall ... ``` after the peeling transform, the IR is expected to be ``` scf.forall ... linalg.fill ins(...) outs(...) scf.for %arg = %(lb + step) to %ub step %step scf.forall ... ``` This patch reuses the existing peeling functions as much as possible and adds support for peeling the first iteration out of the loop.	2023-12-14 17:03:52 -08:00
Fabian Mora	419c45a325	[mlir][gpu] Fix crash in `gpu-module-to-binary` (#75477 ) This patch fixes the error in issue #75434. The crash was being caused by not checking for a lack of target attributes in a GPU module. It's now considered an error to invoke the pass with a GPU module with no target attributes.	2023-12-14 14:03:10 -05:00


7547 Commits
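
The bounds checks added in 847a6f8f0a can be pictured as two predicates. The Python sketch below only models that logic; it is not the code MLIR generates. The function names are hypothetical. The source memref is assumed to be one contiguous buffer, with positive sizes and non-negative strides. The view check compares only the lowest and highest linear offsets that a `reinterpret_cast`/`subview` result can touch.

```python
def indices_in_bounds(shape, indices):
    """memref.load / memref.store style check: every index lies in [0, dim)."""
    if len(indices) != len(shape):
        return False
    return all(0 <= idx < dim for idx, dim in zip(indices, shape))


def view_in_bounds(src_num_elements, offset, sizes, strides):
    """Simplified reinterpret_cast / subview style check (hypothetical model).

    Assumes a contiguous source of `src_num_elements` elements, positive
    sizes and non-negative strides, so the lowest address touched is
    `offset` and the highest is offset + sum((size - 1) * stride).
    """
    last = offset + sum((size - 1) * stride for size, stride in zip(sizes, strides))
    return offset >= 0 and last < src_num_elements


# A 4x8 row-major view over a 32-element buffer fits exactly; shifting it by one does not.
print(view_in_bounds(32, 0, (4, 8), (8, 1)))  # True
print(view_in_bounds(32, 1, (4, 8), (8, 1)))  # False
```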
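
The packing idea in 72003adf6b carries two independent f16 reductions through one series of i32 shuffles. It can be simulated lane by lane in Python. This is an illustration only. The xor-butterfly shuffle pattern, the subgroup size of 8 in the usage lines, and all function names are assumptions of the sketch, not what the MLIR lowering emits. The `struct` module's `'e'` (half-precision) format rounds values to f16.

```python
import struct


def pack_f16x2(a, b):
    """Round two values to f16 and pack them into one 32-bit integer."""
    return int.from_bytes(struct.pack("<ee", a, b), "little")


def unpack_f16x2(word):
    """Inverse of pack_f16x2: recover both f16 values as Python floats."""
    return struct.unpack("<ee", word.to_bytes(4, "little"))


def subgroup_reduce_packed(lane_values):
    """All-reduce two independent f16 sums across a simulated subgroup.

    lane_values[i] is the (x, y) pair held by lane i; the subgroup size is
    assumed to be a power of two. Each butterfly step performs a single
    xor-shuffle of i32 words, which moves both f16 values at once.
    """
    n = len(lane_values)
    words = [pack_f16x2(x, y) for x, y in lane_values]
    shuffles = 0
    offset = n // 2
    while offset > 0:
        partner = [words[lane ^ offset] for lane in range(n)]  # one i32 shuffle
        shuffles += 1
        combined = []
        for mine, theirs in zip(words, partner):
            a0, b0 = unpack_f16x2(mine)
            a1, b1 = unpack_f16x2(theirs)
            combined.append(pack_f16x2(a0 + a1, b0 + b1))
        words = combined
        offset //= 2
    return [unpack_f16x2(w) for w in words], shuffles


lanes = [(float(i), float(2 * i)) for i in range(8)]
results, num_shuffles = subgroup_reduce_packed(lanes)
print(results[0], num_shuffles)  # (28.0, 56.0) 3
```

Reducing x and y with separate, unpacked shuffles would take 2 * log2(8) = 6 shuffles here, versus 3 with packing. That matches the 2x reduction the commit message describes.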
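
Commit 8b885eb90f (re-landed as a25da1a921) pairs operands with `device_type` attributes. This can be modeled as a flat operand list plus per-`device_type` segment sizes. The Python sketch uses strings as stand-ins for SSA values, and both helper names are hypothetical. The fallback to the `DeviceType::None` entry in `values_for_device` is an assumption drawn from how the commit describes clauses that appear before any `device_type`. It is not an API from the patch.

```python
def split_by_segments(values, segment_sizes):
    """Split a flat operand list into one group per device_type entry."""
    if sum(segment_sizes) != len(values):
        raise ValueError("segment sizes do not cover the operand list")
    groups, start = [], 0
    for size in segment_sizes:
        groups.append(values[start:start + size])
        start += size
    return groups


def values_for_device(values_by_type, device_type):
    """Hypothetical lookup: prefer a device-specific entry, otherwise fall
    back to the values recorded before any device_type clause ("none")."""
    return values_by_type.get(device_type, values_by_type.get("none"))


# num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>])
operands = ["%c2", "%c4", "%c4"]
groups = split_by_segments(operands, [2, 1])
num_gangs = dict(zip(["none", "nvidia"], groups))
print(values_for_device(num_gangs, "nvidia"))     # ['%c4']
print(values_for_device(num_gangs, "multicore"))  # ['%c2', '%c4']
```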
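
The four-case argument in b33a131c82 can be checked mechanically. The Python model below mirrors the `arith.maxnumf` expansion op by op: `cmpf ugt`, `select`, `cmpf uno`, `select`. Python floats stand in for f32. Signed zeros and NaN payloads are deliberately not modeled.

```python
import math


def cmpf_ugt(a, b):
    """arith.cmpf ugt: true if the operands are unordered or a > b."""
    return math.isnan(a) or math.isnan(b) or a > b


def cmpf_uno(a, b):
    """arith.cmpf uno: true if either operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def select(cond, if_true, if_false):
    """arith.select on scalars."""
    return if_true if cond else if_false


def expanded_maxnumf(lhs, rhs):
    v0 = cmpf_ugt(lhs, rhs)     # %0
    v1 = select(v0, lhs, rhs)   # %1
    v2 = cmpf_uno(lhs, lhs)     # %2: is lhs NaN?
    return select(v2, rhs, v1)  # %3


nan = math.nan
for lhs, rhs in [(3.0, 2.0), (1.0, 2.0), (nan, 2.0), (2.0, nan), (nan, nan)]:
    print(lhs, rhs, expanded_maxnumf(lhs, rhs))
```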
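
The walker change in c4457e10fe is easiest to see on a toy IR tree. The Python classes below are hypothetical stand-ins for `Operation`, `Region` and `Block`, not MLIR's API, and the op names are made up. The sketch visits in post-order, MLIR's default walk order. It illustrates only one point: the starting block, or the blocks of the starting region, are now part of the enumeration.

```python
class Block:
    def __init__(self, ops=()):
        self.ops = list(ops)


class Region:
    def __init__(self, blocks=()):
        self.blocks = list(blocks)


class Op:
    def __init__(self, name, regions=()):
        self.name = name
        self.regions = list(regions)


def walk_blocks(block):
    """Post-order block walk that includes `block` itself (the new behavior)."""
    for op in block.ops:
        for region in op.regions:
            for nested in region.blocks:
                yield from walk_blocks(nested)
    yield block


def walk_region_blocks(region):
    """Region walk over blocks: the region's own blocks are enumerated too."""
    for block in region.blocks:
        yield from walk_blocks(block)


inner = Block([Op("test.inner")])
outer = Block([Op("test.outer", [Region([inner])])])
top = Region([outer])
print([b is outer for b in walk_blocks(outer)])  # [False, True]
```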
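
In 10056c821a, each region of the new `scf.reduce` terminator acts as a binary combiner for its own accumulator. The sequential Python model below illustrates this. `scf_parallel_reduce` is a hypothetical name. Iterations run in order here, although `scf.parallel` leaves the order unspecified, which is why real combiners are expected to be associative and commutative. Unlike the commit's example, the two reductions get separate init values (0.0 for the sum, 1.0 for the product).

```python
import operator


def scf_parallel_reduce(lb, ub, step, inits, body, combiners):
    """Sequential model of scf.parallel with an scf.reduce terminator.

    body(iv) returns one value per reduction (the scf.reduce operands);
    combiners[k] plays the role of the k-th scf.reduce region.
    """
    accs = list(inits)
    for iv in range(lb, ub, step):
        elems = body(iv)
        accs = [combine(acc, elem) for combine, acc, elem in zip(combiners, accs, elems)]
    return tuple(accs)


buffer1 = [1.0, 2.0, 3.0]
buffer2 = [2.0, 3.0, 4.0]
sums_and_products = scf_parallel_reduce(
    0, 3, 1, (0.0, 1.0),
    lambda iv: (buffer1[iv], buffer2[iv]),
    (operator.add, operator.mul),
)
print(sums_and_products)  # (6.0, 24.0)
```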
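
The `math.sinh` expansion in aa165edca8 rests on the identity sinh(x) = (exp(x) - exp(-x)) / 2. The Python sketch below checks that identity against `math.sinh`. It illustrates the math only; the exact op sequence the MLIR pattern emits may differ.

```python
import math


def sinh_via_exp(x):
    """sinh(x) = (exp(x) - exp(-x)) / 2, written only in terms of exp."""
    return (math.exp(x) - math.exp(-x)) / 2.0


for x in (-2.0, 0.5, 1.0, 3.5):
    print(x, sinh_via_exp(x), math.sinh(x))
```

For very small |x| the subtraction cancels most significant digits, so an implementation that cares about relative accuracy near zero may need a different formula there. The commit message does not say whether the MLIR expansion handles that case.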
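
Commit f11bda78c8 claims that the `vector.shuffle` form gives the same result as broadcast plus `vector.shape_cast`. This can be checked on nested Python lists. The helpers below are hypothetical models of the ops for this one shape (a 4-element filter broadcast to 3x2), not MLIR APIs.

```python
def flatten(nested):
    """Row-major flattening of a nested list."""
    if isinstance(nested, list):
        return [x for item in nested for x in flatten(item)]
    return [nested]


def shuffle(v1, v2, mask):
    """1-D vector.shuffle: index i < len(v1) picks v1[i], otherwise v2[i - len(v1)]."""
    both = list(v1) + list(v2)
    return [both[i] for i in mask]


def old_flattening(filt, d0=3, d1=2):
    # vector.broadcast to d0 x d1 x len(filt), then vector.shape_cast to d0 x (d1 * len(filt)).
    b = [[list(filt) for _ in range(d1)] for _ in range(d0)]
    flat = flatten(b)
    width = d1 * len(filt)
    return [flat[r * width:(r + 1) * width] for r in range(d0)]


def new_flattening(filt, d0=3, d1=2):
    # vector.shuffle the filter with itself ([0, 1, 2, 3, 0, 1, 2, 3] for a
    # 4-element filter), then broadcast the result to d0 rows.
    mask = list(range(len(filt))) * d1
    sh = shuffle(filt, filt, mask)
    return [list(sh) for _ in range(d0)]


filt = [1.0, 2.0, 3.0, 4.0]
print(old_flattening(filt) == new_flattening(filt))  # True
```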
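
The unrolled IR in f643eec892 amounts to one guarded memory access per lane. The Python sketch below models that for a 2-D buffer given as nested lists. The function names are hypothetical. As in the commit's example, only the innermost index advances with the lane number.

```python
def masked_load(base, idx0, idx1, mask, pass_thru):
    """Per-lane emulation of vector.maskedload on a 2-D buffer.

    Only lanes whose mask bit is set read memory; masked-off lanes keep
    the pass-through value.
    """
    result = list(pass_thru)
    for lane, enabled in enumerate(mask):
        if enabled:
            result[lane] = base[idx0][idx1 + lane]
    return result


def masked_store(base, idx0, idx1, mask, value):
    """Per-lane emulation of vector.maskedstore: write only enabled lanes."""
    for lane, enabled in enumerate(mask):
        if enabled:
            base[idx0][idx1 + lane] = value[lane]


buf = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(masked_load(buf, 1, 1, [True, False, True], [0, 0, 0]))  # [6, 0, 8]
```

Masked-off lanes never touch memory in this model. An address that would be out of range is therefore harmless as long as its mask bit is false.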
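
Peeling the first iteration out of the loop (bd6a2452ae) must preserve the iteration sequence. The Python sketch below compares a plain loop with a peeled version. The `lb < ub` guard is an assumption of the sketch for the zero-trip case; the commit's IR shows only the non-empty form. A positive step is assumed, as `scf.for` requires.

```python
def run_loop(lb, ub, step, body):
    """Reference: scf.for %iv = lb to ub step step."""
    for iv in range(lb, ub, step):
        body(iv)


def run_peeled_first(lb, ub, step, body):
    """Same iterations, with the first one peeled out of the loop.

    The guard is needed because a loop with lb >= ub has no first
    iteration to peel; the remaining loop starts at lb + step.
    """
    if lb < ub:
        body(lb)  # peeled first iteration
        for iv in range(lb + step, ub, step):
            body(iv)  # remaining loop


def trace(runner, lb, ub, step):
    """Record the induction-variable values a loop runner visits."""
    seen = []
    runner(lb, ub, step, seen.append)
    return seen


print(trace(run_peeled_first, 0, 10, 3))  # [0, 3, 6, 9]
```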