Currently, only the values defined outside the ForOp but inside the original
WarpOp are considered "escaping values". However, this does not cover all
cases: if the ForOp has unused results, the corresponding IterArgs must
also be yielded by the original WarpOp. This PR adds the required code
changes to achieve this.
Start removing debug intrinsics support, beginning with the flag that
controls production of their replacement, debug records. This patch
removes the command-line flag and with it the ability to switch back to
intrinsics. The module / function / block level "IsNewDbgInfoFormat"
flags get hardcoded to true; I'll incrementally remove things that
depend on those flags.
This PR adds the `arith.scaling_truncf` and `arith.scaling_extf` operations,
which support block quantization following the OCP MXFP spec listed here:
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
The OCP MXFP spec comes with a reference implementation here:
https://github.com/microsoft/microxcaling/tree/main
The most relevant piece of reference code is the `_quantize_mx` method at
7bc41952de/mx/mx_ops.py (L173).
Both `arith.scaling_truncf` and `arith.scaling_extf` are designed to be
elementwise operations. Please see their descriptions in the `ArithOps.td`
file for more details.
Internally, `arith.scaling_truncf` computes
`arith.truncf(arith.divf(input, 2^scale))`. Any necessary broadcast,
clamping, normalization, and NaN propagation of `scale` should be done
before calling into `arith.scaling_truncf`.
`arith.scaling_extf` computes `arith.mulf(2^scale, input)` after taking
care of the necessary data type conversions.
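As a rough illustration of the math above (not the actual op definitions or lowering), here is a minimal C++ sketch of the per-element computation, assuming `scale` has already been broadcast and normalized; the function names and the use of plain `float` are purely illustrative, since the real ops work on MLIR's narrow float types:
```cpp
#include <cmath>

// Per-element math of arith.scaling_truncf before the final truncation to
// the narrow type: divide the input by 2^scale.
float scalingTruncfValue(float input, int scale) {
  return input / std::ldexp(1.0f, scale); // then truncf to e.g. f8/f4
}

// Per-element math of arith.scaling_extf after extending the narrow input:
// multiply the extended value by 2^scale.
float scalingExtfValue(float extendedInput, int scale) {
  return std::ldexp(extendedInput, scale); // extendedInput * 2^scale
}
```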
CC: @krzysz00 @dhernandez0 @bjacob @pashu123 @MaheshRavishankar
@tgymnich
---------
Co-authored-by: Prashant Kumar <pk5561@gmail.com>
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
This is to enable `CooperativeMatrixType` to be used with
`DenseElementsAttr`, so that a `spirv.Constant` can be easily built from
`OpConstantComposite`. For example:
```mlir
%cst = spirv.Constant dense<0.000000e+00> : !spirv.coopmatrix<1x1xf32, Subgroup, MatrixAcc>
```
Constraints of arithmetic operations are changed, as
`SameOperandsAndResultType` can no longer fully verify CoopMatrices.
This is because for shaped types the verifier only checks element type
and shapes, whereas for any other arbitrary type it looks for an exact
match.
This patch does not enable the actual deserialization. This will be
done in a subsequent PR.
- try_emplace(Key) is shorter than insert({Key, nullptr}).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
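For illustration, a minimal sketch of the pattern being simplified (the map type and key are made up; only the `insert`/`try_emplace` calls reflect the change):
```cpp
#include "llvm/ADT/DenseMap.h"

void example(llvm::DenseMap<int, int *> &Map, int Key) {
  // Before: spell out the null mapped value explicitly.
  Map.insert({Key, nullptr});

  // After: try_emplace value-initializes the mapped value (nullptr for
  // pointers) and, like insert, does nothing if Key is already present.
  Map.try_emplace(Key);
}
```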
This PR makes `dump-pass-pipeline` pretty-print the dumped pipeline. For
large pipelines the current behavior produces a wall of text that is
hard to visually navigate.
For the command
```bash
mlir-opt --pass-pipeline="builtin.module(flatten-memref, expand-strided-metadata,func.func(arith-expand,func.func(affine-scalrep)))" --dump-pass-pipeline
```
Before:
```bash
Pass Manager with 3 passes:
builtin.module(flatten-memref,expand-strided-metadata,func.func(arith-expand{include-bf16=false include-f8e8m0=false},func.func(affine-scalrep)))
```
After:
```bash
Pass Manager with 3 passes:
builtin.module(
flatten-memref,
expand-strided-metadata,
func.func(
arith-expand{include-bf16=false include-f8e8m0=false},
func.func(
affine-scalrep
)
)
)
```
Another nice feature of this is that the pretty-printed string can still
be copy/pasted into `-pass-pipeline` using a quote:
```bash
$ bin/mlir-opt --dump-pass-pipeline test.mlir --pass-pipeline='
builtin.module(
flatten-memref,
expand-strided-metadata,
func.func(
arith-expand{include-bf16=false include-f8e8m0=false},
func.func(
affine-scalrep
)
)
)'
```
---------
Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
This patch removes `inputVecSizesForLeadingDims` from the parameter list
of `createWriteOrMaskedWrite`. That argument is unnecessary - vector
sizes can be obtained from the `vecToStore` parameter. Since this doesn't
change behavior or test results, it's marked as NFC.
Additional cleanups:
* Renamed `vectorToStore` to `vecToStore` for consistency and brevity.
* Rewrote a conditional at the end of the function to use early exit,
improving readability:
```cpp
// BEFORE:
if (maskingRequired) {
  Value maskForWrite = ...;
  write = vector::maskOperation(builder, write, maskForWrite);
}
return write;

// AFTER:
if (!maskingRequired)
  return write;
Value maskForWrite = ...;
return vector::maskOperation(builder, write, maskForWrite);
```
This patch refactors two vectorization hooks in Vectorization.cpp:
* `createWriteOrMaskedWrite` gains a new parameter for write indices,
aligning it with its counterpart `createReadOrMaskedRead`.
* `vectorizeAsInsertSliceOp` is updated to reuse both of the above
hooks, rather than re-implementing similar logic.
CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing
`tensor.insert_slice`. Recent updates added masking support:
* https://github.com/llvm/llvm-project/pull/122927
* https://github.com/llvm/llvm-project/pull/123031
At the time, reuse of the shared `create*` hooks wasn't feasible due to
missing parameters and overly rigid assumptions. This patch resolves
that and moves us closer to a more maintainable structure.
CHANGES IN `createWriteOrMaskedWrite`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the
vector to store, via named variables like `destType`/`vecToStoreType`,
`destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like
`in_bounds`. For example, the size of the `in_bounds` attr now matches
the source vector rank, not the tensor rank.
* Drops the assumption that `vecToStoreRank == destRank` - this doesn't
hold in many real examples.
* Deduces mask dimensions from `vecToStoreShape` (vector) instead of
`destShape` (tensor). (Eventually we should not require
`inputVecSizesForLeadingDims` at all - mask shape should be inferred.)
NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids
inserting redundant masks and reduces the burden on canonicalization to
clean them up later.
Example where masking is provably unnecessary:
```mlir
%2 = vector.mask %1 {
vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
{in_bounds = [true, true, true]}
: vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```
Also, without this hook, the tests would be more complicated and require
more matching.
VECTORIZATION BEHAVIOUR
-----------------------
This patch preserves the current behaviour around masking and the use of
the `in_bounds` attribute. Specifically:
* `useInBoundsInsteadOfMasking` is set when no input vector sizes are
available.
* The vectorizer continues to infer vector sizes where needed.
Note: the computation of the `in_bounds` attribute is not always correct.
That issue is tracked here:
* https://github.com/llvm/llvm-project/issues/142107
This will be addressed separately.
TEST CHANGES
-----------
Only affects vectorization of:
* `tensor.insert_slice` (now refactored to use shared hooks)
Test diffs involve additional `arith.constant` Ops due to increased reuse
of shared helpers (which generate their own constants). This will be
cleaned up via constant caching (see #138265).
NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review
`createWriteOrMaskedWrite` as a new method rather than diffing
line-by-line.
TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and
`createReadOrMaskedRead`:
* Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in
VectorUtils.cpp)
* Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
* Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the
updated test in transform-vector.mlir for an example that would
benefit from this.
* Address #142107
(*) This method will eventually be moved out of Vectorization.cpp, which
isn't the right long-term home for it.
Makes it possible to pass options to a pass inside a schedule.
The refactoring also ensures that the pass manager and pass are constructed
only once per `apply()` of the transform op, rather than once for each
target payload given to the op's `apply()`.
The current algorithm for detecting circular function calls scales quadratically, due to the linear scan of the functions vector performed for each element of the vector itself. This PR replaces that algorithm with an O(V + E) version based on Kahn's algorithm for topological sorting, where V is the number of functions and E is the number of function calls.
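For reference, a self-contained sketch of the Kahn-style approach applied to cycle detection in a call graph (not the actual pass code), assuming functions are numbered 0..V-1 and `calls[f]` lists the callees of `f`:
```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Returns true if the call graph contains a cycle, in O(V + E) time.
bool hasCircularCalls(std::size_t numFuncs,
                      const std::vector<std::vector<std::size_t>> &calls) {
  // Count how many times each function is called (its in-degree).
  std::vector<std::size_t> inDegree(numFuncs, 0);
  for (const auto &callees : calls)
    for (std::size_t callee : callees)
      ++inDegree[callee];

  // Seed the worklist with functions that are never called.
  std::queue<std::size_t> worklist;
  for (std::size_t f = 0; f < numFuncs; ++f)
    if (inDegree[f] == 0)
      worklist.push(f);

  // Peel off functions once all of their callers have been processed.
  std::size_t numProcessed = 0;
  while (!worklist.empty()) {
    std::size_t f = worklist.front();
    worklist.pop();
    ++numProcessed;
    for (std::size_t callee : calls[f])
      if (--inDegree[callee] == 0)
        worklist.push(callee);
  }

  // Any function left unprocessed is part of a cycle or called from one.
  return numProcessed != numFuncs;
}
```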
The `DeallocationState` class has been modified to keep a reference to an externally owned `SymbolTableCollection` instance, preserving the cached symbol tables across multiple insertions of deallocation instructions.
`RTDyldObjectLinkingLayer` currently creates a memory manager without any
parameters.
In this PR, the MemoryBuffer that is about to be emitted is passed to the
MemoryManager, so that the user can use it to configure the MemoryManager's
behaviour.
This PR adds `UnrollBroadcastPattern` to the `VectorUnroll` transform.
To support this, it also extends the `BroadcastOp` definition with
`VectorUnrollOpInterface`.
Use the spellings in the generated clause parser. The functions
`get<lang>ClauseKind` and `get<lang>ClauseName` are not yet updated.
The definitions of both clauses and directives now take a list of
"Spelling"s instead of a single string. For example
```
def ACCC_Copyin : Clause<[Spelling<"copyin">,
Spelling<"present_or_copyin">,
Spelling<"pcopyin">]> { ... }
```
A "Spelling" is a versioned string, defaulting to "all versions".
For background information see
https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507
Before this PR, a rank-m -> rank-n vector.shape_cast with m,n>1 was
lowered to extracts/inserts of single elements, so that a shape_cast on
a vector with N elements would always require N extracts/inserts. While
this is necessary in the worst-case scenario, it is sometimes possible to
use fewer, larger extracts/inserts. Specifically, the largest common
suffix of the source and result shapes can be extracted/inserted.
For example:
```mlir
%0 = vector.shape_cast %arg0 : vector<10x2x3xf32> to vector<2x5x2x3xf32>
```
has a common suffix of shape `2x3`. Before this PR, this would be lowered
to 60 extract/insert pairs with extracts of the form
`vector.extract %arg0 [a, b, c] : f32 from vector<10x2x3xf32>`. With
this PR it is 10 extract/insert pairs with extracts of the form
`vector.extract %arg0 [a] : vector<2x3xf32> from vector<10x2x3xf32>`.
This adds verifier checks for the scatter op to make sure the shapes of
the inputs and output are consistent with respect to the spec.
Signed-off-by: Tai Ly <tai.ly@arm.com>
The previous verifier checks did not correctly handle unranked operands.
For example, they could incorrectly assume the number of
`rankedOperandTypes` would be >= 2, which isn't the case when both a and
b are unranked.
This change simplifies these checks such that they only operate over the
intended a and b operands as opposed to the shift operand as well.
Bug description: Global variables and functions created during the
gpu.printf conversion to NVVM may contain debug info metadata from the
function containing the gpu.printf, which cannot be used outside of that
function.
Add `RuntimeVerifiableOpInterface` implementations for the following
ops. These were mostly copied from the respective memref
implementations. Only the part that deals with offsets and strides was
removed.
* `tensor.cast`: `memref.cast`
* `tensor.dim`: `memref.dim`
* `tensor.extract`: `memref.load`
* `tensor.insert`: `memref.store`
* `tensor.extract_slice`: `memref.subview`
This also applies a
[fix](33a26b9ca2)
for CMakeLists.txt. Below is the original commit description.
The issue occurs during a downstream pass which does dialect conversion,
where both
[`FuncOpConversion`](cde67b6663/mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp (L480))
and
[`SubviewFolder`](cde67b6663/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp (L187))
are run together. The original starting IR is:
```mlir
module {
func.func @foo(%arg0: memref<100x100xf32>, %arg1: index, %arg2: index, %arg3: index, %arg4: index) -> memref<?x?xf32, strided<[100, 1], offset: ?>> {
%subview = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1] : memref<100x100xf32> to memref<?x?xf32, strided<[100, 1], offset: ?>>
return %subview : memref<?x?xf32, strided<[100, 1], offset: ?>>
}
}
```
After `FuncOpConversion` runs, the IR looks like:
```mlir
"builtin.module"() ({
"llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> (ptr, ptr, i64, i64, i64, i64, i64, i64, i64, i64, i64)>, linkage = #llvm.linkage<external>, sym_name = "foo", visibility_ = 0 : i64}> ({
^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64, %arg7: i64, %arg8: i64, %arg9: i64, %arg10: i64):
%0 = "memref.subview"(<<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>) <{operandSegmentSizes = array<i32: 1, 2, 2, 0>, static_offsets = array<i64: -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: -9223372036854775808, -9223372036854775808>, static_strides = array<i64: 1, 1>}> : (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>
"func.return"(%0) : (memref<?x?xf32, strided<[100, 1], offset: ?>>) -> ()
}) : () -> ()
"func.func"() <{function_type = (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>, sym_name = "foo"}> ({
}) : () -> ()
}) {llvm.data_layout = "", llvm.target_triple = ""} : () -> ()
```
The `<<UNKNOWN SSA VALUE>>`'s here are block arguments of a separate
unlinked block, which is disconnected from the rest of the IR (so not
only is the IR verifier-invalid, it can't even be parsed). This IR is
created by signature conversion in the dialect conversion infra.
Now `SubviewFolder` is applied, and the utility function here is called
on one of these disconnected block arguments, causing a crash.
The TestMemRefToLLVMWithTransforms pass is introduced to exercise the
bug, and it can be reused by other contributors in the future.
Co-authored-by: Rahul Kayaith <rkayaith@gmail.com>
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
The class "ClauseVal" actually represents a definition of an enumeration
value, and in itself it is not bound to any clause. Rename it to EnumVal
and add a comment clarifying how it's translated into an actual enum
definition in the generated source code.
There is no change in functionality.
This patch adds an assert to check that the result of `getConstantInt` is
non-null. Previously the code failed with a segmentation fault if
`getConstantInt` failed to look up the value. This primarily occurs when
the value is defined as an OpSpecConstant rather than an OpConstant.
This commit introduces `visitCallOperation` and `visitCallableOperation`
extension points in the sparse data flow analysis framework. This makes
it possible, for example, to make the analysis less conservative without
a lot of code duplication, propagating information even when not all the
call or return sites are known.
This patch fixes:
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:1680:37: error: comparison
of integers of different signs: 'int' and 'uint64_t' (aka 'unsigned
long') [-Werror,-Wsign-compare]
This fixes Tosa VariableOp to align with spec 1.0
- add var_shape attribute to store shape of variable type
- change type attribute to store element type of variable type
- add a builder so previous construction calls still work
- fix up level check of rank to be on variable type instead of initial
value which is optional
- add level check of size for variable type
- add lit tests for variable ops without initial values
- add lit test for variable op with fixed rank but unknown dimension
- add invalid lit test for variable op with unranked type
Signed-off-by: Tai Ly <tai.ly@arm.com>
Adds a few canonicalizers, folders, and rewrite patterns to tensor ops:
* tensor.insert folder: insert into a constant is replaced with a new
constant
* tensor.extract folder: extract from a parent tensor that was inserted
at the same indices is folded into the inserted value
* a rewrite pattern that replaces an extract of a collapse shape
with an extract of the source tensor (requires static source dimensions)
Signed-off-by: Asra Ali <asraa@google.com>
The main idea behind the change is to allow expand-of-collapse folds for
reshapes like `?x?xk` -> `?` (k>1). The rationale here is that the
expand op must have a coherent index/affine expression specified in its
`output_shape` argument (see example below), and if it doesn't, the IR
has already been invalidated at an earlier stage:
```
%c32 = arith.constant 32 : index
%div = arith.divsi %<some_index>, %c32 : index
%collapsed = tensor.collapse_shape %41#1 [[0], [1, 2], [3, 4]]
: tensor<9x?x32x?x32xf32> into tensor<9x?x?xf32>
%affine = affine.apply affine_map<()[s0] -> (s0 * 32)> ()[%div]
%expanded = tensor.expand_shape %collapsed [[0], [1, 2], [3]] output_shape [9, %div, 32, %affine]
: tensor<9x?x?xf32> into tensor<9x?x32x?xf32>
```
On the above assumption, adjust the routine in
`getReassociationIndicesForCollapse()` to allow dynamic reshapes beyond
just `?x..?x1x1x..x1` -> `?`. Dynamic subshapes introduce two kinds of
issues:
1. n>2 consecutive dynamic dimensions in the source shape cannot be
collapsed together into 1<k<n neighboring dynamic dimensions in the
target shape, since there'd be more than one suitable reassociation
(example: `?x?x10x? into ?x?`)
2. When figuring out static subshape reassociations based on products,
there are cases where a static dimension is collapsed with a dynamic
one, and should therefore be skipped when comparing products of source &
target dimensions (e.g. `?x2x3x4 into ?x12`)
To address 1, we should detect such sequences in the target shape before
assigning multiple dynamic dimensions into the same index set. For 2, we
take note that a static target dimension was preceded by a dynamic one
and allow an "offset" subshape of source static dimensions, as long as
there's an exact sequence for the target size later in the source shape.
This PR aims to address all reshapes that can be determined based purely
on shapes (and the original reassociation maps, as done in
`ComposeExpandOfCollapseOp::findCollapsingReassociation`). It doesn't
seem possible to fold all qualifying dynamic shape patterns in a
deterministic way without looking into affine expressions
simultaneously. That would be difficult to maintain in a single general
utility, so a path forward would be to provide dialect-specific
implementations for Linalg/Tensor.
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
---------
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
Co-authored-by: Ian Wood <ianwood2024@u.northwestern.edu>
Currently, some control flow patterns cannot be structurized into
existing SPIR-V MLIR constructs, e.g., conditional early exits (break).
Since support for early exits cannot currently be added
(https://github.com/llvm/llvm-project/pull/138688#pullrequestreview-2830791677),
this patch allows the structurizer to be disabled, keeping the control
flow unstructurized. By default, the control flow is structurized.
Address a TODO regarding the recomputation of symbol tables. The signature of the `getFuncOpsOrderedByCalls` function is modified to receive the collection of cached symbol tables.
This patch wraps `populateLowerContractionToSMMLAPatternPatterns` into a
new TD Op `apply_patterns.arm_neon.vector_contract_to_i8mm`.
It also removes the "test-lower-to-arm-neon" pass.