Below is the original commit description. Furthermore, it applies a
[fix](33a26b9ca2)
for CMakeLists.txt.
The issue occurs during a downstream pass which does dialect conversion,
where both
[`FuncOpConversion`](cde67b6663/mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp (L480))
and
[`SubviewFolder`](cde67b6663/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp (L187))
are run together. The original starting IR is:
```mlir
module {
func.func @foo(%arg0: memref<100x100xf32>, %arg1: index, %arg2: index, %arg3: index, %arg4: index) -> memref<?x?xf32, strided<[100, 1], offset: ?>> {
%subview = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1] : memref<100x100xf32> to memref<?x?xf32, strided<[100, 1], offset: ?>>
return %subview : memref<?x?xf32, strided<[100, 1], offset: ?>>
}
}
```
After `FuncOpConversion` runs, the IR looks like:
```mlir
"builtin.module"() ({
"llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> (ptr, ptr, i64, i64, i64, i64, i64, i64, i64, i64, i64)>, linkage = #llvm.linkage<external>, sym_name = "foo", visibility_ = 0 : i64}> ({
^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64, %arg7: i64, %arg8: i64, %arg9: i64, %arg10: i64):
%0 = "memref.subview"(<<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>) <{operandSegmentSizes = array<i32: 1, 2, 2, 0>, static_offsets = array<i64: -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: -9223372036854775808, -9223372036854775808>, static_strides = array<i64: 1, 1>}> : (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>
"func.return"(%0) : (memref<?x?xf32, strided<[100, 1], offset: ?>>) -> ()
}) : () -> ()
"func.func"() <{function_type = (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>, sym_name = "foo"}> ({
}) : () -> ()
}) {llvm.data_layout = "", llvm.target_triple = ""} : () -> ()
```
The `<<UNKNOWN SSA VALUE>>` placeholders here are block arguments of a separate
unlinked block, which is disconnected from the rest of the IR (so not
only is the IR verifier-invalid, it can't even be parsed). This IR is
created by signature conversion in the dialect conversion infra.
Now `SubviewFolder` is applied, and the utility function here is called
on one of these disconnected block arguments, causing a crash.
The TestMemRefToLLVMWithTransforms pass is introduced to exercise the
bug, and it can be reused by other contributors in the future.
Co-authored-by: Rahul Kayaith <rkayaith@gmail.com>
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
The class "ClauseVal" actually represents a definition of an enumeration
value, and in itself it is not bound to any clause. Rename it to EnumVal
and add a comment clarifying how it's translated into an actual enum
definition in the generated source code.
There is no change in functionality.
This fixes Tosa VariableOp to align with spec 1.0
- add var_shape attribute to store shape of variable type
- change type attribute to store element type of variable type
- add a builder so previous construction calls still work
- fix up level check of rank to be on variable type instead of initial
value which is optional
- add level check of size for variable type
- add lit tests for variable ops without initial values
- add lit test for variable op with fixed rank but unknown dimension
- add invalid lit test for variable op with unranked type
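As a sketch of the kind of case the new lit tests cover, here is a hypothetical variable with a fixed rank but one unknown dimension, declared without an initial value (the custom assembly form below is assumed, not copied from the tests):
```mlir
// Hypothetical example; the exact printed syntax of tosa.variable is assumed.
// Rank is fixed (2), one dimension is unknown, and no initial value is given.
tosa.variable @stored_data : tensor<2x?xf32>
```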
Signed-off-by: Tai Ly <tai.ly@arm.com>
Adds a few canonicalizers, folders, and rewrite patterns to tensor ops:
* tensor.insert folder: insert into a constant is replaced with a new
constant
* tensor.extract folder: extract from a parent tensor that was inserted
at the same indices is folded into the inserted value
* rewrite pattern added that replaces an extract of a collapse shape
with an extract of the source tensor (requires static source dimensions)
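As a minimal sketch of the extract-of-insert fold (names and shapes are hypothetical):
```mlir
// %inserted holds %val at [%i, %j], so extracting at the same indices
// folds to %val directly; the extract (and, if otherwise unused, the insert)
// goes away.
%inserted = tensor.insert %val into %dest[%i, %j] : tensor<4x4xf32>
%elem = tensor.extract %inserted[%i, %j] : tensor<4x4xf32>
```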
Signed-off-by: Asra Ali <asraa@google.com>
The main idea behind the change is to allow expand-of-collapse folds for
reshapes like `?x?xk` -> `?` (k>1). The rationale here is that the
expand op must have a coherent index/affine expression specified in its
`output_shape` argument (see example below), and if it doesn't, the IR
has already been invalidated at an earlier stage:
```
%c32 = arith.constant 32 : index
%div = arith.divsi %<some_index>, %c32 : index
%collapsed = tensor.collapse_shape %41#1 [[0], [1, 2], [3, 4]]
: tensor<9x?x32x?x32xf32> into tensor<9x?x?xf32>
%affine = affine.apply affine_map<()[s0] -> (s0 * 32)> ()[%div]
%expanded = tensor.expand_shape %collapsed [[0], [1, 2], [3]] output_shape [9, %div, 32, %affine]
: tensor<9x?x?xf32> into tensor<9x?x32x?xf32>
```
On the above assumption, adjust the routine in
`getReassociationIndicesForCollapse()` to allow dynamic reshapes beyond
just `?x..?x1x1x..x1` -> `?`. Dynamic subshapes introduce two kinds of
issues:
1. n>2 consecutive dynamic dimensions in the source shape cannot be
collapsed together into 1<k<n neighboring dynamic dimensions in the
target shape, since there'd be more than one suitable reassociation
(example: `?x?x10x? into ?x?`)
2. When figuring out static subshape reassociations based on products,
there are cases where a static dimension is collapsed with a dynamic
one, and should therefore be skipped when comparing products of source &
target dimensions (e.g. `?x2x3x4 into ?x12`)
To address 1, we should detect such sequences in the target shape before
assigning multiple dynamic dimensions into the same index set. For 2, we
take note that a static target dimension was preceded by a dynamic one
and allow an "offset" subshape of source static dimensions, as long as
there's an exact sequence for the target size later in the source shape.
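As an illustration of case 2, here is a sketch of a collapse (shapes taken from the `?x2x3x4 into ?x12` example above) whose reassociation can now be inferred from the shapes alone:
```mlir
// The dynamic leading source dimension maps onto the dynamic target dimension,
// and the static suffix 2x3x4 matches the static target size 12, yielding the
// reassociation [[0], [1, 2, 3]].
%collapsed = tensor.collapse_shape %src [[0], [1, 2, 3]]
    : tensor<?x2x3x4xf32> into tensor<?x12xf32>
```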
This PR aims to address all reshapes that can be determined based purely
on shapes (and the original reassociation maps, as done in
`ComposeExpandOfCollapseOp::findCollapsingReassociation()`). It doesn't
seem possible to fold all qualifying dynamic shape patterns in a
deterministic way without looking into affine expressions
simultaneously. That would be difficult to maintain in a single general
utility, so a path forward would be to provide dialect-specific
implementations for Linalg/Tensor.
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
---------
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
Co-authored-by: Ian Wood <ianwood2024@u.northwestern.edu>
This patch wraps `populateLowerContractionToSMMLAPatternPatterns` into a
new TD Op `apply_patterns.arm_neon.vector_contract_to_i8mm`.
It also removes the "test-lower-to-arm-neon" pass.
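A sketch of how the new TD op might be invoked from a transform script (the surrounding named-sequence boilerplate is assumed, not part of this patch):
```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%root: !transform.any_op {transform.readonly}) {
    %func = transform.structured.match ops{["func.func"]} in %root
        : (!transform.any_op) -> !transform.any_op
    // Apply the Neon I8MM lowering patterns to every matched function.
    transform.apply_patterns to %func {
      transform.apply_patterns.arm_neon.vector_contract_to_i8mm
    } : !transform.any_op
    transform.yield
  }
}
```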
Adds a check to make sure that the linalg op is safe to erase by
ensuring that the `linalg.yield` is yielding one of the linalg op's
block args. This check already exists for linalg ops with pure tensor
semantics.
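For contrast, a hedged sketch of a buffer-semantics op that is not safe to erase, because the yielded value is computed in the body rather than forwarded from a block argument (the map and shapes are made up):
```mlir
#map = affine_map<(d0) -> (d0)>
// The body writes %sum into %out, so erasing this op would drop a real store.
linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
    ins(%in : memref<8xf32>) outs(%out : memref<8xf32>) {
^bb0(%a: f32, %b: f32):
  %sum = arith.addf %a, %b : f32
  linalg.yield %sum : f32
}
```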
Closes https://github.com/llvm/llvm-project/issues/129414
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
This PR introduces the initial implementation of a blocking pass for
XeGPU programs. The pass leverages unroll patterns from both the XeGPU
and Vector dialects.
---------
Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
Example:
```mlir
%0 = vector.extract %source[0, 0] : i8 from vector<1x2xi8>
%1 = vector.extract %source[0, 1] : i8 from vector<1x2xi8>
%2 = vector.from_elements %0, %1 : vector<2xi8>
```
becomes
```mlir
%2 = vector.shape_cast %source : vector<1x2xi8> to vector<2xi8>
```
It was decided that we should split canonicalization tests into new
files (see the
[discussion](https://github.com/llvm/llvm-project/pull/135096#pullrequestreview-2760245596)).
In view of this, I added the new tests to a new file specifically for
canonicalization of from_elements. To be consistent in the location of
the tests, I moved existing tests `extract_scalar_from_from_element`,
`extract_1d_from_from_elements`, `extract_2d_from_from_elements` and
`from_elements_to_splat` from `canonicalize.mlir` to
`canonicalize/vector-from-elements.mlir`. In addition to moving them, I changed
the LIT variables to all be upper-case for consistency.
With
ffb9bbfd07,
memref.assume_alignment op returns a result value. The revision updates
the tests to reflect the change:
- Update all the lit tests to use the result of memref.assume_alignment,
if it is present.
- Capture the result of the op in lit tests.
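A minimal sketch of what the updated usage looks like (alignment value, shapes, and the follow-up op are assumed):
```mlir
// The op now yields a memref result; subsequent ops should consume that result.
%aligned = memref.assume_alignment %src, 64 : memref<4x4xf32>
%val = memref.load %aligned[%c0, %c0] : memref<4x4xf32>
```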
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
`vector<t>` types are not compatible with the LLVM type system – with
the current approach employed within `LLVMTypeConverter`, they must be
explicitly converted into `vector<1xt>` when lowering. Employ this rule
within the conversion patterns for intrinsics that are handled directly
within `MathToLLVM`: `math.ctlz`, `.cttz`, `.absi`, `.expm1`, `.log1p`,
`.rsqrt`, `.isnan`, `.isfinite`.
This change does not cover/test patterns that are based on the
`VectorConvertToLLVMPattern` template from `LLVMCommon/VectorPattern.h`.
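For context, a sketch of a 0-D vector case the updated patterns are meant to handle (function name and types are hypothetical):
```mlir
// The vector<i32> operand is treated as vector<1xi32> when the LLVM intrinsic
// is emitted, and the result is cast back to vector<i32> afterwards.
func.func @absi_0d(%arg0: vector<i32>) -> vector<i32> {
  %0 = math.absi %arg0 : vector<i32>
  return %0 : vector<i32>
}
```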
---------
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
COND_IF's simplified form - where redundant operand notations are
omitted - is not conformant to the specification. According to the
specification, all operands passed into an operation must be explicitly
declared in each operation's structure. Add an optional check to verify that
the given form complies with the specification.
Previously, the dialects registered were fixed per LSP binary. This works
as long as all the dialects of interest from the different projects
across which one uses the LSP are disjoint. This change expands support to
cases where dialects overlap in name but are used in separate projects. The
alternative is multiple binaries and switching the LSP used in the editor per
project (which adds some extra complexity for hosted instances).
This handles a simple (and, I believe, common) case where the right set of
dialects can be determined from the file path while keeping a single binary;
the cost of doing this dynamically based on path would be either keeping
different registries to return or repopulating the dialect & extension maps.
Now that `Property` is a `PropConstraint`, hook it up to the same
constraint-uniquing machinery that other types of constraints use. This
will primarily save on code size for types, like enums, that have
inherent constraints which are shared across many operations.
This patch adds support for the -mprefer-vector-width= command line
option. The parsing of this option is equivalent to Clang's, and it is
implemented by setting the "prefer-vector-width" function attribute.
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Moves all the remaining Linalg vectorization tests from:
* `mlir/tests/Dialect/Linalg/*`
to:
* `mlir/tests/Dialect/Linalg/vectorization/*`
To maintain consistency within tests, `vectorize-convolution.mlir`
was updated to use:
* `transform.structured.vectorize_children_and_apply_patterns`
instead of:
* `-test-linalg-transform-patterns=test-linalg-to-vector-patterns`
This change required minor updates to some `CHECK` lines, reflecting
only reordering of ops due to an additional pattern being applied.
Closes #141025
We already have `hasOneUse`. Like `llvm::Value`, we provide helper methods to
query the number of uses of a Value. Also add unit tests for Value, because
those were missing.
---------
Co-authored-by: Michael Maitland <michaelmaitland@meta.com>
Fixes the final reduction steps which were taken from an implementation
of scan, not reduction, causing lanes earlier in the wave to have
incorrect results due to masking.
This now aligns more closely with the Triton implementation:
https://github.com/triton-lang/triton/pull/5019
# Hypothetical example
To provide an explanation of the issue with the current implementation,
let's take the simple example of attempting to perform a sum over 64
lanes where the initial values are as follows (first lane has value 1,
and all other lanes have value 0):
```
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
When performing a sum reduction over these 64 lanes, in the current
implementation we perform 6 dpp instructions which in sequential order
do the following:
1) sum over clusters of 2 contiguous lanes
2) sum over clusters of 4 contiguous lanes
3) sum over clusters of 8 contiguous lanes
4) sum over an entire row
5) broadcast the result of the last lane in each row to the next row, and
each lane sums its current value with the incoming value.
6) broadcast the result of the 32nd lane to the last two rows, and each lane
sums its current value with the incoming value.
After step 4) the result for the example above looks like this:
```
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
After step 5) the result looks like this:
```
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
After step 6) the result looks like this:
```
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
Note that the correct value here is always 1, yet after the
`dpp.broadcast` ops some lanes have incorrect values. The reason is that
for these incorrect lanes, like lanes 0-15 in step 5, the
`dpp.broadcast` op doesn't provide them incoming values from other
lanes. Instead these lanes are provided either their own values, or 0
(depending on whether `bound_ctrl` is true or false) as values to sum
over. Either way, these values are stale and these lanes shouldn't be
used in general.
So what this means:
- For a subgroup reduce over 32 lanes (like Step 5), the correct result
is stored in lanes 16 to 31
- For a subgroup reduce over 64 lanes (like Step 6), the correct result
is stored in lanes 32 to 63.
However, in the current implementation we do not specifically read the
value from one of the correct lanes when returning the final value. In
some workloads it seems that, without doing so, the stale value from
the first lane is returned instead.
# Actual failing test
For a specific example of how the current implementation causes issues,
take a look at the IR below which represents an additive reduction over
a dynamic dimension.
```
!matA = tensor<1x?xf16>
!matB = tensor<1xf16>
#map = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1) -> (d0)>
func.func @only_producer_fusion_multiple_result(%arg0: !matA) -> !matB {
%cst_1 = arith.constant 0.000000e+00 : f16
%c2_i64 = arith.constant 2 : i64
%0 = tensor.empty() : !matB
%2 = linalg.fill ins(%cst_1 : f16) outs(%0 : !matB) -> !matB
%4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "reduction"]} ins(%arg0 : !matA) outs(%2 : !matB) {
^bb0(%in: f16, %out: f16):
%7 = arith.addf %in, %out : f16
linalg.yield %7 : f16
} -> !matB
return %4 : !matB
}
```
When provided an input of type `tensor<1x2xf16>` and values `{0, 1}` to
perform the reduction over, the value returned is consistently 4. By the
same analysis done above, this shows that the returned value is coming
from one of these stale lanes and needs to be read instead from one of
the lanes storing the correct result.
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
- Introduced `gpu.subgroup_mma_extract` operation to extract values from
`!gpu.mma_matrix` by invocation and indices.
- Introduced `gpu.subgroup_mma_insert` operation to insert values into
`!gpu.mma_matrix` by invocation and indices.
- Updated the conversion patterns to SPIR-V for both extract and insert
operations.
- Added test cases to validate the new operations in the GPU to SPIR-V
conversion.
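A purely hypothetical sketch of how the two ops could compose; the assembly format, element types, and fragment kind below are assumed rather than taken from the actual op definitions:
```mlir
// Extract the element owned by this invocation at position %i, scale it, and
// insert it back into the accumulator matrix.
%c = gpu.subgroup_mma_extract %acc[%i] : !gpu.mma_matrix<16x16xf32, "COp"> -> f32
%scaled = arith.mulf %c, %alpha : f32
%acc2 = gpu.subgroup_mma_insert %scaled, %acc[%i]
    : f32, !gpu.mma_matrix<16x16xf32, "COp"> -> !gpu.mma_matrix<16x16xf32, "COp">
```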
RFC:
https://discourse.llvm.org/t/rfc-add-gpu-operations-to-permute-data-in-2-loaded-mma-matrix/86148?u=hsiangkai
This PR fixes a crash that currently happens given the following input:
```fortran
subroutine caller()
real :: x
integer :: i
!$omp target
x = i
call callee(x,x)
!$omp end target
endsubroutine caller
subroutine callee(x1,x2)
real :: x1, x2
endsubroutine callee
```
The crash happens because the following sequence of events takes place in
the `OMPIRBuilder`:
1. ....
2. An outlined function for the target region is created. At first the
outlined function still refers to the SSA values from the original
function of the target region.
3. The builder then iterates over the users of SSA values used in the
target region to replace them with the corresponding function arguments
of the outlined function.
4. If the same instruction references the SSA value more than once (say
m times), all uses of that SSA value in the instruction are replaced at once,
deleting all m uses of the value.
5. The next m-1 iterations will still iterate over the same instruction,
dropping the last m-1 actual users of the value.
Hence, we collect all users first before modifying them.
The PR continues the work started in #141019 by adding the `BufferizationState` class also to the `getBufferType` and `resolveConflicts` interface methods, together with the additional support functions that are used throughout the bufferization infrastructure.
This patch moves scalable vectorization tests into an existing generic
vectorization test file:
* vectorization-scalable.mlir --> merged into vectorization.mlir
Rationale:
* Most tests in vectorization-scalable.mlir are variants of existing
tests in vectorization.mlir. Keeping them together improves
maintainability.
* Consolidating tests makes it easier to spot gaps in coverage for
regular vectorization.
* In the Vector dialect, we don't separate tests for scalable vectors;
this change aligns Linalg with that convention.
Notable changes beyond moving tests:
* Updated one of the two matrix-vector multiplication tests to use
`linalg.matvec` instead of `linalg.generic`. CHECK lines remain
unchanged.
* Simplified the lone `linalg.index` test by removing an unnecessary
`tensor.extract`. Also removed canonicalization patterns from the
TD sequence for consistency with other tests.
This patch contributes to the implementation of #141025 — please refer
to that ticket for full context.
asin, acos, atan, and atan2 were being lowered to libm calls instead of
llvm intrinsics. Add the conversion patterns to handle these intrinsics
and update tests to expect this.
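A minimal sketch of the intended lowering for one of these ops (the exact emitted form may differ):
```mlir
// Previously lowered to a libm call; with the new pattern it becomes,
// roughly, the LLVM dialect intrinsic:
//   %0 = math.asin %arg0 : f32
%0 = llvm.intr.asin(%arg0) : (f32) -> f32
```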
This patch updates `CombineContractBroadcastMask` to inherit from
`MaskableOpRewritePattern`, enabling it to handle masked
`vector.contract` operations. The pattern rewrites:
```mlir
%a_bc = vector.broadcast %a
%res = vector.contract %a_bc, %b, ...
```
into:
```mlir
// Move the broadcast into vector.contract (by updating the indexing
// maps)
%res = vector.contract %a, %b, ...
```
The main challenge is supporting cases where the pattern drops a leading
unit dimension. For example:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
%arg0 : vector<8x4xi32>,
%arg1 : vector<8x4xi32>,
%arg2 : vector<8x8xi32>,
%mask: vector<1x8x8x4xi1>) -> vector<8x8xi32> {
%0 = vector.broadcast %arg0 : vector<8x4xi32> to vector<1x8x4xi32>
%1 = vector.broadcast %arg1 : vector<8x4xi32> to vector<1x8x4xi32>
%result = vector.mask %mask {
vector.contract {
indexing_maps = [#map0, #map1, #map2],
iterator_types = ["reduction", "parallel", "parallel", "reduction"],
kind = #vector.kind<add>
} %0, %1, %arg2 : vector<1x8x4xi32>, vector<1x8x4xi32> into vector<8x8xi32>
} : vector<1x8x8x4xi1> -> vector<8x8xi32>
return %result : vector<8x8xi32>
}
```
Here, the leading unit dimension is dropped. To handle this, the mask is
cast to the correct shape using a `vector.shape_cast`:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
%arg0: vector<8x4xi32>,
%arg1: vector<8x4xi32>,
%arg2: vector<8x8xi32>,
%arg3: vector<1x8x8x4xi1>) -> vector<8x8xi32> {
%mask_sc = vector.shape_cast %arg3 : vector<1x8x8x4xi1> to vector<8x8x4xi1>
%res = vector.mask %mask_sc {
vector.contract {
indexing_maps = [#map, #map1, #map2],
iterator_types = ["parallel", "parallel", "reduction"],
kind = #vector.kind<add>
    } %arg0, %arg1, %arg2 : vector<8x4xi32>, vector<8x4xi32> into vector<8x8xi32>
} : vector<8x8x4xi1> -> vector<8x8xi32>
return %res : vector<8x8xi32>
}
```
While this isn't ideal - since it introduces a `vector.shape_cast` that
must be cleaned up later - it reflects the best we can do once the input
reaches `CombineContractBroadcastMask`. A more robust solution may
involve simplifying the input earlier. I am leaving that as a TODO for
myself to explore further. Posting this now to unblock downstream
work.
LIMITATIONS
Currently, this pattern assumes:
* Only leading dimensions are dropped in the mask.
* All dropped dimensions must be unit-sized.
Integer attributes supplied to `emitc.call_opaque` as arguments were
treated as indices into the operand list. This should be the case only
for the normal arguments, not for the template arguments, which can't
refer to SSA values. This commit updates the handling of template
arguments in mlir-to-cpp by removing special handling of integer
attributes.
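A hedged sketch of the distinction (callee name and values are made up for illustration):
```mlir
// The index attribute in `args` refers to operand 0 of the call, while the
// integer in `template_args` is a literal template argument emitted verbatim.
emitc.call_opaque "my_kernel" (%arg0) {args = [0 : index], template_args = [42 : i32]} : (i32) -> ()
```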
Along with the changes to rescale op attributes, commit 7208649 dropped
the builtin casts between signed and signless types. However, explicitly
unsigned types are still legal input and output values from the TOSA IR
perspective.
The change adds back the casts when the unsigned<->signless semantics
are explicit in the underlying tensor types. This prevents the
conversion routine from trying to generate illegal `arith` casts that
are constrained to signless types. Whether the `arith` casts themselves
are signed or unsigned should still depend on the rescale's `*_unsigned`
attribute values.
---------
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
This patch adds an Op for the TMA prefetch
(non-tensor) variant. llvm-lit tests are added
to verify the lowering to the intrinsics.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Reverts #140730 - that turned out not to be an NFC as we originally
thought. See the attached test for an example. Many thanks to @Garra1980
for reporting!
Note, without this change, the newly added test would be incorrectly
converted to:
```mlir
func.func @view_memref_as_rank0(%arg0: index, %arg1: memref<2xi8>) {
%0 = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64)>
return
}
```
Fixes #108161
This PR adds a decorateType method for MatrixType, ensuring that
`spirv.matrix` with offset in `spirv.struct` can be handled correctly.
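For illustration, a sketch of the kind of type this change makes handleable (the member offset value is assumed):
```mlir
// A matrix member inside a struct carrying an explicit Offset decoration.
!s = !spirv.struct<(!spirv.matrix<4 x vector<4xf32>> [0])>
```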
Signed-off-by: MingZhu Yan <yanmingzhu@iscas.ac.cn>