Commit Graph

23076 Commits

Author SHA1 Message Date
Kazu Hirata
6e6ba60336 [mlir] Use llvm::max_element (NFC) (#143396) 2025-06-09 12:46:23 -07:00
Charitha Saumya
10dc8bc519 [mlir][vector] Fix for WarpOpScfForOp failure when scf.for has results that are unused. (#141853)
Currently, only the values defined outside ForOp but inside the original
WarpOp are considered "escaping values". However this is not true if the
ForOp has some unused results. In this case, corresponding IterArgs must
also be yielded by the original WarpOp. This PR adds the required code
changes to achieve this.
2025-06-09 11:56:34 -07:00
Jeremy Morse
0e4b8b8f81 [DebugInfo][RemoveDIs] Rip out the UseNewDbgInfoFormat flag (#143207)
Start removing debug intrinsics support -- starting with the flag that
controls production of their replacement, debug records. This patch
removes the command-line-flag and with it the ability to switch back to
intrinsics. The module / function / block level "IsNewDbgInfoFormat"
flags get hardcoded to true, I'll to incrementally remove things that
depend on those flags.
2025-06-09 19:36:34 +01:00
Umang Yadav
7f08503a3b Introduce arith.scaling_extf and arith.scaling_truncf (#141965)
This PR adds `arith.scaling_truncf` and `arith.scaling_extf` operations
which supports the block quantization following OCP MXFP specs listed
here
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf

OCP MXFP Spec comes with reference implementation here
https://github.com/microsoft/microxcaling/tree/main

Interesting piece of reference code is this method `_quantize_mx`
7bc41952de/mx/mx_ops.py (L173).

Both `arith.scaling_truncf` and `arith.scaling_extf` are designed to be
an elementwise operation. Please see description about them in
`ArithOps.td` file for more details.
 
Internally, 

`arith.scaling_truncf` does the
`arith.truncf(arith.divf(input/(2^scale)))`. `scale` should have
necessary broadcast, clamping, normalization and NaN propagation done
before callling into `arith.scaling_truncf`.

`arith.scaling_extf` does the `arith.mulf(2^scale, input)` after taking
care of necessary data type conversions.


CC: @krzysz00 @dhernandez0 @bjacob @pashu123 @MaheshRavishankar
@tgymnich

---------

Co-authored-by: Prashant Kumar <pk5561@gmail.com>
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
2025-06-09 13:13:31 -05:00
Darren Wihandi
a3c7d46145 [mlir][spirv] Implement UMod canonicalization for vector constants (#141902)
Closes #63174. 

Implements this transformation pattern, which is currently only applied
to scalars, for vectors:
```
%1 = "spirv.UMod"(%0, %CONST_32) : (i32, i32) -> i32
%2 = "spirv.UMod"(%1, %CONST_4) : (i32, i32) -> i32
```
to
```
%1 = "spirv.UMod"(%0, %CONST_32) : (i32, i32) -> i32
%2 = "spirv.UMod"(%0, %CONST_4) : (i32, i32) -> i32
```

Additionally fixes and issue where patterns like this:
```
%1 = "spirv.UMod"(%0, %CONST_4) : (i32, i32) -> i32
%2 = "spirv.UMod"(%1, %CONST_32) : (i32, i32) -> i32
```
were incorrectly canonicalized to:
```
%1 = "spirv.UMod"(%0, %CONST_4) : (i32, i32) -> i32
%2 = "spirv.UMod"(%0, %CONST_32) : (i32, i32) -> i32
```
which is incorrect since `(X % A) % B` == `(X % B)` IFF A is a multiple
of B, i.e., B divides A.
2025-06-09 11:09:36 -04:00
Igor Wodiany
cc2d5facec [mlir][spirv] Make CooperativeMatrixType a ShapedType (#142784)
This is to enable `CooperativeMatrixType` to be used with
`DenseElementsAttr`, so that a `spirv.Constant` can be easily built from
`OpConstantComposite`. For example:

```mlir
%cst = spirv.Constant dense<0.000000e+00> : !spirv.coopmatrix<1x1xf32, Subgroup, MatrixAcc>
```

Constraints of arithmetic operations are changed, as
`SameOperandsAndResultType` can no longer fully verify CoopMatrices.
This is because for shaped types the verifier only checks element type
and shapes, whereas for any other arbitrary type it looks for an exact
match.

This patch does not enable the actual deserialization. This will be
done in a subsequent PR.
2025-06-09 16:01:48 +01:00
Kazu Hirata
b3b8a097fe [mlir] Use *Map::try_emplace (NFC) (#143341)
- try_emplace(Key) is shorter than insert({Key, nullptr}).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
2025-06-09 07:18:26 -07:00
Jeremy Kun
b1b84a629d Pretty print on -dump-pass-pipeline (#143223)
This PR makes `dump-pass-pipeline` pretty-print the dumped pipeline. For
large pipelines the current behavior produces a wall of text that is
hard to visually navigate.

For the command

```bash
mlir-opt --pass-pipeline="builtin.module(flatten-memref, expand-strided-metadata,func.func(arith-expand,func.func(affine-scalrep)))" --dump-pass-pipeline
```

Before:

```bash
Pass Manager with 3 passes:
builtin.module(flatten-memref,expand-strided-metadata,func.func(arith-expand{include-bf16=false include-f8e8m0=false},func.func(affine-scalrep)))
```

After:

```bash
Pass Manager with 3 passes:
builtin.module(
  flatten-memref,
  expand-strided-metadata,
  func.func(
    arith-expand{include-bf16=false include-f8e8m0=false},
    func.func(
      affine-scalrep
    )
  )
)
```

Another nice feature of this is that the pretty-printed string can still
be copy/pasted into `-pass-pipeline` using a quote:

```bash
$ bin/mlir-opt --dump-pass-pipeline test.mlir --pass-pipeline='
builtin.module(
  flatten-memref,
  expand-strided-metadata,
  func.func(
    arith-expand{include-bf16=false include-f8e8m0=false},
    func.func(
      affine-scalrep
    )
  )
)'
```

---------

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
2025-06-08 12:23:38 -07:00
Andrzej Warzyński
5dfb7bbaa4 [mlir][linalg] Simplify createWriteOrMaskedWrite (NFC) (#141567)
This patch removes `inputVecSizesForLeadingDims` from the parameter list
of `createWriteOrMaskedWrite`. That argument is unnecessary - vector
sizes can be obtained from the `vecToStore` parameter. Since this doesn't
change behavior or test results, it's marked as NFC.

Additional cleanups:
  * Renamed `vectorToStore` to `vecToStore` for consistency and brevity.
  * Rewrote a conditional at the end of the function to use early exit,
    improving readability:

```cpp
  // BEFORE:
  if (maskingRequried) {
    Value maskForWrite = ...;
    write = maskOperation(write, maskForWrite);
  }
  return write;

  // AFTER
  if (!maskingRequried)
    return write;

  Value maskFroWrite = ...;
  return vector::maskOperation(builder, write, maskForWrite);
```
2025-06-08 12:36:51 +01:00
Kazu Hirata
1cf1c21b84 [mlir] Strip away lambdas (NFC) (#143280)
We don't need lambdas here.
2025-06-08 01:34:17 -07:00
Andrzej Warzyński
b4b86a7a3c [mlir][linalg] Refactor vectorization hooks to improve code reuse (#141244)
This patch refactors two vectorization hooks in Vectorization.cpp:
 * `createWriteOrMaskedWrite` gains a new parameter for write indices,
   aligning it with its counterpart `createReadOrMaskedRead`.
 * `vectorizeAsInsertSliceOp` is updated to reuse both of the above
   hooks, rather than re-implementing similar logic.

CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing
`tensor.insert_slice`. Recent updates added masking support:
  * https://github.com/llvm/llvm-project/pull/122927
  * https://github.com/llvm/llvm-project/pull/123031

At the time, reuse of the shared `create*` hooks wasn't feasible due to
missing parameters and overly rigid assumptions. This patch resolves
that and moves us closer to a more maintainable structure.

CHANGES IN `createWriteOrMaskedWrite`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the
  vector to store, via named variables like `destType`/`vecToStoreType`,
  `destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like
  `in_bounds`. For example, the size of the `in_bounds` attr now matches
  the source vector rank, not the tensor rank.
* Drops the assumption that `vecToStoreRank == destRank` - this doesn't
  hold in many real examples.
*  Deduces mask dimensions from `vecToStoreShape` (vector) instead of
   `destShape` (tensor). (Eventually we should not require
`inputVecSizesForLeadingDims` at all - mask shape should be inferred.)

NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids
inserting redundant masks and reduces the burden on canonicalization to
clean them up later.

Example where masking is provably unnecessary:
```mlir
%2 = vector.mask %1 {
  vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```

Also, without this hook, tests are more complicated and require more
matching.

VECTORIZATION BEHAVIOUR
-----------------------

This patch preserves the current behaviour around masking and the use
of`in_bounds` attribute. Specifically:
* `useInBoundsInsteadOfMasking` is set when no input vector sizes are
  available.
* The vectorizer continues to infer vector sizes where needed.

Note: the computation of the `in_bounds` attribute is not always
correct. That
issue is tracked here:
* https://github.com/llvm/llvm-project/issues/142107

This will be addressed separately.

TEST CHANGES
-----------
Only affects vectorization of:

* `tensor.insert_slice` (now refactored to use shared hooks)

Test diffs involve additional `arith.constant` Ops due to increased
reuse of
shared helpers (which generate their own constants). This will be
cleaned up
via constant caching (see #138265).

NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review
`createWriteOrMaskedWrite` as a new method rather than diffing
line-by-line.

TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and
`createReadOrMaskedRead`:
  * Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in
    VectorUtils.cpp)
  * Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
  * Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the
     updated test in transform-vector.mlir for an example that would
     benefit from this.
  * Address #142107

(*) This method will eventually be moved out of Vectorization.cpp, which
isn't the right long-term home for it.
2025-06-07 19:25:30 +01:00
Darren Wihandi
c9c60172a1 [mlir][spirv] Implement lowering gpu.subgroup_reduce with cluster size for SPIRV (#141402)
Implement lowering of `gpu.subgroup_reduce` with a cluster size
attribute to SPIRV by using the `ClusteredReduce` group operation.
2025-06-06 12:50:18 -04:00
Kazu Hirata
1eb843b1a0 [mlir] Ensure newline at the end of files (NFC) (#143155) 2025-06-06 09:16:52 -07:00
Rolf Morel
4eeee41f52 [MLIR][Transform] Allow ApplyRegisteredPassOp to take options as a param (#142683)
Makes it possible to pass around the options to a pass inside a schedule.

The refactoring also makes it so that the pass manager and pass are only
constructed once per `apply()` of the transform op versus for each target
payload given to the op's `apply()`.
2025-06-06 11:19:39 +01:00
Momchil Velikov
b9d3a644c2 [MLIR] Add apply_patterns.arm_sve.vector_contract_to_i8mm TD Op (#140572) 2025-06-06 10:54:14 +01:00
Tom Eccles
b03081e9fb [mlir][OpenMP] set correct insert point after creating a barrier (#142997)
Fixes #138436
2025-06-06 10:43:13 +01:00
Momchil Velikov
44a047c929 [MLIR][ArmSVE] Add initial lowering of vector.contract to SVE *MMLA instructions (#135636) 2025-06-06 09:54:23 +01:00
Michele Scuttari
f849866fc5 [MLIR] Reduce complexity of searching circular function calls in bufferization (#142099)
The current algorithm searching for circular function calls scales quadratically due to the linear scan of the functions vector that is performed for each element of the vector itself. The PR replaces such algorithm with an O(V + E) version based on the Khan's algorithm for topological sorting, where V is the number of functions and E is the number of function calls.
2025-06-06 10:35:58 +02:00
Michele Scuttari
aaec9e5f5b [MLIR] Keep cached symbol tables across buffer deallocation insertions (#141956)
The `DeallocationState` class has been modified to keep a reference to an externally owned `SymbolTableCollection` class, to preserve the cached symbol tables across multiple insertions of deallocation instructions.
2025-06-06 07:22:35 +02:00
Karlo Basioli
cd585864c0 Pass memory buffer to RuntimeDyld::MemoryManager factory (#142930)
`RTDyldObjectLinkingLayer` is currently creating a memory manager
without any parameters.

In this PR I am passing the MemoryBuffer that will be emitted to the
MemoryManager so that the user can use it to configure the behaviour of
the MemoryManager.
2025-06-06 00:44:39 +01:00
Kazu Hirata
85480a4d37 [mlir] Directly call ShapedType::isDynamic without lambdas (NFC) (#142994)
We do not need lambdas in these places.
2025-06-05 16:14:27 -07:00
asraa
c66b72f8ce [mlir][tensor] remove tensor.insert constant folding out of canonicalization (#142671)
Follow ups from https://github.com/llvm/llvm-project/pull/142458/
In particular concerns that indiscriminately folding tensor constants
can lead to bloating the IR as these can be arbitrarily large.

Signed-off-by: Asra Ali <asraa@google.com>
2025-06-05 14:53:33 -07:00
Chao Chen
def37f7e3a [mlir][vector] add unroll pattern for broadcast (#142011)
This PR adds `UnrollBroadcastPattern` to `VectorUnroll` transform. 
To support this, it also extends `BroadcastOp` definition with
`VectorUnrollOpInterface`
2025-06-05 12:42:16 -05:00
Krzysztof Parzyszek
4dcc159485 [utils][TableGen] Implement clause aliases as alternative spellings (#141765)
Use the spellings in the generated clause parser. The functions
`get<lang>ClauseKind` and `get<lang>ClauseName` are not yet updated.

The definitions of both clauses and directives now take a list of
"Spelling"s instead of a single string. For example
```
def ACCC_Copyin : Clause<[Spelling<"copyin">,
                          Spelling<"present_or_copyin">,
                          Spelling<"pcopyin">]> { ... }
```

A "Spelling" is a versioned string, defaulting to "all versions".

For background information see

https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507
2025-06-05 12:35:30 -05:00
James Newling
7ce315d14a [mlir][vector] Improve shape_cast lowering (#140800)
Before this PR, a rank-m -> rank-n vector.shape_cast with m,n>1 was
lowered to extracts/inserts of single elements, so that a shape_cast on
a vector with N elements would always require N extracts/inserts. While
this is necessary in the worst case scenario it is sometimes possible to
use fewer, larger extracts/inserts. Specifically, the largest common
suffix on the shapes of the source and result can be extracted/inserted.
For example:

```mlir
%0 = vector.shape_cast %arg0 : vector<10x2x3xf32> to vector<2x5x2x3xf32>
```

has common suffix of shape `2x3`. Before this PR, this would be lowered
to 60 extract/insert pairs with extracts of the form
`vector.extract %arg0 [a, b, c] : f32 from vector<10x2x3xf32>`. With
this PR it is 10 extract/insert pairs with extracts of the form
`vector.extract %arg0 [a] : vector<2x3xf32> from vector<10x2x3xf32>`.
2025-06-05 10:18:38 -07:00
Srinivasa Ravi
1bc3845c44 [MLIR][NVVM] Add prefetch Ops (#141737)
This change adds `prefetch` and `prefetch.uniform` Ops to the NVVM
dialect for the `prefetch` and `prefetchu` group of instructions.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prefetch-prefetchu
2025-06-05 20:08:00 +05:30
Tai Ly
c14078318c [tosa] Add verifier checks for Scatter (#142661)
This adds verifier checks for the scatter op
to make sure the shapes of inputs and output
are consistent with respect to spec.

Signed-off-by: Tai Ly <tai.ly@arm.com>
2025-06-05 15:23:39 +01:00
Luke Hutton
9d5e1449f7 [mlir][tosa] Fix MulOp verifier handling for unranked operands (#141980)
The previous verifier checks did not correctly handle unranked operands.
For example, it could incorrectly assume the number of
`rankedOperandTypes` would be >= 2, which isn't the case when both a and
b are unranked.

This change simplifies these checks such that they only operate over the
intended a and b operands as opposed to the shift operand as well.
2025-06-05 08:54:01 +01:00
Adam Straw
3172c61895 [mlir][gpu] Fix bug with gpu.printf global location (#142872)
Bug description: Global variables and functions created during
gpu.printf conversion to NVVM may contain debug info metadata from
function containing the gpu.printf which cannot be used out of that
function.
2025-06-05 00:21:44 -06:00
Matthias Springer
e4c8ff94e7 [mlir][tensor] Add runtime verification for cast/dim/extract/insert/extract_slice (#141332)
Add `RuntimeVerifiableOpInterface` implementations for the following
ops. These were mostly copied from the respective memref
implementations. Only the part that deals with offsets and strides was
removed.
* `tensor.cast`: `memref.cast`
* `tensor.dim`: `memref.dim`
* `tensor.extract`: `memref.load`
* `tensor.insert`: `memref.store`
* `tensor.extract_slice`: `memref.subview`
2025-06-05 12:06:47 +09:00
hanhanW
d96447b4d3 Reapply "Reland "[mlir][Affine] Handle null parent op in getAffineParallelInductionVarOwner" (#142785)"
This reverts commit 178b64e75b.

The author misread the report of the failure, and thought that it broke
the CI again. Reland the fix.
2025-06-04 09:05:15 -07:00
hanhanW
178b64e75b Revert "Reland "[mlir][Affine] Handle null parent op in getAffineParallelInductionVarOwner" (#142785)"
This reverts commit 07a534160a.
2025-06-04 08:59:54 -07:00
Han-Chung Wang
07a534160a Reland "[mlir][Affine] Handle null parent op in getAffineParallelInductionVarOwner" (#142785)
Below is the original commit description. Furthermore, it applies a
[fix](33a26b9ca2)
for CMakeList.txt

The issue occurs during a downstream pass which does dialect conversion,
where both
[`FuncOpConversion`](cde67b6663/mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp (L480))
and
[`SubviewFolder`](cde67b6663/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp (L187))
are run together. The original starting IR is:
```mlir
module {
  func.func @foo(%arg0: memref<100x100xf32>, %arg1: index, %arg2: index, %arg3: index, %arg4: index) -> memref<?x?xf32, strided<[100, 1], offset: ?>> {
    %subview = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1] : memref<100x100xf32> to memref<?x?xf32, strided<[100, 1], offset: ?>>
    return %subview : memref<?x?xf32, strided<[100, 1], offset: ?>>
  }
}
```


After `FuncOpConversion` runs, the IR looks like:
```mlir
"builtin.module"() ({
  "llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> (ptr, ptr, i64, i64, i64, i64, i64, i64, i64, i64, i64)>, linkage = #llvm.linkage<external>, sym_name = "foo", visibility_ = 0 : i64}> ({
  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64, %arg7: i64, %arg8: i64, %arg9: i64, %arg10: i64):
    %0 = "memref.subview"(<<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>) <{operandSegmentSizes = array<i32: 1, 2, 2, 0>, static_offsets = array<i64: -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: -9223372036854775808, -9223372036854775808>, static_strides = array<i64: 1, 1>}> : (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>
    "func.return"(%0) : (memref<?x?xf32, strided<[100, 1], offset: ?>>) -> ()
  }) : () -> ()
  "func.func"() <{function_type = (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>, sym_name = "foo"}> ({
  }) : () -> ()
}) {llvm.data_layout = "", llvm.target_triple = ""} : () -> ()
```
The `<<UNKNOWN SSA VALUE>>`'s here are block arguments of a separate
unlinked block, which is disconnected from the rest of the IR (so not
only is the IR verifier-invalid, it can't even be parsed). This IR is
created by signature conversion in the dialect conversion infra.

Now `SubviewFolder` is applied, and the utility function here is called
on one of these disconnected block arguments, causing a crash.

The TestMemRefToLLVMWithTransforms pass is introduced to exercise the
bug, and it can be reused by other contributors in the future.

Co-authored-by: Rahul Kayaith <rkayaith@gmail.com>

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-06-04 08:32:09 -07:00
Krzysztof Parzyszek
15dff71cac Revert "[mlir][Affine] Handle null parent op in getAffineParallelInductionVarOwner (#142025)"
This reverts commit c3746ff322.

This breaks build with BUILD_SHARED_LIBS=ON.

```
/usr/bin/ld: CMakeFiles/MLIRTestMemRefToLLVMWithTransforms.dir/TestMemRefToLLVMWithTransforms.cpp.o: in function `(anonymous namespace)::TestMemRefToLLVMWithTransforms::runOnOperation()':
TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x68): undefined reference to `mlir::LowerToLLVMOptions::LowerToLLVMOptions(mlir::MLIRContext*)'
/usr/bin/ld: TestMemRefToLLVMWithTransforms.cpp:[ 88%] Built target CodeGenTests
(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEvmake[2]: Leaving directory '/work2/kparzysz/git/llvm.org/b/x86'
+0x80): undefined reference to `mlir::LLVMTypeConverter::LLVMTypeConverter(mlir::MLIRContext*, mlir::LowerToLLVMOptions const&, mlir::DataLayoutAnalysis const*)'
/usr/bin/ld: TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x143): undefined reference to `mlir::populateFuncToLLVMConversionPatterns(mlir::LLVMTypeConverter const&, mlir::RewritePatternSet&, mlir::SymbolTable const*)'
/usr/bin/ld: TestMemRefToLLVMWithTransforms.cpp:(.text._ZN12_GLOBAL__N_130TestMemRefToLLVMWithTransforms14runOnOperationEv+0x174): undefined reference to `mlir::LLVMConversionTarget::LLVMConversionTarget(mlir::MLIRContext&)'
```
2025-06-04 09:19:59 -05:00
Han-Chung Wang
c3746ff322 [mlir][Affine] Handle null parent op in getAffineParallelInductionVarOwner (#142025)
The issue occurs during a downstream pass which does dialect conversion,
where both
[`FuncOpConversion`](cde67b6663/mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp (L480))
and
[`SubviewFolder`](cde67b6663/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp (L187))
are run together. The original starting IR is:
```mlir
module {
  func.func @foo(%arg0: memref<100x100xf32>, %arg1: index, %arg2: index, %arg3: index, %arg4: index) -> memref<?x?xf32, strided<[100, 1], offset: ?>> {
    %subview = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1] : memref<100x100xf32> to memref<?x?xf32, strided<[100, 1], offset: ?>>
    return %subview : memref<?x?xf32, strided<[100, 1], offset: ?>>
  }
}
```


After `FuncOpConversion` runs, the IR looks like:
```mlir
"builtin.module"() ({
  "llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> (ptr, ptr, i64, i64, i64, i64, i64, i64, i64, i64, i64)>, linkage = #llvm.linkage<external>, sym_name = "foo", visibility_ = 0 : i64}> ({
  ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64, %arg7: i64, %arg8: i64, %arg9: i64, %arg10: i64):
    %0 = "memref.subview"(<<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>) <{operandSegmentSizes = array<i32: 1, 2, 2, 0>, static_offsets = array<i64: -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: -9223372036854775808, -9223372036854775808>, static_strides = array<i64: 1, 1>}> : (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>
    "func.return"(%0) : (memref<?x?xf32, strided<[100, 1], offset: ?>>) -> ()
  }) : () -> ()
  "func.func"() <{function_type = (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>, sym_name = "foo"}> ({
  }) : () -> ()
}) {llvm.data_layout = "", llvm.target_triple = ""} : () -> ()
```
The `<<UNKNOWN SSA VALUE>>`'s here are block arguments of a separate
unlinked block, which is disconnected from the rest of the IR (so not
only is the IR verifier-invalid, it can't even be parsed). This IR is
created by signature conversion in the dialect conversion infra.

Now `SubviewFolder` is applied, and the utility function here is called
on one of these disconnected block arguments, causing a crash.

The TestMemRefToLLVMWithTransforms pass is introduced to exercise the
bug, and it can be reused by other contributors in the future.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>

Co-authored-by: Rahul Kayaith <rkayaith@gmail.com>
2025-06-04 06:51:39 -07:00
Krzysztof Parzyszek
57500cd6a0 [utils][TableGen] Clarify usage of ClauseVal, rename to EnumVal (#141761)
The class "ClauseVal" actually represents a definition of an enumeration
value, and in itself it is not bound to any clause. Rename it to EnumVal
and add a comment clarifying how it's translated into an actual enum
definition in the generated source code.

There is no change in functionality.
2025-06-04 08:16:21 -05:00
Igor Wodiany
3ce3281989 [mlir][spirv] Check output of getConstantInt (#140568)
This patch adds an assert to check if the result of `getConstantInt` is
non-null. Previously the code failed with Segmentation Fault if
`getConstantInt` failed to look up the value. This primarily occurrs when
the value is defined as OpSpecConstant rather than OpConstant.
2025-06-04 13:15:28 +01:00
Vadim Curcă
5a531b1158 [mlir] NFC: Add data flow analysis extension points (#142549)
This commit introduces `visitCallOperation` and `visitCallableOperation`
extension points in the sparse data flow analysis framework. This
allows, for example, to make the analysis less conservative, without a
lot of code duplication, propagating information even if not all the
call or return sites are known.
2025-06-04 14:15:05 +02:00
Srinivasa Ravi
4e4273c940 [MLIR][NVVM] Add dot.accumulate.2way Op (#140518)
This change adds the `dot.accumulate.2way` Op to the NVVM dialect for
16-bit to 8-bit dot-product accumulate operation.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-dp2a
2025-06-04 13:29:46 +05:30
Aviad Cohen
4c6449044a [mlir]: Added properties/attributes ignore flags to OperationEquivalence (#142623)
Those flags are useful for cases and operation which we may consider equivalent even when their attributes/properties are not the same.
2025-06-04 10:01:20 +03:00
Ian Wood
f5a2f00da9 Revert "[mlir][tensor] Loosen restrictions on folding dynamic reshapes" (#142639)
Reverts llvm/llvm-project#137963

---------

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-06-03 14:10:41 -07:00
Kazu Hirata
95ce58bc4a [mlir] Fix a warning
This patch fixes:

  mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:1680:37: error: comparison
  of integers of different signs: 'int' and 'uint64_t' (aka 'unsigned
  long') [-Werror,-Wsign-compare]
2025-06-03 10:11:51 -07:00
Tai Ly
04b63ac1ab [tosa] Change VariableOp to align with spec (#142240)
This fixes Tosa VariableOp to align with spec 1.0
  - add var_shape attribute to store shape of variable type
  - change type attribute to store element type of variable type
  - add a builder so previous construction calls still work
- fix up level check of rank to be on variable type instead of initial
value which is optional
  - add level check of size for variable type
  - add lit tests for variable op's without initial values
  - add lit test for variable op with fixed rank but unknown dimension
  - add invalid lit test for variable op with unranked type

Signed-off-by: Tai Ly <tai.ly@arm.com>
2025-06-03 17:41:33 +01:00
asraa
34d8275e4f [mlir][tensor] add tensor insert/extract op folders (#142458)
Adds a few canonicalizers, folders, and rewrite patterns to tensor ops:

* tensor.insert folder: insert into a constant is replaced with a new
constant
* tensor.extract folder: extract from a parent tensor that was inserted
at the same indices is folded into the inserted value
* rewrite pattern added that replaces an extract of a collapse shape
with an extract of the source tensor (requires static source dimensions)

Signed-off-by: Asra Ali <asraa@google.com>
2025-06-03 09:16:03 -07:00
Artem Gindinson
cb4a407e5c [mlir][tensor] Loosen restrictions on folding dynamic reshapes (#137963)
The main idea behind the change is to allow expand-of-collapse folds for
reshapes like `?x?xk` -> `?` (k>1). The rationale here is that the
expand op must have a coherent index/affine expression specified in its
`output_shape` argument (see example below), and if it doesn't, the IR
has already been invalidated at an earlier stage:
```
%c32 = arith.constant 32 : index
%div = arith.divsi %<some_index>, %c32 : index
%collapsed = tensor.collapse_shape %41#1 [[0], [1, 2], [3, 4]]
	         : tensor<9x?x32x?x32xf32> into tensor<9x?x?xf32>
%affine = affine.apply affine_map<()[s0] -> (s0 * 32)> ()[%div]
%expanded = tensor.expand_shape %collapsed [[0], [1, 2], [3]] output_shape [9, %div, 32, %affine]
		: tensor<9x?x?xf32> into tensor<9x?x32x?xf32>
```

On the above assumption, adjust the routine in
`getReassociationIndicesForCollapse()` to allow dynamic reshapes beyond
just `?x..?x1x1x..x1` -> `?`. Dynamic subshapes introduce two kinds of
issues:
1. n>2 consecutive dynamic dimensions in the source shape cannot be
collapsed together into 1<k<n neighboring dynamic dimensions in the
target shape, since there'd be more than one suitable reassociation
(example: `?x?x10x? into ?x?`)
2. When figuring out static subshape reassociations based on products,
there are cases where a static dimension is collapsed with a dynamic
one, and should therefore be skipped when comparing products of source &
target dimensions (e.g. `?x2x3x4 into ?x12`)

To address 1, we should detect such sequences in the target shape before
assigning multiple dynamic dimensions into the same index set. For 2, we
take note that a static target dimension was preceded by a dynamic one
and allow an "offset" subshape of source static dimensions, as long as
there's an exact sequence for the target size later in the source shape.

This PR aims to address all reshapes that can be determined based purely
on shapes (and original reassociation
maps, as done in
`ComposeExpandOfCollapseOp::findCollapsingReassociation)`. It doesn't
seem possible to fold all qualifying dynamic shape patterns in a
deterministic way without looking into affine expressions
simultaneously. That would be difficult to maintain in a single general
utility, so a path forward would be to provide dialect-specific
implementations for Linalg/Tensor.

Signed-off-by: Artem Gindinson <gindinson@roofline.ai>

---------

Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
Co-authored-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-06-03 09:09:01 -07:00
Igor Wodiany
7797824297 [mlir][spirv] Allow disabling control flow structurization (#140561)
Currently some control flow patterns cannot be structurized into
existing SPIR-V MLIR constructs, e.g., conditional early exits (break).
Since the support for early exit cannot be currently added
(https://github.com/llvm/llvm-project/pull/138688#pullrequestreview-2830791677)
this patch enables structurizer to be disabled to keep
the control flow unstructurized. By default, the control flow is
structurized.
2025-06-03 15:41:39 +01:00
Md Abdullah Shahneous Bari
dc297cbc9a [mlir][memref][spirv] Add conversion for memref.extract_aligned_pointer_as_index to SPIR-V (#86750)
Converts memref.extract_aligned_pointer_as_index to spirv.ConvertPtrToU.
Index conversion is done based on 'use-64bit-index' option.
2025-06-03 09:39:14 -05:00
Momchil Velikov
878badc44d [MLIR][AArch64] Add an extra test for Neon I8MM (NFC) (#135777) 2025-06-03 12:12:57 +01:00
Michele Scuttari
9289604cf6 [MLIR] Use cached symbol tables in getFuncOpsOrderedByCalls (#141967)
Address TODO regarding the recomputation of symbol tables. The signature of the `getFuncOpsOrderedByCalls` function is modified to receive the collection of cached symbol tables.
2025-06-03 11:29:02 +02:00
Momchil Velikov
be9334a68e [MLIR] Add apply_patterns.arm_neon.vector_contract_to_i8mm TD Op (#140251)
This patch wraps `populateLowerContractionToSMMLAPatternPatterns` into a
new TD Op `apply_patterns.arm_neon.vector_contract_to_i8mm` .

It also removes the "test-lower-to-arm-neon" pass.
2025-06-03 10:21:13 +01:00