Commit Graph

23341 Commits

Author SHA1 Message Date
Kazu Hirata
887222e352 [mlir] Migrate away from ArrayRef(std::nullopt) (NFC) (#144989)
ArrayRef has a constructor that accepts std::nullopt.  This
constructor dates back to the days when we still had llvm::Optional.

Since the use of std::nullopt outside the context of std::optional is
something of an abuse and not intuitive to newcomers, I would like to move
away from this constructor and eventually remove it.

This patch takes care of the mlir side of the migration, starting with
straightforward places where I see ArrayRef or ValueRange nearby.
Note that ValueRange has a constructor that forwards arguments to an
ArrayRef constructor.
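A minimal before/after sketch of the call-site change (the `consume` callee is hypothetical; only the `llvm::ArrayRef` constructors shown are real):

```
#include "llvm/ADT/ArrayRef.h"
#include <optional>

// Hypothetical callee, used only to show the call-site change.
void consume(llvm::ArrayRef<int> values);

void caller() {
  consume(std::nullopt);          // before: relies on ArrayRef(std::nullopt_t)
  consume({});                    // after: an empty ArrayRef
  consume(llvm::ArrayRef<int>()); // equivalent explicit spelling
}
```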
2025-06-20 08:33:59 -07:00
Jeremy Kun
b533b0ec34 Define a DataFlowSolver helper that loads sensible default analyses (#143415)
Cf. https://discourse.llvm.org/t/mlir-dead-code-analysis/67568/10

Custom analysis passes will not work properly unless both
DeadCodeAnalysis and SparseConstantPropagation are loaded into the
DataFlowSolver. This is intended behavior, but surprising to many users,
as shown in the thread. In lieu of a longer-term fix (which I am not yet
knowledgeable enough to implement myself), this commit adds a helper
function that loads these two analyses and leaves breadcrumbs pointing
to an explanation of the problem. The existing places in the codebase
where these two analyses are loaded in order to run other, unrelated
analyses are updated to use the helper.
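A sketch of the setup the helper encapsulates (the helper's exact name is not shown here; this is the manual pattern custom analyses needed before this change):

```
#include "mlir/Analysis/DataFlow/ConstantPropagationAnalysis.h"
#include "mlir/Analysis/DataFlow/DeadCodeAnalysis.h"
#include "mlir/Analysis/DataFlowFramework.h"

using namespace mlir;

LogicalResult runMyAnalysis(Operation *op) {
  DataFlowSolver solver;
  // Without these two "default" analyses, liveness is never marked and most
  // sparse analyses silently produce no results.
  solver.load<dataflow::DeadCodeAnalysis>();
  solver.load<dataflow::SparseConstantPropagation>();
  // solver.load<MyCustomAnalysis>();  // the analysis you actually care about
  return solver.initializeAndRun(op);
}
```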

---------

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
Co-authored-by: Oleksandr "Alex" Zinenko <azinenko@amd.com>
2025-06-20 08:16:52 -07:00
Chao Chen
9dc59cc95b [MLIR] Incorrect track of usedKey in setPropertiesFromParsedAttr (#144789)
co-authored by @chencha3 and @joker-eph
2025-06-20 10:02:09 -05:00
Nicolas Vasilache
269cb22ae8 [mlir][transform] extract a minimal DomainAndOperandsAffineMapTransferInterface out of LinalgStructuredInterface and use that for PadTilingInterface (#145034)

Along the way, a bug was found in the handling of scalar values; fix it
and add a test.
2025-06-20 15:45:21 +02:00
Andrzej Warzyński
8db272ffcf [mlir][SparseTensor] Re-enable tests on AArch64 (#143387)
These tests were disabled in https://reviews.llvm.org/D136273, due to:
* https://github.com/llvm/llvm-project/issues/58465

That issue has now been resolved, so we should be able to re-enable
these tests.
2025-06-20 14:25:36 +01:00
Krzysztof Parzyszek
349f8d67d4 [flang][OpenMP] Skip runtime mapping with no offload targets (#144534)
When no offload targets are specified, flang will ignore "target"
constructs, but not "target data" constructs. This patch makes the
behavior consistent across all offload-related operations.

While ignoring "target" may produce semantically incorrect code, it may
still be a useful debugging tool.
2025-06-20 08:09:36 -05:00
Artemiy Bulavin
6edf2eb364 [MLIR] Print more user-friendly error message when generating local reproducer and threading is enabled (#144905) 2025-06-20 15:45:17 +03:00
Matthias Springer
6c0ac888c5 [mlir][arith][NFC] Remove redundant trait declaration (#145007)
`Arith_Op` already declares the `ElementwiseMappable` traits, so they
don't have to be declared for `arith.select`.
2025-06-20 14:32:56 +02:00
Mehdi Amini
a5b1093f78 [MLIR] Add ReturnLike trait to memref.atomic_yield (#144932)
Without this, the yield is not considered the region terminator and
the dataflow framework does not consider it live.
2025-06-20 13:55:03 +02:00
Nicolas Vasilache
227cd56e13 [NFC] Update transform-op-pad-tiling-interface.mlir
Missing NL
2025-06-20 12:33:56 +02:00
Nicolas Vasilache
7af545237f [NFC] Update transform-op-pad-tiling-interface-multiple-of.mlir
Missing NL
2025-06-20 12:33:14 +02:00
Nicolas Vasilache
00c18d04ab [mlir][Transforms] Add a PadTilingInterface transformation and hook it up to the transform dialect (#144991)

This revision revisits the padding transformation from first principles
and prepares it to work more generally with TilingInterface.

Compared to transform.structured.pad, it has the following differences:
- no support for nofold, copy-back, transpose and hoisting: these have
been carried by the padding op in the very early days of StructuredOps
and have since then been separated out as independent transformations
that compose.
- no conflated static bounding box derivation attempts:
pad_tiling_interface composes more naturally with or without tiling.
- properly derives padding size on outputs where multiple dimensions
contribute: this is not supported in transform.structured.pad
- geared towards supporting TilingInterface once the proper control
mechanisms are supported through a PadSizeComputationFunction (supports
LinalgOp by default)

This will gradually replace transform.structured.pad as it is fleshed
out and tested more comprehensively.

In the future this should be moved out of a specific Linalg
implementation file and into a more general "Structured" file.
2025-06-20 12:31:46 +02:00
Matthias Springer
95bd05d7ca [mlir][Func][NFC] Simplify implementation after #144706 (#145006) 2025-06-20 12:04:04 +02:00
Matthias Springer
f577516d91 [mlir][arith] Add back ElementwiseMappable to arith.trunci (#145000)
This trait was accidentally dropped in #144863.
2025-06-20 11:23:33 +02:00
Umang Yadav
836201f117 Allow bf16 operands on new MFMAs (#144925)
New gfx950 MFMA allows bf16 operands. 


c0cc81cdc0/llvm/include/llvm/IR/IntrinsicsAMDGPU.td (L3434)

When running `amdgpu-to-rocdl`, the current logic always converts bf16
to i16, which fails to compile for the newer bf16 MFMAs, e.g.
`v_mfma_f32_16x16x32bf16`.
The backend expects bf16 operands for those newer MFMAs. This
patch fixes that.

CC: @krzysz00  @dhernandez0  @giuseros  @antiagainst  @kuhar
2025-06-19 12:52:31 -05:00
Slava Zakharin
6ce86538c1 [mlir][cf] Preserve branch weights during cf.cond_br canonicalization. (#144822) 2025-06-19 10:09:10 -07:00
Andrzej Warzyński
3fe62682ef [mlir][vector] Use result consistently as the result argument name (#144739)
This patch updates the following ops to use `result` (instead of `res`)
as the name for their result argument:
  * `vector.scalable.insert`
  * `vector.scalable.extract`
  * `vector.insert_strided_slice`

This change ensures naming consistency with other ops in the `vector`
dialect. It addresses part of:
* https://github.com/llvm/llvm-project/issues/131602
2025-06-19 17:34:08 +01:00
Tobias Gysi
eb694b2846 [mlir][arith] Delete mul ext canonicalizations (#144844)
The Arith dialect includes patterns that canonicalize a sequence of:

- trunci(shrui(mul(sext(x), sext(y)), c)) -> mulsi_extended(x, y)
- trunci(shrui(mul(zext(x), zext(y)), c)) -> mului_extended(x, y)

These patterns return the high word of an extended multiplication, which
assumes that the shift amount is equal to the bit width of the original
operands. This check was missing, leading to incorrect canonicalizations
when the shift amount was less than the bit width.

For example, the following code:
```
  %x = arith.extui %a: i32 to i33
  %y = arith.extui %b: i32 to i33
  %m = arith.muli %x, %y: i33
  %c1 = arith.constant 1: i33
  %sh = arith.shrui %m, %c1 : i33
  %hi = arith.trunci %sh: i33 to i32
```
would incorrectly be canonicalized to:
```
_, %hi = arith.mului_extended %a, %b : i32
```
This commit removes the faulty canonicalizations since they are not
believed to be generally beneficial (cf. the discussion of the
alternative https://github.com/llvm/llvm-project/pull/144787, which fixes
the canonicalizations instead).
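A standalone arithmetic illustration (plain C++, not the removed pattern code) of why shifting by less than the operand width does not yield the high word: with `a = 3`, `b = 5`, the i33 product shifted right by 1 is 7, while the high 32-bit word of the full product is 0.

```
#include <cstdint>
#include <cstdio>

int main() {
  uint32_t a = 3, b = 5;
  uint64_t mask33 = (1ull << 33) - 1;
  // arith.muli on i33 wraps modulo 2^33.
  uint64_t prodI33 = ((uint64_t)a * b) & mask33;
  uint32_t shrByOne = (uint32_t)(prodI33 >> 1);            // what the IR above computes: 7
  uint32_t highWord = (uint32_t)(((uint64_t)a * b) >> 32); // mului_extended's high result: 0
  std::printf("shr by 1 = %u, high word = %u\n", shrByOne, highWord);
  return 0;
}
```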
2025-06-19 16:32:48 +02:00
Matthias Springer
a4e4527c4b [mlir][Transforms] Fix replaceUsesOfBlockArgument API (#144706)
Before this PR, users had to pass the "old" block argument when
replacing the uses of a block argument in a newly converted block. Users
can now pass the actual block argument that should be replaced.

Note for LLVM integration: Make sure to pass the current block argument
instead of the old one.
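A hedged sketch of the updated call site (assuming `rewriter` is a `ConversionPatternRewriter` and that `region`, `conversion`, and `replacementValue` were prepared earlier; only `applySignatureConversion` and `replaceUsesOfBlockArgument` are real API names here):

```
// Convert the entry block signature, then redirect uses of its first argument.
Block *newEntry =
    rewriter.applySignatureConversion(&region.front(), conversion);
// After this PR: pass the argument of the *converted* block directly,
// rather than the corresponding argument of the old block.
rewriter.replaceUsesOfBlockArgument(newEntry->getArgument(0), replacementValue);
```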
2025-06-19 15:39:06 +02:00
Abdul Raheem
f87b6625d6 [MLIR][NFC] Fixed some Typos (#144263)
-- Fixed some typos in Operation.h

Signed-off: Abdul Raheem Beigh abdulraheembeigh@gmail.com
2025-06-19 06:24:34 -07:00
Matthias Springer
e33f13ba48 [mlir][arith] Add overflow flags to arith.trunci (#144863)
LLVM has supported overflow flags on `llvm.trunc` for a while. This
commit adds support for these flags to `arith.trunci`.
2025-06-19 13:59:22 +02:00
Fabian Mora
97c1a24445 [mlir][linalg] Add option to pad dynamic dims to linalg::rewriteAsPaddedOp (#144354)
This patch makes the following changes:

- Add a `ValueRange typeDynDims` argument to
`linalg::makeComposedPadHighOp`, allowing a tensor with dynamic
dimensions to be padded using `tensor::createPadHighOp`.

- Add a `DenseMap<std::pair<unsigned, unsigned>, OpFoldResult>
sizeToPadTo;` option to `LinalgPaddingOptions`. This option allows
setting the size to use when padding a dimension of an operand, making
it possible to pad operands even when they don't have a constant upper
bounding box (see the sketch after this list). If the value is not
provided, the constant upper bound is used by default.

- Add a `use_prescribed_tensor_shapes` option to
`transform.structured.pad`. If set to true, `tensor.dim` is used to
compute the size of the padded dimension instead of computing the
constant upper bound.

- This patch also changes how `linalg::rewriteAsPaddedOp` computes the
padded shape, by using the newly added options in
`LinalgPaddingOptions`.

- Finally it adds tests verifying the behavior.
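A hedged C++ sketch of driving the new option (field and type names follow the description above; the operand/dimension indices and the size 128 are arbitrary illustration values):

```
linalg::LinalgPaddingOptions options;
// Pad dimension 1 of operand 0 up to size 128, instead of requiring a
// constant upper bound to be derived from the IR.
options.sizeToPadTo[{/*operand=*/0, /*dim=*/1}] = rewriter.getIndexAttr(128);
```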
2025-06-19 11:47:44 +02:00
Durgadoss R
c0a9c908a6 [MLIR][NVVM-Docs] Fix rendering of a few tables in NVVM Docs (#144764)
This patch corrects the formatting of tables
in the tcgen05 ld/st and smem_descriptor Ops.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-06-19 14:45:43 +05:30
Hsiangkai Wang
03461c9c6e [mlir][gpu][spirv] Remove rotation semantics of gpu.shuffle up/down (#139105)
According to the description of the gpu.shuffle operation, shuffle up/down
rotates values within the subgroup because it applies a modulo to the shifted
value when calculating the result lane ID. This is inconsistent with the
definition of SPIR-V shuffle up/down and the NVVM data-movement definitions
within a subgroup.

In NVVM, the specification says:

"If the computed source lane index j is in range, the returned i32 value
will be the value of %a from lane j; otherwise, it will be the value
of %a from the current thread."

That is, the original value is kept if the result lane ID is out of range.

For SPIR-V OpGroupNonUniformShuffleUp and OpGroupNonUniformShuffleDown,
the specification says:

"The resulting value is undefined if Delta is greater than the current
invocation’s id within the scope or if the identified invocation is not
in scope restricted tangle."

That is, the value is undefined if the result lane ID is out of range.

In short, neither specification defines circular movement for shuffle
up/down. This patch removes the circular movement from gpu.shuffle
up/down and lowers gpu.shuffle up/down directly to SPIR-V
OpGroupNonUniformShuffleUp and OpGroupNonUniformShuffleDown.
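A scalar C++ sketch of the resulting lane-index semantics (an illustrative model, not the lowering code):

```
#include <cstdint>
#include <optional>

// For gpu.shuffle down, each lane reads from lane `id + delta`; shuffle up
// uses `id - delta`. With the rotation removed, an out-of-range source lane
// no longer wraps: NVVM keeps the lane's own value, SPIR-V leaves the result
// undefined.
std::optional<uint32_t> shuffleDownSource(uint32_t laneId, uint32_t delta,
                                          uint32_t subgroupSize) {
  uint32_t src = laneId + delta;
  // Previous gpu.shuffle behavior: src %= subgroupSize (circular movement).
  if (src >= subgroupSize)
    return std::nullopt; // out of range: own value (NVVM) or undefined (SPIR-V)
  return src;
}
```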

Reference:

https://docs.nvidia.com/cuda/archive/12.2.1/nvvm-ir-spec/index.html#data-movement

https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformShuffleUp

https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformShuffleDown
2025-06-19 07:56:30 +01:00
Han-Chung Wang
351303c28e [mlir][docs] Fix broken links to Traits documentation. (#144820) 2025-06-18 20:07:43 -07:00
Jianhui Li
118bfcda46 [MLIR][XEGPU] Add blocking support for scatter ops (#144766)
Add blocking support for scatter ops: create_tdesc, update, prefetch,
load, and store. It also enables load/store with chunk size.
2025-06-18 14:52:03 -07:00
Diego Caballero
ac37a0df94 [mlir] Fix integer comparison warning (#144794)
Introduced by https://github.com/llvm/llvm-project/pull/141457
2025-06-18 14:11:21 -07:00
Diego Caballero
7aecd7ecac [mlir][Vector] Add vector.to_elements op (#141457)
This PR introduces the `vector.to_elements` op, which decomposes a
vector into its scalar elements. This operation is symmetrical to the
existing `vector.from_elements`.

Examples:

```
    // Decompose a 0-D vector.
    %0 = vector.to_elements %v0 : vector<f32>
    // %0 = %v0[0]

    // Decompose a 1-D vector.
    %0:2 = vector.to_elements %v1 : vector<2xf32>
    // %0#0 = %v1[0]
    // %0#1 = %v1[1]

    // Decompose a 2-D vector.
    %0:6 = vector.to_elements %v2 : vector<2x3xf32>
    // %0#0 = %v2[0, 0]
    // %0#1 = %v2[0, 1]
    // %0#2 = %v2[0, 2]
    // %0#3 = %v2[1, 0]
    // %0#4 = %v2[1, 1]
    // %0#5 = %v2[1, 2]
```

This op is aimed at reducing code size when modeling "structured" vector
extractions and simplifying canonicalizations of large sequences of
`vector.extract` and `vector.insert` ops into `vector.shuffle` and other
sophisticated ops that can re-arrange vector elements.
2025-06-18 13:45:43 -07:00
Jakub Kuderski
96bbe472ef Revert "[mlir][spirv] Fix int type declaration duplication when serializing" and follow up commits (#144773)
This reverts the following PRs:
* https://github.com/llvm/llvm-project/pull/143108
* https://github.com/llvm/llvm-project/pull/144538
* https://github.com/llvm/llvm-project/pull/144685

Reverting because these changes disabled tests when building without the
LLVM SPIR-V backend enabled.
2025-06-18 16:15:06 -04:00
Yang Bai
fe3933da15 [mlir][vector] Support complete folding in single pass for vector.insert/vector.extract (#142124)
### Description

This patch improves the folding efficiency of `vector.insert` and
`vector.extract` operations by not returning early after successfully
converting dynamic indices to static indices.

This PR also renames the test pass `TestConstantFold` to
`TestSingleFold` and adds comprehensive documentation explaining the
single-pass folding behavior.

### Motivation

Since the `OpBuilder::createOrFold` function only calls `fold` **once**,
the current `fold` methods of `vector.insert` and `vector.extract` may
leave the op in a state that can be folded further. For example,
consider the following un-folded IR:
```
%v1 = vector.insert %e1, %v0 [0] : f32 into vector<128xf32>
%c0 = arith.constant 0 : index
%e2 = vector.extract %v1[%c0] : f32 from vector<128xf32>
```
If we use `createOrFold` to create the `vector.extract` op, then the
result will be:
```
%v1 = vector.insert %e1, %v0 [0] : f32 into vector<128xf32>
%e2 = vector.extract %v1[0] : f32 from vector<128xf32>
```
But this is not the optimal result. `createOrFold` should have returned
`%e1`.
The reason is that `fold` returns immediately after
`extractInsertFoldConstantOp` succeeds, causing the subsequent folding
logic to be skipped.
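A hedged C++ sketch of the user-visible effect (assuming `b`, `loc`, a `vector<128xf32>` value `v0`, an `f32` value `e1`, and a `vector::ExtractOp` builder that accepts `OpFoldResult` positions):

```
// Mirror the IR above: insert at a static position, extract at a dynamic one.
Value v1 = b.create<vector::InsertOp>(loc, e1, v0, ArrayRef<int64_t>{0});
Value c0 = b.create<arith::ConstantIndexOp>(loc, 0);
SmallVector<OpFoldResult> position = {OpFoldResult(c0)};
// createOrFold invokes fold() exactly once. Previously that single call only
// turned the dynamic index into a static one; with this patch it folds all
// the way, so e2 is expected to be e1 itself.
Value e2 = b.createOrFold<vector::ExtractOp>(loc, v1, position);
```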

---------

Co-authored-by: Yang Bai <yangb@nvidia.com>
2025-06-18 09:26:04 -07:00
Andrei Golubev
ee070d0816 [mlir][bufferization] Support custom types (1/N) (#142986)
Following the addition of TensorLike and BufferLike type interfaces (see
00eaff3e9c), introduce minimal changes
required to bufferize a custom tensor operation into a custom buffer
operation.

To achieve this, new interface methods are added to the TensorLike type
interface that abstract away the differences between existing (tensor ->
memref) and custom conversions.

The scope of the changes is intentionally limited (for example,
BufferizableOpInterface is untouched) in order to first understand the
basics and reach consensus design-wise.

---
Notable changes:
* mlir::bufferization::getBufferType() returns BufferLikeType (instead
of BaseMemRefType)
* ToTensorOp / ToBufferOp operate on TensorLikeType / BufferLikeType.
Operation argument "memref" renamed to "buffer"
* ToTensorOp's tensor type inferring builder is dropped (users now need
to provide the tensor type explicitly)
2025-06-18 16:18:12 +02:00
Sergei Lebedev
1d6f1029f7 [mlir] [python] Fixed the return type of MemRefType.get_strides_and_offset (#144523)
Previously, the return type for `offset` was `list[int]`, which clearly
is not right.
2025-06-18 09:53:20 -04:00
lorenzo chelini
c5613dc863 [MLIR] Mark LLVM::FMAOp as legal (#144671)
Mark LLVM::FMAOp as legal in configureGpuToNVVMConversionLegality, since
we can handle intrinsic lowering in the NVPTX backend and emit
fma.rn.f32.
2025-06-18 15:49:00 +02:00
Kunwar Grover
6729da647a [mlir][amdgpu][nfc] Add PatternBenefit to populate methods (#144663) 2025-06-18 15:19:17 +02:00
Frank Schlimbach
8584abb05a [mlir] mlir/test/lit.local.cfg -> mlir/test/Target/SPIRV/lit.local.cfg (#144685)
renamed: mlir/test/lit.local.cfg -> mlir/test/Target/SPIRV/lit.local.cfg
2025-06-18 15:04:55 +02:00
Matthias Springer
66580f77b8 [mlir][Transforms][NFC] Dialect Conversion: Keep unresolvedMaterializations up to date (#144254)
`unresolvedMaterializations` is a mapping from
`UnrealizedConversionCastOp` to `UnresolvedMaterializationRewrite`. This
mapping is needed to find the correct type converter for an unresolved
materialization.

With this commit, `unresolvedMaterializations` is updated immediately
when an op is being erased. This also cleans up the code base a bit:
`SingleEraseRewriter` is now used only during the "cleanup" phase and no
longer needed as a field of `ConversionRewriterImpl`.

This commit is in preparation for the One-Shot Dialect Conversion
refactoring: `allowPatternRollback = false` will in the future trigger
immediate materialization of all IR changes.
2025-06-18 14:42:09 +02:00
Andrei Golubev
a1c2a71293 [mlir][bufferization] Use Type instead of Value in unknown conversion (#144658)
Generally, bufferization should be able to create a memref from a tensor
without needing to know more than just an mlir::Type. Thus, change
BufferizationOptions::UnknownTypeConverterFn to accept just a type
(mlir::TensorType for now) instead of an mlir::Value. Additionally, apply
the same rationale to the getMemRefType() helper function.

Both changes are prerequisites to enable custom types support in
one-shot bufferization.
2025-06-18 14:38:58 +02:00
Oleksandr "Alex" Zinenko
8a469da8b2 [mlir] remove unnecessary atomic_rmw expansions (#144515)
The expansion of `memref.atomic_rmw` into a `memref.generic_atomic_rmw`
for floating-point min/max operations is no longer necessary as those
are now supported by the LLVM dialect and LLVM IR.

Furthermore, combining this expansion with direct lowering of
`generic_atomic_rmw` could lead to invalid LLVM dialect IR with
`cmpxchg` operating on floating-point values, which it does not support.
2025-06-18 13:32:46 +02:00
Frank Schlimbach
43e1a5a411 [mlir][mesh] adding option for traversal order in sharding propagation (#144079)
The traversal order in sharding propagation was hard-coded. This PR
adds a pass option to select a suitable order:
- forward-only
- backward-only
- forward-backward
- backward-forward

Default is the previous behavior (backward-forward).
2025-06-18 11:06:48 +02:00
Kirill Chibisov
74687180dd [mlir][emitc] Make CExpression trait into interface (#142771)
By defining `CExpressionInterface`, we move the side-effect detection
logic from `emitc.expression` into the individual operations
implementing the interface, allowing operations to gradually tune their
side-effect behavior.

It also allows checking each operation individually for side effects.
2025-06-18 07:38:47 +02:00
Jianhui Li
86a09f3615 [MLIR][XeGPU] Clean up xegpu op tests (#144592)
Test cleanup:
1) Separate layout.mlir from ops.mlir for layout-related tests.
2) Remove lane layout for ops working at work-item scope.
3) Remove redundant tests in create_tdesc/update_tdesc/prefetch.
4) Remove "test_" from all test function names.
2025-06-17 19:48:09 -05:00
Slava Zakharin
70343c8d44 [mlir][flang] Added Weighted[Region]BranchOpInterface's. (#142079)
The new interfaces provide getters and setters for the weight
information about the branches of BranchOpInterface and
RegionBranchOpInterface operations.

These interfaces follow the same design as the LLVM dialect's
BranchWeightOpInterface.

The plan is to produce this information in Flang, e.g. to mark
code that is most probably "cold" as such and allow LLVM to order
basic blocks accordingly. An example of such code is the
copy loops generated for array repacking: we can mark them
as "cold", assuming that the copy will not happen dynamically.
If the copy does happen, its overhead is probably high
enough that we do not care about the little extra overhead
of jumping to the "cold" code and fetching it.
2025-06-17 16:14:13 -07:00
Jianhui Li
f25f2f7de4 [MLIR][XeGPU] Extend unrolling support for scatter ops with chunk_size (#144447)
Add support for load/store with chunk_size, which requires special
consideration for the operand blocking since offsets and masks are
n-D while tensors are (n+1)-D. Supported operations include create_tdesc,
update_tdesc, load, store, and prefetch.

---------

Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
2025-06-17 17:46:35 -05:00
Nishant Patel
8063bd153c [MLIR][XeGPU] Add support for elementwise ops in Wg to Sg distribute pass [1/N] (#142797)
This PR adds support for lowering elementwise operations (unary &
binary) from workgroup to subgroup.
2025-06-17 09:55:02 -07:00
Davide Grohmann
549bc55cc3 [mlir][spirv] Fix int type declaration duplication when serializing (#143108)
At the MLIR level, unsigned integers and signless integers are different
types. Indeed, when looking up the two types in the type definition cache,
they do not match.

Hence, a translated SPIR-V module that contains both unsigned and
signless integers will contain the same type declaration twice
(something like OpTypeInt 32 0), which is not permitted in SPIR-V, and
such generated modules fail validation.

This patch solves the problem by mapping unsigned integer types to
signless integer types before looking them up in the type definition cache.
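A C++ sketch of the normalization idea (an illustration assuming `using namespace mlir;` and the usual IR headers, not the serializer's actual code):

```
Type normalizeForLookup(Type type) {
  // Map ui<N> to the equivalent signless i<N> so that both share a single
  // OpTypeInt declaration in the type definition cache.
  if (auto intType = dyn_cast<IntegerType>(type))
    if (intType.isUnsigned())
      return IntegerType::get(intType.getContext(), intType.getWidth());
  return type;
}
```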

---------

Signed-off-by: Davide Grohmann <davide.grohmann@arm.com>
2025-06-17 10:35:14 -04:00
William Moses
917bc90967 [MLIR][LLVMIR] Mark Funcop as affinescope (#144456)
All functions are conceptually an affine scope.
2025-06-17 06:41:15 -07:00
Denzel-Brian Budii
12611a7fc7 [mlir] Improve mlir-query by adding matcher combinators (#141423)
Whereas backward-slice matching limits traversal by specifying the
desired depth level, this pull request introduces support for limiting
traversal with a nested matcher (and adds forward-slice matching as
well). It also adds support for variadic operators, including `anyOf`
and `allOf`. Rather than simply stopping traversal when an operation
named foo is encountered, one can now define a matcher that specifies
different exit conditions. The variadic-operator support in mlir-query
is implemented very similarly to that in clang-query.
2025-06-17 14:07:20 +02:00
Oleksandr "Alex" Zinenko
875b36a874 [mlir] fix MemRefToLLVM lowering of atomic operations (#139045)
We have been confusingly, and arguably incorrectly, lowering `m**imumf`
atomic RMW operations in the MemRef dialect to `fm**` atomic RMW
operations in the LLVM dialect, which have different NaN-propagation
semantics: `m**imumf` propagates NaNs from either operand, whereas
`fm**`, which lowers to the `fm**num` intrinsic, returns the non-NaN
operand. This also contradicts the lowering of `arith.m**imumf` and
`arith.m**numf` operations.

Change the lowering to match the terminology in arith.

Add tests for these lowerings.

Keep a debug message in case of surprising behavior downstream (the code
may be producing more NaNs now).
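For the max variant, a scalar C++ illustration of the two NaN behaviors (an explanatory model, not the lowering code): `maximumf` below matches `arith.maximumf`/LLVM `maximum`, while `std::fmax` behaves like the `maxnum`-style semantics of the previous lowering.

```
#include <cmath>
#include <cstdio>

// arith.maximumf / LLVM 'maximum' semantics: NaN if either operand is NaN
// (zero-sign handling omitted for brevity).
float maximumf(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::nanf("");
  return a > b ? a : b;
}

int main() {
  float nan = std::nanf("");
  std::printf("maxnum-like (old lowering): %f\n", std::fmax(1.0f, nan)); // 1.0
  std::printf("maximumf    (new lowering): %f\n", maximumf(1.0f, nan));  // nan
  return 0;
}
```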
2025-06-17 13:40:57 +02:00
Momchil Velikov
7eda8274fe [MLIR] Integration tests for lowering vector.contract to SVE FEAT_I8MM (#140573) 2025-06-17 11:03:14 +01:00
Tom Eccles
aa01e8e9cf [mlir][OpenMP] Fix broken insertion point for charbox with omp task (#143112)
Fixes #142365
2025-06-17 10:42:42 +01:00