13421 Commits

Author SHA1 Message Date
Tim Gymnich
67c590004d [mlir][AMDGPU] Add scaled floating point conversion ops (#141554)
Implement `ScaledExtPackedOp` and `PackedScaledTruncOp`.
2025-06-13 11:09:11 +02:00
Simone Pellegrini
4b59b7b946 [mlir][Linalg] Fix fusing of indexed linalg consumer with different axes (#140892)
When fusing two `linalg.genericOp`, where the producer has index
semantics, invalid `affine.apply` ops can be generated where the number
of indices does not match the number of loops in the fused genericOp.

This patch fixes the issue by directly using the number of loops from
the generated fused op.
2025-06-13 10:03:09 +01:00
Longsheng Mou
02f1f6967a [mlir][linalg] Add pure tensor check for winogradConv2DHelper (#142299)
This PR adds a pure tensor semantics check for `winogradConv2DHelper` to
prevent a crash. Fixes #141566.
2025-06-13 15:49:54 +08:00
Diego Caballero
1ac61c8334 [mlir][Vector] Remove vector.extractelement/insertelement from sparse vectorizer (#143270)
This PR is part of the last step to remove `vector.extractelement` and `vector.insertelement` ops.
RFC: https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops

It updates the Sparse Vectorizer to use `vector.extract` and `vector.insert` instead of `vector.extractelement` and `vector.insertelement`.
2025-06-12 14:49:00 -07:00
Andrzej Warzyński
4a58a63280 [mlir][linalg] Remove the test-linalg-to-vector-patterns option (#142116)
This patch removes the `test-linalg-to-vector-patterns` option from the
`-test-linalg-transform-patterns=` test flag. It was only used in one
test, where a more specialized transform dialect op can be used instead:

* `transform.apply_patterns.linalg.pad_vectorization`

While we could preserve `test-linalg-to-vector-patterns`, it's better to
rely on finer-grained transformations — this way, we know exactly what
is being run and tested. Now that its only use has been removed, it
feels natural to delete `test-linalg-to-vector-patterns`.
2025-06-12 19:26:51 +01:00
fairywreath
2c20bc5112 [mlir][spirv] Add definitions for GL FindILsb and FindSMsb (#143916)
Adds SPIRV GL FindILsb and FindSMsb instructions which correspond to GL
instruction numbers 73 and 74.
2025-06-12 12:54:42 -04:00
Nicolas Vasilache
e4de74ba11 [mlir][Vector] Tighten up application conditions in TransferReadAfter… (#143869)
…WriteToBroadcast

The pattern would previously apply in spurious cases and generate
incorrect IR.

In the process, we disable the application of this pattern in the case
where there is no broadcast; this should be handled separately and may
more easily support masking.

The case {no-broadcast, yes-transpose} was previously caught by this
pattern and arguably could also generate incorrect IR (and was also
untested): this case does not apply anymore.

The last case {yes-broadcast, yes-transpose} continues to apply but
should arguably be removed in the future because creating transposes
as part of canonicalization feels dangerous.
There are other patterns that move permutation logic:

- either into the transfer, or
- outside of the transfer

Ideally, this would be target-dependent and not a canonicalization (i.e.
does your DMA HW allow transpose on the fly or not) but this is beyond
the scope of this PR.

Co-authored-by: Nicolas Vasilache <nicolasvasilache@users.noreply.github.com>
2025-06-12 17:11:06 +02:00
Igor Wodiany
62b6940900 [mlir][spirv] Add definition for GL Pack/UnpackHalf2x16 (#143889) 2025-06-12 16:10:33 +01:00
Adam Siemieniuk
d698ede748 [mlir][amx] Restore conversion interface for AMX (#143871)
Restores the mistakenly removed AMX interface, which ensures that the custom
tile type is converted to its LLVM equivalent within other operations
such as control flow.

Fix after #140559
2025-06-12 13:45:19 +02:00
Ian Wood
6e5a1423b7 [mlir] Reapply "Loosen restrictions on folding dynamic reshapes" (#142827)
The original PR https://github.com/llvm/llvm-project/pull/137963 had a
nvidia bot failure. This appears to be a flaky test because rerunning
the build was successful.

This change needs commit 6f2ba47 to fix incorrect usage of
`getReassociationIndicesForCollapse`.

Reverts llvm/llvm-project#142639

Co-authored-by: Artem Gindinson <gindinson@roofline.ai>
2025-06-12 10:28:27 +02:00
Ian Wood
6f2ba4712f [mlir] Fix ComposeExpandOfCollapseOp for dynamic case (#142663)
Changes `findCollapsingReassociation` to return nullopt in all cases
where source shape has `>=2` dynamic dims. `expand(collapse)` can
reshape to any valid output shape but a collapse can only collapse
contiguous dimensions. When there are `>=2` dynamic dimensions it is
impossible to determine if it can be simplified to a collapse or if it
is performing a more advanced reassociation.


This problem was uncovered by
https://github.com/llvm/llvm-project/pull/137963

---------

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-06-11 14:34:02 -07:00
Rolf Morel
fb761aa38b [MLIR][Transform] apply_registered_op fixes: arg order & python options auto-conversion (#143779) 2025-06-11 21:19:52 +01:00
Razvan Lupusoru
34a1b8ce25 [acc] acc.loop verifier now requires parallelism determination flag (#143720)
The OpenACC specification for `acc loop` describes that a loop's
parallelism determination mode is either auto, independent, or seq. The
rules are as follows.
- As per OpenACC 3.3 standard section 2.9.6 independent clause: A loop
construct with no auto or seq clause is treated as if it has the
independent clause when it is an orphaned loop construct or its parent
compute construct is a parallel construct.
- As per OpenACC 3.3 standard section 2.9.7 auto clause: When the parent
compute construct is a kernels construct, a loop construct with no
independent or seq clause is treated as if it has the auto clause.
- Additionally, loops marked with gang, worker, or vector are not
guaranteed to be parallel. Specifically noted in 2.9.7 auto clause: If
not, or if it is unable to make a determination, it must treat the auto
clause as if it is a seq clause, and it must ignore any gang, worker, or
vector clauses on the loop construct.

The verifier for `acc.loop` was updated to enforce this marking because
the context in which a loop appears is not trivially determined once IR
transformations begin. For example, orphaned loops are implicitly
`independent`, but after inlining into an `acc.kernels` region they
would be implicitly considered `auto`. Thus now the verifier requires
that a frontend specifically generates acc dialect with this marking
since it knows the context.
2025-06-11 12:37:08 -07:00
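The context-dependence described in #143720 above can be illustrated with a small Python sketch. It is not the verifier's code: `ParMode`, `implicit_par_mode`, and `effective_par_mode` are hypothetical names, and only the two defaulting rules quoted in the commit message (orphaned/`parallel` and `kernels`) are modelled.

```python
from enum import Enum
from typing import Optional


class ParMode(Enum):
    AUTO = "auto"
    INDEPENDENT = "independent"
    SEQ = "seq"


def implicit_par_mode(parent: Optional[str]) -> ParMode:
    """Default mode for a loop with no auto/independent/seq clause.

    `parent` is None for an orphaned loop, otherwise the name of the
    enclosing compute construct. Only the cases quoted in the commit
    message are handled; anything else is rejected.
    """
    if parent is None or parent == "parallel":
        return ParMode.INDEPENDENT  # OpenACC 3.3, section 2.9.6
    if parent == "kernels":
        return ParMode.AUTO  # OpenACC 3.3, section 2.9.7
    raise ValueError(f"parent construct not modelled in this sketch: {parent}")


def effective_par_mode(explicit: Optional[ParMode], parent: Optional[str]) -> ParMode:
    """An explicit marking wins; otherwise fall back to the implicit rule."""
    return explicit if explicit is not None else implicit_par_mode(parent)


# An unmarked orphaned loop changes meaning once inlined into a kernels region.
before_inlining = effective_par_mode(None, None)
after_inlining = effective_par_mode(None, "kernels")

# A loop marked by the frontend keeps its meaning regardless of context.
marked_before = effective_par_mode(ParMode.INDEPENDENT, None)
marked_after = effective_par_mode(ParMode.INDEPENDENT, "kernels")
```

In this sketch the unmarked loop goes from `independent` to `auto` after inlining, while the explicitly marked loop is unaffected, which is the motivation the message gives for requiring the marking.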
Rolf Morel
fe7bf4b90b [MLIR][Transform] apply_registered_pass op's options as a dict (#143159)
Improve ApplyRegisteredPassOp's support for taking options by taking
them as a dict (vs a list of string-valued key-value pairs).

Values of options are provided as either static attributes or as params
(which pass in attributes at interpreter runtime). In either case, the
keys and value attributes are converted to strings and a single
options-string, in the format used on the commandline, is constructed to
pass to the `addToPipeline`-pass API.
2025-06-11 17:33:55 +01:00
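A rough Python sketch of the dict-to-options-string step described in #143159 above. The function name `options_to_string` is hypothetical, the real op converts MLIR attributes rather than Python values, and any quoting or escaping of values is omitted here. The space-separated `key=value` layout is assumed from the `arith-expand{include-bf16=false include-f8e8m0=false}` form that appears elsewhere in this log.

```python
def options_to_string(options: dict) -> str:
    """Join option keys and values into one command-line style string."""

    def fmt(value) -> str:
        # Assumption: booleans are spelled in lowercase on the command line.
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)

    return " ".join(f"{key}={fmt(value)}" for key, value in options.items())


example = options_to_string({"include-bf16": False, "include-f8e8m0": False})
```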
Igor Wodiany
9150a8249f [mlir][spirv] Add definition for GL Exp2 (#143678) 2025-06-11 15:59:47 +01:00
Igor Wodiany
b09206db15 [mlir][spirv] Include SPIRV_AnyImage in SPIRV_Type (#143676)
This change is triggered by encountering the following error:

```
<unknown>:0: error: 'spirv.Load' op result #0 must be void
or bool or 8/16/32/64-bit integer or 16/32/64-bit float or
vector of bool or 8/16/32/64-bit integer or 16/32/64-bit
float values of length 2/3/4/8/16 or any SPIR-V pointer type
or any SPIR-V array type or any SPIR-V run time array type
or any SPIR-V struct type or any SPIR-V cooperative matrix
type or any SPIR-V matrix type or any SPIR-V sampled image
type, but got '!spirv.image<f32, Dim2D, NoDepth, NonArrayed,
SingleSampled, NoSampler, Rgba8>'<unknown>:0: note: see current
operation:
%126 = "spirv.Load"(%125) {relaxed_precision} : (!spirv.ptr<!spirv.image<f32, Dim2D, NoDepth, NonArrayed, SingleSampled, NoSampler, Rgba8>, UniformConstant>) -> !spirv.image<f32, Dim2D, NoDepth, NonArrayed, SingleSampled, NoSampler, Rgba8>
```
2025-06-11 14:37:28 +01:00
Darren Wihandi
e15d50d5ff [mlir][spirv] Add lowering of multiple math trig/hypb functions (#143604)
Add Math to SPIRV lowering for tan, asin, acos, sinh, cosh, asinh, acosh
and atanh. This completes the lowering of all trigonometric and
hyperbolic functions from math to SPIRV.
2025-06-11 09:20:40 -04:00
Simone Pellegrini
abbbe4a6cd [mlir][vector] Fix attaching write effects on transfer_write's base (#142940)
This fixes an issue with `TransferWriteOp`'s implementation of the
`MemoryEffectOpInterface` where the write effect was attached to the
stored value rather than the base.

This had the effect that when asking for the memory effects for the
input memref buffer using `getEffectsOnValue(...)`, the function would
return no-effects (as the effect would have been attached to the stored
value rather than the input buffer).
2025-06-11 12:37:34 +01:00
MaheshRavishankar
45ae41e0d8 [mlir][scf] Return replacements explicitly in SCFTilingResult. (#143217)
In #120115 the replacements for the tiled operations were wrapped within
the `MergeResult` object. That is a bit of an obfuscation, and it is not
immediately obvious where to get the replacements post tiling. This
changes the `SCFTilingResult` to have `replacements` explicit (as it was
before that change).
`mergeOps` is added as a separate field of `SCFTilingResult`, which is
empty when the reduction type is `FullReduction`.

This is an API-breaking change. All uses of `mergeResult.replacements`
should be replaced with `replacements`.

There was also an implicit assumption that
`PartialReductionTilingInterface` is derived from `TilingInterface`, so
all ops that implemented the `PartialReductionTilingInterface` were
expected to implement the `TilingInterface` as well. This pre-dated the
existence of derived inheritances. Make
`PartialReductionTilingInterface` derive from `TilingInterface`.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-10 12:13:42 -07:00
Jianhui Li
9630d7cb92 [MLIR][XeGPU] add blocking support for reduce, broadcast, and transpose (#143389)
This PR adds blocking support for vector dialect operations (`reduce`,
`broadcast`, and `transpose`) in the XeGPU based IR. It simply assigns
the shape specified by "inst_data" as the target shape of the unrolling
to implement the blocking. It is based on
https://github.com/llvm/llvm-project/pull/140163.
2025-06-10 10:50:26 -05:00
Igor Wodiany
f7967effa3 [mlir][spirv][nfc] Add missing tests for GL Tanh Op (#143538)
The problem was noticed when adding Log2 operation.
2025-06-10 15:40:22 +01:00
Cameron McInally
cde1035a2f [flang] Add support for -mrecip[=<list>] (#143418)
This patch adds support for the -mrecip command line option. The parsing
of this option is equivalent to Clang's and it is implemented by
setting the "reciprocal-estimates" function attribute.

Also move the ParseMRecip(...) function to CommonArgs, so that Flang is
able to make use of it as well.

---------

Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
2025-06-10 08:25:33 -06:00
Igor Wodiany
326429022f [mlir][spirv] Deserialize OpConstantComposite of type Cooperative Matrix (#142786)
Depends on #142784.
2025-06-10 15:23:06 +01:00
NimishMishra
bf1fe6eb33 [mlir][OpenMP] Reintroduce TODO for translation of linear clause (#143531)
Reintroduce a TODO for linear clause translation unless corner issues
(like linear variables being entities other than `alloca`, and support
for linear variables of types other than integer) are solved.
2025-06-10 07:06:28 -07:00
Tulio Magno Quites Machado Filho
5e0e6a0dd6 [MLIR] Use mlir_target_link_libraries with MLIRTestIRDLToCppDialect (#143435)
Replace LINK_LIBS with mlir_target_link_libraries.
Fixes #143246.

Suggested-by: Nikita Popov <npopov@redhat.com>
2025-06-10 08:08:21 -03:00
Igor Wodiany
d61a06e255 [mlir][spirv] Add definition for GL Log2 (#143409) 2025-06-10 10:40:50 +01:00
Darren Wihandi
4e6896244f [mlir][spirv] Add definitions for GL inverse hyperbolic functions (#141720)
Adds definitions for `Asinh`, `Acosh` and `Atanh` based on [SPIR-V
extended instructions for
GLSL](https://registry.khronos.org/SPIR-V/specs/unified1/GLSL.std.450.html).
Their instruction numbers are 22, 23 and 24.
2025-06-09 18:40:04 -04:00
Charitha Saumya
10dc8bc519 [mlir][vector] Fix for WarpOpScfForOp failure when scf.for has results that are unused. (#141853)
Currently, only the values defined outside ForOp but inside the original
WarpOp are considered "escaping values". However this is not true if the
ForOp has some unused results. In this case, corresponding IterArgs must
also be yielded by the original WarpOp. This PR adds the required code
changes to achieve this.
2025-06-09 11:56:34 -07:00
Umang Yadav
7f08503a3b Introduce arith.scaling_extf and arith.scaling_truncf (#141965)
This PR adds `arith.scaling_truncf` and `arith.scaling_extf` operations,
which support block quantization following the OCP MXFP specs listed
here
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf

OCP MXFP Spec comes with reference implementation here
https://github.com/microsoft/microxcaling/tree/main

An interesting piece of reference code is the method `_quantize_mx`
7bc41952de/mx/mx_ops.py (L173).

Both `arith.scaling_truncf` and `arith.scaling_extf` are designed to be
elementwise operations. Please see their descriptions in the
`ArithOps.td` file for more details.
 
Internally, 

`arith.scaling_truncf` computes
`arith.truncf(arith.divf(input/(2^scale)))`. `scale` should have
necessary broadcast, clamping, normalization and NaN propagation done
before calling into `arith.scaling_truncf`.

`arith.scaling_extf` computes `arith.mulf(2^scale, input)` after taking
care of necessary data type conversions.


CC: @krzysz00 @dhernandez0 @bjacob @pashu123 @MaheshRavishankar
@tgymnich

---------

Co-authored-by: Prashant Kumar <pk5561@gmail.com>
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
2025-06-09 13:13:31 -05:00
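A scalar Python sketch of the arithmetic described in #141965 above, under several loud assumptions: `scale` is treated as a plain integer exponent (as in the `2^scale` notation of the message), IEEE half precision stands in for the narrow target type because the standard library can round to it, and broadcasting, clamping and NaN handling are ignored. The function names are illustrative only.

```python
import struct


def round_to_f16(x: float) -> float:
    """Round a Python float to IEEE half precision (stand-in narrow type)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaling_truncf(x: float, scale: int) -> float:
    """Divide by 2**scale, then truncate to the narrow type."""
    return round_to_f16(x / 2.0 ** scale)


def scaling_extf(x: float, scale: int) -> float:
    """Extend to the wide type (a no-op here), then multiply by 2**scale."""
    return x * 2.0 ** scale


# 96 / 2**5 = 3, which half precision represents exactly, so the round trip
# through truncation and extension recovers the input.
narrow = scaling_truncf(96.0, 5)
restored = scaling_extf(narrow, 5)
```

Values that are not exactly representable in the narrow type after scaling would lose precision in `scaling_truncf`; the example input is chosen so that no rounding occurs.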
Darren Wihandi
a3c7d46145 [mlir][spirv] Implement UMod canonicalization for vector constants (#141902)
Closes #63174. 

Implements this transformation pattern, which is currently only applied
to scalars, for vectors:
```
%1 = "spirv.UMod"(%0, %CONST_32) : (i32, i32) -> i32
%2 = "spirv.UMod"(%1, %CONST_4) : (i32, i32) -> i32
```
to
```
%1 = "spirv.UMod"(%0, %CONST_32) : (i32, i32) -> i32
%2 = "spirv.UMod"(%0, %CONST_4) : (i32, i32) -> i32
```

Additionally fixes an issue where patterns like this:
```
%1 = "spirv.UMod"(%0, %CONST_4) : (i32, i32) -> i32
%2 = "spirv.UMod"(%1, %CONST_32) : (i32, i32) -> i32
```
were incorrectly canonicalized to:
```
%1 = "spirv.UMod"(%0, %CONST_4) : (i32, i32) -> i32
%2 = "spirv.UMod"(%0, %CONST_32) : (i32, i32) -> i32
```
which is incorrect since `(X % A) % B` == `(X % B)` IFF A is a multiple
of B, i.e., B divides A.
2025-06-09 11:09:36 -04:00
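The divisibility condition stated in #141902 above can be checked with a few lines of Python on non-negative integers. This is only a model of the arithmetic, not the canonicalization pattern itself; `fold_nested_umod` is a hypothetical helper name.

```python
from typing import Optional


def fold_nested_umod(a: int, b: int) -> Optional[int]:
    """For (x % a) % b, return the divisor of the equivalent single x % d,
    or None when the fold is not valid for every unsigned x."""
    if a > 0 and b > 0 and a % b == 0:
        return b  # b divides a, so (x % a) % b == x % b
    return None


# Valid case from the commit message: (x % 32) % 4 folds to x % 4.
valid = all((x % 32) % 4 == x % 4 for x in range(1024))

# Invalid case from the commit message: (x % 4) % 32 is not x % 32.
counterexample = ((5 % 4) % 32, 5 % 32)
```

Here `fold_nested_umod(32, 4)` allows the fold and `fold_nested_umod(4, 32)` rejects it; `x = 5` is one input on which the rejected rewrite would change the result.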
Igor Wodiany
cc2d5facec [mlir][spirv] Make CooperativeMatrixType a ShapedType (#142784)
This is to enable `CooperativeMatrixType` to be used with
`DenseElementsAttr`, so that a `spirv.Constant` can be easily built from
`OpConstantComposite`. For example:

```mlir
%cst = spirv.Constant dense<0.000000e+00> : !spirv.coopmatrix<1x1xf32, Subgroup, MatrixAcc>
```

Constraints of arithmetic operations are changed, as
`SameOperandsAndResultType` can no longer fully verify CoopMatrices.
This is because for shaped types the verifier only checks element type
and shapes, whereas for any other arbitrary type it looks for an exact
match.

This patch does not enable the actual deserialization. This will be
done in a subsequent PR.
2025-06-09 16:01:48 +01:00
Jeremy Kun
b1b84a629d Pretty print on -dump-pass-pipeline (#143223)
This PR makes `dump-pass-pipeline` pretty-print the dumped pipeline. For
large pipelines the current behavior produces a wall of text that is
hard to visually navigate.

For the command

```bash
mlir-opt --pass-pipeline="builtin.module(flatten-memref, expand-strided-metadata,func.func(arith-expand,func.func(affine-scalrep)))" --dump-pass-pipeline
```

Before:

```bash
Pass Manager with 3 passes:
builtin.module(flatten-memref,expand-strided-metadata,func.func(arith-expand{include-bf16=false include-f8e8m0=false},func.func(affine-scalrep)))
```

After:

```bash
Pass Manager with 3 passes:
builtin.module(
  flatten-memref,
  expand-strided-metadata,
  func.func(
    arith-expand{include-bf16=false include-f8e8m0=false},
    func.func(
      affine-scalrep
    )
  )
)
```

Another nice feature of this is that the pretty-printed string can still
be copy/pasted into `-pass-pipeline` using a quote:

```bash
$ bin/mlir-opt --dump-pass-pipeline test.mlir --pass-pipeline='
builtin.module(
  flatten-memref,
  expand-strided-metadata,
  func.func(
    arith-expand{include-bf16=false include-f8e8m0=false},
    func.func(
      affine-scalrep
    )
  )
)'
```

---------

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
2025-06-08 12:23:38 -07:00
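The layout shown in #143223 above can be reproduced by a short Python sketch. It is not MLIR's printer: `pretty_print_pipeline` is a hypothetical function that assumes a well-formed pipeline string, breaks lines at parentheses and commas, and leaves anything inside `{...}` option blocks untouched.

```python
def pretty_print_pipeline(pipeline: str, indent: str = "  ") -> str:
    """Re-indent a textual pass pipeline, one pass per line."""
    out = []
    depth = 0          # nesting level of parentheses
    option_depth = 0   # nesting level of {...} option blocks
    for ch in pipeline:
        if ch == "{":
            option_depth += 1
            out.append(ch)
        elif ch == "}":
            option_depth -= 1
            out.append(ch)
        elif option_depth > 0:
            out.append(ch)  # copy option text verbatim, spaces included
        elif ch == "(":
            depth += 1
            out.append("(\n" + indent * depth)
        elif ch == ")":
            depth -= 1
            out.append("\n" + indent * depth + ")")
        elif ch == ",":
            out.append(",\n" + indent * depth)
        elif not ch.isspace():
            out.append(ch)  # whitespace between passes is dropped
    return "".join(out)


flat = (
    "builtin.module(flatten-memref,expand-strided-metadata,"
    "func.func(arith-expand{include-bf16=false include-f8e8m0=false},"
    "func.func(affine-scalrep)))"
)
pretty = pretty_print_pipeline(flat)
```

Because whitespace outside option blocks is ignored, feeding the pretty-printed text back through the same function yields the same result, mirroring the copy/paste property the message mentions.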
Andrzej Warzyński
b4b86a7a3c [mlir][linalg] Refactor vectorization hooks to improve code reuse (#141244)
This patch refactors two vectorization hooks in Vectorization.cpp:
 * `createWriteOrMaskedWrite` gains a new parameter for write indices,
   aligning it with its counterpart `createReadOrMaskedRead`.
 * `vectorizeAsInsertSliceOp` is updated to reuse both of the above
   hooks, rather than re-implementing similar logic.

CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing
`tensor.insert_slice`. Recent updates added masking support:
  * https://github.com/llvm/llvm-project/pull/122927
  * https://github.com/llvm/llvm-project/pull/123031

At the time, reuse of the shared `create*` hooks wasn't feasible due to
missing parameters and overly rigid assumptions. This patch resolves
that and moves us closer to a more maintainable structure.

CHANGES IN `createWriteOrMaskedWrite`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the
  vector to store, via named variables like `destType`/`vecToStoreType`,
  `destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like
  `in_bounds`. For example, the size of the `in_bounds` attr now matches
  the source vector rank, not the tensor rank.
* Drops the assumption that `vecToStoreRank == destRank` - this doesn't
  hold in many real examples.
* Deduces mask dimensions from `vecToStoreShape` (vector) instead of
  `destShape` (tensor). (Eventually we should not require
  `inputVecSizesForLeadingDims` at all - mask shape should be inferred.)

NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids
inserting redundant masks and reduces the burden on canonicalization to
clean them up later.

Example where masking is provably unnecessary:
```mlir
%2 = vector.mask %1 {
  vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
    {in_bounds = [true, true, true]}
    : vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```

Also, without this hook, tests are more complicated and require more
matching.

VECTORIZATION BEHAVIOUR
-----------------------

This patch preserves the current behaviour around masking and the use
of the `in_bounds` attribute. Specifically:
* `useInBoundsInsteadOfMasking` is set when no input vector sizes are
  available.
* The vectorizer continues to infer vector sizes where needed.

Note: the computation of the `in_bounds` attribute is not always
correct. That issue is tracked here:
* https://github.com/llvm/llvm-project/issues/142107

This will be addressed separately.

TEST CHANGES
-----------
Only affects vectorization of:

* `tensor.insert_slice` (now refactored to use shared hooks)

Test diffs involve additional `arith.constant` Ops due to increased
reuse of shared helpers (which generate their own constants). This will
be cleaned up via constant caching (see #138265).

NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review
`createWriteOrMaskedWrite` as a new method rather than diffing
line-by-line.

TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and
`createReadOrMaskedRead`:
  * Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in
    VectorUtils.cpp)
  * Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
  * Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the
     updated test in transform-vector.mlir for an example that would
     benefit from this.
  * Address #142107

(*) This method will eventually be moved out of Vectorization.cpp, which
isn't the right long-term home for it.
2025-06-07 19:25:30 +01:00
Darren Wihandi
c9c60172a1 [mlir][spirv] Implement lowering gpu.subgroup_reduce with cluster size for SPIRV (#141402)
Implement lowering of `gpu.subgroup_reduce` with a cluster size
attribute to SPIRV by using the `ClusteredReduce` group operation.
2025-06-06 12:50:18 -04:00
Kazu Hirata
1eb843b1a0 [mlir] Ensure newline at the end of files (NFC) (#143155) 2025-06-06 09:16:52 -07:00
Rolf Morel
4eeee41f52 [MLIR][Transform] Allow ApplyRegisteredPassOp to take options as a param (#142683)
Makes it possible to pass around the options to a pass inside a schedule.

The refactoring also makes it so that the pass manager and pass are only
constructed once per `apply()` of the transform op versus for each target
payload given to the op's `apply()`.
2025-06-06 11:19:39 +01:00
Momchil Velikov
b9d3a644c2 [MLIR] Add apply_patterns.arm_sve.vector_contract_to_i8mm TD Op (#140572) 2025-06-06 10:54:14 +01:00
Tom Eccles
b03081e9fb [mlir][OpenMP] set correct insert point after creating a barrier (#142997)
Fixes #138436
2025-06-06 10:43:13 +01:00
Momchil Velikov
44a047c929 [MLIR][ArmSVE] Add initial lowering of vector.contract to SVE *MMLA instructions (#135636) 2025-06-06 09:54:23 +01:00
asraa
c66b72f8ce [mlir][tensor] remove tensor.insert constant folding out of canonicalization (#142671)
Follow-up from https://github.com/llvm/llvm-project/pull/142458/
In particular, this addresses concerns that indiscriminately folding
tensor constants can bloat the IR, as these can be arbitrarily large.

Signed-off-by: Asra Ali <asraa@google.com>
2025-06-05 14:53:33 -07:00
Chao Chen
def37f7e3a [mlir][vector] add unroll pattern for broadcast (#142011)
This PR adds `UnrollBroadcastPattern` to the `VectorUnroll` transform.
To support this, it also extends the `BroadcastOp` definition with
`VectorUnrollOpInterface`.
2025-06-05 12:42:16 -05:00
Krzysztof Parzyszek
4dcc159485 [utils][TableGen] Implement clause aliases as alternative spellings (#141765)
Use the spellings in the generated clause parser. The functions
`get<lang>ClauseKind` and `get<lang>ClauseName` are not yet updated.

The definitions of both clauses and directives now take a list of
"Spelling"s instead of a single string. For example
```
def ACCC_Copyin : Clause<[Spelling<"copyin">,
                          Spelling<"present_or_copyin">,
                          Spelling<"pcopyin">]> { ... }
```

A "Spelling" is a versioned string, defaulting to "all versions".

For background information see

https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507
2025-06-05 12:35:30 -05:00
James Newling
7ce315d14a [mlir][vector] Improve shape_cast lowering (#140800)
Before this PR, a rank-m -> rank-n vector.shape_cast with m,n>1 was
lowered to extracts/inserts of single elements, so that a shape_cast on
a vector with N elements would always require N extracts/inserts. While
this is necessary in the worst case scenario it is sometimes possible to
use fewer, larger extracts/inserts. Specifically, the largest common
suffix on the shapes of the source and result can be extracted/inserted.
For example:

```mlir
%0 = vector.shape_cast %arg0 : vector<10x2x3xf32> to vector<2x5x2x3xf32>
```

has common suffix of shape `2x3`. Before this PR, this would be lowered
to 60 extract/insert pairs with extracts of the form
`vector.extract %arg0 [a, b, c] : f32 from vector<10x2x3xf32>`. With
this PR it is 10 extract/insert pairs with extracts of the form
`vector.extract %arg0 [a] : vector<2x3xf32> from vector<10x2x3xf32>`.
2025-06-05 10:18:38 -07:00
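The counting argument in #140800 above (60 versus 10 extract/insert pairs) can be reproduced with a small Python sketch. It only models the "largest common suffix" idea as stated in the message, not the actual lowering; `common_suffix` and `num_extract_insert_pairs` are hypothetical helper names.

```python
from math import prod
from typing import Sequence, Tuple


def common_suffix(src: Sequence[int], dst: Sequence[int]) -> Tuple[int, ...]:
    """Longest run of trailing dimensions shared by both shapes."""
    suffix = []
    for s, d in zip(reversed(src), reversed(dst)):
        if s != d:
            break
        suffix.append(s)
    return tuple(reversed(suffix))


def num_extract_insert_pairs(src: Sequence[int], dst: Sequence[int]) -> int:
    """Pairs needed when each extract/insert moves one common-suffix slice."""
    assert prod(src) == prod(dst), "shape_cast must preserve element count"
    return prod(src) // prod(common_suffix(src, dst))


src_shape, dst_shape = (10, 2, 3), (2, 5, 2, 3)
suffix = common_suffix(src_shape, dst_shape)
pairs_with_suffix = num_extract_insert_pairs(src_shape, dst_shape)
pairs_elementwise = prod(src_shape)
```

For the example in the message the common suffix is `2x3`, so each of the 10 extracts moves a `vector<2x3xf32>` slice instead of a single `f32`.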
Srinivasa Ravi
1bc3845c44 [MLIR][NVVM] Add prefetch Ops (#141737)
This change adds `prefetch` and `prefetch.uniform` Ops to the NVVM
dialect for the `prefetch` and `prefetchu` group of instructions.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prefetch-prefetchu
2025-06-05 20:08:00 +05:30
Tai Ly
c14078318c [tosa] Add verifier checks for Scatter (#142661)
This adds verifier checks for the scatter op
to make sure the shapes of inputs and output
are consistent with respect to spec.

Signed-off-by: Tai Ly <tai.ly@arm.com>
2025-06-05 15:23:39 +01:00
Luke Hutton
9d5e1449f7 [mlir][tosa] Fix MulOp verifier handling for unranked operands (#141980)
The previous verifier checks did not correctly handle unranked operands.
For example, it could incorrectly assume the number of
`rankedOperandTypes` would be >= 2, which isn't the case when both a and
b are unranked.

This change simplifies these checks such that they only operate over the
intended a and b operands as opposed to the shift operand as well.
2025-06-05 08:54:01 +01:00
Adam Straw
3172c61895 [mlir][gpu] Fix bug with gpu.printf global location (#142872)
Bug description: Global variables and functions created during
gpu.printf conversion to NVVM may contain debug info metadata from
the function containing the gpu.printf, which cannot be used outside
of that function.
2025-06-05 00:21:44 -06:00
Matthias Springer
e4c8ff94e7 [mlir][tensor] Add runtime verification for cast/dim/extract/insert/extract_slice (#141332)
Add `RuntimeVerifiableOpInterface` implementations for the following
ops. These were mostly copied from the respective memref
implementations. Only the part that deals with offsets and strides was
removed.
* `tensor.cast`: `memref.cast`
* `tensor.dim`: `memref.dim`
* `tensor.extract`: `memref.load`
* `tensor.insert`: `memref.store`
* `tensor.extract_slice`: `memref.subview`
2025-06-05 12:06:47 +09:00
hanhanW
d96447b4d3 Reapply "Reland "[mlir][Affine] Handle null parent op in getAffineParallelInductionVarOwner" (#142785)"
This reverts commit 178b64e75b.

The author misread the report of the failure, and thought that it broke
the CI again. Reland the fix.
2025-06-04 09:05:15 -07:00
hanhanW
178b64e75b Revert "Reland "[mlir][Affine] Handle null parent op in getAffineParallelInductionVarOwner" (#142785)"
This reverts commit 07a534160a.
2025-06-04 08:59:54 -07:00