Commit Graph

6722 Commits

Author SHA1 Message Date
Nicolas Vasilache
08cf6ae537 [mlir][memref] Add a new ReifyResultShapes pass (#145927)
This pass reifies the shapes of a subset of
`ReifyRankedShapedTypeOpInterface` ops with `tensor` results.

The pass currently only supports result shape type reification for:
   - tensor::PadOp
   - tensor::ConcatOp

It addresses a representation gap where implicit op semantics are needed
to infer static result types from dynamic
operands. But it does so by using `ReifyRankedShapedTypeOpInterface` as
the source of truth rather than the op itself.
As a consequence, this cannot generalize today.

TODO: in the future, we should consider coupling this information with
op "transfer functions" (e.g.
`IndexingMapOpInterface`) to provide a source of truth that can work
across result shape inference, canonicalization and
op verifiers.

The pass replaces the operations with their reified versions when more
static information can be derived, and inserts casts when result shapes
are updated.

Example:
```mlir
  #map = affine_map<(d0) -> (-d0 + 256)>
  func.func @func(%arg0: f32, %arg1: index, %arg2: tensor<64x?x64xf32>) -> tensor<1x?x64xf32> {
    %0 = affine.apply #map(%arg1)
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [1, %arg1, 64] [1, 1, 1] : tensor<64x?x64xf32> to tensor<1x?x64xf32>
    %padded = tensor.pad %extracted_slice low[0, 0, 0] high[0, %0, 0] {
    ^bb0(%arg3: index, %arg4: index, %arg5: index):
      tensor.yield %arg0 : f32
    } : tensor<1x?x64xf32> to tensor<1x?x64xf32>
    return %padded : tensor<1x?x64xf32>
  }

  // mlir-opt --reify-result-shapes
  #map = affine_map<()[s0] -> (-s0 + 256)>
  func.func @func(%arg0: f32, %arg1: index, %arg2: tensor<64x?x64xf32>) -> tensor<1x?x64xf32> {
    %0 = affine.apply #map()[%arg1]
    %extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [1, %arg1, 64] [1, 1, 1] : tensor<64x?x64xf32> to tensor<1x?x64xf32>
    %padded = tensor.pad %extracted_slice low[0, 0, 0] high[0, %0, 0] {
    ^bb0(%arg3: index, %arg4: index, %arg5: index):
      tensor.yield %arg0 : f32
    } : tensor<1x?x64xf32> to tensor<1x256x64xf32>
    %cast = tensor.cast %padded : tensor<1x256x64xf32> to tensor<1x?x64xf32>
    return %cast : tensor<1x?x64xf32>
  }
```

---------

Co-authored-by: Fabian Mora <fabian.mora-cordero@amd.com>
2025-07-01 15:39:21 +02:00
Hsiangkai Wang
f581ef5b66 [mlir][gpu] Add gpu.rotate operation (#142796)
Add gpu.rotate operation and a pattern to convert gpu.rotate to SPIR-V
OpGroupNonUniformRotateKHR.
2025-07-01 11:32:25 +01:00
Luke Hutton
698ec8c7ba [mlir][tosa] Require signless types in validation and add corresponding conversion pass (#144367)
Firstly, this commit requires that all types are signless in the strict
mode of the validation pass. This is because signless types on
operations are required by the TOSA specification. The "strict" mode in
the validation pass is the final check for TOSA conformance to the
specification, which can often be used for conversion to other formats.

In addition, a conversion pass `--tosa-convert-integer-type-to-signless`
is provided to allow a user to convert all integer types to signless.
The intention is that this pass can be run before the validation pass.
Following use of this pass, input/output information should be carried
independently by the user.
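
For illustration, a minimal sketch (not from the commit) of the kind of
rewrite the conversion pass performs, assuming a function carrying signed
integer tensors:

```mlir
// Before: signed integer types are rejected by strict validation.
func.func @model(%arg0: tensor<8xsi8>) -> tensor<8xsi8> {
  return %arg0 : tensor<8xsi8>
}

// After --tosa-convert-integer-type-to-signless (sketch):
func.func @model(%arg0: tensor<8xi8>) -> tensor<8xi8> {
  return %arg0 : tensor<8xi8>
}
```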
2025-07-01 10:29:53 +01:00
Yang Bai
393a75ebb7 [mlir][Vector] Add constant folding for vector.from_elements operation (#145849)
### Summary

This PR adds a new folding pattern for **vector.from_elements** that
canonicalizes it to **arith.constant** when all input operands are
constants.

### Implementation Details

**Leverages FoldAdaptor capabilities**: Uses adaptor.getElements() to
access **pre-computed** constant attributes, avoiding redundant pattern
matching on operands.

### Example Transformation
```
Before:
%c0_i32 = arith.constant 0 : i32
%c1_i32 = arith.constant 1 : i32
%c2_i32 = arith.constant 2 : i32
%c3_i32 = arith.constant 3 : i32
%v = vector.from_elements %c0_i32, %c1_i32, %c2_i32, %c3_i32 : vector<2x2xi32>

After:
%v = arith.constant dense<[[0, 1], [2, 3]]> : vector<2x2xi32>
```

---------

Co-authored-by: Yang Bai <yangb@nvidia.com>
2025-06-30 20:39:53 -07:00
Fabian Mora
878d3594ed [mlir][vector] Avoid setting padding by default to 0 in vector.transfer_read; prefer ub.poison (#146088)
Context:
`vector.transfer_read` always requires a padding value. Most of its
builders take no `padding` value and assume the safe value of `0`.
However, this should be a conscious choice by the API user, as it makes
it easy to introduce bugs.
For example, I found several occasions while making this patch where the
padding value was not getting propagated (`vector.transfer_read` was
transformed into another `vector.transfer_read`). These bugs were always
caused by constructors that don't require specifying padding.

Additionally, using `ub.poison` as a possible default value is better,
as it indicates the user "doesn't care" about the actual padding value,
forcing users to specify the actual padding semantics they want.

With that in mind, this patch changes the builders in
`vector.transfer_read` to always have a `std::optional<Value> padding`
argument. This argument is never optional, but for convenience users can
pass `std::nullopt`, padding the transfer read with `ub.poison`.
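
At the IR level, passing `std::nullopt` yields a read padded with
`ub.poison`; a minimal sketch (illustrative types and values):

```mlir
%c0 = arith.constant 0 : index
%pad = ub.poison : f32
%v = vector.transfer_read %mem[%c0, %c0], %pad
     : memref<8x8xf32>, vector<4xf32>
```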

---------

Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
2025-06-30 15:20:42 -04:00
Markus Böck
8602204d9f [mlir][tensor] Relax input type requirement on tensor.splat (#145893)
`tensor.splat` is currently restricted to only accepting input values
that are of integer, index or float type.

This is much more restrictive than the tensor type itself as well as any
lowerings of it.

This PR therefore removes this restriction by using `AnyType` for the
input value. Whether the type is actually valid for a tensor is still
verified through the equality of the result tensor's element type and
the input type.
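
For example, a sketch (hypothetical IR) of what the relaxation permits,
splatting a `complex` value that the old verifier would have rejected:

```mlir
// Validity is still enforced: the input type must equal the result
// tensor's element type.
%c = complex.create %re, %im : complex<f32>
%t = tensor.splat %c : tensor<4x4xcomplex<f32>>
```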
2025-06-30 09:49:19 +02:00
Uday Bondhugula
80625c16f0 [MLIR][Affine] Fix memref replacement in affine-data-copy-generate (#139016)
Fixes: https://github.com/llvm/llvm-project/issues/130257

Fix affine-data-copy-generate in certain cases that involved users in
multiple blocks. Perform the memref replacement correctly during copy
generation.

Improve/clean up memref affine use replacement API. Instead of
supporting dominance and post dominance filters (which aren't adequate
in most cases) and computing dominance info expensively each time in
RAMUW, provide a user filter callback, i.e., force users to compute
dominance if needed.
2025-06-28 10:27:11 +05:30
Christopher McGirr
29b1054835 [mlir][linalg] Update pack and unpack documentation (#143903)
* Clarified the `inner_dims_pos` attribute in the case of high
dimensionality tensors.
* Added a 5D example to showcase the use cases that triggered this
update.
* Added a reminder for linalg.unpack that the number of elements is not
required to be the same between input and output, since padding is
dropped.

I encountered some odd variations of `linalg.pack` and `linalg.unpack`
while working on some TFLite models and the definition in the
documentation did not match what I saw pass in IR verification.

The following changes reconcile those differences.

---------

Signed-off-by: Christopher McGirr <mcgirr@roofline.ai>
2025-06-27 13:55:26 -07:00
Momchil Velikov
3876e887d0 [MLIR][ArmSVE] Add an ArmSVE dialect operation mapping to bfmmla (#145064) 2025-06-27 15:37:13 +01:00
Erich Keane
3463aba45f [OpenACC][CIR] Implement copyin/copyout/create lowering for compute/combined (#145976)
This patch does the lowering of copyin (represented as an
acc.copyin/acc.delete pair), copyout (acc.create/acc.copyout), and
create (acc.create/acc.delete).

Additionally, this work uncovered a few problems with #144806, so it
fixes those as well.
2025-06-27 07:25:58 -07:00
Andrzej Warzyński
541f33e075 [mlir][linalg] Prevent hoisting of transfer pairs in the presence of aliases (#145235)
This patch adds additional checks to the hoisting logic to prevent hoisting of
`vector.transfer_read` / `vector.transfer_write` pairs when the underlying
memref has users that introduce aliases via operations implementing
`ViewLikeOpInterface`.
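
A sketch (hypothetical IR, not from the patch) of the kind of aliasing
that now blocks hoisting; the `memref.subview` implements
`ViewLikeOpInterface`, so the transfer pair stays inside the loop:

```mlir
// %view aliases %buf, so hoisting the pair below is no longer assumed safe.
%view = memref.subview %buf[0, 0] [4, 4] [1, 1]
    : memref<8x8xf32> to memref<4x4xf32, strided<[8, 1]>>
scf.for %i = %c0 to %c8 step %c1 {
  %r = vector.transfer_read %buf[%c0, %c0], %pad : memref<8x8xf32>, vector<4xf32>
  %s = arith.addf %r, %r : vector<4xf32>
  vector.transfer_write %s, %buf[%c0, %c0] : vector<4xf32>, memref<8x8xf32>
}
```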

Note: This may conservatively block some valid hoisting opportunities and could
affect performance. However, as demonstrated by the included tests, the current
logic is too permissive and can lead to incorrect transformations.

If this change prevents hoisting in cases that are provably safe, please share
a minimal repro - I'm happy to explore ways to relax the check.

Special treatment is given to `memref.assume_alignment`, mainly to accommodate
recent updates in:

* https://github.com/llvm/llvm-project/pull/139521

Note that such special casing does not scale and should generally be avoided.
The current hoisting logic lacks robust alias analysis. While better support
would require more work, the broader semantics of `memref.assume_alignment`
remain somewhat unclear. It's possible this op may eventually be replaced with
the "alignment" attribute added in:

* https://github.com/llvm/llvm-project/pull/144344
2025-06-27 13:18:15 +01:00
long.chen
aed8f1992a [NFC][mlir][memref] refine debug message about memref::SubViewOp. (#145470) 2025-06-27 18:34:45 +08:00
Christopher McGirr
96c1611163 [mlir][linalg] fix OuterUnitDims linalg.pack decomposition pattern (#141613)
Given the following example:
```
module {
  func.func @main(%arg0: tensor<1x1x1x4x1xf32>, %arg1: tensor<1x1x4xf32>) -> tensor<1x1x1x4x1xf32> {
    %pack = linalg.pack %arg1 outer_dims_perm = [1, 2, 0] inner_dims_pos = [2, 0] inner_tiles = [4, 1] into %arg0 : tensor<1x1x4xf32> -> tensor<1x1x1x4x1xf32>
    return %pack : tensor<1x1x1x4x1xf32>
  }
}
```

We would generate an invalid transpose operation because the calculated
permutation would be `[0, 2, 0]`, which is semantically incorrect: a
permutation must contain unique integers corresponding to the source
tensor dimensions.

The following change modifies how we calculate the permutation array and
ensures that the dimension indices given in the permutation array are
unique.

The above example would then translate to a transpose having a
permutation of `[1, 2, 0]`, following the rule that `inner_dims_pos` is
appended to the permutation array and the preceding indices are
filled with the remaining dimensions.
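
As a sketch (shapes assumed from the example above), the decomposition
now produces a valid transpose such as:

```mlir
// permutation = [1, 2, 0]: every source dimension appears exactly once.
%transposed = linalg.transpose ins(%src : tensor<1x1x4xf32>)
    outs(%init : tensor<1x4x1xf32>) permutation = [1, 2, 0]
```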
2025-06-27 09:24:33 +02:00
Han-Chung Wang
0515449f6d [mlir][tensor][memref] Enhance collapse(expand(src)) canonicalization pattern. (#145995) 2025-06-26 19:39:50 -07:00
Momchil Velikov
e0b83ca8a4 [MLIR][ArmNeon] Add a couple of negative tests for BFMMLA with scalable dimensions (#145882) 2025-06-26 17:09:59 +01:00
Frank Schlimbach
2bece2194a [mlir][mesh] resubmitting #144079 (#145897)
#144079 introduced a test with an uninitialized access, causing a
buildbot failure:
https://lab.llvm.org/buildbot/#/builders/164/builds/11140
It was reverted in #145531.

This PR is an exact copy of #144079 plus a trivial fix
(96c8525c82c5474d8522a973553e85a2033bee5b).
2025-06-26 16:36:17 +02:00
Nicolas Vasilache
a1d5b9d1cd [mlir][affine] Wrap SimplifyAffineMinMax in a pass (#145741)
This revision adds a pass working on FunctionOpInterface to connect the recently introduced AffineMin/Max simplification patterns.

Additionally, it fixes some minor issues that surfaced during larger-scale testing.
2025-06-26 15:32:51 +02:00
Nicolas Vasilache
e5a8c51c9d [mlir][tensor] Make tensor::PadOp a ReifyRankedShapedTypeOpInterface (#145867)
Co-authored-by: Fabian Mora <fmora.dev@gmail.com>
2025-06-26 14:40:57 +02:00
Momchil Velikov
a2265423d0 [MLIR][ArmNeon] Add an ArmNeon operation which maps to bfmmla (#145038) 2025-06-26 10:43:36 +01:00
Andrzej Warzyński
e9e25f02e6 [mlir][vector] Restrict vector.insert/vector.extract to disallow 0-d vectors (#121458)
This patch enforces a restriction in the Vector dialect: the non-indexed
operands of `vector.insert` and `vector.extract` must no longer be 0-D
vectors. In other words, rank-0 vector types like `vector<f32>` are
disallowed as the source or result.

EXAMPLES
--------
The following are now **illegal** (note the use of `vector<f32>`):

```mlir
%0 = vector.insert %v, %dst[0, 0] : vector<f32> into vector<2x2xf32>
%1 = vector.extract %src[0, 0] : vector<f32> from vector<2x2xf32>
```

Instead, use scalars as the source and result types:

```mlir
  %0 = vector.insert %v, %dst[0, 0] : f32 into vector<2x2xf32>
  %1 = vector.extract %src[0, 0] : f32 from vector<2x2xf32>
```

Note, this change serves three goals. These are summarised below.

## 1. REDUCED AMBIGUITY
By enforcing scalar-only semantics when the result (`vector.extract`)
or source (`vector.insert`) are rank-0, we eliminate ambiguity
in interpretation. Prior to this patch, both `f32` and `vector<f32>`
were accepted.

## 2. MATCH IMPLEMENTATION TO DOCUMENTATION
The current behaviour contradicts the documented intent. For example,
`vector.extract` states:

> Degenerates to an element type if n-k is zero.

This patch enforces that intent in code.

## 3. ENSURE SYMMETRY BETWEEN INSERT AND EXTRACT
With the stricter semantics in place, it’s natural and consistent to
make `vector.insert` behave symmetrically to `vector.extract`, i.e.,
degenerate the source type to a scalar when n = 0.

NOTES FOR REVIEWERS
-------------------
1. Main change is in "VectorOps.cpp", where stricter type checks are
   implemented.
2. Test updates in "invalid.mlir" and "ops.mlir" are minor cleanups to
   remove now-illegal examples.
3. Lowering changes in "VectorToSCF.cpp" are the main trade-off: we now
   require an additional `vector.extract` when a preceding
   `vector.transfer_read` generates a rank-0 vector.

RELATED RFC
-----------
*
https://discourse.llvm.org/t/rfc-should-we-restrict-the-usage-of-0-d-vectors-in-the-vector-dialect
2025-06-26 09:47:06 +01:00
Andrzej Warzyński
a8998fa062 [mlir][vector][nfc] Move vector.splat test (#145699)
Moves a test for vector.splat so that all invalid tests are grouped
together and easy to find.
2025-06-26 09:27:07 +01:00
Alan Li
3f3282cee8 [AMDGPU] Adding AMDGPU dialect wrapper for ROCDL transpose loads. (#145395)
* 1-to-1 mapping wrapper op.
* Direct lowering from AMDGPU wrapper to ROCDL intrinsics.
2025-06-25 22:58:14 -04:00
Chao Chen
36fbc6a8d2 [MLIR][XeGPU] Remove the transpose attribute from Gather/Scatter ops and Cleanup the documents (#145389) 2025-06-25 19:43:53 -05:00
Charitha Saumya
c539ec0db5 [mlir][vector] Add support for vector extract/insert_strided_slice in vector distribution. (#145421)
This PR adds initial support for `vector.extract_strided_slice` and
`vector.insert_strided_slice` ops in vector distribution.
2025-06-25 16:41:28 -07:00
Mehdi Amini
ff0dcc4614 [MLIR][Linalg] Harden parsing Linalg named ops (#145337)
This threads proper error handling / reporting capabilities through to
avoid hitting llvm_unreachable while parsing linalg ops.

Fixes #132755
Fixes #132740
Fixes #129185
2025-06-26 00:22:08 +02:00
MaheshRavishankar
c873e5f87d [mlir][TilingInterface] Handle multi operand consumer fusion. (#145193)
For consumer fusion cases of this form

```
%0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg1 = ...) {

  tensor.parallel_insert_slice ... into %arg0
  tensor.parallel_insert_slice ... into %arg1
}
%1 = linalg.generic ... ins(%0#0, %0#1)
```

the current consumer fusion, which handles one slice at a time, cannot
fuse the consumer into the loop, since fusing along one slice would
create an SSA violation on the other use from the `scf.forall`. The
solution is to allow consumer fusion to consider multiple slices at
once. This PR changes the `TilingInterface` methods related to consumer
fusion, i.e.

- `getTiledImplementationFromOperandTile`
- `getIterationDomainFromOperandTile`

to allow fusion while considering multiple operands. It is up to the
`TilingInterface` implementation to return an error if a list of tiles
of the operands cannot result in a consistent implementation of the
tiled operation.

The Linalg implementation of `TilingInterface` has been modified to
account for these changes and to handle cases where the operand tiles
can result in a consistent tiling implementation.

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-25 11:54:38 -07:00
Andrzej Warzyński
5f8e7ed5a3 [mlir][linalg][nfc] Split hoisting tests into dedicated test functions (#145234)
Refactors the `@hoist_vector_transfer_pairs` test function in
`hoisting.mlir` into smaller, focused test functions - each covering a
specific `vector.transfer_read`/`vector.transfer_write` pair.

This makes it easier to identify which edge cases are tested, spot
duplication, and write more targeted and readable check lines, with less
surrounding noise.

This refactor also helped identify some issues with the original
`@hoist_vector_transfer_pairs` test:
  * Input variables `%val` and `%cmp` were unused.
  * There were no check lines for reads from `memref5`.

**Note for reviewers (current and future):**

This PR is split into small, incremental, and self-contained commits. It
should be easier to follow the changes by reviewing those commits
individually, rather than reading the full squashed diff. However, this
will be merged as a single commit to avoid adding unnecessary history
noise in-tree.
2025-06-25 19:32:38 +01:00
Andrzej Warzyński
c92b580d63 [mlir][ArmSME] Remove ConvertIllegalShapeCastOpsToTransposes (#139706)
As a follow-up to PR #135841 (see discussion for background), this patch
removes the `ConvertIllegalShapeCastOpsToTransposes` pattern from the SME
legalization pass. This change unblocks folding for ShapeCastOp involving
scalable vectors.

Originally, the `ConvertIllegalShapeCastOpsToTransposes` pattern was introduced
to rewrite certain `vector.shape_cast` ops that could not be lowered otherwise.
Based on local end-to-end testing, this workaround is no longer required, and
the pattern can now be safely removed.

This patch also removes a special case from `ShapeCastOp::fold`, simplifying
the fold logic.

As a side effect of removing `ConvertIllegalShapeCastOpsToTransposes`, we lose
the mechanism that enabled lowering of certain ops like:

```mlir
%res = vector.transfer_read %mem[%a, %b] (...) :
       memref<?x?xf32>, vector<[4]x1xf32>
```
Previously, such cases were handled by:
* Rewriting a nearby `vector.shape_cast` to a `vector.transpose` (via
  `ConvertIllegalShapeCastOpsToTransposes`)
* Then lowering the result with `LiftIllegalVectorTransposeToMemory`.

This patch introduces a new dedicated pattern,
`LowerColumnTransferReadToLoops`, that directly handles illegal
`vector.transfer_read` ops involving leading scalable dimensions.
2025-06-25 16:40:14 +01:00
Momchil Velikov
ea1e181571 [MLIR][AArch64] Lower vector.contract with mixed signed/unsigned arguments to Neon FEAT_I8MM (#144698) 2025-06-25 16:10:02 +01:00
Jack Frankland
022e1e99f3 [mlir][memref]: Fix Bug in GlobalOp Verifier (#144900)
When comparing the type of the initializer in a `memref::GlobalOp`
against its result type, only consider the element type and the shape.
Other attributes, such as the memory space, should be ignored, since
comparing these between tensors and memrefs doesn't make sense;
constructing a memref in a specific memory space from a tensor that has
no such attribute should be valid.
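
A sketch (hypothetical global, memory space chosen arbitrarily) of what
the verifier now accepts:

```mlir
// The dense tensor initializer carries no memory space; only element type
// and shape are checked against the memref<2x2xf32, 4> result type.
memref.global "private" @weights : memref<2x2xf32, 4> = dense<[[1.0, 2.0], [3.0, 4.0]]>
```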

Signed-off-by: Jack Frankland <jack.frankland@arm.com>
2025-06-25 15:14:55 +01:00
Hsiangkai Wang
d16f42d1e2 [mlir][linalg] Constrain the parameters m, r in Winograd ops (#144657)
We only support a fixed set of minimum filtering algorithms for Winograd
Conv2D decomposition. Instead of letting users specify any integer,
define a fixed set of enumeration values for the parameters of the
minimum filtering algorithm.
2025-06-25 14:02:07 +01:00
Momchil Velikov
3b251cd675 [MLIR] Legalize certain vector.transfer_read ops of scalable vectors (#143146)
This patch adds a transform of the `transfer_read` operation to change
the vector type to one that can be mapped to an LLVM type. This is done by
collapsing trailing dimensions so we obtain a vector type with a single
trailing scalable dimension.
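
A sketch of the resulting IR shape (types assumed; the actual pattern
handles the general case):

```mlir
// The read now produces a single trailing scalable dimension, and a
// shape_cast restores the original 2-D scalable type.
%flat = vector.transfer_read %mem[%i], %pad : memref<?xf32>, vector<[8]xf32>
%v = vector.shape_cast %flat : vector<[8]xf32> to vector<2x[4]xf32>
```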
2025-06-25 09:54:35 +01:00
Matthias Springer
4f9adb6889 [mlir][vector] Relax operand type restrictions for vector.splat (#145517)
The vector type allows element types that implement the
`VectorElementTypeInterface`. `vector.splat` should allow any element
type that is supported by the vector type.
2025-06-25 10:45:31 +02:00
Jaden Angella
e615544e2b [mlir][EmitC] Add pass to wrap a func in class (#141158)
Goal: Enable using C++ classes to AOT compile models for MLGO.

This commit introduces a transformation pass that converts standalone
`emitc.func` operations into `emitc.class` structures to support
class-based C++ code generation for MLGO.

Transformation details:
- Wrap `emitc.func @func_name` into `emitc.class @Myfunc_nameClass`
- Converts function arguments to class fields with preserved attributes
- Transforms function body into an `execute()` method with no arguments
- Replaces argument references with `get_field` operations

Before: emitc.func @Model(%arg0, %arg1, %arg2) with direct argument
access
After: emitc.class with fields and execute() method using get_field
operations

This enables generating C++ classes that can be instantiated and
executed as self-contained model objects for AOT compilation workflows.
2025-06-24 13:07:22 -07:00
Charitha Saumya
c8a9579ff9 [mlir][xegpu] Add support for distributing gpu.barrier (#145434) 2025-06-24 09:28:30 -07:00
Qinkun Bao
b0ef912534 Revert "[mlir][mesh] adding option for traversal order in sharding propagation" (#145531)
Reverts llvm/llvm-project#144079

Buildbot failure:
https://lab.llvm.org/buildbot/#/builders/164/builds/11140
2025-06-24 11:26:46 -04:00
Razvan Lupusoru
4847bd5ae4 [mlir][acc] Add support for data clause modifiers (#144806)
The OpenACC data clause operations have been updated to support the
OpenACC 3.4 data clause modifiers. This includes ensuring verifiers
check that only supported ones are used on relevant operations.

In order to support a seamless update from encoding the modifiers in the
data clause to this attribute, the following considerations were made:
- Ensure that modifier builders which do not take modifier are still
available.
- All data clause enum values are left in place until a complete
transition is made to the new modifiers.
2025-06-24 07:48:06 -07:00
Nicolas Vasilache
d31ba52563 [mlir][Interface] Factor out common IndexingMapOpInterface behavior in a new generic interface (#145313)
Refactor the verifiers to make use of the common bits and make
`vector.contract` also use this interface.
In the process, the confusingly named getStaticShape has disappeared.

Note: the verifier for IndexingMapOpInterface is currently called
manually from other verifiers, as it was unclear how to avoid it taking
precedence over more meaningful error messages.
2025-06-24 07:56:32 +02:00
Aviad Cohen
3ba7a872bf [mlir][func]: Introduce ReplaceFuncSignature transform operation (#143381)
This transform takes a module and a function name, and replaces the
signature of the function by reordering the arguments and results
according to the interchange arrays. The function is expected to be
defined in the module, and the interchange arrays must match the number
of arguments and results of the function.
2025-06-24 06:35:06 +03:00
Shay Kleiman
5f74d9bb62 [mlir][linalg] Add support for inlined const to isaFillOpInterface (#144870) 2025-06-23 22:53:41 +03:00
MaheshRavishankar
7bc956d3d6 [mlir][PartialReductionTilingInterface] Add support for ReductionTilingStrategy::PartialReductionOuterParallel in tileUsingSCF. (#143988)
Following up from https://github.com/llvm/llvm-project/pull/143467,
this PR adds support for
`ReductionTilingStrategy::PartialReductionOuterParallel` to
`tileUsingSCF`. The implementation of
`PartialReductionTilingInterface` for `Linalg` ops has been updated to
support this strategy as well. This brings `tileUsingSCF` on par with
`linalg::tileReductionUsingForall`, which will be deprecated
subsequently.

Changes summary
- `PartialReductionTilingInterface` changes :
  - `tileToPartialReduction` method needed to get the induction
    variables of the generated tile loops. This was needed to keep the
    generated code similar to `linalg::tileReductionUsingForall`,
    specifically to create a simplified access for slicing the
intermediate partial results tensor when tiled in `num_threads` mode.
  - The `getPartialResultTilePosition` method needs the induction
    variables for the generated tile loops for the same reason as above,
    and also needs the `tilingStrategy` to be passed in to generate
    correct code.

The tests in `transform-tile-reduction.mlir` testing the
`linalg::tileReductionUsingForall` have been moved over to test
`scf::tileUsingSCF` with
`ReductionTilingStrategy::PartialReductionOuterParallel`
strategy. Some of the tests that were doing further cyclic distribution
of the transformed code from tiling have been removed. Those seem like
two separate transformations that were merged into one. Ideally that
would need to happen when resolving the `scf.forall` rather than during
tiling.

Please review only the top commit. Depends on
https://github.com/llvm/llvm-project/pull/143467

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-23 12:27:26 -07:00
MaheshRavishankar
71817856f7 [mlir][PartialReductionTilingInterface] Generalize implementation of tileUsingSCF for ReductionTilingStrategy::PartialOuterReduction. (#143467)
This is a precursor to generalizing `tileUsingSCF` to handle the
`ReductionTilingStrategy::PartialReductionOuterParallel` strategy. This
change itself generalizes/refactors the current implementation, which
supports only `ReductionTilingStrategy::PartialReductionOuterReduction`.

Changes in this PR
- Move the `ReductionTilingStrategy` enum out of
  `scf::SCFTilingOptions` and make them visible to `TilingInterface`.
- `PartialTilingInterface` changes
  - Pass the `tilingStrategy` used for partial reduction to
    `tileToPartialReduction`.
  - Pass the reduction dimension along as `const
    llvm::SetVector<unsigned> &`.
- Allow `scf::SCFTilingOptions` to set the reduction dimensions that
  are to be tiled.
- Change `structured.tiled_reduction_using_for` to allow specification
  of the reduction dimensions to be partially tiled.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2025-06-23 11:23:46 -07:00
Momchil Velikov
c7d9b6ed5d [MLIR] Fix incorrect slice contiguity inference in vector::isContiguousSlice (#142422)
Previously, slices were sometimes marked as non-contiguous when they
were actually contiguous. This occurred when the vector type had leading
unit dimensions, e.g., `vector<1x1x...x1xd0xd1x...xdn-1xT>`. In such
cases, only the trailing `n` dimensions of the memref need to be
contiguous, not the entire vector rank.

This affects how `FlattenContiguousRowMajorTransfer{Read,Write}Pattern`
flattens `transfer_read` and `transfer_write` ops.

The patterns used to collapse a number of dimensions equal to the vector
rank, which missed some opportunities when the leading unit dimensions
of the vector span non-contiguous dimensions of the memref.
Now that the contiguity of the slice is determined correctly, there is a
choice of how many dimensions of the memref to collapse, ranging from
a) the number of vector dimensions after ignoring the leading unit
dimensions, up to
b) the maximum number of contiguous memref dimensions.

This patch chooses to do minimal memref collapsing. The rationale behind
this decision is that this way the least amount of information is
discarded.

(It follows that, in some cases where the patterns used to trigger and
collapse some memref dimensions, the patterns may now collapse fewer
dimensions.)
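
For illustration (hypothetical types), with the fix the following read
is recognized as a contiguous slice, because only the trailing non-unit
vector dimension must map to contiguous memref dimensions:

```mlir
// The leading 1x1 vector dims are ignored; only the innermost memref
// dimension (stride 1) needs to be contiguous.
%v = vector.transfer_read %mem[%c0, %c0, %c0], %pad
     : memref<2x2x4xf32, strided<[16, 4, 1]>>, vector<1x1x4xf32>
```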
2025-06-23 14:12:56 +01:00
Vitalii Shutov
9e704a0aa1 [MLIR][MemRef] Add alloca support for erase_dead_alloc_and_stores (#142131)
Previously, `erase_dead_alloc_and_stores` didn't support
`memref.alloca`. This patch introduces support for it.
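
A minimal sketch of IR that now gets cleaned up:

```mlir
// The alloca is written but never read, so both ops can be erased.
%buf = memref.alloca() : memref<4xf32>
memref.store %f, %buf[%c0] : memref<4xf32>
```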

---------

Signed-off-by: Vitalii Shutov <vitalii.shutov@arm.com>
2025-06-23 13:44:20 +01:00
Nicolas Vasilache
7360ed0159 [mlir][transform] Drop redundant padding_dimensions spec from pad_tiling_interface (#145257)
This revision aligns the padding specification in pad_tiling_interface
with that of the tiling specification.
Dimensions that should be skipped are specified by "padding by 0".
Trailing dimensions that are ignored are automatically completed to "pad
to 0".
2025-06-23 12:29:22 +02:00
Nicolas Vasilache
d9a99afbfc [mlir][transform] Plumb a simplified form of AffineMin folding into transform.pad-tiling-interface (#145170)
This revision introduces a simple variant of AffineMin folding in
makeComposedFoldedAffineApply and makes use of it in
transform.pad-tiling-interface. Since this version explicitly calls
ValueBoundsOpInterface, it may be too expensive, so it is only activated
behind a flag.
It results in better foldings when mixing tiling and padding, including
with dynamic shapes.
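
A sketch (hypothetical bounds) of the kind of folding this enables:

```mlir
// With %i in [0, 16) stepping by 4, value bounds prove d0 + 4 <= 16, so
// the affine.min folds to its first expression:
//   affine.apply affine_map<(d0) -> (d0 + 4)>(%i)
scf.for %i = %c0 to %c16 step %c4 {
  %m = affine.min affine_map<(d0) -> (d0 + 4, 16)>(%i)
}
```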
2025-06-23 12:23:24 +02:00
Luke Hutton
98a6fed096 [mlir][tosa] Allow zero-points to be unranked (#143770)
This commit allows zero-points used by a number of tosa operations to be
unranked. This allows the shape inference pass to propagate shape
information.
2025-06-23 09:38:39 +01:00
Momchil Velikov
4af96a9d83 [MLIR] Determine contiguousness of memrefs with dynamic dimensions (#142421)
This patch enhances `MemRefType::areTrailingDimsContiguous` to also
handle memrefs with dynamic dimensions.

The implementation itself is based on a new member function
`MemRefType::getMaxCollapsableTrailingDims` that returns the maximum
number of trailing dimensions that can be collapsed: trivially all
dimensions for memrefs with identity layout, or determined by examining
the memref strides, stopping at a discontiguous or statically unknown
stride.
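
For illustration, assumed examples of what the enhanced check concludes
for memrefs with a dynamic leading dimension:

```mlir
// memref<?x4x8xf32, strided<[32, 8, 1]>> -> 3 trailing contiguous dims
// memref<?x4x8xf32, strided<[64, 8, 1]>> -> 2 trailing contiguous dims
// memref<?x4x8xf32, strided<[?, 8, 1]>>  -> 2 trailing contiguous dims
```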
2025-06-23 09:28:33 +01:00
Luke Hutton
ddfc7cb61f [mlir][tosa] Check negative output size in resize shape inference (#143382)
This commit adds a check to ensure that the output height and width
calculated during shape inference are non-negative; an error is emitted
if they are negative.

Fixes: #142402
2025-06-23 15:17:17 +08:00
Fabian Mora
c7165587e4 [mlir][affine|ValueBounds] Add transform to simplify affine min max ops with ValueBoundsOpInterface (#145068)
This commit makes the following changes:

- Expose `map` and `mapOperands` in
`ValueBoundsConstraintSet::Variable` so that subclasses of
`ValueBoundsConstraintSet` can access those members.

- Add `ValueBoundsConstraintSet::strongCompare`. This method is similar
to `ValueBoundsConstraintSet::compare` except that it returns false when
the inverse comparison holds, and `llvm::failure()` if neither the
relation nor its inverse relation could be proven.

- Add `simplifyAffineMinOp`, `simplifyAffineMaxOp`, and
`simplifyAffineMinMaxOps` to simplify those operations using
`ValueBoundsConstraintSet`.

- Add the `SimplifyMinMaxAffineOpsOp` transform op that uses
`simplifyAffineMinMaxOps`.

- Add the `test.value_with_bounds` op to test unknown values with a min
max range using `ValueBoundsOpInterface`.

- Add tests verifying the transform.

Example:

```mlir
func.func @overlapping_constraints() -> (index, index) {
  %0 = test.value_with_bounds {min = 0 : index, max = 192 : index}
  %1 = test.value_with_bounds {min = 128 : index, max = 384 : index}
  %2 = test.value_with_bounds {min = 256 : index, max = 512 : index}
  %r0 = affine.min affine_map<()[s0, s1, s2] -> (s0, s1, s2)>()[%0, %1, %2]
  %r1 = affine.max affine_map<()[s0, s1, s2] -> (s0, s1, s2)>()[%0, %1, %2]
  return %r0, %r1 : index, index
}
// Result of applying `simplifyAffineMinMaxOps` to `func.func`
#map1 = affine_map<()[s0, s1] -> (s1, s0)>
func.func @overlapping_constraints() -> (index, index) {
  %0 = test.value_with_bounds {max = 192 : index, min = 0 : index}
  %1 = test.value_with_bounds {max = 384 : index, min = 128 : index}
  %2 = test.value_with_bounds {max = 512 : index, min = 256 : index}
  %3 = affine.min #map1()[%0, %1]
  %4 = affine.max #map1()[%1, %2]
  return %3, %4 : index, index
}
```

---------

Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
2025-06-23 06:05:20 +02:00