23341 Commits

Author SHA1 Message Date
Krzysztof Parzyszek
57500cd6a0 [utils][TableGen] Clarify usage of ClauseVal, rename to EnumVal (#141761)
The class "ClauseVal" actually represents a definition of an enumeration
value, and in itself it is not bound to any clause. Rename it to EnumVal
and add a comment clarifying how it's translated into an actual enum
definition in the generated source code.

There is no change in functionality.
2025-06-04 08:16:21 -05:00
Igor Wodiany
3ce3281989 [mlir][spirv] Check output of getConstantInt (#140568)
This patch adds an assert to check if the result of `getConstantInt` is
non-null. Previously the code failed with Segmentation Fault if
`getConstantInt` failed to look up the value. This primarily occurs when
the value is defined as OpSpecConstant rather than OpConstant.
2025-06-04 13:15:28 +01:00
Vadim Curcă
5a531b1158 [mlir] NFC: Add data flow analysis extension points (#142549)
This commit introduces `visitCallOperation` and `visitCallableOperation`
extension points in the sparse data flow analysis framework. This
makes it possible, for example, to make the analysis less conservative
without much code duplication, propagating information even when not all
call or return sites are known.
2025-06-04 14:15:05 +02:00
Srinivasa Ravi
4e4273c940 [MLIR][NVVM] Add dot.accumulate.2way Op (#140518)
This change adds the `dot.accumulate.2way` Op to the NVVM dialect for
16-bit to 8-bit dot-product accumulate operation.

PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-dp2a
2025-06-04 13:29:46 +05:30
Aviad Cohen
4c6449044a [mlir]: Added properties/attributes ignore flags to OperationEquivalence (#142623)
These flags are useful for cases and operations that we may consider equivalent even when their attributes/properties are not the same.
2025-06-04 10:01:20 +03:00
Ian Wood
f5a2f00da9 Revert "[mlir][tensor] Loosen restrictions on folding dynamic reshapes" (#142639)
Reverts llvm/llvm-project#137963

---------

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-06-03 14:10:41 -07:00
Kazu Hirata
95ce58bc4a [mlir] Fix a warning
This patch fixes:

  mlir/lib/Dialect/Tensor/IR/TensorOps.cpp:1680:37: error: comparison
  of integers of different signs: 'int' and 'uint64_t' (aka 'unsigned
  long') [-Werror,-Wsign-compare]
2025-06-03 10:11:51 -07:00
Tai Ly
04b63ac1ab [tosa] Change VariableOp to align with spec (#142240)
This fixes Tosa VariableOp to align with spec 1.0
  - add var_shape attribute to store shape of variable type
  - change type attribute to store element type of variable type
  - add a builder so previous construction calls still work
  - fix up level check of rank to be on variable type instead of initial
    value, which is optional
  - add level check of size for variable type
  - add lit tests for variable op's without initial values
  - add lit test for variable op with fixed rank but unknown dimension
  - add invalid lit test for variable op with unranked type

Signed-off-by: Tai Ly <tai.ly@arm.com>
2025-06-03 17:41:33 +01:00
asraa
34d8275e4f [mlir][tensor] add tensor insert/extract op folders (#142458)
Adds a few canonicalizers, folders, and rewrite patterns to tensor ops:

* tensor.insert folder: insert into a constant is replaced with a new
constant
* tensor.extract folder: extract from a parent tensor that was inserted
at the same indices is folded into the inserted value
* rewrite pattern added that replaces an extract of a collapse shape
with an extract of the source tensor (requires static source dimensions)

Signed-off-by: Asra Ali <asraa@google.com>
2025-06-03 09:16:03 -07:00
Artem Gindinson
cb4a407e5c [mlir][tensor] Loosen restrictions on folding dynamic reshapes (#137963)
The main idea behind the change is to allow expand-of-collapse folds for
reshapes like `?x?xk` -> `?` (k>1). The rationale here is that the
expand op must have a coherent index/affine expression specified in its
`output_shape` argument (see example below), and if it doesn't, the IR
has already been invalidated at an earlier stage:
```
%c32 = arith.constant 32 : index
%div = arith.divsi %<some_index>, %c32 : index
%collapsed = tensor.collapse_shape %41#1 [[0], [1, 2], [3, 4]]
	         : tensor<9x?x32x?x32xf32> into tensor<9x?x?xf32>
%affine = affine.apply affine_map<()[s0] -> (s0 * 32)> ()[%div]
%expanded = tensor.expand_shape %collapsed [[0], [1, 2], [3]] output_shape [9, %div, 32, %affine]
		: tensor<9x?x?xf32> into tensor<9x?x32x?xf32>
```

On the above assumption, adjust the routine in
`getReassociationIndicesForCollapse()` to allow dynamic reshapes beyond
just `?x..?x1x1x..x1` -> `?`. Dynamic subshapes introduce two kinds of
issues:
1. n>2 consecutive dynamic dimensions in the source shape cannot be
collapsed together into 1<k<n neighboring dynamic dimensions in the
target shape, since there'd be more than one suitable reassociation
(example: `?x?x10x? into ?x?`)
2. When figuring out static subshape reassociations based on products,
there are cases where a static dimension is collapsed with a dynamic
one, and should therefore be skipped when comparing products of source &
target dimensions (e.g. `?x2x3x4 into ?x12`)

To address 1, we should detect such sequences in the target shape before
assigning multiple dynamic dimensions into the same index set. For 2, we
take note that a static target dimension was preceded by a dynamic one
and allow an "offset" subshape of source static dimensions, as long as
there's an exact sequence for the target size later in the source shape.
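
The two issues above can be checked by brute force. The following is a minimal Python sketch, not the logic used in `getReassociationIndicesForCollapse()`: it enumerates every way to split the source dimensions into contiguous groups and keeps the groupings that are consistent with the target shape. `DYN`, `group_matches`, and `candidate_reassociations` are hypothetical names, and the matching rule (a group is dynamic exactly when it contains a dynamic dimension) is an assumption of this sketch.

```python
from itertools import combinations
from math import prod

DYN = None  # stands for a dynamic dimension `?` (assumption of this sketch)


def group_matches(group, target_dim):
    """A group collapses to a dynamic dim iff it contains a dynamic dim."""
    if any(d is DYN for d in group):
        return target_dim is DYN
    return target_dim is not DYN and prod(group) == target_dim


def candidate_reassociations(source, target):
    """Return every contiguous grouping of `source` that collapses to `target`."""
    n, k = len(source), len(target)
    results = []
    # Choose k-1 cut points between the n source dimensions.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        groups = [list(range(bounds[i], bounds[i + 1])) for i in range(k)]
        if all(group_matches([source[j] for j in g], target[i])
               for i, g in enumerate(groups)):
            results.append(groups)
    return results


# Issue 1: `?x?x10x? into ?x?` admits several groupings, so it is ambiguous.
ambiguous = candidate_reassociations([DYN, DYN, 10, DYN], [DYN, DYN])
# Issue 2: `?x2x3x4 into ?x12` has one grouping, with 2 folded into the `?`.
unique = candidate_reassociations([DYN, 2, 3, 4], [DYN, 12])
print(len(ambiguous), unique)
```

In the second case the only valid grouping puts the static `2` together with the dynamic dimension, which is the "offset" subshape described above.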

This PR aims to address all reshapes that can be determined based purely
on shapes (and original reassociation maps, as done in
`ComposeExpandOfCollapseOp::findCollapsingReassociation`). It doesn't
seem possible to fold all qualifying dynamic shape patterns in a
deterministic way without looking into affine expressions
simultaneously. That would be difficult to maintain in a single general
utility, so a path forward would be to provide dialect-specific
implementations for Linalg/Tensor.

Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
Co-authored-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-06-03 09:09:01 -07:00
Igor Wodiany
7797824297 [mlir][spirv] Allow disabling control flow structurization (#140561)
Currently some control flow patterns cannot be structurized into
existing SPIR-V MLIR constructs, e.g., conditional early exits (break).
Since the support for early exit cannot be currently added
(https://github.com/llvm/llvm-project/pull/138688#pullrequestreview-2830791677)
this patch enables structurizer to be disabled to keep
the control flow unstructurized. By default, the control flow is
structurized.
2025-06-03 15:41:39 +01:00
Md Abdullah Shahneous Bari
dc297cbc9a [mlir][memref][spirv] Add conversion for memref.extract_aligned_pointer_as_index to SPIR-V (#86750)
Converts memref.extract_aligned_pointer_as_index to spirv.ConvertPtrToU.
Index conversion is done based on 'use-64bit-index' option.
2025-06-03 09:39:14 -05:00
Momchil Velikov
878badc44d [MLIR][AArch64] Add an extra test for Neon I8MM (NFC) (#135777) 2025-06-03 12:12:57 +01:00
Michele Scuttari
9289604cf6 [MLIR] Use cached symbol tables in getFuncOpsOrderedByCalls (#141967)
Address TODO regarding the recomputation of symbol tables. The signature of the `getFuncOpsOrderedByCalls` function is modified to receive the collection of cached symbol tables.
2025-06-03 11:29:02 +02:00
Momchil Velikov
be9334a68e [MLIR] Add apply_patterns.arm_neon.vector_contract_to_i8mm TD Op (#140251)
This patch wraps `populateLowerContractionToSMMLAPatternPatterns` into a
new TD Op `apply_patterns.arm_neon.vector_contract_to_i8mm`.

It also removes the "test-lower-to-arm-neon" pass.
2025-06-03 10:21:13 +01:00
Michele Scuttari
b4ded99a4a [MLIR] Make SymbolTableCollection methods virtual (#141760)
The `LockedSymbolTable` class not only encapsulates a `SymbolTableCollection`, but also extends it. However, the methods of `SymbolTableCollection` are not marked as `virtual`, and therefore methods receiving a `SymbolTableCollection` would always call the base methods even if the object was a subclass. This change marks the base methods as `virtual`.
2025-06-03 10:53:17 +02:00
Han-Chung Wang
58ea53863b [mlir][memref] Add a folder for chained AssumeAlignmentOp ops. (#142425)
The chained ops can be folded away when they have the same alignment.

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-06-02 21:09:42 -07:00
Chao Chen
9e2684e4cf [MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N] (#142477)
Bring back https://github.com/llvm/llvm-project/pull/140163 with fixes
2025-06-02 21:39:30 -05:00
Chao Chen
b88dfb0b23 Revert "[MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N]" (#142459)
Reverts llvm/llvm-project#140163
2025-06-02 15:47:21 -04:00
Ian Wood
c005df3c7e [mlir][linalg] Fix EraseIdentityLinalgOp on fill-like ops (#130000)
Adds a check to make sure that the linalg op is safe to erase by
ensuring that the `linalg.yield` is yielding one of the linalg op's
block args. This check already exists for linalg ops with pure tensor
semantics.


Closes https://github.com/llvm/llvm-project/issues/129414

---------

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-06-02 12:18:57 -07:00
Chao Chen
0210750d5a [MLIR][XeGPU] Add unroll patterns and blocking pass for XeGPU [2/N] (#140163)
This PR introduces the initial implementation of a blocking pass for
XeGPU programs. The pass leverages unroll patterns from both the XeGPU
and Vector dialects. 

---------

Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
2025-06-02 14:02:45 -05:00
James Newling
543446a353 [mli][vector] canonicalize vector.from_elements from ascending extracts (#139819)
Example:
```mlir
%0 = vector.extract %source[0, 0] : i8 from vector<1x2xi8>
%1 = vector.extract %source[0, 1] : i8 from vector<1x2xi8>
%2 = vector.from_elements %0, %1 : vector<2xi8>
```

becomes
```mlir
%2 = vector.shape_cast %source : vector<1x2xi8> to vector<2xi8>
```

It was decided that we should spill canonicalization tests into new
files (see
[discussion](https://github.com/llvm/llvm-project/pull/135096#pullrequestreview-2760245596)).
In view of this I added the new tests to a new file specifically for
canonicalization of from_elements. To be consistent in the location of
the tests, I moved existing tests `extract_scalar_from_from_element`,
`extract_1d_from_from_elements`, `extract_2d_from_from_elements` and
`from_elements_to_splat` from `canonicalize.mlir` to
`canonicalize/vector-from-elements.mlir`. In addition to moving I changed
the LIT variables to all be upper-case for consistency.
2025-06-02 11:15:25 -07:00
Han-Chung Wang
77e2e3f641 [mlir][memref] Update tests to use memref.assume_alignment properly. (#142358)
With ffb9bbfd07, the memref.assume_alignment op returns a result value.
This revision updates
the tests to reflect the change:

- Update all the lit tests to use the result of memref.assume_alignment,
if it is present.
- Capture the result of the op in lit tests.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-06-02 07:57:36 -07:00
Artem Gindinson
af6e3c045b [mlir][math] Fix intrinsic conversions to LLVM for 0D-vector types (#141020)
`vector<t>` types are not compatible with the LLVM type system – with
the current approach employed within `LLVMTypeConverter`, they must be
explicitly converted into `vector<1xt>` when lowering. Employ this rule
within the conversion patterns for intrinsics that are handled directly
within `MathToLLVM`: `math.ctlz` `.cttz`, `.absi`, `.expm1`, `.log1p`,
`.rsqrt`, `.isnan`, `.isfinite`.

This change does not cover/test patterns that are based off
`VectorConvertToLLVMPattern` template from `LLVMCommon/VectorPattern.h`.

---------

Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
2025-06-02 12:27:44 +01:00
TatWai Chong
adf9fedd47 [mlir][tosa] Add assembly format validation for COND_IF op (#142254)
COND_IF's simplified form - where redundant operand notations are
omitted - is not conformant to the specification. According to the
specification, all operands passed into an operation must be explicitly
declared at each operation's structure. Add an optional check to verify
whether the given form complies with the specification.
2025-06-02 10:47:49 +01:00
Jacques Pienaar
e49738b3ac [mlir][lsp] Enable registering dialects based on URI. (#141331)
Previously, the registered dialects were fixed per LSP binary. This works
as long as the dialects of interest are disjoint across all the projects
for which one uses the LSP. This change adds support for cases where
dialects overlap in name but are used separately per project. The
alternative is to build multiple binaries and switch the LSP used in the
editor per project (which adds some complexity in hosted instances).

This handles a simple (and, I believe, common) case where the dialects can
be determined from the path with a single binary. The cost of doing so
dynamically based on the path is either keeping different registries to
return or repopulating the dialect & extension maps.
2025-06-01 23:55:32 -07:00
Vitaly Buka
60250c15e0 Revert "[mlir]: Added properties/attributes ignore flags to OperationEquivalence" (#142319)
Reverts llvm/llvm-project#141664

See
https://github.com/llvm/llvm-project/pull/141664#issuecomment-2927867604
2025-06-01 13:46:18 -07:00
Vitaly Buka
002c0abcd8 Revert "Fixed wrong check OperationEquivalenceTest.HashWorksWithFlags" (#142318)
Reverts llvm/llvm-project#142210

This is not enough, see #141664
2025-06-01 13:45:32 -07:00
Longsheng Mou
26b81c4300 [mlir][memref] Add terminator check to prevent a crash (#141972)
This PR adds a terminator check to prevent a crash when invoking
`lastNonTerminatorInRegion`. Fixes #137333.
2025-05-31 13:25:42 +08:00
Aviad Cohen
ab77a70a74 Fixed wrong check OperationEquivalenceTest.HashWorksWithFlags (#142210)
The check was meant to check `IgnoreProperties` works as expected but
operated on the wrong operation.

Co-authored-by: Aviad Cohen <aviad.cohen2@mobileye.com>
2025-05-30 18:14:28 -07:00
Krzysztof Drewniak
66a357f2a4 [mlir] Unique property constraints where possible (#140849)
Now that `Property` is a `PropConstraint`, hook it up to the same
constraint-uniquing machinery that other types of constraints use. This
will primarily save on code size for types, like enums, that have
inherent constraints which are shared across many operations.
2025-05-30 16:21:50 -05:00
Aviad Cohen
c5f3018668 [mlir]: Added properties/attributes ignore flags to OperationEquivalence (#141664)
These flags are useful for cases and operations that we may consider
equivalent even when their attributes/properties are not the same.
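
As an illustration of the idea only, here is a small Python sketch of an equivalence check with ignore flags. It is a toy model, not the `OperationEquivalence` API: `ToyOp`, `EquivFlags`, and `is_equivalent` are hypothetical names, and attributes and properties are modeled as plain dictionaries.

```python
from dataclasses import dataclass, field
from enum import Flag


class EquivFlags(Flag):
    # Hypothetical flag names chosen for this sketch.
    NONE = 0
    IGNORE_ATTRS = 1
    IGNORE_PROPERTIES = 2


@dataclass
class ToyOp:
    name: str
    operands: tuple = ()
    attrs: dict = field(default_factory=dict)
    properties: dict = field(default_factory=dict)


def is_equivalent(lhs, rhs, flags=EquivFlags.NONE):
    """Compare two toy ops, skipping whatever the flags say to ignore."""
    if (lhs.name, lhs.operands) != (rhs.name, rhs.operands):
        return False
    if not (flags & EquivFlags.IGNORE_ATTRS) and lhs.attrs != rhs.attrs:
        return False
    if (not (flags & EquivFlags.IGNORE_PROPERTIES)
            and lhs.properties != rhs.properties):
        return False
    return True


a = ToyOp("toy.add", ("%x", "%y"), attrs={"tag": "a"})
b = ToyOp("toy.add", ("%x", "%y"), attrs={"tag": "b"})
strict = is_equivalent(a, b)
relaxed = is_equivalent(a, b, EquivFlags.IGNORE_ATTRS)
```

With the default flags the two ops differ because of the `tag` attribute; with `IGNORE_ATTRS` they compare equal.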
2025-05-30 22:18:07 +03:00
Krzysztof Drewniak
a236dc63bf [mlir][NFC] Make Property a subclass of PropConstraint (#140848)
In preparation for allowing non-attribute properties in the declarative
rewrite pattern system, make `Property` a subclass of `PropConstraint`
in tablegen and add a CK_Prop to the Constraint class for tablegen.

Like `TypeConstraint` but unlike other constraints, a `PropConstraint`
has an additional field - the C++ interface type of the property being
constrained (if it's known).
2025-05-30 12:02:07 -05:00
Cameron McInally
ce9cef79ea [flang] Add support for -mprefer-vector-width=<value> (#142073)
This patch adds support for the -mprefer-vector-width= command line
option. The parsing of this option is equivalent to Clang's and it is
implemented by setting the "prefer-vector-width" function attribute.

Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
2025-05-30 07:50:18 -06:00
Andrzej Warzyński
85f791d9cd [mlir][linalg][nfc] Move vectorization tests (#141656)
Moves all the remaining Linalg vectorization tests from:
  * `mlir/tests/Dialect/Linalg/*`

to:
  * `mlir/tests/Dialect/Linalg/vectorization/*`

To maintain consistency within tests,  `vectorize-convolution.mlir` 
was updated to use:
  *  `transform.structured.vectorize_children_and_apply_patterns` 

instead of:
  * `-test-linalg-transform-patterns=test-linalg-to-vector-patterns`

This change required minor updates to some `CHECK` lines, reflecting
only reordering of ops due to an additional pattern being applied.

Closes #141025
2025-05-30 09:21:19 +01:00
Michael Tyler Maitland
2e82a17f4e [mlir][value] Fix the ASAN error introduced in #142084 2025-05-30 02:21:28 -04:00
Michael Maitland
7454098a9e [mlir][Value] Add getNumUses, hasNUses, and hasNUsesOrMore to Value (#142084)
We already have hasOneUse. Like llvm::Value we provide helper methods to
query the number of uses of a Value. Add unittests for Value, because
that was missing.
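
A rough Python sketch of what such helpers compute, assuming uses are reachable only by walking a list one element at a time. `ToyValue` and its snake_case methods are hypothetical stand-ins, not the MLIR `Value` API. The point is that the bounded queries never need to walk more than `n + 1` uses.

```python
from itertools import islice


class ToyValue:
    def __init__(self, uses):
        self._uses = list(uses)

    def uses(self):
        # Yield uses lazily, as if walking a linked use list.
        yield from self._uses

    def get_num_uses(self):
        # Walks the whole list.
        return sum(1 for _ in self.uses())

    def has_n_uses(self, n):
        # Exactly n uses: look at no more than n + 1 of them.
        return len(list(islice(self.uses(), n + 1))) == n

    def has_n_uses_or_more(self, n):
        # At least n uses: look at no more than n of them.
        return len(list(islice(self.uses(), n))) == n


v = ToyValue(["use0", "use1", "use2"])
summary = (v.get_num_uses(), v.has_n_uses(3), v.has_n_uses_or_more(2))
```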

---------

Co-authored-by: Michael Maitland <michaelmaitland@meta.com>
2025-05-30 00:39:45 -04:00
Han-Chung Wang
587d6fcbb6 [mlir] Recover the behavior of SliceAnaylsis for llvm-project@6a8dde04a07 (#142076)
In 6a8dde04a0, the method was changed to return LogicalFailure so that
callers can handle the failure instead of crashing, if I read the
intention correctly. However, it also changes the behavior of the
implementation and breaks several integration tests in downstream
projects (e.g., IREE).

Before the change, processValue does not treat it as a failure if the
check below TODO has a false condition. However, with the new change, it
starts treating it as a failure.

The revision updates the final `else` branch (i.e., `llvm_unreachable`
line) to return a failure, and return success at the end; the behavior
is recovered.

```cpp
auto processValue = [&](Value value) {
  if (auto *definingOp = value.getDefiningOp()) {
    if (backwardSlice->count(definingOp) == 0)
      getBackwardSliceImpl(definingOp, backwardSlice, options);
  } else if (auto blockArg = dyn_cast<BlockArgument>(value)) {
    if (options.omitBlockArguments)
      return;

    Block *block = blockArg.getOwner();
    Operation *parentOp = block->getParentOp();
    // TODO: determine whether we want to recurse backward into the other
    // blocks of parentOp, which are not technically backward unless they flow
    // into us. For now, just bail.
    if (parentOp && backwardSlice->count(parentOp) == 0) {
      assert(parentOp->getNumRegions() == 1 &&
             llvm::hasSingleElement(parentOp->getRegion(0).getBlocks()));
      getBackwardSliceImpl(parentOp, backwardSlice, options);

    }
  } else {
    llvm_unreachable("No definingOp and not a block argument.");
  }
};
```

As with the previous commit, no additional tests are added. This revision
is mostly a follow-up fix for 6a8dde04a0.

Co-authored-by: Ian Wood <ianwood2024@u.northwestern.edu>

Signed-off-by: hanhanW <hanhan0912@gmail.com>
2025-05-29 19:18:14 -07:00
Longsheng Mou
ba75febd4f [mlir][gpu] Update descriptions format of GPU ops(NFC) (#141395) 2025-05-29 20:07:57 +08:00
Will Froom
ebe25d8428 [MLIR] Add missing move constructor / assignment operator to DialectRegistry (#141915)
Fix after #140963
2025-05-29 10:07:02 +01:00
Luke Hutton
0105f657e2 [mlir][tosa] Fix mul op verifier when input types don't match result (#141617)
This commit fixes a crash when operand types are not integer, but the
result is. While this isn't valid, the verifier should not crash.
2025-05-29 09:27:40 +01:00
Luke Hutton
76051980ea [mlir][tosa] Allow unranked input/output tensors in resize ops (#141608)
This commit allows the input/output of the resize op to be unranked to
account for shapes being computed during shape inference.
2025-05-29 09:27:24 +01:00
Srinivasa Ravi
aca088d802 [MLIR][NVVM] Update dot.accumulate.4way NVVM Op (#141223)
This change refactors and updates the `dot.accumulate.4way` NVVM Op to
be more descriptive and readable.
2025-05-29 10:51:11 +05:30
drazi
25c5235f30 assert with more information to help debug (#132194)
This PR adds a debug message to the assertion to help debug user Python
code. It prints friendlier information:

```
>           assert isinstance(arg, _cext.ir.Value), f"expects Value, got {type(arg)}"
E           AssertionError: expects Value, got <class 'UserDefinedClass'>
```
2025-05-29 00:14:37 -04:00
Muzammil
893ef7ffbd [mlir][GPU] Fixes subgroup reduce lowering (#141825)
Fixes the final reduction steps which were taken from an implementation
of scan, not reduction, causing lanes earlier in the wave to have
incorrect results due to masking.

Now aligning more closely with the Triton implementation:
https://github.com/triton-lang/triton/pull/5019

# Hypothetical example
To provide an explanation of the issue with the current implementation,
let's take the simple example of attempting to perform a sum over 64
lanes where the initial values are as follows (first lane has value 1,
and all other lanes have value 0):
```
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
When performing a sum reduction over these 64 lanes, in the current
implementation we perform 6 dpp instructions which in sequential order
do the following:
1) sum over clusters of 2 contiguous lanes
2) sum over clusters of 4 contiguous lanes
3) sum over clusters of 8 contiguous lanes
4) sum over an entire row
5) broadcast the result of last lane in each row to the next row and
each lane sums current value with incoming value.
6) broadcast the result of the 32nd lane to last two rows and each lane
sums current value with incoming value.

After step 4) the result for the example above looks like this:

```
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

After step 5) the result looks like this:
```
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

After step 6) the result looks like this:
```
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
Note that the correct value here is always 1, yet after the
`dpp.broadcast` ops some lanes have incorrect values. The reason is that
for these incorrect lanes, like lanes 0-15 in step 5, the
`dpp.broadcast` op doesn't provide them incoming values from other
lanes. Instead these lanes are provided either their own values, or 0
(depending on whether `bound_ctrl` is true or false) as values to sum
over, either way these values are stale and these lanes shouldn't be
used in general.

So what this means:
- For a subgroup reduce over 32 lanes (like Step 5), the correct result
is stored in lanes 16 to 31
- For a subgroup reduce over 64 lanes (like Step 6), the correct result
is stored in lanes 32 to 63.

However in the current implementation we do not specifically read the
value from one of the correct lanes when returning a final value. In
some workloads it seems without this specification, the stale value from
the first lane is returned instead.
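
The walkthrough above can be replayed with a small Python toy model. This is a sketch under stated assumptions, not real DPP semantics: rows are taken to be 16 lanes wide, steps 1 to 4 are collapsed into a single per-row sum, and a lane with no incoming value is assumed to add its own value (one of the two possibilities mentioned above). The function names are hypothetical. The model is only meant to reproduce the vectors listed for this example.

```python
ROW = 16    # lanes per row in this toy model
LANES = 64


def reduce_rows(lanes):
    """Steps 1-4: every lane ends up holding the sum of its own row."""
    out = []
    for start in range(0, len(lanes), ROW):
        out.extend([sum(lanes[start:start + ROW])] * ROW)
    return out


def row_broadcast(lanes):
    """Step 5: each row adds the last lane of the previous row.

    Row 0 has no incoming value, so it adds its own value (stale result).
    """
    out = []
    for i, v in enumerate(lanes):
        row = i // ROW
        incoming = lanes[row * ROW - 1] if row > 0 else v
        out.append(v + incoming)
    return out


def half_broadcast(lanes):
    """Step 6: lanes 32-63 add lane 31; lanes 0-31 add their own value."""
    return [v + (lanes[31] if i >= 32 else v) for i, v in enumerate(lanes)]


lanes = [1] + [0] * (LANES - 1)
after4 = reduce_rows(lanes)
after5 = row_broadcast(after4)
after6 = half_broadcast(after5)
print(after6[0], after6[16], after6[32], after6[63])  # prints "4 2 1 1"
```

Reading lane 63 gives the expected sum of 1, while lane 0 holds the stale value 4.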

# Actual failing test
For a specific example of how the current implementation causes issues,
take a look at the IR below which represents an additive reduction over
a dynamic dimension.
```
!matA = tensor<1x?xf16>
!matB = tensor<1xf16>
#map = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1) -> (d0)>
func.func @only_producer_fusion_multiple_result(%arg0: !matA) -> !matB {
  %cst_1 = arith.constant 0.000000e+00 : f16
  %c2_i64 = arith.constant 2 : i64
  %0 = tensor.empty() : !matB
  %2 = linalg.fill ins(%cst_1 : f16) outs(%0 : !matB) -> !matB
  %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "reduction"]} ins(%arg0 : !matA) outs(%2 : !matB)  {
  ^bb0(%in: f16, %out: f16):
    %7 = arith.addf %in, %out : f16
    linalg.yield %7 : f16
  } -> !matB
  return %4 : !matB
}
```
When provided an input of type `tensor<1x2xf16>` and values `{0, 1}` to
perform the reduction over, the value returned is consistently 4. By the
same analysis done above, this shows that the returned value is coming
from one of these stale lanes and needs to be read instead from one of
the lanes storing the correct result.

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
2025-05-28 17:47:22 -05:00
Hsiangkai Wang
8fb09c8d09 [mlir][gpu] Add GPU subgroup MMA extract and insert operations (#139048)
- Introduced `gpu.subgroup_mma_extract` operation to extract values from
`!gpu.mma_matrix` by invocation and indices.
- Introduced `gpu.subgroup_mma_insert` operation to insert values into
`!gpu.mma_matrix` by invocation and indices.
- Updated the conversion patterns to SPIR-V for both extract and insert
operations.
- Added test cases to validate the new operations in the GPU to SPIR-V
conversion.

RFC:
https://discourse.llvm.org/t/rfc-add-gpu-operations-to-permute-data-in-2-loaded-mma-matrix/86148?u=hsiangkai
2025-05-28 20:40:17 +01:00
Sang Ik Lee
3fa65dee14 [mlir] SYCL runtime wrapper: add memcpy support. (#141647) 2025-05-28 11:33:15 -07:00
Bruno Cardoso Lopes
86685b95bf [MLIR][LLVM][DLTI] Handle data layout token 'n32:64' (#141299) 2025-05-28 11:07:03 -07:00
Kareem Ergawy
a8d8af3bfa [OpenMP][OMPIRBuilder] Collect users of a value before replacing them in target outlined function (#139064)
This PR fixes a crash that currently happens given the following input:
```fortran
subroutine caller()
  real :: x
  integer :: i

  !$omp target
    x = i
    call callee(x,x)
  !$omp end target
endsubroutine caller

subroutine callee(x1,x2)
  real :: x1, x2
endsubroutine callee
```

The crash happens because the following sequence of events is taken by
the `OMPIRBuilder`:
1. ....
2. An outlined function for the target region is created. At first the
outlined function still refers to the SSA values from the original
function of the target region.
3. The builder then iterates over the users of SSA values used in the
target region to replace them with the corresponding function arguments
of outlined function.
4. If the same instruction references the SSA value more than once (say
m times), all uses of that SSA value are replaced in the instruction,
deleting all m uses of the value.
5. The next m-1 iterations will still iterate over the same instruction,
dropping the last m-1 actual users of the value.

Hence, we collect all users first before modifying them.
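
The hazard can be reproduced in a few lines of Python. This is a toy model, not `OMPIRBuilder` code: `Value`, `Instr`, and both replace functions are hypothetical, and the use list is a plain Python list with one entry per operand slot. In this model the failure shows up as a skipped user rather than a crash, but the cause is the same: the use list changes while it is being iterated.

```python
class Value:
    def __init__(self, name):
        self.name = name
        self.uses = []  # one entry per operand slot that references this value


class Instr:
    def __init__(self, name, operands):
        self.name = name
        self.operands = list(operands)
        for op in self.operands:
            op.uses.append(self)

    def replace_uses_of_with(self, old, new):
        """Rewrite every operand equal to `old`, updating both use lists."""
        for i, op in enumerate(self.operands):
            if op is old:
                self.operands[i] = new
                old.uses.remove(self)
                new.uses.append(self)


def replace_while_iterating(old, new):
    # Unsafe: `old.uses` shrinks underneath the loop.
    for user in old.uses:
        user.replace_uses_of_with(old, new)


def replace_after_collecting(old, new):
    # Safe: snapshot the distinct users first, then modify them.
    users = list(dict.fromkeys(old.uses))
    for user in users:
        user.replace_uses_of_with(old, new)


def build():
    x, arg = Value("x"), Value("arg")
    Instr("call", [x, x])  # references x twice (m = 2)
    Instr("store", [x])
    Instr("load", [x])
    return x, arg


x, arg = build()
replace_while_iterating(x, arg)
leftover_unsafe = [user.name for user in x.uses]

x, arg = build()
replace_after_collecting(x, arg)
leftover_safe = [user.name for user in x.uses]
```

With m = 2, the unsafe loop leaves m-1 = 1 user (`store`) still referring to the old value, while the collect-first version leaves none.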
2025-05-28 17:40:34 +02:00
Leonid Gorbunov
ff5f8e513c [MLIR][Presburger] removeTrivialRedundancy: skip unnecessary check for duplicate constraints (#138969)
`removeTrivialRedundancy` first marks duplicate rows redundant, then
when multiple rows differ only by a constant term, it removes all but
one of them. Since the latter removes all but one duplicate row as well,
it is unnecessary (redundant!) to mark duplicate rows redundant. So we
remove this step.
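
Why the second step already covers duplicates can be seen in a short Python sketch. It is not the Presburger library code: the row layout (coefficients followed by the constant term, read as `coeffs . x + const >= 0`) and the rule of keeping the row with the smallest constant are assumptions of this sketch, and the function is a hypothetical stand-in.

```python
def remove_trivial_redundancy(rows):
    """Keep one row per coefficient vector: the one with the smallest constant.

    Each row is a tuple (c0, ..., cn-1, const) read as
    c0*x0 + ... + cn-1*xn-1 + const >= 0. Among rows that share the same
    coefficients, the smallest constant gives the tightest inequality and
    implies the others. Exact duplicates share coefficients too, so they are
    removed by the same pass without a separate duplicate check.
    """
    tightest = {}
    for *coeffs, const in rows:
        key = tuple(coeffs)
        if key not in tightest or const < tightest[key]:
            tightest[key] = const
    return [(*key, const) for key, const in tightest.items()]


rows = [
    (1, 0, 5),   # x0 + 5 >= 0
    (1, 0, 5),   # exact duplicate
    (1, 0, 3),   # x0 + 3 >= 0, tighter than the two rows above
    (0, 1, -2),  # x1 - 2 >= 0
]
reduced = remove_trivial_redundancy(rows)
```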
2025-05-28 13:21:00 +01:00