Commit Graph

184 Commits

Author SHA1 Message Date
Sayan Saha
26722f5b61 [MLIR] Fix incorrect memref::DimOp canonicalization, add tensor::DimOp canonicalization (#84225)
The current canonicalization of `memref.dim` operating on the result of
`memref.reshape` into `memref.load` is incorrect as it doesn't check
whether the `index` operand of `memref.dim` dominates the source
`memref.reshape` op. It always introduces `memref.load` right after
`memref.reshape` to ensure the `memref` is not mutated before the
`memref.load` call. As a result, the following error is observed:

```
$> mlir-opt --canonicalize input.mlir

func.func @reshape_dim(%arg0: memref<*xf32>, %arg1: memref<?xindex>, %arg2: index) -> index {
    %c4 = arith.constant 4 : index
    %reshape = memref.reshape %arg0(%arg1) : (memref<*xf32>, memref<?xindex>) -> memref<*xf32>
    %0 = arith.muli %arg2, %c4 : index
    %dim = memref.dim %reshape, %0 : memref<*xf32>
    return %dim : index
  }
```

results in:

```
dominator.mlir:22:12: error: operand #1 does not dominate this use
    %dim = memref.dim %reshape, %0 : memref<*xf32>
           ^
dominator.mlir:22:12: note: see current operation: %1 = "memref.load"(%arg1, %2) <{nontemporal = false}> : (memref<?xindex>, index) -> index
dominator.mlir:21:10: note: operand defined here (op in the same block)
    %0 = arith.muli %arg2, %c4 : index
```

Properly fixing this issue requires a dominator analysis which is
expensive to run within a canonicalization pattern. So, this patch fixes
the canonicalization pattern by being more strict/conservative about the
legality condition in which we perform this canonicalization.
The more general pattern is also added to `tensor.dim`. Since tensors are
immutable we don't need to worry about where to introduce the
`tensor.extract` call after canonicalization.
2024-03-11 19:37:33 -07:00
James Newling
67ef4ae2c3 [MLIR][Tensor,MemRef] Fold expand_shape and collapse_shape if identity (#80658)
Before: op verifiers failed if the input and output ranks were the same
(i.e. no expansion or collapse). This behavior requires users of these
shape ops to verify manually that they are not creating identity
versions of these ops every time they build them -- problematic. This PR
removes this strict verification, and introduces folders for the the
identity cases.

The PR also removes the special case handling of rank-0 tensors for
expand_shape and collapse_shape, there doesn't seem to be any reason to
treat them differently.
2024-03-12 10:11:58 +09:00
Matthias Springer
9efdccb26f [mlir][memref] memref.subview: Verify result strides with rank reductions (#80158)
This is a follow-up on #79865. Result strides are now also verified if
the `memref.subview` op has rank reductions.
2024-02-02 10:17:55 +01:00
Matthias Springer
ce7cc723b9 [mlir][memref] memref.subview: Verify result strides
The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases.

Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit.

Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
* Fix comment and add additional assert to `ExpandStridedMetadata.cpp`, to make sure that the memref.subview verifier is in sync with the memref.subview -> memref.reinterpret_cast lowering.

Note: This change is identical to #79865, but with a fixed comment and an additional assert in `ExpandStridedMetadata.cpp`. (I reverted #79865 in #80116, but the implementation was actually correct, just the comment in `ExpandStridedMetadata.cpp` was confusing.)
2024-01-31 09:28:53 +00:00
Matthias Springer
96c907dbce Revert "[mlir][memref] memref.subview: Verify result strides" (#80116)
Reverts llvm/llvm-project#79865

I think there is a bug in the stride computation in
`SubViewOp::inferResultType`. (Was already there before this change.)

Reverting this commit for now and updating the original pull request
with a fix and more test cases.
2024-01-31 09:35:13 +01:00
Matthias Springer
db49319264 [mlir][memref] memref.subview: Verify result strides (#79865)
The `memref.subview` verifier currently checks result shape, element
type, memory space and offset of the result type. However, the strides
of the result type are currently not verified. This commit adds
verification of result strides for non-rank reducing ops and fixes
invalid IR in test cases.

Verification of result strides for ops with rank reductions is more
complex (and there could be multiple possible result types). That is
left for a separate commit.

Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped
dimensions, there must be something wrong with the op. Return
`FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the
op has rank reductions or not. Inline the implementation into the
verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
2024-01-31 09:14:48 +01:00
Benjamin Maxwell
fdf73e9495 [mlir][memref] Remove incorrect memref.transpose fold (#79809)
This folded casts into `memref.transpose` without updating the result
type of the transpose op, which resulted in IR that failed to verify for
statically sized memrefs.

i.e.

```mlir
%cast = memref.cast %0 : memref<?x4xf32> to memref<?x?xf32>
%transpose = memref.transpose %cast : memref<?x?xf32> to memref<?x?xf32>
```

would fold to:

```mlir
// Fails verification:
%transpose = memref.transpose %cast : memref<?x4xf32> to memref<?x?xf32>
```
2024-01-30 15:58:22 +00:00
Oleksandr "Alex" Zinenko
2798b72ae7 [mlir] introduce debug transform dialect extension (#77595)
Introduce a new extension for simple print-debugging of the transform
dialect scripts. The initial version of this extension consists of two
ops that are printing the payload objects associated with transform
dialect values. Similar ops were already available in the test extenion
and several downstream projects, and were extensively used for testing.
2024-01-12 13:24:02 +01:00
Felix Schneider
4619e21c72 [mlir][memref] Transpose: allow affine map layouts in result, extend folder (#76294)
Currently, the `memref.transpose` verifier forces the result type of the
Op to have an explicit `StridedLayoutAttr` via the method
`inferTransposeResultType`. This means that the example Op
given in the documentation is actually invalid because it uses an `AffineMap`
to specify the layout.
It also means that we can't "un-transpose" a transposed memref back to
the implicit layout form, because the verifier will always enforce the
explicit strided layout.

This patch makes the following changes:

1. The verifier checks whether the canonicalized strided layout of the
result Type is identitcal to the canonicalized infered result type
layout. This way, it's only important that the two Types have the same
strided layout, not necessarily the same representation of it.
2. The folder is extended to support folding away the trivial case of
identity permutation and to fold one transposition into another by
composing the permutation maps.
2024-01-11 19:54:49 +01:00
Rik Huijzer
672f1a036a [mlir][memref] Make LoadOp::verify error more clear (#75831)
While debugging https://github.com/llvm/llvm-project/issues/71326, the
`LoadOp::verify` code and error were very confusing. This PR improves
that.

This code was a part from the reverted PR
https://github.com/llvm/llvm-project/pull/75519. Fixing the
`-convert-vector-to-scf` issue is going to take a bit longer and this
code was out of scope anyway.

Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
2023-12-18 18:41:05 +01:00
Rik Huijzer
9f5afc3de9 Revert "[mlir][vector] Fix invalid LoadOp indices being created (#75519)"
This reverts commit 3a1ae2f46d.
2023-12-17 12:34:17 +01:00
Rik Huijzer
3a1ae2f46d [mlir][vector] Fix invalid LoadOp indices being created (#75519)
Fixes https://github.com/llvm/llvm-project/issues/71326.

The cause of the issue was that a new `LoadOp` was created which looked
something like:
```mlir
%arg4 = 
func.func main(%arg1 : index, %arg2 : index) {
  %alloca_0 = memref.alloca() : memref<vector<1x32xi1>>
  %1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>>
  %2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>>
  return
}
```
which crashed inside the `LoadOp::verify`. Note here that `%alloca_0` is
0 dimensional, `%1` has one dimension, but `memref.load` tries to index
`%1` with two indices.

This is now fixed by using the fact that `unpackOneDim` always unpacks
one dim


1bce61e6b0/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (L897-L903)

and so the `loadOp` should just index only one dimension.

---------

Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
2023-12-17 11:42:35 +01:00
Guray Ozen
c65d8c7187 [mlir][memref] extract_strided_metadata for zero-sized memref (#74835) 2023-12-08 15:55:14 +01:00
Rik Huijzer
68f0bc6f2e [mlir] Fix a zero stride canonicalizer crash (#74200)
This PR fixes https://github.com/llvm/llvm-project/issues/73383 and is
another shot at the refactoring proposed in
https://github.com/llvm/llvm-project/pull/72885.

---------

Co-authored-by: Kai Sasaki <lewuathe@gmail.com>
2023-12-06 07:35:18 +01:00
Max191
3a6f02a658 [mlir] Add subbyte emulation support for memref.store. (#73174)
This adds a conversion for narrow type emulation of memref.store ops.
The conversion replaces the memref.store with two memref.atomic_rmw ops.
Atomics are used to prevent race conditions on same-byte accesses, in
the event that two threads are storing into the same byte.

Fixes https://github.com/openxla/iree/issues/15370
2023-11-28 11:51:30 -08:00
Max191
b823f8469b [mlir] Add support for memref.alloca sub-byte emulation (#73138)
Adds a similar case to `memref.alloc` for `memref.alloca` in
EmulateNarrowTypes.

Fixes https://github.com/openxla/iree/issues/15515
2023-11-27 16:28:22 -08:00
Max191
b29332a318 [mlir] Add narrow type emulation for memref.reinterpret_cast (#73144) 2023-11-27 10:41:14 -08:00
Rik Huijzer
1949fe90bf [mlir] Verify non-negative offset and size (#72059)
In #71153, the `memref.subview` canonicalizer crashes due to a negative
`size` being passed as an operand. During `SubViewOp::verify` this
negative `size` is not yet detectable since it is dynamic and only
available after constant folding, which happens during the
canonicalization passes. As discussed in
<https://discourse.llvm.org/t/rfc-more-opfoldresult-and-mixed-indices-in-ops-that-deal-with-shaped-values/72510>,
the verifier should not be extended as it should "only verify local
aspects of an operation".

This patch fixes #71153 by not folding in aforementioned situation.

Also, this patch adds a basic offset and size check in the
`OffsetSizeAndStrideOpInterface` verifier.

Note: only `offset` and `size` are checked because `stride` is allowed
to be negative
(54d81e49e3).
2023-11-16 07:42:37 +01:00
Max191
dae3c44ce6 [mlir] Add vector.store/maskedstore of memref.subview memref alias folding (#72184)
Fixes https://github.com/openxla/iree/issues/15575
2023-11-14 14:24:54 -08:00
Quinn Dawkins
48f980c535 [mlir][memref] Add memref alias folding for masked transfers (#71476)
The contents of a mask on a masked transfer are unaffected by the
particular region of memory being read/stored to, so just forward the
mask in subview folding patterns.
2023-11-07 08:56:54 -05:00
tyb0807
5aa2c65abd [mlir][MemRef] Add subview folding pattern for vector.maskedload (#71380)
This is required for fixing https://github.com/openxla/iree/issues/15031
2023-11-06 20:08:30 +01:00
Théo Degioanni
b142501e92 [mlir][memref] Fix segfault in SROA (#71063)
Fixes #70902.

The out of bounds check in the SROA implementation for MemRef was not
actually testing anything because it only operated on a store op which
does not trigger the logic by itself. It is now checked for real and the
underlying bug is fixed.

I checked the LLVM implementation just in case but this should not
happen as out-of-bound checks happen in GEP's verifier there.
2023-11-06 13:53:16 +01:00
Matthias Springer
437c62178c [mlir][memref] Remove redundant memref.tensor_store op (#71010)
`bufferization.materialize_in_destination` should be used instead. Both
ops bufferize to a memcpy. This change also conceptually cleans up the
memref dialect a bit: the memref dialect no longer contains ops that
operate on tensor values.
2023-11-05 12:47:18 +09:00
Matthias Springer
6086c272a3 [mlir][memref] Fix out-of-bounds crash when reifying result dims (#70774)
Do not crash when the input IR is invalid, i.e., when the index of the
dimension operand of a `tensor.dim`/`memref.dim` is out-of-bounds. This
fixes #70180.
2023-10-31 17:26:56 +09:00
Oleksandr "Alex" Zinenko
e4384149b5 [mlir] use transform-interpreter in test passes (#70040)
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter-pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR in to a module when necessary.

Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.

Only the test in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of testing coverage of the transform interpreter logic.
2023-10-24 16:12:34 +02:00
Felix Schneider
f32b3e1caa [mlir][memref] Fix index delinearization for CollapseShapeOp folding (#68833)
The `resolveSourceIndicesCollapseShape` method is used to compute
indices into the source `MemRef` of a `CollapseShapeOp` from the
collapsed indices. This method didn't check for dynamic sizes of the
source shape which led to a crash.

Fix https://github.com/llvm/llvm-project/issues/68483
2023-10-12 07:12:43 +02:00
Kunwar Grover
8f397e04e5 [mlir][memref] Fix emulate narrow types for strided memref offset (#68181)
This patch fixes strided memref offset calculation for emulating narrow
types.

As a side effect, this patch also adds support for a 1-D subviews with
static sizes, static offsets and strides of 1 for testing. Emulate
narrow types pass was not tested for strided memrefs before this patch.
2023-10-06 04:52:33 +05:30
qcolombet
932dc9d8c4 [mlir][MemRef] Add a pattern to simplify `extract_strided_metadata(ca… (#68291)
…st)`

`expand-strided-metadata` was missing a pattern to get rid of
`memref.cast`.
The pattern is straight foward:
Produce a new `extract_strided_metadata` with the source of the cast and
fold the static information (sizes, strides, offset) along the way.
2023-10-05 14:32:42 +02:00
Ingo Müller
991cb14715 [mlir][memref][transform] Add new alloca_to_global op. (#66511)
This PR adds a new transform op that replaces `memref.alloca`s with
`memref.get_global`s to newly inserted `memref.global`s. This is useful,
for example, for allocations that should reside in the shared memory of
a GPU, which have to be declared as globals.
2023-09-21 18:17:00 +02:00
Daniil Dudkin
01e80a0f41 [mlir] Add maxnumf and minnumf to AtomicRMWKind (#66442)
This commit adds the mentioned kinds of `AtomicRMWKind`
as well as code generation for them.
2023-09-15 22:41:51 +03:00
Daniil Dudkin
6f4a528698 [mlir][memref] Use dedicated ops in AtomicRMWOpConverter (#66437)
This patch refactors the `AtomicRMWOpConverter` class to use
the dedicated operations from Arith dialect instead of using
`cmpf` + `select` pattern.
Also, a test for `minimumf` kind of `atomic_rmw` has been added.
2023-09-15 00:52:35 +03:00
Daniil Dudkin
c46a04339a [mlir][arith] Rename AtomicRMWKind's maxfmaximumf, minfminimumf (#66135)
This patch is part of a larger initiative aimed at fixing floating-point
`max` and `min` operations in MLIR:
https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.

This commit renames `maxf` and `minf` enumerators of `AtomicRMWKind`
to better reflect the current naming scheme and the goals of the RFC.
2023-09-14 01:09:37 +03:00
Oleksandr "Alex" Zinenko
e55e36de7a [mlir] alloc-to-alloca conversion for memref (#65335)
Introduce a simple conversion of a memref.alloc/dealloc pair into an
alloca in the same scope. Expose it as a transform op and a pattern.

Allocas typically lower to stack allocations as opposed to alloc/dealloc
that lower to significantly more expensive malloc/free calls. In
addition, this can be combined with allocation hoisting from loops to
further improve performance.
2023-09-05 17:58:22 +02:00
Martin Erhart
8037deb7af [mlir][memref] Add pass to expand realloc operations, simplify lowering to LLVM
There are two motivations for this change:
1. It considerably simplifies adding support for the realloc operation to the
   new buffer deallocation pass by lowering the realloc such that no
   deallocation operation is inserted and the deallocation pass itself can
   insert that dealloc
2. The lowering is expressed on a higher level and thus easier to understand,
   and the lowerings of the memref operations it is composed of don't have to
   be duplicated in the MemRefToLLVM lowering (also see discussion in
   https://reviews.llvm.org/D133424)

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D159430
2023-09-05 08:58:40 +00:00
Hanhan Wang
c5dee18b63 [mlir][memref] Add support for erasing dead allocations.
Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D159135
2023-09-01 13:30:26 -07:00
Andrey Turetskiy
01f4390a51 [MLIR] Fold memref.reinterpret_cast(x) -> x when the type is fully static and
does not change.

Differential Revision: https://reviews.llvm.org/D149296
2023-08-30 20:50:18 -07:00
Matthias Springer
e3373c6c83 [mlir][memref] Fix crash in SubViewReturnTypeCanonicalizer
`SubViewReturnTypeCanonicalizer` is used by `OpWithOffsetSizesAndStridesConstantArgumentFolder`, which folds constant SSA value (dynamic) sizes into static sizes. The previous implementation crashed when a dynamic size was folded into a static `1` dimension, which was then mistaken as a rank reduction.

Differential Revision: https://reviews.llvm.org/D158721
2023-08-25 16:01:49 +02:00
Mahesh Ravishankar
0f8bab8d59 [mlir] Revamp implementation of sub-byte load/store emulation.
When handling sub-byte emulation, the sizes of the converted `memref`s
also need to be updated (this was not done in the current
implementation). This adds the additional complexity of having to
linearize the `memref`s as well. Consider a `memref<3x3xi4>` where the
`i4` elements are packed. This has a overall size of 5 bytes (rounded
up to number of bytes). This can only be represented by a
`memref<5xi8>`. A `memref<3x2xi8>` would imply an implicit padding of
4 bits at the end of each row. So incorporate linearization into the
sub-byte load-store emulation.

This patch also updates some of the utility functions to make better
use of statically available information using `OpFoldResult` and
`makeComposedFoldedAffineApplyOps`.

Reviewed By: hanchung, yzhang93

Differential Revision: https://reviews.llvm.org/D158125
2023-08-17 20:27:53 +00:00
Hanhan Wang
f6897c37a2 [mlir][MemRef] Bail out for unsupported cases in FoldMemRefAliasOps pass
The pass uses `computeSuffixProduct` method which only allows static
shapes. This revision adds an early-exit for dynamic cases to avoid
crash.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D157668
2023-08-11 14:52:53 -07:00
Matthias Springer
0bb4d4d32f [mlir][transform] Add ApplyToLLVMConversionPatternsOp
This op populates conversion patterns by querying the
ConvertToLLVMPatternInterface. Only dialects that support this interface
are supported.

Differential Revision: https://reviews.llvm.org/D157487
2023-08-09 13:44:47 +02:00
Uday Bondhugula
b36de52c98 NFC. Move remaining affine/memref test cases into respective dialect dirs
Move a bunch of lingering test cases from test/Transforms/ into
test/Dialect/Affine and MemRef.

Differential Revision: https://reviews.llvm.org/D155855
2023-07-21 22:36:01 +05:30
Hanhan Wang
8fc433f055 [mlir][MemRef] Move narrow type emulation common methods to MemRefUtils.
It also unifies the computation of StridedLayoutAttr. If the stride is
static known value, we can just use it.

Differential Revision: https://reviews.llvm.org/D155017
2023-07-13 14:43:21 -07:00
yzhang93
5a1cdcbd86 [mlir] Narrow bitwidth emulation for MemRef load
This patch adds support for narrow bitwidth storage emulation. The goal is to support sub-byte type
codegen for LLVM CPU. Specifically, a type converter is added to convert memref of narrow bitwidth
(e.g., i4) into supported wider bitwidth (e.g., i8). Another focus of this patch is to populate the
pattern for int4 memref.load. memref.store pattern should be added in a seperate patch.

Reviewed By: hanchung, mravishankar

Differential Revision: https://reviews.llvm.org/D151519
2023-06-26 14:18:30 -07:00
Roger Ferrer Ibanez
666a56f1f1 [MLIR][Tests] Update tests so they require assertions
These tests check statistics results which require assertions enabled.

Differential Revision: https://reviews.llvm.org/D152780
2023-06-13 16:00:44 +00:00
Matthias Springer
cc7f52432b [mlir][transform] Use separate ops instead of PatternRegistry
* Remove `transform::PatternRegistry`.
* Add a new op for each currently registered pattern set.
* Change names of vector dialect pattern selector ops, so that they are consistent with the remaining code base.
* Remove redundant `transform.vector.extract_address_computations` op.

Differential Revision: https://reviews.llvm.org/D152249
2023-06-06 11:53:03 +02:00
Guray Ozen
5ec360c589 [mlir] Enable folding memref alias forvector.load
This work enables  folding memref alias pass for`vector.load`

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D151447
2023-05-25 17:07:20 +02:00
Guray Ozen
46c32afbc5 [mlir] Enable folding memref alias for ldmatrix
Folding mechanism does not recognize `ldmatrix` op. This work helps pass to recognize the op and fold the memref aliases.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D151412
2023-05-25 13:10:17 +02:00
Théo Degioanni
0bf120a820 [mlir] [sroa] Add support for MemRef.
This patch implements SROA interfaces for MemRef, up to a given fixed
size.

Reviewed By: gysit, Dinistro

Differential Revision: https://reviews.llvm.org/D151102
2023-05-24 07:33:28 +00:00
Alex Zinenko
2f3ac28cb2 [mlir] don't hardcode PDL_Operation in Transform dialect extensions
Update operations in Transform dialect extensions defined in the Affine,
GPU, MemRef and Tensor dialects to use the more generic
`TransformHandleTypeInterface` type constraint instead of hardcoding
`PDL_Operation`. See
https://discourse.llvm.org/t/rfc-type-system-for-the-transform-dialect/65702
for motivation.

Remove the dependency on PDLDialect from these extensions.

Update tests to use `!transform.any_op` instead of `!pdl.operation`.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D150781
2023-05-17 15:10:12 +00:00
Théo Degioanni
ead8e9d795 [mlir] [mem2reg] Adapt to be pattern-friendly.
This revision modifies the mem2reg interfaces and algorithm to be more
omfortable to use as a pattern. The motivation behind this is that
currently the pattern needs to be applied to the scope op of the region
in which allocators should be promoted. However, a more natural way to
apply the pattern would be to apply it on the allocator directly. This
is not only clearer but easier to parallelize.

This revision changes the mem2reg pattern to operate this way. This
required restraining the interfaces to only mutate IR using
RewriterBase, as the previously used escape hatch is not granular enough
to match on the region that is modified only. This has the unfortunate
cost of preventing batching allocator promotion and making the block
argument adding logic more complex. Because batching no longer made any
sense, I made the internal analyzer/promoter decoupling private again.

This also adds statistics to the mem2reg infrastructure.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D150432
2023-05-16 08:35:13 +00:00