Commit Graph

153 Commits

Author SHA1 Message Date
Matthias Springer
6176d6a93e [mlir][tensor] Support parallel_insert_slice in MergeConsecutiveInsertExtractSlicePatterns.cpp
Differential Revision: https://reviews.llvm.org/D141116
2023-01-06 12:33:45 +01:00
liqinweng
5c18ae3135 [MLIR][Tensor] Canonicalize expand/collapse_shape of splat to splat
Collapsing or expanding a splatted value can be replaced with a single
`tensor.splat` operation on the new shape.
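
For illustration, a minimal before/after sketch of the fold (hypothetical shapes):

```
// Before: a splat followed by a reshape.
%s = tensor.splat %v : tensor<16x4xf32>
%c = tensor.collapse_shape %s [[0, 1]] : tensor<16x4xf32> into tensor<64xf32>

// After canonicalization: a single splat of the result shape.
%c = tensor.splat %v : tensor<64xf32>
```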

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D140552
2023-01-04 13:07:55 -08:00
Matthias Springer
e7790fbed3 [mlir] Add test-convergence option to Canonicalizer tests
This new option is set to `false` by default. It should be set only in Canonicalizer tests to detect faulty canonicalization patterns, i.e., patterns that prevent the canonicalizer from converging. The canonicalizer should always converge on small unit tests such as those in `canonicalize.mlir`.

Two faulty canonicalization patterns were detected and fixed with this change.

Differential Revision: https://reviews.llvm.org/D140873
2023-01-04 12:02:21 +01:00
Hanhan Wang
83396d8549 [mlir][tensor] Implement TilingInterface for unpack op
The main issue when tiling an unpack op is handling incomplete tiles. Since all
the dimensions are orthogonal, discussing the 1-D unpack case is enough. The
core idea is to make the input slice consist of complete tiles. In this case,
a larger unpacked tile is created, and an extract_slice op is then needed
to shift and truncate the output.

Take Nn_to_N as an example. Say that N=32, n=8, and tiling_size=15. The
coordinates of the second tile (i.e., result[15..31]) are [(1, 7), (2, 0),
(2, 1), ..., (3, 6), (3, 7)]. The first row and the last row are
incomplete in terms of the input. It is impossible to represent an unpack op
using these coordinates, because the input has higher rank and the
coordinate computation involves mod and ceilDiv, which is very tricky.

To represent the unpack op, we have to complete the rows, i.e., the
input coordinates start at (1, 0) and end at (3, 7). In this
context, the tiled unpack produces (3 * n) elements because there are
3 rows in total. A subsequent tensor.extract_slice op then yields the
actual result.

If the tiling sizes are multiples of the inner tile sizes, it is a perfect
tiling case, and the larger input and output are not needed.
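
As a concrete sketch of the Nn_to_N layout above (hypothetical IR; with N=32 and n=8 the packed input is 4x8):

```
// Unpacking restores the flat 32-element tensor. A tile of 15 result
// elements spans rows 1..3 of %in, so the tiled code first unpacks the
// three complete rows (3 * n = 24 elements) and then extracts the 15
// elements that are actually needed.
%r = tensor.unpack %in inner_dims_pos = [0] inner_tiles = [8] into %out
    : tensor<4x8xf32> -> tensor<32xf32>
```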

Reviewed By: chelini

Differential Revision: https://reviews.llvm.org/D139362
2022-12-16 13:06:52 -08:00
Hanhan Wang
71b3a0d146 [mlir][tensor] Move tiling tensor.pad op tests from Linalg/ to Tensor/
Reviewed By: nicolasvasilache, springerm

Differential Revision: https://reviews.llvm.org/D139978
2022-12-14 14:47:58 -08:00
Matthias Springer
e5dc99e642 [mlir][tensor][bufferize] Improve bufferization of DimOp/RankOp
The tensor operands do not bufferize to a memory read.

Differential Revision: https://reviews.llvm.org/D140007
2022-12-14 12:47:46 +01:00
Matthias Springer
be630f07de [mlir][bufferize] Implement BufferizableOpInterface for tensor.empty
The op is not bufferizable but should be analyzable (for `EliminateEmptyTensors`, which uses the bufferization infrastructure).

Also improve debugging functionality and error messages.

Also adds a missing pass to the sparse pipeline. (tensor.empty should be replaced with bufferization.alloc_tensor; this sometimes used to work without the replacement, depending on how the tensor.empty is used. Now we always fail explicitly.)
2022-12-12 14:19:38 +01:00
Alexander Belyaev
f6fb0a4f35 [mlir] Make patterns for folding tensor.empty optional.
At the moment, they are part of EmptyOp::getCanonicalizationPatterns. When
extract_slice(tensor.empty) is rewritten as a new tensor.empty, we can end up
with two tensor.empty ops, since the original tensor.empty can have two users.
After bufferization, such cases result in two allocations.
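
A sketch of the rewrite and the pitfall (hypothetical shapes); if `%0` has a second user, the fold leaves two tensor.empty ops behind:

```
%0 = tensor.empty() : tensor<16x16xf32>
%1 = tensor.extract_slice %0[0, 0] [4, 4] [1, 1]
    : tensor<16x16xf32> to tensor<4x4xf32>

// Folded form: the slice becomes its own empty tensor.
%1 = tensor.empty() : tensor<4x4xf32>
```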

Differential Revision: https://reviews.llvm.org/D139308
2022-12-07 23:01:34 +01:00
Emilio Cota
72d76a2403 [mlir][bufferize] lower allocation alignment from 128 to 64 bytes
While it is unlikely to matter in practice, there is no reason
for this value to be larger than it needs to be. 64 bytes is the
size of a cache line on most machines, and a full 512-bit vector
fits in it.

Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D139434
2022-12-07 11:12:46 -05:00
Matthias Springer
9cdf6b641d [mlir][tensor] Support parallel_insert_slice in reassociative reshape folder
Differential Revision: https://reviews.llvm.org/D139540
2022-12-07 16:25:10 +01:00
Hanhan Wang
193cefd1b1 [mlir][tensor] Adapt FoldTensorCastProducerOp pattern on DPS interface.
This revision adapts the pattern in Linalg to work on the DPS interface and
adds it to the canonicalization patterns of the tensor dialect.
InsertSliceOp is skipped in the pattern because it has its own logic
for folding tensor.cast ops.
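
A sketch of the pattern, using linalg.fill as a stand-in for an arbitrary DPS op (hypothetical types):

```
// Before: a cast to a less static type feeds the DPS out operand.
%cast = tensor.cast %a : tensor<16x16xf32> to tensor<?x?xf32>
%fill = linalg.fill ins(%cst : f32) outs(%cast : tensor<?x?xf32>) -> tensor<?x?xf32>

// After: the op runs on the static type; the result is cast back.
%fill = linalg.fill ins(%cst : f32) outs(%a : tensor<16x16xf32>) -> tensor<16x16xf32>
%res = tensor.cast %fill : tensor<16x16xf32> to tensor<?x?xf32>
```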

Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D139375
2022-12-06 12:13:37 -08:00
Hanhan Wang
0d03ba62c5 [mlir][tensor] Implement TilingInterface for tensor.pack op.
We can compute the offsets and sizes for the slice of the input because the
iteration domain is defined over the outer loops. If a dimension is tiled,
the i-th input index is the product of offset_i and inner_tile_i (e.g., with
inner_tile = 8, a tile offset of 2 maps to input index 16).

Unlike tiling a pad op, we do not have to deal with reading zero data from
the input, because the tiling sizes apply to the packed outer dimensions:
for each packed tile we read either the entire tile or a partial tile. The
scf.if and tensor.generate ops are not needed in this context.

Co-authored-by: Lorenzo Chelini <l.chelini@icloud.com>

Reviewed By: rengolin, mravishankar

Differential Revision: https://reviews.llvm.org/D138631
2022-12-05 14:00:10 -08:00
Matthias Springer
1403073790 [mlir][tensor] Fold rank-reducing insert_slice with inverse collapse_shape
Differential Revision: https://reviews.llvm.org/D139221
2022-12-05 09:17:29 +01:00
Matthias Springer
50a2bb95ab [mlir][tensor] Fold rank-reducing extract_slice with inverse expand_shape
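
A minimal sketch of the fold (hypothetical shapes):

```
// A rank-reducing slice followed by its inverse expand_shape ...
%0 = tensor.extract_slice %t[0, 0, 0] [1, 16, 8] [1, 1, 1]
    : tensor<2x16x8xf32> to tensor<16x8xf32>
%1 = tensor.expand_shape %0 [[0, 1], [2]]
    : tensor<16x8xf32> into tensor<1x16x8xf32>

// ... folds to a single non-rank-reducing slice.
%1 = tensor.extract_slice %t[0, 0, 0] [1, 16, 8] [1, 1, 1]
    : tensor<2x16x8xf32> to tensor<1x16x8xf32>
```
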
Differential Revision: https://reviews.llvm.org/D139220
2022-12-05 09:17:24 +01:00
Matthias Springer
f92c7506e3 Revert "[mlir][tensor] Fold rank-reducing extract_slice with inverse expand_shape"
This reverts commit a076f57a1a.
2022-12-02 21:22:20 +01:00
Matthias Springer
c837a94754 Revert "[mlir][tensor] Fold rank-reducing insert_slice with inverse collapse_shape"
This reverts commit 1522a3b7b3.
2022-12-02 21:22:04 +01:00
Matthias Springer
c1fef4e88a [mlir][bufferization] Make TensorCopyInsertionPass a test pass
TensorCopyInsertion should not have been exposed as a pass; this was a flaw in the original design. It is a preparation step for bufferization, and certain transforms (that would otherwise be legal) are illegal between TensorCopyInsertion and the actual rewrite to MemRef ops. Therefore, even if broken down into two separate steps internally, they should be exposed as a single pass.

This change affects the sparse compiler, which uses `TensorCopyInsertionPass`. A new `SparsificationAndBufferizationPass` is added to replace all passes in the sparse tensor pipeline from `TensorCopyInsertionPass` until the actual bufferization (the rewrite to memref/non-tensor ops). It is generally unsafe to run arbitrary passes in between, in particular passes that hoist tensor ops out of loops or change SSA use-def chains along tensor ops.

Differential Revision: https://reviews.llvm.org/D138915
2022-12-02 15:38:02 +01:00
Matthias Springer
1522a3b7b3 [mlir][tensor] Fold rank-reducing insert_slice with inverse collapse_shape
Differential Revision: https://reviews.llvm.org/D139104
2022-12-02 10:42:52 +01:00
Matthias Springer
a076f57a1a [mlir][tensor] Fold rank-reducing extract_slice with inverse expand_shape
Differential Revision: https://reviews.llvm.org/D139103
2022-12-02 10:42:46 +01:00
Lorenzo Chelini
44f7356005 [MLIR][Tensor] Add canonicalization for UnpackOp
pack(unpack(x)) -> x
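
A minimal sketch (hypothetical shapes); the pair cancels when the packing metadata matches:

```
%u = tensor.unpack %x inner_dims_pos = [0] inner_tiles = [8] into %d0
    : tensor<4x8xf32> -> tensor<32xf32>
%p = tensor.pack %u inner_dims_pos = [0] inner_tiles = [8] into %d1
    : tensor<32xf32> -> tensor<4x8xf32>

// Canonicalizes to: %p replaced by %x.
```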

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D138917
2022-12-01 15:17:50 +01:00
Hanhan Wang
a971d51932 [mlir][tensor] Enhance the verifier of pack and unpack op.
The outer_dims_perm must be a permutation or empty.
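
For reference, a valid outer_dims_perm (hypothetical shapes); a non-permutation such as [0, 0] is now rejected by the verifier:

```
// The outer (tile-index) dims 16x8 are permuted to 8x16.
%p = tensor.pack %src outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
    inner_tiles = [8, 32] into %dst
    : tensor<128x256xf32> -> tensor<8x16x8x32xf32>
```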

Reviewed By: chelini

Differential Revision: https://reviews.llvm.org/D138936
2022-11-29 15:47:52 -08:00
Hanhan Wang
0a1569a400 [mlir][NFC] Remove trailing whitespaces from *.td and *.mlir files.
This is generated by running

```
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir
```

Reviewed By: rriddle, dcaballe

Differential Revision: https://reviews.llvm.org/D138866
2022-11-28 15:26:30 -08:00
Matthias Springer
13593dc9dc [mlir][tensor][bufferize] Fix tensor.insert_slice regression
This reverts D132662 (apart from overall cleanups), which introduced a too aggressive optimization for tensor.insert_slice bufferization. Instead, bufferizesToMemoryRead is improved to handle some of these cases. The remaining cases can still bufferize efficiently when the canonicalizer is run before bufferization.

Differential Revision: https://reviews.llvm.org/D138745
2022-11-26 19:14:33 +01:00
Matthias Springer
f2d91a7ae1 [mlir][utils] Fix invalid reshapes in ComposeCollapseOfExpandOp
Do not generate CollapseShapeOps/ExpandShapeOps that have the same source and result shape. Generate casts instead. Such reshapes became invalid with D138498.
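
A sketch of the degenerate composition that is now avoided (hypothetical shapes):

```
%e = tensor.expand_shape %t [[0, 1]] : tensor<?xf32> into tensor<?x4xf32>
%c = tensor.collapse_shape %e [[0, 1]] : tensor<?x4xf32> into tensor<?xf32>

// The composed reshape is an identity. Since a same-shape reshape is now
// invalid, the pair is replaced by %t directly (or by a tensor.cast when
// the static types differ).
```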

Differential Revision: https://reviews.llvm.org/D138557
2022-11-23 13:52:00 +01:00
Matthias Springer
b9745ad812 [mlir][tensor/memref] Disallow Collapse/ExpandShapeOps that do not reduce/increase the rank
A CollapseShapeOp that does not reduce the rank and an ExpandShapeOp that does not increase it are invalid.

Differential Revision: https://reviews.llvm.org/D138498
2022-11-23 09:19:35 +01:00
Matthias Springer
6052b17aab [mlir][tensor] Add dim(expand_shape/collapse_shape) folding
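
A minimal sketch of the collapse_shape case (hypothetical shapes):

```
%c0 = arith.constant 0 : index
%c = tensor.collapse_shape %t [[0, 1]] : tensor<?x8xf32> into tensor<?xf32>
%d = tensor.dim %c, %c0 : tensor<?xf32>

// %d now folds to arithmetic on the source dims: dim(%t, 0) * 8.
```
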
Differential Revision: https://reviews.llvm.org/D138487
2022-11-22 17:34:49 +01:00
Lorenzo Chelini
9aa505a28d Introduce tensor.pack and tensor.unpack operations
Pack and Unpack return new tensors within which the individual elements
are reshuffled according to the packing specification. This has the
consequence of modifying the canonical order in which a given operator
(e.g., Matmul) accesses the individual elements. After bufferization,
this typically translates to increased access locality and improved cache
behavior, e.g., eliminating cache line splitting.
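
For illustration, a hypothetical packing of a 128x256 operand into 8x32 tiles (shapes are illustrative, not taken from the patch):

```
// The outer dims (16x8) index the tiles; the trailing dims (8x32)
// index elements within a tile.
%packed = tensor.pack %src inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dst : tensor<128x256xf32> -> tensor<16x8x8x32xf32>
```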

Co-authored-by: Mahesh Ravishankar <ravishankarm@google.com>
Co-authored-by: Han-Chung Wang <hanchung@google.com>

RFC: https://discourse.llvm.org/t/rfc-tensor-pack-and-tensor-unpack/66408/1

Reviewed By: nicolasvasilache, rengolin, hanchung

Differential Revision: https://reviews.llvm.org/D138119
2022-11-22 09:11:59 +01:00
Lei Zhang
9bb633741a [mlir][bufferization] Support general Attribute as memory space
MemRef has accepted a general Attribute as memory space for
a long time. This commit updates the bufferization side to catch up,
which allows downstream users to plug in customized symbolic memory
spaces. It also eliminates quite a few calls to `getMemorySpaceAsInt`,
which is deprecated.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D138330
2022-11-21 09:40:50 -05:00
Lorenzo Chelini
c9f0a3e39d [MLIR][Tensor] Clean-up ops.mlir test (NFC)
The split-input-file mode was not used.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D138009
2022-11-16 13:46:16 +01:00
Zequan Wu
a7fa5febaa [Test] Fix CHECK typo.
Differential Revision: https://reviews.llvm.org/D137287
2022-11-04 10:18:04 -07:00
Mahesh Ravishankar
24f9293de8 [mlir][Tensor] Allow builders of tensor.empty to accept encoding attribute.
The `RankedTensorType` can have an optional encoding
attribute. Allowing the builders of `tensor.empty` to (optionally)
accept the encoding attribute makes it possible to build empty
tensors whose type carries that encoding.
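
For example, a sketch using a sparse encoding as the attribute (hypothetical shapes; encoding syntax as of this era):

```
#CSR = #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>
%0 = tensor.empty() : tensor<64x64xf64, #CSR>
```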

Reviewed By: nicolasvasilache, hanchung, springerm

Differential Revision: https://reviews.llvm.org/D137297
2022-11-03 20:30:12 +00:00
Matthias Springer
09dfb44193 [mlir][tensor][bufferize] Support memory_space for tensor.pad
This change adds memory space support to tensor.pad. (tensor.generate and tensor.from_elements do not support memory spaces yet.)

The memory space is inferred from the buffer of the source tensor.

Instead of lowering tensor.pad to tensor.generate + tensor.insert_slice, it is now lowered to bufferization.alloc_tensor (with the correct memory space) + linalg.map + tensor.insert_slice.

Memory space support for the remaining two tensor ops is left for a later point, as this requires some more design discussions.
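
For reference, a minimal tensor.pad (hypothetical shapes); if `%src` bufferizes to a buffer in some memory space, the allocation for the padded result is now placed in that same space:

```
%cst = arith.constant 0.0 : f32
%padded = tensor.pad %src low[1, 1] high[1, 1] {
^bb0(%i: index, %j: index):
  tensor.yield %cst : f32
} : tensor<10x10xf32> to tensor<12x12xf32>
```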

Differential Revision: https://reviews.llvm.org/D136265
2022-10-27 12:29:57 +02:00
Matthias Springer
66baa349c6 [mlir][tensor] Fix build: Add missing line break to test case
This should have been part of D136767.
2022-10-27 12:20:05 +02:00
Matthias Springer
c1f0a15c65 [mlir][tensor][bufferize] Lower tensor.generate to linalg.map
There is no memref equivalent of tensor.generate. The purpose of this change is to avoid creating scf.parallel loops during bufferization.
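
A sketch of an op affected by this change (hypothetical shapes); it now lowers to an allocation plus a linalg.map whose body recovers the indices via linalg.index, instead of an scf.parallel loop:

```
%g = tensor.generate %n {
^bb0(%i: index, %j: index):
  %sum = arith.addi %i, %j : index
  %elem = arith.index_cast %sum : index to i32
  tensor.yield %elem : i32
} : tensor<?x4xi32>
```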

Differential Revision: https://reviews.llvm.org/D136767
2022-10-27 12:03:13 +02:00
Matthias Springer
cfaf3292df [mlir][tensor] Disallow unranked tensors for tensor.extract/insert
When writing a tensor.extract/tensor.insert, the rank of the tensor is implied by the number of specified indices. When extracting from or inserting into an unranked tensor, the tensor should first be cast to a ranked version.
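
A sketch of the required cast (hypothetical shapes):

```
// Extracting directly from tensor<*xf32> is now rejected; cast first.
%ranked = tensor.cast %unranked : tensor<*xf32> to tensor<4x4xf32>
%elem = tensor.extract %ranked[%i, %j] : tensor<4x4xf32>
```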

Differential Revision: https://reviews.llvm.org/D136756
2022-10-27 10:09:31 +02:00
Christopher Bate
446981bdb6 [mlir][tensor] ExtractSliceFromReshape: handle collapsing of unit dim edge cases
Prior to this change, the "ExtractSliceFromReshape" pattern would transform

```
%collapsed = tensor.collapse_shape %input [[0, 1], [2]]
                : tensor<1x11x100xf32> into tensor<11x100xf32>
%slice = tensor.extract_slice %collapsed [%offt, 0] [%size, 100] [1, 1]
                : tensor<11x100xf32> to tensor<?x100xf32>
```

into a loop that iterates over the range `%size - %offt` and pieces
together multiple sub-slices of `%input` along the first dimension. This
is correct but obviously inefficient. The technical condition is that
collapsing at most one non-unit dimension of `%src` will not result in a
subsequent slice along the corresponding dimension of `%collapsed`
mapping across discontinuities in the index space of `%src`. Thus, the
definition of a "linearized dimension" (from the perspective of
`tensor.collapse_shape`) is updated to reflect this condition.

The transform will now generate

```
%slice = tensor.extract_slice %input [0, %offt, 0][1, %size, 100] [1, 1]
            : tensor<1x11x100xf32> to tensor<1x?x100xf32>
%result = tensor.collapse_shape [[0, 1], [2]]
            : tensor<1x?x100xf32> to tensor<?x100xf32>
```

which can be further canonicalized.

Additional tests are added to check this family of edge cases.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D135726
2022-10-22 13:29:34 -06:00
Matthias Springer
6cdd34b973 [mlir][tensor][bufferize] Bufferize inserts into equivalent tensors in-place
Inserting a tensor into an equivalent tensor is a no-op after bufferization. No alloc is needed.

Differential Revision: https://reviews.llvm.org/D132662
2022-10-06 15:06:33 +09:00
Nicolas Vasilache
54a4e9685d [mlir][Tensor] NFC - Add result pretty printing to TensorOps
Differential Revision: https://reviews.llvm.org/D135135
2022-10-04 09:16:51 -07:00
Matthias Springer
81ca5aa452 [mlir][tensor][NFC] Rename linalg.init_tensor to tensor.empty
tensor.empty/linalg.init_tensor produces an uninitialized tensor that can be used as a destination operand for destination-style ops (ops that implement `DestinationStyleOpInterface`).

This change makes it possible to implement `TilingInterface` for non-destination-style ops without depending on the Linalg dialect.

RFC: https://discourse.llvm.org/t/rfc-add-tensor-from-shape-operation/65101

Differential Revision: https://reviews.llvm.org/D135129
2022-10-04 17:25:35 +09:00
Lei Zhang
bd81524e7f Reland "[mlir][tensor] Support more cases in MergeConsecutiveExtractSlice"
This relands commit 5d4603a02d.
It includes fixes for GCC test failures and a simplification of
the implementation.

Co-authored-by: Mahesh Ravishankar <ravishankarm@google.com>
Co-authored-by: Christopher Bate <cbate@nvidia.com>
2022-09-22 17:28:50 -04:00
Mehdi Amini
e0a6df53b4 Revert "[mlir][tensor] Support more cases in MergeConsecutiveExtractSlice"
This reverts commit 5d4603a02d.

The Dialect/Tensor/fold-consecutive-insert-extract-slice.mlir test is
failing when built with GCC
2022-09-21 04:01:57 +00:00
Lei Zhang
5d4603a02d [mlir][tensor] Support more cases in MergeConsecutiveExtractSlice
This commit adds utility functions to perform general merging of
OffsetSizeAndStrideOpInterface ops, supporting rank-reducing
producers and non-unit strides.

With them, we can extend MergeConsecutiveExtractSlice to support
more cases.

Co-authored-by: Mahesh Ravishankar <ravishankarm@google.com>

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D134294
2022-09-20 20:16:03 -04:00
Lei Zhang
bb4c53b7ba [mlir][tensor] Merge consecutive insert_slice/extract_slice ops
Consecutive tensor.insert_slice/tensor.extract_slice ops can be
created in cases like tiling a convolution and then downsizing
2-D convolutions into 1-D ones. They hinder further transformations,
so these patterns are added to clean them up.
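
A minimal sketch of the merge (hypothetical shapes):

```
%0 = tensor.extract_slice %t[4, 4] [8, 8] [1, 1]
    : tensor<16x16xf32> to tensor<8x8xf32>
%1 = tensor.extract_slice %0[2, 2] [4, 4] [1, 1]
    : tensor<8x8xf32> to tensor<4x4xf32>

// Merged into a single slice: the offsets compose (4 + 2 = 6).
%1 = tensor.extract_slice %t[6, 6] [4, 4] [1, 1]
    : tensor<16x16xf32> to tensor<4x4xf32>
```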

Given that bufferization is sensitive and has requirements on
the IR structure (see https://reviews.llvm.org/D132666),
these patterns are put in Transforms/ with separate entry points
for explicit collection.

Reviewed By: ThomasRaoux, mravishankar

Differential Revision: https://reviews.llvm.org/D133871
2022-09-20 19:52:56 -04:00
Christopher Bate
4d27f06f94 [mlir][Tensor] Fix ExtractSliceFromReshape transform edge case
The transformation would fail if none of the sliced dimensions were
linearized by the producing `tensor.collapse_shape`. This is a trivial
edge case, but it wasn't correctly tested. This change fixes the issue
and adds a test.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D134088
2022-09-19 14:02:45 -06:00
Lei Zhang
9d59705169 [mlir][tensor] Fold round-tripping extract/insert slice ops
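
A minimal sketch of the fold (hypothetical shapes):

```
%0 = tensor.extract_slice %t[0, 1] [4, 4] [1, 1]
    : tensor<8x8xf32> to tensor<4x4xf32>
%1 = tensor.insert_slice %0 into %t[0, 1] [4, 4] [1, 1]
    : tensor<4x4xf32> into tensor<8x8xf32>

// %1 folds to %t: the slice is written back unchanged.
```
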
Reviewed By: ThomasRaoux, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133909
2022-09-19 12:58:52 -04:00
Johannes Reifferscheid
78f4a02aef Fixes for D133947. 2022-09-16 11:38:30 +02:00
Johannes Reifferscheid
d7c606f5b7 Fix bufferization of collapse_shape of subviews with size 1 dims.
Currently, there's an optimization that claims dimensions of size 1 are always
contiguous. This is not necessarily the case for subviews.

```
Input:

[
  [
    [0, 1],
    [2, 3]
  ],
  [
    [4, 5]
    [6, 7]
  ]
]

Subview:

  [
    [
      [0, 1],
    ],
    [
      [4, 5]
    ]
  ]
```

The old logic treats this subview as contiguous, when it is not.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D134026
2022-09-16 11:32:30 +02:00
Alex Zinenko
f096e72ce6 [mlir] switch bufferization to use strided layout attribute
Bufferization already assumes that buffers cross function boundaries
in strided form and uses the corresponding affine map layouts.
Switch it to use the recently introduced strided layout instead, to avoid
unnecessary casts when bufferizing further operations to their memref
dialect counterparts, which now largely rely on the strided layout attribute.
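
For reference, the two equivalent layout spellings (hypothetical rank and element type):

```
// Affine-map layout previously produced at function boundaries:
memref<?x?xf32, affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + d1 + s0)>>

// Strided layout now used instead:
memref<?x?xf32, strided<[?, 1], offset: ?>>
```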

Depends On D133947

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133951
2022-09-16 10:56:50 +02:00
Alex Zinenko
46b90a7b5d [mlir] make remaining memref dialect ops produce strided layouts
Three ops in the memref dialect (transpose, expand_shape, and
collapse_shape) were originally designed to operate on memrefs with
strided layouts but had to go through the affine map representation, as
the type did not support anything else. Make these ops produce memref
values with StridedLayoutAttr now that it is available.

Depends On D133938

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133947
2022-09-16 10:56:48 +02:00
Alex Zinenko
2791162b01 [mlir] make memref.subview produce strided layout
The memref.subview operation was initially designed to work on memrefs with
strided layouts only and has never supported anything else. Port it to use
the recently added StridedLayoutAttr instead of extracting the strided form
implicitly from affine maps.
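
A minimal sketch (hypothetical shapes); the result type now spells out the strides and offset directly:

```
// offset = 2 * 16 + 4 = 36; strides are inherited from the source.
%sv = memref.subview %m[2, 4] [4, 4] [1, 1]
    : memref<8x16xf32> to memref<4x4xf32, strided<[16, 1], offset: 36>>
```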

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133938
2022-09-16 10:56:46 +02:00