Commit Graph

245 Commits

Author SHA1 Message Date
lorenzo chelini
06c4f78b07 [MLIR][Linalg] improve silenceable failure msg for lower_pack (NFC) (#75053)
Adjust the silenceable failure message, as we lower `tensor.unpack` to a
combination of `linalg.transpose` + `tensor.collapse_shape` and
`tensor.extract_slice`.
2023-12-12 13:06:17 +01:00
Pablo Antonio Martinez
b396e5429c Reland "[MLIR][Transform] Add attribute in MatchOp to filter by operand type (#67994)"
The test was failing due to a different transform sequence declaration (an unnamed transform sequence was used, whereas a named transform sequence is now required). The test is now fixed.
2023-12-07 11:57:02 +00:00
Mikhail Goncharov
10879403e5 Revert "[MLIR][Transform] Add attribute in MatchOp to filter by operand type (#67994)"
This reverts commit c4399130ae.

Test fails https://lab.llvm.org/buildbot/#/builders/272/builds/2757
2023-12-07 10:28:35 +01:00
Pablo Antonio Martinez
c4399130ae [MLIR][Transform] Add attribute in MatchOp to filter by operand type (#67994)
This patch adds the `filter_operand_types` attribute to transform::MatchOp, allowing ops to be filtered by their operand types.
2023-12-07 08:28:52 +00:00
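A rough sketch of how the new attribute might be used; the placement of `filter_operand_types` in the custom assembly and the `transform.sequence` wrapper are assumptions, not taken from the patch:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  // Only match linalg.matmul ops whose operands are f32-typed
  // (attribute placement assumed from the description above).
  %matched = transform.structured.match ops{["linalg.matmul"]}
      filter_operand_types = [f32] in %arg0
    : (!transform.any_op) -> !transform.any_op
}
```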
Andrzej Warzyński
03c2f5d8bb [mlir][linalg][conv] Flatten the channel dimension when vectorizing (#71918)
The current vectorization of 1D depthwise convolutions in Linalg is
_sub-optimal_ for tensors with a low number of channels, e.g.:

```mlir
linalg.depthwise_conv_1d_nwc_wc
    {dilations = dense<1> : vector<1xi64>,
    strides = dense<1> : vector<1xi64>}
    ins(%input, %filter : tensor<1x8x3xi8>, tensor<1x3xi8>)
    outs(%output : tensor<1x8x3xi8>) -> tensor<1x8x3xi8>
```

That's due to the fact that ultimately (i.e. at LLVM level),
vectorization happens along the trailing dimension (i.e. the channel
dimension). In this case it leads to vectors with 3 elements (or worse,
e.g. if there is only 1 channel). For comparison, a 128-bit-wide vector
register can hold 16 x i8.

Instead, this patch adds an option to flatten/collapse the channel
dimension into the width dimension of the input/filter/output using the
`vector.shape_cast` operation:

```mlir
    %sc_input = vector.shape_cast %input : vector<1x8x3xi8> to vector<1x24xi8>
    %sc_output = vector.shape_cast %output : vector<1x8x3xi8> to vector<1x24xi8>
    %b_filter = vector.broadcast %filter : vector<3xi8> to vector<1x8x3xi8>
    %sc_filter = vector.shape_cast %b_filter : vector<1x8x3xi8> to vector<1x24xi8>
```

This new vectorization mode is implemented in `depthwiseConv` by
inserting `vector.shape_cast` Ops before and after 
`depthwiseConv1dSliceAsMulAcc` is invoked. It can be selected through
e.g. a transform dialect attribute:

```mlir
  transform.structured.vectorize_children_and_apply_patterns %conv {flatten_1d_depthwise_conv}
```

A forthcoming patch will implement a strategy to automatically switch
between the two implementations, depending on the shape of the input
tensors.

Co-authored-by: Bradley Smith <bradley.smith@arm.com>
2023-12-06 21:35:03 +00:00
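A minimal sketch of selecting the new mode from a transform script, assuming the usual match-then-vectorize structure (recent revisions require a named transform sequence instead of `transform.sequence`, as noted in the Reland entry above):

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %conv = transform.structured.match ops{["linalg.depthwise_conv_1d_nwc_wc"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // Collapse the channel dimension into the width dimension while vectorizing.
  %vectorized = transform.structured.vectorize_children_and_apply_patterns %conv
    {flatten_1d_depthwise_conv} : (!transform.any_op) -> !transform.any_op
}
```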
Felix Schneider
e07c92a9c3 [mlir] Fix TileUsingForOp attr-dict printing/parsing (#73261)
`TileUsingForOp` has an optional Attribute `interchange` which was given
in curly braces like this: `{interchange = [...]}`. The way this was
parsed meant that no `attr-dict` could be attached to the Op.
This patch adds printing / parsing of an `attr-dict` to the Op and
prints/parses the `interchange` Attribute separately from the
discardable Attributes.
2023-12-06 20:08:01 +01:00
Jack Frankland
4a3d2088d6 [mlir][linalg] Add TransposeConv2D Transform Op (#68567)
* Add a LinAlg pass to convert 2D convolutions and quantized 2D
convolutions that have the `FHWC` filter channel ordering into a
transpose followed by 2D convolutions that have the `HWCF` channel
ordering.

* Add a lit test to check the semantics of the transformation are
correct for both quantized and unquantized variants.

Signed-off-by: Jack Frankland <jack.frankland@arm.com>
2023-11-28 09:56:12 +00:00
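A hedged sketch of driving this from the transform dialect; the op name `transform.structured.transpose_conv2d` and its single-result signature are assumed from the commit title, not verified against the patch:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %conv = transform.structured.match ops{["linalg.conv_2d_nhwc_fhwc"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // Hypothetical op name: rewrites the FHWC conv into a filter transpose
  // followed by an HWCF conv.
  %hwcf = transform.structured.transpose_conv2d %conv
    : (!transform.any_op) -> !transform.any_op
}
```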
Matthias Springer
6367677c9d [mlir][linalg] BufferizeToAllocationOp: fix side effects (#72986)
`bufferize_to_allocation` does not bufferize/replace targeted ops if
`bufferize_destination_only` is set.

Fixes #72931.
2023-11-23 09:22:40 +01:00
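A sketch of how `bufferize_to_allocation` is typically driven, combining options described in this and later entries (custom memory space, destination-only bufferization). The result list and attribute spellings are assumptions; check the op documentation for the exact names on a given revision:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %pad = transform.structured.match ops{["tensor.pad"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // Materialize a buffer only for the destination operands of the target
  // (attribute names assumed from the option descriptions in this log).
  %buffer, %new_ops = transform.structured.bufferize_to_allocation %pad
      {memory_space = 4, bufferize_destination_only} : !transform.any_op
}
```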
Felix Schneider
227654e871 Revert "[mlir] Fix TileUsingForOp attr-dict printing/parsing, cleanup assembly format" (#73178)
Reverts llvm/llvm-project#72745 as it is causing test failures on
mlir-nvidia in
`mlir/test/python/dialects/transform_structured_ext.py`.
2023-11-22 23:25:50 +01:00
Felix Schneider
0401668483 [mlir] Fix TileUsingForOp attr-dict printing/parsing (#72745)
`TileUsingForOp` has an optional Attribute `interchange` which was given
in curly braces like this: `{interchange = [...]}`. The way this was
parsed meant that no normal `attr-dict` could be attached to the Op.
This patch adds printing / parsing of an `attr-dict` to the Op and treats
the `interchange` Attribute as part of that dictionary for now.
2023-11-22 22:35:29 +01:00
Matthias Springer
437c62178c [mlir][memref] Remove redundant memref.tensor_store op (#71010)
`bufferization.materialize_in_destination` should be used instead. Both
ops bufferize to a memcpy. This change also conceptually cleans up the
memref dialect a bit: the memref dialect no longer contains ops that
operate on tensor values.
2023-11-05 12:47:18 +09:00
Matthias Springer
b9fe461e73 [mlir][transform] LISH: Add transform op (#70630)
Add a transform op for loop-invariant subset hoisting. Delete the old
transform op from the Linalg dialect.
2023-11-05 11:40:51 +09:00
lorenzo chelini
6cbcb79350 [MLIR][Linalg] Introduce SpecializeOp (#70326)
Introduce an operation to specialize linalg.generics, for example,
detecting a linalg.generic that is semantically equivalent to a
linalg.copy and replacing the former with the latter. After code
generation, it is helpful to lower named operations to vendor-optimized
libraries.
2023-10-31 10:07:35 +01:00
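A minimal sketch of the new op as it would appear in a transform script, assuming it follows the usual one-handle-in/one-handle-out convention:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %generic = transform.structured.match ops{["linalg.generic"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // Rewrites generics that are semantically equivalent to named ops
  // (e.g. linalg.copy) into those named ops.
  %specialized = transform.structured.specialize %generic
    : (!transform.any_op) -> !transform.any_op
}
```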
Jack Frankland
92e751d426 [mlir][linalg] Add NHWC + FHWC Img2Col (#68708)
Adds the Img2Col transformation for the fhwc channel ordering in a
Conv2D. Because of how the channel ordering affects the matrix
dimensions in the flattened filter, this results in a slightly different
implementation of the actual "matrix multiplication": instead of doing a
regular row-column dot product, this arrangement requires a row-row dot
product; otherwise the filter matrix would first need to be transposed.

Adds a lit test to the transform dialect to check the semantics of the
optimization are correct.

Signed-off-by: Jack Frankland <jack.frankland@arm.com>
2023-10-13 10:20:18 +01:00
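A sketch of applying the transformation to an FHWC convolution via the existing `transform.structured.convert_conv2d_to_img2col` op; the two-result signature is assumed:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %conv = transform.structured.match ops{["linalg.conv_2d_nhwc_fhwc"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // Returns handles to the img2col tensor producer and the rewritten
  // matmul-like op (row-row dot product for this filter layout).
  %img2col, %transformed = transform.structured.convert_conv2d_to_img2col %conv
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
}
```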
MaheshRavishankar
93c42299bd [mlir][TilingInterface] NFC code changes separated out from introduction of scf::tileUsingSCFForallOp. (#67081)
This patch contains NFC changes that are precursor to the introduction
of `scf::tileUsingSCFForallOp` method introduced in
https://github.com/llvm/llvm-project/pull/67083.
2023-09-26 13:42:27 -07:00
Oleksandr "Alex" Zinenko
96ff0255f2 [mlir] cleanup of structured.tile* transform ops (#67320)
Rename and restructure tiling-related transform ops from the structured
extension to be more homogeneous. In particular, all ops now follow a
consistent naming scheme:

 - `transform.structured.tile_using_for`;
 - `transform.structured.tile_using_forall`;
 - `transform.structured.tile_reduction_using_for`;
 - `transform.structured.tile_reduction_using_forall`.

This drops the "_op" naming artifact from `tile_to_forall_op` that
shouldn't have been included in the first place, consistently specifies
the name of the control flow op to be produced for loops (instead of
`tile_reduction_using_scf` since `scf.forall` also belongs to `scf`),
and opts for the `using` connector to avoid ambiguity.

The loops produced by tiling are now systematically placed as *trailing*
results of the transform op. While this required changing 3 out of 4 ops
(except for `tile_using_for`), this is the only choice that makes sense
when producing multiple `scf.for` ops that can be associated with a
variadic number of handles. This choice is also most consistent with
*other* transform ops from the structured extension, in particular with
fusion ops, that produce the structured op as the leading result and the
loop as the trailing result.
2023-09-26 09:14:29 +02:00
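A sketch of the renamed op with the loops as trailing results, using the bracketed tile-size syntax shown elsewhere in this log (exact assembly may differ across revisions):

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %matmul = transform.structured.match ops{["linalg.matmul"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // Leading result: the tiled op; trailing results: the generated scf.for loops.
  %tiled, %loops:2 = transform.structured.tile_using_for %matmul [8, 16]
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)
}
```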
Oleksandr "Alex" Zinenko
702608f4d8 [mlir] emit better errors in transform.structured.interchange (#67315)
The implementation doesn't emit any diagnostics as it is shared with the
pattern-based implementation. Check preconditions early and emit
diagnostics from the transform op instead. Without this change, the op
would produce a definite failure and no error message.
2023-09-25 15:36:07 +02:00
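For reference, a sketch of the op whose diagnostics improve here; with this change an invalid permutation is reported against the transform op instead of failing without a message (assembly assumed to match the usual form):

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %generic = transform.structured.match ops{["linalg.generic"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // A permutation that does not match the iterator count now yields a
  // readable diagnostic at this op's location.
  %interchanged = transform.structured.interchange %generic iterator_interchange = [1, 0]
    : (!transform.any_op) -> !transform.any_op
}
```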
Ingo Müller
69bc1cbbff [mlir][linalg][transform] Rename {masked_vectorize => vectorize => vectorize_children_and...}. (#66575)
This PR renames the vectorization transform ops as follows:

* `structured.masked_vectorize` => `structured.vectorize`. This reflects
the fact that since [recently](https://reviews.llvm.org/D157774) the op
can also handle the unmasked case.
* `structured.vectorize` =>
`structured.vectorize_children_and_apply_patterns`. This reflects the
fact that the op does not just vectorize the given payload op but all
vectorizable children contained in it, and applies patterns before and
after for preparation and clean-up.

This rename was discussed first
[here](https://reviews.llvm.org/D157774).

The PR also adapts and cleans up the tablegen description of the
`VectorizeChildrenAndApplyPatternsOp` (formerly `VectorizeOp`).
2023-09-21 15:38:29 +02:00
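A sketch of the renamed `structured.vectorize` (formerly `masked_vectorize`); the `vector_sizes` clause and trailing type are assumed to follow the form shown in the scalable-vectorization entry further down:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %matmul = transform.structured.match ops{["linalg.matmul"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // Masked vectorization with explicit vector sizes; the unmasked case
  // omits the vector_sizes clause.
  transform.structured.vectorize %matmul vector_sizes [8, 16, 4] : !transform.any_op
}
```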
MaheshRavishankar
170a25a793 [mlir][TilingInterface] Make the tiling set tile sizes function use OpFoldResult. (#66566) 2023-09-18 17:18:51 -07:00
Martin Erhart
6bf043e743 [mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute (#66619)
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option now defaults to true and
`create-deallocs` defaults to false; both options, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
2023-09-18 16:44:48 +02:00
lorenzo chelini
d65885ae63 [MLIR][Linalg] Bail out if the tiles provided are more than the number of loops (#66007)
Currently, the compiler crashes if the number of tiles provided exceeds
the number of loops.
2023-09-13 10:41:03 -04:00
Martin Erhart
c199f7dc62 Revert "[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute"
This reverts commit 6a91dfedeb.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:48 +00:00
Martin Erhart
6a91dfedeb [mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases
such as returning memrefs which this series of commits aims to add
support for as well.

This first commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.

The documentation w.r.t. these pass option changes is also updated in
this commit.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D156662
2023-09-13 09:30:22 +00:00
Matthias Springer
91464e1d6a [mlir][bufferization][NFC] Rename copy_tensor op to materialize_in_destination (#65467)
The previous name was badly chosen. The op is used to ensure that a
computation materializes in the future buffer of a certain tensor.
2023-09-12 15:20:41 +02:00
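A minimal sketch of the renamed op on tensors; the functional-type assembly is assumed to match what recent MLIR revisions use and may differ for the revision of this commit:

```mlir
// Ensure the computation producing %t materializes in the future buffer of %dest.
func.func @copy_into(%t: tensor<8xf32>, %dest: tensor<8xf32>) -> tensor<8xf32> {
  %r = bufferization.materialize_in_destination %t in %dest
      : (tensor<8xf32>, tensor<8xf32>) -> tensor<8xf32>
  return %r : tensor<8xf32>
}
```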
lorenzo chelini
e5137e7c33 [MLIR][Linalg] Retire tile_to_scf_for (#65633)
Both `TileOp` and `TileToScfForOp` use the tiling interface and the
`tileUsingSCFForOp` method. This duplication was introduced in
44cfea0279
as a way to retire `linalg::tileLinalgOp`; now there is no more need
for this duplication, and since `TileOp` has seen more recent changes,
retire `TileToScfForOp`.
2023-09-07 16:13:23 -04:00
Martin Erhart
412c2fd270 [mlir][linalg] Optional dealloc insertion for bufferize_to_allocation (#65610)
This commit allows omitting the insertion of the memref.dealloc operation
when linalg.structured.bufferize_to_allocation is run and makes this the
default behavior. This is desirable when the
buffer-deallocation-pipeline is run after bufferization to handle buffer
deallocation.
2023-09-07 17:49:48 +02:00
Aviad Cohen
d6a2014eb8 [mlir][Linalg]: Add memory space to linalg transform::PromoteOp
This patch allows supplying an optional memory space for the promoted
buffer.

Differential Revision: https://reviews.llvm.org/D159074
2023-09-07 17:35:32 +03:00
Oleksandr "Alex" Zinenko
3964d943ec [mlir] transform.structured.match fix tilingIface condition (#65337)
The matching condition for payload ops implementing TilingInterface was
inverted. Fix it and add a test.
2023-09-05 18:02:33 +02:00
Oleksandr "Alex" Zinenko
c17735053b [mlir] transform.structured.match loop-like flag (#65336)
Add an enum option to `transform.structured.match` operation to match
payload operations implementing LoopLikeOpInterface.
2023-09-05 17:56:00 +02:00
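A sketch of the new flag, assuming it reuses the existing `interface{...}` clause of `transform.structured.match`; the enum spelling is an assumption:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  // Match all payload ops implementing LoopLikeOpInterface (enum spelling assumed).
  %loops = transform.structured.match interface{LoopLikeInterface} in %arg0
    : (!transform.any_op) -> !transform.any_op
}
```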
Andrzej Warzynski
6ca4fe64f1 [mlir][nfc] Make vectorize_nd_extract optional
Depends on: D157774

Differential Revision: https://reviews.llvm.org/D159360
2023-09-04 18:37:36 +01:00
Matthias Springer
a17313794b [mlir][linalg][transform] Return copy_back op from PadOp.
This patch makes the `transform.structured.pad` op also return a handle
to the copy op that it inserts. This allows continuing the transformation
on that op, such as mapping it to a GPU thread.

The patch was mainly authored by @springerm as part of the WIP patch
https://reviews.llvm.org/D156371, which also has an example usage of
this change.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D159088
2023-08-29 14:55:33 +00:00
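A sketch combining this change with the copy-back option from the next entry: three results (padded op, pad ops, copy-back op) and a `copy_back_op` attribute whose name is assumed from the option list below:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %matmul = transform.structured.match ops{["linalg.matmul"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // %copy can be transformed further, e.g. mapped to a GPU thread.
  %padded, %pad, %copy = transform.structured.pad %matmul {
      padding_values = [0.0 : f32, 0.0 : f32, 0.0 : f32],
      padding_dimensions = [0, 1, 2],
      copy_back_op = "linalg.copy"
  } : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)
}
```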
Matthias Springer
977cb4fdf8 [mlir][linalg][transform] PadOp: Add option to generate linalg.copy copy_back op
Three different options can be specified:
* `bufferization.copy_tensor` (default)
* `linalg.copy`
* `none` (no copy_back)

Differential Revision: https://reviews.llvm.org/D156173
2023-08-09 17:10:16 +02:00
Matthias Springer
440808faf6 [mlir][linalg] MapCopyToThreadsOp: Support tensor.pad
Also return the generated loop op.

Differential Revision: https://reviews.llvm.org/D155950
2023-07-21 15:51:46 +02:00
Matthias Springer
a5bba98a58 [mlir][linalg] BufferizeToAllocationOp: Add option to materialize buffers for operands
Add an option that does not bufferize the targeted op itself, but just materializes a buffer for the destination operands. This is useful for partial bufferization of complex ops such as `scf.forall`, which need special handling (and an analysis of the region).

Differential Revision: https://reviews.llvm.org/D155946
2023-07-21 15:29:59 +02:00
Ingo Müller
522831384f [mlir][linalg][transform] Extend diagnostics of FuseIntoContainingOp.
This patch extends the diagnostic output of `FuseIntoContainingOp` when
it fails to find the next producer by also providing the location of the
affected transform op.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D155803
2023-07-21 09:34:04 +00:00
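For context, a sketch of a typical `fuse_into_containing_op` use whose failure diagnostics improve here (the two-result form is assumed for this revision):

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %producer = transform.structured.match ops{["linalg.fill"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  %forall = transform.structured.match ops{["scf.forall"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // If no next producer is found, the diagnostic now also points at this
  // transform op's location.
  %fused, %new_containing = transform.structured.fuse_into_containing_op %producer into %forall
    : (!transform.any_op, !transform.any_op) -> (!transform.any_op, !transform.any_op)
}
```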
Mahesh Ravishankar
67399932c7 [mlir][Linalg] Cleanup the drop unit dims pass in Linalg.
TL;DR the following API functions have been merged

```
void populateFoldUnitExtentDimsViaReshapesPatterns(RewritePatternSet &patterns);
void populateFoldUnitExtentDimsViaSlicesPatterns(RewritePatternSet &patterns);
```

into

```
void populateFoldUnitExtentDimsPatterns(RewritePatternSet &patterns,
                                        ControlDropUnitDims &options);
```

To use the previous functionality use

```
ControlDropUnitDims options;
// By default options.rankReductionStrategy is
// ControlDropUnitDims::RankReductionStrategy::ReassociativeReshape.
populateFoldUnitExtentDimsPatterns(patterns, options);
```

and

```
ControlDropUnitDims options;
options.rankReductionStrategy =
    ControlDropUnitDims::RankReductionStrategy::ExtractInsertSlice;
populateFoldUnitExtentDimsPatterns(patterns, options);
```

This pass is quite old and needed to be updated based on the current
approach to transformations in Linalg:

- Instead of two patterns, one to just remove loop dimensions that are
  unit extent (and using 0 in the indexing maps), and another to drop
  the unit-extents in the operand shapes, combine them into a single
  transformation. This avoids creating an intermediate step with
  indexing maps having 0's in the domain expressions.

- Expose the core transformation as a utility function and add a
  pattern that calls this transformation.

This is a mostly NFC change, apart from the API change and dropping
the patterns/test that only dropped the loops that are unit extents.

Differential Revision: https://reviews.llvm.org/D155518
2023-07-19 17:47:18 +00:00
Quentin Colombet
9be8219f60 [mlir][Linalg] Add an interface to decompose complex ops
This patch adds an interface, named AggregatedOpInterface, that decomposes
complex operations into simpler ones.

For now, make the interface specific to Linalg because although the concept
is general, the way to materialize it needs some maturing.

Use that interface with the softmax operator.

Differential Revision: https://reviews.llvm.org/D154363
2023-07-18 19:06:36 +02:00
Matthias Springer
1a5aa77f30 [mlir][linalg] BufferizeToAllocationOp: Add option to specify custom alloc op
Supported ops are "memref.alloc" and "memref.alloca".

Differential Revision: https://reviews.llvm.org/D155282
2023-07-14 13:39:05 +02:00
Matthias Springer
d3ddcfd448 [mlir][DialectUtils] Generalize extractFromI64ArrayAttr helper
Generalize `extractFromI64ArrayAttr` to `extractFromIntegerArrayAttr`, so that arbitrary integer/bool types can be extracted.

Differential Revision: https://reviews.llvm.org/D154974
2023-07-12 17:59:40 +02:00
Matthias Springer
579bca1265 [mlir][linalg] BufferizeToAllocation: Add custom memcpy op
Add a new option that allows users to specify a memcpy op: "memref.tensor_store", "memref.copy" or "linalg.copy".

Differential Revision: https://reviews.llvm.org/D154968
2023-07-11 16:47:42 +02:00
Matthias Springer
8ddd98f831 [mlir][linalg] Return newly created ops from bufferize_to_allocation
Return all ops that were generated as part of the bufferization, so that users do not have to match them in the enclosing op.

Differential Revision: https://reviews.llvm.org/D154966
2023-07-11 16:34:02 +02:00
Nicolas Vasilache
1e84e91efa [mlir][Linalg] NFC - Improve some transform op builders 2023-07-11 15:35:43 +02:00
Matthias Springer
867afe5e53 [mlir][vector] Remove duplicate tensor subset <-> vector transfer patterns
Remove patterns that fold tensor subset ops into vector transfer ops from the vector dialect. These patterns already exist in the tensor dialect.

Differential Revision: https://reviews.llvm.org/D154932
2023-07-11 11:12:29 +02:00
Nicolas Vasilache
171a5a761d [mlir][Linalg] Add a greedy transform to map copies to threads efficiently.
This revision adds a new transformation to map a copy operation to a gpu grid of threads.
It implements a first heuristic that allows trading off coalesced accesses vs predication and occupancy.

Differential Revision: https://reviews.llvm.org/D154836
2023-07-10 16:11:04 +00:00
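A hedged sketch of the new mapping op; the op name `transform.structured.gpu.map_copy_to_threads` and its `total_num_threads`/`desired_bit_alignment` parameters are assumptions based on the commit description and may differ from the actual interface:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  %copy = transform.structured.match ops{["linalg.copy"]} in %arg0
    : (!transform.any_op) -> !transform.any_op
  // Heuristically tiles and maps the copy to a grid of threads, trading off
  // coalesced accesses vs. predication and occupancy (names assumed).
  %forall, %tiled = transform.structured.gpu.map_copy_to_threads %copy
      total_num_threads = 32 desired_bit_alignment = 128
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
}
```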
Matthias Springer
d6e9efab81 [mlir][linalg][transform] Add verifier to MaskedVectorizeOp
Verify that the correct number of `scalable_sizes` was provided.

Differential Revision: https://reviews.llvm.org/D154600
2023-07-06 16:24:52 +02:00
Lorenzo Chelini
4d74c845a1 [MLIR][Linalg] Expose packMatmulGreedily in Transforms.h (NFC)
Make the transformation accessible to other drivers (i.e., passes).
2023-07-06 11:59:17 +02:00
Matthias Springer
9b11323904 [mlir][linalg][transform] Fix TileOp builder
The TileOp builders did not set `scalable_sizes`, which produces invalid ops. `scalable_sizes` must contain as many booleans as there are sizes.

Differential Revision: https://reviews.llvm.org/D154585
2023-07-06 11:40:33 +02:00
Andrzej Warzynski
ad7ef1923f [mlir][transform] Allow arbitrary indices to be scalable
This change lifts the limitation that only the trailing dimensions/sizes
in dynamic index lists can be scalable. It allows us to extend
`MaskedVectorizeOp` and `TileOp` from the Transform dialect so that the
following is allowed:

  %1, %loops:3 = transform.structured.tile %0 [4, [4], [4]]

This is also a follow up for https://reviews.llvm.org/D153372
that will enable the following (middle vector dimension is scalable):

  transform.structured.masked_vectorize %0 vector_sizes [2, [4], 8]

To facilitate this change, the hooks for parsing and printing dynamic
index lists are updated accordingly (`printDynamicIndexList` and
`parseDynamicIndexList`, respectively). `MaskedVectorizeOp` and `TileOp`
are updated to include an array attribute of bools that captures
whether the corresponding vector dimension/tile size, respectively, is
scalable or not.

NOTE 1: I am re-landing this after the initial version was reverted. To
fix the regression and in addition to the original patch, this revision
updates the Python bindings for the transform dialect

NOTE 2: This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
  * https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/

This relands 048764f23a with fixes.

Differential Revision: https://reviews.llvm.org/D154336
2023-07-05 09:53:26 +01:00
Matthias Springer
335ada6099 [mlir][linalg] BufferizeToAllocationOp: Bufferize ops, not values
The `bufferize_to_allocation` transform op now operates on payload ops, not payload values. Only ops can be bufferized, not values.

Also remove the `replacement` result from the transform op.

Differential Revision: https://reviews.llvm.org/D153970
2023-07-04 14:35:13 +02:00
Matthias Springer
0e06ec5961 [mlir][linalg] Return tensor::PadOp handle from transform op
"transform.structured.pad" now returns all `tensor::PadOp` in addition to the padded ops.

Also add a test case that shows how to force an allocation for "tensor.pad" ops with a custom memory space.

Differential Revision: https://reviews.llvm.org/D153555
2023-07-04 14:24:47 +02:00