Commit Graph

307 Commits

Author SHA1 Message Date
Manupa Karunaratne
a6e72f9392 [MLIR][Vector] Add Lowering for vector.step (#113655)
Currently, the lowering for vector.step lives
under a folder. This is not ideal if we want
to do transformation on it and defer the
 materizaliztion of the constants much later.

This commits adds a rewrite pattern that
could be used by using
`transform.structured.vectorize_children_and_apply_patterns`
transform dialect operation.

Moreover, the rewriter of vector.step is also
now used in -convert-vector-to-llvm pass where
it handles scalable and non-scalable types as
LLVM expects it.

As a consequence of removing the vector.step
lowering as its folder, linalg vectorization
will keep vector.step intact.
2024-11-01 16:38:36 +00:00
Andrzej Warzyński
39ad84e4d1 [mlir][linalg] Split GenericPadOpVectorizationPattern into two patterns (#111349)
At the moment, `GenericPadOpVectorizationPattern` implements two
orthogonal transformations:
  1. Rewrites `tensor::PadOp` into a sequence of `tensor::EmptyOp`,
    `linalg::FillOp` and `tensor::InsertSliceOp`.
  2. Vectorizes (where possible) `tensor::InsertSliceOp` (see
    `tryVectorizeCopy`).

This patch splits `GenericPadOpVectorizationPattern` into two separate
patterns:
  1. `GeneralizePadOpPattern` for the first transformation (note that
    currently `GenericPadOpVectorizationPattern` inherits from
    `GeneralizePadOpPattern`).
  2. `InsertSliceVectorizePattern` to vectorize `tensor::InsertSliceOp`.

With this change, we gain the following:
  * a clear separation between pre-processing and vectorization
    transformations/stages,
  * a path to support masked vectorisation for `tensor.insert_slice`
    (with a dedicated pattern for vectorization, it is much easier to
    specify the input vector sizes used in masking),
  * more opportunities to vectorize `tensor.insert_slice`.

Note for downstream users:
--------------------------

If you were using `populatePadOpVectorizationPatterns`, following this
change you will also have to add
`populateInsertSliceVectorizationPatterns`.

Finer implementation details:
-----------------------------

1.  The majority of changes in this patch are copy & paste + some edits.
  1.1. The only functional change is that the vectorization of
    `tensor.insert_slice` is now broadly available (as opposed to being
    constrained to the pad vectorization pattern:
    `GenericPadOpVectorizationPattern`).
  1.2. Following-on from the above, `@pad_and_insert_slice_dest` is
    updated. As expected, the input `tensor.insert_slice` Op is no
    longer "preserved" and instead gets vectorized successfully.

2. The `linalg.fill` case in `getConstantPadVal` works under the
   assumption that only _scalar_ source values can be used. That's
   consistent with the definition of the Op, but it's not tested at the
   moment. Hence a test case in Linalg/invalid.mlir is added.

3. The behaviour of the two TD vectorization Ops,
   `transform.structured.vectorize_children_and_apply_patterns` and
   `transform.structured.vectorize` is preserved.
2024-10-29 16:57:23 +00:00
Andrzej Warzyński
ac4bd74190 [mlir] Add apply_patterns.linalg.pad_vectorization TD Op (#112504)
This PR simply wraps `populatePadOpVectorizationPatterns` into a new
Transform Dialect Op: `apply_patterns.linalg.pad_vectorization`.

This change makes it possible to run (and test) the corresponding
patterns _without_:

  `transform.structured.vectorize_children_and_apply_patterns`.

Note that the Op above only supports non-masked vectorisation (i.e. when
the inputs are static), so, effectively, only fixed-width vectorisation
(as opposed to scalable vectorisation). As such, this change is required
to construct vectorization pipelines for tensor.pad targeting scalable
vectors.

To test the new Op and the corresponding patterns, I added
"vectorization-pad-patterns.mlir" - most tests have been extracted from
"vectorization-with-patterns.mlir".
2024-10-25 10:39:26 -07:00
Max191
2bff9d9ffe [mlir] Don't hoist transfers from potentially zero trip loops (#112752)
The hoistRedundantVectorTransfers function does not verification of loop
bounds when hoisting vector transfers. This is not safe in general,
since it is possible that the loop will have zero trip count. This PR
uses ValueBounds to verify that the lower bound is less than the upper
bound of the loop before hoisting. Trip count verification is currently
behind an option `verifyNonZeroTrip`, which is false by default.

Zero trip count loops can arise in GPU code generation, where a loop
bound can be dependent on a thread id. If not all threads execute the
loop body, then hoisting out of the loop can cause these threads to
execute the transfers when they are not supposed to.

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
2024-10-18 16:11:21 -04:00
Andrzej Warzyński
a758bcdbd9 [mlir][td] Rename pack_paddings in structured.pad (#111036)
The pack_paddings attribute in the structure.pad TD Op is used to set
the `nofold` attribute in the generated tensor.pad Op. The current name
is confusing and suggests that there's a relation with the tensor.pack
Op. This patch renames it as `nofold_flags` to better match the actual
usage.
2024-10-15 19:24:43 +01:00
Quinn Dawkins
9144fed31b [mlir] Add option for a cleanup pattern set to SCF tiling helper (#109554)
The SCF helper for tiling an operation implementing the TilingInterface
and greedily fusing consumers requires an uninterrupted chain of
operations implementing the tiling interface to succeed. There can be
cases with intermediate ops that don't implement the interface but have
producers that could be fused if various canonicalization/simplification
patterns could run in between fusion steps.

This adds an option to SCFTileAndFuseOptions for a pattern set to run
between fusion steps to the ops that result from fusion/tiling. Removed
and newly inserted slices are tracked for continued fusion applications.

See this RFC for more discussion:

https://discourse.llvm.org/t/rfc-split-fusion-portions-of-the-tilinginterface-into-a-new-interface/81155
2024-10-04 14:42:55 -04:00
Andrzej Warzyński
d9d623310d [mlir][linalg] Add a new helper hook: hasVectorizationImpl (#110708)
The newly added hook simply returns `false` for Ops for which there's no
"vectorization logic" in the Linalg Vectorizer (i.e. the `vectorize()`
method). It's added so that the following two TD ops expose identical
level of functionality (that's not the case ATM):

  * `transform.structured.vectorize_children_and_apply_patterns`
  * `transform.structured.vectorize`

Specifically, ATM, the former works only for Linalg Ops, while the
latter works for all Ops that the vectorizer supports (*). With this
change,
I am making sure that both TD will behave consistently.

Note, this shouldn't affect any of the current uses of the vectorizer.

(*) This is implemented via the `vectorize()` method in
Vectorization.cpp.
2024-10-04 10:06:33 +01:00
Rolf Morel
94cf80d6fd [MLIR][Linalg] Pattern to fold AddOp to accumulation via contraction op's dest (#110514)
Replaces a linalg.add with one operand the single user of a contraction,
which has a zero-filled, "identity-mapped" destination and is dominated
by the `other` operand, by the contraction with `other` as its dest.

Benefits include elision of an elementwise op, namely the linalg.add,
and removing a tensor.empty as a destination which is likely to require
an allocation upon bufferization.
2024-10-03 12:22:57 +02:00
Kazu Hirata
b52885bc23 [mlir] Use std::optional::value_or (NFC) (#109893) 2024-09-26 09:53:43 -07:00
Hugo Trachino
28039055e5 [MLIR][Transform] Hoist Pad generates linalg.transpose (#109669)
For readability purpose, generate linalg named ops when possible.
For maintainability purpose, get rid of duplicated code.
2024-09-26 09:33:47 +01:00
JOE1994
884221eddb [mlir] Tidy uses of llvm::raw_stream_ostream (NFC)
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly

( 65b13610a5 for further reference )

* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
2024-09-16 23:23:25 -04:00
Andrzej Warzyński
42944da5ba [mlir][vector] Group re-order patterns together (#102856)
Group all patterns that re-order vector.transpose and vector.broadcast
Ops (*) under `populateSinkVectorOpsPatterns`. These patterns are
normally used to "sink" redundant Vector Ops, hence grouping together.
Example:

```mlir
%at = vector.transpose %a, [1, 0]: vector<4x2xf32> to vector<2x4xf32>
%bt = vector.transpose %b, [1, 0]: vector<4x2xf32> to vector<2x4xf32>
%r = arith.addf %at, %bt : vector<2x4xf32>
```
would get converted to:
```mlir
%0 = arith.addf %a, %b : vector<4x2xf32>
%r = vector.transpose %0, [1, 0] : vector<2x4xf32>
```

This patch also moves all tests for these patterns so that all of them
are:
  * run under one test-flag: `test-vector-sink-patterns`,
  * located in one file: "vector-sink.mlir".

To facilitate this change:
  * `-test-sink-vector-broadcast` is renamed as
    `test-vector-sink-patterns`,
  * "sink-vector-broadcast.mlir" is renamed as "vector-sink.mlir",
  * tests for `ReorderCastOpsOnBroadcast` and
    `ReorderElementwiseOpsOnTranspose` patterns are moved from
    "vector-reduce-to-contract.mlir" to "vector-sink.mlir",
  * `ReorderElementwiseOpsOnTranspose` patterns are removed from
    `populateVectorReductionToContractPatterns` and added to (newly
    created) `populateSinkVectorOpsPatterns`,
  * `ReorderCastOpsOnBroadcast` patterns are removed from
    `populateVectorReductionToContractPatterns` - these are already
    present in `populateSinkVectorOpsPatterns`.

This should allow us better layering and more straightforward testing.
For the latter, the goal is to be able to easily identify which pattern
a particular test is exercising (especially when it's a specific
pattern).

NOTES FOR DOWNSTREAM USERS

In order to preserve the current functionality, please make sure to add
  * `populateSinkVectorOpsPatterns`,

wherever you are using `populateVectorReductionToContractPatterns`.
Also, rename `populateSinkVectorBroadcastPatterns` as
`populateSinkVectorOpsPatterns`.

(*) I didn't notice any other re-order patterns.
2024-08-16 16:53:53 +01:00
Hsiangkai Wang
c4bf949171 [mlir][linalg] Implement TilingInterface for winograd operators (#96184)
In order to support arbitrary size input data of conv2d, implement
TilingInterface for winograd operations. Before converting winograd
operations into nested loops with matrix multiply, tile the input of
conv2d into the supported size first.

Add a transform operation structured.decompose_winograd_op to decompose
winograd operations. Before applying the transform op, use
tile_using_for to tile the input data into supported size. The test case
shows how to tile and decompose winograd operations.
2024-08-16 16:22:02 +01:00
Nikhil Kalra
84cc1865ef [mlir] Support DialectRegistry extension comparison (#101119)
`PassManager::run` loads the dependent dialects for each pass into the
current context prior to invoking the individual passes. If the
dependent dialect is already loaded into the context, this should be a
no-op. However, if there are extensions registered in the
`DialectRegistry`, the dependent dialects are unconditionally registered
into the context.

This poses a problem for dynamic pass pipelines, however, because they
will likely be executing while the context is in an immutable state
(because of the parent pass pipeline being run).

To solve this, we'll update the extension registration API on
`DialectRegistry` to require a type ID for each extension that is
registered. Then, instead of unconditionally registered dialects into a
context if extensions are present, we'll check against the extension
type IDs already present in the context's internal `DialectRegistry`.
The context will only be marked as dirty if there are net-new extension
types present in the `DialectRegistry` populated by
`PassManager::getDependentDialects`.

Note: this PR removes the `addExtension` overload that utilizes
`std::function` as the parameter. This is because `std::function` is
copyable and potentially allocates memory for the contained function so
we can't use the function pointer as the unique type ID for the
extension.

Downstream changes required:
- Existing `DialectExtension` subclasses will need a type ID to be
registered for each subclass. More details on how to register a type ID
can be found here:
8b68e06731/mlir/include/mlir/Support/TypeID.h (L30)
- Existing uses of the `std::function` overload of `addExtension` will
need to be refactored into dedicated `DialectExtension` classes with
associated type IDs. The attached `std::function` can either be inlined
into or called directly from `DialectExtension::apply`.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-08-06 01:32:36 +02:00
Kazu Hirata
5262865aac [mlir] Construct SmallVector with ArrayRef (NFC) (#101896) 2024-08-04 11:43:05 -07:00
MaheshRavishankar
6740d701bd [mlir][Linalg] Deprecate linalg::tileToForallOp and linalg::tileToForallOpUsingTileSizes (#91878)
The implementation of these methods are legacy and they are removed in
favor of using the `scf::tileUsingSCF` methods as replacements. To get
the latter on par with requirements of the deprecated methods, the
tiling allows one to specify the maximum number of tiles to use instead
of specifying the tile sizes. When tiling to `scf.forall` this
specification is used to generate the `num_threads` version of the
operation.

A slight deviation from previous implementation is that the deprecated
method always generated the `num_threads` variant of the `scf.forall`
operation. Instead now this is driven by the tiling options specified.
This reduces the indexing math generated when the tile sizes are
specified.

**Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF`**

```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> numThreads;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result =linalg::tileToForallOp(b, op, numThreads, mapping);
```

can be replaced by
```
scf::SCFTilingOptions options;
options.setNumThreads(numThreads);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```

This generates the `numThreads` version of the `scf.forall` for the
inter-tile loops, i.e.

```
... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...)
```

**Moving from `linalg::tileToForallOpUsingTileSizes` to
`scf::tileUsingSCF`**

```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> tileSizes;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result =linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping);
```

can be replaced by
```
scf::SCFTilingOptions options;
options.setTileSizes(tileSizes);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```

Also note that `linalg::tileToForallOpUsingTileSizes` would effectively
call the `linalg::tileToForallOp` by computing the `numThreads` from the
`op` and `tileSizes` and generate the `numThreads` version of the
`scf.forall`. That is not the case anymore. Instead this will directly
generate the `tileSizes` version of the `scf.forall` op

```
... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...)
```

If you actually want to use the `numThreads` version, it is upto the
caller to compute the `numThreads` and set `options.setNumThreads`
instead of `options.setTileSizes`. Note that there is a slight
difference in the num threads version and tile size version. The former
requires an additional `affine.max` on the tile size to ensure
non-negative tile sizes. When lowering to `numThreads` version this
`affine.max` is not needed since by construction the tile sizes are
non-negative. In previous implementations, the `numThreads` version
generated when using the `linalg::tileToForallOpUsingTileSizes` method
would avoid generating the `affine.max` operation. To get the same
state, downstream users will have to additionally normalize the
`scf.forall` operation.

**Changes to `transform.structured.tile_using_forall`**

The transform dialect op that called into `linalg::tileToForallOp` and
`linalg::tileToForallOpUsingTileSizes` have been modified to call
`scf::tileUsingSCF`. The transform dialect op always generates the
`numThreads` version of the `scf.forall` op. So when `tile_sizes` are
specified for the transform dialect op, first the `tile_sizes` version
of the `scf.forall` is generated by the `scf::tileUsingSCF` method which
is then further normalized to get back to the same state. So there is no
functional change to `transform.structured.tile_using_forall`. It always
generates the `numThreads` version of the `scf.forall` op (as it did
before this change).

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2024-07-31 12:32:07 -07:00
Felix Schneider
ef8819e256 [mlir] Extend tile_using_for verifier to fix a crash (#98366)
This patch adds a check for the correct number of `loops` results of the
`transform.structured.tile_using_for` Op to the verifier, fixing a
crash.

Fix https://github.com/llvm/llvm-project/issues/98008
2024-07-12 09:05:06 +02:00
Hsiangkai Wang
d9c26b9d56 [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (#96182)
Add a transform operation structured.winograd_conv2d to convert
linalg.conv_2d_nhwc_fhwc to Linalg winograd operations.

Reviewers: ftynse, Max191, GeorgeARM, nicolasvasilache, MaheshRavishankar, dcaballe, rengolin

Reviewed By: ftynse, Max191

Pull Request: https://github.com/llvm/llvm-project/pull/96182
2024-07-11 14:45:36 +01:00
Matthias Springer
f2d3d829b9 [mlir][linalg][Transform] Fix use-after-free in SplitOp::apply (#96390)
Detected with ASAN. `Operation::getLoc()` was called after erasing the
operation.

Reverts 48cf6b6bbe, which attempted to fix
the use-after-free. (But the use-after-free is still there when the
`hasFailed` branch is taken.)
2024-06-24 21:35:58 +02:00
Christian Sigg
48cf6b6bbe [mlir] Fix use-after-free introduced in a9efcbf490. 2024-06-23 18:56:23 +02:00
muneebkhan85
a9efcbf490 [MLIR] Add continuous tiling to transform dialect (#82792)
This patch enables continuous tiling of a target structured op using
diminishing tile sizes. In cases where the tensor dimensions are not
exactly divisible by the tile size, we are left with leftover tensor
chunks that are irregularly tiled. This approach enables tiling of the
leftover chunk with a smaller tile size and repeats this process
recursively using exponentially diminishing tile sizes. This eventually
generates a chain of loops that apply tiling using diminishing tile
sizes.

Adds `continuous_tile_sizes` op to the transform dialect. This op, when
given a tile size and a dimension, computes a series of diminishing tile
sizes that can be used to tile the target along the given dimension.
Additionally, this op also generates a series of chunk sizes that the
corresponding tile sizes should be applied to along the given dimension.

Adds `multiway` attribute to `transform.structured.split` that enables
multiway splitting of a single target op along the given dimension, as
specified in a list enumerating the chunk sizes.
2024-06-21 16:39:43 +02:00
donald chen
2c1ae801e1 [mlir][side effect] refactor(*): Include more precise side effects (#94213)
This patch adds more precise side effects to the current ops with memory
effects, allowing us to determine which OpOperand/OpResult/BlockArgument
the
operation reads or writes, rather than just recording the reading and
writing
of values. This allows for convenient use of precise side effects to
achieve
analysis and optimization.

Related discussions:
https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243
2024-06-19 22:10:34 +08:00
MaheshRavishankar
b99d0b3440 [mlir][TilingInterface] Update PartialReductionOpInterface to get it more in line with TilingInterface. (#95460)
The `TilingInterface` methods have return values that allow the
interface implementation to return multiple operations, and also return
tiled values explicitly. This is to avoid the assumption that the
interface needs to return a single operation and this operations result
are the expected tiled values. Make the
`PartialReductionOpInterface::tileToPartialReduction` return
`TilingResult` as well for the same reason.

Similarly make the `PartialReductionOpInterface::mergeReductions` also
return a list of generated operations and values to use as replacements.

This is just a refactoring to allow for deprecation of
`linalg::tileReductionUsingForall` with `scf::tileReductionUsingSCF`
method.
2024-06-18 09:07:29 -07:00
Ramkumar Ramachandra
0fb216fb2f mlir/MathExtras: consolidate with llvm/MathExtras (#95087)
This patch is part of a project to move the Presburger library into
LLVM.
2024-06-11 23:00:02 +01:00
Kunwar Grover
9329b20d5d [mlir][TilingInterface] Allow multiple results in PartialReductionOpInterface (#92624)
This patch adds support for reducing operations with multiple results
using PartialReductionOpInterface. Also adds an implementation of
PartialReductionOpInterface for multiple results for linalg.generic.
2024-05-22 19:21:20 +01:00
srcarroll
2c1c67674c [mlir][transform] Consistent linalg transform op syntax for dynamic index lists (#90897)
This patch is a first pass at making consistent syntax across the
`LinalgTransformOp`s that use dynamic index lists for size parameters.
Previously, there were two different forms: inline types in the list, or
place them in the functional style tuple. This patch goes for the
latter.

In order to do this, the `printPackedOrDynamicIndexList`,
`printDynamicIndexList` and their `parse` counterparts were modified so
that the types can be optionally provided to the corresponding custom
directives.

All affected ops now use tablegen `assemblyFormat`, so custom
`parse`/`print` functions have been removed. There are a couple ops that
will likely add dynamic size support, and once that happens it should be
made sure that the assembly remains consistent with the changes in this
patch.

The affected ops are as follows: `pack`, `pack_greedily`,
`tile_using_forall`. The `tile_using_for` and `vectorize` ops already
used this syntax, but their custom assembly was removed.

---------

Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
2024-05-08 09:11:53 -05:00
Kazu Hirata
64ee821fad [mlir] Fix a warning
This patch fixes:

  llvm-project/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp:1715:28:
  error: unused variable 'kPadToMultipleOfKeyword'
  [-Werror,-Wunused-const-variable]
2024-05-04 16:53:48 -07:00
srcarroll
f2f65eddc5 [mlir][transform] Add support for transform.param pad multiples in PadOp (#90755)
This patch modifies the definition of `PadOp` to take transform params
and handles for the `pad_to_multiple_of` operand.

---------

Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
2024-05-04 17:34:40 -05:00
Cullen Rhodes
be1c72d2ba [mlir][linalg] Move transpose_matmul to targeted transform op (#89717)
More targeted than a blanket "apply everywhere" pattern. Follow up to
#89075 to address @ftynse's feedback.
2024-04-23 10:52:50 +01:00
Cullen Rhodes
7922534974 [mlir][linalg] Add patterns to convert matmul to transposed variants (#89075)
This adds patterns to convert from the Linalg matmul and batch_matmul
ops to the transposed variants. By default the LHS matrix is transposed.

Our work enabling a lowering path from linalg.matmul to ArmSME has
revealed the current lowering results in non-contiguous memory accesses
for the A matrix and very poor performance.

These patterns provide a simple option to fix this.
2024-04-23 07:21:06 +01:00
Steven Varoumas
35b292efc6 [mlir][Hoisting] Hoisting vector.extract/vector.broadcast pairs (#86108)
This transformation, inspired by what is done in
hoist_redundant_transfers, hoists pairs of extract/broadcast operations
out of scf.for loops.

It changes a loop of the form:

```
%res = scf.for _ = _ to _ step _ iter_args(%iarg = %v) -> (t1) {
  %e = vector.extract %iarg : t1 to t2
  %u = "some_use"(%e) : (t2) -> t2
  %b = vector.broadcast %u : t2 to t1
  scf.yield %b : t1
}
```

into the following:

```
%e = vector.extract %v: t1 to t2
%res' = scf.for _ = _ to _ step _ iter_args(%iarg = %e) -> (t2) {
  %u' = "some_use"(%iarg) : (t2) -> t2
  scf.yield %u' : t2
}
%res = vector.broadcast %res' : t2 to t1
```
2024-04-22 14:54:01 +02:00
Christian Sigg
a5757c5b65 Switch member calls to isa/dyn_cast/cast/... to free function calls. (#89356)
This change cleans up call sites. Next step is to mark the member
functions deprecated.

See https://mlir.llvm.org/deprecation and
https://discourse.llvm.org/t/preferred-casting-style-going-forward.
2024-04-19 15:58:27 +02:00
srcarroll
b79db39659 [mlir][linalg] Support ParamType in vector_sizes option of VectorizeOp transform (#87557) 2024-04-09 15:52:40 -05:00
Oleksandr "Alex" Zinenko
91856b34e3 [mlir] move MatchOpInterface under Transform/Interfaces (#86899)
This is similar to the TransformOpInterface move.
2024-03-28 14:00:22 +01:00
srcarroll
bbcfe6f431 [mlir][transform] Emit error message with emitSilenceableFailure (#86146)
The previous implementation used a `notifyMatchFailure` to emit failure
message inappropriately and then used the
`emitDefaultSilenceableFailure`. This patch changes this to use the more
appropriate `emitSilenceableFailure` with error message. Additionally a
failure test has been added.
2024-03-22 12:37:39 -05:00
srcarroll
df9ed9cf52 [mlir][transform] Fix failure in flattening already flattened linalg ops (#86037)
The previous implementation was doing an early successful return on
`rank <= 1` without adding the original op to transform results. This
resulted in errors about number of returns. This patch fixes this by
adding the original op to results. Additionally, we first check if op is
elementwise and return a slienceable failure early if not.
2024-03-21 00:25:07 -05:00
Oleksandr "Alex" Zinenko
5a9bdd85ee [mlir] split transform interfaces into a separate library (#85221)
Transform interfaces are implemented, direction or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.

Split out the interfaces into a separate library. This in turn requires
flipping the dependency from the interface on the dialect that has crept
in because both co-existed in one library. The interface shouldn't
depend on the transform dialect either.

As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
2024-03-20 22:15:17 +01:00
lhunloh
47bc565ca7 [MLIR] [Transforms] Let transform.structured.convert_to_loops return handles to loops (#83984)
This lets `transform.structured.convert_to_loops` return handles to the
generated loops, making this transformation more useful to use for
(transformation-)nesting purposes. This is modelled after SCFs
`transform.loop.forall_to_for` which returns handles to loops.

Introduced in commit aa2a96a24a, with a
note that they might move out of the `Linalg`-Dialect, but no reason
given for the non-return of handles. As far as I can see, this transform
always returns loops.
2024-03-06 17:07:30 -05:00
Congcong Cai
0597644a64 [mlir][transform] replace original op to loop ops (#83537) 2024-03-05 03:58:12 +08:00
srcarroll
b6f4dd9ee8 [mlir][transform] Implement FlattenElementwiseLinalgOp transform op (#81431)
A `transform.structured.flatten_elementwise` op is implemented for
flattening the iteration space and (applicable) operands/results to a
single dimension.
2024-02-28 11:19:06 -06:00
Balaji V. Iyer
adf838daee [mlir][Vectorizer] Added support to Vectorize tensor.unpack (#76087)
Added support to vectorized tensor.unpack. The unpack Op is split into a
`vector.transfer_read`, `vector.transpose`, `vector.shape_cast` and a
`vector.transfer_write`.
2024-02-20 16:10:14 -06:00
Matthias Springer
914e607487 [mlir][IR][NFC] Rename notify*Removed to notify*Erased (#82253)
Rename listener callback names:
* `notifyOperationRemoved` -> `notifyOperationErased`
* `notifyBlockRemoved` -> `notifyBlockErased`

The current callback names are misnomers. The callbacks are triggered
when an operation/block is erased, not when it is removed (unlinked).

E.g.:
```c++
/// Notify the listener that the specified operation is about to be erased.
/// At this point, the operation has zero uses.
///
/// Note: This notification is not triggered when unlinking an operation.
virtual void notifyOperationErased(Operation *op) {}
```

This change is in preparation of adding listener support to the dialect
conversion. The dialect conversion internally unlinks IR before erasing
it at a later point of time. There is an important difference between
"remove" and "erase". Lister callback names should be accurate to avoid
confusion.
2024-02-20 09:08:19 +01:00
Max191
7880b2c858 [mlir] Add direct vectorization lowering for tensor.pack ops (#78660)
This PR adds a direct vectorization lowering of `tensor.pack` into
`mask(vector.transfer_read)`->`vector.shape_cast`->`vector.transpose`->`vector.transfer_write`.
2024-02-07 14:11:11 -05:00
srcarroll
488f88b844 [mlir][transform] Add elementwise criteria to match.structured.body (#79626)
As far as I am aware, there is no simple way to match on elementwise
ops. I propose to add an `elementwise` criteria to the
`match.structured.body` op. Although my only hesitation is that
elementwise is not only determined by the body, but also the indexing
maps. So if others find this too awkward, I can implement a separate
match op instead.
2024-01-31 10:12:33 +01:00
jinchen62
d439f3640b Add support of param type for transform.structured.tile_using_forall (#72097)
Make transform.structured.tile_using_forall be able to take param type
tile sizes.

Examples:
```
%tile_sizes = transform.param.constant 16 : i64 -> !transform.param<i64>
transform.structured.tile_using_forall %matmul tile_sizes [%tile_sizes : !transform.param<i64>, 32] ( mapping = [#gpu.block<x>, #gpu.block<y>] ) : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
```
```
%c10 = transform.param.constant 10 : i64 -> !transform.any_param
%c20 = transform.param.constant 20 : i64 -> !transform.any_param
%tile_sizes = transform.merge_handles %c10, %c20 : !transform.any_param
transform.structured.tile_using_forall %matmul tile_sizes *(%tile_sizes : !transform.any_param) ( mapping = [#gpu.block<x>, #gpu.block<y>] ) : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
```
2024-01-31 10:02:39 +01:00
MaheshRavishankar
76ead96c1d [mlir][TilingInterface] Use LoopLikeOpInterface in tiling using SCF to unify tiling with scf.for and scf.forall. (#77874)
Using `LoopLikeOpInterface` as the basis for the implementation unifies
all the tiling logic for both `scf.for` and `scf.forall`. The only
difference is the actual loop generation. This is a follow up to
https://github.com/llvm/llvm-project/pull/72178
Instead of many entry points for each loop type, the loop type is now
passed as part of the options passed to the tiling method.

This is a breaking change with the following changes

1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF`
2) The `scf::tileUsingSCFForallOp` is deprecated. The same
   functionality is obtained by using `scf::tileUsingSCF` and setting
   the loop type in `scf::SCFTilingOptions` passed into this method to
   `scf::SCFTilingOptions::LoopType::ForallOp` (using the
   `setLoopType` method).
3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is
   renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of
   the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing
   any strategy with the default callback implemeting the greedy fusion.
4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use
   `SmallVector<LoopLikeOpInterface>`.
5) To make `scf::ForallOp` implement the parts of
   `LoopLikeOpInterface` needed, the `getOutputBlockArguments()`
   method is replaced with `getRegionIterArgs()`

These changes now bring the tiling and fusion capabilities using
`scf.forall` on par with what was already supported by `scf.for`
2024-01-25 21:26:23 -08:00
Matthias Springer
5cc0f76d34 [mlir][IR] Add rewriter API for moving operations (#78988)
The pattern rewriter documentation states that "*all* IR mutations [...]
are required to be performed via the `PatternRewriter`." This commit
adds two functions that were missing from the rewriter API:
`moveOpBefore` and `moveOpAfter`.

After an operation was moved, the `notifyOperationInserted` callback is
triggered. This allows listeners such as the greedy pattern rewrite
driver to react to IR changes.

This commit narrows the discrepancy between the kind of IR modification
that can be performed and the kind of IR modifications that can be
listened to.
2024-01-25 11:01:28 +01:00
Mehdi Amini
de5cedefab Apply clang-tidy fixes for llvm-qualified-auto in LinalgTransformOps.cpp (NFC) 2024-01-18 16:39:20 -08:00
Mehdi Amini
42427d805a Apply clang-tidy fixes for bugprone-macro-parentheses in LinalgTransformOps.cpp (NFC) 2024-01-18 16:39:20 -08:00
Quinn Dawkins
5caab8bbc0 [mlir][transform] Add transform.get_operand op (#78397)
Similar to `transform.get_result`, except it returns a handle to the
operand indicated by a positional specification, same as is defined for
the linalg match ops.

Additionally updates `get_result` to take the same positional specification.
This makes the use case of wanting to get all of the results of an
operation easier by no longer requiring the user to reconstruct the list
of results one-by-one.
2024-01-18 09:33:14 -05:00