Commit Graph

1504 Commits

Author SHA1 Message Date
Aviad Cohen
ccc02563f4 [mlir][linalg]: Fixed possible memory leak in cloneToCollapsedOp (#87595)
* Direct call to `clone` function leads to memory leak. Instead, we should use `RewriterBase` clone function instead.
2024-04-07 08:23:16 +03:00
Matthias Springer
5e4a44380e [mlir][Interfaces][NFC] ValueBoundsConstraintSet: Pass stop condition in the constructor (#86099)
This commit changes the API of `ValueBoundsConstraintSet`: the stop
condition is now passed to the constructor instead of `processWorklist`.
That makes it easier to add items to the worklist multiple times and
process them in a consistent manner. The current
`ValueBoundsConstraintSet` is passed as a reference to the stop
function, so that the stop function can be defined before the the
`ValueBoundsConstraintSet` is constructed.

This change is in preparation of adding support for branches.
2024-04-04 17:05:47 +09:00
Matthias Springer
a27d886ce4 [mlir][linalg][bufferize] Fix element-wise access optimization for sparse tensors (#87305)
`linalg.generic` ops with sparse tensors do not necessarily bufferize to
element-wise access, because insertions into a sparse tensor may change
the layout of (or reallocate) the underlying sparse data structures.
2024-04-03 09:57:25 +09:00
Jakub Kuderski
971b852546 [mlir][NFC] Simplify type checks with isa predicates (#87183)
For more context on isa predicates, see:
https://github.com/llvm/llvm-project/pull/83753.
2024-04-01 11:40:09 -04:00
Jerry Wu
0c1c0d5393 [MLIR] Add patterns to bubble-up pack and push-down unpack through collapse/expand shape ops (#85297)
Add DataLayoutPropagation patterns to bubble-up pack and push-down
unpack through collapse/expand shape ops.

---------

Co-authored-by: Quinn Dawkins <quinn.dawkins@gmail.com>
2024-03-27 21:32:27 -04:00
Pablo Antonio Martinez
c41286af3f [mlir][linalg] Emit a warning when tile_using_forall generates non thread-safe code (#80813)
**Description**

The documentation of `transform.structured.tile_using_forall` says:

_"It is the user’s responsibility to ensure that num_threads/tile_sizes
is a valid tiling specification (i.e. that only tiles parallel
dimensions, e.g. in the Linalg case)."_

In other words, tiling a non-parallel dimension would generate code with
data races which is not safe to parallelize. For example, consider this
example (included in the tests in this PR):

```
func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
  %0 = scf.forall (%arg2) in (8) shared_outs(%arg3 = %arg1) -> (tensor<300x8xf32>) {
    %1 = affine.min #map(%arg2)
    %2 = affine.max #map1(%1)
    %3 = affine.apply #map2(%arg2)
    %extracted_slice = tensor.extract_slice %arg0[%3, 0, 0] [%2, 300, 8] [1, 1, 1] : tensor<100x300x8xf32> to tensor<?x300x8xf32>
    %4 = linalg.generic {indexing_maps = [#map3, #map4], iterator_types = ["reduction", "parallel", "parallel"]} ins(%extracted_slice : tensor<?x300x8xf32>) outs(%arg3 : tensor<300x8xf32>) {
    ^bb0(%in: f32, %out: f32):
      %5 = arith.addf %in, %out : f32
      linalg.yield %5 : f32
    } -> tensor<300x8xf32>
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %4 into %arg3[0, 0] [300, 8] [1, 1] : tensor<300x8xf32> into tensor<300x8xf32>
    }
  }
  return %0 : tensor<300x8xf32>
}
```

We can easily see that this is not safe to parallelize because all
threads would be writing to the same position in `%arg3` (in the
`scf.forall.in_parallel`.

This PR detects wether it's safe to `tile_using_forall` and emits a
warning in the case it is not.

**Brief explanation**
It first generates a vector of affine expressions representing the tile
values and stores it in `dimExprs`. These affine expressions are
compared with the affine expressions coming from the results of the
affine map of each output in the linalg op. So going back to the
previous example, the original transform is:

```
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d1, d2)>

func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
  // expected-warning@+1 {{tiling is not thread safe at axis #0}}
  %0 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction", "parallel", "parallel"]} ins(%arg0 : tensor<100x300x8xf32>) outs(%arg1 : tensor<300x8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %1 = arith.addf %in, %out : f32
    linalg.yield %1 : f32
  } -> tensor<300x8xf32>
  return %0 : tensor<300x8xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg0 : (!transform.any_op) -> !transform.any_op
    %forall, %tiled_generic = transform.structured.tile_using_forall %0 num_threads [8]
          : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    transform.yield
  }
}
```

The `num_threads` attribute would be represented as `(d0)`. Because the
linalg op has only one output (`arg1`) it would only check against the
results of `#map1`, which are `(d1, d2)`. The idea is to check that all
affine expressions in `dimExprs` are present in the output affine map.
In this example, `d0` is not in `(d1, d2)`, so tiling that axis is
considered not thread safe.
2024-03-22 12:53:29 +01:00
Andrzej Warzyński
c56bd7ab79 [mlir][linalg] Enable masked vectorisation for depthwise convolutions (#81625)
This patch adds support for masked vectorisation of depthwise 1D WC
convolutions,`linalg.depthwise_conv_1d_nwc_wc`. This is implemented by
adding support for masking.

Two major assumptions are made:
  * only the channel dimension can be dynamic/scalable (i.e. the
    trailing dim),
  * when specifying vector sizes to use in the vectoriser, only the size
    corresponding to the channel dim is effectively used (other dims are
    inferred from the context).

In terms of scalable vectorisation, this should be sufficient to cover
all practical cases (i.e. making arbitrary dim scalable wouldn't make
much sense). As for more generic cases with dynamic shapes (e.g. W or N
dims being dynamic), more work would be needed. In particular, one would
have to consider the filter and input/output tensors separately.
2024-03-14 20:19:46 +00:00
Quinn Dawkins
60e562d11a [mlir][linalg] Add unit dim folding pattern for tensor.pad (#84684)
Unit extent dims that are not padded by a tensor.pad can be folded away.
When folding unit extent dims of surrounding linalg ops, this increases
the chance that the iteration space of the linalg op will align with
nearby pad ops, improving fusion opportunities.
2024-03-11 18:24:23 -04:00
Matthias Springer
f1aa783788 [mlir][IR] Fix overload resolution on MSVC build (#84589)
#82629 added additional overloads to `replaceAllUsesWith` and
`replaceUsesWithIf`. This caused a build breakage with MSVC when called
with ops that can implicitly convert to `Value`.

```
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(881): error C2666: 'mlir::RewriterBase::replaceAllUsesWith': 2 overloads have similar conversions
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(631): note: could be 'void mlir::RewriterBase::replaceAllUsesWith(mlir::Operation *,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(626): note: or       'void mlir::RewriterBase::replaceAllUsesWith(mlir::ValueRange,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(616): note: or       'void mlir::RewriterBase::replaceAllUsesWith(mlir::Value,mlir::Value)'
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(882): note: while trying to match the argument list '(mlir::tensor::ExtractSliceOp, T)'
        with
        [
            T=mlir::Value
        ]
```

Note: The LLVM build bots (Linux and Windows) did not break, this seems
to be an issue with `Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe`.

This change renames the newly added overloads to `replaceAllOpUsesWith`
and `replaceOpUsesWithIf`.
2024-03-11 17:36:18 +09:00
Justin Lebar
fab2bb8bfd Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678)
For some reason this was missing from STLExtras.
2024-03-10 20:00:13 -07:00
Jie Fu
474a73d979 [mlir] Fix build failure in MeshShardingInterfaceImpl.cpp (NFC)
llvm-project/mlir/lib/Dialect/Linalg/Transforms/MeshShardingInterfaceImpl.cpp:96:8:
error: unused variable 'resultElementType' [-Werror,-Wunused-variable]
  Type resultElementType =
       ^
llvm-project/mlir/lib/Dialect/Linalg/Transforms/MeshShardingInterfaceImpl.cpp:122:1:
error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type]
}
^
2 errors generated.
2024-03-08 09:38:29 +08:00
Boian Petkantchin
fb582b6ace [mlir] Implement Mesh's ShardingInterface for Linalg ops (#82284)
Allows linalg structured operations to be handled during spmdization and
sharding propagation.

There is only support for projected permutation indexing maps.
2024-03-07 17:05:44 -08:00
Matthias Springer
59a92019fb [mlir][IR] Make replaceOp / replaceAllUsesWith API consistent (#82629)
* `replaceOp` replaces all uses of the original op and erases the old
op.
* `replaceAllUsesWith` replaces all uses of the original op/value/block.
It does not erase any IR.

This commit renames `replaceOpWithIf` to `replaceUsesWithIf`.
`replaceOpWithIf` was a misnomer because the function never erases the
original op. Similarly, `replaceOpWithinBlock` is renamed to
`replaceUsesWithinBlock`. (No "operation replaced" is sent because the
op is not erased.)

Also improve comments.
2024-03-07 10:26:22 +09:00
Thomas Preud'homme
a9304edf20 Fix remaining build failures with GCC 8.3 (#83266)
When compiling for GCC 8.x (< 8.4), SFINAE is disabled for
iterator_range constructor causing ambiguous resolution to construct an
OperandRange from a MutableOperatorRange, even in the presence of a
static_cast<OperatorRange>. This adds an explicit conversion method to
lift the ambiguity.

Tested with a full MLIR build with GCC 8.3.
2024-03-05 19:32:27 +00:00
Quinn Dawkins
3f18f6a2cf [mlir][linalg] Enable fusion by expansion of reduction and named ops (#83473)
This adds support for expansion of named linalg ops and linalg ops with
reduction iterators. This improves the ability to make fusion decisions
WRT reduction operations. To recover the previous behavior, users of the
patterns can add a control function to restrict propagation of reshape
by expansion through linalg ops with reduction iterators.

For named linalg ops, this always converts the named op into a generic.
2024-03-03 01:54:03 -05:00
Han-Chung Wang
46bd65a050 [mlir][LinAlg] Vectorize reverse-like ops using vector.gather ops. (#83205)
The reverse op is treated as a VectorMemoryAccessKind::Contiguous load.
It is contiguous slice, but we'll need to compute indices differently
and apply a reverse at vector level. It takes non-trivial efforts for
the approach. The revision flips the case to use vector.gather.
Otherwise there are functionality issues. E.g., the below example loaded
`2, 3, 4` (which is a bug), but what we want is `2, 1, 0`.

Before vectorization:

```mlir
func.func @vectorize_reverse_like_tensor_extract(%arg0: tensor<1x2x3xf32>, %arg1: tensor<1x1x3xf32>, %arg2: index) -> tensor<1x1x3xf32> {
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %c2 = arith.constant 2 : index
  %0 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel", "parallel", "parallel"]} outs(%arg1 : tensor<1x1x3xf32>) {
  ^bb0(%out: f32):
    %1 = linalg.index 1 : index
    %2 = linalg.index 0 : index
    %3 = affine.apply #map1(%1, %2, %arg2)
    %4 = linalg.index 2 : index
    %5 = arith.subi %c2, %4 : index
    %extracted = tensor.extract %arg0[%c0, %3, %5] : tensor<1x2x3xf32>
    linalg.yield %extracted : f32
  } -> tensor<1x1x3xf32>
  return %0 : tensor<1x1x3xf32>
}
```

Partial IR after vectorization:

```
  %5 = vector.constant_mask [1, 1, 3] : vector<1x1x4xi1>
  %6 = vector.broadcast %arg0 : index to vector<1x1x4xindex>
  %7 = vector.shape_cast %6 : vector<1x1x4xindex> to vector<4xindex>
  %8 = vector.extractelement %7[%c0_i32 : i32] : vector<4xindex>
  %9 = vector.transfer_read %3[%c0, %8, %c2], %cst, %5 {in_bounds = [true, true, true]} : tensor<1x2x3xf32>, vector<1x1x4xf32>
```
2024-02-28 09:45:09 -08:00
srcarroll
b6f4dd9ee8 [mlir][transform] Implement FlattenElementwiseLinalgOp transform op (#81431)
A `transform.structured.flatten_elementwise` op is implemented for
flattening the iteration space and (applicable) operands/results to a
single dimension.
2024-02-28 11:19:06 -06:00
Quinn Dawkins
1e98d4883d [mlir][linalg] NFC: Use tablegen macro for pass constructors (#82892)
This uses the tablegen macros for generating pass constructors, exposing
pass options for fold-unit-extent-dims and linalg-detensorize.

Additionally aligns some of the pass namings to their text counterpart.
This includes an API change:

createLinalgGeneralizationPass -> createLinalgGeneralizeNamedOpsPass
2024-02-24 14:35:39 -05:00
Matthias Springer
91d5653e3a [mlir] Use OpBuilder::createBlock in op builders and patterns (#82770)
When creating a new block in (conversion) rewrite patterns,
`OpBuilder::createBlock` must be used. Otherwise, no
`notifyBlockInserted` notification is sent to the listener.

Note: The dialect conversion relies on listener notifications to keep
track of IR modifications. Creating blocks without the builder API can
lead to memory leaks during rollback.
2024-02-24 09:10:07 +01:00
Balaji V. Iyer
adf838daee [mlir][Vectorizer] Added support to Vectorize tensor.unpack (#76087)
Added support to vectorized tensor.unpack. The unpack Op is split into a
`vector.transfer_read`, `vector.transpose`, `vector.shape_cast` and a
`vector.transfer_write`.
2024-02-20 16:10:14 -06:00
srcarroll
9466c4e629 [MLIR][tensor] Improve tensor.pack verifier to catch more cases with unconditional runtime errors (#77217)
Previously, the `tensor.pack` verifier detects unconditional runtime
errors only when tile sizes are static. Now, dynamic tiles are
considered and we only require that the input and either corresponding
tile or output size are static to determine if it will unconditionally
produce errors at runtime.
2024-02-19 12:27:24 -06:00
Quinn Dawkins
886294a2fe [mlir][linalg] Add pattern to propagate pack up through tensor.pad (#82035)
This mirrors the existing pattern for pushing unpack down through
padding, restricting to cases where the padded dimensions aren't tiled
by the pack.

Additionally reformats the propagation test to make it easier to read.
2024-02-18 11:20:15 -05:00
Javed Absar
7c4c274643 [MLIR][NFC] Fix some comments in padding transform. (#81741) 2024-02-14 17:00:42 +00:00
Quinn Dawkins
2c3ba9f622 [mlir][Linalg] Unrestrict redundant transfer hoisting from func.func (#79516)
All the hoistRedundantVectorTransfers op does is walk the target
operation, which does not have to be restricted to func.func.
2024-02-10 23:01:14 -05:00
Uday Bondhugula
fe8a62c463 [MLIR] Fix crash in AffineMap::replace for zero result maps (#80930)
Fix obvious bug in AffineMap::replace for the case of zero result maps.
Extend/complete inferExprsFromList to work with empty expression lists.
2024-02-08 19:16:29 +05:30
Max191
7880b2c858 [mlir] Add direct vectorization lowering for tensor.pack ops (#78660)
This PR adds a direct vectorization lowering of `tensor.pack` into
`mask(vector.transfer_read)`->`vector.shape_cast`->`vector.transpose`->`vector.transfer_write`.
2024-02-07 14:11:11 -05:00
Han-Chung Wang
1b7b40bf5d [mlir][Linalg] Support lowerUnPack for identity out_dims_perm cases. (#79594) 2024-01-26 05:46:53 -08:00
MaheshRavishankar
76ead96c1d [mlir][TilingInterface] Use LoopLikeOpInterface in tiling using SCF to unify tiling with scf.for and scf.forall. (#77874)
Using `LoopLikeOpInterface` as the basis for the implementation unifies
all the tiling logic for both `scf.for` and `scf.forall`. The only
difference is the actual loop generation. This is a follow up to
https://github.com/llvm/llvm-project/pull/72178
Instead of many entry points for each loop type, the loop type is now
passed as part of the options passed to the tiling method.

This is a breaking change with the following changes

1) The `scf::tileUsingSCFForOp` is renamed to `scf::tileUsingSCF`
2) The `scf::tileUsingSCFForallOp` is deprecated. The same
   functionality is obtained by using `scf::tileUsingSCF` and setting
   the loop type in `scf::SCFTilingOptions` passed into this method to
   `scf::SCFTilingOptions::LoopType::ForallOp` (using the
   `setLoopType` method).
3) The `scf::tileConsumerAndFusedProducerGreedilyUsingSCFForOp` is
   renamed to `scf::tileConsumerAndFuseProducerUsingSCF`. The use of
   the `controlFn` in `scf::SCFTileAndFuseOptions` allows implementing
   any strategy with the default callback implemeting the greedy fusion.
4) The `scf::SCFTilingResult` and `scf::SCFTileAndFuseResult` now use
   `SmallVector<LoopLikeOpInterface>`.
5) To make `scf::ForallOp` implement the parts of
   `LoopLikeOpInterface` needed, the `getOutputBlockArguments()`
   method is replaced with `getRegionIterArgs()`

These changes now bring the tiling and fusion capabilities using
`scf.forall` on par with what was already supported by `scf.for`
2024-01-25 21:26:23 -08:00
Mehdi Amini
2e0909025e Apply clang-tidy fixes for readability-simplify-boolean-expr in Vectorization.cpp (NFC) 2024-01-22 17:34:55 -08:00
Mehdi Amini
3af5ab21b8 Apply clang-tidy fixes for readability-identifier-naming in Transforms.cpp (NFC) 2024-01-22 17:34:55 -08:00
Mehdi Amini
c0fe2b8963 Apply clang-tidy fixes for modernize-loop-convert in Transforms.cpp (NFC) 2024-01-22 17:34:55 -08:00
Mehdi Amini
b1d4265a5f Apply clang-tidy fixes for llvm-qualified-auto in Promotion.cpp (NFC) 2024-01-19 17:58:15 -08:00
Mehdi Amini
197a73f019 Apply clang-tidy fixes for llvm-include-order in Fusion.cpp (NFC) 2024-01-19 17:58:15 -08:00
Mehdi Amini
46ce993dd4 Apply clang-tidy fixes for llvm-else-after-return in ElementwiseOpFusion.cpp (NFC) 2024-01-19 17:58:14 -08:00
Mehdi Amini
f19f213974 Apply clang-tidy fixes for llvm-else-after-return in DropUnitDims.cpp (NFC) 2024-01-19 17:58:14 -08:00
Mehdi Amini
3b61f5a1bc Apply clang-tidy fixes for performance-unnecessary-value-param in DataLayoutPropagation.cpp (NFC) 2024-01-19 17:58:14 -08:00
Mehdi Amini
60caa8ef74 Apply clang-tidy fixes for performance-unnecessary-value-param in ConvertConv2DToImg2Col.cpp (NFC) 2024-01-18 16:39:20 -08:00
Matthias Springer
5fcf907b34 [mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.

Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
2024-01-17 11:08:59 +01:00
Kazu Hirata
8e8bbbd48e [mlir] Use llvm::is_contained (NFC) 2024-01-12 22:08:29 -08:00
Matthias Springer
0a8e3dd432 [mlir][Interfaces] DestinationStyleOpInterface: Rename hasTensor/BufferSemantics (#77574)
Rename interface functions as follows:
* `hasTensorSemantics` -> `hasPureTensorSemantics`
* `hasBufferSemantics` -> `hasPureBufferSemantics`

These two functions return "true" if the op has tensor/buffer operands
but not buffer/tensor operands.

Also drop the "ranked" part from the interface, i.e., do not distinguish
between ranked/unranked types.

The new function names describe the functions more accurately. They also
align their semantics with the notion of "tensor semantics" with the
bufferization framework. (An op is supposed to be bufferized if it has
tensor operands, and we don't care if it also has memref operands.)

This change is in preparation of #75273, which adds
`BufferizableOpInterface::hasTensorSemantics`. By renaming the functions
in the `DestinationStyleOpInterface`, we can avoid name clashes between
the two interfaces.
2024-01-12 10:02:54 +01:00
Matthias Springer
bb6d5c2200 [mlir][Transforms] GreedyPatternRewriteDriver: Do not CSE constants during iterations (#75897)
The `GreedyPatternRewriteDriver` tries to iteratively fold ops and apply
rewrite patterns to ops. It has special handling for constants: they are
CSE'd and sometimes moved to parent regions to allow for additional
CSE'ing. This happens in `OperationFolder`.

To allow for efficient CSE'ing, `OperationFolder` maintains an internal
lookup data structure to find the existing constant ops with the same
value for each `IsolatedFromAbove` region:
```c++
/// A mapping between an insertion region and the constants that have been
/// created within it.
DenseMap<Region *, ConstantMap> foldScopes;
```

Rewrite patterns are allowed to modify operations. In particular, they
may move operations (including constants) from one region to another
one. Such an IR rewrite can make the above lookup data structure
inconsistent.

We encountered such a bug in a downstream project. This bug materialized
in the form of an op that uses the result of a constant op from a
different `IsolatedFromAbove` region (that is not accessible).

This commit changes the behavior of the `GreedyPatternRewriteDriver`
such that `OperationFolder` is used to CSE constants at the beginning of
each iteration (as the worklist is populated), but no longer during an
iteration. `OperationFolder` is no longer used after populating the
worklist, so we do not have to care about inconsistent state in the
`OperationFolder` due to IR rewrites. The `GreedyPatternRewriteDriver`
now performs the op folding by itself instead of calling
`OperationFolder::tryToFold`.

This change changes the order of constant ops in test cases, but not the
region in which they appear. All broken test cases were fixed by turning
`CHECK` into `CHECK-DAG`.

Alternatives considered: The state of `OperationFolder` could be
partially invalidated with every `notifyOperationModified` notification.
That is more fragile than the solution in this commit because incorrect
rewriter API usage can lead to missing notifications and hard-to-debug
`IsolatedFromAbove` violations. (It did not fix the above mention bug in
a downstream project, which could be due to incorrect rewriter API usage
or due to another conceptual problem that I missed.) Moreover, ops are
frequently getting modified during a greedy pattern rewrite, so we would
likely keep invalidating large parts of the state of `OperationFolder`
over and over.

Migration guide: Turn `CHECK` into `CHECK-DAG` in test cases. Constant
ops are no longer folded during a greedy pattern rewrite. If you rely on
folding (and rematerialization) of constant ops during a greedy pattern
rewrite, turn the folder into a pattern.
2024-01-05 09:22:18 +01:00
Andrzej Warzyński
db9a16eaed [mlir][nfc] Update comments in the Linalg vectoriser (#76797) 2024-01-04 17:24:22 +00:00
Spenser Bauman
6b65d79fbb [mlir][linalg] Fix for invalid IR in eliminate_empty_tensors (#73513)
The transform.structured.eliminate_empty_tensors can produce mis-typed
IR when traversing use-def chains past tensor reshaping operations for
sharing candidates. This results in Linalg operations whose output types
do not match their 'outs' arguments.

This patch filters out candidate tensor.empty operations when their
types do not match the candidate input operand.
2024-01-01 17:12:40 +00:00
Jakub Kuderski
560564f51c [mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901)
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.

Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.

Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.

Issue: https://github.com/llvm/llvm-project/issues/72354
2023-12-20 00:14:43 -05:00
Matthias Springer
3a087c1592 [mlir][linalg] Fix invalid IR in Linalg op fusion (#74425)
Linalg op fusion (`Linalg/Transforms/Fusion.cpp`) used to generate
invalid fused producer ops:
```
error: 'linalg.conv_2d_nhwc_hwcf' op expected type of operand #2 ('tensor<1x8x16x4xf32>') to match type of corresponding result ('tensor<?x?x?x?xf32>')
note: see current operation:
%24 = "linalg.conv_2d_nhwc_hwcf"(%21, %22, %23) <{dilations = dense<1> : tensor<2xi64>, operandSegmentSizes = array<i32: 2, 1>, strides = dense<2> : tensor<2xi64>}> ({
^bb0(%arg9: f32, %arg10: f32, %arg11: f32):
  %28 = "arith.mulf"(%arg9, %arg10) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  %29 = "arith.addf"(%arg11, %28) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
  "linalg.yield"(%29) : (f32) -> ()
}) {linalg.memoized_indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 * 2 + d4, d2 * 2 + d5, d6)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>]} : (tensor<1x?x?x3xf32>, tensor<3x3x3x4xf32>, tensor<1x8x16x4xf32>) -> tensor<?x?x?x?xf32>
```

This is a problem because the input IR to greedy pattern rewriter during
`-test-linalg-greedy-fusion` is invalid. This commit fixes tests such as
`mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir` when verifying the
IR after each pattern application (#74270).
2023-12-19 14:17:10 +09:00
srcarroll
b26ee97537 [MLIR][Linalg] Support dynamic sizes in lower_unpack (#75494) 2023-12-18 19:02:04 +01:00
Quinn Dawkins
82ab0f7f36 [mlir][linalg] Fix rank-reduced cases for extract/insert slice in DropUnitDims (#74723)
Inferring the reshape reassociation indices for extract/insert slice ops
based on the read sizes of the original slicing op will generate an
invalid expand/collapse shape op for already rank-reduced cases. Instead
just infer from the shape of the slice.

Ported from Differential Revision: https://reviews.llvm.org/D147488
2023-12-16 10:08:51 -05:00
Andrzej Warzyński
f11bda78c8 [mlir][linalg] Use vector.shuffle to flatten conv filter (#75038)
Updates the vectorisation of 1D depthwise convolution when flattening
the channel dimension (introduced in #71918). In particular - how the
convolution filter is "flattened". ATM, the vectoriser will use
`vector.shape_cast`:

```mlir
  %b_filter = vector.broadcast %filter : vector<4xf32> to vector<3x2x4xf32>
  %sc_filter = vector.shape_cast %b_filter : vector<3x2x4xf32> to vector<3x8xf32>
```

This lowering is not ideal - `vector.shape_cast` can be convenient when
it's folded away, but that's not happening in this case. Instead, this
patch updates the vectoriser to use `vector.shuffle` (the overall result
is identical):

```mlir
  %sh_filter = vector.shuffle %filter, %filter
      [0, 1, 2, 3, 0, 1, 2, 3] : vector<4xf32>, vector<4xf32>
  %b_filter = vector.broadcast %sh_filter : vector<8xf32> to vector<3x8xf32>
```
2023-12-15 17:56:59 +00:00
Amir Bishara
cf2d625a5d [mlir][linalg] Expose getPreservedProducerResults method from ElementwiseOpFusion file (#73850)
Declare `getPreservedProducerResults` function which helps to get the
preserved results of the producer linalg generic operation as a result
of elementwise fusion.
2023-12-08 11:50:33 +02:00
Matthias Springer
986287e7f3 [mlir][SparseTensor] Fix invalid API usage in patterns (#74690)
Rewrite patterns must return `success` if the IR was modified. This
commit fixes sparse tensor tests such as
`SparseTensor/sparse_fusion.mlir`,
`SparseTensor/CPU/sparse_reduce_custom.mlir`,
`SparseTensor/CPU/sparse_semiring_select.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.
2023-12-07 12:05:20 +09:00