Commit Graph

483 Commits

Author SHA1 Message Date
Max191
f1595ecfdc [mlir] Fix bug in UnPackOp tiling implementation causing infinite loop (#113571)
This fixes a bug in the tiling implementation of tensor.unpack that was
causing an infinite loop when certain unpack ops get tiled and fused as
a producer. The tiled implementation of tensor.unpack sometimes needs to
create an additional tensor.extract_slice on the result of the tiled
unpack op, but this slice was getting added to the `generatedSlices` of
the tiling result. The `generatedSlices` are used to find the next
producers to fuse, so it caused an infinite loop of fusing the same
unpack op after it was already in the loop. This fixes the bug by adding
the slice of the source, rather than the slice of the result, to
`generatedSlices`.
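The following C++ sketch shows the failure mode of a fusion worklist. It is heavily simplified. Ops are plain integers, and `producerOf` and `runFusionWorklist` are hypothetical names, not MLIR APIs. Each fusion step queues the producers of the slices it reports. If a step also reports the extra slice on its own result, it re-queues the op that was just fused, so the worklist never drains.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy model (hypothetical, not the MLIR API): ops are indices into
// `producerOf`, where producerOf[op] is the op producing op's source operand,
// or -1 if it has no producer. Returns the number of fusion steps performed
// before the worklist drains or `maxSteps` is reached.
int runFusionWorklist(const std::vector<int> &producerOf, int consumer,
                      bool reportResultSlice, int maxSteps) {
  assert(consumer >= 0 && consumer < static_cast<int>(producerOf.size()));
  std::deque<int> worklist{producerOf[consumer]};
  int steps = 0;
  while (!worklist.empty() && steps < maxSteps) {
    int op = worklist.front();
    worklist.pop_front();
    if (op < 0)
      continue; // nothing to fuse
    ++steps;    // "fuse" op into the loop
    // Fixed behaviour: report the slice taken on op's source, so the next
    // candidate is whatever produces that source.
    worklist.push_back(producerOf[op]);
    // Buggy behaviour: also report the extra slice taken on op's result.
    // That slice is produced by op itself, so op is queued again.
    if (reportResultSlice)
      worklist.push_back(op);
  }
  return steps;
}
```

Take `producerOf = {1, 2, -1}`, where op 0 consumes op 1 and op 1 consumes op 2. The fixed variant performs two fusions. The buggy variant stops only when it hits `maxSteps`.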

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
2024-10-24 21:32:45 -04:00
Ian Wood
455f71d285 [mlir] Convert expand_shape to more static form (#112265)
Add pattern that converts a `tensor.expand_shape` op to a more static
form.

This matches the pattern: `tensor.cast` -> `tensor.expand_shape` if it
has a foldable `tensor.cast` and some constant foldable `output_shape`
operands for the `tensor.expand_shape`. This makes the
`tensor.expand_shape` more static, as well as allowing the static
information to be propagated further down in the program.
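Here is a rough C++ sketch of the "more static" rewrite. It assumes a simplified view of `output_shape` with one entry per result dimension. `kDynamic` stands in for MLIR's `ShapedType::kDynamic`, and `makeMoreStatic` is a hypothetical helper, not the pattern's actual code. Each dynamic result dimension whose `output_shape` entry folds to a constant becomes static.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Stand-in for MLIR's ShapedType::kDynamic in this sketch.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Hypothetical sketch: `resultShape` is the current expand_shape result type
// and `outputShape[i]` holds the constant value of the i-th output_shape
// entry when it folds to a constant (std::nullopt otherwise).
std::vector<int64_t>
makeMoreStatic(const std::vector<int64_t> &resultShape,
               const std::vector<std::optional<int64_t>> &outputShape) {
  assert(resultShape.size() == outputShape.size());
  std::vector<int64_t> newShape = resultShape;
  for (size_t i = 0; i < newShape.size(); ++i)
    if (newShape[i] == kDynamic && outputShape[i].has_value())
      newShape[i] = *outputShape[i]; // dynamic dim with a known constant
  return newShape;
}
```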
2024-10-24 17:04:02 -07:00
Matthias Springer
f18c3e4e73 [mlir][Transforms] Dialect Conversion: Simplify materialization fn result type (#113031)
This commit simplifies the result type of materialization functions.

Previously: `std::optional<Value>`
Now: `Value`

The previous implementation allowed 3 possible return values:
- Non-null value: The materialization function produced a valid
materialization.
- `std::nullopt`: The materialization function failed, but another
materialization can be attempted.
- `Value()`: The materialization failed and so should the dialect
conversion. (Previously: Dialect conversion can roll back.)

This commit removes the last variant. It is not particularly useful
because the dialect conversion will fail anyway if all other
materialization functions produced `std::nullopt`.

Furthermore, in contrast to type conversions, at least one
materialization callback is expected to succeed. In case of a failing
type conversion, the current dialect conversion can roll back and try a
different pattern. This also used to be the case for materializations,
but that functionality was removed with #107109: failed materializations
can no longer trigger a rollback. (They can just make the entire dialect
conversion fail without rollback.) With this in mind, it is even less
useful to have an additional error state for materialization functions.

This commit is in preparation for merging the 1:1 and 1:N type
converters. Target materializations will have to return multiple values
instead of a single one. With this commit, we can keep the API simple:
`SmallVector<Value>` instead of `std::optional<SmallVector<Value>>`.

Note for LLVM integration: All 1:1 materializations should return
`Value` instead of `std::optional<Value>`. Instead of `std::nullopt`,
return `Value()`.
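Below is a minimal sketch of the simplified contract. It uses a hypothetical `Value` handle and callback type; the real callbacks receive a builder, the target type, the inputs, and a location. With a plain `Value` result, each callback either succeeds or declines with `Value()`, and the caller tries the next one. If all callbacks decline, the null result is the failure.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for mlir::Value: a null handle means "no value".
struct Value {
  const void *impl = nullptr;
  explicit operator bool() const { return impl != nullptr; }
};

// Hypothetical callback type; the int parameter stands in for the builder,
// type and inputs a real materialization receives.
using MaterializationFn = std::function<Value(int)>;

// Each callback returns either a non-null Value (success) or Value()
// (declined, so the next callback is tried). There is no third state.
Value materialize(const std::vector<MaterializationFn> &fns, int input) {
  assert(!fns.empty() && "expected at least one materialization callback");
  for (const MaterializationFn &fn : fns)
    if (Value v = fn(input))
      return v;
  return Value(); // every callback declined: the conversion fails
}
```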
2024-10-23 07:29:17 -07:00
Andrzej Warzyński
2a25200828 [mlir][tensor] Restrict the verifier for tensor.pack/tensor.unpack (#113108)
Restricts the verifier for tensor.pack and tensor.unpack Ops so that the
following is no longer allowed:

```mlir
  %c8 = arith.constant 8 : index
  %0 = tensor.pack %input inner_dims_pos = [0, 1] inner_tiles = [8, %c8] into %output : tensor<?x?xf32> -> tensor<?x?x8x8xf32>
```

Specifically, in line with other Tensor Ops, require:
  * a dynamic dimension for each (dynamic) SSA value,
  * a static dimension for each static size (attribute).

In the example above, a static dimension (8) is mixed with a dynamic
size (%c8).

Note that this change mostly deletes existing code, because it
simplifies the logic in the verifier.
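The two rules can be sketched as a standalone check. This is a hypothetical C++ model, not the actual verifier. An inner tile is either an attribute (a known `int64_t`) or an SSA value (`std::nullopt`). It is compared with the corresponding trailing dimension of the packed type, where `kDynamic` stands in for `ShapedType::kDynamic`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Stand-in for MLIR's ShapedType::kDynamic in this sketch.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Hypothetical model of one inner tile: a static size from the attribute,
// or std::nullopt when the tile is given as an SSA value.
using InnerTile = std::optional<int64_t>;

// The inner tile dims are the trailing dims of the packed type. Each SSA
// tile must map to a dynamic dim and each static tile to an equal static dim.
bool verifyInnerTiles(const std::vector<InnerTile> &innerTiles,
                      const std::vector<int64_t> &packedShape) {
  assert(packedShape.size() >= innerTiles.size());
  size_t offset = packedShape.size() - innerTiles.size();
  for (size_t i = 0; i < innerTiles.size(); ++i) {
    int64_t dim = packedShape[offset + i];
    if (!innerTiles[i].has_value()) {
      if (dim != kDynamic)
        return false; // SSA value paired with a static dim
    } else if (dim != *innerTiles[i]) {
      return false; // static tile paired with a different (or dynamic) dim
    }
  }
  return true;
}
```

Consider the rejected example above: `inner_tiles = [8, %c8]` with `tensor<?x?x8x8xf32>`. It fails on the second tile, because an SSA value is paired with the static size 8.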

For more context:
* https://discourse.llvm.org/t/tensor-ops-with-dynamic-sizes-which-behaviour-is-more-correct
2024-10-22 20:11:05 -07:00
Max191
98e838a890 [mlir] Do not bufferize parallel_insert_slice dest to read for full slices (#112761)
In the insert_slice bufferization interface implementation, the
destination tensor is not considered read if the full tensor is
overwritten by the slice. This PR adds the same check for
tensor.parallel_insert_slice.

Adds two new StaticValueUtils:
- `isAllConstantIntValue` checks if every element of an array of
`OpFoldResult` is equal to a passed `int64_t` value.
- `areConstantIntValues` checks if the elements of an array of
`OpFoldResult` are equal, element-wise, to a passed array of `int64_t`
values.
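Here is a rough C++ model of the two helpers. It reduces `OpFoldResult` to `std::optional<int64_t>`: a constant, or `std::nullopt` for a non-constant value. The `*Sketch` names and `isFullSlice` are hypothetical. `isFullSlice` shows the kind of "full tensor is overwritten" condition the helpers make easy to express. It assumes that condition means zero offsets, unit strides, and sizes equal to the destination shape.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for OpFoldResult: a known constant, or std::nullopt
// for a value that is not a compile-time constant.
using FoldResult = std::optional<int64_t>;

bool isAllConstantIntValueSketch(const std::vector<FoldResult> &ofrs,
                                 int64_t value) {
  return std::all_of(ofrs.begin(), ofrs.end(), [&](const FoldResult &ofr) {
    return ofr.has_value() && *ofr == value;
  });
}

bool areConstantIntValuesSketch(const std::vector<FoldResult> &ofrs,
                                const std::vector<int64_t> &values) {
  if (ofrs.size() != values.size())
    return false;
  for (size_t i = 0; i < ofrs.size(); ++i)
    if (!ofrs[i].has_value() || *ofrs[i] != values[i])
      return false;
  return true;
}

// Hypothetical full-slice test built from the two helpers.
bool isFullSlice(const std::vector<FoldResult> &offsets,
                 const std::vector<FoldResult> &sizes,
                 const std::vector<FoldResult> &strides,
                 const std::vector<int64_t> &destShape) {
  assert(offsets.size() == sizes.size() && sizes.size() == strides.size());
  return isAllConstantIntValueSketch(offsets, 0) &&
         isAllConstantIntValueSketch(strides, 1) &&
         areConstantIntValuesSketch(sizes, destShape);
}
```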

fixes https://github.com/llvm/llvm-project/issues/112435

---------

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
2024-10-18 16:02:03 -04:00
Alexander Pivovarov
a24c468782 [MLIR] Fix assert expressions (#112474)
I noticed that several assertions in the MLIR codebase have issues with
operator precedence.

The issue with operator precedence in these assertions is due to the way
logical operators are evaluated. The `&&` operator has higher precedence
than the `||` operator, which means the assertion is currently parsed
like this:
```cpp
assert((resType.getNumDynamicDims() == dynOutDims.size()) ||
       (dynOutDims.empty() && "Either none or all output dynamic dims must be specified!"));
```

Because a string literal always converts to `true`, the condition still
has the same truth value. However, the message is attached to only one
operand, which obscures the intent and triggers compiler warnings such as
GCC's `-Wparentheses`.

We should add parentheses around the entire `||` expression so that the
logical conditions are grouped correctly. Here’s the corrected version:
```cpp
assert(((resType.getNumDynamicDims() == dynOutDims.size()) || dynOutDims.empty()) &&
       "Either none or all output dynamic dims must be specified!");
```
2024-10-16 15:22:29 -07:00
Mehdi Amini
275a2b0581 [MLIR][Tensor] Perform shape inference via in-place modification (NFC) (#111593)
This is more efficient because it avoids creating a clone that is
immediately removed. It also inserts a cast on the result only if the
destination type changed.
2024-10-09 09:42:16 +02:00
Prashant Kumar
971b579bc6 [MLIR] Don't drop attached discardable attributes (#111261)
The creation of the pack op was dropping discardable attributes.
2024-10-07 22:21:30 +05:30
BARRET
1666d13078 [CMake]: Remove unnecessary dependencies on LLVM/MLIR (#111255)
The previous PR https://github.com/llvm/llvm-project/pull/110362 (reverted)
caused breakage. This PR reapplies it with a fix.

My build cmdline:

```
cmake ../llvm \
    -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=install \
    -DCMAKE_C_COMPILER=gcc-9 \
    -DCMAKE_CXX_COMPILER=g++-9 \
    -DCMAKE_CUDA_COMPILER=$(which nvcc) \
    -DLLVM_ENABLE_LLD=OFF \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_BUILD_EXAMPLES=ON \
    -DCOMPILER_RT_BUILD_LIBFUZZER=OFF \
    -DLLVM_CCACHE_BUILD=ON \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DBUILD_SHARED_LIBS=ON \
    -DLLVM_ENABLE_PROJECTS='llvm;mlir'
```
2024-10-07 15:52:43 +02:00
Danial Klimkin
eb6222b9ea [bazel] Fix build past 66f84c8b8a (#110830) 2024-10-02 14:01:19 +02:00
Andrzej Warzyński
66f84c8b8a [mlir][tensor] Extend the logic to generalise tensor.pack (#109815)
Extends the logic to generalise tensor.pack (into e.g. tensor.pad +
tensor.transpose) so that it also works when one of the inner tile sizes
is scalable (i.e. a multiple of `vector.vscale`). For example:
```mlir
  %c8 = arith.constant 8 : index
  %vscale = vector.vscale
  %c8_vscale = arith.muli %vscale, %c8 : index
  %0 = tensor.pack %input
      padding_value(%pad : f32)
      inner_dims_pos = [0, 1]
      inner_tiles = [%c8_vscale, 2]
      into %output : tensor<5x1xf32> -> tensor<1x1x?x2xf32>
```
is generalised as:
```mlir
  %c8 = arith.constant 8 : index
  %vscale = vector.vscale
  %c8_vscale = arith.muli %vscale, %c8 : index
  %0 = affine.apply #map()[%c8_vscale, %c5]
  %padded = tensor.pad %input low[0, 0] high[%0, 1] {
  ^bb0(%arg3: index, %arg4: index):
    tensor.yield %pad : f32
  } : tensor<5x1xf32> to tensor<?x2xf32>
```

At the Tensor level, we model scalability using dynamic shapes and this
change basically extends the relevant logic so that it also works for
dynamic shapes.
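A small hypothetical helper shows how the padding amounts in the generalised IR can be worked out. The padded size of a tiled dimension is `tileSize * numOuterTiles`, and the high padding is that size minus the source size. For the second dimension this gives the static `1` in `high[%0, 1]`. For the first dimension the amount depends on `vscale`, which is only known at runtime.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the high padding needed so that a source dimension of
// size `srcDim` fills `numOuterTiles` tiles of size `tileSize`. For a scalable
// tile (8 * vscale) the tile size is only known at runtime.
int64_t packHighPadding(int64_t srcDim, int64_t tileSize,
                        int64_t numOuterTiles) {
  assert(tileSize > 0 && numOuterTiles > 0);
  int64_t paddedSize = tileSize * numOuterTiles;
  assert(paddedSize >= srcDim && "the tiles must cover the source dimension");
  return paddedSize - srcDim;
}
```

For the `tensor<5x1xf32>` example, `vscale = 2` gives a tile size of 16 and a high padding of 11 on the first dimension.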
2024-10-02 09:44:13 +01:00
Rajveer Singh Bharadwaj
760ffa4736 [mlir][tensor] Apply InsertSliceOfTransferWriteOpFolder only when transfer_write overwrites all elements of insert_slice (#108803)
Resolves #101708

The updated logic now correctly checks if `transfer_write` completely
overwrites `insert_slice` and only then applies the rewrite for this
pattern.

This check currently covers only static sizes; dynamic sizes would
require value-bounds analysis (see the `TODO:`).
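Here is a simplified sketch of a static "overwrites all elements" check, with several assumptions. Masks and permutation maps are ignored, the write indices are modelled as known constants that must all be 0, and `writeCoversWholeSource` is a hypothetical name. Any dynamic size makes the check bail out, in the spirit of the `TODO:` above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in for MLIR's ShapedType::kDynamic in this sketch.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Hypothetical, simplified check: the write overwrites every element of the
// insert_slice source only if it starts at index 0 in every dimension and
// the vector shape equals the source shape. Dynamic sizes are rejected.
bool writeCoversWholeSource(const std::vector<int64_t> &vectorShape,
                            const std::vector<int64_t> &writeIndices,
                            const std::vector<int64_t> &sourceShape) {
  assert(writeIndices.size() == sourceShape.size());
  if (std::find(sourceShape.begin(), sourceShape.end(), kDynamic) !=
      sourceShape.end())
    return false;
  bool startsAtOrigin =
      std::all_of(writeIndices.begin(), writeIndices.end(),
                  [](int64_t index) { return index == 0; });
  return startsAtOrigin && vectorShape == sourceShape;
}
```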
2024-10-01 14:29:37 -07:00
Mehdi Amini
8b47711e84 Revert "CMake: Remove unnecessary dependencies on LLVM/MLIR" (#110594)
Reverts llvm/llvm-project#110362

Multiple bots are broken.
2024-10-01 00:44:21 +02:00
BARRET
4980f2177e CMake: Remove unnecessary dependencies on LLVM/MLIR (#110362)
There are some spurious libraries which can be removed.

I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We use a CMake function to recursively collect
MLIR/LLVM-related dependencies, and we identified certain library
dependencies as redundant and safe to remove.
2024-09-30 23:57:13 +02:00
Andrzej Warzyński
bfde17834d [mlir] Update the return type of getNum{Dynamic|Scalable}Dims (#110472)
Updates the return type of `getNumDynamicDims` and `getNumScalableDims`
from `int64_t` to `size_t`. This is for consistency with other
helpers/methods that return "size" and to reduce the number of
`static_cast`s in various places.
2024-09-30 14:53:50 +01:00
Han-Chung Wang
a285ba7529 Revert "[mlir][tensor] Refine the semantics of createPadHighOp" (#110153) 2024-09-26 12:44:43 -07:00
Andrzej Warzyński
9c48a04328 [mlir][tensor] Refine the semantics of createPadHighOp (#109667)
Refine `createPadHighOp` so that the output tensor is required to be
statically shaped. This is to prevent the current behaviour, which is
incorrect:

>  // If `type` has dynamic dimensions the padding width is set to zero.

The actual padding width should be set to `%new_dim - %old_dim`, where
`%new_dim` and `%old_dim` are defined via e.g. a `tensor.dim` Op applied to
the output and input tensors, respectively.

This PR is an attempt to clarify the semantics surrounding dynamic
shapes in preparation for adding support for scalable vectors to the
pack/unpack logic in Tensor/Linalg (dynamic shapes are what we use to
model scalable (*) sizes at the Tensor/MemRef level).

(*) Scalable as in Arm's Scalable Vector Extension (SVE)
2024-09-26 16:18:46 +01:00
Andrzej Warzyński
c1826aeef3 [mlir][tensor] Add new helper hooks for RelayoutOp (#109642)
Implements two helper hooks for PackOp and UnPackOp, `getAllOuterDims`
and `getTiledOuterDims`, and adds them to RelayoutOp (which both PackOp
and UnPackOp inherit from).

This improves code re-use and also clarifies the meaning of "outer dims"
and "tiled outer dims".
2024-09-24 13:14:49 +01:00
MaheshRavishankar
d5f0969c96 [mlir][TilingInterface] Avoid looking at operands for getting slices to continue tile + fuse. (#107882)
The current implementation of `scf::tileConsumerAndFuseProducerUsingSCF`
looks at operands of tiled/tiled+fused operations to see if they are
produced by `extract_slice` operations to populate the worklist used to
continue fusion. This implicit assumption does not always hold. Instead,
make the implementations of `getTiledImplementation` return the slices
to use to continue fusion.

This is a breaking change:

- To continue to get the same behavior of
`scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree
implementations of `TilingInterface::getTiledImplementation` to return
the slices to continue fusion on. All in-tree implementations have been
adapted to this.
- This change touches parts that required a simplification to the
`ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a
`std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that
should be `std::nullopt` if fusion is not to be performed.

Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
2024-09-11 22:15:43 -07:00
Quinn Dawkins
6cc3bf7d1d [mlir][tensor] Add canonicalization to fold consecutive tensor.pad ops (#107302)
`tensor.pad(tensor.pad)` with the same constant padding value can be
combined into a single pad that pads to the sum of the high and low
padding amounts.
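Here is a minimal C++ sketch of the fold. It uses a hypothetical `PadSpec` struct in place of the op. When both pads use the same constant, the low and high amounts add up per dimension. Otherwise the fold does not apply.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of a tensor.pad with a constant padding value.
struct PadSpec {
  std::vector<int64_t> low;
  std::vector<int64_t> high;
  double padValue;
};

// pad(pad(x, inner), outer) equals pad(x, combined) when both pads use the
// same constant: the low and high amounts add up per dimension.
std::optional<PadSpec> foldConsecutivePads(const PadSpec &inner,
                                           const PadSpec &outer) {
  if (inner.padValue != outer.padValue)
    return std::nullopt;
  size_t rank = inner.low.size();
  assert(inner.high.size() == rank && outer.low.size() == rank &&
         outer.high.size() == rank);
  PadSpec combined = inner;
  for (size_t d = 0; d < rank; ++d) {
    combined.low[d] += outer.low[d];
    combined.high[d] += outer.high[d];
  }
  return combined;
}
```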
2024-09-09 11:05:37 -04:00
Longsheng Mou
ede40da1f8 [mlir][tensor] Add check for indices of tensor.gather (#106894)
This patch adds a check on the indices of `tensor.gather` and
`tensor.scatter`: the length of gather_dims/scatter_dims must match the
size of the last dimension of the indices. Fixes #94901.
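The new check can be sketched as follows. `indicesMatchGatherDims` is a hypothetical helper, and the sketch assumes the last dimension of the indices is static. The size of that dimension must equal the number of entries in gather_dims or scatter_dims.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical check: the size of the last dim of `indices` must equal the
// number of entries in gather_dims (or scatter_dims). Assumes that trailing
// dim is static.
bool indicesMatchGatherDims(const std::vector<int64_t> &indicesShape,
                            const std::vector<int64_t> &gatherDims) {
  assert(!indicesShape.empty() && "indices must have rank >= 1");
  return indicesShape.back() == static_cast<int64_t>(gatherDims.size());
}
```

For example, indices of shape `6x7x1` are consistent with `gather_dims([1])` but not with `gather_dims([0, 1])`.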
2024-09-06 10:45:59 +08:00
Benoit Jacob
c1667f9099 Fix transpose->unpack folding pattern for the partial-tile case of unpack (#107271)
Directly create the empty tensor of the appropriate shape instead of
relying on `UnPackOp::createDestinationTensor`, which tries to infer
the destination shape; that isn't possible in general with the set of
parameters it takes.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
2024-09-04 15:06:27 -04:00
yifeizh2
8d0816615f [MLIR][Tensor] Fix source/dest type check in UnPackOp canonicalize (#106094)
Fix `RankedTensorType` equality check in unpack op canonicalization.
2024-09-04 10:10:43 +08:00
Yun-Fly
c8763f04bf [mlir][tensor] Fix consumer fusion for tensor.pack without explicit outer_dims_perm attribute (#106687) 2024-09-04 09:19:09 +08:00
Christopher Bate
8bf69ceb00 Reapply "[mlir] NFC: fix dependence of (Tensor|Linalg|MemRef|Complex) dialects on LLVM Dialect and LLVM Core in CMake build (#104832)" (#105703)
Reapply the commit 43b5085667 with
additional fixes for building with
BUILD_SHARED_LIBS=ON.
2024-08-28 22:34:14 -06:00
Quinn Dawkins
91e57c6fa8 [mlir][tensor] Add TilingInterface support for fusing tensor.pad (#105892)
This adds implementations for the two TilingInterface methods required
for fusion to `tensor.pad`: `getIterationDomainTileFromResultTile` and
`generateResultTileValue`, allowing fusion of pad with a tiled consumer.
2024-08-23 19:10:04 -04:00
Yun-Fly
f06563a5c0 [mlir][tensor] Add consumer fusion for tensor.pack op. (#103715)
Add the missing `getIterationDomainTileFromOperandTile` and `getTiledImplementationFromOperandTile` to `tensor.pack` and enable fusing it as a consumer. Note that it currently only supports the perfect-tiling scenario, without padding semantics.
2024-08-23 10:07:17 +08:00
Frank Schlimbach
681ae09722 [MLIR][mesh] moving shardinginterfaceimpl for tensor to tensor extension lib (#104913)
Follow-up to #102598: as discussed, move tensor sharding implementation
into separate tensor extension lib.

@sogartar @yaochengji, could you take a look at this PR?
2024-08-21 11:59:44 +01:00
Christopher Bate
06fd808654 Revert "[mlir] NFC: fix dependence of (Tensor|Linalg|MemRef|Complex) dialects on LLVM Dialect and LLVM Core in CMake build (#104832)"
This reverts commit 43b5085667 since it
caused the build to break with BUILD_SHARED_LIBS=ON.
2024-08-20 03:46:29 +00:00
Christopher Bate
43b5085667 [mlir] NFC: fix dependence of (Tensor|Linalg|MemRef|Complex) dialects on LLVM Dialect and LLVM Core in CMake build (#104832)
This change removes dependencies declared as either 'LINK_LIBS' or
'LINK_COMPONENTS' across several MLIR libraries. The removed
dependencies appear to be incorrect and may have been required in older
versions of the project. These dependencies cause many high-level
dialects to have a transitive dependence on the LLVM dialect and the
LLVM 'Core' library ('llvm/lib/IR').

Note that if using the 'Ninja' CMake generator, one can inspect the
dependencies (including all transitive libraries) of any given MLIR
target by using the command `ninja -C <build dir> -t browse` and
navigating to the library of interest in a web browser.
2024-08-19 18:49:22 -06:00
Ian Wood
a95ad2da36 [mlir] Add bubbling patterns for non intersecting reshapes (#103401)
Refactored @Max191's PR https://github.com/llvm/llvm-project/pull/94637
to move it to `Tensor`

From the original PR
>This PR adds fusion by expansion patterns to push a tensor.expand_shape
up through a tensor.collapse_shape with non-intersecting reassociations.
Sometimes parallel collapse_shape ops like this can block propagation of
expand_shape ops, so this allows them to pass through each other.

I'm not sure if I put the code/tests in the right places, so let me know
where those go if they aren't.

cc @MaheshRavishankar @hanhanW

---------

Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
2024-08-14 13:58:35 -07:00
Renato Golin
3968942f10 Revert "[mlir][mesh] adding shard-size control (#98145)"
This reverts commit fca69838ca.

Also reverts the fixup: "[mlir] Fix -Wunused-variable in MeshOps.cpp (NFC)"

This reverts commit fc737368fe.
2024-08-07 15:12:37 +01:00
Frank Schlimbach
fca69838ca [mlir][mesh] adding shard-size control (#98145)
- Replacing `#mesh.sharding` attribute with operation `mesh.sharding`
- extended semantics now allow providing optional `halo_sizes` and
`sharded_dims_sizes`
- internally a sharding is represented as a non-IR class
`mesh::MeshSharding`

What previously was
```mlir
%sharded0 = mesh.shard %arg0 <@mesh0, [[0]]> : tensor<4x8xf32>
%sharded1 = mesh.shard %arg1 <@mesh0, [[0]]> annotate_for_users : tensor<16x8xf32>
```
is now
```mlir
%sharding = mesh.sharding @mesh0, [[0]] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding : tensor<4x8xf32>
%1 = mesh.shard %arg1 to %sharding annotate_for_users : tensor<16x8xf32>
```
and allows additional annotations to control the shard sizes:
```mlir
mesh.mesh @mesh0 (shape = 4)
%sharding0 = mesh.sharding @mesh0, [[0]] halo_sizes = [1, 2] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding0 : tensor<4x8xf32>
%sharding1 = mesh.sharding @mesh0, [[0]] sharded_dims_sizes = [3, 5, 5, 3] : !mesh.sharding
%1 = mesh.shard %arg1 to %sharding1 annotate_for_users : tensor<16x8xf32>
```
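The hypothetical helper below illustrates `sharded_dims_sizes` by turning per-shard sizes into shard start offsets. It assumes one size per device along the mesh axis, and sizes that sum to the tensor dimension. An example is `[3, 5, 5, 3]` for the 16 rows of `tensor<16x8xf32>` on a mesh of shape 4. The helper is not part of the mesh dialect.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: given per-shard sizes along one tensor dimension,
// compute the offset at which each shard starts (an exclusive prefix sum).
std::vector<int64_t> shardOffsets(const std::vector<int64_t> &shardSizes,
                                  int64_t meshAxisSize, int64_t dimSize) {
  assert(static_cast<int64_t>(shardSizes.size()) == meshAxisSize);
  std::vector<int64_t> offsets;
  offsets.reserve(shardSizes.size());
  int64_t next = 0;
  for (int64_t size : shardSizes) {
    offsets.push_back(next);
    next += size;
  }
  assert(next == dimSize && "shard sizes must cover the whole dimension");
  return offsets;
}
```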
- `mesh.shard` op accepts additional optional attribute `force`, useful
for halo updates
- Some initial spmdization support for the new semantics
- Support for `tensor.empty` reacting on `sharded_dims_sizes` and
`halo_sizes` in the sharding
- New collective operation `mesh.update_halo` as a spmdized target for
shardings with `halo_sizes`

@sogartar @yaochengji
2024-08-07 13:34:57 +01:00
Nikhil Kalra
84cc1865ef [mlir] Support DialectRegistry extension comparison (#101119)
`PassManager::run` loads the dependent dialects for each pass into the
current context prior to invoking the individual passes. If the
dependent dialect is already loaded into the context, this should be a
no-op. However, if there are extensions registered in the
`DialectRegistry`, the dependent dialects are unconditionally registered
into the context.

This poses a problem for dynamic pass pipelines, however, because they
will likely be executing while the context is in an immutable state
(because of the parent pass pipeline being run).

To solve this, we'll update the extension registration API on
`DialectRegistry` to require a type ID for each extension that is
registered. Then, instead of unconditionally registering dialects into a
context if extensions are present, we'll check against the extension
type IDs already present in the context's internal `DialectRegistry`.
The context will only be marked as dirty if there are net-new extension
types present in the `DialectRegistry` populated by
`PassManager::getDependentDialects`.

Note: this PR removes the `addExtension` overload that utilizes
`std::function` as the parameter. This is because `std::function` is
copyable and potentially allocates memory for the contained function so
we can't use the function pointer as the unique type ID for the
extension.

Downstream changes required:
- Existing `DialectExtension` subclasses will need a type ID to be
registered for each subclass. More details on how to register a type ID
can be found here:
8b68e06731/mlir/include/mlir/Support/TypeID.h (L30)
- Existing uses of the `std::function` overload of `addExtension` will
need to be refactored into dedicated `DialectExtension` classes with
associated type IDs. The attached `std::function` can either be inlined
into or called directly from `DialectExtension::apply`.
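The idea of comparing extensions by type ID can be sketched in plain C++. The sketch uses a common static-address idiom (`typeIdOf`, hypothetical) rather than MLIR's `TypeID`. `ContextExtensions` is a hypothetical stand-in for the context's internal registry. Re-applying a known extension type is a no-op, and only net-new types mark the context dirty.

```cpp
#include <cassert>
#include <set>

// Hypothetical per-type identity (a common C++ idiom, not MLIR's TypeID):
// every instantiation gets its own static object, so its address is unique.
template <typename T> const void *typeIdOf() {
  static char id;
  return &id;
}

// Hypothetical model of a context's registry of applied extension types.
struct ContextExtensions {
  std::set<const void *> ids;
  bool dirty = false;

  // Records the given extension IDs and returns whether any was new. The
  // context only becomes dirty when at least one ID was not present yet.
  bool append(const std::set<const void *> &newIds) {
    assert(newIds.count(nullptr) == 0 && "null extension ID");
    bool added = false;
    for (const void *id : newIds)
      if (ids.insert(id).second)
        added = true;
    dirty = dirty || added;
    return added;
  }
};
```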

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-08-06 01:32:36 +02:00
Kazu Hirata
5262865aac [mlir] Construct SmallVector with ArrayRef (NFC) (#101896) 2024-08-04 11:43:05 -07:00
Rafael Ubal
38d0b2d174 [mlir] New canonicalization patterns for shape.shape_of and tensor.reshape (#98531)
This PR includes 3 new canonicalization patterns:

- Operation `shape.shape_of`: shape of reshape

```mlir
// Before
func.func @f(%arg0: tensor<*xf32>, %arg1: tensor<?xindex>) -> tensor<?xindex> {
  %reshape = tensor.reshape %arg0(%arg1) : (tensor<*xf32>, tensor<?xindex>) -> tensor<*xf32>
  %0 = shape.shape_of %reshape : tensor<*xf32> -> tensor<?xindex>
  return %0 : tensor<?xindex>
}

// After
func.func @f(%arg0: tensor<*xf32>, %arg1: tensor<?xindex>) -> tensor<?xindex> {
  return %arg1 : tensor<?xindex>
}
```

- Operation `tensor.reshape`: reshape of reshape

```mlir
// Before
func.func @fold_tensor_reshape(%arg0: tensor<*xf32>, %arg1: tensor<?xindex>, %arg2: tensor<?xindex>) -> tensor<*xf32> {
  %0 = tensor.reshape %arg0(%arg1) : (tensor<*xf32>, tensor<?xindex>) -> tensor<*xf32>
  %1 = tensor.reshape %0(%arg2) : (tensor<*xf32>, tensor<?xindex>) -> tensor<*xf32>
  return %1 : tensor<*xf32>
}

// After
func.func @fold_tensor_reshape(%arg0: tensor<*xf32>, %arg1: tensor<?xindex>, %arg2: tensor<?xindex>) -> tensor<*xf32> {
  %reshape = tensor.reshape %arg0(%arg2) : (tensor<*xf32>, tensor<?xindex>) -> tensor<*xf32>
  return %reshape : tensor<*xf32>
}
```

- Operation `tensor.reshape`: reshape 1D to 1D

```mlir
// Before
func.func @fold_reshape_1d(%input: tensor<?xf32>, %shape: tensor<1xindex>) -> tensor<?xf32> {
  %0 = tensor.reshape %input(%shape) : (tensor<?xf32>, tensor<1xindex>) -> tensor<?xf32>
  return %0 : tensor<?xf32>
}

// After
func.func @fold_reshape_1d(%arg0: tensor<?xf32>, %arg1: tensor<1xindex>) -> tensor<?xf32> {
  return %arg0 : tensor<?xf32>
}
```

These three canonicalization patterns cooperate to simplify the IR
structure emerging from the lowering of certain element-wise ops with
unranked tensor inputs. See file `unranked-tensor-lowering.mlir` in the
proposed change list for a detailed example and description.

For context, this PR is meant to enable code optimizations for the code
generated while lowering ops `quant.qcast` and `quant.dcast` with
unranked tensors, as proposed in
https://discourse.llvm.org/t/rfc-improvements-in-the-quant-dialect/79942
(implementation currently in progress).
2024-07-19 10:09:31 -04:00
MaheshRavishankar
c077a4f305 [mlir][Tensor] Add pattern to fold concats of empty. (#98994)
A concatenation of empty tensors can be replaced by a single empty
tensor of the concatenated shape. Add this pattern to
`populateFoldTensorEmptyPatterns`.
2024-07-17 09:51:00 -07:00
donald chen
d69e94916e [mlir] [linalg] Fix bufferize error in tensor.parallel_insert_slice op (#98312)
The tensor.parallel_insert_slice op has implicit in-place behavior. In
"copy-before-write" bufferization mode, the resolveConflict function
generates a bufferize.copy, which makes the result incorrect. This patch
fixes this issue.
2024-07-11 20:16:06 +08:00
Max191
c9529f7601 [mlir] Drop outermost dims in slice rank reduction inference (#95020)
The `getDroppedDims` utility function does not follow the convention of
dropping outermost unit dimensions first when inferring a rank reduction
mask for a slice. This PR updates the implementation to match this
convention.
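Here is a minimal sketch of the convention. `inferDroppedDims` is hypothetical and does not have the signature of `getDroppedDims`. It matches the reduced shape against the slice shape from the innermost dimension outward. This leaves the outermost unit dims unmatched, so they are the ones dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch: given the slice shape before rank reduction and the
// reduced result shape, return the (ascending) indices of the dropped unit
// dims, or std::nullopt if the reduction is impossible.
std::optional<std::vector<size_t>>
inferDroppedDims(const std::vector<int64_t> &sliceShape,
                 const std::vector<int64_t> &reducedShape) {
  assert(reducedShape.size() <= sliceShape.size() &&
         "rank reduction cannot add dims");
  std::vector<size_t> dropped;
  size_t r = reducedShape.size(); // reduced dims not yet matched
  for (size_t s = sliceShape.size(); s-- > 0;) {
    if (r > 0 && sliceShape[s] == reducedShape[r - 1]) {
      --r; // keep this dim
    } else if (sliceShape[s] == 1) {
      dropped.insert(dropped.begin(), s); // drop this unit dim
    } else {
      return std::nullopt;
    }
  }
  if (r != 0)
    return std::nullopt; // some reduced dims were never matched
  return dropped;
}
```

For a slice of shape `1x1x4` reduced to `1x4`, this returns `{0}`. Matching from the outermost dimension would drop dim 1 instead.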
2024-06-25 12:33:02 -04:00
Ramkumar Ramachandra
0fb216fb2f mlir/MathExtras: consolidate with llvm/MathExtras (#95087)
This patch is part of a project to move the Presburger library into
LLVM.
2024-06-11 23:00:02 +01:00
Prashant Kumar
1752740f4b [mlir][tensor] Fix FoldTensorCastProducerOp for multiple result operations (#93374)
This pattern fails for operations that have results in addition to
their dpsInits. E.g.:
```mlir
%13:2 = iree_codegen.ukernel.generic "iree_uk_unpack"
    ins(%extracted_slice : tensor<?x1x16x16xf32>)
    outs(%11 : tensor<?x?xf32>) ... -> tensor<?x?xf32>, i32
```
The above op has results in addition to its dpsInits and hence fails. The
PR assumes that the results consist of the dpsInits followed by the
non-dpsInit results.
2024-06-07 11:22:36 +05:30
Max191
7ef83f5561 [mlir] Add pack/unpack transpose foldings for linalg.generic ops, fix bugs (#93055)
This PR adds transpose + pack/unpack folding support for transpose ops
in the form of `linalg.generic` ops. There were also some bugs with the
permutation composing in the previous patterns, so this PR fixes these
bugs and adds tests for them as well.
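Permutation composition is easy to get backwards. This small sketch assumes the convention `result[i] = input[perm[i]]`, and its helpers are hypothetical. Applying `first` and then `second` is the same as applying the single permutation `c[i] = first[second[i]]`. In general, this differs from composing in the other order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convention assumed in this sketch: applying `perm` to `v` gives
// result[i] = v[perm[i]].
template <typename T>
std::vector<T> applyPerm(const std::vector<T> &v,
                         const std::vector<int64_t> &perm) {
  assert(v.size() == perm.size());
  std::vector<T> result(v.size());
  for (size_t i = 0; i < perm.size(); ++i)
    result[i] = v[static_cast<size_t>(perm[i])];
  return result;
}

// Applying `first` and then `second` equals applying the single permutation
// c[i] = first[second[i]], i.e. `first` permuted by `second`.
std::vector<int64_t> composePerms(const std::vector<int64_t> &first,
                                  const std::vector<int64_t> &second) {
  return applyPerm(first, second);
}
```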
2024-06-06 10:54:27 -04:00
Spenser Bauman
a9205c5c9d [mlir][tensor] Implement constant folder for tensor.pad (#92691)
Extend the folding ability of the RewriteAsConstant patterns to include
tensor.pad operations on constants. The new pattern will constant-fold
tensor.pad operations that operate on tensor constants and have
statically resolvable padding sizes/values.

    %init = arith.constant dense<[[6, 7], [8, 9]]> : tensor<2x2xi32>
    %pad_value = arith.constant 0 : i32

    %0 = tensor.pad %init low[1, 1] high[1, 1] {
      ^bb0(%arg1: index, %arg2: index):
        tensor.yield %pad_value : i32
    } : tensor<2x2xi32> to tensor<4x4xi32>

becomes

    %cst = arith.constant dense<[[0, 0, 0, 0],
                                 [0, 6, 7, 0],
                                 [0, 8, 9, 0],
                                 [0, 0, 0, 0]]> : tensor<4x4xi32>
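The fold amounts to materialising the padded constant. Here is a hypothetical C++ sketch for the 2-D case above. It fills the result with the padding value and copies the source in at the low-padding offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<int32_t>>;

// Hypothetical sketch of constant-folding a 2-D tensor.pad whose source,
// padding amounts and padding value are all known.
Matrix padConstant(const Matrix &src, size_t lowRow, size_t lowCol,
                   size_t highRow, size_t highCol, int32_t padValue) {
  assert(!src.empty());
  size_t rows = src.size() + lowRow + highRow;
  size_t cols = src.front().size() + lowCol + highCol;
  // Start from a result filled entirely with the padding value.
  Matrix result(rows, std::vector<int32_t>(cols, padValue));
  for (size_t r = 0; r < src.size(); ++r) {
    assert(src[r].size() == src.front().size() && "ragged source");
    for (size_t c = 0; c < src[r].size(); ++c)
      result[r + lowRow][c + lowCol] = src[r][c];
  }
  return result;
}
```

Calling it with the `2x2` input, `low[1, 1]`, `high[1, 1]` and padding value 0 reproduces the `4x4` constant shown above.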

Co-authored-by: Spenser Bauman <sabauma@fastmail>
2024-06-06 10:22:16 -04:00
Abhishek Varma
2b2ce50fe8 [MLIR][SCF] Add an API to fuse consumer to a producer within scf loop (#88712)
This commit adds an API (`tileAndFuseConsumerOfSlice`) to fuse a consumer with a producer within an
scf.for/scf.forall loop.

To support this, two new methods are added to the `TilingInterface`:
- `getIterationDomainTileFromOperandTile`
- `getTiledImplementationFromOperandTile`.

Consumer operations that implement these methods can be fused with tiled producer operands in a manner similar to (but essentially the inverse of) the fusion of an untiled producer with a tiled consumer.

Note that this only does one `tiled producer` -> `consumer` fusion. This could be called repeatedly for fusing multiple consumers. The current implementation is also conservative about when this kicks in (e.g. it requires a single use of the value returned by the inter-tile loops that surround the tiled producer). These restrictions can be relaxed over time.

Signed-off-by: Abhishek Varma <abhvarma@amd.com>

---------

Signed-off-by: Abhishek Varma <abhvarma@amd.com>
Signed-off-by: Abhishek Varma <avarma094@gmail.com>
Co-authored-by: cxy <chenxunyu1993@gmail.com>
2024-06-01 11:23:41 -07:00
Han-Chung Wang
2db190fda6 [mlir][tensor][NFC] Move function comments to where they are declared. (#94002)
According to the LLVM style guide, we prefer putting the documentation
comments for public APIs into the header file.

See
https://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments
for more details.
2024-05-31 13:21:44 -07:00
Adam Siemieniuk
8f4d5a32ac [mlir][tensor] Fold unpadding collapse_shape into extract_slice (#93554) 2024-05-31 13:29:40 +02:00
Kunwar Grover
debdbeda15 [mlir] Remove dialect specific bufferization passes (Reland) (#93535)
These passes have been deprecated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.

Relands https://github.com/llvm/llvm-project/pull/93488 which failed on
buildbot. Fixes the failure by updating integration tests to use
one-shot-bufferize instead.
2024-05-28 20:04:27 +01:00
Kunwar Grover
39848d0a98 Revert "[mlir] Remove dialect specific bufferization passes" (#93528)
Reverts llvm/llvm-project#93488

Buildbot failure:
https://lab.llvm.org/buildbot/#/builders/220/builds/39911
2024-05-28 11:21:34 +01:00
Kunwar Grover
2fc5106437 [mlir] Remove dialect specific bufferization passes (#93488)
These passes have been deprecated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.
2024-05-28 11:12:58 +01:00
Adam Siemieniuk
a79a0c5288 [mlir][tensor] Simplify pad-like tensor pack and unpack (#92388)
Extend existing tensor patterns to simplify pad-like tensor pack/unpack
into expand/collapse shape operations.
2024-05-24 10:25:42 +02:00