Commit Graph

182 Commits

Author SHA1 Message Date
Han-Chung Wang
2472c45ba3 [mlir][tensor] Enhance pack/unpack simplification for identity outer_dims_perm cases. (#77409)
They can be simplified to reshape ops if outer_dims_perm is an identity
permutation. The revision adds a `isIdentityPermutation` method to
IndexingUtils.
2024-01-10 08:30:34 -08:00
Prathamesh Tagore
113bce0c79 [mlir][tensor] Fold producer linalg transpose with consumer tensor pack (#75658)
Successor to https://github.com/llvm/llvm-project/pull/74206 

Partial fix to https://github.com/openxla/iree/issues/15367
2024-01-10 06:55:27 -08:00
Han-Chung Wang
76cb0bb7a4 [mlir][tensor] Add a pattern to simplify tensor.unpack to collpase shape (#76607) 2024-01-03 09:34:52 -08:00
Han-Chung Wang
78348b6915 [mlir][tensor] Improve tensor.pack simplication pattern. (#76606)
A tensor.pack op can be rewritten to a tensor.expand_shape op if the
packing only happens on inner most dimension.

This also formats the lit checks better.
2024-01-02 09:34:24 -08:00
Han-Chung Wang
4b14205bc0 [mlir][tensor] Centralize pack/unpack related patterns. (#76603)
The revision moves pack/unpack related patterns to
PackAndUnpackPatterns.cpp. This follows the convention like other tensor
ops.

It also renames `populateSimplifyTensorPack` to
`populateSimplifyPackAndUnpackPatterns` and adds a TODO item for
tensor.unpack op.
2023-12-30 11:40:40 -08:00
Prathamesh Tagore
f397bdf5ae [mlir][tensor] Fold consumer linalg transpose with producer tensor pack (#74206)
Partial fix to https://github.com/openxla/iree/issues/15367
2023-12-13 14:26:19 -08:00
Quinn Dawkins
f310a5d2c1 [mlir][tensor] Add a tensor.concat operation (#72779)
This adds an operation for concatenating ranked tensors along a static
dimension, as well as a decomposition mirroring the existing lowering
from TOSA to Tensor. This offers a convergence point for "input" like
dialects that include various lowerings for concatenation operations,
easing later analysis. In the future, this op can implement the
necessary interfaces for tiling, as well as potentially add conversions
to some kind of linalg and/or memref counterpart.

This patch adds the op, the decomposition, and some basic
folding/canonicalization. Replacing lowerings with the op (such as the
TOSA lowering) will come as a follow up.

See
https://discourse.llvm.org/t/rfc-tensor-add-a-tensor-concatenate-operation/74858
2023-12-01 15:05:29 -05:00
Felix Schneider
6343ee7292 [mlir] Fix handling of "no rank reduction" case in two Patterns (#71293)
This patch fixes two checks where a `SmallBitVector` containing the
potential dropped dims of a SubView/ExtractSlice operation was queried
via `empty()` instead of `none()`.
2023-11-10 08:20:51 +01:00
Matthias Springer
ff614a5729 [mlir][Interfaces] LISH: Add helpers for hyperrectangular subsets (#70628)
The majority of subset ops operate on hyperrectangular subsets. This
commit adds a new optional interface method
(`getAccessedHyperrectangularSlice`) that can be implemented by such
subset ops. If implemented, the other `operatesOn...` interface methods
of the `SubsetOpInterface` do not have to be implemented anymore.

The comparison logic for hyperrectangular subsets (is
disjoint/equivalent) is implemented with `ValueBoundsOpInterface`. This
makes the subset hoisting more powerful: simple cases where two
different SSA values always have the same runtime value can now be
supported.
2023-11-01 11:29:00 +09:00
Matthias Springer
1abd8d1a8d [mlir][Interfaces] Add SubsetOpInterface and SubsetExtractionOpInterface (#70617)
There is currently an op interface for subset insertion ops
(`SubsetInsertionOpInterface`), but not for subset extraction ops. This
commit adds `SubsetExtractionOpInterface` to `mlir/Interfaces`, as well
as a common dependent op interface: `SubsetOpInterface`.

- `SubsetOpInterface` is for ops that operate on tensor subsets. It
provides interface methods to check if two subset ops operate on
equivalent or disjoint subsets. Ops that implement this interface must
implement either `SubsetExtractionOpInterface` or
`SubsetInsertionOpInterface`.
- `SubsetExtractionOpInterface` is for ops that extract from a tensor at
a subset. E.g., `tensor.extract_slice`, `tensor.gather`,
`vector.transfer_read`. Current implemented only on
`tensor.extract_slice`.
- `SubsetInsertionOpInterface` is for ops that insert into a destination
tensor at a subset. E.g., `tensor.insert_slice`,
`tensor.parallel_insert_slice`, `tensor.scatter`,
`vector.transfer_write`. Currently only implemented on
`tensor.insert_slice`, `tensor.parallel_insert_slice`.

Other changes:
- Rename `SubsetInsertionOpInterface.td` to `SubsetOpInterface.td`.
- Add helper functions to `ValueBoundsOpInterface.cpp` for checking
whether two slices are disjoint.

The new interfaces will be utilized by a new "loop-invariant subset
hoisting"
transformation. (This new transform is roughly
what `Linalg/Transforms/SubsetHoisting.cpp` is doing, but in a generic
and interface-driven way.)
2023-11-01 10:26:31 +09:00
Matthias Springer
a8d0c86174 [mlir][Interfaces][NFC] Move SubsetInsertionOpInterface to mlir/Interfaces (#70615)
`SubsetInsertionOpInterface` is an interface for ops that insert into a
destination tensor at a subset. It is currently used by the
bufferization framework to support efficient
`tensor.extract_slice/insert_slice` bufferization and to drive "empty
tensor elimination".

This commit moves the interface to `mlir/Interfaces`. This is in
preparation of adding a new "loop-invariant subset hoisting"
transformation to
`mlir/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp`, which will
utilize `SubsetInsertionOpInterface`. (This new transform is roughly
what `Linalg/Transforms/SubsetHoisting.cpp` is doing, but in a generic
and interface-driven way.)
2023-10-30 13:42:44 +09:00
Matthias Springer
5558504374 [mlir][IR] Make OpOperand comparable (#70410)
Two `OpOperand`s are the same if they belong to the same owner and have
the same operand number. There are currently no comparison operators
defined on `OpOperand` and we work around this in multiple places by
comparing pointers.

Note: `OpOperand`s are stored in an op, so it is valid to compare their
pointers to determine if they are the same operand. E.g.,
`getOperandNumber` is also implemented via pointer arithmetics.
2023-10-27 15:51:45 +09:00
Matthias Springer
2e3c62b15d [mlir][tensor][NFC] Simplify SubsetInsertionOpInterface implementation (#69999)
`tensor.insert_slice` and `tensor.parallel_insert_slice` can share the
same implementation.
2023-10-25 08:45:39 +09:00
Matthias Springer
ea71d2d0fe [mlir][tensor][bufferize] Reshapes: Fix memory side effects and memory space (#68195)
* `tensor.collapse_shape` may bufferize to a memory read because the op
may have to reallocate the source buffer.
* `tensor.reshape` should not use `bufferization.clone` for
reallocation. This op has requirements wrt. the order of buffer
writes/reads. Use `memref.alloc` and `memref.copy` instead. Also fix a
bug where the memory space of the source buffer was not propagated to
the reallocated buffer.
2023-10-05 14:33:04 +02:00
Matthias Springer
58678d3bcf [mlir][tensor][bufferize] tensor.empty bufferizes to allocation (#68201)
`BufferizableOpInterface::bufferizesToAllocation` is queried when
forming equivalence sets during bufferization. It is not really needed
for ops like `tensor.empty` which do not have tensor operands, but it
should be added for consistency.

This change should have been part of #68080. No test is added because
the return value of this function is irrelevant for ops without tensor
operands. (However, this function acts as a form documentation,
describing the bufferization semantics of the op.)
2023-10-05 14:06:00 +02:00
Matthias Springer
8823e961f6 [mlir][ODS] Change get...Mutable to return OpOperand & for single operands (#66519)
The TableGen code generator now generates C++ code that returns a single
`OpOperand &` for `get...Mutable` of operands that are not variadic and
not optional. `OpOperand::set`/`assign` can be used to set a value (same
as `MutableOperandRange::assign`). This is safer than
`MutableOperandRange` because only single values (and no longer
`ValueRange`) can be assigned.

E.g.:
```
// Assignment of multiple values to non-variadic operand.
// Before: Compiles, but produces invalid op.
// After: Compilation error.
extractSliceOp.getSourceMutable().assign({v1, v2});
```
2023-10-04 08:35:40 +02:00
Matthias Springer
464dfeba44 [mlir][tensor][bufferize] tensor.empty bufferizes to an allocation (#68080)
Make `tensor.empty` bufferizable, so that the
`-empty-tensor-to-alloc-tensor` pass becomes optional. This makes the
bufferization easier to use. `tensor.empty` used to be non-bufferizable,
so that there two separate ops, one that can be optimized away
(`tensor.empty`) and one that is guaranteed to bufferize to an
allocation (`bufferization.alloc_tensor`). With the recent improvements
of "empty tensor elimination" this is no longer needed and
`bufferization.alloc_tensor` can be phased out.
2023-10-03 16:00:37 +02:00
Spenser Bauman
0a0c7e8978 [mlir][tensor] Bufferize tensor.reshape with non-identity layouts (#65654)
Bufferization of tensor.reshape generates a memref.reshape operation.
memref.reshape requires the source memref to have an identity layout.
The bufferization process may result in the source memref having a
non-identity layout, resulting in a verification failure.

This change causes the bufferization interface for tensor.reshape to
copy the source memref to a new buffer when the source has a
non-identity layout.
2023-09-19 09:50:43 +09:00
Martin Erhart
6bf043e743 [mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute (#66619)
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
2023-09-18 16:44:48 +02:00
Matthias Springer
0f952cfe24 [mlir][IR] Change MutableOperandRange::operator[] to return an OpOperand & (#66515)
`operator[]` returns `OpOperand &` instead of `Value`.

* This allows users to get OpOperands by name instead of "magic" number.
E.g., `extractSliceOp->getOpOperand(0)` can be written as
`extractSliceOp.getSourceMutable()[0]`.
* `OperandRange` provides a read-only API to operands: `operator[]`
returns `Value`. `MutableOperandRange` now provides a mutable API:
`operator[]` returns `OpOperand &`, which can be used to set operands.

Note: The TableGen code generator could be changed to return `OpOperand
&` (instead of `MutableOperandRange`) for non-variadic and non-optional
arguments in a subsequent change. Then the `[0]` part in the above
example would no longer be necessary.
2023-09-18 09:43:03 +02:00
Matthias Springer
a1ef5a9437 [mlir][bufferization] Empty tensor elimination based on SubsetOpInterface (#65766)
This commit generalizes empty tensor elimination to operate on subset
ops. No new test cases are added because all current subset ops were
already supported previously. From this perspective, this change is NFC.

A new interface method (and a helper method) are added to
`SubsetInsertionOpInterface` to build the subset of the destination
tensor.
2023-09-14 09:45:22 +02:00
Martin Erhart
c199f7dc62 Revert "[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute"
This reverts commit 6a91dfedeb.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:48 +00:00
Matthias Springer
8143307b33 [mlir][bufferization] Generalize tensor slice rules to subset ops (#65619)
This commit generalizes the special
tensor.extract_slice/tensor.insert_slice bufferization rules to tensor
subset ops.

Ops that insert a tensor into a tensor at a specified subset (e.g.,
tensor.insert_slice, tensor.scatter) can implement the
`SubsetInsertionOpInterface`.

Apart from adding a new op interface (extending the API), this change is
NFC. The only ops that currently implement the new interface are
tensor.insert_slice and tensor.parallel_insert_slice, and those ops were
are supported by One-Shot Bufferize.
2023-09-13 12:27:19 +02:00
Martin Erhart
6a91dfedeb [mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases
such as returning memrefs which this series of commits aims to add
support for as well.

This first commit removes the deallocation capabilities of
one-shot-bufferization.One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.

The documentation should w.r.t. these pass option changes should also be
updated in this commit.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D156662
2023-09-13 09:30:22 +00:00
Matthias Springer
878950b82c [mlir][bufferization] Simplify getBufferType
`getBufferType` computes the bufferized type of an SSA value without bufferizing any IR. This is useful for predicting the bufferized type of iter_args of a loop.

To avoid endless recursion (e.g., in the case of "scf.for", the type of the iter_arg depends on the type of init_arg and the type of the yielded value; the type of the yielded value depends on the type of the iter_arg again), `fixedTypes` was used to fall back to "fixed" type. A simpler way is to maintain an "invocation stack". `getBufferType` implementations can then inspect the invocation stack to detect repetitive computations (typically when computing the bufferized type of a block argument).

Also improve error messages in case of inconsistent memory spaces inside of a loop.

Differential Revision: https://reviews.llvm.org/D158060
2023-08-16 15:02:07 +02:00
Hanhan Wang
3cb0128d26 [mlir][tensor] Do not fold rank-reduced extract_slice into unpack op.
Differential Revision: https://reviews.llvm.org/D157927
2023-08-15 10:28:21 -07:00
Matthias Springer
a02ad6c177 [mlir][bufferization] Generalize getAliasingOpResults to getAliasingValues
This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops.

This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block.

If an op does not implement the `BufferizableOpInterface`, the analysis in conservative. It previously assumed that an OpOperand may alias with each OpResult. It now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block.

Differential Revision: https://reviews.llvm.org/D157957
2023-08-15 15:02:47 +02:00
Matthias Springer
867afe5e53 [mlir][vector] Remove duplicate tensor subset <-> vector transfer patterns
Remove patterns that fold tensor subset ops into vector transfer ops from the vector dialect. These patterns already exist in the tensor dialect.

Differential Revision: https://reviews.llvm.org/D154932
2023-07-11 11:12:29 +02:00
Matthias Springer
ef4f5357e3 [mlir][linalg] BufferizeToAllocationOp: Do not copy uninitialized buffers
Tensors/buffers that do not have any defined contents (e.g., `tensor.empty`) are no longer copied.

Differential Revision: https://reviews.llvm.org/D154081
2023-07-04 16:40:10 +02:00
Matthias Springer
6596b0dde8 [mlir][tensor] Clean up tensor::DimOp usage
* Remove duplicate functions. `tensor::getMixedSize` and `tensor::getMixedSizes` should be used.
* Use `tensor::getMixedSize` instead of `createOrFold<tensor::DimOp>`. This is more efficient. `createOrFold` will create an op an immediately try to fold it. In case of a static dimension size, an attribute can be used directly.

Differential Revision: https://reviews.llvm.org/D153332
2023-06-22 10:56:17 +02:00
Matthias Springer
efc290ce9c [mlir][affine] More efficient makeComposedFolded... helpers
The old code used to materialize constants as ops, immediately folded them into the resulting affine map and then deleted the constant ops again. Instead, directly fold the attributes into the affine map. Furthermore, all helpers accept `OpFoldResult` instead of `Value` now. This makes the code at call sites more efficient, because it is no longer necessary to materialize a `Value`, just to be able to use these helper functions.

Note: The API has changed (accepts OpFoldResult instead of Value), otherwise this change is NFC.

Differential Revision: https://reviews.llvm.org/D153324
2023-06-22 10:47:38 +02:00
Matthias Springer
9340996706 [mlir][tensor] Add pattern to rewrite tensor.generate as a constant
Only ops with a static tensor type and a constant yield value are rewritten.

Differential Revision: https://reviews.llvm.org/D152511
2023-06-09 12:56:07 +02:00
Matthias Springer
40052b08de [mlir][tensor] Add option to fold only tensor.empty with a single use
This is useful for transformations such as bufferization, which is looking for tensor.extract_slice/insert_slice pairs.

Also fix the documentation of the corresponding tranform op.

Differential Revision: https://reviews.llvm.org/D152455
2023-06-09 12:36:55 +02:00
Ingo Müller
9dbb8eefd4 [mlir][tensor] Implement getBufferType for ReshapeOp.
This function should be implemented for ops that work in one-shot
bufferization.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D151548
2023-06-02 13:40:48 +00:00
Matthias Springer
26864d8fb4 [mlir][tensor] Add pattern to drop redundant insert_slice rank expansion
Drop insert_slice rank expansions if they are directly followed by an inverse rank reduction.

Differential Revision: https://reviews.llvm.org/D151800
2023-06-01 08:47:53 +02:00
Ingo Müller
45aaa67fce [mlir][tensor] Fix one-shot bufferization of tensor.reshape.
I believe that the previous implementation did not work on any input. It
called getMemRefType with `layout = {}`, presumably with the intention
to create a MemrefType with identity layout. However, the implementation
of that function returns a MemrefType with *unknown* layout if it is
provided with a default-constructed layout attribute. This patch uses
getMemRefTypeWithStaticIdentityLayout instead, with has identical
behavior except for the case of a default-constructed layout, which it
passes on as-is to the MemrefType.

This problem did not surface in the test because tensor.reshape was not
tested with -one-shot-bufferize. This patch introduces a test copied
from the tests for -tesnor-bufferize adapted in as follows: since the
test is run with "bufferize-function-boundaries", a tensor that is
passed into the function is bufferized into a memref with unknown
layout, which wouldn't be a valid intput for memref.reshape, so the
tests now uses a tensor constructed with arith.constant inside of the
function.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D151544
2023-05-26 09:39:49 +00:00
Tres Popp
68f58812e3 [mlir] Move casting calls from methods to function calls
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.

Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
  for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.

Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
   additional check:
   main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
   and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
   them to a pure state.

```
ninja -C $BUILD_DIR clang-tidy

run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
               -header-filter=mlir/ mlir/* -fix

rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```

Differential Revision: https://reviews.llvm.org/D151542
2023-05-26 10:29:55 +02:00
Matthias Springer
481b254e45 [mlir][tensor][bufferize] Bufferize tensor.splat op
The op bufferizes similarly to tensor.generate: it is lowered to a linalg.map, which may then lower to a loop nest that fills the buffer.

Differential Revision: https://reviews.llvm.org/D150952
2023-05-22 14:31:39 +02:00
Kai Sasaki
6cd7b655d8 [mlir][bufferization] Prevent crash in one shot bufferization with unranked tensor cast
One shot bufferization does not support bufferizing the cast between unranked tensors. To prevent the crash, we can check the compatibility of the result type in advance. Reported in https://github.com/llvm/llvm-project/issues/62369.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D149239
2023-05-19 08:54:43 +09:00
Tres Popp
5550c82189 [mlir] Move casting calls from methods to function calls
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.

Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.

Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
  for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.

Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
   additional check:
   https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
   and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
   them to a pure state.
4. Some changes have been deleted for the following reasons:
   - Some files had a variable also named cast
   - Some files had not included a header file that defines the cast
     functions
   - Some files are definitions of the classes that have the casting
     methods, so the code still refers to the method instead of the
     function without adding a prefix or removing the method declaration
     at the same time.

```
ninja -C $BUILD_DIR clang-tidy

run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
               -header-filter=mlir/ mlir/* -fix

rm -rf $BUILD_DIR/tools/mlir/**/*.inc

git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
            mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
            mlir/lib/**/IR/\
            mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
            mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
            mlir/test/lib/Dialect/Test/TestTypes.cpp\
            mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
            mlir/test/lib/Dialect/Test/TestAttributes.cpp\
            mlir/unittests/TableGen/EnumsGenTest.cpp\
            mlir/test/python/lib/PythonTestCAPI.cpp\
            mlir/include/mlir/IR/
```

Differential Revision: https://reviews.llvm.org/D150123
2023-05-12 11:21:25 +02:00
Matthias Springer
77124386fe [mlir][tensor] Add transform to make tensor.pad loop-independent
Add a transform to make `tensor.pad` and `tensor.empty` ops independent of SCF loop IVs. Such ops can then be hoisted.

E.g.:
```
scf.for %iv = %lb to %ub step %step {
  %high = affine.apply affine_map<(d0)[s0] -> (s0 - d0)> (%i)[%ub]
  %p = tensor.pad %t low[5] high[%high] ...
  ...
}
```
Is transformed to:
```
%high_new = affine.apply affine_map<()[s0, s1] -> (-s0 + s1)> ()[%lb, %ub]
%p_hoistable = tensor.pad %t low[5] high[%high_new]
%dim = tensor.dim %t, %c0
%size = affine.apply affine_map<(d0)[s0, s1] -> (-d0 + s0 + s1 + 5)>(%iv)[%ub, %dim]
%slice = tensor.extract_slice %p_hoistable [0] [%size] [1]
```

Differential Revision: https://reviews.llvm.org/D143910
2023-04-28 11:46:32 +09:00
Matthias Springer
4c48f016ef [mlir][Affine][NFC] Wrap dialect in "affine" namespace
This cleanup aligns the affine dialect with all the other dialects.

Differential Revision: https://reviews.llvm.org/D148687
2023-04-20 11:19:21 +09:00
Matthias Springer
7c06f63176 [mlir][tensor][bufferize] Fix dealloc placement in scf.forall op
The terminator of this op is special: it does not just yield a value,
but bufferizes to a memcpy. This requires special treatment to make sure
that deallocs are placed after the memcpy. (By default, deallocs are
placed right before the terminator.)

Differential Revision: https://reviews.llvm.org/D148408
2023-04-16 09:34:43 +09:00
Nicolas Vasilache
33468a51db [mlir][Tensor] Add support for insert_slice in FoldTensorSubsetOps
Differential Revision: https://reviews.llvm.org/D148334
2023-04-14 09:34:11 -07:00
Nicolas Vasilache
2031d7d66d [mlir][Tensor] Drop SplitPaddingPatterns.
These old patterns are not in use in either MLIR or downstream projects except for one test.
Additionally this is redundant with logic in the tensor.pad tiling implementation.

Drop SplitPaddingPatterns to reduce entropy.

Differential Revision: https://reviews.llvm.org/D148207
2023-04-13 03:38:29 -07:00
Nicolas Vasilache
4dc72d47ce [mlir][Tensor] Add a FoldTensorSubsetOps pass and patterns
These patterns follow FoldMemRefAliasOps which is further refactored for reuse.
In the process, fix FoldMemRefAliasOps handling of strides for vector.transfer ops which was previously incorrect.

These opt-in patterns generalize the existing canonicalizations on vector.transfer ops.
In the future the blanket canonicalizations will be retired.
They are kept for now to minimize porting disruptions.

Differential Revision: https://reviews.llvm.org/D146624
2023-03-23 04:03:27 -07:00
Mahesh Ravishankar
809e3d8c98 [mlir][TilingInterface] Modify TilingInterface methods to better return the state of the transformed IR.
Currently the `getTiledImplementation` and `generateResultTileValue`
return just `SmallVector<Operation *>` and `FailureOr<Value>`.

- For `getTiledImplementation` returning empty implies tiling wasnt
  done. There is also an implicit assumption that the tiled operation
  results correspond to the tiled values of the result of the original
  operation. This cannot handle cases where the tiled implementation
  might use multiple operations to compute the tiled value for the
  results of the untiled operation. Sometimes, the tiled operation
  might not directly give the tiled values, and might require casts,
  etc to get a replacement.
- For `generateResultTileValue`, it is assumed that the op defining
  the returned `Value` is the operation that represents the tiled
  computation. Again presence of casts, etc violate this.

Instead make these methods return
```
struct TilingResult {
  SmallVector<Operation *> tiledOps;
  SmallVector<Value> tiledValues;
};
```

The `tiledOps` represent the operations generated that are relevant
for subsequent transformations. The `tiledValues` represent the tiled
values for the results of the original operation. This better
transmits the state of the transformed IR.

As a consequence the following methods also return `FailureOr<TilingResult>`
- `tensor::replaceExtractSliceWithTiledProducer`
- `tensor::bubbleUpPadSlice`

Differential Revision: https://reviews.llvm.org/D145133
2023-03-16 14:29:03 +00:00
Jakub Kuderski
a0a76804c4 [ADT] Allow llvm::enumerate to enumerate over multiple ranges
This does not work by a mere composition of `enumerate` and `zip_equal`,
because C++17 does not allow for recursive expansion of structured
bindings.

This implementation uses `zippy` to manage the iteratees and adds the
stream of indices as the first zipped range. Because we have an upfront
assertion that all input ranges are of the same length, we only need to
check if the second range has ended during iteration.

As a consequence of using `zippy`, `enumerate` will now follow the
reference and lifetime semantics of the `zip*` family of functions. The
main difference is that `enumerate` exposes each tuple of references
through a new tuple-like type `enumerate_result`, with the familiar
`.index()` and `.value()` member functions.

Because the `enumerate_result` returned on dereference is a
temporary, enumeration result can no longer be used through an
lvalue ref.

Reviewed By: dblaikie, zero9178

Differential Revision: https://reviews.llvm.org/D144503
2023-03-15 19:34:22 -04:00
Matthias Springer
758329dc7c [mlir][NFC] reifyResultShapes: Add extra error checking
This change adds a new helper function `mlir::reifyResultShapes` that calls the corresponding interface method and also checks the result produced by the implementation when running in debug mode. Bugs due to incorrect interface implementations can be difficult to debug.

This helper function also reduces the amount of code needed at call sites: the cast to `ReifyRankedShapedTypeOpInterface` is done in the helper function.

Differential Revision: https://reviews.llvm.org/D145777
2023-03-10 11:37:54 +01:00
Matthias Springer
2a5b13e722 [mlir][Interfaces] ReifyRankedShapedTypeOpInterface returns OpFoldResults
`reifyResultShapes` now returns `OpFoldResult`s instead of `Value`s. This is often more efficient because many transformations immediately attempt to extract a constant from the reified values.

Differential Revision: https://reviews.llvm.org/D145250
2023-03-06 08:41:28 +01:00