At the moment, they are a part of EmptyOp::getCanonicalizationPatterns. When
extract_slice(tensor.empty) is rewritten as a new tensor.empty, it could
happen that we end up with two tensor.empty ops, since the original
tensor.empty can have two users. After bufferization such cases result in two
allocations.
Differential Revision: https://reviews.llvm.org/D139308
The only way to do this with the current hoisting strategy is by
lowering Affine to Scf first, but that prevents further passes on
Affine.
Differential Revision: https://reviews.llvm.org/D137600
This adds a tile_size parameter, when it is used the tiles are
cyclically distributed onto the threads of the scf.foreach_thread op.
Differential Revision: https://reviews.llvm.org/D139474
It introduces a pattern that swaps `linalg.generic + tensor.pack` to
`tensor.pack + linalg.generic`. It requires all the iteration types
being parallel; the indexing map of output operand is identiy. They can
all be relaxed in the future.
The user can decide whether the propagation should be applied or not by
passing a control function.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D138882
The only way to do this with the current hoisting strategy is by
lowering Affine to Scf first, but that prevents further passes on
Affine.
Differential Revision: https://reviews.llvm.org/D137600
Incrementing `counter` variable is inside the if statement. If the code does not enter there, the while loop will iterate infinitely. This revision moves the codes outside of if statement.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D139005
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
RewriterBase is the proper builder to use so one can listen to IR modifications (i.e. not just creation).
Differential Revision: https://reviews.llvm.org/D137922
This fixes the case where scf::LoopNest::loops is empty.
Change LoopVector and ValueVector to SmallVector.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D136926
IREE's passes depend on the behavior of SplitReduction's introduced
parallel loop being the same as the introduced dimension in the
intermediate tensor (the order of loops was changed in
https://reviews.llvm.org/D137478).
Differential Revision: https://reviews.llvm.org/D138972
Relax linalg elementwise fusion check to allow mixed consumers. Producer is still required to be fully tensor to avoid potential memref aliasing.
Differential Revision: https://reviews.llvm.org/D138759
The output operands will be added to input operands if the generic op (on tensors)
becomes an elementwise operation. The outputs of the generic op is still the same.
They will be cleaned up by ReplaceWithEmptyTensorIfUnused pattern.
This is https://reviews.llvm.org/D138251, plus a cmake dep fix.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D138843
This revision exports `collapseGenericOpIterationDims` to a header so it can be used outside of the pattern. We have use-case where we want to call this function directly.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D138697
The output operands will be added to input operands if the generic op (on tensors)
becomes an elementwise operation. The outputs of the generic op is still the same.
They will be cleaned up by ReplaceWithEmptyTensorIfUnused pattern.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D138251
Elementwise op fusion conserves the result of the producer in the
fused op, relying on later clean up patterns to drop unused results of
the fused op. Instead, if the producer result has no other use apart
from the consumer op, avoid making the producer result available in
the fused node. This saves some unnecessary IR manipulations.
Differential Revision: https://reviews.llvm.org/D138096
dimension where the extra parallel dimension is inserted
Currently, the innerParallel and non innerParallel strategies use two
different ways to fix for where the extra loop is inserted and where the
extra dimension for the intermediate result is inserted - innerParallel
adds the extra (parallel) loop right after the pre-existing reduction
loop, whereas non innerParallel adds the reduction loop in the successor
to the index supplied by control, and the parallel loop in the index
supplied by the control. The semantics of the index supplied by the
control is supposed to only control where the extra tensor dimension is
inserted in the intermediate tensor. Conflating this index with where
the reduction (and parallel) loops are inserted leads to more complex
(and confusing) logic overall. This differential removes conflating the
two uses of the index, and keeps the reduction and parallel loops in the
same vicinity and uses the supplied index to only determine the position
of the extra tensor dimension. It also simplifies the code by merging
the two strategies in a lot more places.
Differential Revision: https://reviews.llvm.org/D137478
The patterns to remove dead arguments and results of `linalg.generic`
operations are not necessarily canonicalizations. Instead a new entry
point `populateEraseUnusedOperandsAndResults` is added to allow using
these patterns when needed. The transformations that rely on this
pattern for cleanup now include these patterns explicitly.
Differential Revision: https://reviews.llvm.org/D138085
This adds a transformation to tile reduction operations to partial
reduction using scf.foreachthread. This uses
PartialReductionOpInterface to create a merge operation of the partial
tiles.
Differential Revision: https://reviews.llvm.org/D137912
`scf.foreach_thread` defines mapping its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
scf.foreach_thread (%i2, %j2) in (%c1, %c2)
{...} { thread_dim_mapping = [0, 1]}
} { thread_dim_mapping = [0, 1]}
```
It first introduces a `DeviceMappingInterface` which is an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This way gives us a clear validation.
The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x,y,z>` ). After this change, the above code prints as below, as seen here, this way clarifies the loop mappings. The change also implements consuming of these two new attribute by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
scf.foreach_thread (%i2, %j2) in (%c1, %c2)
{...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}
```
Reviewed By: ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137413
During elementwise fusion the fillOp's value was directly
referred without casting which can create mismatching dtypes.
Reviewed By: mravishankar, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D137447
[RFC: EnumAttr for iterator types in Linalg](https://discourse.llvm.org/t/rfc-enumattr-for-iterator-types-in-linalg/64535)
This affect touches and probably breaks most of the code that creates `linalg.generic`. A fix would be to replace calls to `getParallelIteratorTypeName/getReductionIteratorTypeName` with `mlir::utils::IteratorType::parallel/reduction` and types from `StringRef` to `mlir::utils::IteratorType`.
Due to limitations of tablegen, shared C++ definition of IteratorType enum lives in StructuredOpsUtils.td, but each dialect should have it's own EnumAttr wrapper. To avoid conflict, all enums in a dialect are put into a separate file with a separate tablegen rule.
Test dialect td files are refactored a bit.
Printed format of `linalg.generic` temporarily remains unchanged to avoid breaking code and tests in the same change.
Differential Revision: https://reviews.llvm.org/D137658
Vectorization of Linalg's depthwise convolution only supports floating
point types. Previous version assumed floating point operations would
work. This version checks whether the computation is integer or floating
point and adjust the inner loop computation.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D137595
Those methods were added long time ago. Now we get the same methods generated by tablegen, so there is no need for duplicates.
Differential Revision: https://reviews.llvm.org/D137544
Add a transformation to tile reduction ops into a parallel operation
followed by a merge operation. This is equivalent to the existing
reduction spliting transformation but using loops instead of using
higher dimensions linalg.
Differential Revision: https://reviews.llvm.org/D136586
`getDestinationOperands` was almost a duplicate of `DestinationStyleOpInterface::getOutputOperands`. Now that the interface has been moved to mlir/Interfaces, it is no longer needed.
Differential Revision: https://reviews.llvm.org/D136240
This is the second (and final) step of making "destination style" usable without depending on the Linalg dialect. (The first step was D135129.)
This change allows us to provide default bufferization implementations for all destination-style ops. It also allows us to simplify `TilingInterface`. (E.g., `getDestinationOperands` can be removed.)
Differential Revision: https://reviews.llvm.org/D136179
This patch implements the vectorization of tensor.extract for the
basic 1-d lookup case. It only vectorizes the tensor.extract to a
vector.gather when the op extracts value from an 1-d tensor.
Related discussion: https://github.com/iree-org/iree/issues/9198
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D133786
`getIteratorTypesArray` should be used instead. It's a better substitute for all the current usages of the interface.
The current `ArrayAttr iterator_types()` has a few problems:
* It creates an assumption operation has iterators types as an attribute, but it's not always the case. Sometime iterator types can be inferred from other attribute, or they're just static.
* ArrayAttr is an obscure contained and required extracting values in the client code.
* Makes it hard to migrate iterator types from strings to enums ([RFC](https://discourse.llvm.org/t/rfc-enumattr-for-iterator-types-in-linalg/64535/9)).
Concrete ops, like `linalg.generic` will still have iterator types as an attribute if needed.
As a side effect, this change helps a bit with migration to prefixed accessors.
Differential Revision: https://reviews.llvm.org/D135765