Collapsing or expanding a splatted value can be replaced with a single
`tensor.splat` operation on the result type, so rewrite these cases accordingly.
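A minimal sketch of the kind of rewrite this enables (types and values are illustrative, not taken from the patch):
```
// Before: expand the result of a splat.
%s = tensor.splat %cst : tensor<6xf32>
%e = tensor.expand_shape %s [[0, 1]] : tensor<6xf32> into tensor<2x3xf32>
// After: splat directly into the expanded type.
%e = tensor.splat %cst : tensor<2x3xf32>
```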
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D140552
std::optional::value() has undesired exception checking semantics and is
unavailable in older Xcode (see _LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS). The
call sites block std::optional migration.
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
The main issue with tiling an unpack op is handling incomplete tiles. Since all
the dimensions are orthogonal, it suffices to discuss the 1-D unpack case. The
core idea is to make the input slice consist of complete tiles. In this case,
a larger unpacked tile is created, and an extract_slice op is needed
to shift and truncate the output.
Take Nn_to_N as an example. Say that N=32, n=8, and tiling_size=15. The
coordinates of the second tile (i.e., result[15..31]) are [(1, 7), (2, 0),
(2, 1) ... (3, 6), (3, 7)]. The first row and the last row are
incomplete in terms of inputs. It is impossible to represent such an unpack op
using these coordinates, because the input has higher rank and computing the
coordinates involves mod and ceilDiv, which is tricky.
To represent the unpack op, we have to complete the rows, i.e., the
input coordinates start with (1, 0) and end with (3, 7). In this
context, the tiled unpack produces (3 * n) elements because there are
3 rows in total. Followed by a tensor.extract_slice op, we get the
actual result.
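A rough sketch of the resulting IR for this example (operand names, sizes, and attribute syntax are illustrative, not taken from the patch):
```
// Unpack the completed rows: 3 rows of n = 8 give 24 elements.
%unpacked = tensor.unpack %in_slice inner_dims_pos = [0] inner_tiles = [8]
    into %larger : tensor<3x8xf32> -> tensor<24xf32>
// Shift and truncate to the requested tile result[15..31]:
// offset 7 = 15 - 8 (the larger tile starts at result[8]), size 17.
%tile = tensor.extract_slice %unpacked[7] [17] [1]
    : tensor<24xf32> to tensor<17xf32>
```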
If the tiling sizes are multiples of the inner tile sizes, it is a perfect
tiling case. In this context, the larger input and output are not needed.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D139362
The op is not bufferizable but should be analyzable (for `EliminateEmptyTensors`, which uses the bufferization infrastructure).
Also improve debugging functionality and error messages.
Also adds a missing pass to the sparse pipeline. (tensor.empty should be replaced with bufferization.alloc_tensor; this sometimes used to work without the replacement, depending on how the tensor.empty was used. Now we always fail explicitly.)
At the moment, they are a part of EmptyOp::getCanonicalizationPatterns. When
extract_slice(tensor.empty) is rewritten as a new tensor.empty, we can
end up with two tensor.empty ops, since the original
tensor.empty may have other users. After bufferization such cases result in two
allocations.
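An illustrative before/after (not taken from the patch) of how the fold can leave two empties behind when the original one has another user:
```
// Before: %e has two users.
%e = tensor.empty() : tensor<16xf32>
%s = tensor.extract_slice %e[0] [8] [1] : tensor<16xf32> to tensor<8xf32>
%f = linalg.fill ins(%cst : f32) outs(%e : tensor<16xf32>) -> tensor<16xf32>
// After folding the extract_slice: two tensor.empty ops remain,
// which bufferize to two allocations.
%e0 = tensor.empty() : tensor<16xf32>
%s0 = tensor.empty() : tensor<8xf32>
%f0 = linalg.fill ins(%cst : f32) outs(%e0 : tensor<16xf32>) -> tensor<16xf32>
```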
Differential Revision: https://reviews.llvm.org/D139308
It introduces a pattern that swaps `linalg.generic + tensor.pack` to
`tensor.pack + linalg.generic`. It requires all the iteration types
to be parallel and the indexing map of the output operand to be the identity.
Both restrictions can be relaxed in the future.
The user can decide whether the propagation should be applied or not by
passing a control function.
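A hedged sketch of the swap on a unary elementwise op (shapes, maps, and the body are made up; the printed attribute form may differ across MLIR versions):
```
// Before: compute, then pack the result.
%0 = linalg.generic
       {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                         affine_map<(d0, d1) -> (d0, d1)>],
        iterator_types = [#linalg.iterator_type<parallel>,
                          #linalg.iterator_type<parallel>]}
       ins(%arg0 : tensor<128x256xf32>) outs(%init : tensor<128x256xf32>) {
  ^bb0(%in: f32, %out: f32):
    %add = arith.addf %in, %in : f32
    linalg.yield %add : f32
  } -> tensor<128x256xf32>
%1 = tensor.pack %0 inner_dims_pos = [0, 1] inner_tiles = [8, 32]
       into %pack_dest : tensor<128x256xf32> -> tensor<16x8x8x32xf32>
// After the swap: pack %arg0 first, then run an equivalent generic
// directly on the packed layout, producing the same packed result.
```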
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D138882
This revision adapts the pattern in Linalg to work on the DPS interface and
adds it to the canonicalization patterns of the tensor dialect.
InsertSliceOp is skipped in the pattern because it has its own logic
for folding tensor.cast ops.
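A hedged sketch of the fold on a DPS op (linalg.fill is used only as an example consumer; names and types are made up):
```
// Before: the init operand is hidden behind a type-erasing cast.
%cast = tensor.cast %init : tensor<8x8xf32> to tensor<?x?xf32>
%r = linalg.fill ins(%cst : f32) outs(%cast : tensor<?x?xf32>) -> tensor<?x?xf32>
// After: the cast is folded into the op and re-introduced on the
// now more static result.
%fill = linalg.fill ins(%cst : f32) outs(%init : tensor<8x8xf32>) -> tensor<8x8xf32>
%r2 = tensor.cast %fill : tensor<8x8xf32> to tensor<?x?xf32>
```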
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D139375
We can compute the offsets and sizes for the input slice because the
iteration domain is defined over the outer loops. If a dimension is tiled,
the i-th index is the product of offset_i and inner_tile_i.
Unlike tiling a pad op, we do not have to deal with reading zero
data from the input, because the tiling sizes apply to the packed outer
dimensions: we read either an entire tile or a partial tile for each
packed tile. The scf.if and tensor.generate ops are not needed in this
context.
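For example, a minimal sketch of the offset computation (names and the inner tile size 16 are made up for illustration):
```
// For an outer-loop offset %off on a dimension packed with inner tile
// size 16, the input slice starts at %off * 16.
%in_off = affine.apply affine_map<(d0) -> (d0 * 16)>(%off)
%in_slice = tensor.extract_slice %src[%in_off] [%in_size] [1]
    : tensor<128xf32> to tensor<?xf32>
```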
Co-authored-by: Lorenzo Chelini <l.chelini@icloud.com>
Reviewed By: rengolin, mravishankar
Differential Revision: https://reviews.llvm.org/D138631
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
The `paddingValue` and `outerDimsPerm` operands are optional for the op;
`innerTiles` can mix static and dynamic sizes.
Add a custom builder to make building a pack op easier.
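For illustration, two forms the op can take (operands and shapes are made up):
```
// Minimal form: no padding value, no outer permutation, static tiles.
%p0 = tensor.pack %src inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %dest0 : tensor<128x256xf32> -> tensor<16x8x8x32xf32>
// With a padding value, an outer permutation, and a dynamic inner tile.
%p1 = tensor.pack %src padding_value(%pad : f32)
    outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [8, %tile]
    into %dest1 : tensor<128x256xf32> -> tensor<?x16x8x?xf32>
```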
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D138860
This reverts D132662 (apart from overall cleanups), which introduced an overly aggressive optimization for tensor.insert_slice bufferization. Instead, bufferizesToMemoryRead is improved to handle some of these cases. The remaining cases can still bufferize efficiently when the canonicalizer is run before bufferization.
Differential Revision: https://reviews.llvm.org/D138745
Do not generate CollapseShapeOps/ExpandShapeOps that have the same source and result shape. Generate casts instead. Such reshapes became invalid with D138498.
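An illustrative sketch of what is generated instead of such a no-op reshape (types made up):
```
// A (possibly no-op) cast is emitted when source and result shapes match.
%r = tensor.cast %src : tensor<4x8xf32> to tensor<4x8xf32>
```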
Differential Revision: https://reviews.llvm.org/D138557
Avoid duplicate code by using an existing helper function to interchange
a vector based on a permutation. Address comments that emerged after landing
D138119.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D138480
Pack and Unpack return new tensors within which the individual elements
are reshuffled according to the packing specification. This has the
consequence of modifying the canonical order in which a given operator
(e.g., Matmul) accesses the individual elements. After bufferization,
this typically translates to increased access locality and improved cache
behavior, e.g., eliminating cache line splitting.
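For reference, a small example of the new ops (shapes and tile sizes are illustrative; see the RFC below for the full semantics):
```
// Pack a 128x256 matrix into 8x32 inner tiles, then unpack it back.
%packed = tensor.pack %src inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %pack_dest : tensor<128x256xf32> -> tensor<16x8x8x32xf32>
%unpacked = tensor.unpack %packed inner_dims_pos = [0, 1] inner_tiles = [8, 32]
    into %unpack_dest : tensor<16x8x8x32xf32> -> tensor<128x256xf32>
```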
Co-authored-by: Mahesh Ravishankar <ravishankarm@google.com>
Co-authored-by: Han-Chung Wang <hanchung@google.com>
RFC: https://discourse.llvm.org/t/rfc-tensor-pack-and-tensor-unpack/66408/1
Reviewed By: nicolasvasilache, rengolin, hanchung
Differential Revision: https://reviews.llvm.org/D138119
MemRef has been accepting a general Attribute as memory space for
a long time. This commit updates the bufferization side to catch up,
which allows downstream users to plug in customized symbolic memory
spaces. This also eliminates quite a few calls to `getMemorySpaceAsInt`,
which is deprecated.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D138330
`RankedTensorType` can have an optional encoding
attribute. Allowing the builders of `tensor.empty` to optionally accept an
encoding attribute makes it possible to build empty tensors whose type
carries that encoding.
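A minimal illustration of the resulting IR (assuming `#enc` is some encoding attribute defined elsewhere):
```
%0 = tensor.empty() : tensor<8x8xf32, #enc>
```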
Reviewed By: nicolasvasilache, hanchung, springerm
Differential Revision: https://reviews.llvm.org/D137297
This change adds memory space support to tensor.pad. (tensor.generate and tensor.from_elements do not support memory spaces yet.)
The memory space is inferred from the buffer of the source tensor.
Instead of lowering tensor.pad to tensor.generate + tensor.insert_slice, it is now lowered to bufferization.alloc_tensor (with the correct memory space) + linalg.map + tensor.insert_slice.
Memory space support for the remaining two tensor ops is left for a later point, as this requires some more design discussions.
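Roughly, the new lowering has this shape (a hedged sketch: the patch uses linalg.map to write the padding value, but linalg.fill is used here for brevity, and the memory-space attribute on the alloc_tensor is omitted):
```
// tensor.pad %src low[2] high[2] becomes:
%alloc = bufferization.alloc_tensor() : tensor<10xf32>
%filled = linalg.fill ins(%pad_value : f32) outs(%alloc : tensor<10xf32>) -> tensor<10xf32>
%padded = tensor.insert_slice %src into %filled[2] [6] [1]
    : tensor<6xf32> into tensor<10xf32>
```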
Differential Revision: https://reviews.llvm.org/D136265
There is no memref equivalent of tensor.generate. The purpose of this change is to avoid creating scf.parallel loops during bufferization.
Differential Revision: https://reviews.llvm.org/D136767
tensor.insert and tensor.insert_slice (as destination style ops) no longer need to implement the entire BufferizableOpInterface.
Differential Revision: https://reviews.llvm.org/D136347
When writing a tensor.extract/tensor.insert, the rank of the tensor is implied by the number of specified indices. When extracting from/inserting into an unranked tensor, it should first be cast to a ranked version.
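For instance (a hedged illustration, names made up):
```
// Cast the unranked tensor to a ranked type before extracting an element.
%ranked = tensor.cast %unranked : tensor<*xf32> to tensor<?x?xf32>
%elem = tensor.extract %ranked[%i, %j] : tensor<?x?xf32>
```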
Differential Revision: https://reviews.llvm.org/D136756
Drop the `createPadScalarOp` from Utils.h since it is a duplicate of
the `build` method added here.
Differential Revision: https://reviews.llvm.org/D136493
`getDestinationOperands` was almost a duplicate of `DestinationStyleOpInterface::getOutputOperands`. Now that the interface has been moved to mlir/Interfaces, it is no longer needed.
Differential Revision: https://reviews.llvm.org/D136240
Prior to this change, the "ExtractSliceFromReshape" pattern would transform
```
%collapsed = tensor.collapse_shape %input [[0, 1], [2]]
: tensor<1x11x100xf32> into tensor<11x100xf32>
%slice = tensor.extract_slice %collapsed [%offt, 0] [%size, 100] [1, 1]
: tensor<11x100xf32> to tensor<?x100xf32>
```
into a loop that iterates over the range `%size - %offt` and pieces
together multiple sub-slices of `%input` along the first dimension. This
is correct but obviously inefficient. The technical condition is that
collapsing at most one non-unit dimension of `%src` will not result in a
subsequent slice along the corresponding dimension of `%collapsed`
mapping across discontinuities in the index space of `%src`. Thus, the
definition of a "linearized dimension" (from the perspective of
`tensor.collapse_shape`) is updated to reflect this condition.
The transform will now generate
```
%slice = tensor.extract_slice %input [0, %offt, 0] [1, %size, 100] [1, 1, 1]
: tensor<1x11x100xf32> to tensor<1x?x100xf32>
%result = tensor.collapse_shape %slice [[0, 1], [2]]
: tensor<1x?x100xf32> into tensor<?x100xf32>
```
which can be further canonicalized.
Additional tests are added to check this family of edge cases.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D135726
The assert is misplaced, as the result type is allowed to be null. A few
lines below, the result type is inferred if a nullptr is passed.
Besides, this behavior is described in the documentation of the builder.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D136262
These operations have undefined behavior if the index is not less than the rank of the source tensor / memref, so they cannot be freely speculated like they were before this patch. After this patch we speculate them only if we can prove that they don't have UB.
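A brief illustration (assuming the ops in question are tensor.dim / memref.dim; values made up):
```
// Speculatable: the constant index 1 is provably less than the rank (2).
%c1 = arith.constant 1 : index
%d0 = tensor.dim %t, %c1 : tensor<?x?xf32>
// Not speculatable as-is: %i is unknown, so %i < rank cannot be proven.
%d1 = tensor.dim %t, %i : tensor<?x?xf32>
```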
Depends on D135505.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D135748
Inserting a tensor into an equivalent tensor is a no-op after bufferization. No alloc is needed.
Differential Revision: https://reviews.llvm.org/D132662