This function used to create new ops even if the vectorization failed. Those ops were then folded away. This caused a failure of the GreedyPatternRewriter, which no longer terminated (each time the IR is modified => one more iteration).
Differential Revision: https://reviews.llvm.org/D140286
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
This patch introduces the initial bits to support vector masking
using the `vector.mask` operation. Vectorization changes should be
NFC for non-masked cases. We can't test masked cases directly until
we extend the Transform dialect to support masking.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137690
This patch implements the vectorization of tensor.extract for arbitrary
tensors. It basically extends https://reviews.llvm.org/D133786 by adding
support for n-D tensors (n >= 2). This is implemented by essentially
flattening the indices.
When benchmarking the vectorized code, we have observed that it is
slower than the scalar code. That's most likely due to sub-optimal (and,
in general slow) gather loads. More work is needed to identify an
implementation and/or a representation that would lead to better code.
In the meantime, the vectorization of n-D tensors (where n >= 2) has to
be explicitly enabled. This can be done either via:
* transfer dialect's `vectorize_nd_extract` attribute,
* dedicated bool argument in the `vectorize` method from
"Vectorization.cpp".
The second option was added to control the new functionality through
means other than the transfer dialect.
Related discussion: https://github.com/iree-org/iree/issues/9198
Differential Revision: https://reviews.llvm.org/D137660
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
RewriterBase is the proper builder to use so one can listen to IR modifications (i.e. not just creation).
Differential Revision: https://reviews.llvm.org/D137922
[RFC: EnumAttr for iterator types in Linalg](https://discourse.llvm.org/t/rfc-enumattr-for-iterator-types-in-linalg/64535)
This affect touches and probably breaks most of the code that creates `linalg.generic`. A fix would be to replace calls to `getParallelIteratorTypeName/getReductionIteratorTypeName` with `mlir::utils::IteratorType::parallel/reduction` and types from `StringRef` to `mlir::utils::IteratorType`.
Due to limitations of tablegen, shared C++ definition of IteratorType enum lives in StructuredOpsUtils.td, but each dialect should have it's own EnumAttr wrapper. To avoid conflict, all enums in a dialect are put into a separate file with a separate tablegen rule.
Test dialect td files are refactored a bit.
Printed format of `linalg.generic` temporarily remains unchanged to avoid breaking code and tests in the same change.
Differential Revision: https://reviews.llvm.org/D137658
Vectorization of Linalg's depthwise convolution only supports floating
point types. Previous version assumed floating point operations would
work. This version checks whether the computation is integer or floating
point and adjust the inner loop computation.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D137595
This patch implements the vectorization of tensor.extract for the
basic 1-d lookup case. It only vectorizes the tensor.extract to a
vector.gather when the op extracts value from an 1-d tensor.
Related discussion: https://github.com/iree-org/iree/issues/9198
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D133786
This patch exposes the method to check if an op can be vectorized or
not for downstream uses. Also adds a check to mark elementwise operations
that have non-vectorizable ops (like `tensor.extract`) as non vectorizable.
Reviewed By: nicolasvasilache, dcaballe, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D135201
tensor.empty/linalg.init_tensor produces an uninititalized tensor that can be used as a destination operand for destination-style ops (ops that implement `DestinationStyleOpInterface`).
This change makes it possible to implement `TilingInterface` for non-destination-style ops without depending on the Linalg dialect.
RFC: https://discourse.llvm.org/t/rfc-add-tensor-from-shape-operation/65101
Differential Revision: https://reviews.llvm.org/D135129
This patch fixes three warnings of the form:
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp:1436:5: error:
default label in switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]
Windows build requires brackets on switch-cases that initializes
variables.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D133889
Most computer vision torch models uses nchw/ncw convolution. In a previous patch we added decomposition conv2dNchw to conv1dNcw. To enhance the performance on torch models we add this vectorization pattern for conv1dNcw which would consquently also improve the performance on conv2dNchw.
On IREE + Intel Xeon 8360 + Resnet50, we were able to get ~7x speed up ~880ms to 126ms.
Reviewed By: nicolasvasilache, hanchung
Differential Revision: https://reviews.llvm.org/D133675
This is the first step in replacing interator_type from strings with enums in Vector and Linalg dialect. This change adds IteratorTypeAttr and uses it in ContractionOp.
To avoid breaking all the tests, print/parse code has conversion between string and enum for now.
There is a shared code in StructuredOpsUtils.h that expects iterator types to be strings. To break this dependancy, this change forks helper function `isParallelIterator` and `isReductionIterator` to utils in both dialects and adds `getIteratorTypeNames()` to support backward compatibility with StructuredGenerator.
In the later changes, I plan to add a similar enum attribute to Linalg.
Differential Revision: https://reviews.llvm.org/D133696
The current check for form of the output indexing maps disallows
generic ops that return both a reduced and unreduced value. Such an op
seems like it could fall within the scope of a Strucutred op. Drop the
check. The only load-bearing place this was found to cause isseus was
during vectorization, but the fix for that seems to be simple.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D132716
This allows vectorizing linalg reductions without changing the operation
order. Therefore this produce a valid vectorization even if operations
are not associative.
Differential Revision: https://reviews.llvm.org/D129535
The current vectorization logic implicitly expects "elementwise"
linalg ops to have projected permutations for indexing maps, but
the precondition logic misses this check. This can result in a
crash when executing the generic vectorization transform on an op
with a non-projected permutation input indexing map. This change
fixes the logic and adds a test (which crashes without this fix).
Differential Revision: https://reviews.llvm.org/D127000
Now that dialect constructors are generated in the .cpp file, we can
drop all of the dependent dialect includes from the .h file.
Differential Revision: https://reviews.llvm.org/D124298
Bubble up extract_slice above Linalg operation.
A sequence of operations
%0 = linalg.<op> ... arg0, arg1, ...
%1 = tensor.extract_slice %0 ...
can be replaced with
%0 = tensor.extract_slice %arg0
%1 = tensor.extract_slice %arg1
%2 = linalg.<op> ... %0, %1, ...
This results in the reduce computation of the linalg operation.
The implementation uses the tiling utility functions. One difference
from the tiling process is that we don't need to insert the checking
code for the out-of-bound accesses. The use of the slice itself
represents that the code writer is sure about the boundary condition.
To avoid adding the boundary condtion check code, `omitPartialTileCheck`
is introduced for the tiling utility functions.
Differential Revision: https://reviews.llvm.org/D122437
This has been on _Both for a couple of weeks. Flip usages in core with
intention to flip flag to _Prefixed in follow up. Needed to add a couple
of helper methods in AffineOps and Linalg to facilitate a pure flag flip
in follow up as some of these classes are used in templates and so
sensitive to Vector dialect changes.
Differential Revision: https://reviews.llvm.org/D122151
This provides a way to create an operation without manipulating
OperationState directly. This is useful for creating unregistered ops.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D120787
The Func has a large number of legacy dependencies carried over from the old
Standard dialect, which was pervasive and contained a large number of varied
operations. With the split of the standard dialect and its demise, a lot of lingering
dead dependencies have survived to the Func dialect. This commit removes a
large majority of then, greatly reducing the dependence surface area of the
Func dialect.