Made the createReadOrMaskedRead and isValidMaskedInputVector utility
functions - to be accessible outside of the CU. Needed by the IREE new
TopK implementation.
…se of tensor pack
When the vector sizes are not passed as inputs to the vector transform
operation, the vector sizes are queried from the static result shape in
the case of tensor.pack op.
This patch adds support for masked vectorisation of depthwise 1D WC
convolutions,`linalg.depthwise_conv_1d_nwc_wc`. This is implemented by
adding support for masking.
Two major assumptions are made:
* only the channel dimension can be dynamic/scalable (i.e. the
trailing dim),
* when specifying vector sizes to use in the vectoriser, only the size
corresponding to the channel dim is effectively used (other dims are
inferred from the context).
In terms of scalable vectorisation, this should be sufficient to cover
all practical cases (i.e. making arbitrary dim scalable wouldn't make
much sense). As for more generic cases with dynamic shapes (e.g. W or N
dims being dynamic), more work would be needed. In particular, one would
have to consider the filter and input/output tensors separately.
The reverse op is treated as a VectorMemoryAccessKind::Contiguous load.
It is contiguous slice, but we'll need to compute indices differently
and apply a reverse at vector level. It takes non-trivial efforts for
the approach. The revision flips the case to use vector.gather.
Otherwise there are functionality issues. E.g., the below example loaded
`2, 3, 4` (which is a bug), but what we want is `2, 1, 0`.
Before vectorization:
```mlir
func.func @vectorize_reverse_like_tensor_extract(%arg0: tensor<1x2x3xf32>, %arg1: tensor<1x1x3xf32>, %arg2: index) -> tensor<1x1x3xf32> {
%c1 = arith.constant 1 : index
%c0 = arith.constant 0 : index
%c2 = arith.constant 2 : index
%0 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel", "parallel", "parallel"]} outs(%arg1 : tensor<1x1x3xf32>) {
^bb0(%out: f32):
%1 = linalg.index 1 : index
%2 = linalg.index 0 : index
%3 = affine.apply #map1(%1, %2, %arg2)
%4 = linalg.index 2 : index
%5 = arith.subi %c2, %4 : index
%extracted = tensor.extract %arg0[%c0, %3, %5] : tensor<1x2x3xf32>
linalg.yield %extracted : f32
} -> tensor<1x1x3xf32>
return %0 : tensor<1x1x3xf32>
}
```
Partial IR after vectorization:
```
%5 = vector.constant_mask [1, 1, 3] : vector<1x1x4xi1>
%6 = vector.broadcast %arg0 : index to vector<1x1x4xindex>
%7 = vector.shape_cast %6 : vector<1x1x4xindex> to vector<4xindex>
%8 = vector.extractelement %7[%c0_i32 : i32] : vector<4xindex>
%9 = vector.transfer_read %3[%c0, %8, %c2], %cst, %5 {in_bounds = [true, true, true]} : tensor<1x2x3xf32>, vector<1x1x4xf32>
```
Added support to vectorized tensor.unpack. The unpack Op is split into a
`vector.transfer_read`, `vector.transpose`, `vector.shape_cast` and a
`vector.transfer_write`.
This PR adds a direct vectorization lowering of `tensor.pack` into
`mask(vector.transfer_read)`->`vector.shape_cast`->`vector.transpose`->`vector.transfer_write`.
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
Rename interface functions as follows:
* `hasTensorSemantics` -> `hasPureTensorSemantics`
* `hasBufferSemantics` -> `hasPureBufferSemantics`
These two functions return "true" if the op has tensor/buffer operands
but not buffer/tensor operands.
Also drop the "ranked" part from the interface, i.e., do not distinguish
between ranked/unranked types.
The new function names describe the functions more accurately. They also
align their semantics with the notion of "tensor semantics" with the
bufferization framework. (An op is supposed to be bufferized if it has
tensor operands, and we don't care if it also has memref operands.)
This change is in preparation of #75273, which adds
`BufferizableOpInterface::hasTensorSemantics`. By renaming the functions
in the `DestinationStyleOpInterface`, we can avoid name clashes between
the two interfaces.
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.
Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.
Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.
Issue: https://github.com/llvm/llvm-project/issues/72354
Updates the vectorisation of 1D depthwise convolution when flattening
the channel dimension (introduced in #71918). In particular - how the
convolution filter is "flattened". ATM, the vectoriser will use
`vector.shape_cast`:
```mlir
%b_filter = vector.broadcast %filter : vector<4xf32> to vector<3x2x4xf32>
%sc_filter = vector.shape_cast %b_filter : vector<3x2x4xf32> to vector<3x8xf32>
```
This lowering is not ideal - `vector.shape_cast` can be convenient when
it's folded away, but that's not happening in this case. Instead, this
patch updates the vectoriser to use `vector.shuffle` (the overall result
is identical):
```mlir
%sh_filter = vector.shuffle %filter, %filter
[0, 1, 2, 3, 0, 1, 2, 3] : vector<4xf32>, vector<4xf32>
%b_filter = vector.broadcast %sh_filter : vector<8xf32> to vector<3x8xf32>
```
The current vectorization of 1D depthwise convolutions in Linalg is
_sub-optimal_ for tensor with a low number of channel dimensions, e.g.:
```mlir
linalg.depthwise_conv_1d_nwc_wc
{dilations = dense<1> : vector<1xi64>,
strides = dense<1> : vector<1xi64>}
ins(%input, %filter : tensor<1x8x3xi8>, tensor<1x3xi8>)
outs(%output : tensor<1x8x3xi8>) -> tensor<1x8x3xi8>
```
That's due to the fact that ultimately (i.e. at LLVM level),
vectorization happens along the trailing dimension (i.e. the channel
dimension). In this case it leads to vectors with 3 elements (or worse,
if there's e.g. only 1 channel dimension). For comparison, a 128 bit
wide vector registers can hold 16 x i8.
Instead, this patch adds an option to flatten/collapse the channel
dimension into the width dimension of the input/filter/output using
`vector.shape_cast` operation:
```mlir
%sc_input = vector.shape_cast %input : vector<1x8x3xi8> to vector<1x24xi8>
%sc_output = vector.shape_cast %output : vector<1x8x3xi8> to vector<1x24xi8>
%b_filter = vector.broadcast %filter : vector<3xi8> to vector<1x8x3xi8>
%sc_filter = vector.shape_cast %b_filter : vector<1x8x3xi8> to vector<1x24xi8>
```
This new vectorization mode is implemented in `depthwiseConv` by
inserting `vector.shape_cast` Ops before and after
`depthwiseConv1dSliceAsMulAcc` is invoked. It can be selected through
e.g. a transform dialect attribute:
```mlir
transform.structured.vectorize_children_and_apply_patterns %conv {flatten_1d_depthwise_conv}
```
A forthcoming patch will implement a strategy to automatically switch
between the two implementations, depending on the shape of the input
tensors.
Co-authored by: Bradley Smith <bradley.smith@arm.com>
* "init" operands are specified with `MutableOperandRange` (which gives
access to the underlying `OpOperand *`). No more magic numbers.
* Remove most interface methods and make them helper functions. Only
`getInitsMutable` should be implemented.
* Provide separate helper functions for accessing mutable/immutable
operands (`OpOperand`/`Value`, in line with #66515): `getInitsMutable`
and `getInits` (same naming convention as auto-generated op accessors).
`getInputOperands` was not renamed because this function cannot return a
`MutableOperandRange` (because the operands are not necessarily
consecutive). `OpOperandVector` is no longer needed.
* The new `getDpsInits`/`getDpsInitsMutable` is more efficient than the
old `getDpsInitOperands` because no `SmallVector` is created. The new
functions return a range of operands.
* Fix a bug in `getDpsInputOperands`: out-of-bounds operands were
potentially returned.
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.
This patch simply removes one of the pre-conditions of scalable
vectorisation. Namely, that only the trailing vector dimension can be
scalable, e.g.
vector<2x4x[8]xi32>
This limitation can be lifted following the recent work to support
scalable vectorisation in Linalg, most notably:
* https://reviews.llvm.org/D154336
* https://reviews.llvm.org/D153372
A test is added that demonstrates scalable vectorisation of
`linalg.matmul`. This is simply a copy of a similar test for fixed-width
vectorisation. The latter is updated - check lines are simplified and
annotated with regex patterns. This in the spirit of [1]:
Tests should be minimal, and only check what is absolutely necessary.
Note that this change is just one step towards scalable vectorisation in
Linalg. It will allow us to exercise new code paths in the context of
scalable vectors in Linalg and hence make further progress in the
forthcoming patches.
[1] https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices
Differential Revision: https://reviews.llvm.org/D158423
The Linalg vectoriser incorrectly recognises the following
`tensor.extract` as contiguous:
```
func.func @example(%in: tensor<123x321xf32>, %arg1: tensor<1x?x8xf32>) -> tensor<1x?x8xf32> {
%c0 = arith.constant 1 : index
%2 = linalg.generic {
indexing_maps = [#map1],
iterator_types = ["parallel", "parallel", "parallel"]
} outs(%arg1 : tensor<1x?x8xf32>)
{
^bb0(%arg3: f32):
%idx_0 = linalg.index 0 : index
%idx_1 = linalg.index 1 : index
%idx = arith.addi %idx_0, %idx_1 : index
%7 = tensor.extract %in[%c0, %idx] : tensor<123x321xf32>
linalg.yield %7 : f32
} -> tensor<1x?x8xf32>
return %2 : tensor<1x?x8xf32>
}
```
However, the following index Op corresponds to the dynamic dimension
in the iteration space:
```
%idx_1 = linalg.index 1 : index
```
The vectoriser should assume that:
* this index Op _is not_ loop invariant,
* the resulting memory access is a gather load
This is what this patch fixes.
Differential Revision: https://reviews.llvm.org/D155373
At the moment, only the trailing dimensions in the vector type can be
scalable, i.e. this is supported:
vector<2x[4]xf32>
and this is not allowed:
vector<[2]x4xf32>
This patch extends the vector type so that arbitrary dimensions can be
scalable. To this end, an array of bool values is added to every vector
type to denote whether the corresponding dimensions are scalable or not.
For example, for this vector:
vector<[2]x[3]x4xf32>
the following array would be created:
{true, true, false}.
Additionally, the current syntax:
vector<[2x3]x4xf32>
is replaced with:
vector<[2]x[3]x4xf32>
This is primarily to simplify parsing (this way, the parser can easily
process one dimension at a time rather than e.g. tracking whether
"scalable block" has been entered/left).
NOTE: The `isScalableDim` parameter of `VectorType` (introduced in this
patch) makes `numScalableDims` redundant. For the time being,
`numScalableDims` is preserved to facilitate the transition between the
two parameters. `numScalableDims` will be removed in one of the
subsequent patches.
This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
Differential Revision: https://reviews.llvm.org/D153372
This patch fixes a bug in the way we compute the vector type for vector
transfer writes when the value to store needs to be transposed.
Reviewed By: nicolasvasilache, mravishankar
Differential Revision: https://reviews.llvm.org/D153687
It looks like scalable vector support broke vectorization for 0-D
tensors and we didn't have any test coverting that case. This patch
provides a fix and a test.
Differential Revision: https://reviews.llvm.org/D153181
We always assume mixed same type values. Instead of ExtF or ExtSI, we
need SIToFp when the values must be promoted.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D152982
For now, only elementwise operations are supported. Operations that perform any
kind of data permutation require changes in the representation of scalable
dimensions in VectorType.
Differential Revision: https://reviews.llvm.org/D152599
This patch extends the Linalg vectoriser so that scalar loads are
correctly identified as scalar rather than gather loads. Below is an
example of a scalar load (note that both indices are loop invariant):
```
func.func @example(%arg0: tensor<80x16xf32>, %arg2: tensor<1x4xf32>) -> tensor<1x4xf32> {
%c8 = arith.constant 8 : index
%c16 = arith.constant 16 : index
%1 = linalg.generic {
indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
iterator_types = ["parallel", "parallel"]
} outs(%arg2 : tensor<1x4xf32>) {
^bb0(%out: f32):
%2 = linalg.index 0 : index
%extracted = tensor.extract %arg0[%2, %c16] : tensor<80x16xf32>
linalg.yield %extracted : f32
} -> tensor<1x4xf32>
return %1 : tensor<1x4xf32>
}
```
This patch also makes sure that these scalar loads are indeed lowered to
a scalar load followed by a broadcast:
```
%extracted = tensor.extract %arg0[%1, %c16] : tensor<80x16xf32>
%2 = vector.broadcast %extracted : f32 to vector<1x4xf32>
```
Differential Revision: https://reviews.llvm.org/D149678
This patch enables specifying scalable vector sizes when using the
Transform dialect to drive vectorisation, e.g.:
```
transform.structured.masked_vectorize %0 vector_sizes [8, 16, [4]]
```
This is implemented by extending the MaskedVectorizeOp with a dedicated
attribute for "scalability" and by overloading `parseDynamicIndexList`
so that MaskedVectorizeOp can continue using the auto-generated parser
and printer.
At the moment, only the trailing vec size can be scalable. The following
is not yet supported:
```
transform.structured.masked_vectorize %0 vector_sizes [8, [16], [4]]
```
As the vectoriser does not support scalable vectorisation just yet, a
warning is issues when scalable vector sizes are used. You can also use
the debug output, `--debug-only=linalg-vectorization`, to check whether
scalable vectorisation has been switched on.
This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
Similar patch for tiling: https://reviews.llvm.org/D150944
Differential Revision: https://reviews.llvm.org/D151892
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D151542
If the input vector sizes are as same as tensor.pad result shape, the
masking is not needed. Otherwise, the masking is needed and the masking
operands should be as same as tensor.empty op.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D151391
It breaks the logic of maskedVectorize (on tensor.pad ops) into
precondition checks and vectorization implementation; unifies the
interface.
The revision also rename`s vectorizeLinalgOpPrecondition` to
`vectorizeOpPrecondition` because we can vectorize ops other
than LinalgOps.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D150495
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
It only accepted zeros attributes before the revision. It now acccepts
constant-like zero values as well.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D149474
This patch enables the vectorization of contraction ops using vector
masking. Support for vectorizing contractions is already there so this
is just adding contraction ops to the list of supported ops in
`vectorizeDynamicLinalgOpPrecondition` and adding a test.
Reviewed By: hanchung, awarzynski
Differential Revision: https://reviews.llvm.org/D148865
This patch updates the vectorisation of the extract Op so that the
permutation map for the transfer_read Op is defined explicitly by the
vectoriser (as opposed to being constructed implicitly by the
transfer_read builder).
This change is needed for cases where the rank of the source tensor is
lower than the rank of the output vector generated by the vectoriser:
```mlir
%17 = vector.transfer_read %arg1[%14, %16], %cst_4 {in_bounds = [true, true]} : tensor<257x24xf32>, vector<1x1x4xf32>
```
In cases like this, the vectorize will create the following permutation map:
```
(d0, d1) -> (0, d0, d1)
```
In other cases the behaviour remains unchanged.
Fixes https://github.com/openxla/iree/issues/13036. That's also where
the test case was extracted from.
Differential Revision: https://reviews.llvm.org/D148537
We are already doing this for depthwise convolution and pooling.
This helps to preserve the promotion semantics from Linalg op
definitions to lower layers.
Along the way, fixed the type mismatch issue in the existing
`promote` implementation.
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D148471