Treat the integer range of a vector type as the union of the ranges of
its individual elements. With this semantics, most arith ops on vectors
work out of the box; the only special handling needed is for constants
and vector element manipulation ops.
The end goal of these changes is to be able to optimize vectorized index
calculations.
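For example (an illustrative sketch of the stated semantics, not code
from the patch), the inferred range of a vector constant is the union of
its per-element ranges:
```mlir
// Under the union semantics, the inferred integer range of %cst is
// [2, 7]: the union of the single-element ranges [2, 2], [5, 5], [7, 7].
%cst = arith.constant dense<[2, 5, 7]> : vector<3xi32>
```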
Currently, the lowering for vector.step lives
in a folder. This is not ideal if we want
to transform the op and defer the
materialization of the constants until much later.
This commit adds a rewrite pattern that
can be applied via the
`transform.structured.vectorize_children_and_apply_patterns`
transform dialect operation.
Moreover, the rewrite pattern for vector.step
is now also used in the -convert-vector-to-llvm
pass, where it handles scalable and non-scalable
types as LLVM expects them.
As a consequence of removing the vector.step
lowering from its folder, linalg vectorization
will keep vector.step intact.
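For illustration, a sketch of what the rewrite
pattern materializes for a fixed-width vector
(this now happens when the pattern is applied,
rather than eagerly in a folder):
```mlir
%0 = vector.step : vector<4xindex>
// is rewritten to:
%0 = arith.constant dense<[0, 1, 2, 3]> : vector<4xindex>
```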
The current vector.insert folder tries to replace a scalar with a 0-rank
vector. This patch fixes the resulting crash by not folding unless the
types of the result and the replacement are the same.
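A minimal sketch of the previously crashing case (values and names made
up for illustration):
```mlir
// Folding across the f32 / vector<f32> boundary would substitute a
// value of one type for the other; the types must match for the fold.
%0 = vector.insert %s, %dest [] : f32 into vector<f32>
```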
This is a reasonable canonicalization because `extract` is more
constrained than `extract_strided_slice`, so there is no loss of
semantics here, just lifting an op to a more constrained special-case
op. And the additional `shape_cast` merely adds leading unit dims
to match the original result type.
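A sketch of the kind of rewrite this enables (example made up for
illustration):
```mlir
// A strided slice of size 1 along the leading dim ...
%0 = vector.extract_strided_slice %v
    {offsets = [1, 0], sizes = [1, 4], strides = [1, 1]}
    : vector<2x4xf32> to vector<1x4xf32>
// ... can be lifted to an extract plus a shape_cast that re-adds the
// leading unit dim:
%e = vector.extract %v[1] : vector<4xf32> from vector<2x4xf32>
%0 = vector.shape_cast %e : vector<4xf32> to vector<1x4xf32>
```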
Context: discussion on #111541. I wasn't sure how this would turn out,
but in the process of writing this PR, I discovered at least 2 bugs in
the pattern introduced in #111541, which shows the value of shared
canonicalization patterns that are exercised on a large number of
test cases.
---------
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.
This PR updates the semantics of the `in_bounds` attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:
```mlir
%read = vector.transfer_read %A[%base1, %base2], %pad
    {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
    : memref<?x?xf32>, vector<9xf32>

vector.transfer_write %vec, %A[%base1, %base2]
    {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
    : vector<9xf32>, memref<?x?xf32>
```
Note that the value `false` merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to `true` if all non-broadcast dims
are marked as "in bounds".
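A sketch of the folder's effect (illustrative types and values):
```mlir
// Dim 0 of the result is a broadcast dim; dim 1 is in bounds.
%read = vector.transfer_read %A[%base1, %base2], %pad
    {in_bounds = [false, true],
     permutation_map = affine_map<(d0, d1) -> (0, d1)>}
    : memref<?x?xf32>, vector<9x4xf32>
// All non-broadcast dims are in bounds, so the folder updates the
// attribute to in_bounds = [true, true].
```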
Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making `in_bounds` mandatory and to work around the extra requirements
for broadcast dims (those requirements were removed in this PR). All
changes in tests are also reverts of changes from #97049.
For context, here's a PR in which "broadcast" dims were forced to
always be "in-bounds":
* https://reviews.llvm.org/D102566
(*) See `foldTransferInBoundsAttribute`.
In cases where llvm.mlir.constant has an attribute with a different type
than the returned type, the folder used to create an incorrect
DenseElementsAttr and crash.
Resolves #74236.
This adds a new transform `eliminateVectorMasks()` which aims at
removing scalable `vector.create_mask` ops that will be all-true at
runtime. It attempts to do this by simply pattern-matching the mask
operands (similar to some canonicalizations); if that does not lead to
an answer (is it all-true? yes/no), then value bounds analysis will be
used to find the lower bound of the unknown operands. If the lower bound
is >= the corresponding mask vector type dim, then that dimension of the
mask is all-true.
Note that the pattern matching prevents expensive value-bounds analysis
in cases where the mask won't be all true.
For example:
```mlir
%mask = vector.create_mask %dynamicValue, %c2 : vector<8x4xi1>
```
From looking at `%c2` we can tell this is not going to be an all-true
mask, so we don't need to run the value-bounds analysis for
`%dynamicValue` (and can exit the transform early).
Note: Eliminating create_masks here means replacing them with all-true
constants (which will then lead to the masks folding away).
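For example, a sketch of a successful elimination (assuming the bounds
of %n can be established):
```mlir
// If value bounds analysis proves %n >= 4 * vscale, then:
%mask = vector.create_mask %n : vector<[4]xi1>
// can be replaced with an all-true constant (which then folds away):
%mask = arith.constant dense<true> : vector<[4]xi1>
```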
Adds tests with scalable vectors for the Vector-To-LLVM conversion pass.
Covers the following Ops:
* vector.bitcast
* vector.broadcast
Note, this has uncovered some missing logic in `BroadcastOpLowering`.
This PR fixes the most basic cases where the scalable flags were dropped
and the generated code was incorrect. Also, the conditions in
`vector::isBroadcastableTo` are relaxed to allow cases like this:
```mlir
%0 = vector.broadcast %arg0 : vector<1xf32> to vector<[4]xf32>
```
The `BroadcastOpLowering` pattern is effectively disabled for scalable
vectors in more complex cases where an SCF loop would be required to
loop over the scalable dims, e.g.:
```mlir
%0 = vector.broadcast %arg0 : vector<[4]x1x2xf32> to vector<[4]x3x2xf32>
```
These cases are marked as "Stretch not at start" in the code. In those
cases, support for scalable vectors is left as a TODO.
Clarifies the semantics of `vector.broadcast` in the context of scalable
vectors. In particular, broadcasting a unit scalable dim, `[1]`, is not
valid unless there's a match between the output and the input dims.
See the examples below for an illustration:
```mlir
// VALID
%0 = vector.broadcast %arg0 : vector<[1]xf32> to vector<4x[1]xf32>
// INVALID
%0 = vector.broadcast %arg0 : vector<[1]xf32> to vector<[4]xf32>
// VALID FIXED-WIDTH EQUIVALENT
%0 = vector.broadcast %arg0 : vector<1xf32> to vector<4xf32>
```
Documentation, the Op verifier and tests are updated accordingly.
Updates the verifier for `vector.shape_cast` so that incorrect cases
where "scalability" is dropped are immediately rejected. For example:
```mlir
vector.shape_cast %vec : vector<1x1x[4]xindex> to vector<4xindex>
```
Also, as a separate PR, I've prepared a fix for the Linalg vectorizer to
avoid generating such shape casts (*):
* https://github.com/llvm/llvm-project/pull/100325
(*) Note, that's just one specific case that I've identified so far.
At the moment, the in_bounds attribute has two confusing/contradicting
properties:
1. It is both optional _and_ has an effective default value.
2. The default value is "out-of-bounds" for non-broadcast dims, and
"in-bounds" for broadcast dims.
(see the `isDimInBounds` vector interface method for an example of this
"default" behaviour [1]).
This PR aims to clarify the logic surrounding the `in_bounds` attribute
by:
* making the attribute mandatory (i.e. it is always present),
* always setting the default value to "out of bounds" (that's
consistent with the current behaviour for the most common cases).
#### Broadcast dimensions in tests
As per [2], broadcast dimensions require the corresponding
`in_bounds` attribute to be `true`:
```
vector.transfer_read op requires broadcast dimensions to be in-bounds
```
The changes in this PR mean that we can no longer rely on the
default value in cases like the following (dim 0 is a broadcast dim):
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
Instead, the broadcast dimension has to be explicitly marked as "in
bounds":
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
All tests with broadcast dims are updated accordingly.
#### Changes in "SuperVectorize.cpp" and "Vectorization.cpp"
The following patterns in "Vectorization.cpp" are updated to explicitly
set the `in_bounds` attribute to `false`:
* `LinalgCopyVTRForwardingPattern` and `LinalgCopyVTWForwardingPattern`
Also, `vectorizeAffineLoad` (from "SuperVectorize.cpp") and
`vectorizeAsLinalgGeneric` (from "Vectorization.cpp") are updated to
make sure that xfer Ops created by these hooks mark the dimensions
corresponding to broadcast dims as "in bounds". Otherwise, the Op
verifier would complain.
Note that there is no mechanism to verify whether the corresponding
memory accesses are indeed in bounds. Still, this is consistent with the
current behaviour where the broadcast dim would be implicitly assumed
to be "in bounds".
[1]
4145ad2bac/mlir/include/mlir/Interfaces/VectorInterfaces.td (L243-L246)
[2]
https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop
The `InsertOpConstantFolder` assumed that whenever the destination can
be folded to a constant attribute, that attribute must be a
`DenseElementsAttr`. That is not necessarily the case.
This patch adds a new vector.step operation to the Vector dialect. It
produces a linear sequence of index values from 0 to N-1, where N is the
number of elements in the result vector, and can be used to create
vectors of indices.
It supports both fixed-width and scalable vectors. For fixed-width
vectors, the canonical representation is `arith.constant dense<[0, ..,
N-1]>`. A
scalable step cannot be represented as a constant and is lowered to the
`llvm.experimental.stepvector` intrinsic [1].
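For example:
```mlir
// Fixed-width: equivalent to arith.constant dense<[0, 1, 2, 3]>.
%0 = vector.step : vector<4xindex>
// Scalable: lowered to the llvm.experimental.stepvector intrinsic.
%1 = vector.step : vector<[4]xindex>
```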
This op enables scalable vectorization of linalg.index ops, see #96778.
It can also be used in the SparseVectorizer in place of the lower-level
stepvector intrinsic, see [2] (patch to follow).
[1] https://llvm.org/docs/LangRef.html#llvm-experimental-stepvector-intrinsic
[2] acf675b63f/mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp (L385-L388)
This patch adds more precise side effects to the current ops with memory
effects, allowing us to determine which OpOperand/OpResult/BlockArgument
the operation reads or writes, rather than just recording the reading
and writing of values. This allows for convenient use of precise side
effects to achieve analysis and optimization.
Related discussions:
https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243
This commit adds a new operation to the vector dialect:
`vector.from_elements`
The op constructs a new vector from a given list of scalar values. It is
similar to `tensor.from_elements`.
```mlir
%0 = vector.from_elements %a, %b, %c, %a, %a, %a : vector<2x3xf32>
```
Constructing a new vector from elements was tedious before this op
existed: a typical way was to define an `arith.constant ... :
vector<...>`, followed by a chain of `vector.insert`.
Folders/canonicalizations are added that can fold `vector.extract` ops
and convert the `vector.from_elements` op into a `vector.splat` op.
The LLVM lowering generates an `llvm.mlir.undef`, followed by a sequence
of scalar insertions in the form of `llvm.insertelement`. Only 0-D and
1-D vectors are currently supported in the LLVM lowering.
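For example, a sketch of the splat canonicalization mentioned above:
```mlir
// All elements are the same value ...
%0 = vector.from_elements %a, %a, %a : vector<3xf32>
// ... so this canonicalizes to:
%0 = vector.splat %a : vector<3xf32>
```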
There was an assertion in the folder that caused a crash when extracting
from a vector that is defined by an op with 0D semantics. This commit
removes the assertion and adds test cases to ensure that 0D scenarios
are handled correctly.
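A sketch of the kind of 0-D case now exercised by the tests
(illustrative):
```mlir
// Extracting from a 0-D vector defined by another 0-D op.
%v = vector.insert %s, %dest [] : f32 into vector<f32>
%e = vector.extract %v[] : f32 from vector<f32>
```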
This only handles one case (that's fairly common in practice*), storing
a masked constant splat, then reloading again with the same mask and a
padding value that matches the splat.
* For SVE/SME (without peeling) this occurs when you have a
`linalg.fill` preceding a `linalg.matmul`.
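A sketch of the folded pattern, assuming transfer-style ops with
matching mask and padding (types illustrative):
```mlir
%cst = arith.constant dense<0.0> : vector<[4]xf32>
%pad = arith.constant 0.0 : f32
vector.transfer_write %cst, %A[%i], %mask {in_bounds = [true]}
    : vector<[4]xf32>, memref<?xf32>
// Reloading with the same mask and a padding value matching the stored
// splat (0.0) folds %r away to %cst.
%r = vector.transfer_read %A[%i], %pad, %mask {in_bounds = [true]}
    : memref<?xf32>, vector<[4]xf32>
```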
Buffers are no longer deallocated by One-Shot Bufferize. This is now
done by a separate buffer deallocation pass.
Also fix a bug in the `vector.mask` folding, which was triggered by
`-buffer-deallocation-pipeline`, which runs the canonicalizer.
This PR adds promised interface declarations for all interfaces declared
in `InitAllDialects.h`.
Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
This reverts commit 5cdb8c0c88.
This pattern is producing incorrect IR. For example,
```mlir
func.func @extract_subvector_from_constant_mask() -> vector<16xi1> {
%mask = vector.constant_mask [2, 3] : vector<16x16xi1>
%extract = vector.extract %mask[8] : vector<16xi1> from vector<16x16xi1>
return %extract : vector<16xi1>
}
```
Canonicalizes to
```mlir
func.func @extract_subvector_from_constant_mask() -> vector<16xi1> {
%0 = vector.constant_mask [3] : vector<16xi1>
return %0 : vector<16xi1>
}
```
Where it should be a zero mask because the extraction index (8) is
greater than the constant mask size along that dim (2).
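That is, the expected canonicalization would be an all-false mask
(sketch):
```mlir
%0 = vector.constant_mask [0] : vector<16xi1>
```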
Extends `vector.insert_strided_slice` and `vector.extract_strided_slice`
to allow scalable input and output vectors. For scalable sizes, the
corresponding slice size has to match the corresponding dimension in the
output/input vector (insert/extract, respectively).
This is supported:
```mlir
vector.extract_strided_slice %1 {
offsets = [0, 3, 0],
sizes = [1, 1, 4],
strides = [1, 1, 1] } : vector<1x4x[4]xi32> to vector<1x1x[4]xi32>
```
This is not supported:
```mlir
vector.extract_strided_slice %1 {
offsets = [0, 3, 0],
sizes = [1, 1, 2],
strides = [1, 1, 1] } : vector<1x4x[4]xi32> to vector<1x1x[2]xi32>
```
1. Updates and clarifies a few comments related to hooks for
vector.{insert|extract}_strided_slice.
2. For consistency with vector.insert_strided_slice, removes a TODO from
vector.extract_strided_slice Op def. It's self-explanatory that
adding support for non-unit strides is a "TODO".
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
This commit adds extra assertions to `OperationFolder` and `OpBuilder`
to ensure that the types of the folded SSA values match with the result
types of the op. There used to be checks that discard the folded results
if the types do not match. This commit makes these checks stricter and
turns them into assertions.
Discarding folded results with the wrong type (without failing
explicitly) can hide bugs in op folders. Two such bugs became apparent
in MLIR (and some more in downstream projects) and are fixed with this
change.
Note: The existing type checks were introduced in
https://reviews.llvm.org/D95991.
Migration guide: If you see failing assertions (`folder produced value
of incorrect type`; make sure to run with assertions enabled!), run with
`-debug` or dump the operation right before the failing assertion. This
will point you to the op that has the broken folder. A common mistake is
a mismatch between static/dynamic dimensions (e.g., input has a static
dimension but folded result has a dynamic dimension).
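A hypothetical illustration of that mismatch (op name and types made
up):
```mlir
// Declared result type is static ...
%0 = "mydialect.identity"(%arg) : (tensor<?xf32>) -> tensor<4xf32>
// ... but a buggy folder that returns %arg produces tensor<?xf32>,
// which now trips "folder produced value of incorrect type".
```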
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.
Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.
Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.
Issue: https://github.com/llvm/llvm-project/issues/72354
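For instance, a sketch of the rename using the `vector.reduction`
syntax:
```mlir
// Before: unclear whether <maxf> meant maxnumf or maximumf semantics.
%0 = vector.reduction <maxf>, %v : vector<4xf32> into f32
// After: the kind maps 1:1 to the corresponding arith op.
%0 = vector.reduction <maxnumf>, %v : vector<4xf32> into f32
```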
Without this patch, MLIR crashes with
```
Assertion failed: (getNumDims() == map.getNumResults() && "Number of results mismatch"), function compose, file AffineMap.cpp, line 537.
```
during parsing.
This reverts commit f42b7615b8.
The fold pattern is incorrect, because it does not even look at the
permutation of non-unit dims and is happy to replace a pattern such as
```
%22 = vector.shape_cast %21 : vector<1x256x256xf32> to vector<256x256xf32>
%23 = vector.transpose %22, [1, 0] : vector<256x256xf32> to vector<256x256xf32>
```
with
```
%22 = vector.shape_cast %21 : vector<1x256x256xf32> to vector<256x256xf32>
```
which is obviously incorrect.
This folds transpose(shape_cast) into a new shape_cast, when the
transpose just permutes a unit dim from the result of the shape_cast.
Example:
```
%0 = vector.shape_cast %vec : vector<[4]xf32> to vector<[4]x1xf32>
%1 = vector.transpose %0, [1, 0] : vector<[4]x1xf32> to vector<1x[4]xf32>
```
Folds to:
```
%0 = vector.shape_cast %vec : vector<[4]xf32> to vector<1x[4]xf32>
```
This is an (alternate) fix for lowering matmuls to ArmSME.