This PR moves `extractInsertFoldConstantOp` earlier in the folder lists
of `vector.extract` and `vector.insert`. Many folders require
non-dynamic indices, so running `extractInsertFoldConstantOp` first is a
prerequisite for them to trigger.
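For illustration, a hypothetical before/after sketch of what this folder
does (names are illustrative):
```mlir
// Before: the index is dynamic, which blocks most extract/insert folders.
%c1 = arith.constant 1 : index
%0 = vector.extract %v[%c1] : f32 from vector<4xf32>
// After extractInsertFoldConstantOp: the index is static, so the
// remaining folders can now trigger.
%0 = vector.extract %v[1] : f32 from vector<4xf32>
```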
This PR improves the `vector.mask` verifier to make sure it's not
applying masking semantics to operations defined outside of the
`vector.mask` region. The documentation already stated this, and has
been updated to emphasize it and make it clearer.
As part of this change, the logic that ensures that a terminator is
present in the mask region has been simplified, to make it less
surprising to the user when a `vector.yield` is explicitly provided in
the IR.
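For reference, a minimal sketch of the expected usage, where the masked
operation is defined inside the region (names and shapes are
illustrative):
```mlir
%0 = vector.mask %m { vector.transfer_read %base[%i], %pad : memref<?xf32>, vector<16xf32> } : vector<16xi1> -> vector<16xf32>
```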
This patch contains two minor changes, which I believe were the original
author's intent.
* when folding `transpose(broadcast(x))`, emit `broadcast(x)` instead of
`broadcast(broadcast(x))`. The latter causes transient verifier
failures with `mlir-opt --debug`, e.g.
```
mlir-asm-printer: 'func.func' failed to verify and will be printed in generic form
"func.func"() <{function_type = (vector<4x1x1x7xi8>) -> vector<3x2x4x5x6x7xi8>, sym_name = "broadcast_transpose_mixed_example"}> ({
^bb0(%arg0: vector<4x1x1x7xi8>):
%0 = "vector.broadcast"(%arg0) : (vector<4x1x1x7xi8>) -> vector<2x3x4x5x6x7xi8>
%1 = "vector.broadcast"(%0) : (vector<2x3x4x5x6x7xi8>) -> vector<3x2x4x5x6x7xi8>
"func.return"(%1) : (vector<3x2x4x5x6x7xi8>) -> ()
}) : () -> ()
```
* when checking permutation groups, the variable `low` was set just once
to zero, making the check quadratic. It looks like the intent was for
`low` to track the beginning of each dimension group. (Nevertheless, the
check was correct.)
Fold transposes with unit dimensions. Seen in the wild:
```
%0 = vector.transpose %arg, [0, 2, 1, 3] : vector<6x1x1x4xi8> to vector<6x1x1x4xi8>
```
This transpose can be folded because (1) it preserves the shape and (2)
the shuffled dims are of unit extent.
Also addresses comment about static vs anonymous namespace:
https://github.com/llvm/llvm-project/pull/135841#discussion_r2071869067
---------
Signed-off-by: James Newling <james.newling@gmail.com>
[mlir][vector] Standardize base Naming Across Vector Ops (NFC)
This change standardizes the naming convention for the argument
representing the value to read from or write to in Vector ops that
interface with Tensors or MemRefs. Specifically, it ensures that all
such ops use the name `base` (i.e., the base address or location to
which offsets are applied).
Updated operations:
* `vector.transfer_read`,
* `vector.transfer_write`.
For reference, these ops already use `base`:
* `vector.load`, `vector.store`, `vector.scatter`, `vector.gather`,
`vector.expandload`, `vector.compressstore`, `vector.maskedstore`,
`vector.maskedload`.
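For illustration, a minimal sketch of where `base` appears in a
`vector.transfer_read` (names and shapes are illustrative):
```mlir
// %src is the base; the indices %i and %j are the offsets applied to it.
%v = vector.transfer_read %src[%i, %j], %pad {in_bounds = [true]} : memref<?x?xf32>, vector<8xf32>
```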
This is a non-functional change (NFC) and does not alter the semantics of these
operations. However, it does require users of the XFer ops to switch from
`op.getSource()` to `op.getBase()`.
To ease the transition, this PR temporarily adds a `getSource()` interface
method for compatibility. This is intended for downstream use only and should
not be relied on upstream. The method will be removed prior to the LLVM 21
release.
Implements #131602
Handles the special case where a transpose doesn't permute any non-unit
dimensions (and so is effectively a shape_cast) and is adjacent to a
shape_cast that it can fold into. For example
```
%1 = vector.transpose %0, [1, 0, 3, 2] : vector<4x1x1x6xf32> to vector<1x4x6x1xf32>
```
can be folded into an adjacent shape_cast. An alternative to this PR
would be to canonicalize such transposes to shape_casts directly, but I
think it would be difficult to get consensus that shape_cast is 'more
canonical' than transpose, so this PR compromises with the less
opinionated claims that
1) shape_cast is more canonical than shape_cast(transpose)
2) shape_cast is more canonical than transpose(shape_cast)
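For illustration, a sketch of the resulting fold, reusing the shapes
from the example above:
```mlir
// Before: the transpose only moves unit dims, so it is effectively a shape_cast.
%1 = vector.transpose %0, [1, 0, 3, 2] : vector<4x1x1x6xf32> to vector<1x4x6x1xf32>
%2 = vector.shape_cast %1 : vector<1x4x6x1xf32> to vector<4x6xf32>
// After: both ops collapse into a single shape_cast.
%2 = vector.shape_cast %0 : vector<4x1x1x6xf32> to vector<4x6xf32>
```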
The pattern `ConvertIllegalShapeCastOpsToTransposes`, which is specific
to transposes with scalable dimensions, reverses the canonicalization
added here, so I've disabled this canonicalization for scalable vectors.
`vector.shape_cast` was initially designed to be the union of
collapse_shape and expand_shape. There was an inconsistency in the
verifier that allowed any shape casts when the rank did not change, which
led to a strange middle ground where you could cast from shape (4,3) to
(3,4) but not from (4,3) to (2,3,2). That issue was fixed (verifier made stricter)
in https://github.com/llvm/llvm-project/pull/135855, but further feedback
there (and polling) suggests that vector.shape_cast should instead allow
all shape casts (so more like tensor.reshape than
tensor.collapse_shape/tensor.expand_shape). This PR makes that
simplification by relaxing the verifier.
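For example, a cast like the following (previously rejected, since it is
neither a pure expand nor a pure collapse) is now valid:
```mlir
%0 = vector.shape_cast %v : vector<4x3xf32> to vector<2x3x2xf32>
```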
This is a small follow-on for #133721:
* Renamed `getRealVectorRank` to `getEffectiveVectorRankForXferOp` (to
emphasise that this method was written specifically for transfer ops).
* Marginally tweaked the description for
`getEffectiveVectorRankForXferOp` (mostly to highlight the two edge
cases being covered).
* Added tests for cases when the element type (of the shaped type) is a
vector.
* Unified the naming (and the order) of arguments in tests with the
surrounding tests (e.g. `%vec_to_write` -> `%arg1`). Mostly for
consistency (it would be good to use self-documenting names like
`%vec_to_write` throughout).
This is a minor follow-up to #135498. It ensures that operations like
the following are not treated as out-of-bounds accesses and can be
folded correctly (*):
```mlir
%c_neg_1 = arith.constant -1 : index
%0 = vector.insert %value_to_store, %dest[%c_neg_1] : vector<5xf32> into vector<4x5xf32>
%1 = vector.extract %src[%c_neg_1, 0] : f32 from vector<4x5xf32>
```
In addition to adding tests for the case above, this PR also relocates
the tests from #135498 to be alongside existing tests for the
`vector.{insert|extract}` folder, and reformats them to follow:
* https://mlir.llvm.org/getting_started/TestingGuide/
For example:
* The "no_fold" prefix is now used to label negative tests.
* Redundant check lines have been removed (e.g., `CHECK: vector.insert`
is sufficient to verify that folding did not occur).
(*) As per https://mlir.llvm.org/docs/Dialects/Vector/#vectorinsert-vectorinsertop,
these are poison values.
Out-of-bounds position values should not be folded in vector.extract and
vector.insert operations, as only in-bounds constants and -1 are valid.
Fixes #134516
In addition to the new folder, I've also added a test for
broadcast(splat) -> splat, which I think was missing.
Signed-off-by: James Newling <james.newling@gmail.com>
Example seen in the 'real world':
```
%0 = vector.broadcast %arg0 : vector<1xi8> to vector<1x8xi8>
%1 = vector.transpose %0, [1, 0] : vector<1x8xi8> to vector<8x1xi8>
```
This PR adds a canonicalizer that rewrites the above as
```
%1 = vector.broadcast %arg0 : vector<1xi8> to vector<8x1xi8>
```
It works by determining whether a transpose only shuffles contiguous
broadcast dimensions.
This patch fixes an issue in the FoldContiguousGather pattern, which was
incorrectly folding vector.gather operations with contiguous indices
into vector.maskedload operations regardless of the base operand's type.
While vector.gather operations can work on both tensor and memref types,
vector.maskedload operations are only valid for memref types. The
pattern was incorrectly lowering a tensor-based gather into a
masked-load, which is invalid.
This fix adds a type check to ensure the pattern only applies to
memref-based gather operations.
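For illustration, a sketch of the fold that remains valid for
memref-based gathers (names and shapes are illustrative):
```mlir
// Contiguous indices [0, 1, ..., 15] over a memref base: foldable.
%0 = vector.gather %base[%c0][%indices], %mask, %pass_thru : memref<16xf32>, vector<16xi32>, vector<16xi1>, vector<16xf32> into vector<16xf32>
// Folds to:
%0 = vector.maskedload %base[%c0], %mask, %pass_thru : memref<16xf32>, vector<16xi1>, vector<16xf32> into vector<16xf32>
```
The same gather with a tensor base must be left untouched, since
vector.maskedload only accepts memrefs.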
Co-authored-by: Sagar Kulkarni <sagar@rain.ai>
This change refines the verifier for `vector.load` and `vector.store` to
disallow the use of vectors with higher rank than the source or
destination memref. For example, the following is now rejected:
```mlir
%0 = vector.load %src[%c0] : memref<?xi8>, vector<16x16xi8>
vector.store %vec, %dest[%c0] : memref<?xi8>, vector<16x16xi8>
```
This pattern was previously used in SME end-to-end tests and "happened"
to work by implicitly assuming row-major memory layout. However, there
is no guarantee that such an assumption will always hold, and we should
avoid relying on it unless it can be enforced deterministically.
Notably, production ArmSME lowering pipelines do not rely on this
behavior. Instead, the expected usage (illustrated here with scalable
vector syntax) would be:
```mlir
%0 = vector.load %src[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
```
This PR updates the verifier accordingly and adjusts all affected tests.
These tests are either removed (if no longer relevant) or updated to use
memrefs with appropriately matching rank.
Add additional cases of this canonicalization by using the
'source-of-truth' function `isBroadcastableTo` to determine when it is
possible to broadcast directly to the shape resulting from the
shape_cast.
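For illustration, a sketch with hypothetical shapes:
```mlir
// Before:
%0 = vector.broadcast %arg : vector<4xf32> to vector<1x4xf32>
%1 = vector.shape_cast %0 : vector<1x4xf32> to vector<1x1x4xf32>
// After: isBroadcastableTo confirms %arg can be broadcast directly.
%1 = vector.broadcast %arg : vector<4xf32> to vector<1x1x4xf32>
```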
---------
Signed-off-by: James Newling <james.newling@gmail.com>
Based on the ShapeCastConstantFolder, this pattern replaces
```mlir
%0 = ub.poison : vector<2x3xf32>
%1 = vector.shape_cast %0 : vector<2x3xf32> to vector<6xf32>
```
with
```mlir
%1 = ub.poison : vector<6xf32>
```
---------
Signed-off-by: James Newling <james.newling@gmail.com>
This change standardises the naming convention for the argument
representing the value to store in various vector operations.
Specifically, it ensures that all vector ops storing a value (whether
into memory, a tensor, or another vector) use `valueToStore` for the
corresponding argument name.
Updated operations:
* `vector.transfer_write`, `vector.insert`, `vector.scalable_insert`,
`vector.insert_strided_slice`.
For reference, here are operations that currently use `valueToStore`:
* `vector.store`, `vector.scatter`, `vector.compressstore`,
`vector.maskedstore`.
This change is non-functional (NFC) and does not affect the
functionality of these operations.
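For illustration, a sketch of where the renamed argument appears (the
`%valueToStore` name is illustrative):
```mlir
%0 = vector.insert %valueToStore, %dest[0] : f32 into vector<4xf32>
vector.transfer_write %valueToStore, %mem[%i] {in_bounds = [true]} : vector<4xf32>, memref<?xf32>
```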
Implements #131602
For vector.extract, the folder always canonicalizes to a vector.extract
operation, while the rewrite pattern canonicalizes to a vector.broadcast
except in the case of 0-rank vectors.
Remove this special casing, and instead handle the 0-rank vector case in
the folder.
A number of places in our codebase special case to use
extractelement/insertelement for 0D vectors, because extract/insert did
not support 0D vectors previously. Since insert/extract support 0D
vectors now, use them instead of special casing.
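For illustration, a sketch of the 0-D rewrite (assuming current
extract/insert syntax):
```mlir
// Before: 0-D vectors required the element ops.
%0 = vector.extractelement %v[] : vector<f32>
// After: extract handles 0-D vectors directly.
%0 = vector.extract %v[] : f32 from vector<f32>
```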
This patch matches the definition of vector.scatter as a counterpart of
vector.gather.
All of the changes done in this patch make vector.scatter match
vector.gather's multi-dimensional definition.
Unrolling for vector.scatter will be implemented in subsequent patches.
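For illustration, a sketch of the gather-style multi-dimensional form
that vector.scatter now matches (shapes and names are illustrative):
```mlir
// 1-D form (unchanged):
vector.scatter %base[%c0][%idxs], %mask, %value : memref<?xf32>, vector<16xi32>, vector<16xi1>, vector<16xf32>
// Multi-dimensional form, mirroring vector.gather:
vector.scatter %base[%c0, %c0][%idxs2d], %mask2d, %value2d : memref<?x?xf32>, vector<2x16xi32>, vector<2x16xi1>, vector<2x16xf32>
```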
Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
```
Dest.insert(Src.begin(), Src.end());
```
with:
```
Dest.insert_range(Src);
```
This patch does not touch custom begin functions like succ_begin for
now.
This PR moves vector.insert canonicalizers for DenseElementsAttr (splat
and non splat case) to folders. Folders are local, and it's always
better to implement a folder than a canonicalizer.
This PR is mostly NFC-ish: the functionality mostly remains the same,
but it is now run as part of a folder, which is why some tests have
changed (GreedyPatternRewriter tries to fold by default).
This PR moves vector.extract canonicalizers for DenseElementsAttr (splat
and non splat case) to folders. Folders are local, and it's always
better to implement a folder than a canonicalization pattern.
This PR is mostly NFC-ish: the functionality mostly remains the same,
but it is now run as part of a folder, which is why some tests have
changed (GreedyPatternRewriter tries to fold by default).
There is also a test change which makes the indices of a vector.extract
test dynamic, so that it doesn't fold away after this PR.
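For illustration, a sketch of the DenseElementsAttr fold (non-splat
case, values illustrative):
```mlir
// Before:
%cst = arith.constant dense<[1.0, 2.0, 3.0, 4.0]> : vector<4xf32>
%0 = vector.extract %cst[2] : f32 from vector<4xf32>
// After folding:
%0 = arith.constant 3.000000e+00 : f32
```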
This patch improves support for vector.extract(broadcast) dynamic
dimension folders. This is mostly a matter of moving a conservative
condition for dynamic dimensions. The broadcast folder for
vector.extract now covers the cases that the vector.extractelement +
broadcast folder does.
This patch also improves test coverage for vector.extract + broadcast
folders/canonicalizers. The tests now enumerate every supported and
unsupported case.
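For illustration, one of the newly covered cases (a scalar broadcast
extracted at a dynamic index; names are illustrative):
```mlir
// Every element of the broadcast is %arg, so even with a dynamic index
// %i the extract folds to %arg.
%0 = vector.broadcast %arg : f32 to vector<8xf32>
%1 = vector.extract %0[%i] : f32 from vector<8xf32>
```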
This PR fixes an out-of-bounds bug that occurs when there are no
overlapping dimensions between the `sizes` and the source of
`vector.extract_strided_slice`, causing accesses to `sizes` to go out of
bounds. Fixes #126196.
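For reference, a sketch of the kind of op involved, where `sizes` (and
`offsets`/`strides`) cover fewer dimensions than the source (shapes are
illustrative):
```mlir
%0 = vector.extract_strided_slice %v {offsets = [0], sizes = [2], strides = [1]} : vector<4x8xf32> to vector<2x8xf32>
```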
Adds a small note to VectorOps.td on what "dim-1" broadcast is. Also
updates comments to consistently use quotes, i.e.
* "dim-1" broadcasting instead of dim-1 broadcasting.
This way it is clear that we are referring to "stretching" one of the
trailing dims rather than e.g. broadcasting a dim at idx 1.
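For illustration, a minimal sketch of "dim-1" broadcasting (shapes are
illustrative):
```mlir
// The trailing dim of size 1 is "stretched" from 1 to 8.
%0 = vector.broadcast %arg : vector<4x1xf32> to vector<4x8xf32>
```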
This PR adds a folder for `vector.extract(ub.poison) -> ub.poison`. It
also replaces `create` with `createOrFold` for insert/extract ops in the
vector unroll and transpose lowering patterns, to trigger the poison
foldings introduced recently.
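For illustration, a sketch of the new fold:
```mlir
// Before:
%0 = ub.poison : vector<4x8xf32>
%1 = vector.extract %0[0] : vector<8xf32> from vector<4x8xf32>
// After folding:
%1 = ub.poison : vector<8xf32>
```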
Create `VectorToLLVMDialectInterface`, which allows automatic conversion
discovery by the generic `--convert-to-llvm` pass. This only covers the
final dialect conversion step and not any previous preparation steps.
Also, there is currently no way to pass any additional parameters
through this conversion interface, but most users use the default
parameters anyway.
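Usage is then simply, e.g.:
```
mlir-opt --convert-to-llvm input.mlir
```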
https://github.com/llvm/llvm-project/pull/124863 added folding support
for poison indices to `vector.shuffle`. This PR adds support for folding
`vector.shuffle` ops with one or two poison input vectors.
This PR adds support for UB constant materialization (i.e., generating
`ub::PoisonOp`) in `VectorDialect::materializeConstant`. This was the
reason why the vector folders generating poison didn't work.
This PR fixes the folder of a `vector.shuffle` with constant input
vectors in the presence of a poison index. Partially-poison vectors are
currently not supported in UB, so the folder selects v1[0] for elements
indexed by poison.
Following up on #122188, this PR adds support for poison indices to
`ExtractOp` and `InsertOp`. It also includes canonicalization patterns
to turn extract/insert ops with poison indices into `ub.poison`.
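For illustration, a sketch of the canonicalization (using -1 as the
poison index, per the vector.extract/vector.insert docs):
```mlir
// Before:
%0 = vector.extract %v[-1] : f32 from vector<4xf32>
// After:
%0 = ub.poison : f32
```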
`vector::BroadcastOp` expects identical element types when folding. This
causes a crash in the SCCP pass if a different source type is given. We
need to guard the pass against crashing when the element types are not
identical but still compatible (e.g., index vs. integer type).
https://github.com/llvm/llvm-project/issues/120193
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.