For vector.extract, the folder always canonicalizes to a vector.extract
operation, while the rewrite pattern canonicalizes to a vector.broadcast
except in the case of 0-rank vectors.
Remove this special casing, and instead handle the 0-rank vector case in
the folder.
A number of places in our codebase special case to use
extractelement/insertelement for 0D vectors, because extract/insert did
not support 0D vectors previously. Since insert/extract support 0D
vectors now, use them instead of special casing.
This patch matches the definition of vector.scatter as a counter part of
vector.gather.
All of the changes done in this patch make vector.scatter match
vector.gather 's multi dimensional definition.
Unrolling for vector.scatter will be implemented in subsequent patches.
Discourse Discussion:
https://discourse.llvm.org/t/rfc-improving-gather-codegen-for-vector-dialect/85011/13
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
This PR moves vector.insert canonicalizers for DenseElementsAttr (splat
and non splat case) to folders. Folders are local, and it's always
better to implement a folder than a canonicalizer.
This PR is mostly NFC-ish, because the functionality mostly remains
same, but is now run as part of a folder, which is why some tests are
changed, because GreedyPatternRewriter tries to fold by default.
This PR moves vector.extract canonicalizers for DenseElementsAttr (splat
and non splat case) to folders. Folders are local, and it's always
better to implement a folder than a canonicalization pattern.
This PR is mostly NFC-ish, because the functionality mostly remains
same, but is now run as part of a folder, which is why some tests are
changed, because GreedyPatternRewriter tries to fold by default.
There is also a test change which makes the indices of a vector.extract
test dynamic. This is so that it doesn't fold away after this pr.
This patch improves support for vector.extract(broadcast) dynamic
dimension folders. This is mostly a matter of moving a conservative
condition for dynamic dimensions. The broadcast folder for
vector.extract now covers the cases that the vector.extractelement +
broadcast folder does.
This patch also improves test coverage for vector.extract + broadcast
folders/canonicalizers. The folders/canonicalizers now enumerate every
supported / unsupported case.
This PR fixes an out-of-bounds bug that occurs when there are no overlap
dimensions between the `sizes` and source of
`vector.extract_strided_slice`, causing access to `sizes` to go out of
bounds. Fixes#126196.
Adds a small note to VectorOps.td on what "dim-1" broadcast is. Also
updates comments to consistently use quotes, i.e.
* "dim-1" broadcasting instead of dim-1 broadcasting.
This way it is clear that we are referring to "stretching" one of the
trailing dims rather than e.g. broadcasting a dim at idx 1.
This PR adds a folder for `vector.extract(ub.poison) -> ub.poison`. It
also replaces `create` with `createOrFold` insert/extract ops in vector
unroll and transpose lowering patterns to trigger the poison foldings
introduced recently.
Create `VectorToLLVMDialectInterface` which allows automatic conversion
discovery by generic `--convert-to-llvm` pass. This only covers final
dialect conversion step and not any previous preparation steps. Also,
currently there is no way to pass any additional parameters through this
conversion interface, but most users using default parameters anyway.
https://github.com/llvm/llvm-project/pull/124863 added folding support
for poison indices to `vector.shuffle`. This PR adds support for folding
`vector.shuffle` ops with one or two poison input vectors.
This PR adds support for UB constant materialization (i.e., generating
`ub::PoisonOp` to `VectorDialect::materializeConstant`. This was the
reason why the vector folders generating poison didn't work.
This PR fixes the folder of a `vector.shuffle` with constant input
vectors in the presence of a poison index. Partially poison vectors are
currently not supported in UB so the folder select v1[0] for elements
indexed by poison.
Following up on #122188, this PR adds support for poison indices to
`ExtractOp` and `InsertOp`. It also includes canonicalization patterns
to turn extract/insert ops with poison indices into `ub.poison`.
Vector::BroadCastOp expects the identical element type in folding. It
causes the crash if the different source type is given to the SCCP pass.
We need to guard the pass from crashing if the nonidentical element type
is given, but still compatible. (e.g. index vs integer type)
https://github.com/llvm/llvm-project/issues/120193
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
The extract <-> shape cast folder was conservatively asserting and
failing on 0-d vectors. This pr fixes this.
This pr also adds more tests for 0d cases and updates related tests to
better reflect what they test.
When inferring the mask of a transfer operation that results in a single `i1` element,
we could represent it using either `vector<i1>` or vector<1xi1>. To avoid type mismatches,
this PR updates the mask inference logic to consistently generate `vector<1xi1>` for
these cases. We can enable 0-D masks if they are needed in the future.
See: https://github.com/llvm/llvm-project/issues/116197
This PR adds a folder for extracting an element from a vector shuffle.
It turns something like:
```
%shuffle = vector.shuffle %a, %b [0, 8, 7, 15]
: vector<8xf32>, vector<8xf32>
%extract = vector.extract %shuffle[3] : f32 from vector<4xf32>
```
into:
```
%extract = vector.extract %b[7] : f32 from vector<8xf32>
```
Treat integer range for vector type as union of ranges of individual
elements. With this semantics, most arith ops on vectors will work out
of the box, the only special handling needed for constants and vector
elements manipulation ops.
The end goal of these changes is to be able to optimize vectorized index
calculations.
Currently, the lowering for vector.step lives
under a folder. This is not ideal if we want
to do transformation on it and defer the
materizaliztion of the constants much later.
This commits adds a rewrite pattern that
could be used by using
`transform.structured.vectorize_children_and_apply_patterns`
transform dialect operation.
Moreover, the rewriter of vector.step is also
now used in -convert-vector-to-llvm pass where
it handles scalable and non-scalable types as
LLVM expects it.
As a consequence of removing the vector.step
lowering as its folder, linalg vectorization
will keep vector.step intact.
The current vector.insert folder tries to replace a scalar with a 0-rank
vector. This patch fixes this crash by not folding unless they types of
the result and replacement are same.
This is a reasonable canonicalization because `extract` is more
constrained than `extract_strided_slices`, so there is no loss of
semantics here, just lifting an op to a special-case higher/constrained
op. And the additional `shape_cast` is merely adding leading unit dims
to match the original result type.
Context: discussion on #111541. I wasn't sure how this would turn out,
but in the process of writing this PR, I discovered at least 2 bugs in
the pattern introduced in #111541, which shows the value of shared
canonicalization patterns which are exercised on a high number of
testcases.
---------
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.
This PR updates the semantics of the `in_bounds` attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:
```mlir
%read = vector.transfer_read %A[%base1, %base2], %pad
{in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
{permutation_map = affine_map<(d0, d1) -> (0)>}
: memref<?x?xf32>, vector<9xf32>
vector.transfer_write %vec, %A[%base1, %base2],
{in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
{permutation_map = affine_map<(d0, d1) -> (0)>}
: vector<9xf32>, memref<?x?xf32>
```
Note that the value `false` merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to `true` if all non-broadcast dims
are marked as "in bounds".
Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making `in_bounds` mandatory and to work around the extra requirements
for broadcast dims (those requirements ere removed in this PR). All
changes in tests are also reverts of changes from #97049.
For context, here's a PR in which "broadcast" dims where forced to
always be "in-bounds":
* https://reviews.llvm.org/D102566
(*) See `foldTransferInBoundsAttribute`.
In cases where llvm.mlir.constant has an attribute with a different type than the returned type,
the folder use to create an incorrect DenseElementsAttr and crash.
Resolves#74236
This adds a new transform `eliminateVectorMasks()` which aims at
removing scalable `vector.create_masks` that will be all-true at
runtime. It attempts to do this by simply pattern-matching the mask
operands (similar to some canonicalizations), if that does not lead to
an answer (is all-true? yes/no), then value bounds analysis will be used
to find the lower bound of the unknown operands. If the lower bound is
>= to the corresponding mask vector type dim, then that dimension of the
mask is all true.
Note that the pattern matching prevents expensive value-bounds analysis
in cases where the mask won't be all true.
For example:
```mlir
%mask = vector.create_mask %dynamicValue, %c2 : vector<8x4xi1>
```
From looking at `%c2` we can tell this is not going to be an all-true
mask, so we don't need to run the value-bounds analysis for
`%dynamicValue` (and can exit the transform early).
Note: Eliminating create_masks here means replacing them with all-true
constants (which will then lead to the masks folding away).
Adds tests with scalable vectors for the Vector-To-LLVM conversion pass.
Covers the following Ops:
* vector.bitcast
* vector.broadcast
Note, this has uncovered some missing logic in `BroadcastOpLowering`.
This PR fixes the most basic cases where the scalable flags were dropped
and the generated code was incorrect. Also, the conditions in
`vector::isBroadcastableTo` are relaxed to allow cases like this:
```mlir
%0 = vector.broadcast %arg0 : vector<1xf32> to vector<[4]xf32>
```
The `BroadcastOpLowering` pattern is effectively disabled for scalable
vectors in more complex cases where an SCF loop would be required to
loop over the scalable dims, e.g.:
```mlir
%0 = vector.broadcast %arg0 : vector<[4]x1x2xf32> to vector<[4]x3x2xf32>
```
These cases are marked as "Stretch not at start" in the code. In those
cases, support for scalable vectors is left as a TODO.
Clarifies the semantics of `vector.broadcast` in the context of scalable
vectors. In particular, broadcasting a unit scalable dim, `[1]`, is not
valid unless there's a match between the output and the input dims.
See the examples below for an illustration:
```mlir
// VALID
%0 = vector.broadcast %arg0 : vector<[1]xf32> to vector<4x[1]xf32>
// INVALID
%0 = vector.broadcast %arg0 : vector<[1]xf32> to vector<[4]xf32>
// VALID FIXED-WIDTH EQUIVALENT
%0 = vector.broadcast %arg0 : vector<1xf32> to vector<4xf32>
```
Documentation, the Op verifier and tests are updated accordingly.
Updates the verifier for `vector.shape_cast` so that incorrect cases
where "scalability" is dropped are immediately rejected. For example:
```mlir
vector.shape_cast %vec : vector<1x1x[4]xindex> to vector<4xindex>
```
Also, as a separate PR, I've prepared a fix for the Linalg vectorizer to
avoid generating such shape casts (*):
* https://github.com/llvm/llvm-project/pull/100325
(*) Note, that's just one specific case that I've identified so far.
At the moment, the in_bounds attribute has two confusing/contradicting
properties:
1. It is both optional _and_ has an effective default-value.
2. The default value is "out-of-bounds" for non-broadcast dims, and
"in-bounds" for broadcast dims.
(see the `isDimInBounds` vector interface method for an example of this
"default" behaviour [1]).
This PR aims to clarify the logic surrounding the `in_bounds` attribute
by:
* making the attribute mandatory (i.e. it is always present),
* always setting the default value to "out of bounds" (that's
consistent with the current behaviour for the most common cases).
#### Broadcast dimensions in tests
As per [2], the broadcast dimensions requires the corresponding
`in_bounds` attribute to be `true`:
```
vector.transfer_read op requires broadcast dimensions to be in-bounds
```
The changes in this PR mean that we can no longer rely on the
default value in cases like the following (dim 0 is a broadcast dim):
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
Instead, the broadcast dimension has to explicitly be marked as "in
bounds:
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
All tests with broadcast dims are updated accordingly.
#### Changes in "SuperVectorize.cpp" and "Vectorization.cpp"
The following patterns in "Vectorization.cpp" are updated to explicitly
set the `in_bounds` attribute to `false`:
* `LinalgCopyVTRForwardingPattern` and `LinalgCopyVTWForwardingPattern`
Also, `vectorizeAffineLoad` (from "SuperVectorize.cpp") and
`vectorizeAsLinalgGeneric` (from "Vectorization.cpp") are updated to
make sure that xfer Ops created by these hooks set the dimension
corresponding to broadcast dims as "in bounds". Otherwise, the Op
verifier would complain
Note that there is no mechanism to verify whether the corresponding
memory access are indeed in bounds. Still, this is consistent with the
current behaviour where the broadcast dim would be implicitly assumed
to be "in bounds".
[1]
4145ad2bac/mlir/include/mlir/Interfaces/VectorInterfaces.td (L243-L246)
[2]
https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop