This commit adds extra assertions to `OperationFolder` and `OpBuilder`
to ensure that the types of the folded SSA values match with the result
types of the op. There used to be checks that discard the folded results
if the types do not match. This commit makes these checks stricter and
turns them into assertions.
Discarding folded results with the wrong type (without failing
explicitly) can hide bugs in op folders. Two such bugs became apparent
in MLIR (and some more in downstream projects) and are fixed with this
change.
Note: The existing type checks were introduced in
https://reviews.llvm.org/D95991.
Migration guide: If you see failing assertions (`folder produced value
of incorrect type`; make sure to run with assertions enabled!), run with
`-debug` or dump the operation right before the failing assertion. This
will point you to the op that has the broken folder. A common mistake is
a mismatch between static/dynamic dimensions (e.g., input has a static
dimension but folded result has a dynamic dimension).
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.
Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.
Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.
Issue: https://github.com/llvm/llvm-project/issues/72354
Without this patch, MLIR crashes with
```
Assertion failed: (getNumDims() == map.getNumResults() && "Number of results mismatch"), function compose, file AffineMap.cpp, line 537.
```
during parsing.
This reverts commit f42b7615b8.
The fold pattern is incorrect, because it does not even look at the
permutation of non-unit dims and is happy to replace a pattern such as
```
%22 = vector.shape_cast %21 : vector<1x256x256xf32> to vector<256x256xf32>
%23 = vector.transpose %22, [1, 0] : vector<256x256xf32> to vector<256x256xf32>
```
with
```
%22 = vector.shape_cast %21 : vector<1x256x256xf32> to vector<256x256xf32>
```
which is obviously incorrect.
This folds transpose(shape_cast) into a new shape_cast, when the
transpose just permutes a unit dim from the result of the shape_cast.
Example:
```
%0 = vector.shape_cast %vec : vector<[4]xf32> to vector<[4]x1xf32>
%1 = vector.transpose %0, [1, 0] : vector<[4]x1xf32> to vector<1x[4]xf32>
```
Folds to:
```
%0 = vector.shape_cast %vec : vector<[4]xf32> to vector<1x[4]xf32>
```
This is an (alternate) fix for lowering matmuls to ArmSME.
Previously the pattern only worked when the permutation map was a minor
identity. Infer the new mask type from the new transfer map after
dropping leading unit dims.
* Declare arguments/results with `let` statements.
* Rename `transp` to `permutation`.
* Change type of `transp` from `I64ArrayAttr` to `DenseI64ArrayAttr`
(provides direct access to `ArrayRef<int64_t>` instead of `ArrayAttr`).
- Better documentation.
- Rename interface methods: `source` -> `getSource`, `indices` ->
`getIndices`, etc. to conform with MLIR naming conventions. A default
implementation is not needed.
- Turn many interface methods into helper functions. Most of the
previous interface methods were not meant to be overridden, and if some
were overridden without others, the op would be have been broken.
When the mask bounds of a `vector.constant_mask` exactly equal the shape
of the vector, any transfer op consuming that mask will be unaffected by
it. Drop the mask in such cases.
The IR is valid, but UB: there is an out-of-bound index for the position
to insert inside the vector. We should just ignore this in the folder.
Fixes#70884
If a dimension does not appear in the permutation map of a vector
transfer op, the size of the accessed slice in that dimension is `1`.
Before this fix, `getTransferChunkAccessed` used to return `0` for such
dimensions, which would means that `0` elements in the underlying
tensor/memref are accessed.
Note: There is no test case that fails due to this bug and because this
interface method is currently only used in one place, it is hard to
write a regression test. This fix is in preparation of subset hoisting
functionality that will be added in subsequent commits.
Adds an end-to-end test for `vector.contract` that targets SVE (i.e.
scalable vectors). Note that this requires lifting the restriction on
`vector.outerproduct` (to which `vector.contract` is lowered to) that
would deem the following as invalid by the Op verifier (*):
```
vector.outerproduct %27, %28, %26 {kind = #vector.kind<add>} : vector<3xf32>, vector<[2]xf32>
```
This is indeed valid as the end-to-end test demonstrates (at least when
compiling for SVE).
This allows folding extracts from `vector.create_mask` ops that have a
known value. Currently, there's no fold for this, but you get the same
effect from the unrolling in LowerVectorMask (part of
-convert-vector-to-llvm), then folds after that. However, for a future
patch, this simplification needs to be done before lowering to LLVM,
hence the need for this fold.
E.g.:
```
%0 = vector.create_mask %c1, %dimA, %dimB : vector<1x[4]x[4]xi1>
%1 = vector.extract %mask[0] : vector<[4]x[4]xi1>
```
->
```
%0 = vector.create_mask %dimA, %dimB : vector<[4]x[4]xi1>
```
This is just a slight specialization of `TypesMatchWith` that returns
success if an optional parameter is missing.
There may be other places this could help e.g.:
eb21049b4b/mlir/include/mlir/Dialect/X86Vector/X86Vector.td (L58-L59)
...but I'm leaving those to avoid some churn.
This constraint will be handy for us in some later patches, it's a
formalization of a short circuiting trick with the `comparator` of the
`TypesMatchWith` constraint (devised for #69195).
```
TypesMatchWith<
"padding type matches element type of result (if present)",
"result", "padding",
"::llvm::cast<VectorType>($_self).getElementType()",
// This returns true if no padding is present, or it's present with a type that matches the element type of `result`.
"!getPadding() || std::equal_to<>()">
```
This is a little non-obvious, so after this patch you can instead do:
```
OptionalTypesMatchWith<
"padding type matches element type of result (if present)",
"result", "padding",
"::llvm::cast<VectorType>($_self).getElementType()">
```
Recent changes (https://github.com/llvm/llvm-project/pull/66930)
disabled vector transfer ops hoisting with view-like intermediate ops.
The recommended way is to fold subview ops into transfer op indices
before invoking hoisting. That would mean now we see transfer op indices
involving dynamic values, instead of static constant values before with
subview ops. Therefore hoisting won't kick in anymore. This breaks
downstream users.
To fix it, this commit enables hoisting transfer ops with dynamic
indices by using `ValueBoundsConstraintSet` to prove ranges are disjoint
in `isDisjointTransferIndices`. Given that utility is used in many
places including op folders, right now we introduce a flag to it and
only set as true for "heavy" transforms in hoisting and load-store
forwarding.
This is not yet supported and previously led to a confusing crash where
an extract op with a kDynamic marker, but no dynamic positions was
created. The verifier has also been updated to check for this, and hint
at where the problem is likely to be.
The vector.extract assembly format currently only contains the source
type, for example:
%1 = vector.extract %0[1] : vector<3x7x8xf32>
it's not immediately obvious if this is the source or result type. This
patch improves the assembly format to make this clearer, so the above
becomes:
%1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>
This revision pipes the fastmath attribute support through the
vector.reduction op. This seemingly simple first step already requires
quite some genuflexions, file and builder reorganization. In the
process, retire the boolean reassoc flag deep in the LLVM dialect
builders and just use the fastmath attribute.
During conversions, templated builders for predicated intrinsics are
partially cleaned up. In the future, to finalize the cleanups, one
should consider adding fastmath to the VPIntrinsic ops.
This extends `vector.constant_mask` so that mask dim sizes that
correspond to a scalable dimension are treated as if they're implicitly
multiplied by vscale. Currently this is limited to mask dim sizes of 0
or the size of the dim/vscale. This allows constant masks to represent
all true and all false scalable masks (and some variations):
```
// All true scalable mask
%mask = vector.constant_mask [8] : vector<[8]xi1>
// All false scalable mask
%mask = vector.constant_mask [0] : vector<[8]xi1>
// First two scalable rows
%mask = vector.constant_mask [2,4] : vector<4x[4]xi1>
```
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.
This is just a small fix that makes sure that `vector.contract` works
with scalable vectors.
Rather than duplicating all the roundtrip tests for vector.contract, I'm
treating scalable vectors as an edge case and just adding a couple to
verify that this works.
This was introduced before the Optional directive and uses Variadic, but
it's really optional.
Reviewed By: nicolasvasilache, benmxwl-arm, dcaballe
Differential Revision: https://reviews.llvm.org/D159259
0-D vectors are now supported, so the special case of returning the just
the element type can now be removed.
A few callers that relied on the old behaviour have been updated.
Reviewed By: awarzynski, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D159122
The current implementation is not very ergonomic or descriptive: It uses `std::optional<unsigned>` where `std::nullopt` represents the parent op and `unsigned` is the region number.
This doesn't give us any useful methods specific to region control flow and makes the code fragile to changes due to now taking the region number into account.
This patch introduces a new type called `RegionBranchPoint`, replacing all uses of `std::optional<unsigned>` in the interface. It can be implicitly constructed from a region or a `RegionSuccessor`, can be compared with a region to check whether the branch point is branching from the parent, adds `isParent` to check whether we are coming from a parent op and adds `RegionSuccessor::parent` as a descriptive way to indicate branching from the parent.
Differential Revision: https://reviews.llvm.org/D159116
This patch effectively enables the CastAwayElementwiseLeadingOneDim
rewrite pattern for scalable vectors. To this end,
`ExtractOp::inferReturnTypes` is updated so that scalable dimensions are
correctly recognised.
The change to ExtractOp will likely make also other conversion patterns
valid for scalable vectors, but this patch focuses on just one case.
Other conversion patterns will be enabled in the forthcoming patches.
Depends on D157993
Differential Revision: https://reviews.llvm.org/D158335
This commit starts enabling vector distruction over multiple
dimensions. It requires delinearize the lane ID to match the
expected rank. shape_cast and transfer_read now can properly
handle multiple dimensions.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D157931
Make sure that when canonicalising masked `vector.multi_reduction` and
creating `arith.select` to replace the mask, scalability of the mask is
preserved.
Differential Revision: https://reviews.llvm.org/D157732
This patch adds the missing logic so that the
`TransferReadPermutationLowering` can be used for scalable vectors. To
this end:
* TransferOp custom C++ builder is updated to support scalable
vectors,
* `TransferOpReduceRank` is also updated to support scalable vectors.
This pattern is relevant when lowering `linalg.matmul` via
`vector_multi_reduction` for scalable vectors.
I've also updated relevant code in `TransferOpReduceRank` not to use
`llvm::to_vector` for constructing `SmallVector` from `ArrayRef`. That
hook doesn't work for `ArraryRef<bool>` (*), so for consistency I
switched to an explicit constructor (so that both `newShape` and
`newScalableDim` are constructed in a similar fashion).
(*) IIUC, that's due how implicit narrowing conversions between `bool`
and `*bool` work. Note that these narrowing conversions change when
using initializer lists, see
* https://en.cppreference.com/w/cpp/language/list_initialization.
Depends on D157092
Differential Revision: https://reviews.llvm.org/D157268
The `RegionBranchOpInterface` had a few fundamental issues caused by the API design of `getSuccessorRegions`.
It always required passing values for the `operands` parameter. This is problematic as the operands parameter actually changes meaning depending on which predecessor `index` is referring to. If coming from a region, you'd have to find a `RegionBranchTerminatorOpInterface` in that region, get its operand count, and then create a `SmallVector` of that size.
This is not only inconvenient, but also error-prone, which has lead to a bug in the implementation of a previously existing `getSuccessorRegions` overload.
Additionally, this made the method dual-use, trying to serve two different use-cases: 1) Trying to determine possible control flow edges between regions and 2) Trying to determine the region being branched to based on constant operands.
This patch fixes these issues by changing the interface methods and adding new ones:
* The `operands` argument of `getSuccessorRegions` has been removed. The method is now only responsible for returning possible control flow edges between regions.
* An optional `getEntrySuccessorRegions` method has been added. This is used to determine which regions are branched to from the parent op based on constant operands of the parent op. By default, it calls `getSuccessorRegions`. This is analogous to `getSuccessorForOperands` from `BranchOpInterface`.
* Add `getSuccessorRegions` to `RegionBranchTerminatorOpInterface`. This is used to get the possible successors of the terminator based on constant operands. By default, it calls the containing `RegionBranchOpInterface`s `getSuccessorRegions` method.
* `getSuccessorEntryOperands` was renamed to `getEntrySuccessorOperands` for consistency.
Differential Revision: https://reviews.llvm.org/D157506
Support for scalable vectors in vector.multi_reduction is added by
simply updating MultiDimReductionOp::verify.
Also, the conversion pattern for reducing n-D vector.multi_reduction to
2D vector.multi_reduction is updated.
Differential Revision: https://reviews.llvm.org/D157092