Fix `UnnamedAddrAttrName` being inserted twice into the `elidedAttrs`
list for the attribute dictionary printer in `GlobalOp` and `AliasOp`
print functions.
Context:
`vector.transfer_read` always requires a padding value. Most of its
builders take no `padding` value and assume the safe value of `0`.
However, this should be a conscious choice by the API user, as it makes
it easy to introduce bugs.
For example, I found several occasions while making this patch that the
padding value was not getting propagated (`vector.transfer_read` was
transformed into another `vector.transfer_read`). These bugs, were
always caused because of constructors that don't require specifying
padding.
Additionally, using `ub.poison` as a possible default value is better,
as it indicates the user "doesn't care" about the actual padding value,
forcing users to specify the actual padding semantics they want.
With that in mind, this patch changes the builders in
`vector.transfer_read` to always having a `std::optional<Value> padding`
argument. This argument is never optional, but for convenience users can
pass `std::nullopt`, padding the transfer read with `ub.poison`.
---------
Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
Similar to 'enter data', except the data clauses have a 'getdeviceptr'
operation before, so that they can properly use the 'exit' operation
correctly. While this is a touch awkward, it fits perfectly into the
existing infrastructure.
Same as with 'enter data', we had to add some add-functions for async
and wait.
The select op has 3 inputs: input1, input2, input3 to according to the
tosa specification. However, use of getInput1(), getInput2() and
getInput3() in the codebase can be confusing and hinder readability.
This commit adds custom getters to help improve readability:
- input1 -> getPred()
- input2 -> getOnTrue()
- input3 -> getOnFalse()
They should be preferred as they are more descriptive, however, the ODS
generated getters (getInputX()) may still be used.
Unfortunately the custom getters don't propagate to Adaptors such as
`FoldAdaptor`, so the ODS generated getters must be used.
`tensor.splat` is currently restricted to only accepting input values
that are of integer, index or float type.
This is much more restrictive than the tensor type itself as well as any
lowerings of it.
This PR therefore removes this restriction by using `AnyType` for the
input value. Whether the type is actually valid or not for a tensor
remains verified through the type equality of the result tensor element
type and the input type.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Fixes: https://github.com/llvm/llvm-project/issues/130257
Fix affine-data-copy-generate in certain cases that involved users in
multiple blocks. Perform the memref replacement correctly during copy
generation.
Improve/clean up memref affine use replacement API. Instead of
supporting dominance and post dominance filters (which aren't adequate
in most cases) and computing dominance info expensively each time in
RAMUW, provide a user filter callback, i.e., force users to compute
dominance if needed.
* Clarified the `inner_dim_pos` attribute in the case of high
dimensionality tensors.
* Added a 5D examples to show-case the use-cases that triggered this
updated.
* Added a reminder for linalg.unpack that number of elements are not
required to be the same between input/output due to padding being
dropped.
I encountered some odd variations of `linalg.pack` and `linalg.unpack`
while working on some TFLite models and the definition in the
documentation did not match what I saw pass in IR verification.
The following changes reconcile those differences.
---------
Signed-off-by: Christopher McGirr <mcgirr@roofline.ai>
'enter data' is a new construct type that requires one of the data
clauses, so we had to wait for all clauses to be ready before we could
commit this. Most of the clauses are simple, but there is a little bit
of work to get 'async' and 'wait' to have similar interfaces in the ACC
dialect, where helpers were added.
ArrayRef(std::nullopt) just got deprecated. This patch does the same
to MutableArrayRef(std::nullopt). Since there are only a couple of
uses, this patch does migration and deprecation at the same time.
…ombined
This patch does the lowering of copyin (represented as a
acc.copyin/acc.delete), copyout (acc.create/acc.copyin), and create
(acc.create/acc.delete).
Additionally, it found a few problems with #144806, so it fixes those as
well.
This patch adds additional checks to the hoisting logic to prevent hoisting of
`vector.transfer_read` / `vector.transfer_write` pairs when the underlying
memref has users that introduce aliases via operations implementing
`ViewLikeOpInterface`.
Note: This may conservatively block some valid hoisting opportunities and could
affect performance. However, as demonstrated by the included tests, the current
logic is too permissive and can lead to incorrect transformations.
If this change prevents hoisting in cases that are provably safe, please share
a minimal repro - I'm happy to explore ways to relax the check.
Special treatment is given to `memref.assume_alignment`, mainly to accommodate
recent updates in:
* https://github.com/llvm/llvm-project/pull/139521
Note that such special casing does not scale and should generally be avoided.
The current hoisting logic lacks robust alias analysis. While better support
would require more work, the broader semantics of `memref.assume_alignment`
remain somewhat unclear. It's possible this op may eventually be replaced with
the "alignment" attribute added in:
* https://github.com/llvm/llvm-project/pull/144344
`ConversionPattern::getVoidPtrType` looks a little confusion since the
opaque pointer migration is already done. Also we cannot specify address
space in this method.
Maybe we can mark them as deprecated and add new method `getPtrType()`,
as this PR did : )
Given the following example:
```
module {
func.func @main(%arg0: tensor<1x1x1x4x1xf32>, %arg1: tensor<1x1x4xf32>) -> tensor<1x1x1x4x1xf32> {
%pack = linalg.pack %arg1 outer_dims_perm = [1, 2, 0] inner_dims_pos = [2, 0] inner_tiles = [4, 1] into %arg0 : tensor<1x1x4xf32> -> tensor<1x1x1x4x1xf32>
return %pack : tensor<1x1x1x4x1xf32>
}
}
```
We would generate an invalid transpose operation because the calculated
permutation would be `[0, 2, 0]` which is semantically incorrect. As the
permutation must contain unique integers corresponding to the source
tensor dimensions.
The following change modifies how we calculate the permutation array and
ensures that the dimension indices given in the permutation array is
unique.
The above example would then translate to a transpose having a
permutation of `[1, 2, 0]`. Following the rule, that the `inner_dim_pos`
is appended to the permutation array and the preceding indices are
filled with the remaining dimensions.
ArrayRef now has a new constructor that takes a parameter whose type
has data() and size(). This patch migrates:
ArrayRef<T>(X.data(), X.size()
to:
ArrayRef<T>(X)
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch replaces {} with std::nullopt.
This revision adds a pass working on FunctionOpInterface to connect recently introduced AffineMin/Max simplification patterns.
Additionally fixes some minor issues that have surfaced upon larger scale testing.
This patch enforces a restriction in the Vector dialect: the non-indexed
operands of `vector.insert` and `vector.extract` must no longer be 0-D
vectors. In other words, rank-0 vector types like `vector<f32>` are
disallowed as the source or result.
EXAMPLES
--------
The following are now **illegal** (note the use of `vector<f32>`):
```mlir
%0 = vector.insert %v, %dst[0, 0] : vector<f32> into vector<2x2xf32>
%1 = vector.extract %src[0, 0] : vector<f32> from vector<2x2xf32>
```
Instead, use scalars as the source and result types:
```mlir
%0 = vector.insert %v, %dst[0, 0] : f32 into vector<2x2xf32>
%1 = vector.extract %src[0, 0] : f32 from vector<2x2xf32>
```
Note, this change serves three goals. These are summarised below.
## 1. REDUCED AMBIGUITY
By enforcing scalar-only semantics when the result (`vector.extract`)
or source (`vector.insert`) are rank-0, we eliminate ambiguity
in interpretation. Prior to this patch, both `f32` and `vector<f32>`
were accepted.
## 2. MATCH IMPLEMENTATION TO DOCUMENTATION
The current behaviour contradicts the documented intent. For example,
`vector.extract` states:
> Degenerates to an element type if n-k is zero.
This patch enforces that intent in code.
## 3. ENSURE SYMMETRY BETWEEN INSERT AND EXTRACT
With the stricter semantics in place, it’s natural and consistent to
make `vector.insert` behave symmetrically to `vector.extract`, i.e.,
degenerate the source type to a scalar when n = 0.
NOTES FOR REVIEWERS
-------------------
1. Main change is in "VectorOps.cpp", where stricter type checks are
implemented.
2. Test updates in "invalid.mlir" and "ops.mlir" are minor cleanups to
remove now-illegal examples.
2. Lowering changes in "VectorToSCF.cpp" are the main trade-off: we now
require an additional `vector.extract` when a preceding
`vector.transfer_read` generates a rank-0 vector.
RELATED RFC
-----------
*
https://discourse.llvm.org/t/rfc-should-we-restrict-the-usage-of-0-d-vectors-in-the-vector-dialect
This commit adds extra state to `ConversionPatternRewriterImpl` to store
all modified / newly-created operations and moved / newly-created blocks
in separate lists on a per-pattern basis.
This is in preparation of the One-Shot Dialect Conversion refactoring:
the new driver will no longer maintain a list of all IR rewrites, so
information about newly-created operations (which is needed to trigger
recursive legalization) must be retained in a different data structure.
This commit is also expected to improve the performance of the existing
driver. The previous implementation iterated over all new IR
modifications and then filtered them by type. It also required an
additional pointer indirection (through `std::unique_ptr<IRRewrite>`) to
retrieve the operation/block pointer.
This thread through proper error handling / reporting capabilities to
avoid hitting llvm_unreachable while parsing linalg ops.
Fixes#132755Fixes#132740Fixes#129185
For consumer fusion cases of this form
```
%0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg0 = ...) {
tensor.parallel_insert_slice ... into %arg0
tensor.parallel_insert_slice ... into %arg1
}
%1 = linalg.generic ... ins(%0#0, %0#1)
```
the current consumer fusion that handles one slice at a time cannot fuse
the consumer into the loop, since fusing along one slice will create and
SSA violation on the other use from the `scf.forall`. The solution is to
allow consumer fusion to allow considering multiple slices at once. This
PR changes the `TilingInterface` methods related to consumer fusion,
i.e.
- `getTiledImplementationFromOperandTile`
- `getIterationDomainFromOperandTile`
to allow fusion while considering multiple operands. It is upto the
`TilingInterface` implementation to return an error if a list of tiles
of the operands cannot result in a consistent implementation of the
tiled operation.
The Linalg operation implementation of `TilingInterface` has been
modified to account for these changes and allow cases where operand
tiles that can result in a consistent tiling implementation are handled.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch migrates away from std::nullopt in favor of ArrayRef<T>()
where we use perfect forwarding. Note that {} would be ambiguous for
perfect forwarding to work.
Refactors the `@hoist_vector_transfer_pairs` test function in
`hoisting.mlir` into smaller, focused test functions - each covering a
specific `vector.transfer_read`/`vector.transfer_write` pair.
This makes it easier to identify which edge cases are tested, spot
duplication, and write more targeted and readable check lines, with less
surrounding noise.
This refactor also helped identify some issues with the original
`@hoist_vector_transfer_pairs` test:
* Input variables `%val` and `%cmp` were unused.
* There were no check lines for reads from `memref5`.
**Note for reviewers (current and future):**
This PR is split into small, incremental, and self-contained commits. It
should be easier to follow the changes by reviewing those commits
individually, rather than reading the full squashed diff. However, this
will be merged as a single commit to avoid adding unnecessary history
noise in-tree.
https://github.com/llvm/llvm-project/pull/144657 added #include
"mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.h" to
mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp, though it is not
in use
This include introduces a dependency for LinalgTransforms on
LinalgTransformOps, which is unspecified in the module dependencies, and
would produce a cyclic dependency if it were specified.
The include is unused in WinogradConv2D.cpp, so this change removes it.
As a follow-up to PR #135841 (see discussion for background), this patch
removes the `ConvertIllegalShapeCastOpsToTransposes` pattern from the SME
legalization pass. This change unblocks folding for ShapeCastOp involving
scalable vectors.
Originally, the `ConvertIllegalShapeCastOpsToTransposes` pattern was introduced
to rewrite certain `vector.shape_cast` ops that could not be lowered otherwise.
Based on local end-to-end testing, this workaround is no longer required, and
the pattern can now be safely removed.
This patch also removes a special case from `ShapeCastOp::fold`, simplifying
the fold logic.
As a side effect of removing `ConvertIllegalShapeCastOpsToTransposes`, we lose
the mechanism that enabled lowering of certain ops like:
```mlir
%res = vector.transfer_read %mem[%a, %b] (...) :
memref<?x?xf32>, vector<[4]x1xf32>
```
Previously, such cases were handled by:
* Rewriting a nearby `vector.shape_cast` to a `vector.transpose` (via
`ConvertIllegalShapeCastOpsToTransposes`)
* Then lowering the result with `LiftIllegalVectorTransposeToMemory`.
This patch introduces a new dedicated pattern,
`LowerColumnTransferReadToLoops`, that directly handles illegal
`vector.transfer_read` ops involving leading scalable dimensions.