Erasing or replacing an op that is also the current insertion point
invalidates the insertion point. Explicitly set the insertion point so
that `copy` does not crash after the One-Shot Dialect Conversion
refactoring. (`ConversionPatternRewriter` will start behaving more like
a "normal" rewriter.)
Just like reshapes of 1-d tensors, reshapes of 0-d tensors (aka scalars)
are always no-ops, as they only have one possible layout. This PR adds
logic to the `fold` implementation to fold these away, as is currently
implemented for 1-d tensors.
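A minimal sketch of the kind of IR that now folds (assuming the op in
question is `tensor.reshape`; names are illustrative):
```mlir
// The empty shape operand of a 0-d reshape.
%shape = arith.constant dense<> : tensor<0xindex>
// Source and result are both 0-d, so only one layout is possible; the
// fold replaces %r with %src.
%r = tensor.reshape %src(%shape) : (tensor<f32>, tensor<0xindex>) -> tensor<f32>
```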
Linalg promotion attempts to compute a constant upper bound for the
allocated buffer size. Only when it fails to compute an upper bound does
it fall back to the original subview size, which may be dynamic.
This adds a promotion option to use the original subview size by default,
thus minimizing the allocation size.
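An illustrative sketch (hypothetical shapes and names; assume `%sz` is
only known to be bounded by the full memref size):
```mlir
%sv = memref.subview %A[%i] [%sz] [1]
    : memref<1024xf32> to memref<?xf32, strided<[1], offset: ?>>
// Constant-upper-bound promotion: worst-case static allocation.
%buf0 = memref.alloc() : memref<1024xf32>
// With the new option: allocate exactly the (dynamic) subview size.
%buf1 = memref.alloc(%sz) : memref<?xf32>
```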
Fixes #144268.
Currently, AffineExpr's `Add` simplification can fold
`s1 + (s1 // c * -c)` to `s1 % c`,
but cannot fold `(s0 + s1) + (s1 // c * -c)`.
This patch extends the simplification so that the latter can
be simplified to `s0 + s1 % c`.
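For example, in affine-map form with `c = 4`:
```mlir
// Before this patch, the mod pattern is not recognized inside the sum:
#before = affine_map<()[s0, s1] -> (s0 + s1 + s1 floordiv 4 * -4)>
// With this patch, the expression simplifies to:
#after = affine_map<()[s0, s1] -> (s0 + s1 mod 4)>
```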
This PR adds a `build()` constructor for `ConstantIntOp` that takes in
an `APInt`.
Creating an `arith` constant value with an `APInt` currently requires a
structure like the following:
```cpp
b.create<arith::ConstantOp>(IntegerAttr::get(intType, apintValue));
```
In comparison, the `ConstantFloatOp` already has an `APFloat` constructor
which allows for the following:
```cpp
b.create<arith::ConstantFloatOp>(floatType, apfloatValue);
```
Thus, intuitively, it makes sense that a similar `ConstantIntOp`
constructor is made for `APInts` like so:
```cpp
b.create<arith::ConstantIntOp>(intType, apintValue);
```
Depends on https://github.com/llvm/llvm-project/pull/144636
The issue is triggered by
ee070d0816,
which checks `TensorLikeType` when downstream projects use the pattern
without registering `bufferization::BufferizationDialect`. The
registration is needed because the interface implementations for builtin
types live in `BufferizationDialect::initialize()`. However, requiring
the registration is not the right fix. The proper fix is to use the
linalg method, i.e., `hasPureTensorSemantics`.
No additional tests are added because the functionality is well tested
in
[transpose-matmul.mlir](https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Linalg/transpose-matmul.mlir).
Reproducing the issue requires a different setup, e.g., writing a new
C++ pass, which does not seem worth it.
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Delay the erasure of an op, so that the insertion point of the rewriter
remains valid.
This commit is in preparation for the One-Shot Dialect Conversion
refactoring. (The current implementation works with the current dialect
conversion driver because op erasure is delayed.)
This PR adds a mechanism so that downstream consumers can pass in
control functions for the application of these patterns. This change
shouldn't affect any consumers of this method that do not specify a
`controlFn`. In each of the patterns, the `controlFn` always receives
the source operand of the consumer as a parameter.
In IREE, we (will) use it to prevent the application of folding patterns
that would inhibit fusion. See IREE issue
[#20896](https://github.com/iree-org/iree/issues/20896) for more
details.
In both `bubbleUpPackOpThroughGenericOp()` and
`pushDownUnPackOpThroughGenericOp()`, we can simplify the lowered IR by
removing the pack of an empty tensor when the init tensor isn't used in
the generic op. Instead of packing an empty tensor, the empty tensor can
be forwarded to the generic output. This allows a cleaner result after
data layout propagation, as sketched below.
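A sketch of the simplification (hypothetical shapes and names):
```mlir
// Before: the generic's packed init is produced by packing an empty tensor.
%empty = tensor.empty() : tensor<8x8xf32>
%init = linalg.pack %empty inner_dims_pos = [0, 1] inner_tiles = [4, 4]
    into %dest : tensor<8x8xf32> -> tensor<2x2x4x4xf32>
// After: since the init values are never read, the packed empty tensor
// is created directly and forwarded to the generic output.
%init = tensor.empty() : tensor<2x2x4x4xf32>
```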
This pass reifies the shapes of a subset of
`ReifyRankedShapedTypeOpInterface` ops with `tensor` results.
The pass currently only supports result shape type reification for:
- tensor::PadOp
- tensor::ConcatOp
It addresses a representation gap where implicit op semantics are needed
to infer static result types from dynamic
operands. But it does so by using `ReifyRankedShapedTypeOpInterface` as
the source of truth rather than the op itself.
As a consequence, this cannot generalize today.
TODO: in the future, we should consider coupling this information with
op "transfer functions" (e.g.
`IndexingMapOpInterface`) to provide a source of truth that can work
across result shape inference, canonicalization and
op verifiers.
The pass replaces the operations with their reified versions when more
static information can be derived, and inserts casts when result shapes
are updated.
Example:
```mlir
#map = affine_map<(d0) -> (-d0 + 256)>
func.func @func(%arg0: f32, %arg1: index, %arg2: tensor<64x?x64xf32>) -> tensor<1x?x64xf32> {
%0 = affine.apply #map(%arg1)
%extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [1, %arg1, 64] [1, 1, 1] : tensor<64x?x64xf32> to tensor<1x?x64xf32>
%padded = tensor.pad %extracted_slice low[0, 0, 0] high[0, %0, 0] {
^bb0(%arg3: index, %arg4: index, %arg5: index):
tensor.yield %arg0 : f32
} : tensor<1x?x64xf32> to tensor<1x?x64xf32>
return %padded : tensor<1x?x64xf32>
}
// mlir-opt --reify-result-shapes
#map = affine_map<()[s0] -> (-s0 + 256)>
func.func @func(%arg0: f32, %arg1: index, %arg2: tensor<64x?x64xf32>) -> tensor<1x?x64xf32> {
%0 = affine.apply #map()[%arg1]
%extracted_slice = tensor.extract_slice %arg2[0, 0, 0] [1, %arg1, 64] [1, 1, 1] : tensor<64x?x64xf32> to tensor<1x?x64xf32>
%padded = tensor.pad %extracted_slice low[0, 0, 0] high[0, %0, 0] {
^bb0(%arg3: index, %arg4: index, %arg5: index):
tensor.yield %arg0 : f32
} : tensor<1x?x64xf32> to tensor<1x256x64xf32>
%cast = tensor.cast %padded : tensor<1x256x64xf32> to tensor<1x?x64xf32>
return %cast : tensor<1x?x64xf32>
}
```
---------
Co-authored-by: Fabian Mora <fabian.mora-cordero@amd.com>
This implements the lowering of the async, wait, if, and if_present
clauses (as well as device_type, but that is a detail of async/wait).
All of these are implemented the same way they are for the compute
constructs, so this is a fairly small set of changes.
Previously, references to regions and successors were incorrectly disallowed outside the top-level assembly form. This change enables the use of bound regions and successors as variables in custom directives.
Firstly, this commit requires that all types are signless in the strict
mode of the validation pass. This is because the TOSA specification
requires signless types on operations. The "strict" mode of the
validation pass is the final check for TOSA conformance to the
specification, which is often used when converting to other formats.
In addition, a conversion pass `--tosa-convert-integer-type-to-signless`
is provided to allow a user to convert all integer types to signless.
The intention is that this pass can be run before the validation pass.
Following use of this pass, input/output signedness information should
be carried independently by the user.
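For example (a hypothetical function; the signedness of `si8` must then
be tracked by the user):
```mlir
// Input with an explicitly signed integer type:
func.func @f(%arg0: tensor<4xsi8>) -> tensor<4xsi8> {
  return %arg0 : tensor<4xsi8>
}
// After --tosa-convert-integer-type-to-signless:
func.func @f(%arg0: tensor<4xi8>) -> tensor<4xi8> {
  return %arg0 : tensor<4xi8>
}
```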
Fix `UnnamedAddrAttrName` being inserted twice into the `elidedAttrs`
list for the attribute dictionary printer in `GlobalOp` and `AliasOp`
print functions.
Context:
`vector.transfer_read` always requires a padding value. Most of its
builders take no `padding` value and assume the safe value of `0`.
However, this should be a conscious choice by the API user, as the
implicit default makes it easy to introduce bugs.
For example, while making this patch I found several occasions where
the padding value was not getting propagated (`vector.transfer_read` was
transformed into another `vector.transfer_read`). These bugs were
invariably caused by constructors that don't require specifying padding.
Additionally, allowing `ub.poison` as the padding value is better, as it
lets the user state that they "don't care" about the actual padding
value, while still forcing users to specify the padding semantics they
want.
With that in mind, this patch changes the builders of
`vector.transfer_read` to always take a `std::optional<Value> padding`
argument. This argument is never optional, but for convenience users can
pass `std::nullopt`, which pads the transfer read with `ub.poison`.
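In IR terms, passing `std::nullopt` materializes a poison padding value;
a sketch with assumed shapes:
```mlir
// Padding value created for the std::nullopt case.
%pad = ub.poison : f32
%v = vector.transfer_read %mem[%c0, %c0], %pad {in_bounds = [true, true]}
    : memref<4x4xf32>, vector<4x4xf32>
```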
---------
Signed-off-by: Fabian Mora <fabian.mora-cordero@amd.com>
Similar to 'enter data', except the data clauses are preceded by a
'getdeviceptr' operation so that they can use the 'exit' operation
correctly. While this is a touch awkward, it fits perfectly into the
existing infrastructure.
Same as with 'enter data', we had to add some add-functions for async
and wait.
The select op has 3 inputs, input1, input2, and input3, according to
the TOSA specification. However, the use of getInput1(), getInput2(),
and getInput3() in the codebase can be confusing and hinder readability.
This commit adds custom getters to help improve readability:
- input1 -> getPred()
- input2 -> getOnTrue()
- input3 -> getOnFalse()
They should be preferred as they are more descriptive; however, the
ODS-generated getters (getInputX()) may still be used.
Unfortunately, the custom getters don't propagate to adaptors such as
`FoldAdaptor`, where the ODS-generated getters must be used.
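For reference, the operand order on the op itself (hypothetical shapes):
```mlir
// input1 = predicate, input2 = on-true value, input3 = on-false value.
%r = tosa.select %pred, %on_true, %on_false
    : (tensor<4xi1>, tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
```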
`tensor.splat` is currently restricted to only accepting input values
that are of integer, index or float type.
This is much more restrictive than the tensor type itself as well as any
lowerings of it.
This PR therefore removes this restriction by using `AnyType` for the
input value. Whether the type is actually valid for a tensor remains
verified through the equality of the result tensor's element type and
the input type.
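For example, a splat of a complex value, previously rejected, is now
accepted (sketch; `%c` is assumed to be a `complex<f32>` value):
```mlir
// Valid because the input type matches the result tensor's element type.
%t = tensor.splat %c : tensor<8xcomplex<f32>>
```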
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Fixes: https://github.com/llvm/llvm-project/issues/130257
Fix affine-data-copy-generate in certain cases that involved users in
multiple blocks. Perform the memref replacement correctly during copy
generation.
Improve/clean up the memref affine use replacement API. Instead of
supporting dominance and post-dominance filters (which aren't adequate
in most cases) and expensively computing dominance info each time in
RAMUW, provide a user filter callback, i.e., force users to compute
dominance if needed.
* Clarified the `inner_dims_pos` attribute in the case of high
dimensionality tensors.
* Added a 5D example to showcase the use cases that triggered this
update.
* Added a reminder for linalg.unpack that the number of elements is not
required to be the same between input and output, since padding is
dropped (see the sketch below).
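A sketch of such a case (hypothetical sizes): the packed input holds
1 * 2 * 2 * 8 = 32 elements, while the unpacked output holds only 13,
because the padding introduced by the corresponding pack is dropped:
```mlir
%u = linalg.unpack %packed inner_dims_pos = [0, 1] inner_tiles = [2, 8]
    into %dest : tensor<1x2x2x8xf32> -> tensor<1x13xf32>
```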
I encountered some odd variations of `linalg.pack` and `linalg.unpack`
while working on some TFLite models, and the definitions in the
documentation did not match what I saw pass IR verification.
The following changes reconcile those differences.
---------
Signed-off-by: Christopher McGirr <mcgirr@roofline.ai>
'enter data' is a new construct type that requires one of the data
clauses, so we had to wait for all clauses to be ready before we could
commit this. Most of the clauses are simple, but there is a little bit
of work to get 'async' and 'wait' to have similar interfaces in the ACC
dialect, where helpers were added.
ArrayRef(std::nullopt) just got deprecated. This patch does the same
to MutableArrayRef(std::nullopt). Since there are only a couple of
uses, this patch does the migration and deprecation at the same time.
…ombined
This patch does the lowering of copyin (represented as an
acc.copyin/acc.delete pair), copyout (acc.create/acc.copyout), and
create (acc.create/acc.delete).
Additionally, it found a few problems with #144806, so it fixes those as
well.
This patch adds additional checks to the hoisting logic to prevent hoisting of
`vector.transfer_read` / `vector.transfer_write` pairs when the underlying
memref has users that introduce aliases via operations implementing
`ViewLikeOpInterface`.
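A reduced sketch of the kind of IR that is now conservatively left
alone, since `%view` aliases `%mem` via `memref.subview` (names and
shapes are illustrative):
```mlir
%view = memref.subview %mem[0] [4] [1]
    : memref<8xf32> to memref<4xf32, strided<[1]>>
scf.for %i = %lb to %ub step %c1 {
  // This read/write pair on %mem is no longer hoisted, because accesses
  // through %view may alias it.
  %v = vector.transfer_read %mem[%c0], %pad {in_bounds = [true]}
      : memref<8xf32>, vector<4xf32>
  %r = arith.addf %v, %v : vector<4xf32>
  vector.transfer_write %r, %mem[%c0] {in_bounds = [true]}
      : vector<4xf32>, memref<8xf32>
  "test.use"(%view) : (memref<4xf32, strided<[1]>>) -> ()
}
```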
Note: This may conservatively block some valid hoisting opportunities and could
affect performance. However, as demonstrated by the included tests, the current
logic is too permissive and can lead to incorrect transformations.
If this change prevents hoisting in cases that are provably safe, please share
a minimal repro - I'm happy to explore ways to relax the check.
Special treatment is given to `memref.assume_alignment`, mainly to accommodate
recent updates in:
* https://github.com/llvm/llvm-project/pull/139521
Note that such special casing does not scale and should generally be avoided.
The current hoisting logic lacks robust alias analysis. While better support
would require more work, the broader semantics of `memref.assume_alignment`
remain somewhat unclear. It's possible this op may eventually be replaced with
the "alignment" attribute added in:
* https://github.com/llvm/llvm-project/pull/144344
`ConversionPattern::getVoidPtrType` looks a little confusing since the
opaque pointer migration is already done. Also, we cannot specify an
address space with this method.
Maybe we can mark it as deprecated and add a new method, `getPtrType()`,
as this PR does. :)
Given the following example:
```mlir
module {
func.func @main(%arg0: tensor<1x1x1x4x1xf32>, %arg1: tensor<1x1x4xf32>) -> tensor<1x1x1x4x1xf32> {
%pack = linalg.pack %arg1 outer_dims_perm = [1, 2, 0] inner_dims_pos = [2, 0] inner_tiles = [4, 1] into %arg0 : tensor<1x1x4xf32> -> tensor<1x1x1x4x1xf32>
return %pack : tensor<1x1x1x4x1xf32>
}
}
```
We would generate an invalid transpose operation, because the
calculated permutation `[0, 2, 0]` is semantically incorrect: a
permutation must contain unique integers corresponding to the source
tensor dimensions.
The following change modifies how we calculate the permutation array
and ensures that the dimension indices given in the permutation array
are unique.
The above example then translates to a transpose with a permutation of
`[1, 2, 0]`, following the rule that `inner_dims_pos` is appended to the
permutation array and the preceding indices are filled with the
remaining dimensions.
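With the fix, the generated transpose for the example above would use
the valid permutation (sketch; `%init` is a hypothetical destination):
```mlir
%init = tensor.empty() : tensor<1x4x1xf32>
%transposed = linalg.transpose ins(%arg1 : tensor<1x1x4xf32>)
    outs(%init : tensor<1x4x1xf32>) permutation = [1, 2, 0]
```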
ArrayRef now has a new constructor that takes a parameter whose type
has data() and size(). This patch migrates:
ArrayRef<T>(X.data(), X.size())
to:
ArrayRef<T>(X)
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
something of an abuse and not intuitive to newcomers, I would like to
move away from this constructor and eventually remove it.
This patch replaces std::nullopt with {}.