This PR adds a mechanism, so that downstream consumers can pass in
control functions for the application of these patterns. This change
shouldn't affect any consumers of this method that do not specify a
controlFn. The controlFn always gets the source operand of the consumer
in each of the patterns as a parameter.
In IREE, we (will) use it to control preventing folding patterns that
would inhibit fusion. See IREE issue
[#20896](https://github.com/iree-org/iree/issues/20896) for more
details.
Previously, references to regions and successors were incorrectly disallowed outside the top-level assembly form. This change enables the use of bound regions and successors as variables in custom directives.
`ConversionPattern::getVoidPtrType` looks a little confusion since the
opaque pointer migration is already done. Also we cannot specify address
space in this method.
Maybe we can mark them as deprecated and add new method `getPtrType()`,
as this PR did : )
This thread through proper error handling / reporting capabilities to
avoid hitting llvm_unreachable while parsing linalg ops.
Fixes#132755Fixes#132740Fixes#129185
For consumer fusion cases of this form
```
%0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg0 = ...) {
tensor.parallel_insert_slice ... into %arg0
tensor.parallel_insert_slice ... into %arg1
}
%1 = linalg.generic ... ins(%0#0, %0#1)
```
the current consumer fusion that handles one slice at a time cannot fuse
the consumer into the loop, since fusing along one slice will create and
SSA violation on the other use from the `scf.forall`. The solution is to
allow consumer fusion to allow considering multiple slices at once. This
PR changes the `TilingInterface` methods related to consumer fusion,
i.e.
- `getTiledImplementationFromOperandTile`
- `getIterationDomainFromOperandTile`
to allow fusion while considering multiple operands. It is upto the
`TilingInterface` implementation to return an error if a list of tiles
of the operands cannot result in a consistent implementation of the
tiled operation.
The Linalg operation implementation of `TilingInterface` has been
modified to account for these changes and allow cases where operand
tiles that can result in a consistent tiling implementation are handled.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch migrates away from std::nullopt in favor of ArrayRef<T>()
where we use perfect forwarding. Note that {} would be ambiguous for
perfect forwarding to work.
https://github.com/llvm/llvm-project/pull/144657 added #include
"mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.h" to
mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp, though it is not
in use
We only support fixed set of minimum filtering algorithm for Winograd
Conv2D decomposition. Instead of letting users specify any integer,
define a fixed set of enumeration values for the parameters of minimum
filtering algorithm.
This patch fixes a bug discovered in the
`affine::makeComposedFoldedAffineApply` function when `composeAffineMin
== true`. The bug happened because the simplification assumed the
symbols appearing in the `affine.apply` op corresponded to symbols in
the `affine.min` op, and that's not always the case. For example:
```mlir
#map = affine_map<()[s0, s1] -> (s1)>
#map1 = affine_map<()[s0, s1] -> (s0 ceildiv s1)>
module {
func.func @min_max_full_simplify() -> index {
%0 = test.value_with_bounds {max = 64 : index, min = 32 : index}
%1 = test.value_with_bounds {max = 64 : index, min = 32 : index}
%2 = affine.min #map()[%0, %1]
%3 = affine.apply #map1()[%2, %0]
return %3 : index
}
}
```
This patch also introduces the test `make_composed_folded_affine_apply`
transform operation to test this simplification. It also adds tests
ensuring we get correct behavior.
---------
Co-authored-by: Nicolas Vasilache <nico.vasilache@amd.com>
This commit adds support for non-attribute properties (such as
StringProp and I64Prop) in declarative rewrite patterns. The handling
for properties follows the handling for attributes in most cases,
including in the generation of static matchers.
Constraints that are shared between multiple types are supported by
making the constraint matcher a templated function, which is the
equivalent to passing ::mlir::Attribute for an arbitrary C++ type.
This patch is adding the ability to print a uint8_t/int8_t as an int
instead of a char and demonstrate support for int8_t/uin8_t properties.
Fix#144993
This commit adds 1:N support to
`ConversionPatternRewriter::replaceUsesOfBlockArgument`. This was one of
the few remaining dialect conversion APIs that does not support 1:N
conversions yet.
This commit also reuses `replaceUsesOfBlockArgument` in the
implementation of `applySignatureConversion`. This is in preparation of
the One-Shot Dialect Conversion refactoring. The goal is to bring the
`applySignatureConversion` implementation into a state where it works
both with and without rollbacks. To that end, `applySignatureConversion`
should not directly access the `mapping`.
This commit makes the following changes:
- Expose `map` and `mapOperands` in
`ValueBoundsConstraintSet::Variable`, so that the class can be used by
subclasses of `ValueBoundsConstraintSet`. Otherwise subclasses cannot
access those members.
- Add `ValueBoundsConstraintSet::strongCompare`. This method is similar
to `ValueBoundsConstraintSet::compare` except that it returns false when
the inverse comparison holds, and `llvm::failure()` if neither the
relation nor its inverse relation could be proven.
- Add `simplifyAffineMinOp`, `simplifyAffineMaxOp`, and
`simplifyAffineMinMaxOps` to simplify those operations using
`ValueBoundsConstraintSet`.
- Adds the `SimplifyMinMaxAffineOpsOp` transform op that uses
`simplifyAffineMinMaxOps`.
- Add the `test.value_with_bounds` op to test unknown values with a min
max range using `ValueBoundsOpInterface`.
- Adds tests verifying the transform.
Example:
```mlir
func.func @overlapping_constraints() -> (index, index) {
%0 = test.value_with_bounds {min = 0 : index, max = 192 : index}
%1 = test.value_with_bounds {min = 128 : index, max = 384 : index}
%2 = test.value_with_bounds {min = 256 : index, max = 512 : index}
%r0 = affine.min affine_map<()[s0, s1, s2] -> (s0, s1, s2)>()[%0, %1, %2]
%r1 = affine.max affine_map<()[s0, s1, s2] -> (s0, s1, s2)>()[%0, %1, %2]
return %r0, %r1 : index, index
}
// Result of applying `simplifyAffineMinMaxOps` to `func.func`
#map1 = affine_map<()[s0, s1] -> (s1, s0)>
func.func @overlapping_constraints() -> (index, index) {
%0 = test.value_with_bounds {max = 192 : index, min = 0 : index}
%1 = test.value_with_bounds {max = 384 : index, min = 128 : index}
%2 = test.value_with_bounds {max = 512 : index, min = 256 : index}
%3 = affine.min #map1()[%0, %1]
%4 = affine.max #map1()[%1, %2]
return %3, %4 : index, index
}
```
---------
Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
One of the uses of std::nullopt is in a the constructors for
TypeRange. This patch takes care of the migration where we need
TypeRange() to facilitate perfect forwarding. Note that {} would be
ambiguous for perfecting forwarding to work.
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch takes care of the mlir side of the migration, starting with
straightforward places like "return std::nullopt;" and ternally
expressions involving std::nullopt.
Cf. https://discourse.llvm.org/t/mlir-dead-code-analysis/67568/10
Custom analysis passes will not work properly unless both
DeadCodeAnalysis and SparseConstantPropagation are loaded to the
DataFlowSolver. This is intended behavior, but surprising to many users
as shown in the thread. In lieu of a longer-term fix (which I am not
knowledgeable enough to implement myself, yet), this commit adds a
helper function that loads these two analyses, as well as providing
breadcrumbs for an explanation of the problem. The existing places in
the codebase where these two analyses are loaded for the purpose of
running other unrelated analyses are replaced by the use of the helper.
---------
Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
Co-authored-by: Oleksandr "Alex" Zinenko <azinenko@amd.com>
### Description
This patch improves the folding efficiency of `vector.insert` and
`vector.extract` operations by not returning early after successfully
converting dynamic indices to static indices.
This PR also renames the test pass `TestConstantFold` to
`TestSingleFold` and adds comprehensive documentation explaining the
single-pass folding behavior.
### Motivation
Since the `OpBuilder::createOrFold` function only calls `fold` **once**,
the current `fold` methods of `vector.insert` and `vector.extract` may
leave the op in a state that can be folded further. For example,
consider the following un-folded IR:
```
%v1 = vector.insert %e1, %v0 [0] : f32 into vector<128xf32>
%c0 = arith.constant 0 : index
%e2 = vector.extract %v1[%c0] : f32 from vector<128xf32>
```
If we use `createOrFold` to create the `vector.extract` op, then the
result will be:
```
%v1 = vector.insert %e1, %v0 [127] : f32 into vector<128xf32>
%e2 = vector.extract %v1[0] : f32 from vector<128xf32>
```
But this is not the optimal result. `createOrFold` should have returned
`%e1`.
The reason is that the execution of fold returns immediately after
`extractInsertFoldConstantOp`, causing subsequent folding logics to be
skipped.
---------
Co-authored-by: Yang Bai <yangb@nvidia.com>
Following the addition of TensorLike and BufferLike type interfaces (see
00eaff3e9c), introduce minimal changes
required to bufferize a custom tensor operation into a custom buffer
operation.
To achieve this, new interface methods are added to TensorLike type
interface that abstract away the differences between existing (tensor ->
memref) and custom conversions.
The scope of the changes is intentionally limited (for example,
BufferizableOpInterface is untouched) in order to first understand the
basics and reach consensus design-wise.
---
Notable changes:
* mlir::bufferization::getBufferType() returns BufferLikeType (instead
of BaseMemRefType)
* ToTensorOp / ToBufferOp operate on TensorLikeType / BufferLikeType.
Operation argument "memref" renamed to "buffer"
* ToTensorOp's tensor type inferring builder is dropped (users now need
to provide the tensor type explicitly)
Add support for load/store with chunk_size, which requires special
consideration for the operand blocking since offests and masks are
n-D and tensor are n+1-D. Support operations including create_tdesc,
update_tdesc, load, store, and prefetch.
---------
Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
Add unrolling support for create_tdesc, load, store, prefetch, and update_offset.
---------
Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
Co-authored-by: Chao Chen <chao.chen@intel.com>
This patch removes the `test-linalg-to-vector-patterns` option from the
`-test-linalg-transform-patterns=` test flag. It was only used in one
test, where a more specialized transform dialect op can be used instead:
* `transform.apply_patterns.linalg.pad_vectorization`
While we could preserve `test-linalg-to-vector-patterns`, it's better to
rely on finer-grained transformations — this way, we know exactly what
is being run and tested. Now that its only use has been removed, it
feels natural to delete `test-linalg-to-vector-patterns`.
This fixes an issue with `TransferWriteOp`'s implementation of the
`MemoryEffectOpInterface` where the write effect was attached to the
stored value rather than the base.
This had the effect that when asking for the memory effects for the
input memref buffer using `getEffectsOnValue(...)`, the function would
return no-effects (as the effect would have been attached to the stored
value rather than the input buffer).
In #120115 the replacements for the tiled operations were wrapped within
the `MergeResult` object. That is a bit of an obfuscation and not
immediately obvious where to get the replacements post tiling. This
changes the `SCFTilingResult` to have `replacements` explicit (as it was
before that change).
`mergeOps` is added as a separate field of `SCFTilingResult`, which is
empty when the reduction type is `FullReduction`.
This is a API breaking change. All uses of `mergeResult.replacements`
should be replaced with `replacements`.
There was also an implicit assumption that
`PartialReductionTilingInterface` is derived from `TilingInterface`, so
all ops that implemented the `PartialReductionTilingInterface` were
expected to implement the `TilingInterface` as well. This pre-dated the
existence of derived inheritances. Make
`PartialReductionTilingInterface` derive from `TilingInterface`.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This PR adds `UnrollBroadcastPattern` to `VectorUnroll` transform.
To support this, it also extends `BroadcastOp` definition with
`VectorUnrollOpInterface`
Below is the original commit description. Furthermore, it applies a
[fix](33a26b9ca2)
for CMakeList.txt
The issue occurs during a downstream pass which does dialect conversion,
where both
[`FuncOpConversion`](cde67b6663/mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp (L480))
and
[`SubviewFolder`](cde67b6663/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp (L187))
are run together. The original starting IR is:
```mlir
module {
func.func @foo(%arg0: memref<100x100xf32>, %arg1: index, %arg2: index, %arg3: index, %arg4: index) -> memref<?x?xf32, strided<[100, 1], offset: ?>> {
%subview = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1] : memref<100x100xf32> to memref<?x?xf32, strided<[100, 1], offset: ?>>
return %subview : memref<?x?xf32, strided<[100, 1], offset: ?>>
}
}
```
After `FuncOpConversion` runs, the IR looks like:
```mlir
"builtin.module"() ({
"llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> (ptr, ptr, i64, i64, i64, i64, i64, i64, i64, i64, i64)>, linkage = #llvm.linkage<external>, sym_name = "foo", visibility_ = 0 : i64}> ({
^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64, %arg7: i64, %arg8: i64, %arg9: i64, %arg10: i64):
%0 = "memref.subview"(<<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>) <{operandSegmentSizes = array<i32: 1, 2, 2, 0>, static_offsets = array<i64: -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: -9223372036854775808, -9223372036854775808>, static_strides = array<i64: 1, 1>}> : (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>
"func.return"(%0) : (memref<?x?xf32, strided<[100, 1], offset: ?>>) -> ()
}) : () -> ()
"func.func"() <{function_type = (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>, sym_name = "foo"}> ({
}) : () -> ()
}) {llvm.data_layout = "", llvm.target_triple = ""} : () -> ()
```
The `<<UNKNOWN SSA VALUE>>`'s here are block arguments of a separate
unlinked block, which is disconnected from the rest of the IR (so not
only is the IR verifier-invalid, it can't even be parsed). This IR is
created by signature conversion in the dialect conversion infra.
Now `SubviewFolder` is applied, and the utility function here is called
on one of these disconnected block arguments, causing a crash.
The TestMemRefToLLVMWithTransforms pass is introduced to exercise the
bug, and it can be reused by other contributors in the future.
Co-authored-by: Rahul Kayaith <rkayaith@gmail.com>
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
The issue occurs during a downstream pass which does dialect conversion,
where both
[`FuncOpConversion`](cde67b6663/mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp (L480))
and
[`SubviewFolder`](cde67b6663/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp (L187))
are run together. The original starting IR is:
```mlir
module {
func.func @foo(%arg0: memref<100x100xf32>, %arg1: index, %arg2: index, %arg3: index, %arg4: index) -> memref<?x?xf32, strided<[100, 1], offset: ?>> {
%subview = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1] : memref<100x100xf32> to memref<?x?xf32, strided<[100, 1], offset: ?>>
return %subview : memref<?x?xf32, strided<[100, 1], offset: ?>>
}
}
```
After `FuncOpConversion` runs, the IR looks like:
```mlir
"builtin.module"() ({
"llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> (ptr, ptr, i64, i64, i64, i64, i64, i64, i64, i64, i64)>, linkage = #llvm.linkage<external>, sym_name = "foo", visibility_ = 0 : i64}> ({
^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64, %arg7: i64, %arg8: i64, %arg9: i64, %arg10: i64):
%0 = "memref.subview"(<<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>) <{operandSegmentSizes = array<i32: 1, 2, 2, 0>, static_offsets = array<i64: -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: -9223372036854775808, -9223372036854775808>, static_strides = array<i64: 1, 1>}> : (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>
"func.return"(%0) : (memref<?x?xf32, strided<[100, 1], offset: ?>>) -> ()
}) : () -> ()
"func.func"() <{function_type = (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>, sym_name = "foo"}> ({
}) : () -> ()
}) {llvm.data_layout = "", llvm.target_triple = ""} : () -> ()
```
The `<<UNKNOWN SSA VALUE>>`'s here are block arguments of a separate
unlinked block, which is disconnected from the rest of the IR (so not
only is the IR verifier-invalid, it can't even be parsed). This IR is
created by signature conversion in the dialect conversion infra.
Now `SubviewFolder` is applied, and the utility function here is called
on one of these disconnected block arguments, causing a crash.
The TestMemRefToLLVMWithTransforms pass is introduced to exercise the
bug, and it can be reused by other contributors in the future.
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Co-authored-by: Rahul Kayaith <rkayaith@gmail.com>
Adds a few canonicalizers, folders, and rewrite patterns to tensor ops:
* tensor.insert folder: insert into a constant is replaced with a new
constant
* tensor.extract folder: extract from a parent tensor that was inserted
at the same indices is folded into the inserted value
* rewrite pattern added that replaces an extract of a collapse shape
with an extract of the source tensor (requires static source dimensions)
Signed-off-by: Asra Ali <asraa@google.com>
This patch wraps `populateLowerContractionToSMMLAPatternPatterns` into a
new TD Op `apply_patterns.arm_neon.vector_contract_to_i8mm` .
It also removes the "test-lower-to-arm-neon" pass.
We already have hasOneUse. Like llvm::Value we provide helper methods to
query the number of uses of a Value. Add unittests for Value, because
that was missing.
---------
Co-authored-by: Michael Maitland <michaelmaitland@meta.com>
The PR continues the work started in #141019 by adding the `BufferizationState` class also to the `getBufferType` and `resolveConflicts` interface methods, together with the additional support functions that are used throughout the bufferization infrastructure.
This PR introduces a new tool, mlir-irdl-to-cpp, that converts IRDL to
C++ definitions.
The C++ definitions allow use of the IRDL-defined dialect in MLIR C++
infrastructure, enabling the use of conversion patterns with IRDL
dialects for example. This PR also adds CMake utilities to easily
integrate the IRDL dialects into MLIR projects.
Note that most IRDL features are not supported. In general, we are only
able to define simple types and operations.
- The only type constraint supported is irdl.any.
- Variadic operands and results are not supported.
- Verifiers for the IRDL constraints are not generated.
- Attributes are not supported.
---------
Co-authored-by: Théo Degioanni <theo.degioanni.llvm.deluge062@simplelogin.fr>
Co-authored-by: Fehr Mathieu <mathieu.fehr@gmail.com>
Background issue: #139813
In
[emitEitherOperandMatch()](e62fc14a5d/mlir/tools/mlir-tblgen/RewriterGen.cpp (L774))
we check if `op.getArg(argIndex)` is a `NamedTypeConstraint`:
```cpp
} else if (isa<NamedTypeConstraint *>(op.getArg(argIndex))) {
emitOperandMatch(tree, opName, /*operandName=*/formatv("v{0}", i).str(),
operandIndex,
/*operandMatcher=*/eitherArgTree.getArgAsLeaf(i),
/*argName=*/eitherArgTree.getArgName(i), argIndex,
/*variadicSubIndex=*/std::nullopt);
++operandIndex;
}
```
but in `emitOperandMatch()` we cast on `op.getArg(operandIndex)`, which
is incorrect if the operation has attributes or other non-operand
arguments before its operands.
This patch fixes warnings of the form:
mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp:320:19: error:
unused variable 'result' [-Werror,-Wunused-variable]
The current implementation of getBackwardSlice will crash if an
operation in the dependency chain is defined by an operation with
multiple regions or blocks. Crashing is bad (and forbids many analyses
from using getBackwardSlice, as well as causing existing users of
getBackwardSlice to fail for IR with this property).
This PR instead causes the analysis to return a failure, rather than
crash in the cases it cannot compute the full slice
---------
Co-authored-by: Oleksandr "Alex" Zinenko <git@ozinenko.com>
Skipping only over the first results leads to the curCodeIt pointing to
the wrong location in the bytecode, causing the execution to continue
with a wrong instruction after the Constraint/Rewrite.
Signed-off-by: Rickert, Jonas <Jonas.Rickert@amd.com>