In several cases, the splitting may be known to be a noop, i.e., produce
no second part. Thread this information through the transform utilities
to the transform dialect, and differentiate it from the error state.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D141138
Adapt the implementation of TransformEachOpTrait to the existence of
parameter values recently introduced into the transform dialect. In
particular, allow `applyToOne` hooks to return a list containing a mix
of `Operation *` that will be associated with handles and `Attribute`
that will be associated with parameter values by the trait
implementation of the transform interface's `apply` method.
Disentangle the "transposition" of the list of per-payload op partial
results to decrease its overall complexity and detemplatize the code
that doesn't really need templates. This removes the poorly documented
special handling for single-result ops with TransformEachOpTrait that
could have assigned null pointer values to handles.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140979
It was originally placed in TransformInterfaces for convenience, but it
is really a generic utility. It may also create an include cycle between
TransformTypes and TransformInterfaces if the latter needs to include
the former because the former uses the failure util.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140978
A transformation tiling a reduction dimension of a Linalg op needs a
tile size for said dimension. When an insufficient number of dimensions
was provided, it would segfault due to out-of-bounds access to a vector.
Also fix incorrect error reporting in the structured transform op
exercising this functionality.
Reviewed By: springerm, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D141046
This customer parser/printer is similar to DynamicIndexList, but has special syntax for the case where one handle represents the entire list.
Example:
```
// Regular index list
[10, 20, %val]
// Packed handle (no square parentheses)
%val
```
Differential Revision: https://reviews.llvm.org/D138825
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
This patch introduces the initial bits to support vector masking
using the `vector.mask` operation. Vectorization changes should be
NFC for non-masked cases. We can't test masked cases directly until
we extend the Transform dialect to support masking.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137690
Now we have more convenient functions to construct silenceable errors
while emitting diagnostics, and the constructor is ambiguous as it
doesn't tell whether the logical error is silencebale or definite.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137257
This patch implements the vectorization of tensor.extract for arbitrary
tensors. It basically extends https://reviews.llvm.org/D133786 by adding
support for n-D tensors (n >= 2). This is implemented by essentially
flattening the indices.
When benchmarking the vectorized code, we have observed that it is
slower than the scalar code. That's most likely due to sub-optimal (and,
in general slow) gather loads. More work is needed to identify an
implementation and/or a representation that would lead to better code.
In the meantime, the vectorization of n-D tensors (where n >= 2) has to
be explicitly enabled. This can be done either via:
* transfer dialect's `vectorize_nd_extract` attribute,
* dedicated bool argument in the `vectorize` method from
"Vectorization.cpp".
The second option was added to control the new functionality through
means other than the transfer dialect.
Related discussion: https://github.com/iree-org/iree/issues/9198
Differential Revision: https://reviews.llvm.org/D137660
In the process, numerous insertion point issues were found and fixed.
RAII on insertion points is now used more dilligently.
Differential Revision: https://reviews.llvm.org/D139714
This adds a tile_size parameter, when it is used the tiles are
cyclically distributed onto the threads of the scf.foreach_thread op.
Differential Revision: https://reviews.llvm.org/D139474
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This op is useful for debugging/experiments and allows users to replace ops (without arguments + IsolatedFromAbove) with the given op in the region of transform op.
Differential Revision: https://reviews.llvm.org/D139026
This patch fixes:
mlir/include/mlir/ExecutionEngine/SparseTensor/Storage.h:955:20:
error: unused variable 'sz' [-Werror,-Wunused-variable]
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp:1460:2:
error: extra ';' outside of a function is incompatible with C++98
[-Werror,-Wc++98-compat-extra-semi]
This adds a transformation to tile reduction operations to partial
reduction using scf.foreachthread. This uses
PartialReductionOpInterface to create a merge operation of the partial
tiles.
Differential Revision: https://reviews.llvm.org/D137912
This commit adds automatic unpacking of Value's of type pdl::OperationType to the underlying single-result OpResult.
This allows mixing single-value, attribute and multi-value pdl::Operation tile sizes and num threads to TileToForeachThreadOp.
Differential Revision: https://reviews.llvm.org/D137896
D137413 clarified `scf_foreach_thread` thread mapping nicely. `tile_to_foreach_thread_op` is one of the op that generates `scf_foreach_thread`, however, its builders are still having integer array.
This is bug fix of potential problem.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137891
`scf.foreach_thread` defines mapping its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
scf.foreach_thread (%i2, %j2) in (%c1, %c2)
{...} { thread_dim_mapping = [0, 1]}
} { thread_dim_mapping = [0, 1]}
```
It first introduces a `DeviceMappingInterface` which is an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This way gives us a clear validation.
The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x,y,z>` ). After this change, the above code prints as below, as seen here, this way clarifies the loop mappings. The change also implements consuming of these two new attribute by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads.
```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
scf.foreach_thread (%i2, %j2) in (%c1, %c2)
{...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}
```
Reviewed By: ftynse, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137413
Add a transformation to tile reduction ops into a parallel operation
followed by a merge operation. This is equivalent to the existing
reduction spliting transformation but using loops instead of using
higher dimensions linalg.
Differential Revision: https://reviews.llvm.org/D136586
`getDestinationOperands` was almost a duplicate of `DestinationStyleOpInterface::getOutputOperands`. Now that the interface has been moved to mlir/Interfaces, it is no longer needed.
Differential Revision: https://reviews.llvm.org/D136240
The transition to transform dialect based tests dropped several cases of
the split reduction testing. Adding them back.
Differential Revision: https://reviews.llvm.org/D136287
This class adds helper functions similar to `emitError` for the
DiagnosedSilenceableFailure class in both the silenceable and definite
failure cases. These helpers simplify the use of said class and make
tranfsorm op application code idiomatic.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D136072
This revision adds the ability to fuse tileable ops with multiple results to
the transform.fuse_into_containing_op.
Differential Revision: https://reviews.llvm.org/D135955
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785
Uses of `LinalgTilingPattern::returningMatchAndRewrite` are replaced by a top-level `tileWithLinalgTilingOptions` function that is marked obsolete and serves
as a temporary means to transition away from `LinalgTilingOptions`-based tiling.
LinalgTilingOptions supports too many options that have been orthogonalized with the use of the transform dialect.
Additionally, the revision introduces a `transform.structured.tile_to_scf_for` structured transform operation that is needed to properly tile `tensor.pad`
via the TilingInterface. Uses of `transform.structured.tile` will be deprecated and replaced by this new op.
This will achieve the deprecation of `linalg::tileLinalgOp`.
Context: https://discourse.llvm.org/t/psa-retire-tileandfuselinalgops-method/63850
In the process of transitioning, tests that were performing tile and distribute on tensors are retired: transformations should be orthogonalized better in the future.
In particular, tiling to specific loop types and tileAndDistribute behavior are not available via the transform ops.
The behavior is still available as part of the `tileWithLinalgTilingOptions` method to allow downstream clients to transition without breakages but is meant to be retired soon.
As more tests are ported to the transform dialect, it became necessary to introduce a test-transform-dialect-erase-schedule-pass to discard the transform specification
once applied so that e2e lowering and execution is possible.
Lastly, a number of redundant tests that were testing composition of patterns are retired as they are available with a better mechanism via the transform dialect.
Differential Revision: https://reviews.llvm.org/D135573
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785
In the process, also retire `tileConsumerAndFuseProducers` that is now replaced by `tileConsumerAndFuseProducerGreedilyUsingSCFForOp`.
Context: https://discourse.llvm.org/t/psa-retire-tileandfuselinalgops-method/63850
When performing this replacement, a change of behavior appeared: the older `tileConsumerAndFuseProducers` would split the parallel
and non-parallel dimensions automatically and perform a first level of tile-and-fuse on parallel dimensions only and then introduce a
second level of tiling-only on the reduction dimensions. The newer `tileConsumerAndFuseProducerGreedilyUsingSCFForOp` on the other hand
does not perform this breakdown. As a consequence, the transform specification is evolved to produce the same output.
Additionally, replace some uses of `unsigned` by `int64_t` where possible without pulling in larger interface changes (left for a future PR).
Context: https://www.youtube.com/watch?v=Puio5dly9N8
Lastly, tests that were performing tile and fuse and distribute on tensors are retired: the generated IR mixing scf.for, tensors and
distributed processor ids was racy at best ..
Differential Revision: https://reviews.llvm.org/D135559
The transform.split_handles op is useful for ensuring a statically known number of operations are
tracked by the source `handle` and to extract them into individual handles
that can be further manipulated in isolation.
In the process of making the op robust wrt to silenceable errors and the suppress mode, issues were
uncovered and fixed.
The main issue was that silenceable errors were short-circuited too early and the payloads were not
set. This resulted in suppressed silenceable errors not propagating correctly.
Fixing the issue triggered a few test failures: silenceable error returns now must properly set the results state.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D135426
This revision adds GPU transform dialect. It also introduce a prefix such as "transform.gpu" for all ops related to this dialect.
MLIR already had two GPU transform op in linalg. This revision moves these ops into GPUTransformOps. The Ops are as follows:
`transform.structured.map_nested_foreach_thread_to_gpu_blocks` -> `transform.gpu.map_foreach_to_blocks`
This op selects the outermost (toplevel) foreach_thread and parallelize across GPU blocks. It can also generate `gpu_launch`.
`transform.structured.map_nested_foreach_thread_to_gpu_threads` -> `transform.gpu.map_nested_foreach_to_threads`
This op parallelizes nested foreach_thread that are inside `gpu_launch` across GPU threads.
It doesn't add new functionality, but there are some minor refactoring of the code.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D134800
Previously, splitReduction transformation added the split parallel dimension
*before* the reduction dimension, leading to tiling for reduction. This
commit creates an option to create the parallel dimension *after* the
reduction dimension, allowing us to transform the op into vertical reduction
with SIMD parallelism.
Reviewed By: ThomasRaoux, dcaballe
Differential Revision: https://reviews.llvm.org/D134764
This revision refactors gpu block id generator lambda that is used in the transform dialect. It removes the lambda and instead uses a static function that's name generateGpuBlockIds.
It also simplifies arguments that the function takes.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134724