Transform interfaces are implemented, direction or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.
Split out the interfaces into a separate library. This in turn requires
flipping the dependency from the interface on the dialect that has crept
in because both co-existed in one library. The interface shouldn't
depend on the transform dialect either.
As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
Allows the barrier elimination code to be run from C++ as well. The code
from transforms dialect is copied as-is, the pass and populate functions
have beed added at the end.
Co-authored-by: Eric Eaton <eric@nod-labs.com>
This revision untangles a few more conversion pieces and allows rewriting
the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass
in a declarative fashion that provides a much better understanding and control.
Differential Revision: https://reviews.llvm.org/D157617
This revision significantly simplifies the specification and implementation of mapping loops to GPU ids.
Each type of mapping (block, warpgroup, warp, thread) now comes with 2 mapping modes:
1. a 3-D "grid-like" mode, subject to alignment considerations on threadIdx.x, on which predication
may occur on a per-dimension 3-D sub-rectangle basis.
2. a n-D linearized mode, on which predication may only occur on a linear basis.
In the process, better size and alignment requirement inference are introduced along with improved runtime verification messages.
The `warp_dims` attribute was deemed confusing and is removed from the transform in favor of better size inference.
Differential Revision: https://reviews.llvm.org/D155941
Adds `apply_patterns.gpu.unroll_vectors_subgroup_mma` which allows
specifying a native MMA shape of `m`, `n`, and `k` to unroll to,
greedily unrolling the inner most dimension of contractions and other
vector operations based on expected usage.
Differential Revision: https://reviews.llvm.org/D156079
GPU code generation, and specifically the shared memory copy insertion
may introduce spurious barriers guarding read-after-read dependencies or
read-after-write on non-aliasing data, which degrades performance due to
unnecessary synchronization. Add a pattern and transform op that removes
such barriers by analyzing memory effects that the barrier actually
guards that are not also guarded by other barriers. The code is adapted
from the Polygeist incubator project.
Co-authored-by: William Moses <gh@wsmoses.com>
Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>
Reviewed By: nicolasvasilache, wsmoses
Differential Revision: https://reviews.llvm.org/D154720
The old code used to materialize constants as ops, immediately folded them into the resulting affine map and then deleted the constant ops again. Instead, directly fold the attributes into the affine map. Furthermore, all helpers accept `OpFoldResult` instead of `Value` now. This makes the code at call sites more efficient, because it is no longer necessary to materialize a `Value`, just to be able to use these helper functions.
Note: The API has changed (accepts OpFoldResult instead of Value), otherwise this change is NFC.
Differential Revision: https://reviews.llvm.org/D153324
All `apply` functions now have a `TransformRewriter &` parameter. This rewriter should be used to modify the IR. It has a `TrackingListener` attached and updates the internal handle-payload mappings based on rewrites.
Implementations no longer need to create their own `TrackingListener` and `IRRewriter`. Error checking is integrated into `applyTransform`. Tracking listener errors are reported only for ops with the `ReportTrackingListenerFailuresOpTrait` trait attached, allowing for a gradual migration. Furthermore, errors can be silenced with an op attribute.
Additional API will be added to `TransformRewriter` in subsequent revisions. This revision just adds an "empty" `TransformRewriter` class and updates all `apply` implementations.
Differential Revision: https://reviews.llvm.org/D152427
Update operations in Transform dialect extensions defined in the Affine,
GPU, MemRef and Tensor dialects to use the more generic
`TransformHandleTypeInterface` type constraint instead of hardcoding
`PDL_Operation`. See
https://discourse.llvm.org/t/rfc-type-system-for-the-transform-dialect/65702
for motivation.
Remove the dependency on PDLDialect from these extensions.
Update tests to use `!transform.any_op` instead of `!pdl.operation`.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D150781
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
Currently conversions to interfaces may happen implicitly (e.g.
`Attribute -> TypedAttr`), failing a runtime assert if the interface
isn't actually implemented. This change marks the `Interface(ValueT)`
constructor as explicit so that a cast is required.
Where it was straightforward to I adjusted code to not require casts,
otherwise I just made them explicit.
Depends on D148491, D148492
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D148493
c59465e120 introduced mapping to warps and
linear GPU ids.
In the implementation, the delinearization basis is reversed from [x, y, z]
to [z, y x] order to properly compute the strides and allow delinearization.
Prior to this commit, we forgot to reverse it back to [x, y, z] order
before materializing the indices.
Fix this oversight.
This revisions refactors the implementation of mapping to threads to additionally allow warps and linear ids to be specified.
`warp_dims` is currently specified along with `block_dims` as a transform attribute.
Linear ids on th other hand use the flattened block_dims to predicate on the first (linearized) k threads.
An additional GPULinearIdMappingAttr is added to the GPU dialect to allow specifying loops mapped to this new scheme.
Various implementation and transform op semantics cleanups are also applied.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D146130
This allows user to give both the thread ids and dimension of the threads we want to distribute on.
This means we can use it to distribute on warps as well.
Reviewed By: harsh
Differential Revision: https://reviews.llvm.org/D143950
Change map_nested_foreach_to_threads to ignore foreach_thread not
mapping to threads, this will allow us to call
mapNestedForeachToThreadsImpl with different set of ids to lower
multiple levels. Also adds warpIds attributes.
Differential Revision: https://reviews.llvm.org/D143298
Simplify the handling of silenceable failures in the transform dialect.
Previously, the logic of `TransformEachOpTrait` required that
`applyToEach` returned a list of null pointers when a silenceable
failure was emitted. This was not done consistently and also crept into
ops without this trait although they did not require it. Handle this
case earlier in the interpreter and homogeneously associated preivously
unset transform dialect values (both handles and parameters) with empty
lists of the matching kind. Ignore the results of `applyToEach` for the
targets for which it produced a silenceable failure. As a result, one
never needs to set results to lists containing nulls. Furthermore, the
objects associated with transform dialect values must never be null.
Depends On D140980
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D141305
The patch adds operations to `BlockAndValueMapping` and renames it to `IRMapping`. When operations are cloned, old operations are mapped to the cloned operations. This allows mapping from an operation to a cloned operation. Example:
```
Operation *opWithRegion = ...
Operation *opInsideRegion = &opWithRegion->front().front();
IRMapping map
Operation *newOpWithRegion = opWithRegion->clone(map);
Operation *newOpInsideRegion = map.lookupOrNull(opInsideRegion);
```
Migration instructions:
All includes to `mlir/IR/BlockAndValueMapping.h` should be replaced with `mlir/IR/IRMapping.h`. All uses of `BlockAndValueMapping` need to be renamed to `IRMapping`.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D139665
Adapt the implementation of TransformEachOpTrait to the existence of
parameter values recently introduced into the transform dialect. In
particular, allow `applyToOne` hooks to return a list containing a mix
of `Operation *` that will be associated with handles and `Attribute`
that will be associated with parameter values by the trait
implementation of the transform interface's `apply` method.
Disentangle the "transposition" of the list of per-payload op partial
results to decrease its overall complexity and detemplatize the code
that doesn't really need templates. This removes the poorly documented
special handling for single-result ops with TransformEachOpTrait that
could have assigned null pointer values to handles.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140979
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
Now we have more convenient functions to construct silenceable errors
while emitting diagnostics, and the constructor is ambiguous as it
doesn't tell whether the logical error is silencebale or definite.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D137257
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
In a nested loop nest, it is not feasible to map different loops to the same processing unit; for an example, check the code below. This modification includes a check in this circumstance.
```
scf.foreach_thread (%i, %j) in (%c32, %c32) {...}
{ mapping = [#gpu.thread<x>, #gpu.thread<x>] }
```
Note: It also deletes a test because it is not possible to reproduce this error.
Depends on D138020
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D138032
Finding uses of a value and replacing them with a new one is a common method. I have not seen an safe and easy shortcut that does that. This revision attempts to address that by intoroducing `replaceUsesOfWith` to `RewriterBase`.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D138110
The given test fails due to error below.
The following error is why the test is failing. One `memref.store` and two `memref.load` are consumers of the loop index for which I do RAUW. `memref.store` is first in the list. If I RAUW on this the loop of `llvm::make early inc range(threadIdx.getUsers())` does not return two `memref.load` as users. They remain unchanged. I'm not really certain why.
This change applies RAUW after collecting the users. If a better solution exists, I would be happy to implement it.
```
mlir-opt: ...llvm-project/mlir/include/mlir/IR/UseDefLists.h:175: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
```
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D138029
`DeviceMappingAttrInterface` is implemented as unifiying mechanism for thread mapping. A code generator could use any attribute that implements this interface to lower `scf.foreach_thread` to device specific code. It is allowed to choose its own mapping and interpretation.
Currently, GPU transform dialect supports only `GPUThreadMapping` and `GPUBlockMapping`; however, other mappings should to be supported as well. This change addresses this issue. It decouples gpu transform dialect from the `GPUThreadMapping` and `GPUBlockMapping`. Now, they can work any other mapping.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D138020