This PR adds support for `memref.collapse_shape` to sub-byte type emulation. The `memref.collapse_shape` becomes a no-op given that we are already flattening the memref as part of the emulation (i.e., we are collapsing all the dimensions).
This PR adds a new pattern to the set of patterns used to resolve the offset, sizes, and
strides of a memref. Similar to `ExtractStridedMetadataOpSubviewFolder`, the new
pattern resolves strided_metadata(collapse_shape) directly, without introducing a
reshape_cast op.
This commit implements runtime verification for LinalgStructuredOps
using the existing `RuntimeVerifiableOpInterface`. The verification
checks that the runtime sizes of the operands match the runtime sizes
inferred by composing the loop ranges with the op's indexing maps.
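As an illustrative example (not taken verbatim from the patch): for a `linalg.matmul` with indexing maps `(d0, d2)`, `(d2, d1)`, and `(d0, d1)`, the loop ranges are derived from the operand sizes, so the generated verification asserts at runtime that, e.g., `dim(%A, 1) == dim(%B, 0)` and `dim(%A, 0) == dim(%C, 0)`.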
This commit generalizes and cleans up the `ValueBoundsConstraintSet`
API. The API used to provide function overloads for comparing/computing
bounds of:
- index-typed SSA value
- dimension of shaped value
- affine map + operands
This commit removes all overloads. There is now a single entry point for
each `compare` variant and each `computeBound` variant. These functions
now take a `Variable`, which is internally represented as an affine map
and map operands.
This commit also adds support for computing bounds for an affine map +
operands. There was previously no public API for that.
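A minimal sketch of how the unified entry point might be called for an affine map + operands; the `Variable` constructor and `computeConstantBound` signature below are paraphrased from this description and may not match the actual headers exactly:
```
// Hedged sketch: compute a constant upper bound for d0 + d1 over two
// index-typed SSA values v0 and v1.
AffineExpr d0, d1;
bindDims(builder.getContext(), d0, d1);
AffineMap map = AffineMap::get(/*dimCount=*/2, /*symbolCount=*/0, {d0 + d1},
                               builder.getContext());
SmallVector<Value> mapOperands = {v0, v1};
ValueBoundsConstraintSet::Variable var(map, mapOperands);
FailureOr<int64_t> ub = ValueBoundsConstraintSet::computeConstantBound(
    presburger::BoundType::UB, var);
```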
This commit adds the `BufferViewFlowOpInterface` to the bufferization
dialect. This interface can be implemented by ops that operate on
buffers to indicate that a buffer op result and/or region entry block
argument may be the same buffer as a buffer operand (or a view thereof).
This interface is queried by the `BufferViewFlowAnalysis`.
The new interface has two interface methods:
* `populateDependencies`: Implementations use the provided callback to
declare dependencies between operands and op results/region entry block
arguments. E.g., for `%r = arith.select %c, %m1, %m2 : memref<5xf32>`,
the interface implementation should declare two dependencies: %m1 -> %r
and %m2 -> %r.
* `mayBeTerminalBuffer`: An SSA value is a terminal buffer if the buffer
view flow analysis stops at the specified value. E.g., because the value
is a newly allocated buffer or because no further information is
available about the origin of the buffer.
Ops that implement the `RegionBranchOpInterface` or `BranchOpInterface`
do not have to implement the `BufferViewFlowOpInterface`. The buffer
dependencies can be inferred from those two interfaces.
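As a rough sketch of the first interface method (hedged: method and callback names follow the description above and may not match the actual interface exactly), an implementation for `arith.select` could look like this:
```
// Hypothetical external model declaring %m1 -> %r and %m2 -> %r for
// %r = arith.select %c, %m1, %m2 : memref<5xf32>.
struct SelectOpViewFlowModel
    : public BufferViewFlowOpInterface::ExternalModel<SelectOpViewFlowModel,
                                                      arith::SelectOp> {
  void populateDependencies(Operation *op,
                            RegisterDependenciesFn registerDependenciesFn) const {
    auto selectOp = cast<arith::SelectOp>(op);
    // Either operand may become the result buffer.
    registerDependenciesFn(selectOp.getTrueValue(), selectOp.getResult());
    registerDependenciesFn(selectOp.getFalseValue(), selectOp.getResult());
  }
};
```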
This commit makes the `BufferViewFlowAnalysis` more accurate. For
unknown ops, it used to conservatively declare all combinations of
operands and op results/region entry block arguments as dependencies
(false positives). This is no longer the case. While the analysis is
still a "maybe" analysis with false positives (e.g., when analyzing ops
such as `arith.select` or `scf.if` where the taken branch is not known
at compile time), results and region entry block arguments of unknown
ops are now marked as terminal buffers.
This commit addresses a TODO in `BufferViewFlowAnalysis.cpp`:
```
// TODO: We should have an op interface instead of a hard-coded list of
// interfaces/ops.
```
It is no longer needed to hard-code ops.
The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases.
Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit.
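Conceptually, the new check for a non-rank-reducing op scales the source strides by the subview strides and compares the result against the strides of the declared result type. A simplified sketch (variable names are illustrative, not the actual verifier code):
```
// Hedged sketch: compute the expected result strides of a subview.
SmallVector<int64_t> expectedStrides;
for (auto [srcStride, subStride] : llvm::zip(srcStrides, subViewStrides)) {
  if (ShapedType::isDynamic(srcStride) || ShapedType::isDynamic(subStride))
    expectedStrides.push_back(ShapedType::kDynamic);
  else
    expectedStrides.push_back(srcStride * subStride);
}
// The verifier then compares expectedStrides against the strides encoded in
// the result type.
```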
Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
* Fix comment and add additional assert to `ExpandStridedMetadata.cpp`, to make sure that the memref.subview verifier is in sync with the memref.subview -> memref.reinterpret_cast lowering.
Note: This change is identical to #79865, but with a fixed comment and an additional assert in `ExpandStridedMetadata.cpp`. (I reverted #79865 in #80116, but the implementation was actually correct, just the comment in `ExpandStridedMetadata.cpp` was confusing.)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
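For example, the renamed in-place modification API is used as follows (illustrative snippet):
```
// Notify the rewriter of an in-place modification. The modified op does not
// have to be the root op matched by the pattern.
rewriter.modifyOpInPlace(op, [&] {
  op->setAttr("some_attr", rewriter.getUnitAttr());
});
```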
This change adds (runtime) bounds checks for `memref` ops using the
existing `RuntimeVerifiableOpInterface`. For `memref.load` and
`memref.store`, we check that the indices are in-bounds of the memref's
index space. For `memref.reinterpret_cast` and `memref.subview` we check
that the resulting address space is in-bounds of the input memref's
address space.
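A rough sketch of the kind of check generated for each index of a `memref.load` (the generated IR and helper names may differ from the actual implementation):
```
// Hedged sketch: assert 0 <= index < dim(memref, i) at runtime.
Value zero = builder.create<arith::ConstantIndexOp>(loc, 0);
Value dim = builder.create<memref::DimOp>(loc, memrefValue, i);
Value geZero = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::sge,
                                             index, zero);
Value ltDim = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::slt,
                                            index, dim);
Value inBounds = builder.create<arith::AndIOp>(loc, geZero, ltDim);
builder.create<cf::AssertOp>(loc, inBounds, "out-of-bounds memref access");
```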
This adds a conversion for narrow type emulation of memref.store ops.
The conversion replaces the memref.store with two memref.atomic_rmw ops.
Atomics are used to prevent race conditions on same-byte accesses, in
the event that two threads are storing into the same byte.
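Roughly, the emulated store first clears the destination bits with an atomic `andi` against an inverted mask and then sets the new bits with an atomic `ori`. A hedged sketch (value names are illustrative, not the exact pass code):
```
// 1) Clear the destination bits: byte &= ~(mask << bitOffset).
builder.create<memref::AtomicRMWOp>(
    loc, clearMask.getType(), arith::AtomicRMWKind::andi, clearMask,
    emulatedMemref, ValueRange{linearizedIndex});
// 2) Set the new bits: byte |= (storeValue << bitOffset).
builder.create<memref::AtomicRMWOp>(
    loc, shiftedValue.getType(), arith::AtomicRMWKind::ori, shiftedValue,
    emulatedMemref, ValueRange{linearizedIndex});
```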
Fixes https://github.com/openxla/iree/issues/15370
The contents of a mask on a masked transfer are unaffected by the
particular region of memory being read from or stored to, so just forward the
mask in subview folding patterns.
`bufferization.materialize_in_destination` should be used instead. Both
ops bufferize to a memcpy. This change also conceptually cleans up the
memref dialect a bit: the memref dialect no longer contains ops that
operate on tensor values.
Fix a `-Wsign-compare` warning:
/llvm-project/mlir/lib/Dialect/MemRef/Transforms/ResolveShapedTypeResultDims.cpp:98:19: error: comparison of integers of different signs: 'value_type' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
if (*dimIndex >= reifiedResultShapes[resultNumber].size())
~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Do not crash when the input IR is invalid, i.e., when the index of the
dimension operand of a `tensor.dim`/`memref.dim` is out-of-bounds. This
fixes #70180.
The `resolveSourceIndicesCollapseShape` method is used to compute
indices into the source `MemRef` of a `CollapseShapeOp` from the
collapsed indices. This method didn't check for dynamic sizes of the
source shape, which led to a crash.
Fix https://github.com/llvm/llvm-project/issues/68483
This patch fixes strided memref offset calculation for emulating narrow
types.
As a side effect, this patch also adds support for 1-D subviews with
static sizes, static offsets, and strides of 1 for testing. The emulate
narrow types pass was not tested for strided memrefs before this patch.
`expand-strided-metadata` was missing a pattern to get rid of
`memref.cast`.
The pattern is straightforward:
Produce a new `extract_strided_metadata` with the source of the cast and
fold the static information (sizes, strides, offset) along the way.
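A rough sketch of such a pattern (hedged: the actual pattern also folds the static sizes, strides, and offset from the cast's result type; names below are illustrative):
```
struct FoldCastIntoExtractStridedMetadata
    : public OpRewritePattern<memref::ExtractStridedMetadataOp> {
  using OpRewritePattern::OpRewritePattern;
  LogicalResult matchAndRewrite(memref::ExtractStridedMetadataOp op,
                                PatternRewriter &rewriter) const override {
    auto castOp = op.getSource().getDefiningOp<memref::CastOp>();
    if (!castOp)
      return failure();
    // Extract the metadata directly from the cast's source; folding of the
    // static information from the cast's result type would happen here.
    rewriter.replaceOpWithNewOp<memref::ExtractStridedMetadataOp>(
        op, castOp.getSource());
    return success();
  }
};
```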
This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.
`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.
Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false"
for some values that are defined outside of the loop (in a nested op, in
such a way that the value does not dominate the loop). This interface is
currently only used for LICM and there is no way to trigger this bug, so
no test is added.
This patch refactors the `AtomicRMWOpConverter` class to use
the dedicated operations from the Arith dialect instead of the
`cmpf` + `select` pattern.
Also, a test for the `minimumf` kind of `atomic_rmw` has been added.
Introduce a simple conversion of a memref.alloc/dealloc pair into an
alloca in the same scope. Expose it as a transform op and a pattern.
Allocas typically lower to stack allocations as opposed to alloc/dealloc
that lower to significantly more expensive malloc/free calls. In
addition, this can be combined with allocation hoisting from loops to
further improve performance.
There are two motivations for this change:
1. It considerably simplifies adding support for the realloc operation to the
new buffer deallocation pass by lowering the realloc such that no
deallocation operation is inserted and the deallocation pass itself can
insert that dealloc
2. The lowering is expressed on a higher level and thus easier to understand,
and the lowerings of the memref operations it is composed of don't have to
be duplicated in the MemRefToLLVM lowering (also see discussion in
https://reviews.llvm.org/D133424)
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D159430
When handling sub-byte emulation, the sizes of the converted `memref`s
also need to be updated (this was not done in the current
implementation). This adds the additional complexity of having to
linearize the `memref`s as well. Consider a `memref<3x3xi4>` where the
`i4` elements are packed. This has an overall size of 5 bytes (rounded
up to a whole number of bytes). This can only be represented by a
`memref<5xi8>`. A `memref<3x2xi8>` would imply an implicit padding of
4 bits at the end of each row. So incorporate linearization into the
sub-byte load-store emulation.
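For the `memref<3x3xi4>` example, the index arithmetic that linearization performs looks roughly like this (illustrative, not the exact utility code):
```
// Element (i, j) of the packed memref<3x3xi4>:
int64_t linearIdx = i * 3 + j;           // flatten to a 1-D element index
int64_t byteIdx = linearIdx / 2;         // two i4 elements per byte
int64_t bitOffset = (linearIdx % 2) * 4; // position of the i4 within the byte
```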
This patch also updates some of the utility functions to make better
use of statically available information using `OpFoldResult` and
`makeComposedFoldedAffineApplyOps`.
Reviewed By: hanchung, yzhang93
Differential Revision: https://reviews.llvm.org/D158125
This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops.
This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block.
If an op does not implement the `BufferizableOpInterface`, the analysis is conservative. It previously assumed that an OpOperand may alias with each OpResult. It now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block.
Differential Revision: https://reviews.llvm.org/D157957
The pass uses `computeSuffixProduct` method which only allows static
shapes. This revision adds an early-exit for dynamic cases to avoid
crashing.
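Illustratively, the early exit amounts to something like (sketch, names are illustrative):
```
// computeSuffixProduct requires static sizes; bail out on dynamic shapes.
if (llvm::any_of(sourceType.getShape(), ShapedType::isDynamic))
  return failure();
```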
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D157668
It also unifies the computation of StridedLayoutAttr. If the stride is
a statically known value, we can just use it.
Differential Revision: https://reviews.llvm.org/D155017
* Use `create` instead of `createOrFold` for constant ops. Constants cannot be folded any further.
* Use `create` instead of `createOrFold` for ops that do not have a folder.
* Use C++ op builders that take an `int` instead of creating a `ConstantIndexOp`.
* Create `tensor::DimOp` instead of `linalg::createOrFoldDimOp` when it is certain that the operand is a tensor.
Differential Revision: https://reviews.llvm.org/D154196
Memref normalization fails to recognize non-zero symbols used in the memref type itself, i.e., in its strided layout (strides and offset). This causes a crash with a type like `memref<128x512xf32, strided<[?, ?], offset: ?>>`. The original issue is https://github.com/llvm/llvm-project/issues/61345.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D150250
This patch adds support for narrow bitwidth storage emulation. The goal is to support sub-byte type
codegen for LLVM CPU. Specifically, a type converter is added to convert memrefs of narrow bitwidth
(e.g., i4) into a supported wider bitwidth (e.g., i8). Another focus of this patch is to populate the
pattern for int4 memref.load. The memref.store pattern should be added in a separate patch.
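A simplified sketch of the type conversion idea (hedged: the actual converter also has to account for layout and size changes; `typeConverter` and the lambda are illustrative):
```
// Map memrefs of i4 elements to memrefs of i8 elements.
typeConverter.addConversion([](MemRefType type) -> std::optional<Type> {
  if (type.getElementType().isInteger(4))
    return MemRefType::get(type.getShape(),
                           IntegerType::get(type.getContext(), 8),
                           type.getLayout(), type.getMemorySpace());
  return type;
});
```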
Reviewed By: hanchung, mravishankar
Differential Revision: https://reviews.llvm.org/D151519
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
mechanism, in addition to defining methods with the same name.
This change begins the migration from those methods to the
corresponding free function calls, which have been decided on as the
more consistent style.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this change does not currently include work to
support a functional cast/isa call for them.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D151542
The folding mechanism does not recognize the `ldmatrix` op. This change teaches the pass to recognize the op and fold the memref aliases.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D151412
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
mechanism, in addition to defining methods with the same name.
This change begins the migration from those methods to the
corresponding free function calls, which have been decided on as the
more consistent style.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this change does not currently include work to
support a functional cast/isa call for them.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123