This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`
The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.
Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
Rename interface functions as follows:
* `hasTensorSemantics` -> `hasPureTensorSemantics`
* `hasBufferSemantics` -> `hasPureBufferSemantics`
These two functions return "true" if the op has tensor/buffer operands
but not buffer/tensor operands.
Also drop the "ranked" part from the interface, i.e., do not distinguish
between ranked/unranked types.
The new function names describe the functions more accurately. They also
align their semantics with the notion of "tensor semantics" with the
bufferization framework. (An op is supposed to be bufferized if it has
tensor operands, and we don't care if it also has memref operands.)
This change is in preparation of #75273, which adds
`BufferizableOpInterface::hasTensorSemantics`. By renaming the functions
in the `DestinationStyleOpInterface`, we can avoid name clashes between
the two interfaces.
Rationale:
Since this mini-pipeline may be used in alternative pipelines (viz.
different from the default "sparsifier" pipeline) where unknown ops are
handled by alternative bufferization methods that are downstream of this
mini-pipeline, we allow unknown ops by default (failure to bufferize is
eventually apparent by failing to convert to LLVM IR).
This is part of enabling e2e testing for TORCH-MLIR tests using a
sparsifier backend
This removes the temporary DENSE24 attribute and replaces it with proper
recognition of dense to 24 conversion. The compressionh will be
performed on the device prior to performing the matrix mult. Note that
we no longer need to start with the linalg version, we can lift this to
the proper named linalg op. Also renames some files into more consistent
names.
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.
Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
-> f32, f32 {
%elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
%elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.addf %lhs, %rhs : f32
scf.reduce.return %res : f32
}, {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.mulf %lhs, %rhs : f32
scf.reduce.return %res : f32
}
}
```
`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
Note that at the current moment, the newly-introduced
`SparseTensorLevel` classes are far from complete, we plan to migrate
code generation related to accessing sparse tensor levels to these
classes in the near future to simplify `LoopEmitter`.
Separates actual transformation files from supporting utility files in
the transforms directory. Includes a bazel overlay fix for the build (as
well as a bit of cleanup of that file to be less verbose and more
flexible).
Rewrite patterns must return `success` if the IR was modified. This
commit fixes sparse tensor tests such as
`SparseTensor/sparse_fusion.mlir`,
`SparseTensor/CPU/sparse_reduce_custom.mlir`,
`SparseTensor/CPU/sparse_semiring_select.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.
The `ForallRewriter` pattern used to generate invalid IR:
```
mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: error: 'scf.for' op expects region #0 to have 0 or 1 blocks
mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: note: see current operation:
"scf.for"(%8, %2, %9) ({
^bb0(%arg5: index):
// ...
"scf.yield"() : () -> ()
^bb1(%10: index): // no predecessors
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
```
This commit fixes tests such as
`mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir` when verifying the
IR after each pattern application (#74270).
This adds a consistent usage with `at` for everything that refers to the
current loop nesting. This cleans up some redundant legacy code from
when we were still using topSort inside sparsifier code.
Rationale:
We no longer deal with topsort during sparsification, so that LoopId ==
LoopOrd for all methods. This first revision removes the types. A follow
up revision will simplify some other remaining constructs that deal with
loop order (e.g. at and ldx).
This centralizes all COO methods, and provides a cleaner API. Note that
the "enc" only constructor is a temporary workaround the need for COO
methods inside the "enc" only storage specifier.
Migrates dangling convenience method into proper SparseTensorType class.
Also cleans up some details (picking right dim2lvl/lvl2dim). Removes
more dead code.
The "Dim" prefix is a legacy left-over that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.
The "dimension" before "level" does not really make sense Note that
renaming the actual type DimLevelType to LevelType is still TBD, since
this is an externally visible change (e.g. visible to Python API).
This change implements the correct *level* sizes set up for the direct
IR codegen fields in the sparse storage scheme. This brings libgen and
codegen together again.
This is step 3 out of 3 to make sparse_tensor.new work for BSR
This change provides access to the individual components of dim sizes
and lvl sizes after each codegenutil call.
This is step 2 out of 3 to make sparse_tensor.new work for BSR
This avoids seeing non-perm on the convert from COO to non-COO for
higher dimensional new operators (viz. reading in BSR).
This is step 1 out of 3 to make sparse_tensor.new work for BSR
Previous change no longer properly used the GPU libgen pass (even though
most tests still passed falling back to CPU). This revision puts the
proper pass order into place. Also bit of a cleanup of CPU codegen vs.
libgen setup.
Note that the (dis)assemble operations still make some simplfying
assumptions (e.g. trailing 2-D COO in AoS format) but now at least both
the direct IR and support library path behave exactly the same.
Generalizing the ops is still TBD.