This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.
Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = memref.load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = memref.load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```
`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before: since the op was not a terminator, it would have
been DCE'd.)
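For `scf.parallel` ops without results, the terminator is an `scf.reduce` with no operands and no regions; since it is the implicit terminator, it may also be elided in the custom syntax. A minimal sketch (the buffer, bounds, and `%cst` are assumed to be defined elsewhere):
```mlir
scf.parallel (%iv) = (%lb) to (%ub) step (%step) {
  memref.store %cst, %buffer[%iv] : memref<100xf32>
  scf.reduce
}
```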
Note that, at the moment, the newly introduced `SparseTensorLevel`
classes are far from complete; we plan to migrate the code generation
related to accessing sparse tensor levels to these classes in the near
future to simplify `LoopEmitter`.
Separates the actual transformation files from the supporting utility
files in the transforms directory. Includes a Bazel overlay fix for the
build (as well as a bit of cleanup of that file to make it less verbose
and more flexible).
1. A single dimension can either be blocked (with a floordiv and mod
pair) or non-blocked; mixing the two is invalid.
2. The block size must be a non-zero value.
For BSR and convolutions, we encounter maps such as
`(d0, d1, d2, d3) -> ((d0 + d2) floordiv 2, (d1 + d3) floordiv 2, (d0 + d2) mod 2, (d1 + d3) mod 2)`,
which crashed the existing test. Note that an actual test and working
code are still to follow (since we need to fix a few other things first).
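For context, a standard 2x2 BSR encoding uses exactly one floordiv/mod pair per blocked dimension; composing such an encoding with a convolution's `(d0 + d2, d1 + d3)` index expressions yields the map quoted above. A sketch, following the encoding examples in the sparse tensor documentation:
```mlir
#BSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 floordiv 2 : dense,
                     d1 floordiv 2 : compressed,
                     d0 mod 2 : dense,
                     d1 mod 2 : dense)
}>
```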
Rewrite patterns must return `success` if (and only if) the IR was modified. This
commit fixes sparse tensor tests such as
`SparseTensor/sparse_fusion.mlir`,
`SparseTensor/CPU/sparse_reduce_custom.mlir`,
`SparseTensor/CPU/sparse_semiring_select.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.
The `ForallRewriter` pattern used to generate invalid IR:
```
mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: error: 'scf.for' op expects region #0 to have 0 or 1 blocks
mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: note: see current operation:
"scf.for"(%8, %2, %9) ({
^bb0(%arg5: index):
// ...
"scf.yield"() : () -> ()
^bb1(%10: index): // no predecessors
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
```
This commit fixes tests such as
`mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir` when verifying the
IR after each pattern application (#74270).
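For reference, a sketch of the valid form: the `scf.for` region contains a single block (mirroring the generic dump above):
```mlir
"scf.for"(%8, %2, %9) ({
^bb0(%arg5: index):
  // ...
  "scf.yield"() : () -> ()
}) : (index, index, index) -> ()
```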
This adopts a consistent use of `at` for everything that refers to the
current loop nesting, and cleans up some redundant legacy code from
when we were still using `topSort` inside the sparsifier code.
Rationale:
We no longer deal with topological sorting during sparsification, so
`LoopId` == `LoopOrd` for all methods. This first revision removes the
types. A follow-up revision will simplify some other remaining
constructs that deal with loop order (e.g. `at` and `ldx`).
This centralizes all COO methods and provides a cleaner API. Note that
the "enc"-only constructor is a temporary workaround for the need for
COO methods inside the "enc"-only storage specifier.
This is a minor step towards moving ALL COO-related tests into the
`SparseTensorType` class rather than having them all over the place
(with the risk of becoming inconsistent). The next revision will move
ALL COO-related methods into this class.
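For reference, the kind of type these COO queries inspect looks like the following (a sketch of a standard 2-D COO encoding, following the sparse tensor documentation):
```mlir
#COO = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : compressed(nonunique), d1 : singleton)
}>
```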
Migrates a dangling convenience method into the proper
`SparseTensorType` class. Also cleans up some details (picking the
right dim2lvl/lvl2dim) and removes more dead code.
The "Dim" prefix is a legacy left-over that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.
The "dimension" before "level" does not really make sense Note that
renaming the actual type DimLevelType to LevelType is still TBD, since
this is an externally visible change (e.g. visible to Python API).
This change implements the correct *level* size setup for the direct
IR codegen fields in the sparse storage scheme. This brings libgen and
codegen together again.
This is step 3 out of 3 to make `sparse_tensor.new` work for BSR.
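As a hypothetical illustration of the dimension-vs-level distinction (shapes chosen for this sketch, not taken from the commit): a `tensor<6x12xf64>` with 2x3 blocks has dimension sizes (6, 12) but level sizes (3, 4, 2, 3):
```mlir
#BSR_2x3 = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 floordiv 2 : dense,      // level size 6 / 2 = 3
                     d1 floordiv 3 : compressed,  // level size 12 / 3 = 4
                     d0 mod 2 : dense,            // level size 2
                     d1 mod 3 : dense)            // level size 3
}>
```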
This change provides access to the individual components of the dim
sizes and lvl sizes after each codegen utility call.
This is step 2 out of 3 to make `sparse_tensor.new` work for BSR.
This avoids hitting a non-permutation map on the conversion from COO to
non-COO for higher-dimensional `new` operators (viz. reading in BSR).
This is step 1 out of 3 to make `sparse_tensor.new` work for BSR.
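Taken together, the three steps target reads like the following sketch (`%fileName` is a hypothetical source pointer, `#BSR` as in the encoding shown earlier; the exact source type depends on the MLIR version):
```mlir
%bsr = sparse_tensor.new %fileName : !llvm.ptr to tensor<?x?xf64, #BSR>
```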
A previous change no longer properly exercised the GPU libgen pass
(even though most tests still passed by falling back to the CPU). This
revision puts the proper pass order in place. It also includes a bit of
cleanup of the CPU codegen vs. libgen setup.