This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.
Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
-> f32, f32 {
%elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
%elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.addf %lhs, %rhs : f32
scf.reduce.return %res : f32
}, {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.mulf %lhs, %rhs : f32
scf.reduce.return %res : f32
}
}
```
`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
1. A single dimension can either be blocked (with floordiv and mod pair)
or non-blocked. Mixing them would be invalid.
2. Block size should be non-zero value.
This change implements the correct *level* sizes set up for the direct
IR codegen fields in the sparse storage scheme. This brings libgen and
codegen together again.
This is step 3 out of 3 to make sparse_tensor.new work for BSR
Previous change no longer properly used the GPU libgen pass (even though
most tests still passed falling back to CPU). This revision puts the
proper pass order into place. Also bit of a cleanup of CPU codegen vs.
libgen setup.
When the Powers That Be decided that the name "sparse compiler" should
be changed to "sparsifier", we negected to change some of the comments
in the code; this pull request completes the name change.
Rather than extending sparsifier codegen with higher order
non-permutations, we follow the path of rewriting linalg generic ops
into higher order operations. That way, code generation will simply work
out of the box. This is a very first proof-of-concept rewriting of that
idea.
This commit changes the SparseTensor LLVM dialect lowering from using
`llvm.ptr<i8>` to `llvm.ptr`. This change ensures that the lowering now
properly relies on opaque pointers, instead of working with already type
erased i8 pointers.
This value should always be a plain contant or something invariant
computed outside the surrounding linalg operation, since there is no
co-iteration defined on anything done in this branch.
Fixes:
https://github.com/llvm/llvm-project/issues/69395
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter-pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR in to a module when necessary.
Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.
Only the test in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of testing coverage of the transform interpreter logic.
Updates:
1. Verification of block sparsity.
2. Verification of singleton level type can only follow compressed or
loose_compressed levels. And all level types after singleton should be
singleton.
3. Added getBlockSize function.
4. Added an invalid encoding test for an incorrect lvlToDim map that
user provides.
Updates:
1. Infer lvlToDim from dimToLvl
2. Add more tests for block sparsity
3. Finish TODOs related to lvlToDim, including adding lvlToDim to python
binding
Verification of lvlToDim that user provides will be implemented in the
next PR.
Making the materialize-from-reader method part of the Swiss army knife
suite again removes a lot of redundant boiler plate code and unifies the
parameter setup into a single centralized utility. Furthermore, we now
have minimized the number of entry points into the library that need a
non-permutation map setup, simplifying what comes next
This completely centralizes all set up related to dim2lvl and lvl2dim
for the runtime library (and even parts of direct IR codegen) into one
place! And all comptatible with the MapRef data structure that should be
used in all remaining clients of dim2lvl and lvl2dim.
NOTE: the convert_x2y.mlir tests were becoming too overloaded
so I decided to bring them back to the basics; if e.g.
more coverage of the foreach is required, they should
go into isolated smalle tests
This revision introduces a MapRef, which will support a future
generalization beyond permutations (e.g. block sparsity). This revision
also unifies the conversion/codegen paths for the sparse_tensor.new
operation from file (eg. the readers). Note that more unification is
planned as well as general affine dim2lvl and lvl2dim (all marked with
TODOs).