(1) uses the previously introduced API to reuse the AffineExpr parser without code duplication
(2) solves the look-ahead problem when parsing the level spec (see the sketch below)
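A sketch of the two level-spec forms that require the look-ahead:

  (d0, d1) -> (l0 = d0 : dense, l1 = d1 : compressed)  // explicit lvl vars
  (d0, d1) -> (d0 : dense, d1 : compressed)            // direct dim use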
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D154254
This might simplify the frontend implementation by avoiding recomputation for the same value.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D154244
We are in the process of migrating to a much improved surface syntax for the Sparse Tensor Encoding Attribute (STEA).
You can see a preview of this in the StableHLO RFC at
https://github.com/openxla/stablehlo/blob/main/rfcs/20230210-sparsity.md
//**This design is courtesy of Wren Romano.**//
This initial revision
(1) Introduces the first version of a new parser written by Wren Romano
(2) Introduces a simple "migration plan" using NEW_SYNTAX on the STEA, which will allow us to test the new parser with new examples, as well as migrate existing examples over without the need to rewrite them all
This first "drop" merely provides the entry points to parse the new syntax. The parser is still under active development. For example, we need to address the "lookahead" issue when parsing the lvl spec (viz. do we see l0 = d0 or a direct d0). Another larger task is to actually implement "affine" parsing (since the MLIR affine parser is not accessible in other parts of the tree).
EXAMPLE:
Currently, CSR looks like
#CSR = #sparse_tensor.encoding<{
  lvlTypes = ["dense", "compressed"],
  dimToLvl = affine_map<(i,j) -> (i,j)>
}>
but you can "force" the new parser with
#CSR = #sparse_tensor.encoding<{
  NEW_SYNTAX =
    (d0, d1) -> (l0 = d0 : dense, l1 = d1 : compressed)
}>
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D153997
The old pattern was missing some cases (e.g. swapping the arguments),
but it also allowed too many cases (e.g. a non-empty "absent" region, or
different arguments for add/mul). This fixes both issues.
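For illustration, the "absent" branch refers to regions such as the one on
sparse_tensor.unary; a minimal sketch with the (required) empty absent
region, using hypothetical values:

  %r = sparse_tensor.unary %x : f64 to f64
    present={
      ^bb0(%a: f64):
        %ret = arith.negf %a : f64
        sparse_tensor.yield %ret : f64
    }
    absent={}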
Reviewed By: K-Wu
Differential Revision: https://reviews.llvm.org/D153487
We prevent merging a sparse-in/dense-out kernel with dense-in
kernels because the result is usually not sparsifiable.
Dense kernels and sparse kernels are still fused, obviously.
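A hedged sketch of the now-rejected fusion, with hypothetical shapes and
trait; kernel 1 is sparse-in/dense-out, kernel 2 is dense-in, and merging
them would yield a kernel that usually cannot be sparsified:

  #SV = #sparse_tensor.encoding<{ lvlTypes = ["compressed"] }>
  #trait = {
    indexing_maps = [affine_map<(i) -> (i)>, affine_map<(i) -> (i)>],
    iterator_types = ["parallel"]
  }
  // Kernel 1: sparse input, dense output.
  %0 = linalg.generic #trait
      ins(%a : tensor<16xf64, #SV>) outs(%d : tensor<16xf64>) {
    ^bb0(%x: f64, %o: f64):
      %s = arith.addf %x, %x : f64
      linalg.yield %s : f64
  } -> tensor<16xf64>
  // Kernel 2: dense input %0; no longer merged into kernel 1.
  %1 = linalg.generic #trait
      ins(%0 : tensor<16xf64>) outs(%e : tensor<16xf64>) {
    ^bb0(%x: f64, %o: f64):
      %m = arith.mulf %x, %x : f64
      linalg.yield %m : f64
  } -> tensor<16xf64>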
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D153077
The tensor levels are now explicitly categorized into different `LoopCondKind`s to instruct the LoopEmitter to generate different code for different kinds of conditions (e.g., `SparseCond`, `SparseSliceCond`, `SparseAffineIdxCond`, etc.).
The process of generating a while loop is now decomposed into three steps, which are dispatched to the appropriate `LoopCondKind` handlers (a sketch of the emitted loop follows the steps):
1. Generate LoopCondition (e.g., `pos <= posHi` for `SparseCond`, `slice.isNonEmpty` for `SparseAffineIdxCond`)
2. Generate LoopBody (e.g., compute the coordinates)
3. Generate ExtraChecks (e.g., `if (onSlice(crd))` for `SparseSliceCond`)
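For a plain `SparseCond` over one compressed level, the emitted loop roughly
looks as follows (a hedged sketch with hypothetical SSA names):

  %r = scf.while (%pos = %posLo) : (index) -> index {
    // Step 1: LoopCondition, the position-range test for SparseCond.
    %cond = arith.cmpi ult, %pos, %posHi : index
    scf.condition(%cond) %pos : index
  } do {
  ^bb0(%p: index):
    // Step 2: LoopBody, e.g. load the coordinate at this position.
    %crd = memref.load %crdBuf[%p] : memref<?xindex>
    // Step 3 would wrap ExtraChecks, e.g. if (onSlice(crd)) for
    // SparseSliceCond, around the user payload here.
    %next = arith.addi %p, %c1 : index
    scf.yield %next : index
  }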
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D152464
Formerly, we accepted and/prod reductions as standard
reductions, but these change the semantics after sparsification,
since implicit zeros are never visited. Therefore, we only accept
standard reductions that are insensitive to implicit vs.
explicit zeros, and leave the more complex reductions to
the sparse_tensor.reduce custom reduction implementation,
as illustrated below.
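For example, a product reduction, which is sensitive to implicit zeros,
must be expressed explicitly through the custom reduction op (with the
identity passed as the third operand):

  %cf1 = arith.constant 1.0 : f64
  %result = sparse_tensor.reduce %x, %y, %cf1 : f64 {
    ^bb0(%a: f64, %b: f64):
      %ret = arith.mulf %a, %b : f64
      sparse_tensor.yield %ret : f64
  }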
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D151929
This is a major step along the way towards the new STEA design. While a great deal of this patch is simple renaming, there are several significant changes as well. I've done my best to ensure that this patch retains the previous behavior and error conditions, even though those are at odds with the eventual intended semantics of the `dimToLvl` mapping. Since the majority of the compiler does not yet support non-permutations, I've also added explicit assertions in places that previously had implicitly assumed they were dealing with permutations.
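For example, a 2x2 block-sparse format uses a non-permutation `dimToLvl`
mapping, which is what the eventual semantics should support (a sketch):

  #BSR = #sparse_tensor.encoding<{
    lvlTypes = ["compressed", "compressed", "dense", "dense"],
    dimToLvl = affine_map<(i, j) -> (i floordiv 2, j floordiv 2,
                                     i mod 2, j mod 2)>
  }>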
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D151505
(1) minor bug fix in the copy back [always nice to run stuff ;-)]
(2) run with and without lib (even though some fall back to CPU)
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D151507
(1) keep all cuSparse ops on a single stream, in the right order, without intermediate wait() calls
(2) use more precisely typed memref types for COO
(3) use ToTensor on the resulting memref (even though it folds away again)
Reviewed By: K-Wu
Differential Revision: https://reviews.llvm.org/D151404
We previously only supported packing two arrays (values and coordinates) into COO tensors.
This patch allows packing inputs into arbitrary sparse tensor formats.
It also deletes the "implicit" data canonicalization performed inside the sparse compiler,
and instead requires users to canonicalize the data before passing it to the sparse compiler.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D150916
This is a followup to D150330, split out because it's not purely mechanical.
Depends On D150330
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D150409
The GPU tests weren't updated when rebasing D150330, so this patch fixes that.
Reviewed By: anlunx
Differential Revision: https://reviews.llvm.org/D150822
This commit is part of the migration towards the new STEA syntax/design. In particular, this commit includes the following changes:
* Renaming compiler-internal functions/methods:
* `SparseTensorEncodingAttr::{getDimLevelType => getLvlTypes}`
* `Merger::{getDimLevelType => getLvlType}` (for consistency)
* `sparse_tensor::{getDimLevelType => buildLevelType}` (to help reduce confusion vs actual getter methods)
* Renaming external facets to match:
* the STEA parser and printer
* the C and Python bindings
* PyTACO
However, the actual renaming of the `DimLevelType` itself (along with all the "dlt" names) will be handled in a separate commit.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D150330
The sparse compiler now has two prototype strategies for GPU acceleration:
* CUDA codegen: this converts sparsified code to CUDA threads
* CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls
This revision introduces the first steps required for the second approach.
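For example, the libgen path targets kernels such as SpMV over a CSR
matrix, which can be mapped to a cuSparse call rather than generated
CUDA threads (a sketch with hypothetical operands):

  #CSR = #sparse_tensor.encoding<{ lvlTypes = ["dense", "compressed"] }>
  %y = linalg.matvec
      ins(%A, %x : tensor<?x?xf64, #CSR>, tensor<?xf64>)
      outs(%y0 : tensor<?xf64>) -> tensor<?xf64>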
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D150170
The host registration is a convenient way to get CUDA kernels
running, but it may be slow and does not work for all buffers
(like global constants). This revision uses proper alloc/copy/dealloc
chains for buffers, using asynchronous chains
to increase overlap. The host registration mechanism is
kept under a flag for the output, just for experimentation
purposes while this project ramps up.
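A hedged sketch of such an asynchronous chain (hypothetical buffers, one
device allocation):

  %t0 = gpu.wait async
  %devA, %t1 = gpu.alloc async [%t0] (%n) : memref<?xf64>
  %t2 = gpu.memcpy async [%t1] %devA, %hostA : memref<?xf64>, memref<?xf64>
  // ... kernel launches chained on %t2 ...
  %t3 = gpu.memcpy async [%t2] %hostA, %devA : memref<?xf64>, memref<?xf64>
  %t4 = gpu.dealloc async [%t3] %devA : memref<?xf64>
  gpu.wait [%t4]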
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D148682
`compressed(hi)` is similar to `compressed`, but instead of reusing the previous position high as the current position low, it uses a pair of positions for each sparse index.
The patch only introduces the definition (syntax) but does not provide codegen implementation.
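A sketch of what the definition admits (assuming the level-type string is
spelled "compressed-hi"; no codegen behind it yet):

  #CSR_hi = #sparse_tensor.encoding<{
    lvlTypes = ["dense", "compressed-hi"]
  }>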
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D148664
This implements a proof-of-concept GPU code generator
for the sparse compiler pipeline, currently only capable
of generating CUDA threads for outermost parallel loops.
The objective, obviously, is to grow this concept
into a full-blown GPU code generator, capable of the
right combination of code generation as well as exploiting
idiomatic kernels or vendor-specific libraries (think cuSparse).
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D147483
Previously, the genCast function generated arith.trunci for converting f32 to
i32. Fix the function to use mlir::convertScalarToDtype to correctly handle
conversion cases beyond index casting.
Also add a test case for codegen of the sparse_tensor.convert op.
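For instance, for the f32 to i32 case (a minimal sketch):

  // Before (wrong): arith.trunci only operates on integer types.
  //   %0 = arith.trunci %f : f32 to i32
  // After: the proper float-to-signed-integer conversion is chosen.
  %0 = arith.fptosi %f : f32 to i32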
Reviewed By: aartbik, Peiming, wrengr
Differential Revision: https://reviews.llvm.org/D147272