Commit Graph

353 Commits

Peiming Liu
269c82d389 [mlir][sparse] introduce new 2:4 block sparsity level type.
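A sketch of what such an encoding might look like, assuming the new level
type surfaces in lvlTypes as "compressed24" and the 2:4 blocks are formed by
a floordiv/mod mapping over the innermost dimension (spelling and shapes
illustrative, not taken verbatim from the patch):

  #NV_24 = #sparse_tensor.encoding<{
    lvlTypes = [ "dense", "dense", "compressed24" ],
    dimToLvl = affine_map<(d0, d1) -> (d0, d1 floordiv 4, d1 mod 4)>
  }>
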
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D155128
2023-07-12 23:33:53 +00:00
K-Wu
e37fc3cc39 [mlir][sparse][gpu] Impl 2:4 SpMM rewrite for linalg op w/ DENSE24 attr
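A hypothetical sketch of the kernel shape this rewrite targets; the DENSE24
unit attribute (spelling taken from the title only) marks a dense matmul
whose data should be mapped onto 2:4 structured-sparse SpMM:

```
func.func @matmul24(%a: tensor<16x32xf16>, %b: tensor<32x16xf16>,
                    %c: tensor<16x16xf16>) -> tensor<16x16xf16> {
  // DENSE24 is assumed here to be a unit attribute on the linalg op
  %0 = linalg.matmul {DENSE24}
         ins(%a, %b : tensor<16x32xf16>, tensor<32x16xf16>)
         outs(%c : tensor<16x16xf16>) -> tensor<16x16xf16>
  return %0 : tensor<16x16xf16>
}
```
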
Differential Revision: https://reviews.llvm.org/D154772
2023-07-10 22:36:57 +00:00
Peiming Liu
fc5d8fce7d [mlir][sparse] support dual sparse convolution.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152601
2023-07-10 16:49:32 +00:00
Aart Bik
03125e6894 [mlir][sparse][gpu] fix missing dealloc
This dealloc was incorrectly removed in
https://reviews.llvm.org/D153173

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D154564
2023-07-06 09:48:19 -07:00
Kun Wu
be2dd22b8f [mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime
Differential Revision: https://reviews.llvm.org/D153173
2023-06-30 21:52:34 +00:00
Aart Bik
b939c015a4 [mlir][sparse] add affine parsing to new surface syntax for STEA
(1) uses the previously introduced API to reuse the AffineExpr parser without code duplication
(2) solves the look-ahead problem when parsing the level spec
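
For example, with affine expressions admitted in the level spec, a 2x2
block-sparse mapping could be written directly in the new syntax (a sketch,
not necessarily the final form):

  #BSR = #sparse_tensor.encoding<{
    NEW_SYNTAX =
    (d0, d1) -> (l0 = d0 floordiv 2 : dense,
                 l1 = d1 floordiv 2 : compressed,
                 l2 = d0 mod 2 : dense,
                 l3 = d1 mod 2 : dense)
  }>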

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D154254
2023-06-30 14:48:23 -07:00
Peiming Liu
a63d6a0014 [mlir][sparse] make UnpackOp return the actual filled length of unpacked memory
This might simplify frontend implementations by avoiding recomputation of the same value.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D154244
2023-06-30 21:35:15 +00:00
Aart Bik
6b88c852b6 [mlir][sparse] Start migration to new surface syntax for STEA
We are in the process of migrating to a much improved surface syntax for the Sparse Tensor Encoding Attribute (STEA).

You can see a preview of this in the StableHLO RFC at

 https://github.com/openxla/stablehlo/blob/main/rfcs/20230210-sparsity.md

This design is courtesy of Wren Romano.

This initial revision
(1) Introduces the first version of a new parser written by Wren Romano
(2) Introduces a simple "migration plan" using NEW_SYNTAX on the STEA, which will allow us to test the new parser with new examples, as well as migrate existing examples over without the need to rewrite them all

This first "drop" merely provides the entry points to parse the new syntax. The parser is still under active development. For example, we need to address the "lookahead" issue when parsing the lvl spec (viz. do we see l0 = d0 or a direct d0). Another larger task is to actually implement "affine" parsing (since the MLIR affine parser is not accessible in other parts of the tree).

EXAMPLE:

Currently, CSR looks like

  #CSR = #sparse_tensor.encoding<{
    lvlTypes = ["dense","compressed"],
    dimToLvl = affine_map<(i,j) -> (i,j)>
  }>

but you can "force" the new parser with

  #CSR = #sparse_tensor.encoding<{
    NEW_SYNTAX =
    (d0, d1) -> (l0 = d0 : dense, l1 = d1 : compressed)
  }>

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D153997
2023-06-29 11:32:07 -07:00
Peiming Liu
df11a2b41a [mlir][sparse] admit un-sparsifiable operations if all their operands are loaded from dense inputs
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D153998
2023-06-28 21:27:50 +00:00
Aart Bik
f14c8eb595 [mlir][sparse][gpu] refine SDDMM pattern for cuSPARSE
The old pattern was missing some cases (e.g. swapped arguments) but
it also allowed too many cases (e.g. a non-empty "absent" region or
different arguments for add/mul). This fixes those issues.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D153487
2023-06-21 18:31:55 -07:00
Peiming Liu
e7df82816b [mlir][sparse] rewrite arith::SelectOp to semiring operations to sparsify it.
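A sketch of the kind of kernel this admits, assuming an elementwise trait
over a sparse vector; the rewrite re-expresses the arith.select through the
sparse_tensor semiring ops (binary/unary) so sparsification can proceed:

```
#SV = #sparse_tensor.encoding<{ lvlTypes = [ "compressed" ] }>
#trait = {
  indexing_maps = [affine_map<(i) -> (i)>, affine_map<(i) -> (i)>],
  iterator_types = ["parallel"]
}
// keep only the entries larger than 1.0 (illustrative kernel)
func.func @select_gt_one(%a: tensor<8xf64, #SV>) -> tensor<8xf64, #SV> {
  %f0 = arith.constant 0.0 : f64
  %f1 = arith.constant 1.0 : f64
  %init = bufferization.alloc_tensor() : tensor<8xf64, #SV>
  %0 = linalg.generic #trait
    ins(%a : tensor<8xf64, #SV>)
    outs(%init : tensor<8xf64, #SV>) {
  ^bb0(%x: f64, %out: f64):
    %cmp = arith.cmpf ogt, %x, %f1 : f64
    %sel = arith.select %cmp, %x, %f0 : f64
    linalg.yield %sel : f64
  } -> tensor<8xf64, #SV>
  return %0 : tensor<8xf64, #SV>
}
```
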
Reviewed By: aartbik, K-Wu

Differential Revision: https://reviews.llvm.org/D153397
2023-06-21 21:22:18 +00:00
Aart Bik
5c03c056e0 [mlir][sparse] enhance element-wise fusion heuristics
We prevent merging a sparse-in/dense-out kernel with dense-in
kernels because the result is usually not sparsifiable.
Dense kernels and sparse kernels are still fused, obviously.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D153077
2023-06-15 16:48:40 -07:00
Kun Wu
9167dd46ba [mlir][sparse][gpu] recognizing SDDMM pattern in GPU libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151582
2023-06-15 23:48:11 +00:00
Aart Bik
65bfd5cb25 [mlir][sparse] proper in-place SDDMM with spy function
This specific operation matches the cuSPARSE SDDMM semantics exactly.
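
For reference, cuSPARSE's SDDMM computes C = alpha * (A x B) ∘ spy(C) + beta * C,
where spy(C) is the sparsity pattern of C and ∘ is the elementwise (Hadamard)
product, i.e., the dense-dense product is only sampled at the stored positions of C.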

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D152969
2023-06-15 13:59:38 -07:00
Peiming Liu
faf7cd97d0 [mlir][sparse] merger extension to support sparsifying arith::CmpI/CmpF operation
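A sketch of a kernel this extension admits (trait and shapes illustrative);
the merger must handle the comparison specially because, unlike mul, cmp is
not zero-preserving in either operand:

```
#SV = #sparse_tensor.encoding<{ lvlTypes = [ "compressed" ] }>
#trait = {
  indexing_maps = [affine_map<(i) -> (i)>,
                   affine_map<(i) -> (i)>,
                   affine_map<(i) -> (i)>],
  iterator_types = ["parallel"]
}
func.func @cmp(%a: tensor<8xf64, #SV>, %b: tensor<8xf64>) -> tensor<8xi1, #SV> {
  %init = bufferization.alloc_tensor() : tensor<8xi1, #SV>
  // x < y must be evaluated even where only one operand is stored,
  // since 0 < y (or x < 0) can still be true
  %0 = linalg.generic #trait
    ins(%a, %b : tensor<8xf64, #SV>, tensor<8xf64>)
    outs(%init : tensor<8xi1, #SV>) {
  ^bb0(%x: f64, %y: f64, %o: i1):
    %cmp = arith.cmpf ult, %x, %y : f64
    linalg.yield %cmp : i1
  } -> tensor<8xi1, #SV>
  return %0 : tensor<8xi1, #SV>
}
```
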
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152761
2023-06-15 17:26:50 +00:00
Peiming Liu
83b7f018fd [mlir][sparse] fix crashes when the tensor that defines the loop bound cannot be found
Reviewed By: aartbik, K-Wu

Differential Revision: https://reviews.llvm.org/D152877
2023-06-14 20:27:50 +00:00
Peiming Liu
fd68d36109 [mlir][sparse] unifying enterLoopOverTensorAtLvl and enterCoIterationOverTensorsAtLvls
The tensor levels are now explicitly categorized into different `LoopCondKind`s to instruct the LoopEmitter to generate different code for different kinds of conditions (e.g., `SparseCond`, `SparseSliceCond`, `SparseAffineIdxCond`, etc.)

The process of generating a while loop is now decomposed into three steps, which are dispatched to the different LoopCondKind handlers:
1. Generate LoopCondition (e.g., `pos <= posHi` for `SparseCond`, `slice.isNonEmpty` for `SparseAffineIdxCond`)
2. Generate LoopBody (e.g., compute the coordinates)
3. Generate ExtraChecks (e.g., `if (onSlice(crd))` for `SparseSliceCond`)
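
A sketch of the emitted structure, assuming scf.while as the vehicle (all
names illustrative):

```
func.func @loop_shape(%lo: index, %posHi: index) -> index {
  %c1 = arith.constant 1 : index
  %r = scf.while (%pos = %lo) : (index) -> index {
    // (1) LoopCondition, e.g. pos <= posHi for SparseCond
    %cond = arith.cmpi ule, %pos, %posHi : index
    scf.condition(%cond) %pos : index
  } do {
  ^bb0(%pos: index):
    // (2) LoopBody: compute the coordinates for %pos
    // (3) ExtraChecks would guard the payload, e.g. if (onSlice(crd))
    %next = arith.addi %pos, %c1 : index
    scf.yield %next : index
  }
  return %r : index
}
```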

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152464
2023-06-14 20:03:10 +00:00
Aart Bik
debdf7e0ff [mlir][sparse] refine single condition set up for semi-ring ops
Reviewed By: Peiming, K-Wu

Differential Revision: https://reviews.llvm.org/D152874
2023-06-14 09:23:09 -07:00
Peiming Liu
0258a53521 Brings back "[mlir][sparse] moving inbound check for slice driven loop into before block of the WhileOp"
This reverts commit 07b927902d.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152566
2023-06-09 17:45:46 +00:00
Peiming Liu
07b927902d Revert "[mlir][sparse] moving inbound check for slice driven loop into before block of the WhileOp"
This reverts commit 853d704fd0.

Differential Revision: https://reviews.llvm.org/D152562
2023-06-09 17:21:40 +00:00
Kun Wu
97f4c22b3a [mlir][sparse][gpu] unify dnmat and dnvec handle and ops
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152465
2023-06-09 17:16:48 +00:00
Peiming Liu
853d704fd0 [mlir][sparse] moving inbound check for slice driven loop into before block of the WhileOp
This patch changes the while loop generated for iterating over a fully reduced sparse level with an affine index expression.
Before:
```
cont = true
while (cont) {
  if (inBound()) {
    ....
    cont = true;
  } else {
    cont = false;
  }
}
```
After:
```
while(inBound()) {
  ....
}
```

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152463
2023-06-09 17:03:15 +00:00
Kun Wu
8ed59c53de [mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity support
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151775
2023-06-06 23:13:21 +00:00
Aart Bik
378f1885e3 [mlir][sparse] enhance sparse reduction support
Formerly, we accepted and/prod reductions as standard
reductions, but these change the semantics after sparsification,
since implicit zeros are never visited. Therefore, we only accept
standard reductions that are insensitive to implicit vs.
explicit zeros, and leave the more complex reductions to
the sparse_tensor.reduce custom reduction implementation.
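
For example, a product reduction (which is sensitive to implicit zeros)
would be expressed through the custom reduction op with an explicit
identity; a sketch of such a kernel body, assuming the sparse_tensor.reduce
syntax:

```
// inside a linalg.generic reduction body:
^bb0(%x: f64, %acc: f64):
  %id = arith.constant 1.0 : f64     // identity of prod
  %r = sparse_tensor.reduce %x, %acc, %id : f64 {
    ^bb0(%a: f64, %b: f64):
      %m = arith.mulf %a, %b : f64
      sparse_tensor.yield %m : f64
  }
  linalg.yield %r : f64
```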

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151929
2023-06-01 16:30:21 -07:00
wren romano
540d5e0ce6 [mlir][sparse] Updating STEA parser/printer to use the name "dimSlices"
Depends On D151505

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151513
2023-05-30 15:50:07 -07:00
wren romano
76647fce13 [mlir][sparse] Combining dimOrdering+higherOrdering fields into dimToLvl
This is a major step along the way towards the new STEA design.  While a great deal of this patch is simple renaming, there are several significant changes as well.  I've done my best to ensure that this patch retains the previous behavior and error-conditions, even though those are at odds with the eventual intended semantics of the `dimToLvl` mapping.  Since the majority of the compiler does not yet support non-permutations, I've also added explicit assertions in places that previously had implicitly assumed it was dealing with permutations.
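
Concretely, what previously required a separate dimOrdering (and higherOrdering for non-trivial mappings) is now a single dimToLvl affine map; e.g. CSC is simply:

  #CSC = #sparse_tensor.encoding<{
    lvlTypes = [ "dense", "compressed" ],
    dimToLvl = affine_map<(i, j) -> (j, i)>
  }>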

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151505
2023-05-30 15:19:50 -07:00
Aart Bik
22caafc9f3 [mlir][sparse][gpu] end to end test for matmul
(1) minor bug fix in copy back [always nice to run stuff ;-)]
(2) run with and without lib (even though some fall back to CPU)

Reviewed By: wrengr

Differential Revision: https://reviews.llvm.org/D151507
2023-05-25 16:10:22 -07:00
Peiming Liu
f7b8b005ff [mlir][sparse] fix bugs when computing the memory size when lowering pack op.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151481
2023-05-25 19:19:52 +00:00
Aart Bik
bcb698bfdc [mlir][sparse][gpu] various cuSparse refinements
(1) keep all cuSPARSE ops on a single stream, in the right order, without intermediate wait()
(2) use more precisely typed memref types for COO
(3) use ToTensor on the resulting memref (even though it folds away again)

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D151404
2023-05-24 22:32:52 -07:00
Peiming Liu
b2e6b73544 [mlir][sparse] extend unpack operation to unpack arbitrary encodings.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151174
2023-05-23 22:34:01 +00:00
Aart Bik
b75d6a40f1 [mlir][sparse][gpu] recognize SpMM cuSparse during sparsification
Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D150715
2023-05-19 17:22:59 -07:00
Peiming Liu
de56088866 [mlir][sparse] Support packing external data into arbitrary sparse tensor encoding.
We previously only supported packing two arrays (values and coordinates) into COO tensors.
This patch allows packing inputs into an arbitrary sparse tensor format.

It also deletes the "implicit" data canonicalization performed inside the sparse compiler,
and instead requires users to canonicalize the data before passing it to the sparse compiler.
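
A sketch of the already-supported COO case (op syntax assumed as of this
revision); after this patch the destination may carry an arbitrary encoding,
and the buffers must be pre-canonicalized (e.g. sorted, deduplicated) by the
caller:

```
#COO = #sparse_tensor.encoding<{
  lvlTypes = [ "compressed-nu", "singleton" ]
}>
// %values : tensor<3xf64>, %coordinates : tensor<3x2xindex>
%coo = sparse_tensor.pack %values, %coordinates
     : tensor<3xf64>, tensor<3x2xindex> to tensor<10x10xf64, #COO>
```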

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D150916
2023-05-19 17:41:49 +00:00
wren romano
d9a2f89bee [mlir][sparse] Adjusting error message wording, to better match new field names
This is a followup to D150330, split out because it's not purely mechanical.

Depends On D150330

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D150409
2023-05-18 15:02:08 -07:00
wren romano
7f5fb90bbb [mlir][sparse] Fixing GPU tests (followup to D150330)
The GPU tests weren't updated when rebasing D150330, so this patch fixes that.

Reviewed By: anlunx

Differential Revision: https://reviews.llvm.org/D150822
2023-05-17 15:29:54 -07:00
wren romano
a0615d020a [mlir][sparse] Renaming the STEA field dimLevelType to lvlTypes
This commit is part of the migration towards the new STEA syntax/design.  In particular, this commit includes the following changes:
* Renaming compiler-internal functions/methods:
  * `SparseTensorEncodingAttr::{getDimLevelType => getLvlTypes}`
  * `Merger::{getDimLevelType => getLvlType}` (for consistency)
  * `sparse_tensor::{getDimLevelType => buildLevelType}` (to help reduce confusion vs actual getter methods)
* Renaming external facets to match:
  * the STEA parser and printer
  * the C and Python bindings
  * PyTACO
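
In the STEA surface syntax the rename is simply (before vs. after):

  dimLevelType = [ "dense", "compressed" ]   // old field name
  lvlTypes     = [ "dense", "compressed" ]   // new field name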

However, the actual renaming of the `DimLevelType` itself (along with all the "dlt" names) will be handled in a separate commit.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D150330
2023-05-17 14:24:09 -07:00
Anlun Xu
6116ca67ab [mlir][sparse] Add sparse rewriting rules for tensor::ReshapeOp
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D149564
2023-05-16 14:56:33 -07:00
Aart Bik
ee42e23614 [mlir][sparse][gpu] first implementation of the GPU libgen approach
The sparse compiler now has two prototype strategies for GPU acceleration:

* CUDA codegen: this converts sparsified code to CUDA threads
* CUDA libgen: this converts pre-sparsified code to cuSPARSE library calls

This revision introduces the first steps required for the second approach.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D150170
2023-05-15 08:49:38 -07:00
Aart Bik
9a018a7b48 [mlir][sparse] relax constraints on tensor.cast with pre-rewriting
Reviewed By: wrengr

Differential Revision: https://reviews.llvm.org/D149489
2023-05-01 16:03:44 -07:00
Peiming Liu
d4db528938 [mlir][sparse] extend unpack operation to support unpacking a batched COO type
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D149103
2023-05-01 18:17:29 +00:00
Aart Bik
86888e420c [mlir][sparse][gpu] generate proper memcpy in/out host and device
The host registration is a convenient way to get CUDA kernels
running, but it may be slow and does not work for all buffers
(like global constants). This revision uses proper alloc/copy/dealloc
chains for buffers, using asynchronous chains
to increase overlap. The host registration mechanism is
kept under a flag for the output, just for experimentation
purposes while this project ramps up.
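
A sketch of the generated pattern, using the gpu dialect's async tokens to
chain alloc, copies, and dealloc (values illustrative):

```
func.func @device_roundtrip(%host: memref<?xf64>, %n: index) {
  %t0 = gpu.wait async
  %dev, %t1 = gpu.alloc async [%t0] (%n) : memref<?xf64>
  %t2 = gpu.memcpy async [%t1] %dev, %host : memref<?xf64>, memref<?xf64>
  // ... kernel launches chained on %t2 ...
  %t3 = gpu.memcpy async [%t2] %host, %dev : memref<?xf64>, memref<?xf64>
  %t4 = gpu.dealloc async [%t3] %dev : memref<?xf64>
  gpu.wait [%t4]
  return
}
```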

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D148682
2023-04-21 09:30:42 -07:00
Peiming Liu
a7cfcc686b [mlir][sparse] fix crash when generating coiteration loop with compressed-hi DLT.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D148842
2023-04-20 21:15:49 +00:00
Peiming Liu
fd2211d84a use heap memory for position buffer allocated for PackOp.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D148818
2023-04-20 20:26:01 +00:00
Peiming Liu
7864d736cf [mlir][sparse] extend pack operation to support packing a batched COO type
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D148670
2023-04-20 01:35:30 +00:00
Peiming Liu
abd66d918a [mlir][sparse] support iteration over compressed-hi dimension level in loop emitter
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D148668
2023-04-20 00:57:08 +00:00
Peiming Liu
b9589545c4 [mlir][sparse] introduce a new compressed(hi) dimension level type
`compressed(hi)` is similar to `compressed`, but instead of reusing the previous position high as the current position low, it uses a pair of positions for each sparse index.

The patch only introduces the definition (syntax) but does not provide codegen implementation.
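
To make the difference concrete, a sketch of the positions buffer for a level with three parent entries:

```
// standard "compressed": n+1 positions; hi of entry i doubles as lo of entry i+1
//   positions = [ lo0, lo1, lo2, hi2 ]
// "compressed(hi)": an explicit (lo, hi) pair per entry
//   positions = [ lo0, hi0, lo1, hi1, lo2, hi2 ]
```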

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D148664
2023-04-18 23:26:11 +00:00
Aart Bik
4889214a48 [mlir][sparse][gpu] generate single module, unique kernel names
This fixes a TODO in the first version.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D148406
2023-04-15 17:25:36 -07:00
Peiming Liu
5fd9d80135 [mlir][sparse] extend loop emitter to emit slice driven loops
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D142930
2023-04-13 03:29:40 +00:00
Aart Bik
19466ebc7f [mlir][sparse][gpu] a first prototype sparse GPU code generator
This implements a proof-of-concept GPU code generator
for the sparse compiler pipeline, currently only capable
of generating CUDA threads for outermost parallel loops.

The objective, obviously, is to grow this concept
into a full-blown GPU code generator, capable of the
right combination of code generation as well as exploiting
idiomatic kernels or vendor-specific libraries (think cuSPARSE).

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D147483
2023-04-05 11:32:06 -07:00
Peiming Liu
7b86f7c5d4 [mlir][sparse] support sparse bufferization.alloc_tensor with copy argument.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D147358
2023-03-31 22:27:23 +00:00
bixia1
6071f6fd67 [mlir][sparse] Fix a problem in handling data type conversion.
Previously, the genCast function generated arith.trunci for converting f32 to
i32. Fix the function to use mlir::convertScalarToDtype to correctly handle
conversion cases beyond index casting.
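
For illustration: arith.trunci only narrows integers (e.g. i64 to i32) and
cannot take a float operand, so an f32 to i32 conversion needs a
float-to-int op, which convertScalarToDtype selects; a minimal sketch:

```
func.func @cast_f32_to_i32(%x: f32) -> i32 {
  // f32 -> i32 requires a float conversion; arith.trunci here would be invalid IR
  %0 = arith.fptosi %x : f32 to i32
  return %0 : i32
}
```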

Add a test case for codegen the sparse_tensor.convert op.

Reviewed By: aartbik, Peiming, wrengr

Differential Revision: https://reviews.llvm.org/D147272
2023-03-30 14:54:53 -07:00