Commit Graph

471 Commits

Author SHA1 Message Date
Aart Bik
f388a3a446 [mlir][sparse] update doc and examples of the [dis]assemble operations (#88213)
The doc and examples of the [dis]assemble operations did not reflect all
the recent changes on order of the operands. Also clarified some of the
text.
2024-04-10 09:42:12 -07:00
Matthias Springer
a27d886ce4 [mlir][linalg][bufferize] Fix element-wise access optimization for sparse tensors (#87305)
`linalg.generic` ops with sparse tensors do not necessarily bufferize to
element-wise access, because insertions into a sparse tensor may change
the layout of (or reallocate) the underlying sparse data structures.
2024-04-03 09:57:25 +09:00
Aart Bik
3324f4d4f4 [mlir][sparse] avoid incompatible linalg fuse-into-consumer (#86752)
This fixes an "infinite" loop bug, where the incoming IR was repeatedly
rewritten while adding identical cast operations. The test for
compatible types should include the notion of an encoding. If it
differs, then a
naive fusion into the consumer is invalid.
2024-03-26 17:16:03 -07:00
Aart Bik
2e6b18b3f3 [mlir][sparse] add example to new operation doc, and roundtrip test (#85711) 2024-03-18 17:06:02 -07:00
Aart Bik
f3a8af07fa [mlir][sparse] best effort finalization of escaping empty sparse tensors (#85482)
This change lifts the restriction that purely allocated empty sparse
tensors cannot escape the method. Instead it makes a best effort to add
a finalizing operation before the escape.

This assumes that
(1) we never build sparse tensors across method boundaries
    (e.g. allocate in one, insert in other method)
(2) if we have other uses of the empty allocation in the
    same method, we assume that either that op will fail
    or will do the finalization for us.

This is best-effort, but fixes some very obvious missing cases.
2024-03-15 16:43:09 -07:00
Peiming Liu
94e27c265a [mlir][sparse] reuse tensor.insert operation to insert elements into … (#84987)
…a sparse tensor.
2024-03-12 16:59:17 -07:00
Peiming Liu
fc9f1d49aa [mlir][sparse] use a consistent order between [dis]assembleOp and sto… (#84079)
…rage layout.
2024-03-06 09:57:41 -08:00
Peiming Liu
52b69aa32f [mlir][sparse] support sparsifying batch levels (#83898) 2024-03-04 14:39:06 -08:00
Aart Bik
8dfa1d878d [mlir][sparse] add roundtrip and invalid tests for sparse_tensor.print (#83349) 2024-02-28 14:52:43 -08:00
Peiming Liu
0d1f95760b [mlir][sparse] support type conversion from batched sparse tensors to… (#83163)
… memrefs.
2024-02-27 12:05:28 -08:00
Peiming Liu
56d58295dd [mlir][sparse] Introduce batch level format. (#83082) 2024-02-26 16:08:28 -08:00
Peiming Liu
f40ee6e83f [mlir][sparse] assemble SoA COO correctly. (#82449) 2024-02-20 18:46:34 -08:00
Peiming Liu
f740366fa6 [mlir][sparse] support type conversion from SoA COO to memrefs. (#82398) 2024-02-20 11:19:13 -08:00
Peiming Liu
11705afc19 [mlir][sparse] deallocate tmp coo buffer generated during stage-spars… (#82017)
…e-ops pass.
2024-02-17 12:17:57 -08:00
Peiming Liu
088c7ce429 [mlir][sparse] introduce SoA level property on singleton level. (#81942) 2024-02-15 16:41:10 -08:00
Aart Bik
4d273b948e [mlir][sparse] ensure [dis]assembler wrapper methods properly inline (#81907) 2024-02-15 11:39:32 -08:00
Yinying Li
2a6b521b36 [mlir][sparse] Add more tests and verification for n:m (#81186)
1. Add python test for n out of m
2. Add more methods for python binding
3. Add verification for n:m and invalid encoding tests
4. Add e2e test for n:m

Previous PRs for n:m #80501 #79935
2024-02-09 14:34:36 -05:00
Yinying Li
e5924d6499 [mlir][sparse] Implement parsing n out of m (#79935)
1. Add parsing methods for block[n, m].
2. Encode n and m with the newly extended 64-bit LevelType enum.
3. Update 2:4 methods names/comments to n:m.
2024-02-08 14:38:42 -05:00
Aart Bik
5a9af39aab [mlir][sparse] made sparse vectorizer more robust on position of invariants (#80766)
Because the sparse vectorizer relies on the code coming out of the
sparsifier, the "patterns" are not always made very general. However, a
recent change in the generated code revealed an obvious situation where
the subscript analysis could be made a bit more robust.

Fixes:
https://github.com/llvm/llvm-project/issues/79897
2024-02-05 16:12:47 -08:00
Yinying Li
cd481fa827 [mlir][sparse] Change LevelType enum to 64 bit (#80501)
1. C++ enum is set through enum class LevelType : uint_64.
2. C enum is set through typedef uint_64 level_type. It is due to the
limitations in Windows build: setting enum width to ui64 is not
supported in C.
2024-02-05 17:00:52 -05:00
Aart Bik
d00e6d07b1 [mlir][sparse] refine sparse assembler strategy (#80521)
Rewrite *all* public methods, making original internal, private methods,
and exposing wrappers under the original name. This works a bit better
in practice (when combined with c-interface mechanism of torch-mlir for
example).
2024-02-05 10:48:18 -08:00
Peiming Liu
4a653b4df5 [mlir][sparse] Support pretty print to debug sparse iteration. (#80207) 2024-02-01 15:28:36 -08:00
Peiming Liu
07bf1ddb4e [mlir][sparse] support non-id map for [Dis]assembleOp (#80355) 2024-02-01 15:11:33 -08:00
Aart Bik
33b463ad99 [mlir][sparse] external entry method wrapper for sparse tensors (#80326)
Similar to the emit_c_interface, this pull request adds a pass that
converts public entry methods that use sparse tensors as input
parameters and/or output return values into wrapper functions that
[dis]assemble the individual tensors that constitute the actual storage
used externally into MLIR sparse tensors. This pass can be used to
prepare the public entry methods of a program that is compiled by the
MLIR sparsifier to interface with an external runtime, e.g., when
passing sparse tensors as numpy arrays from and to Python. Note that
eventual bufferization decisions (e.g. who [de]allocates the underlying
memory) should be resolved in agreement with the external runtime
(Python, PyTorch, JAX, etc.)
2024-02-01 13:32:52 -08:00
Peiming Liu
298412b578 [mlir][sparse] setup SparseIterator to help generating code to traverse a sparse tensor level. (#78345) 2024-01-24 11:33:06 -08:00
Oleksandr "Alex" Zinenko
2798b72ae7 [mlir] introduce debug transform dialect extension (#77595)
Introduce a new extension for simple print-debugging of the transform
dialect scripts. The initial version of this extension consists of two
ops that are printing the payload objects associated with transform
dialect values. Similar ops were already available in the test extenion
and several downstream projects, and were extensively used for testing.
2024-01-12 13:24:02 +01:00
Matthias Springer
bb6d5c2200 [mlir][Transforms] GreedyPatternRewriteDriver: Do not CSE constants during iterations (#75897)
The `GreedyPatternRewriteDriver` tries to iteratively fold ops and apply
rewrite patterns to ops. It has special handling for constants: they are
CSE'd and sometimes moved to parent regions to allow for additional
CSE'ing. This happens in `OperationFolder`.

To allow for efficient CSE'ing, `OperationFolder` maintains an internal
lookup data structure to find the existing constant ops with the same
value for each `IsolatedFromAbove` region:
```c++
/// A mapping between an insertion region and the constants that have been
/// created within it.
DenseMap<Region *, ConstantMap> foldScopes;
```

Rewrite patterns are allowed to modify operations. In particular, they
may move operations (including constants) from one region to another
one. Such an IR rewrite can make the above lookup data structure
inconsistent.

We encountered such a bug in a downstream project. This bug materialized
in the form of an op that uses the result of a constant op from a
different `IsolatedFromAbove` region (that is not accessible).

This commit changes the behavior of the `GreedyPatternRewriteDriver`
such that `OperationFolder` is used to CSE constants at the beginning of
each iteration (as the worklist is populated), but no longer during an
iteration. `OperationFolder` is no longer used after populating the
worklist, so we do not have to care about inconsistent state in the
`OperationFolder` due to IR rewrites. The `GreedyPatternRewriteDriver`
now performs the op folding by itself instead of calling
`OperationFolder::tryToFold`.

This change changes the order of constant ops in test cases, but not the
region in which they appear. All broken test cases were fixed by turning
`CHECK` into `CHECK-DAG`.

Alternatives considered: The state of `OperationFolder` could be
partially invalidated with every `notifyOperationModified` notification.
That is more fragile than the solution in this commit because incorrect
rewriter API usage can lead to missing notifications and hard-to-debug
`IsolatedFromAbove` violations. (It did not fix the above mention bug in
a downstream project, which could be due to incorrect rewriter API usage
or due to another conceptual problem that I missed.) Moreover, ops are
frequently getting modified during a greedy pattern rewrite, so we would
likely keep invalidating large parts of the state of `OperationFolder`
over and over.

Migration guide: Turn `CHECK` into `CHECK-DAG` in test cases. Constant
ops are no longer folded during a greedy pattern rewrite. If you rely on
folding (and rematerialization) of constant ops during a greedy pattern
rewrite, turn the folder into a pattern.
2024-01-05 09:22:18 +01:00
Aart Bik
41a07e668c [mlir][sparse] recognize NVidia 2:4 type for matmul (#76758)
This removes the temporary DENSE24 attribute and replaces it with proper
recognition of dense to 24 conversion. The compressionh will be
performed on the device prior to performing the matrix mult. Note that
we no longer need to start with the linalg version, we can lift this to
the proper named linalg op. Also renames some files into more consistent
names.
2024-01-02 14:44:24 -08:00
Matthias Springer
10056c821a [mlir][SCF] scf.parallel: Make reductions part of the terminator (#75314)
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
2023-12-20 11:06:27 +09:00
Peiming Liu
6c06bde7c4 [mlir][sparse] support loop range query using SparseTensorLevel. (#75670) 2023-12-15 16:33:31 -08:00
Yinying Li
31b72b0742 [mlir][sparse]Make isBlockSparsity more robust (#75113)
1. A single dimension can either be blocked (with floordiv and mod pair)
or non-blocked. Mixing them would be invalid.
2. Block size should be non-zero value.
2023-12-12 13:43:03 -05:00
Aart Bik
d96f46dd20 [mlir][sparse] fix bug in custom reduction scalarization code (#74898)
Bug found with BSR of "spy" SDDMM method
2023-12-11 10:22:17 -08:00
Peiming Liu
baa192ea65 [mlir][sparse] optimize memory loads to SSA values when generating sp… (#74787)
…arse conv.
2023-12-08 09:22:19 -08:00
Peiming Liu
097d2f1417 [mlir][sparse] optimize memory load to SSA value when generating spar… (#74750)
…se conv kernel.
2023-12-07 12:00:25 -08:00
Peiming Liu
b6cad75e07 [mlir][sparse] refactoring: using util functions to query the index to load from position array for slice-driven loop. (#73986) 2023-11-30 16:40:11 -08:00
Peiming Liu
2cc4b3d07c [mlir][sparse] code cleanup using the assumption that dim2lvl maps ar… (#72894)
…e simplified.
2023-11-20 10:25:42 -08:00
Peiming Liu
573c4db947 [mlir][sparse] refine reinterpret_map test cases (#72684) 2023-11-17 10:04:56 -08:00
Aart Bik
83cf0dc982 [mlir][sparse] implement direct IR alloc/empty/new for non-permutations (#72585)
This change implements the correct *level* sizes set up for the direct
IR codegen fields in the sparse storage scheme. This brings libgen and
codegen together again.

This is step 3 out of 3 to make sparse_tensor.new work for BSR
2023-11-16 17:17:41 -08:00
Yinying Li
c5a67e16b6 [mlir][sparse] Use variable instead of inlining sparse encoding (#72561)
Example:

#CSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : compressed),
}>

// CHECK: #[[$CSR.*]] = #sparse_tensor.encoding<{ map = (d0, d1) -> (d0
: dense, d1 : compressed) }>
// CHECK-LABEL: func private @sparse_csr(
// CHECK-SAME: tensor<?x?xf32, **#[[$CSR]]**>)
func.func private @sparse_csr(tensor<?x?xf32, #CSR>)
2023-11-16 19:30:21 -05:00
Peiming Liu
06a65ce500 [mlir][sparse] schedule sparse kernels in a separate pass from sparsification. (#72423) 2023-11-15 12:16:05 -08:00
Tim Harvey
dce7a7cf69 Changed all code and comments that used the phrase "sparse compiler" to instead use "sparsifier" (#71875)
The changes in this p.r. mostly center around the tests that use the
flag sparse_compiler (also: sparse-compiler).
2023-11-15 20:12:35 +00:00
Aart Bik
a40900211a [mlir][sparse] set rwx permissions to consistent values (#72311)
some files had "x" permission set, others were missing "r"
2023-11-14 13:32:55 -08:00
Aart Bik
5f32bcfbae [mlir][sparse][gpu] re-enable all GPU libgen tests (#72185)
Previous change no longer properly used the GPU libgen pass (even though
most tests still passed falling back to CPU). This revision puts the
proper pass order into place. Also bit of a cleanup of CPU codegen vs.
libgen setup.
2023-11-14 09:06:15 -08:00
Peiming Liu
269685545e [mlir][sparse] remove filter-loop based algorithm support to handle a… (#71840)
…ffine subscript expressions.
2023-11-13 11:36:49 -08:00
Peiming Liu
c99951d491 [mlir][sparse] end-to-end matmul between Dense and BSR tensors (#71448) 2023-11-08 11:28:00 -08:00
Tim Harvey
c43e627457 Changed the phrase sparse-compiler to sparsifier in comments (#71578)
When the Powers That Be decided that the name "sparse compiler" should
be changed to "sparsifier", we negected to change some of the comments
in the code; this pull request completes the name change.
2023-11-07 20:55:00 +00:00
Aart Bik
a4eadd7fb6 [mlir][sparse][gpu] add GPU BSR SDDMM check test (#71491)
also minor edits in other GPU check tests
2023-11-06 22:36:25 -08:00
Christian Ulmann
7ed96b1c0d [MLIR][LLVM] Remove last typed pointer remnants from tests (#71232)
This commit removes all LLVM dialect typed pointers from the lit tests.
Typed pointers have been deprecated for a while now and it's planned to
soon remove them from the LLVM dialect.

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-11-04 14:13:31 +01:00
Peiming Liu
c0d78c4232 [mlir][sparse] Implement rewriters to reinterpret maps on alloc_tenso… (#70993)
…r operation
2023-11-01 18:15:11 -07:00
Peiming Liu
3426d330a7 [mlir][sparse] Implement rewriters to reinterpret maps on foreach (#70868) 2023-11-01 12:11:47 -07:00