Commit Graph

493 Commits

Author SHA1 Message Date
Peiming Liu
fb8f492a1c [mlir][sparse] clone an empty sparse tensor when fusing convert into producer. (#92158)
2024-05-14 13:26:49 -07:00
Aart Bik
70e227a404 [mlir][sparse] recognize ReLu operation during sparsification (#92016)
This is a proof-of-concept recognition of the most basic forms of ReLu
operations, used to showcase sparsification of end-to-end PyTorch
models. In the long run, we must avoid lowering such constructs this
early (and thus avoid the need to raise them back).

See discussion at

https://discourse.llvm.org/t/min-max-abs-relu-recognition-starter-project/78918
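As a concrete illustration (an editor's sketch, not from the commit; the encoding, shapes, and names are hypothetical), the most basic ReLu form is a `linalg.generic` whose body takes the maximum of the input and the zero constant:

```mlir
// Hypothetical sparse vector encoding; any sparse encoding would do.
#SV = #sparse_tensor.encoding<{ map = (d0) -> (d0 : compressed) }>

// ReLu in its most basic form: max(x, 0) inside a linalg.generic.
func.func @relu(%arg: tensor<1024xf32, #SV>) -> tensor<1024xf32, #SV> {
  %zero = arith.constant 0.0 : f32
  %init = tensor.empty() : tensor<1024xf32, #SV>
  %0 = linalg.generic {
      indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>],
      iterator_types = ["parallel"]}
      ins(%arg : tensor<1024xf32, #SV>)
      outs(%init : tensor<1024xf32, #SV>) {
    ^bb0(%x: f32, %out: f32):
      %m = arith.maximumf %x, %zero : f32
      linalg.yield %m : f32
  } -> tensor<1024xf32, #SV>
  return %0 : tensor<1024xf32, #SV>
}
```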
2024-05-13 14:02:29 -07:00
Peiming Liu
37ffbbb195 [mlir][tensor][sparse] don't drop encoding when infer result type (#91817)
A general question is: is it possible to support hooks here to infer the
encoding? E.g., when the extracted tensor slice is rank-reduced, the
encoding needs to be updated accordingly as well.
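For example (an editor's sketch with a hypothetical CSR type), the inferred result type of a slice should keep the source encoding rather than decay to a dense tensor type:

```mlir
#CSR = #sparse_tensor.encoding<{ map = (d0, d1) -> (d0 : dense, d1 : compressed) }>

func.func @slice(%t: tensor<16x16xf32, #CSR>) -> tensor<8x8xf32, #CSR> {
  // The result type keeps #CSR instead of dropping the encoding.
  %s = tensor.extract_slice %t[0, 0] [8, 8] [1, 1]
      : tensor<16x16xf32, #CSR> to tensor<8x8xf32, #CSR>
  return %s : tensor<8x8xf32, #CSR>
}
```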
2024-05-13 09:53:15 -07:00
Peiming Liu
13af97a70e [mlir][sparse] allow multiple COO segments in sparse encodings. (#91786)
**NOTE**: we still have implementation holes when handling multiple COO
segments in the encoding, but the format itself should now be considered
legal.
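A sketch of what such an encoding can look like (illustrative only; a COO segment is a `compressed(nonunique)` level followed by `singleton` levels):

```mlir
// Two COO segments within a single encoding (hypothetical 4-D example).
#TwoCOO = #sparse_tensor.encoding<{
  map = (d0, d1, d2, d3) -> (d0 : compressed(nonunique), d1 : singleton,
                             d2 : compressed(nonunique), d3 : singleton)
}>
```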
2024-05-10 11:36:01 -07:00
Yinying Li
83f3b1cb48 [mlir][sparse] Add verification for explicit/implicit value (#90111)
1. Verify that the type of the explicit/implicit value is the same as
the tensor element type.
2. Verify that the implicit value can only be zero (see the sketch below).
3. Verify that explicit/implicit values are numeric.
4. Fix the type change issue caused by SparseTensorType(enc).
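A sketch of an encoding these rules reject (hypothetical values; the exact diagnostics are not part of the commit message):

```mlir
// Both fields violate the new rules for a tensor with f32 elements:
// the i32 explicit value does not match the element type, and the
// implicit value is non-zero.
#Bad = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : compressed),
  explicitVal = 1 : i32,
  implicitVal = 1 : i32
}>
// Using it on tensor<4x4xf32, #Bad> would now fail verification.
```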
2024-05-07 20:28:39 -04:00
Aart Bik
5c5116556f [mlir][sparse] force a properly sized view on pos/crd/val under codegen (#91288)
Codegen "vectors" for pos/crd/val use the capacity as memref size, not
the actual used size. Although the sparsifier itself always uses just
the defined pos/crd/val parts, printing these and passing them back to a
runtime environment could benefit from wrapping the basic pos/crd/val
getters into a proper memref view that sets the right size.
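Conceptually (an editor's sketch, not the generated code; buffer sizes and names are hypothetical), the fix amounts to wrapping the capacity-sized buffer in a view of the used size, e.g. via `memref.subview`:

```mlir
// %values has capacity for 1024 entries, but only %used are defined.
func.func @sized_view(%values: memref<1024xf64>, %used: index)
    -> memref<?xf64, strided<[1]>> {
  // Present a properly sized window over the capacity-sized buffer.
  %view = memref.subview %values[0] [%used] [1]
      : memref<1024xf64> to memref<?xf64, strided<[1]>>
  return %view : memref<?xf64, strided<[1]>>
}
```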
2024-05-07 09:20:56 -07:00
Aart Bik
fc398a112d [mlir][sparse] test optimization of binary-valued operations (#90986)
Make sure consumer-producer fusion happens (to avoid the temporary dense
tensor) and constant folding occurs in the generated code.
2024-05-03 10:41:16 -07:00
Peiming Liu
fc83eda46e [mlir][sparse] make sparse compiler more admissible. (#90927) 2024-05-02 18:53:38 -07:00
Yinying Li
e71eacc5b1 [mlir][sparse] Support explicit/implicit value for complex type (#90771) 2024-05-02 12:28:34 -04:00
Peiming Liu
78885395c8 [mlir][sparse] support tensor.pad on CSR tensors (#90687) 2024-05-01 15:37:38 -07:00
Gaurav Shukla
97069a8619 [MLIR] Generalize expand_shape to take shape as explicit input (#90040)
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.

The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
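For instance (a small sketch; the computation of the dynamic extent is illustrative):

```mlir
func.func @reshape(%t: tensor<?xf32>) -> tensor<?x4xf32> {
  %c0 = arith.constant 0 : index
  %c4 = arith.constant 4 : index
  %n = tensor.dim %t, %c0 : tensor<?xf32>
  %d0 = arith.divui %n, %c4 : index
  // The output shape is an explicit mix of SSA values and constants.
  %e = tensor.expand_shape %t [[0, 1]] output_shape [%d0, 4]
      : tensor<?xf32> into tensor<?x4xf32>
  return %e : tensor<?x4xf32>
}
```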

Differential Revision: https://reviews.llvm.org/D140821

---------

Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
2024-04-30 09:28:35 -07:00
Aart Bik
65ee8f10b2 [mlir][sparse] fold explicit value during sparsification (#90530)
This ensures the explicit value is generated (and not a load from the
values array). Note that not storing the values array at all is still
TBD; this is just the very first step.
2024-04-29 18:06:07 -07:00
Peiming Liu
3aeb28b93f [mlir][sparse] fold sparse convert into producer linalg op. (#89999) 2024-04-26 10:48:15 -07:00
Yinying Li
a10d67f9fb [mlir][sparse] Enable explicit and implicit value in sparse encoding (#88975)
1. Explicit value means the non-zero value in a sparse tensor. If
explicitVal is set, then all the non-zero values in the tensor have the
same explicit value. The default value Attribute() indicates that it is
not set.

2. Implicit value means the "zero" value in a sparse tensor. If
implicitVal is set, then the "zero" value in the tensor is equal to the
implicit value. For now, we only support `0` as the implicit value but
it could be extended in the future. The default value Attribute()
indicates that the implicit value is `0` (same type as the tensor
element type).

Example:

```
#CSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : compressed),
  posWidth = 64,
  crdWidth = 64,
  explicitVal = 1 : i64,
  implicitVal = 0 : i64
}>
```

Note: this PR tests that implicitVal can be set to other values as
well. A follow-up PR will add a verifier and reject any non-zero
implicitVal.
2024-04-24 16:20:25 -07:00
Peiming Liu
ea3eeb483f [mlir][sparse] fuse concat and extract_slice op if possible. (#89825) 2024-04-24 13:51:41 -07:00
Mehdi Amini
8c0341df02 Revert "[MLIR] Generalize expand_shape to take shape as explicit input" (#89540)
Reverts llvm/llvm-project#69267

This broke some bots.
2024-04-21 14:33:48 +02:00
Gaurav Shukla
e095d978ba [MLIR] Generalize expand_shape to take shape as explicit input (#69267)
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.

The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.

Differential Revision: https://reviews.llvm.org/D140821

Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
2024-04-21 07:37:02 -04:00
Peiming Liu
481bd5d416 [mlir][sparse] introduce sparse_tensor.extract_iteration_space operation. (#88554)
A `sparse_tensor.extract_iteration_space %tensor at %iterator` extracts
a *sparse* iteration space defined by `%tensor`; the operation to
traverse the iteration space will be introduced in follow-up PRs.
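A sketch of the intended usage (an editor's fragment under assumed syntax; the level range and types may differ in detail):

```mlir
#CSR = #sparse_tensor.encoding<{ map = (d0, d1) -> (d0 : dense, d1 : compressed) }>

// Fragment: extract the iteration space over the row level (level 0)
// of a CSR tensor; %csr is assumed to be defined elsewhere.
%space = sparse_tensor.extract_iteration_space %csr lvls = 0
    : tensor<?x?xf64, #CSR> -> !sparse_tensor.iter_space<#CSR, lvls = 0>
```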
2024-04-16 11:32:30 -07:00
Peiming Liu
b9556532c7 Revert "[mlir][sparse] introduce sparse_tensor.iterate operation" (#88953)
Reverts llvm/llvm-project#88807 (merged by mistake)
2024-04-16 11:31:33 -07:00
Peiming Liu
8debcf03c5 [mlir][sparse] introduce sparse_tensor.iterate operation (#88807)
A `sparse_tensor.iterate` operation iterates over a sparse iteration
space extracted by the `sparse_tensor.extract_iteration_space` operation
introduced in https://github.com/llvm/llvm-project/pull/88554.

*DO NOT MERGE* before https://github.com/llvm/llvm-project/pull/88554
2024-04-16 11:31:09 -07:00
Kunwar Grover
6f1e23b47d [MLIR][Bufferization] Choose default memory space in tensor copy insertion (#88500)
Tensor copy insertion currently uses memory_space = 0 when creating a
tensor copy using alloc_tensor. This memory space should instead be the
default memory space provided in the bufferization options.
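For reference (a sketch; the space value `1` is illustrative), `bufferization.alloc_tensor` can carry an explicit memory space, which copy insertion should now take from the options:

```mlir
func.func @copy_with_space(%t: tensor<16xf32>) -> tensor<16xf32> {
  // A tensor copy placed in (hypothetical) memory space 1 rather than
  // the hard-coded default space 0.
  %copy = bufferization.alloc_tensor() copy(%t) {memory_space = 1 : i64}
      : tensor<16xf32>
  return %copy : tensor<16xf32>
}
```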
2024-04-12 17:56:46 +02:00
Aart Bik
5122a2c232 [mlir][sparse] allow for direct-out passing of sparse tensor buffers (#88327)
In order to support various external frameworks (JAX vs PyTorch) we need
a bit more flexibility in [dis]assembling external buffers to and from
sparse tensors in MLIR land. This PR adds a direct-out option that
avoids the rigid pre-allocated copy-out semantics.

Note that over time, we expect the [dis]assemble operations to converge
into something that supports all sorts of external frameworks. Until
then, this option helps in experimenting with different approaches.
2024-04-11 10:07:24 -07:00
Aart Bik
f388a3a446 [mlir][sparse] update doc and examples of the [dis]assemble operations (#88213)
The doc and examples of the [dis]assemble operations did not reflect
all the recent changes to the order of the operands. Also clarified some
of the text.
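For orientation (an editor's sketch with a hypothetical 2x8 CSR tensor), the documented operand order puts the level arrays first and the values last:

```mlir
#CSR = #sparse_tensor.encoding<{ map = (d0, d1) -> (d0 : dense, d1 : compressed) }>

func.func @assemble(%pos: tensor<3xindex>, %crd: tensor<4xindex>,
                    %val: tensor<4xf64>) -> tensor<2x8xf64, #CSR> {
  // Levels (positions, coordinates) first, values last.
  %t = sparse_tensor.assemble (%pos, %crd), %val
      : (tensor<3xindex>, tensor<4xindex>), tensor<4xf64>
      to tensor<2x8xf64, #CSR>
  return %t : tensor<2x8xf64, #CSR>
}
```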
2024-04-10 09:42:12 -07:00
Matthias Springer
a27d886ce4 [mlir][linalg][bufferize] Fix element-wise access optimization for sparse tensors (#87305)
`linalg.generic` ops with sparse tensors do not necessarily bufferize to
element-wise access, because insertions into a sparse tensor may change
the layout of (or reallocate) the underlying sparse data structures.
2024-04-03 09:57:25 +09:00
Aart Bik
3324f4d4f4 [mlir][sparse] avoid incompatible linalg fuse-into-consumer (#86752)
This fixes an "infinite" loop bug, where the incoming IR was repeatedly
rewritten while adding identical cast operations. The test for
compatible types should include the notion of an encoding: if the
encoding differs, then a naive fusion into the consumer is invalid.
2024-03-26 17:16:03 -07:00
Aart Bik
2e6b18b3f3 [mlir][sparse] add example to new operation doc, and roundtrip test (#85711) 2024-03-18 17:06:02 -07:00
Aart Bik
f3a8af07fa [mlir][sparse] best effort finalization of escaping empty sparse tensors (#85482)
This change lifts the restriction that purely allocated empty sparse
tensors cannot escape the method. Instead, it makes a best effort to add
a finalizing operation before the escape.

This assumes that
(1) we never build sparse tensors across method boundaries
    (e.g., allocate in one method, insert in another);
(2) if there are other uses of the empty allocation in the
    same method, we assume that op will either fail
    or do the finalization for us.

This is best-effort, but fixes some very obvious missing cases.
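A minimal sketch of the effect (assuming the finalizing operation is `sparse_tensor.load ... hasInserts`; the encoding and names are hypothetical):

```mlir
#SV = #sparse_tensor.encoding<{ map = (d0) -> (d0 : compressed) }>

func.func @escaping_empty() -> tensor<8xf64, #SV> {
  %t = tensor.empty() : tensor<8xf64, #SV>
  // Best-effort finalization added before the empty tensor escapes.
  %f = sparse_tensor.load %t hasInserts : tensor<8xf64, #SV>
  return %f : tensor<8xf64, #SV>
}
```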
2024-03-15 16:43:09 -07:00
Peiming Liu
94e27c265a [mlir][sparse] reuse tensor.insert operation to insert elements into a sparse tensor. (#84987)
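After this change, a plain `tensor.insert` works on sparse operands, e.g. (a sketch with a hypothetical CSR type):

```mlir
#CSR = #sparse_tensor.encoding<{ map = (d0, d1) -> (d0 : dense, d1 : compressed) }>

func.func @insert(%t: tensor<8x8xf64, #CSR>, %v: f64,
                  %i: index, %j: index) -> tensor<8x8xf64, #CSR> {
  // Reuses tensor.insert instead of a dialect-specific insertion op.
  %r = tensor.insert %v into %t[%i, %j] : tensor<8x8xf64, #CSR>
  return %r : tensor<8x8xf64, #CSR>
}
```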
2024-03-12 16:59:17 -07:00
Peiming Liu
fc9f1d49aa [mlir][sparse] use a consistent order between [dis]assembleOp and storage layout. (#84079)
2024-03-06 09:57:41 -08:00
Peiming Liu
52b69aa32f [mlir][sparse] support sparsifying batch levels (#83898) 2024-03-04 14:39:06 -08:00
Aart Bik
8dfa1d878d [mlir][sparse] add roundtrip and invalid tests for sparse_tensor.print (#83349) 2024-02-28 14:52:43 -08:00
Peiming Liu
0d1f95760b [mlir][sparse] support type conversion from batched sparse tensors to memrefs. (#83163)
2024-02-27 12:05:28 -08:00
Peiming Liu
56d58295dd [mlir][sparse] Introduce batch level format. (#83082) 2024-02-26 16:08:28 -08:00
Peiming Liu
f40ee6e83f [mlir][sparse] assemble SoA COO correctly. (#82449) 2024-02-20 18:46:34 -08:00
Peiming Liu
f740366fa6 [mlir][sparse] support type conversion from SoA COO to memrefs. (#82398) 2024-02-20 11:19:13 -08:00
Peiming Liu
11705afc19 [mlir][sparse] deallocate tmp coo buffer generated during stage-sparse-ops pass. (#82017)
2024-02-17 12:17:57 -08:00
Peiming Liu
088c7ce429 [mlir][sparse] introduce SoA level property on singleton level. (#81942) 2024-02-15 16:41:10 -08:00
Aart Bik
4d273b948e [mlir][sparse] ensure [dis]assembler wrapper methods properly inline (#81907) 2024-02-15 11:39:32 -08:00
Yinying Li
2a6b521b36 [mlir][sparse] Add more tests and verification for n:m (#81186)
1. Add Python test for n out of m
2. Add more methods for the Python bindings
3. Add verification for n:m and invalid encoding tests
4. Add e2e test for n:m

Previous PRs for n:m #80501 #79935
2024-02-09 14:34:36 -05:00
Yinying Li
e5924d6499 [mlir][sparse] Implement parsing n out of m (#79935)
1. Add parsing methods for block[n, m] (see the encoding sketch below).
2. Encode n and m with the newly extended 64-bit LevelType enum.
3. Update 2:4 method names/comments to n:m.
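A sketch of an n:m (here 2:4) encoding in this notation (an editor's example; the exact level keyword follows the direction of these PRs and may differ in detail):

```mlir
// NVidia 2:4 sparsity: at most 2 nonzeros in every block of 4.
#NV24 = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense,
                     d1 floordiv 4 : dense,
                     d1 mod 4 : structured[2, 4])
}>
```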
2024-02-08 14:38:42 -05:00
Aart Bik
5a9af39aab [mlir][sparse] made sparse vectorizer more robust on position of invariants (#80766)
Because the sparse vectorizer relies on the code coming out of the
sparsifier, the "patterns" are not always made very general. However, a
recent change in the generated code revealed an obvious situation where
the subscript analysis could be made a bit more robust.

Fixes:
https://github.com/llvm/llvm-project/issues/79897
2024-02-05 16:12:47 -08:00
Yinying Li
cd481fa827 [mlir][sparse] Change LevelType enum to 64 bit (#80501)
1. The C++ enum is set through `enum class LevelType : uint64_t`.
2. The C enum is set through `typedef uint64_t level_type`. This is due
to limitations in the Windows build: setting the enum width to 64 bits
is not supported in C.
2024-02-05 17:00:52 -05:00
Aart Bik
d00e6d07b1 [mlir][sparse] refine sparse assembler strategy (#80521)
Rewrite *all* public methods, making the originals internal, private
methods, and exposing wrappers under the original names. This works a
bit better in practice (when combined with the c-interface mechanism of
torch-mlir, for example).
2024-02-05 10:48:18 -08:00
Peiming Liu
4a653b4df5 [mlir][sparse] Support pretty print to debug sparse iteration. (#80207) 2024-02-01 15:28:36 -08:00
Peiming Liu
07bf1ddb4e [mlir][sparse] support non-id map for [Dis]assembleOp (#80355) 2024-02-01 15:11:33 -08:00
Aart Bik
33b463ad99 [mlir][sparse] external entry method wrapper for sparse tensors (#80326)
Similar to emit_c_interface, this pull request adds a pass that
converts public entry methods that use sparse tensors as input
parameters and/or output return values into wrapper functions. These
wrappers [dis]assemble the individual tensors that constitute the actual
storage used externally into MLIR sparse tensors. This pass can be used
to prepare the public entry methods of a program that is compiled by the
MLIR sparsifier to interface with an external runtime, e.g., when
passing sparse tensors as numpy arrays from and to Python. Note that
eventual bufferization decisions (e.g., who [de]allocates the underlying
memory) should be resolved in agreement with the external runtime
(Python, PyTorch, JAX, etc.).
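Conceptually (a heavily simplified editor's sketch; all function names, shapes, and types are hypothetical), the pass turns a public sparse entry point into a wrapper over raw buffers:

```mlir
#CSR = #sparse_tensor.encoding<{ map = (d0, d1) -> (d0 : dense, d1 : compressed) }>

// The original kernel becomes internal...
func.func private @_internal_main(%t: tensor<8x8xf64, #CSR>) -> f64 {
  %c0 = arith.constant 0 : index
  %v = tensor.extract %t[%c0, %c0] : tensor<8x8xf64, #CSR>
  return %v : f64
}

// ...and the public wrapper assembles the external buffers into a
// sparse tensor before calling it.
func.func @main(%pos: tensor<9xindex>, %crd: tensor<16xindex>,
                %val: tensor<16xf64>) -> f64 {
  %t = sparse_tensor.assemble (%pos, %crd), %val
      : (tensor<9xindex>, tensor<16xindex>), tensor<16xf64>
      to tensor<8x8xf64, #CSR>
  %r = call @_internal_main(%t) : (tensor<8x8xf64, #CSR>) -> f64
  return %r : f64
}
```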
2024-02-01 13:32:52 -08:00
Peiming Liu
298412b578 [mlir][sparse] setup SparseIterator to help generating code to traverse a sparse tensor level. (#78345) 2024-01-24 11:33:06 -08:00
Oleksandr "Alex" Zinenko
2798b72ae7 [mlir] introduce debug transform dialect extension (#77595)
Introduce a new extension for simple print-debugging of transform
dialect scripts. The initial version of this extension consists of two
ops that print the payload objects associated with transform dialect
values. Similar ops were already available in the test extension and
several downstream projects, and were extensively used for testing.
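For example (a sketch using the extension's `transform.debug.emit_remark_at` op under assumed surrounding syntax):

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    %funcs = transform.structured.match ops{["func.func"]} in %root
        : (!transform.any_op) -> !transform.any_op
    // Print a remark at every matched payload op for quick inspection.
    transform.debug.emit_remark_at %funcs, "matched func" : !transform.any_op
    transform.yield
  }
}
```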
2024-01-12 13:24:02 +01:00
Matthias Springer
bb6d5c2200 [mlir][Transforms] GreedyPatternRewriteDriver: Do not CSE constants during iterations (#75897)
The `GreedyPatternRewriteDriver` tries to iteratively fold ops and apply
rewrite patterns to ops. It has special handling for constants: they are
CSE'd and sometimes moved to parent regions to allow for additional
CSE'ing. This happens in `OperationFolder`.

To allow for efficient CSE'ing, `OperationFolder` maintains an internal
lookup data structure to find the existing constant ops with the same
value for each `IsolatedFromAbove` region:
```c++
/// A mapping between an insertion region and the constants that have been
/// created within it.
DenseMap<Region *, ConstantMap> foldScopes;
```

Rewrite patterns are allowed to modify operations. In particular, they
may move operations (including constants) from one region to another
one. Such an IR rewrite can make the above lookup data structure
inconsistent.

We encountered such a bug in a downstream project. This bug materialized
in the form of an op that uses the result of a constant op from a
different `IsolatedFromAbove` region (that is not accessible).

This commit changes the behavior of the `GreedyPatternRewriteDriver`
such that `OperationFolder` is used to CSE constants at the beginning of
each iteration (as the worklist is populated), but no longer during an
iteration. `OperationFolder` is no longer used after populating the
worklist, so we do not have to care about inconsistent state in the
`OperationFolder` due to IR rewrites. The `GreedyPatternRewriteDriver`
now performs the op folding by itself instead of calling
`OperationFolder::tryToFold`.

This commit changes the order of constant ops in test cases, but not
the region in which they appear. All broken test cases were fixed by
turning `CHECK` into `CHECK-DAG`.

Alternatives considered: the state of `OperationFolder` could be
partially invalidated with every `notifyOperationModified` notification.
That is more fragile than the solution in this commit because incorrect
rewriter API usage can lead to missing notifications and hard-to-debug
`IsolatedFromAbove` violations. (It did not fix the above-mentioned bug
in a downstream project, which could be due to incorrect rewriter API
usage or due to another conceptual problem that I missed.) Moreover, ops
are frequently modified during a greedy pattern rewrite, so we would
likely keep invalidating large parts of the state of `OperationFolder`
over and over.

Migration guide: Turn `CHECK` into `CHECK-DAG` in test cases. Constant
ops are no longer folded during a greedy pattern rewrite. If you rely on
folding (and rematerialization) of constant ops during a greedy pattern
rewrite, turn the folder into a pattern.
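For instance (an illustrative FileCheck fragment, not taken from the commit), a test that pinned the order of hoisted constants migrates like this:

```mlir
// Before: assumes a fixed ordering of the two constants.
// CHECK: arith.constant 0 : index
// CHECK: arith.constant 1 : index

// After: order-independent matching, robust to the new folding behavior.
// CHECK-DAG: arith.constant 0 : index
// CHECK-DAG: arith.constant 1 : index
```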
2024-01-05 09:22:18 +01:00
Aart Bik
41a07e668c [mlir][sparse] recognize NVidia 2:4 type for matmul (#76758)
This removes the temporary DENSE24 attribute and replaces it with
proper recognition of the dense-to-2:4 conversion. The compression will
be performed on the device prior to performing the matrix
multiplication. Note that we no longer need to start with the linalg
version; we can lift this to the proper named linalg op. Also renames
some files to more consistent names.
2024-01-02 14:44:24 -08:00