Commit Graph

447 Commits

Peiming Liu
298412b578 [mlir][sparse] setup SparseIterator to help generating code to traverse a sparse tensor level. (#78345) 2024-01-24 11:33:06 -08:00
Oleksandr "Alex" Zinenko
2798b72ae7 [mlir] introduce debug transform dialect extension (#77595)
Introduce a new extension for simple print-debugging of transform
dialect scripts. The initial version of this extension consists of two
ops that print the payload objects associated with transform dialect
values. Similar ops were already available in the test extension and in
several downstream projects, and were extensively used for testing.
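
For illustration, a minimal sketch of how such a print-debugging op could be
used in a transform script (the op name `transform.debug.emit_remark_at` and
the matched payload are assumptions for this sketch, not taken from the
commit):
```mlir
transform.sequence failures(propagate) {
^bb0(%root: !transform.any_op):
  // Match some payload ops (here: all func.func ops under the root) ...
  %funcs = transform.structured.match ops{["func.func"]} in %root
      : (!transform.any_op) -> !transform.any_op
  // ... and print a remark at every payload op associated with the handle.
  transform.debug.emit_remark_at %funcs, "matched function" : !transform.any_op
  transform.yield
}
```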
2024-01-12 13:24:02 +01:00
Matthias Springer
bb6d5c2200 [mlir][Transforms] GreedyPatternRewriteDriver: Do not CSE constants during iterations (#75897)
The `GreedyPatternRewriteDriver` tries to iteratively fold ops and apply
rewrite patterns to ops. It has special handling for constants: they are
CSE'd and sometimes moved to parent regions to allow for additional
CSE'ing. This happens in `OperationFolder`.

To allow for efficient CSE'ing, `OperationFolder` maintains an internal
lookup data structure to find the existing constant ops with the same
value for each `IsolatedFromAbove` region:
```c++
/// A mapping between an insertion region and the constants that have been
/// created within it.
DenseMap<Region *, ConstantMap> foldScopes;
```

Rewrite patterns are allowed to modify operations. In particular, they
may move operations (including constants) from one region to another
one. Such an IR rewrite can make the above lookup data structure
inconsistent.

We encountered such a bug in a downstream project. This bug materialized
in the form of an op that uses the result of a constant op from a
different `IsolatedFromAbove` region (that is not accessible).

This commit changes the behavior of the `GreedyPatternRewriteDriver`
such that `OperationFolder` is used to CSE constants at the beginning of
each iteration (as the worklist is populated), but no longer during an
iteration. `OperationFolder` is no longer used after populating the
worklist, so we do not have to care about inconsistent state in the
`OperationFolder` due to IR rewrites. The `GreedyPatternRewriteDriver`
now performs the op folding by itself instead of calling
`OperationFolder::tryToFold`.

This change alters the order of constant ops in test cases, but not the
region in which they appear. All broken test cases were fixed by turning
`CHECK` into `CHECK-DAG`.

Alternatives considered: The state of `OperationFolder` could be
partially invalidated with every `notifyOperationModified` notification.
That is more fragile than the solution in this commit because incorrect
rewriter API usage can lead to missing notifications and hard-to-debug
`IsolatedFromAbove` violations. (It did not fix the above-mentioned bug in
a downstream project, which could be due to incorrect rewriter API usage
or due to another conceptual problem that I missed.) Moreover, ops are
frequently getting modified during a greedy pattern rewrite, so we would
likely keep invalidating large parts of the state of `OperationFolder`
over and over.

Migration guide: Turn `CHECK` into `CHECK-DAG` in test cases. Constant
ops are no longer folded during a greedy pattern rewrite. If you rely on
folding (and rematerialization) of constant ops during a greedy pattern
rewrite, turn the folder into a pattern.
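
A hedged sketch of the test migration (constant values and capture names are
made up): order-sensitive checks on constants become order-independent ones.
```mlir
// Before: assumes a fixed ordering of the constants, which may no longer hold.
// CHECK:     %[[C0:.*]] = arith.constant 0 : index
// CHECK:     %[[C1:.*]] = arith.constant 1 : index

// After: accepts the constants in any order.
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
```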
2024-01-05 09:22:18 +01:00
Aart Bik
41a07e668c [mlir][sparse] recognize NVidia 2:4 type for matmul (#76758)
This removes the temporary DENSE24 attribute and replaces it with proper
recognition of the dense to 2:4 conversion. The compression will be
performed on the device prior to performing the matrix multiplication.
Note that we no longer need to start with the linalg version; we can
lift this to the proper named linalg op. Also renames some files to more
consistent names.
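
For instance, a named linalg op of the kind the lowering can now start from
directly (shapes and element types here are purely illustrative):
```mlir
// The 2:4 compression of one matmul operand is handled on the device; the
// input IR can simply use the named op (operands %a, %b, %c assumed).
%d = linalg.matmul ins(%a, %b : tensor<16x32xf16>, tensor<32x16xf16>)
                   outs(%c : tensor<16x16xf16>) -> tensor<16x16xf16>
```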
2024-01-02 14:44:24 -08:00
Matthias Springer
10056c821a [mlir][SCF] scf.parallel: Make reductions part of the terminator (#75314)
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = memref.load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = memref.load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
2023-12-20 11:06:27 +09:00
Peiming Liu
6c06bde7c4 [mlir][sparse] support loop range query using SparseTensorLevel. (#75670) 2023-12-15 16:33:31 -08:00
Yinying Li
31b72b0742 [mlir][sparse]Make isBlockSparsity more robust (#75113)
1. A single dimension can either be blocked (with a floordiv and mod
pair) or non-blocked; mixing the two would be invalid (see the sketch
below).
2. The block size should be a non-zero value.
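
A sketch of a well-formed blocked encoding in the sense of item 1 (a standard
2x2 BSR map, used here only as an illustration):
```mlir
// Both dimensions are blocked via a floordiv/mod pair with a non-zero block
// size; mixing blocked and non-blocked expressions for one dimension, or a
// zero block size, would now be rejected.
#BSR = #sparse_tensor.encoding<{
  map = (i, j) -> (i floordiv 2 : dense,
                   j floordiv 2 : compressed,
                   i mod 2      : dense,
                   j mod 2      : dense)
}>
```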
2023-12-12 13:43:03 -05:00
Aart Bik
d96f46dd20 [mlir][sparse] fix bug in custom reduction scalarization code (#74898)
Bug found with the BSR version of the "spy" SDDMM method
2023-12-11 10:22:17 -08:00
Peiming Liu
baa192ea65 [mlir][sparse] optimize memory loads to SSA values when generating sparse conv. (#74787)
2023-12-08 09:22:19 -08:00
Peiming Liu
097d2f1417 [mlir][sparse] optimize memory load to SSA value when generating sparse conv kernel. (#74750)
2023-12-07 12:00:25 -08:00
Peiming Liu
b6cad75e07 [mlir][sparse] refactoring: using util functions to query the index to load from position array for slice-driven loop. (#73986) 2023-11-30 16:40:11 -08:00
Peiming Liu
2cc4b3d07c [mlir][sparse] code cleanup using the assumption that dim2lvl maps are simplified. (#72894)
2023-11-20 10:25:42 -08:00
Peiming Liu
573c4db947 [mlir][sparse] refine reinterpret_map test cases (#72684) 2023-11-17 10:04:56 -08:00
Aart Bik
83cf0dc982 [mlir][sparse] implement direct IR alloc/empty/new for non-permutations (#72585)
This change implements the correct *level* size setup for the direct
IR codegen fields in the sparse storage scheme. This brings libgen and
codegen together again.

This is step 3 out of 3 to make sparse_tensor.new work for BSR.
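
A hedged sketch of the end goal (the encoding, sizes, and source type are
assumptions for illustration):
```mlir
// A 2x2 BSR encoding ...
#BSR = #sparse_tensor.encoding<{
  map = (i, j) -> (i floordiv 2 : dense, j floordiv 2 : compressed,
                   i mod 2 : dense, j mod 2 : dense)
}>
// ... materialized from an external source via sparse_tensor.new.
%bsr = sparse_tensor.new %source : !llvm.ptr to tensor<?x?xf64, #BSR>
```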
2023-11-16 17:17:41 -08:00
Yinying Li
c5a67e16b6 [mlir][sparse] Use variable instead of inlining sparse encoding (#72561)
Example:

```mlir
#CSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : compressed)
}>

// CHECK: #[[$CSR:.*]] = #sparse_tensor.encoding<{ map = (d0, d1) -> (d0 : dense, d1 : compressed) }>
// CHECK-LABEL: func private @sparse_csr(
// CHECK-SAME: tensor<?x?xf32, #[[$CSR]]>)
func.func private @sparse_csr(tensor<?x?xf32, #CSR>)
```
2023-11-16 19:30:21 -05:00
Peiming Liu
06a65ce500 [mlir][sparse] schedule sparse kernels in a separate pass from sparsification. (#72423) 2023-11-15 12:16:05 -08:00
Tim Harvey
dce7a7cf69 Changed all code and comments that used the phrase "sparse compiler" to instead use "sparsifier" (#71875)
The changes in this PR mostly center on the tests that use the flag
sparse_compiler (also: sparse-compiler).
2023-11-15 20:12:35 +00:00
Aart Bik
a40900211a [mlir][sparse] set rwx permissions to consistent values (#72311)
Some files had the "x" permission set, while others were missing "r".
2023-11-14 13:32:55 -08:00
Aart Bik
5f32bcfbae [mlir][sparse][gpu] re-enable all GPU libgen tests (#72185)
A previous change no longer properly used the GPU libgen pass (even
though most tests still passed by falling back to the CPU). This
revision puts the proper pass order into place. It also cleans up the
CPU codegen vs. libgen setup a bit.
2023-11-14 09:06:15 -08:00
Peiming Liu
269685545e [mlir][sparse] remove filter-loop based algorithm support to handle affine subscript expressions. (#71840)
2023-11-13 11:36:49 -08:00
Peiming Liu
c99951d491 [mlir][sparse] end-to-end matmul between Dense and BSR tensors (#71448) 2023-11-08 11:28:00 -08:00
Tim Harvey
c43e627457 Changed the phrase sparse-compiler to sparsifier in comments (#71578)
When the Powers That Be decided that the name "sparse compiler" should
be changed to "sparsifier", we neglected to change some of the comments
in the code; this pull request completes the name change.
2023-11-07 20:55:00 +00:00
Aart Bik
a4eadd7fb6 [mlir][sparse][gpu] add GPU BSR SDDMM check test (#71491)
also minor edits in other GPU check tests
2023-11-06 22:36:25 -08:00
Christian Ulmann
7ed96b1c0d [MLIR][LLVM] Remove last typed pointer remnants from tests (#71232)
This commit removes all LLVM dialect typed pointers from the lit tests.
Typed pointers have been deprecated for a while now and it's planned to
soon remove them from the LLVM dialect.

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-11-04 14:13:31 +01:00
Peiming Liu
c0d78c4232 [mlir][sparse] Implement rewriters to reinterpret maps on alloc_tensor operation (#70993)
2023-11-01 18:15:11 -07:00
Peiming Liu
3426d330a7 [mlir][sparse] Implement rewriters to reinterpret maps on foreach (#70868) 2023-11-01 12:11:47 -07:00
Aart Bik
e599978760 [mlir][sparse] first proof-of-concept non-permutation rewriter (#70863)
Rather than extending sparsifier codegen with higher order
non-permutations, we follow the path of rewriting linalg generic ops
into higher order operations. That way, code generation will simply work
out of the box. This is a very first proof-of-concept rewriting of that
idea.
2023-10-31 16:19:27 -07:00
Christian Ulmann
dcae289d3a [MLIR][SparseTensor] Introduce opaque pointers in LLVM dialect lowering (#70570)
This commit changes the SparseTensor LLVM dialect lowering from using
`llvm.ptr<i8>` to `llvm.ptr`. This change ensures that the lowering now
properly relies on opaque pointers, instead of working with already
type-erased i8 pointers.
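
A minimal sketch of the difference in the lowered IR (the runtime function
name is hypothetical):
```mlir
// Runtime entry points are now declared and called with the opaque pointer
// type. Before (typed, deprecated): llvm.func @runtime_call(!llvm.ptr<i8>) -> !llvm.ptr<i8>
llvm.func @runtime_call(!llvm.ptr) -> !llvm.ptr
```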
2023-10-31 07:34:49 +01:00
Peiming Liu
ef100c228a [mlir][sparse] implements tensor.insert on sparse tensors. (#70737) 2023-10-30 16:04:41 -07:00
Peiming Liu
f82bee1367 [mlir][sparse] split post-sparsification-rewriting into two passes. (#70727) 2023-10-30 15:22:21 -07:00
Peiming Liu
7d608ee2bb [mlir][sparse] unify sparse_tensor.out rewriting rules (#70518) 2023-10-27 16:46:58 -07:00
Aart Bik
7cfac1bedd [mlir][sparse] add boilerplate code for a new reinterpret map pass (#70393)
The interesting stuff is of course still coming ;-)
2023-10-26 17:57:46 -07:00
Peiming Liu
d808d922b4 [mlir][sparse] introduce sparse_tensor.reinterpret_map operation. (#70378) 2023-10-26 15:04:09 -07:00
Aart Bik
ff94061a9f [mlir][sparse] remove reshape dot test (#70359)
This no longer tests a required feature.
2023-10-26 11:14:44 -07:00
Aart Bik
0cbaff815c [mlir][sparse] cleanup conversion test (#70356)
Various TODOs had been added that effectively removed the actual tests.
This puts the CHECK tests back and removes the TODOs for which there are
no immediate plans.
2023-10-26 10:48:29 -07:00
Aart Bik
7e83a1af5d [mlir][sparse] add verification of absent value in sparse_tensor.unary (#70248)
This value should always be a plain constant or something invariant
computed outside the surrounding linalg operation, since there is no
co-iteration defined on anything done in this branch.
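
A hedged sketch of a use that still verifies (types and values are
illustrative): the absent branch yields a plain constant rather than anything
computed inside the surrounding linalg operation.
```mlir
%r = sparse_tensor.unary %a : f64 to f64
  present={
    ^bb0(%x: f64):
      %c1 = arith.constant 1.0 : f64
      %s = arith.addf %x, %c1 : f64
      sparse_tensor.yield %s : f64
  }
  absent={
    // Only a constant (or loop-invariant value) is allowed here.
    %c0 = arith.constant 0.0 : f64
    sparse_tensor.yield %c0 : f64
  }
```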

Fixes:
https://github.com/llvm/llvm-project/issues/69395
2023-10-25 13:56:43 -07:00
Aart Bik
a12d057be9 [mlir][sparse] update block24 example (#70145)
Removes TODO, shows how to define 8-bit crd (lacking 2-bit for now)
2023-10-25 08:29:31 -07:00
Peiming Liu
c780352de9 [mlir][sparse] implement sparse_tensor.lvl operation. (#69993) 2023-10-24 13:23:28 -07:00
Oleksandr "Alex" Zinenko
e4384149b5 [mlir] use transform-interpreter in test passes (#70040)
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR into a module when necessary.
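
A hedged sketch of the rewritten form (payload-matching logic omitted): the
transform IR sits in a module with the `transform.with_named_sequence`
attribute and exposes the named entry point that the interpreter invokes.
```mlir
module attributes {transform.with_named_sequence} {
  // The interpreter pass calls this named entry point instead of searching
  // for a top-level op via PossibleTopLevelOpTrait.
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    transform.yield
  }
}
```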

Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.

Only the tests in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of the testing coverage of the transform interpreter logic.
2023-10-24 16:12:34 +02:00
Peiming Liu
f0f5fdf73d [mlir][sparse] introduce sparse_tensor.lvl operation. (#69978) 2023-10-23 15:49:39 -07:00
Peiming Liu
ff21a90e51 [mlir][sparse] introduce sparse_tensor.crd_translate operation (#69630) 2023-10-19 15:42:09 -07:00
Yinying Li
7b9fb1c228 [mlir][sparse] Update verifier for block sparsity and singleton (#69389)
Updates:
1. Verification of block sparsity.
2. Verification that a singleton level type can only follow compressed
or loose_compressed levels, and that all level types after a singleton
must themselves be singleton (see the sketch below).
3. Added a getBlockSize function.
4. Added an invalid encoding test for an incorrect lvlToDim map provided
by the user.
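
For item 2, a sketch of an encoding the verifier accepts (a sorted COO
layout, chosen purely for illustration):
```mlir
// The singleton level directly follows a compressed level; any further
// levels would also have to be singleton for the encoding to verify.
#SortedCOO = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : compressed(nonunique), d1 : singleton)
}>
```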
2023-10-19 12:34:18 -04:00
Yinying Li
d4088e7d5f [mlir][sparse] Populate lvlToDim (#68937)
Updates:
1. Infer lvlToDim from dimToLvl (sketched below).
2. Add more tests for block sparsity.
3. Finish TODOs related to lvlToDim, including adding lvlToDim to the
Python bindings.

Verification of the lvlToDim map that the user provides will be
implemented in the next PR.
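
For item 1, a sketch of the inference for a 2x2 block-sparse layout (the maps
are written as standalone affine maps for illustration): the lvlToDim map is
simply the inverse of the user-provided dimToLvl map.
```mlir
// dimToLvl as written by the user for a 2x2 BSR layout ...
#dimToLvl = affine_map<(i, j) -> (i floordiv 2, j floordiv 2, i mod 2, j mod 2)>
// ... and the lvlToDim map inferred from it (its inverse).
#lvlToDim = affine_map<(i0, j0, i1, j1) -> (i0 * 2 + i1, j0 * 2 + j1)>
```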
2023-10-17 16:09:39 -04:00
Peiming Liu
71c97c735c [mlir][sparse] avoid tensor to memref conversion in sparse tensor rewriting rules. (#69362)
2023-10-17 11:34:06 -07:00
Aart Bik
d392073f67 [mlir][sparse] simplify reader construction of new sparse tensor (#69036)
Making the materialize-from-reader method part of the Swiss army knife
suite again removes a lot of redundant boilerplate code and unifies the
parameter setup into a single centralized utility. Furthermore, we have
now minimized the number of entry points into the library that need a
non-permutation map setup, simplifying what comes next.
2023-10-16 10:25:37 -07:00
Aart Bik
2045cca0c3 [mlir][sparse] add a forwarding insertion to SparseTensorStorage (#68939) 2023-10-12 21:03:07 -07:00
Peiming Liu
f248d0b28d [mlir][sparse] implement sparse_tensor.reorder_coo (#68916)
As a side effect, this change also unifies the convertOp implementation
between the lib and codegen paths.
2023-10-12 13:22:45 -07:00
Peiming Liu
0aacc2137a [mlir][sparse] introduce sparse_tensor.reorder_coo operation (#68827) 2023-10-12 09:42:12 -07:00
Peiming Liu
325576196b [mlir][sparse] remove tests (#68826) 2023-10-11 11:23:25 -07:00
Peiming Liu
dda3dc5e38 [mlir][sparse] simplify ConvertOp rewriting rules (#68350)
Canonicalize a complex convertOp into multiple stages, such that it can
be done either by a direct conversion or by sorting.
2023-10-11 09:34:11 -07:00