Commit Graph

109 Commits

Author SHA1 Message Date
Matthias Springer
27a431f5e9 [mlir][bufferization][NFC] Move sparse_tensor.release to bufferization dialect
This op used to belong to the sparse dialect, but there are use cases for dense bufferization as well. (E.g., when a tensor alloc is returned from a function and should be deallocated at the call site.) This change moves the op to the bufferization dialect, which now has an `alloc_tensor` and a `dealloc_tensor` op.

Differential Revision: https://reviews.llvm.org/D129985
2022-07-19 09:18:19 +02:00
Jim Kitchen
2b8a4d9ce1 [mlir][sparse] Introduce new reduce op
A new sparse_tensor operation allows for
custom reduction code to be injected during
linalg.generic lowering for sparse tensors.
An identity value is provided to indicate
the starting value of the reduction. A single
block region is required to contain the
custom reduce computation.

Reviewed by: aartbik

Differential Revision: https://reviews.llvm.org/D128004
2022-07-15 15:30:41 -05:00
Matthias Springer
c66303c287 [mlir][sparse] Switch to One-Shot Bufferize
This change removes the partial bufferization passes from the sparse compilation pipeline and replaces them with One-Shot Bufferize. One-Shot Analysis (and TensorCopyInsertion) is used to resolve all out-of-place bufferizations, dense and sparse. Dense ops are then bufferized with BufferizableOpInterface. Sparse ops are still bufferized in the Sparsification pass.

Details:
* Dense allocations are automatically deallocated, unless they are yielded from a block. (In that case the alloc would leak.) All test cases are modified accordingly. E.g., some funcs now have an "out" tensor argument that is returned from the function. (That way, the allocation happens at the call site.)
* Sparse allocations are *not* automatically deallocated. They must be "released" manually. (No change, this will be addressed in a future change.)
* Sparse tensor copies are not supported yet. (Future change)
* Sparsification no longer has to consider inplacability. If necessary, allocations and/or copies are inserted during TensorCopyInsertion. All tensors are inplaceable by the time Sparsification is running. Instead of marking a tensor as "not inplaceable", it can be marked as "not writable", which will trigger an allocation and/or copy during TensorCopyInsertion.

Differential Revision: https://reviews.llvm.org/D129356
2022-07-14 09:52:48 +02:00
Aart Bik
faa00c1313 [mlir][sparse] implement sparse2sparse reshaping (expand/collapse)
A previous revision implemented expand/collapse reshaping between
dense and sparse tensors for sparse2dense and dense2sparse since those
could use the "cheap" view reshape on the already materialized
dense tensor (at either the input or output side), and do some
reshuffling from or to sparse. The dense2dense case, as always,
is handled with a "cheap" view change.

This revision implements the sparse2sparse cases. Lacking any "view"
support on sparse tensors this operation necessarily has to perform
data reshuffling on both ends.

Tracker for improving this:
https://github.com/llvm/llvm-project/issues/56477

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D129416
2022-07-11 14:49:06 -07:00
Matthias Springer
fc9b37dd53 [mlir][bufferization] Do not canonicalize to_tensor(to_memref(x))
This is a partial revert of D128615.

to_memref(to_tensor(x)) always be folded to x. But to_tensor(to_memref(x)) cannot be folded in the general case because writes to the intermediary memref may go unnoticed.

Differential Revision: https://reviews.llvm.org/D129354
2022-07-09 09:16:52 +02:00
Aart Bik
6d8e2f1e51 [mlir][sparse] implement simple reshaping (expand/collapse)
The revision makes a start with implementing expand/collapse reshaping
for sparse tensors. When either source or destination is sparse, but
other is dense, the "cheap" dense reshape can be used prior to converting
from or to a sparse tensor.

Note1
sparse to sparse reshaping is still TBD.

Note2
in the long run, we may want to implement a "view" into a sparse tensor so that the operation remains cheap and does not require data shuffling

Reviewed By: wrengr

Differential Revision: https://reviews.llvm.org/D129031
2022-07-06 14:34:30 -07:00
Aart Bik
e057f25dee [mlir][sparse] auto-insertion of conversion to resolve cycles
When the iteration graph is cyclic (even after several attempts using less and less constraints), the current sparse compiler bails out, and no rewriting hapens. However, this revision adds some new logic where the sparse compiler tries to find a single input sparse tensor that breaks the cycle, and then adds a proper sparse conversion operation. This way, more incoming kernels can be handled!

Note, the resulting code is not optimal (although it keeps more or less proper "sparse" complexity), and more improvements should be added (especially when the kernel directly yields without computation, such as the transpose example). However, handling is better than not handling ;-)

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D128847
2022-06-29 18:28:18 -07:00
Aart Bik
eca6f9160f [mlir][sparse][bufferization] refine bufferization assumption enforcement
Enforce the assumption made on tensor buffers explicitly. When in-place,
reuse the buffer, but fill with all zeroes for the non-update case, since
the kernel assumes all elements are written to. When not in-place, zero
out the new buffer when materializing or when no-updates occur. Copy the
original tensor value when updates occur. This prepares migrating to the
new bufferization strategy, where these assumptions must be made explicit.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D128691
2022-06-28 09:43:30 -07:00
Matthias Springer
cb47124179 [mlir][bufferize] Improve to_tensor/to_memref folding
Differential Revision: https://reviews.llvm.org/D128615
2022-06-27 21:42:39 +02:00
Aart Bik
9a3d60e0d3 [mlir][bufferization][sparse] put restriction on sparse tensor allocation
Putting some direct use restrictions on tensor allocations in the
sparse case enables the use of simplifying assumptions in the
bufferization analysis.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D128463
2022-06-24 10:58:43 -07:00
Matthias Springer
3798678bd1 [mlir][sparse][bufferize] Implement BufferizableOpInterface
Only the analysis part of the interface is implemented. The bufferization itself is performed by the SparseTensorConversion pass.

Differential Revision: https://reviews.llvm.org/D128138
2022-06-24 13:47:01 +02:00
Aart Bik
e7d3ba1066 [mlir][sparse] accept sparse reshape (expand/collapse)
This revision makes sure we accept sparse tensors as arguments
of the expand/collapse reshaping operations in the tensor dialect.
Note that the actual lowering to runnable IR is still TBD.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D128311
2022-06-22 09:40:38 -07:00
Aart Bik
fde04aee33 [mlir][sparse] refine bufferization allocation lowering
Marking bufferization allocation operation as invalid
during sparse lowering is too strict, since dense and
sparse allocation can co-exist. This revision refines
the lowering with a dynamic type check.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D128305
2022-06-21 15:17:25 -07:00
Aart Bik
aef20f59a5 [mlir][sparse] move from by-value to by-reference for data types
This fixes all sorts of ABI issues due to passing by-value
(using by-reference with memref's exclusively).

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D128018
2022-06-17 08:39:25 -07:00
bixia1
ea8ed5cbcf [mlir][sparse] Add F16 and BF16.
This is the first PR to add `F16` and `BF16` support to the sparse codegen. There are still problems in supporting these two data types, such as `BF16` is not quite working yet.

Add tests cases.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D127010
2022-06-08 09:51:05 -07:00
wren romano
3cf03f1c56 [mlir][sparse] Adding IsSparseTensorPred and updating ops to use it
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D126994
2022-06-03 17:15:31 -07:00
Matthias Springer
6232a8f3d6 [mlir][sparse][NFC] Switch InitOp to bufferization::AllocTensorOp
Now that we have an AllocTensorOp (previously InitTensorOp) in the bufferization dialect, the InitOp in the sparse dialect is no longer needed.

Differential Revision: https://reviews.llvm.org/D126180
2022-06-02 00:03:52 +02:00
wren romano
b364c76683 [mlir][sparse] Using non-empty function name suffix for OverheadType::kIndex
The trick of using an empty token in the `FOREVERY_O` x-macro relies on preprocessor behavior which is only standard since C99 6.10.3/4 and C++11 N3290 16.3/4 (whereas it was undefined behavior up through C++03 16.3/10).  Since the `ExecutionEngine/SparseTensorUtils.cpp` file is required to be compile-able under C++98 compatibility mode (unlike the C++11 used elsewhere in MLIR), we shouldn't rely on that behavior.

Also, using a non-empty suffix helps improve uniformity of the API, since all other primary/overhead suffixes are also non-empty.  I'm using the suffix `0` since that's the value used by the `SparseTensorEncoding` attribute for indicating the index overhead-type.

Depends On D126720

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D126724
2022-06-01 14:18:42 -07:00
Aart Bik
28b6d412af [mlir][sparse] add support for complex zero/one building
Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D126039
2022-05-20 08:53:30 -07:00
wren romano
8cb332406c [mlir][sparse] Enhancing sparse=>sparse conversion.
Fixes: https://github.com/llvm/llvm-project/issues/51652

Depends On D122060

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D122061
2022-05-16 15:42:19 -07:00
River Riddle
a8308020ac [mlir] Remove special case parsing/printing of func operations
This was leftover from when the standard dialect was destroyed, and
when FuncOp moved to the func dialect. Now that these transitions
have settled a bit we can drop these.

Most updates were handled using a simple regex: replace `^( *)func` with `$1func.func`

Differential Revision: https://reviews.llvm.org/D124146
2022-05-06 13:36:15 -07:00
Aart Bik
952fa3018e [mlir][sparse] add more zero-preserving unary ops to sparse compiler
Although we now have semi-rings to deal with arbitrary ops,
it is still good to convey zero-preserving semantics of
ops to the sparse compiler.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D125043
2022-05-05 15:35:19 -07:00
Nick Kreeger
4620032ee3 Revert "[mlir][sparse] Expose SpareTensor passes as enums instead of opaque numbers for vectorization and parallelization options."
This reverts commit d59cf901cb.

Build fails on NVIDIA Sparse tests:
https://lab.llvm.org/buildbot/#/builders/61/builds/25447
2022-04-23 20:14:48 -05:00
Nick Kreeger
d59cf901cb [mlir][sparse] Expose SpareTensor passes as enums instead of opaque numbers for vectorization and parallelization options.
The SparseTensor passes currently use opaque numbers for the CLI, despite using an enum internally. This patch exposes the enums instead of numbered items that are matched back to the enum.

Fixes GitHub issue #53389

Reviewed by: aartbik, mehdi_amini

Differential Revision: https://reviews.llvm.org/D123876
2022-04-23 19:16:57 -05:00
River Riddle
fb35cd3baf [mlir][NFC] Update textual references of func to func.func in SparseTensor tests
The special case parsing of `func` operations is being removed.
2022-04-20 22:17:29 -07:00
River Riddle
af371f9f98 Reland [GreedPatternRewriter] Preprocess constants while building worklist when not processing top down
Reland Note: Adds a fix to properly mark a commutative operation as folded if we change the order
             of its operands. This was uncovered by the fact that we no longer re-process constants.

This avoids accidentally reversing the order of constants during successive
application, e.g. when running the canonicalizer. This helps reduce the number
of iterations, and also avoids unnecessary changes to input IR.

Fixes #51892

Differential Revision: https://reviews.llvm.org/D122692
2022-04-07 11:31:42 -07:00
Aart Bik
0b55f94d2b [mlir][sparse] replace stack-based access pattern with dyn-alloc
Rationale:
Allocating the temporary buffers for access pattern expansion on the stack
(using alloca) is a bit too agressive, since it easily runs out of stack space
for large enveloping tensor dimensions. This revision changes the dynamic
allocation of these buffers with explicit alloc/dealloc pairs.

Reviewed By: bixia, wrengr

Differential Revision: https://reviews.llvm.org/D123253
2022-04-06 17:10:43 -07:00
wren romano
63bdcaf92a [mlir][sparse] Moving delete coo into codegen instead of runtime library
Prior to this change there were a number of places where the allocation and deallocation of SparseTensorCOO objects were not cleanly paired, leading to inconsistencies regarding whether each function released its tensor/coo arguments or not, as well as making it easy to run afoul of memory leaks, use-after-free, or double-free errors.  This change cleans up the codegen vs runtime boundary to resolve those issues.  Now, the only time the runtime library frees an object is either (a) because it's a function explicitly designed to do so, or (b) because the allocated object is entirely local to the function and would be a memory leak if not released.  Thus, now the codegen takes complete responsibility for releasing any objects it caused to be allocated.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D122435
2022-04-01 11:08:52 -07:00
Mehdi Amini
ba43d6f85c Revert "[GreedPatternRewriter] Preprocess constants while building worklist when not processing top down"
This reverts commit 59bbc7a085.

This exposes an issue breaking the contract of
`applyPatternsAndFoldGreedily` where we "converge" without applying
remaining patterns.
2022-04-01 06:16:55 +00:00
River Riddle
59bbc7a085 [GreedPatternRewriter] Preprocess constants while building worklist when not processing top down
This avoids accidentally reversing the order of constants during successive
application, e.g. when running the canonicalizer. This helps reduce the number
of iterations, and also avoids unnecessary changes to input IR.

Fixes #51892

Differential Revision: https://reviews.llvm.org/D122692
2022-03-31 12:08:55 -07:00
Javier Setoain
7783a178f5 [mlir][Sparse] Add option for VLA sparsification
Use "enable-vla-vectorization=vla" to generate a vector length agnostic
loops during vectorization. This option works for vectorization strategy 2.

Differential Revision: https://reviews.llvm.org/D118379
2022-03-25 10:54:49 +00:00
wren romano
df948127ac [mlir][sparse] Adding Action::kSparseToSparse for @newSparseTensor
This is work towards: https://github.com/llvm/llvm-project/issues/51652

This differential doesn't yet make use of the new kSparseToSparse, just introduces it.  The differential that finally makes use of them is D122061, which is the final differential in the chain that fixes bug 51652.

Depends On D122054

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D122055
2022-03-22 13:46:59 -07:00
Aart Bik
69a7759b40 [mlir][sparse] implement loop index value vectorization
with CHECK and integration test

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D122040
2022-03-21 10:40:38 -07:00
Jim Kitchen
414ed019ac [mlir][sparse] Introduce new binary and unary op
When the sparse_tensor dialect lowers linalg.generic,
it makes inferences about how the operations should
affect the looping logic. For example, multiplication
is an intersection while addition is a union of two
sparse tensors.

The new binary and unary op separate the looping logic
from the computation by nesting the computation code
inside a block which is merged at the appropriate level
in the lowered looping code.

The binary op can have custom computation code for the
overlap, left, and right sparse overlap regions. The
unary op can have custom computation code for the
present and absent values.

Reviewed by: aartbik

Differential Revision: https://reviews.llvm.org/D121018
2022-03-17 12:31:09 -05:00
gysit
7294be2b8e [mlir][linalg] Replace linalg.fill by OpDSL variant.
The revision removes the linalg.fill operation and renames the OpDSL generated linalg.fill_tensor operation to replace it. After the change, all named structured operations are defined via OpDSL and there are no handwritten operations left.

A side-effect of the change is that the pretty printed form changes from:
```
%1 = linalg.fill(%cst, %0) : f32, tensor<?x?xf32> -> tensor<?x?xf32>
```
changes to
```
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<?x?xf32>) -> tensor<?x?xf32>
```
Additionally, the builder signature now takes input and output value ranges as it is the case for all other OpDSL operations:
```
rewriter.create<linalg::FillOp>(loc, val, output)
```
changes to
```
rewriter.create<linalg::FillOp>(loc, ValueRange{val}, ValueRange{output})
```
All other changes remain minimal. In particular, the canonicalization patterns are the same and the `value()`, `output()`, and `result()` methods are now implemented by the FillOpInterface.

Depends On D120726

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D120728
2022-03-14 10:51:08 +00:00
Aart Bik
52fb4f53c2 [mlir][sparse] added linalg.dot to sparse kernel collection
Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D121315
2022-03-09 15:10:44 -08:00
Aart Bik
53cc3a0637 [mlir][sparse] index support in sparse compiler codegen
This revision adds support for the linalg.index to the sparse compiler
pipeline. In essence, this adds the ability to refer to indices in
the tensor index expression, as illustrated below:

 Y[i, j, k, l, m] = T[i, j, k, l, m]  * i * j

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D121251
2022-03-08 17:25:36 -08:00
Aart Bik
34381a76c1 [mlir][sparse] avoid some codeup in sparsification transformation
A very small refactoring, but a big impact on tests that expect an exact order.
This revision fixes the tests, but also makes them less brittle for similar
minor changes in the future!

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D119992
2022-02-16 17:39:04 -08:00
Matthias Springer
fe0bf7d469 [mlir][vector][NFC] Use CombiningKindAttr instead of StringAttr
This makes the op consistent with other ops in vector dialect.

Differential Revision: https://reviews.llvm.org/D119343
2022-02-10 19:13:29 +09:00
River Riddle
dec8af701f [mlir] Move SelectOp from Standard to Arithmetic
This is part of splitting up the standard dialect. See https://llvm.discourse.group/t/standard-dialect-the-final-chapter/ for discussion.

Differential Revision: https://reviews.llvm.org/D118648
2022-02-02 14:45:12 -08:00
Uday Bondhugula
92ccb8cc50 [MLIR][NFC] Update SCF pass cmd line names to prefix scf
Update SCF pass cmd line names to prefix `scf`. This is consistent with
guidelines/convention on how to name dialect passes. This also avoids
ambiguity on the context given the multiple `for` operations in the
tree.

NFC.

Differential Revision: https://reviews.llvm.org/D118564
2022-01-31 07:09:30 +05:30
Matthias Springer
ab47418df6 [mlir][bufferize] Merge tensor-constant-bufferize into arith-bufferize
The bufferization of arith.constant ops is also switched over to BufferizableOpInterface-based bufferization. The old implementation is deleted. Both implementations utilize GlobalCreator, now renamed to just `getGlobalFor`.

GlobalCreator no longer maintains a set of all created allocations to avoid duplicate allocations of the same constant. Instead, `getGlobalFor` scans the module to see if there is already a global allocation with the same constant value.

For compatibility reasons, it is still possible to create a pass that bufferizes only `arith.constant`. This pass (createConstantBufferizePass) could be deleted once all users were switched over to One-Shot bufferization.

Differential Revision: https://reviews.llvm.org/D118483
2022-01-30 21:37:48 +09:00
Aart Bik
efa15f4178 [mlir][sparse] add ability for sparse tensor output
Rationale:
Although file I/O is a bit alien to MLIR itself, we provide two convenient ways
for sparse tensor I/O. The input part was already there (behind the swiss army
knife sparse_tensor.new). Now we have a sparse_tensor.out to write out data. As
before, the ops are kept vague and may change in the future. For now this
allows us to compare TACO vs MLIR very easily.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D117850
2022-01-21 15:43:29 -08:00
wren romano
bc04a47038 [mlir][sparse] adding OverheadType::kIndex
Depends On D115008

This change opens the way for D115012, and removes some corner cases in `CodegenUtils.cpp`. The `SparseTensorAttrDefs.td` already specifies that we allow `0` bitwidth for the two overhead types and that it is interpreted to mean the architecture's native width.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D115010
2022-01-04 16:15:54 -08:00
Aart Bik
e1b9d80532 [mlir][sparse] add a few more sparse output tests (for generated IR)
also fixes two typos in IR doc

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D115288
2021-12-07 15:31:29 -08:00
Aart Bik
4f2ec7f983 [mlir][sparse] finalize sparse output in the presence of reductions
This revision implements sparse outputs (from scratch) in all cases where
the loops can be reordered with all but one parallel loops outer. If the
inner parallel loop appears inside one or more reductions loops, then an
access pattern expansion is required (aka. workspaces in TACO speak).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D115091
2021-12-07 10:54:29 -08:00
Aart Bik
0e85232fa3 [mlir][sparse] refine simply dynamic sparse tensor outputs
Proper test for sparse tensor outputs is a single condition throughout
the whole tensor index expression (not a general conjunction, since this
may include other conditions that cause cancellation).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D114810
2021-11-30 13:45:58 -08:00
Aart Bik
7d4da4e1ab [mlir][sparse] generalize sparse tensor output implementation
Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion" aka "workspace").

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D114399
2021-11-29 16:15:53 -08:00
Alexander Belyaev
57470abc41 [mlir] Move memref.[tensor_load|buffer_cast|clone] to "bufferization" dialect.
https://llvm.discourse.group/t/rfc-dialect-for-bufferization-related-ops/4712

Differential Revision: https://reviews.llvm.org/D114552
2021-11-25 11:50:39 +01:00
Mogball
7c5ecc8b7e [mlir][vector] Insert/extract element can accept index
`vector::InsertElementOp` and `vector::ExtractElementOp` have had their `position`
operand changed to accept `AnySignlessIntegerOrIndex` for better operability with
operations that use `index`, such as affine loops.

LLVM's `extractelement` and `insertelement` can also accept `i64`, so lowering
directly to these operations without explicitly inserting casts is allowed. SPIRV's
equivalent ops can also accept `i64`.

Reviewed By: nicolasvasilache, jpienaar

Differential Revision: https://reviews.llvm.org/D114139
2021-11-18 22:40:29 +00:00