This op used to belong to the sparse dialect, but there are use cases for dense bufferization as well. (E.g., when a tensor alloc is returned from a function and should be deallocated at the call site.) This change moves the op to the bufferization dialect, which now has an `alloc_tensor` and a `dealloc_tensor` op.
Differential Revision: https://reviews.llvm.org/D129985
This diff adds an integration test which does element wise multiplication for two sparse 3-d tensors of size 3x3x5
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D129638
Fixed some new memory leaks after migration to new
bufferization. One is expected, the other may need
some more careful analysis.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D129805
After recent bufferization improvement, this test
started failing due to missed zero initialization.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D129800
This change removes the partial bufferization passes from the sparse compilation pipeline and replaces them with One-Shot Bufferize. One-Shot Analysis (and TensorCopyInsertion) is used to resolve all out-of-place bufferizations, dense and sparse. Dense ops are then bufferized with BufferizableOpInterface. Sparse ops are still bufferized in the Sparsification pass.
Details:
* Dense allocations are automatically deallocated, unless they are yielded from a block. (In that case the alloc would leak.) All test cases are modified accordingly. E.g., some funcs now have an "out" tensor argument that is returned from the function. (That way, the allocation happens at the call site.)
* Sparse allocations are *not* automatically deallocated. They must be "released" manually. (No change, this will be addressed in a future change.)
* Sparse tensor copies are not supported yet. (Future change)
* Sparsification no longer has to consider inplacability. If necessary, allocations and/or copies are inserted during TensorCopyInsertion. All tensors are inplaceable by the time Sparsification is running. Instead of marking a tensor as "not inplaceable", it can be marked as "not writable", which will trigger an allocation and/or copy during TensorCopyInsertion.
Differential Revision: https://reviews.llvm.org/D129356
A previous revision implemented expand/collapse reshaping between
dense and sparse tensors for sparse2dense and dense2sparse since those
could use the "cheap" view reshape on the already materialized
dense tensor (at either the input or output side), and do some
reshuffling from or to sparse. The dense2dense case, as always,
is handled with a "cheap" view change.
This revision implements the sparse2sparse cases. Lacking any "view"
support on sparse tensors this operation necessarily has to perform
data reshuffling on both ends.
Tracker for improving this:
https://github.com/llvm/llvm-project/issues/56477
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D129416
The revision makes a start with implementing expand/collapse reshaping
for sparse tensors. When either source or destination is sparse, but
other is dense, the "cheap" dense reshape can be used prior to converting
from or to a sparse tensor.
Note1
sparse to sparse reshaping is still TBD.
Note2
in the long run, we may want to implement a "view" into a sparse tensor so that the operation remains cheap and does not require data shuffling
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D129031
When the iteration graph is cyclic (even after several attempts using less and less constraints), the current sparse compiler bails out, and no rewriting hapens. However, this revision adds some new logic where the sparse compiler tries to find a single input sparse tensor that breaks the cycle, and then adds a proper sparse conversion operation. This way, more incoming kernels can be handled!
Note, the resulting code is not optimal (although it keeps more or less proper "sparse" complexity), and more improvements should be added (especially when the kernel directly yields without computation, such as the transpose example). However, handling is better than not handling ;-)
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128847
Adding more test cases for sparse_tensor.BinaryOp, including different cases when overlap/left/right region is implemented/empty/identity
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D128383
Previously, the sparse_tensor.unary integration test does not contain cases with the use of `linalg.index` (previoulsy unsupported), this commit adds test cases that use `linalg.index` operators.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D128460
This adds weak versions of the truncation libcalls in case the runtime
environment doesn't have them.
Differential Revision: https://reviews.llvm.org/D128091
This resolves problems reported in commit 1a20252978.
1. Promote to float lowering for nodes XINT_TO_FP
2. Bail out f16 from shuffle combine due to vector type is not legal in the version
The semi-ring blocks were simply "inlined" by the sparse compiler but
without any filtering or patching. This revision improves the analysis
(rejecting blocks that use non-invariant computations from outside
their blocks, except for linalg.index) and also improves the codegen
by properly patching up index computations (previous version crashed).
With a regression test. Also updated the documentation now that the
example code is properly working.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128000
The LinalgElementwiseOpFusion pass has become smarter, and converts
the simple conversion linalg operation into a sparse dialect convert
operation. However, since our current bufferization does not take the
new semantics into consideration, we leak memory of the allocation.
For now, this has been fixed by making the operation less trivial.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128002
Support complex numbers for Matrix Market Exchange Formats. Add a test case.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D127138
This is the first PR to add `F16` and `BF16` support to the sparse codegen. There are still problems in supporting these two data types, such as `BF16` is not quite working yet.
Add tests cases.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D127010
Now that we have an AllocTensorOp (previously InitTensorOp) in the bufferization dialect, the InitOp in the sparse dialect is no longer needed.
Differential Revision: https://reviews.llvm.org/D126180
This was carry over from LLVM IR where the alias definition can
be ambiguous, but MLIR type aliases have no such problems.
Having the `type` keyword is superfluous and doesn't add anything.
This commit drops it, which also nicely aligns with the syntax for
attribute aliases (which doesn't have a keyword).
Differential Revision: https://reviews.llvm.org/D125501
This is the first implementation of complex (f64 and f32) support
in the sparse compiler, with complex add/mul as first operations.
Note that various features are still TBD, such as other ops, and
reading in complex values from file. Also, note that the
std::complex<float> had a bit of an ABI issue when passed as
single argument. It is still TBD if better solutions are possible.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D125596
Implements a floating-point sign operator (using the new semi-ring ops)
that accomodates +/-Inf and +/-NaN in consistent way.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D125494
This was leftover from when the standard dialect was destroyed, and
when FuncOp moved to the func dialect. Now that these transitions
have settled a bit we can drop these.
Most updates were handled using a simple regex: replace `^( *)func` with `$1func.func`
Differential Revision: https://reviews.llvm.org/D124146
Adding lowering for Unary and Binary required several changes due to
their unique nature of containing custom code for different "regions"
of the sparse structure being operated on. Along with a Kind, a pointer
to the Operation is passed along to be merged once the lattice
structure is figured out.
The original operation is maintained, as it is required for subsequent
lattice decisions. However, sparse_tensor.binary has some branches
are considered as fully handled and therefore are marked with as
kBinaryBranch to distinguish them.
A unique aspect of the custom code is that sometimes the desired result
is no result at all -- i.e. a user wants overlapping sparse entries to
become empty in the output. The solution to this is to return an
uninitialized Value(), which is checked and handled elsewhere in the
code and results in nothing being written to the output tensor for that
case.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D123057
The SparseTensor passes currently use opaque numbers for the CLI, despite using an enum internally. This patch exposes the enums instead of numbered items that are matched back to the enum.
Fixes GitHub issue #53389
Reviewed by: aartbik, mehdi_amini
Differential Revision: https://reviews.llvm.org/D123876
This revision adds support for the linalg.index to the sparse compiler
pipeline. In essence, this adds the ability to refer to indices in
the tensor index expression, as illustrated below:
Y[i, j, k, l, m] = T[i, j, k, l, m] * i * j
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D121251
Now that sparse tensor types are first-class citizens and the sparse compiler
is taking shape, it is time to make sure other compiler optimizations compose
well with sparse tensors. Mostly, this should be completely transparent (i.e.,
dense and sparse take the same path). However, in some cases, optimizations
only make sense in the context of sparse tensors. This is a first example of
such an optimization, where fusing a sampled elt-wise multiplication only makes
sense when the resulting kernel has a potential lower asymptotic complexity due
to the sparsity.
As an extreme example, running SDDMM with 1024x1024 matrices and a sparse
sampling matrix with only two elements runs in 463.55ms in the unfused
case but just 0.032ms in the fused case, with a speedup of 14485x that
is only possible in the exciting world of sparse computations!
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D120429
It is time to compose Linalg related optimizations with SparseTensor
related optimizations. This is a careful first start by adding some
general Linalg optimizations "upstream" of the sparse compiler in the
full sparse compiler pipeline. Some minor changes were needed to make
those optimizations aware of sparsity.
Note that after this, we will add a sparse specific fusion rule,
just to demonstrate the power of the new composition.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D119971