This test explores various sparsity combinations of the SDMM kernel
and verifies that the computed result is the same in all cases.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115476
Add convertFromMLIRSparseTensor to the supporting C shared library to convert
SparseTensorStorage to COO-flavor format.
Add a Python routine, sparse_tensor_to_coo_tensor, to convert a sparse tensor
storage pointer to numpy values for a COO-flavor tensor.
Add a Python test for sparse tensor output.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115557
The sparse tensor code generator allocates memory for the output tensor. As
such, we only need to allocate a MemRefDescriptor to receive the output tensor
and do not need to allocate and initialize the storage for the tensor.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115292
This revision implements sparse outputs (from scratch) in all cases where
the loops can be reordered with all but one parallel loop outermost. If the
inner parallel loop appears inside one or more reduction loops, then an
access pattern expansion is required (a.k.a. workspaces in TACO speak).
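For example (a sketch; shapes and the #CSR alias are illustrative), an
all-sparse matmul forces the loop order i, k, j, leaving a parallel loop
inside a reduction loop:

  #CSR = #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>

  // C(i,j) += A(i,k) * B(k,j): with B stored row-wise, k must be iterated
  // before j, so the parallel loop j ends up inside the reduction loop k
  // and insertions into the sparse output C need an expanded workspace.
  func @matmul(%a: tensor<8x2xf64, #CSR>, %b: tensor<2x4xf64, #CSR>,
               %c: tensor<8x4xf64, #CSR>) -> tensor<8x4xf64, #CSR> {
    %0 = linalg.matmul
      ins(%a, %b : tensor<8x2xf64, #CSR>, tensor<2x4xf64, #CSR>)
      outs(%c : tensor<8x4xf64, #CSR>) -> tensor<8x4xf64, #CSR>
    return %0 : tensor<8x4xf64, #CSR>
  }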
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115091
Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion", a.k.a. a "workspace").
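For example (a sketch; shapes and encoding aliases are illustrative), a
matvec falls in the accepted class:

  #CSR = #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>
  #SV  = #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>

  // x(i) = SUM_j A(i,j) * b(j): the parallel loop i is outermost and the
  // reduction loop j is innermost, so each row produces at most one
  // (injective) insertion into the sparse output x.
  func @matvec(%A: tensor<16x32xf64, #CSR>, %b: tensor<32xf64>,
               %x: tensor<16xf64, #SV>) -> tensor<16xf64, #SV> {
    %0 = linalg.matvec
      ins(%A, %b : tensor<16x32xf64, #CSR>, tensor<32xf64>)
      outs(%x : tensor<16xf64, #SV>) -> tensor<16xf64, #SV>
    return %0 : tensor<16xf64, #SV>
  }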
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D114399
The first version was vectors only. With some clever "path" insertion,
we now support any d-dimensional tensor. Up next: reductions too.
Reviewed By: bixia, wrengr
Differential Revision: https://reviews.llvm.org/D114024
This revision contains all "sparsification" ops and rewriting necessary to support sparse output tensors when the kernel has no reduction (viz. insertions occur in lexicographic order and are "injective"). This will be later generalized to allow reductions too. Also, this first revision only supports sparse 1-d tensors (viz. vectors) as output in the runtime support library. This will be generalized to n-d tensors shortly. But this way, the revision is kept to a manageable size.
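For example (a sketch; shapes and the #SV alias are illustrative), an
injective kernel with a sparse vector output:

  #SV = #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>

  #trait = {
    indexing_maps = [ affine_map<(i) -> (i)>, affine_map<(i) -> (i)> ],
    iterator_types = [ "parallel" ]
  }

  // x(i) = a(i) * 2.0: no reduction, so insertions into the sparse
  // 1-d output occur in lexicographic index order.
  func @scale(%a: tensor<32xf64, #SV>,
              %x: tensor<32xf64, #SV>) -> tensor<32xf64, #SV> {
    %f2 = arith.constant 2.0 : f64
    %0 = linalg.generic #trait
      ins(%a : tensor<32xf64, #SV>) outs(%x : tensor<32xf64, #SV>) {
      ^bb0(%in: f64, %out: f64):
        %m = arith.mulf %in, %f2 : f64
        linalg.yield %m : f64
    } -> tensor<32xf64, #SV>
    return %0 : tensor<32xf64, #SV>
  }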
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D113705
Added a type with different pointer/index bit width. Also
added some sanity CHECKs on the stored indices.
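For example, an encoding with narrower overhead storage (the widths shown
are illustrative):

  #SparseVector = #sparse_tensor.encoding<{
    dimLevelType = [ "compressed" ],
    pointerBitWidth = 32,
    indexBitWidth = 16
  }>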
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D112778
Rationale:
The silent exit(1) gives few clues as to where the error occurs on failure
and may even be confusing at first. The CHECK testing of all computed values
and indices may be a little more elaborate, but it directly pinpoints
where errors happen if they occur. This style is also consistent with
the other tests, which I actually prefer.
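For example (a sketch in the style of the existing integration tests;
values and shapes are illustrative):

  %c0 = arith.constant 0 : index
  %d0 = arith.constant 0.0 : f64
  // Read the computed values back and print them; FileCheck reports the
  // exact mismatch, unlike a silent exit(1).
  %v = vector.transfer_read %mem[%c0], %d0 : memref<3xf64>, vector<3xf64>
  vector.print %v : vector<3xf64>
  // CHECK: ( 1, 2, 3 )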
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D112688
This revision lifts the artificial restriction on having exact matches between
source and destination type shapes. A static size may become dynamic. We still
reject changing a dynamic size into a static size to avoid the need for a
runtime "assert" on the conversion. This revision also refactors some of the
conversion code to share same-content buffers.
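For example (a sketch; the #CSR alias is illustrative):

  // Accepted: a static size may become dynamic.
  %1 = sparse_tensor.convert %0
    : tensor<32x32xf64, #CSR> to tensor<?x?xf64, #CSR>
  // Still rejected: turning a dynamic size back into a static one would
  // require a runtime "assert" on the conversion.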
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111915
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
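For example:

  // Before: ops in the standard dialect.
  %0 = addi %a, %b : i32
  %1 = addf %x, %y : f32

  // After: the same ops renamed into the arith dialect.
  %0 = arith.addi %a, %b : i32
  %1 = arith.addf %x, %y : f32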
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
We have several ways to materialize sparse tensors (new and convert) but no explicit operation to release the underlying sparse storage scheme at runtime (other than making an explicit delSparseTensor() library call). To simplify memory management, a sparse_tensor.release operation has been introduced that lowers to the runtime library call while keeping tensors, opaque pointers, and memrefs transparent in the initial IR.
*Note* There is obviously some tension between the concept of immutable tensors and memory management methods. This tension is addressed by simply stating that after the "release" call, no further memref-related operations are allowed on the tensor value. We expect the design to evolve over time, however, and eventually arrive at a more satisfactory view of tensors and buffers.
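For example (a sketch; the encoding alias and file argument are illustrative):

  #CSR = #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>

  func @example(%fileName: !llvm.ptr<i8>) {
    %t = sparse_tensor.new %fileName : !llvm.ptr<i8> to tensor<?x?xf64, #CSR>
    // ... compute with %t ...
    // Free the underlying storage; no further memref-related use of %t
    // is allowed after this point.
    sparse_tensor.release %t : tensor<?x?xf64, #CSR>
    return
  }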
Bug:
http://llvm.org/pr52046
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111099
This integration test runs a fused and non-fused version of
sampled matrix multiplication. Both should eventually have the
same performance!
NOTE: relies on pending tensor.init fix!
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110444
The sparse constant provides a constant tensor in coordinate format. We first split the sparse constant into a constant tensor for indices and a constant tensor for values. We then generate a loop to fill a sparse tensor in coordinate format using the tensors for the indices and the values. Finally, we convert the sparse tensor in coordinate format to the destination sparse tensor format.
Add tests.
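For example (a sketch; indices, values, and the #CSR alias are illustrative):

  // Sparse constant in coordinate (COO) format.
  %c = arith.constant sparse<[[0, 0], [1, 6]], [1.0, 5.0]> : tensor<8x7xf64>
  // Lowering first fills a tensor in coordinate format from the split
  // index and value constants, then converts to the destination format.
  %s = sparse_tensor.convert %c : tensor<8x7xf64> to tensor<8x7xf64, #CSR>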
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D110373
This test makes sure kernels map to efficient sparse code, i.e. all
compressed for-loops, no co-iterating while loops. In addition, this
revision removes the special constant folding inside the sparse
compiler in favor of Mahesh's new generic linalg folding. Thanks!
NOTE: relies on Mahesh fix, which needs to be rebased first
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110001
Now not just SUM, but also PRODUCT, AND, OR, XOR. The reductions
MIN and MAX are still to be done (also depends on recognizing
these operations in cmp-select constructs).
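For example, an XOR reduction (a sketch using current arith op names; the
#SV alias is illustrative):

  #SV = #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>

  #trait = {
    indexing_maps = [ affine_map<(i) -> (i)>, affine_map<(i) -> ()> ],
    iterator_types = [ "reduction" ]
  }

  // x() = XOR_i a(i): a sparse reduction other than SUM.
  func @xor_reduction(%a: tensor<32xi32, #SV>, %x: tensor<i32>) -> tensor<i32> {
    %0 = linalg.generic #trait
      ins(%a : tensor<32xi32, #SV>) outs(%x : tensor<i32>) {
      ^bb0(%in: i32, %out: i32):
        %r = arith.xori %in, %out : i32
        linalg.yield %r : i32
    } -> tensor<i32>
    return %0 : tensor<i32>
  }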
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110203
Note that this revision adds a very tiny bit of constant folding in the
sparse compiler lattice construction. Although I am generally trying to
avoid such canonicalizations (and rely on other passes to fix this instead),
the benefits of avoiding a very expensive disjunction lattice construction
justify having this special code (at least for now).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109939
This enables the sparsification of more kernels, such as convolutions
where there is an x(i+j) subscript. It also enables more tensor invariants
such as x(1) or other affine subscripts such as x(i+1). Currently, we
reject sparsity altogether for such tensors. Despite this restriction,
however, we can already handle a lot more kernels with compound subscripts
for dense access (viz. convolution with dense input and sparse filter).
Some unit tests and an integration test demonstrate new capability.
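For example, a 1-d convolution with a dense input and a sparse filter
(a sketch; shapes and the #SV alias are illustrative):

  #SV = #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>

  #conv = {
    indexing_maps = [
      affine_map<(i, j) -> (i + j)>,  // dense input x, compound subscript
      affine_map<(i, j) -> (j)>,      // sparse filter f
      affine_map<(i, j) -> (i)>       // dense output y
    ],
    iterator_types = [ "parallel", "reduction" ]
  }

  // y(i) = SUM_j x(i+j) * f(j)
  func @conv1d(%x: tensor<11xf64>, %f: tensor<3xf64, #SV>,
               %y: tensor<9xf64>) -> tensor<9xf64> {
    %0 = linalg.generic #conv
      ins(%x, %f : tensor<11xf64>, tensor<3xf64, #SV>)
      outs(%y : tensor<9xf64>) {
      ^bb0(%a: f64, %b: f64, %c: f64):
        %m = arith.mulf %a, %b : f64
        %s = arith.addf %c, %m : f64
        linalg.yield %s : f64
    } -> tensor<9xf64>
    return %0 : tensor<9xf64>
  }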
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109783
Further enhance the set of operations that can be handled by the sparse compiler.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109413
Conversion to the LLVM dialect is being refactored to be more progressive and
is now performed as a series of independent passes converting different
dialects. These passes may produce `unrealized_conversion_cast` operations that
represent pending conversions between built-in and LLVM dialect types.
Historically, a more monolithic Standard-to-LLVM conversion pass did not need
these casts as all operations were converted in one shot. Previous refactorings
have led to the requirement of running the Standard-to-LLVM conversion pass to
clean up `unrealized_conversion_cast`s even though the IR had no standard
operations in it. The pass also had to be run last among all to-LLVM
passes, contradicting the partial conversion logic. Additionally, the
way it was set up could produce invalid operations by removing casts between
LLVM and built-in types even when the consumer did not accept the uncasted
type, or could lead to cryptic conversion errors (recursive application of the
rewrite pattern on `unrealized_conversion_cast` as a means to indicate failure
to eliminate casts).
In fact, the need to eliminate A->B->A `unrealized_conversion_cast`s is not
specific to to-LLVM conversions and can be factored out into a separate type
reconciliation pass, which is achieved in this commit. While the cast operation
itself has a folder pattern, it is insufficient in most conversion passes as
the folder only applies to the second cast. Without complex legality setup in
the conversion target, the conversion infra will either consider the cast
operations valid and not fold them (a separate canonicalization would be
necessary to trigger the folding), or consider the first cast invalid upon
generation and stop with error. The pattern provided by the reconciliation pass
applies to the first cast operation instead. Furthermore, having a separate
pass makes it clear when `unrealized_conversion_cast`s could not have been
eliminated since it is the only reason why this pass can fail.
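For example (a sketch; "test.use" stands in for an arbitrary consumer):

  // Before: a matching A->B->A cast pair left over by partial conversions.
  %0 = builtin.unrealized_conversion_cast %arg0 : index to i64
  %1 = builtin.unrealized_conversion_cast %0 : i64 to index
  "test.use"(%1) : (index) -> ()

  // After -reconcile-unrealized-casts: both casts are erased and the use
  // refers to %arg0 directly; any live cast that cannot be eliminated
  // makes the pass fail, making the failure point explicit.
  "test.use"(%arg0) : (index) -> ()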
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109507
Recent changes outside the sparse compiler exposed the need to run a
new pass (lower-affine), but this only became apparent with private testing.
By adding some vectorized runs to integration test, we will detect the need
for such changes earlier and also widen codegen coverage of course.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D108667
Looks "under the hood" of the sparse stogage schemes.
Users should typically not be interested in these details
(hey, that is why we have "sparse compilers"!) but this
test makes sure the compact contents are as expected.
Reviewed By: ThomasRaoux, bixia
Differential Revision: https://reviews.llvm.org/D107683
Implements the lowering of dense-to-sparse conversion, for static tensor types only.
First step towards general sparse_tensor.convert support.
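For example (the #CSR alias is illustrative):

  // Materialize a dense tensor into sparse storage (static shapes only).
  %s = sparse_tensor.convert %d : tensor<8x8xf64> to tensor<8x8xf64, #CSR>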
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D107681
With the migration from linalg.copy to memref.copy, this pass
(which was there solely to handle the linalg.copy op) is no
longer required for the end-to-end path for sparse compilation.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D106073
After the MemRef has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the MemRef
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: herhut, silvas
Differential Revision: https://reviews.llvm.org/D105625
Adds an integration test for the SPMM (sparse matrix multiplication) kernel, which multiplies a sparse matrix by a dense matrix, resulting in a dense matrix. This is just a simple modification on the existing matrix-vector multiplication kernel.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D104334
Removed some of the older raw "MLIRized" versions that are
no longer needed now that the sparse runtime support library
can focus on the proper sparse tensor types rather than the
opaque pointer approach of the past. This avoids legacy...
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D102960
This revision completes the "dimension ordering" feature
of sparse tensor types that enables the programmer to
define a preferred order on dimension access (other than
the default left-to-right order). This enables e.g. selection
of column-major over row-major storage for sparse matrices,
but generalized to any rank, as in:
dimOrdering = affine_map<(i,j,k,l,m,n,o,p) -> (p,o,j,k,i,l,m,n)>
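For example, a CSC-like encoding for matrices (a sketch):

  #CSC = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(i, j) -> (j, i)>  // column-major storage
  }>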
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102856
The experimental flag for "inplace" bufferization in the sparse
compiler can be replaced with the new inplace attribute. This gives
a uniform way of expressing this more efficient form of bufferization.
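A sketch (the linalg.inplaceable attribute name follows the bufferization
tests of this era; treat the exact spelling as an assumption):

  // The output argument may be bufferized in place.
  func @kernel(%argx: tensor<32xf64> {linalg.inplaceable = true})
      -> tensor<32xf64> {
    return %argx : tensor<32xf64>
  }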
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102538