This revision connects the generated sparse code with an actual
sparse storage scheme, which can be initialized from a test file.
Lacking a first-class citizen SparseTensor type (with buffer),
the storage is hidden behind an opaque pointer with some "glue"
to bring the pointer back to tensor land. Rather than generating
sparse setup code for each different annotated tensor (viz. the
"pack" methods in TACO), a single "one-size-fits-all" implementation
has been added to the runtime support library. Many details and
abstractions need to be refined in the future, but this revision
allows full end-to-end integration testing and performance
benchmarking (with on one end, an annotated Lingalg
op and, on the other end, a JIT/AOT executable).
Reviewed By: nicolasvasilache, bixia
Differential Revision: https://reviews.llvm.org/D95847
This makes ignoring a result explicit by the user, and helps to prevent accidental errors with dropped results. Marking LogicalResult as no discard was always the intention from the beginning, but got lost along the way.
Differential Revision: https://reviews.llvm.org/D95841
Use cases with 16- or even 8-bit pointer/index structures have been identified.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D95015
Similar to the parallelization strategies, the vectorization strategies
provide control on what loops should be vectorize. Unlike the parallel
strategies, only innermost loops are considered, but including reductions,
with the control of vectorizing dense loops only or dense and sparse loops.
The vectorized loops are always controlled by a vector mask to avoid
overrunning the iterations, but subsequent vector operation folding removes
redundant masks and replaces the operations with more efficient counterparts.
Similarly, we will rely on subsequent loop optimizations to further optimize
masking, e.g. using an unconditional full vector loop and scalar cleanup loop.
The current strategy already demonstrates a nice interaction between the
sparse compiler and all prior optimizations that went into the vector dialect.
Ongoing discussion at:
https://llvm.discourse.group/t/mlir-support-for-sparse-tensors/2020/10
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D94551
This change gives sparse compiler clients more control over selecting
individual types for the pointers and indices in the sparse storage schemes.
Narrower width obviously results in smaller memory footprints, but the
range should always suffice for the maximum number of entries or index value.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D92126
This CL adds the ability to request different parallelization strategies
for the generate code. Every "parallel" loop is a candidate, and converted
to a parallel op if it is an actual for-loop (not a while) and the strategy
allows dense/sparse outer/inner parallelization.
This will connect directly with the work of @ezhulenev on parallel loops.
Still TBD: vectorization strategy
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D91978
As discussed in https://llvm.discourse.group/t/mlir-support-for-sparse-tensors/2020
this CL is the start of sparse tensor compiler support in MLIR. Starting with a
"dense" kernel expressed in the Linalg dialect together with per-dimension
sparsity annotations on the tensors, the compiler automatically lowers the
kernel to sparse code using the methods described in Fredrik Kjolstad's thesis.
Many details are still TBD. For example, the sparse "bufferization" is purely
done locally since we don't have a global solution for propagating sparsity
yet. Furthermore, code to input and output the sparse tensors is missing.
Nevertheless, with some hand modifications, the generated MLIR can be
easily converted into runnable code already.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D90994