Depends On D115008
This change opens the way for D115012, and removes some corner cases in `CodegenUtils.cpp`. The `SparseTensorAttrDefs.td` already specifies that we allow `0` bitwidth for the two overhead types and that it is interpreted to mean the architecture's native width.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D115010
This revision implements sparse outputs (from scratch) in all cases where
the loops can be reordered with all but one parallel loops outer. If the
inner parallel loop appears inside one or more reductions loops, then an
access pattern expansion is required (aka. workspaces in TACO speak).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D115091
Proper test for sparse tensor outputs is a single condition throughout
the whole tensor index expression (not a general conjunction, since this
may include other conditions that cause cancellation).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D114810
Moves sparse tensor output support forward by generalizing from injective
insertions only to include reductions. This revision accepts the case with all
parallel outer and all reduction inner loops, since that can be handled with
an injective insertion still. Next revision will allow the inner parallel loop
to move inward (but that will require "access pattern expansion" aka "workspace").
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D114399
`vector::InsertElementOp` and `vector::ExtractElementOp` have had their `position`
operand changed to accept `AnySignlessIntegerOrIndex` for better operability with
operations that use `index`, such as affine loops.
LLVM's `extractelement` and `insertelement` can also accept `i64`, so lowering
directly to these operations without explicitly inserting casts is allowed. SPIRV's
equivalent ops can also accept `i64`.
Reviewed By: nicolasvasilache, jpienaar
Differential Revision: https://reviews.llvm.org/D114139
First version was vectors only. With some clever "path" insertion,
we now support any d-dimensional tensor. Up next: reductions too
Reviewed By: bixia, wrengr
Differential Revision: https://reviews.llvm.org/D114024
This revision contains all "sparsification" ops and rewriting necessary to support sparse output tensors when the kernel has no reduction (viz. insertions occur in lexicographic order and are "injective"). This will be later generalized to allow reductions too. Also, this first revision only supports sparse 1-d tensors (viz. vectors) as output in the runtime support library. This will be generalized to n-d tensors shortly. But this way, the revision is kept to a manageable size.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D113705
The earlier reduction "scalarization" was only applied to a chain of
*innermost* and *for* loops. This revision generalizes this to any
nesting of for- and while-loops. This implies that reductions can be
implemented with a lot less load and store operations. The chaining
is implemented with a forest of yield statements (but not as bad as
when we would also include the while-induction).
Fixes https://bugs.llvm.org/show_bug.cgi?id=52311
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D113078
Rationale:
The currently used trait was demanding that all types are the same
which is not true (since the sparse part may change and the dim sizes
may be relaxed). This revision uses the correct trait and makes the
rank match test explicit in the verify method.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D112576
Even though tensor.cast is not part of the sparse tensor dialect,
it may be used to cast static dimension sizes to dynamic dimension
sizes for sparse tensors without changing the actual sparse tensor
itself. Those cases should be lowered properly when replacing sparse
tensor types with their opaque pointers. Likewise, no op sparse
conversions are handled by this revision in a similar manner.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D112173
The current implementation used explicit index->int64_t casts for some, but
not all instances of passing values of type "index" in and from the sparse
support library. This revision makes the situation more consistent by
using new "index_t" type at all such places (which allows for less trivial
casting in the generated MLIR code). Note that the current revision still
assumes that "index" is 64-bit wide. If we want to support targets with
alternative "index" bit widths, we need to build the support library different.
But the current revision is a step forward by making this requirement explicit
and more visible.
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D112122
This revison lifts the artificial restriction on having exact matches between
source and destination type shapes. A static size may become dynamic. We still
reject changing a dynamic size into a static size to avoid the need for a
runtime "assert" on the conversion. This revision also refactors some of the
conversion code to share same-content buffers.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111915
Next step towards supporting sparse tensors outputs.
Also some minor refactoring of enum constants as well
as replacing tensor arguments with proper buffer arguments
(latter is required for more general sizes arguments for
the sparse_tensor.init operation, as well as more general
spares_tensor.convert operations later)
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D111771
This is the first step towards supporting general sparse tensors as output
of operations. The init sparse tensor is used to materialize an empty sparse
tensor of given shape and sparsity into a subsequent computation (similar to
the dense tensor init operation counterpart).
Example:
%c = sparse_tensor.init %d1, %d2 : tensor<?x?xf32, #SparseMatrix>
%0 = linalg.matmul
ins(%a, %b: tensor<?x?xf32>, tensor<?x?xf32>)
outs(%c: tensor<?x?xf32, #SparseMatrix>) -> tensor<?x?xf32, #SparseMatrix>
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111684
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
This relaxes vectorization of dense memrefs a bit so that affine expressions
are allowed in more outer dimensions. Vectorization of non unit stride
references is disabled though, since this seems ineffective anyway.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111469
We have several ways to materialize sparse tensors (new and convert) but no explicit operation to release the underlying sparse storage scheme at runtime (other than making an explicit delSparseTensor() library call). To simplify memory management, a sparse_tensor.release operation has been introduced that lowers to the runtime library call while keeping tensors, opague pointers, and memrefs transparent in the initial IR.
*Note* There is obviously some tension between the concept of immutable tensors and memory management methods. This tension is addressed by simply stating that after the "release" call, no further memref related operations are allowed on the tensor value. We expect the design to evolve over time, however, and arrive at a more satisfactory view of tensors and buffers eventually.
Bug:
http://llvm.org/pr52046
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111099
This revision makes sure that when the output buffer materializes locally
(in contrast with the passing in of output tensors either in-place or not
in-place), the zero initialization assumption is preserved. This also adds
a bit more documentation on our sparse kernel assumption (viz. TACO
assumptions).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110442
The sparse constant provides a constant tensor in coordinate format. We first split the sparse constant into a constant tensor for indices and a constant tensor for values. We then generate a loop to fill a sparse tensor in coordinate format using the tensors for the indices and the values. Finally, we convert the sparse tensor in coordinate format to the destination sparse tensor format.
Add tests.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D110373
This test makes sure kernels map to efficient sparse code, i.e. all
compressed for-loops, no co-iterating while loops. In addition, this
revision removes the special constant folding inside the sparse
compiler in favor of Mahesh' new generic linalg folding. Thanks!
NOTE: relies on Mahesh fix, which needs to be rebased first
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110001
Now not just SUM, but also PRODUCT, AND, OR, XOR. The reductions
MIN and MAX are still to be done (also depends on recognizing
these operations in cmp-select constructs).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D110203
This enables the sparsification of more kernels, such as convolutions
where there is a x(i+j) subscript. It also enables more tensor invariants
such as x(1) or other affine subscripts such as x(i+1). Currently, we
reject sparsity altogether for such tensors. Despite this restriction,
however, we can already handle a lot more kernels with compound subscripts
for dense access (viz. convolution with dense input and sparse filter).
Some unit tests and an integration test demonstrate new capability.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109783
Generate an scf.for instead of an scf.if for the partial iteration. This is for consistency reasons: The peeling of linalg.tiled_loop also uses another loop for the partial iteration.
Note: Canonicalizations patterns may rewrite partial iterations to scf.if afterwards.
Differential Revision: https://reviews.llvm.org/D109568
The sparse index order must always be satisfied, but this
may give a choice in topsorts for several cases. We broke
ties in favor of any dense index order, since this gives
good locality. However, breaking ties in favor of pushing
unrelated indices into sparse iteration spaces gives better
asymptotic complexity. This revision improves the heuristic.
Note that in the long run, we are really interested in using
ML for ML to find the best loop ordering as a replacement for
such heuristics.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109100
Currently the builtin dialect is the default namespace used for parsing
and printing. As such module and func don't need to be prefixed.
In the case of some dialects that defines new regions for their own
purpose (like SpirV modules for example), it can be beneficial to
change the default dialect in order to improve readability.
Differential Revision: https://reviews.llvm.org/D107236
Rationale:
Passing in a pointer to the memref data in order to implement the
dense to sparse conversion was a bit too low-level. This revision
improves upon that approach with a cleaner solution of generating
a loop nest in MLIR code itself that prepares the COO object before
passing it to our "swiss army knife" setup. This is much more
intuitive *and* now also allows for dynamic shapes.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108491
Folding in the MLIR uses the order of the type directly
but folding in the underlying implementation must take
the dim ordering into account. These tests clarify that
behavior and verify it is done right.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108474
Apply the "for loop peeling" pattern from SCF dialect transforms. This pattern splits scf.for loops into full and partial iterations. In the full iteration, all masked loads/stores are canonicalized to unmasked loads/stores.
Differential Revision: https://reviews.llvm.org/D107733
This shares more code with existing utilities. Also, to be consistent,
we moved dimension permutation on the DimOp to the tensor lowering phase.
This way, both pre-existing DimOps on sparse tensors (not likely but
possible) as well as compiler generated DimOps are handled consistently.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108309
Implements lowering dense to sparse conversion, for static tensor types only.
First step towards general sparse_tensor.convert support.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D107681
Introduces a conversion from one (sparse) tensor type to another
(sparse) tensor type. See the operation doc for details. Actual
codegen for all cases is still TBD.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D107205
The order of testing in two sparse tensor ops was incorrect,
which could cause an invalid cast (crashing the compiler instead
of reporting the error). This revision fixes that bug.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D106841
Arbitrary shifts have some complications, but shift by invariants
(viz. tensor index exp only at left hand side) can be easily
handled with the conjunctive rule.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D106002
Adds zero-preserving unary operators from std. Also adds xor.
Performs minor refactoring to remove "zero" node, and pushed
the irregular logic for negi (not support in std) into one place.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105928
Integral AND and OR follow the simple conjunction and disjuction rules
for lattice building. This revision also completes some of the Merge
refactoring by moving the remainder parts that are merger specific from
sparsification into utils files.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105851
Right now, we only accept x/c with nonzero c, since this
conceptually can be treated as a x*(1/c) conjunction for both
FP and INT as far as lattice computations go. The codegen
keeps the division though to preserve precise semantics.
See discussion:
https://llvm.discourse.group/t/sparse-tensors-in-mlir/3389/28
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105731
This revision extends the sparse compiler support from fp/int addition and multiplication to fp/int negation and subtraction, thereby increasing the scope of sparse kernels that can be compiled.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D105306