This revision contains all "sparsification" ops and rewriting necessary to support sparse output tensors when the kernel has no reduction (viz. insertions occur in lexicographic order and are "injective"). This will be later generalized to allow reductions too. Also, this first revision only supports sparse 1-d tensors (viz. vectors) as output in the runtime support library. This will be generalized to n-d tensors shortly. But this way, the revision is kept to a manageable size.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D113705
The current implementation used explicit index->int64_t casts for some, but
not all instances of passing values of type "index" in and from the sparse
support library. This revision makes the situation more consistent by
using new "index_t" type at all such places (which allows for less trivial
casting in the generated MLIR code). Note that the current revision still
assumes that "index" is 64-bit wide. If we want to support targets with
alternative "index" bit widths, we need to build the support library different.
But the current revision is a step forward by making this requirement explicit
and more visible.
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D112122
Next step towards supporting sparse tensors outputs.
Also some minor refactoring of enum constants as well
as replacing tensor arguments with proper buffer arguments
(latter is required for more general sizes arguments for
the sparse_tensor.init operation, as well as more general
spares_tensor.convert operations later)
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D111771
Some random changes that were hanging around in my workspace. Also,
a tiny step towards creating a header file for the sparse utils lib.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111589
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
When generating code to add an element to SparseTensorCOO (e.g., when doing dense=>sparse conversion), we used to check for nonzero values on the runtime side, whereas now we generate MLIR code to do that check.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D110121
This revision removes the ad-hoc MemRefs that were needed using the old
ABI (when we still passed by value) and replaces them with the shared
StridedMemRef definitions of CRunnerUtils (possible now that we pass by
pointer). This avoids code duplication and makes sure we have a consistent
view of strided memory references in all our support libraries.
Reviewed By: jsetoain
Differential Revision: https://reviews.llvm.org/D110221
This change adds automatic wrapper functoins with emit_c_interface
to all methods in the sparse support library that deal with MEMREFs.
The wrappers will take care of passing MEMREFs by value internally
and by pointer externally, thereby avoiding ABI issues across platforms.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D110219
We are having issues running the integration test of the sparse compiler
on AArch64 (crashing in the lib). This revision adds more assertions.
Reviewed By: jsetoain
Differential Revision: https://reviews.llvm.org/D109861
Create a gpu memset op and corresponding CUDA and ROCm wrappers.
Reviewed By: herhut, lorenrose1013
Differential Revision: https://reviews.llvm.org/D107548
This simplifies setting up sparse tensors through C-style data structures.
Useful for runtimes that want to interact with MLIR-generated code
without knowning about all bufferization details (viz. memrefs).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109251
(1) renamed SparseTensor to SparseTensorCOO, the other one remains SparseTensorStorage to focus on contrast
(2) documents difference between public API exclusively for compiler-generated code and methods that could be used by other runtimes (TBD) that want to interact with MLIR
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D109039
Trying to reduce confusion by having the name of the public method match that of the private method for handling the recursion. Also adding some comments to SparseTensorStorage::fromCOO to help clarify what the recursive calls are doing in the dense case.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D108954
Drop mgpuMemHostRegisterMemRef's dependence on LLVM Support. This
method is the only one in CUDA runtime wrappers library that creates
a dependence on libLLVMSupport due to its use of SmallVector and
ArrayRef. The code can be as easily/compactly written without those ADT.
The dependence on LLVMSupport adds a significant amount of additional
complexity for external things that want to link this library in (both
statically or as a shared object) since libLLVMSupport includes numerous
other objects that are sensitive to C++ compiler version and ABI.
Differential Revision: https://reviews.llvm.org/D108684
This prepares general sparse to sparse conversions. The code that
needs to be generated using this new feature is now simply:
(1) coo = sparse_tensor_1->asCOO(); // source format1
(2) sparse_tensor_2 = newSparseTensor(coo); // destination format2
By using COO as an intermediate, we can do *all* conversions without
having to implement the full O(N^2) conversion matrix. Note that we
can always improve particular conversions individually if a faster
solution is required.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108681
The emplace commands are variadic and should take all the constructor arguments directly, since they implicitly call the constructor themselves in order to avoid the cost of constructing and then moving/copying temporaries.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D108670
Rationale:
Passing in a pointer to the memref data in order to implement the
dense to sparse conversion was a bit too low-level. This revision
improves upon that approach with a cleaner solution of generating
a loop nest in MLIR code itself that prepares the COO object before
passing it to our "swiss army knife" setup. This is much more
intuitive *and* now also allows for dynamic shapes.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D108491
Implements lowering dense to sparse conversion, for static tensor types only.
First step towards general sparse_tensor.convert support.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D107681
Instead, include `<cstdlib>` which is the canonical header containing
the declaration of `alloca()`.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D107699
There was a slightly mismatch between the double COO and actual numerical
type in the final sparse tensor storage (due to external formats always
using double). This minor revision removes that inconsistency by using a
properly typed COO and casting during the "add" method instead. This also
prepares alternative ways of initializing the COO object.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D107310
Rationale:
External file formats always store the values as doubles, so this was
hard coded in the memory resident COO scheme used to pass data into the
final sparse storage scheme during setup. However, with alternative methods
on the horizon of setting up these temporary COO schemes, it is time to
properly template this data structure.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D107001
Remove uses of to-be-deprecated API. In cases where the correct
element type was not immediately obvious to me, fall back to
explicit getPointerElementType().
This format was missing from the support library. Although there are some
subtleties reading in an external format for int64 as double, there is no
good reason to omit support for this data type form the support library.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D106016
While replacing linalg.copy with the more desired memref.copy
I found a bug in the support library for rank 0 memref copying.
The code would loop for something like the following, since there
is code for no-rank and rank > 0, but rank == 0 was unexpected.
memref.copy %0, %1: memref<f32> to memref<f32>
Note that a "regression test" for this will follow using the
sparse compiler migration to memref.copy which exercises this
case many times.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D106036
Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group, for this reason it is required to know upfront how many tokens will be added to the group.
Reviewed By: ftynse, herhut
Differential Revision: https://reviews.llvm.org/D104780
Useful for "exhaustively" testing and benchmarking annotation combinations
to verify correctness and perform state space search for best performing.
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D103566
Depends On D103109
If any of the tokens/values added to the `!async.group` switches to the error state, than the group itself switches to the error state.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103203
Depends On D103102
Not yet implemented:
1. Error handling after synchronous await
2. Error handling for async groups
Will be addressed in the followup PRs
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D103109
Removed some of the older raw "MLIRized" versions that are
no longer needed now that the sparse runtime support library
can focus on the proper sparse tensor types rather than the
opague pointer approach of the past. This avoids legacy...
Reviewed By: penpornk
Differential Revision: https://reviews.llvm.org/D102960
Fix inconsistent MLIR CMake variable names. Consistently name them as
MLIR_ENABLE_<feature>.
Eg: MLIR_CUDA_RUNNER_ENABLED -> MLIR_ENABLE_CUDA_RUNNER
MLIR follows (or has mostly followed) the convention of naming
cmake enabling variables in the from MLIR_ENABLE_... etc. Using a
convention here is easy and also important for convenience. A counter
pattern was started with variables named MLIR_..._ENABLED. This led to a
sequence of related counter patterns: MLIR_CUDA_RUNNER_ENABLED,
MLIR_ROCM_RUNNER_ENABLED, etc.. From a naming standpoint, the imperative
form is more meaningful. Additional discussion at:
https://llvm.discourse.group/t/mlir-cmake-enable-variable-naming-convention/3520
Switch all inconsistent ones to the ENABLE form. Keep the couple of old
mappings needed until buildbot config is migrated.
Differential Revision: https://reviews.llvm.org/D102976