Note that even though the sparse runtime support lib always uses SoA
storage for COO (and provides correct codegen by means of views
into this storage), in some rare cases we need the true physical AoS
storage as a coordinate buffer. This PR provides that functionality by
means of a (costly) coordinate buffer call.
Since this is currently only used for testing/debugging by means of the
sparse_tensor.print method, this solution is acceptable. If we ever want
a performant version of this, we should truly support AoS storage of COO
in addition to the SoA used right now.
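
To illustrate why materializing an interleaved coordinate buffer from
per-level coordinate arrays is costly, here is a minimal Python sketch. It
is not the runtime library's code: the function name `soa_to_aos` and the
list-based representation are assumptions made for illustration only. The
point is that every stored coordinate has to be copied once.

```python
def soa_to_aos(coords_per_level):
    """Interleave per-level coordinate arrays (SoA) into one flat AoS buffer.

    Hypothetical helper: coords_per_level[l][i] is the level-l coordinate of
    the i-th stored entry. The result lists all coordinates of entry 0, then
    all coordinates of entry 1, and so on.
    """
    if not coords_per_level:
        return []
    num_entries = len(coords_per_level[0])
    if any(len(level) != num_entries for level in coords_per_level):
        raise ValueError("all levels must store the same number of entries")
    aos = []
    # One full pass over all coordinates: O(levels * entries) copies.
    for i in range(num_entries):
        for level in coords_per_level:
            aos.append(level[i])
    return aos


# Three stored entries of a 2-D COO tensor at (0, 1), (2, 0), (2, 3).
rows = [0, 2, 2]
cols = [1, 0, 3]
aos_buffer = soa_to_aos([rows, cols])
```

A view into SoA storage avoids this copy, which is why the copying variant
is only reasonable for testing/debugging output.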
This commit fixes memory leaks in sparse tensor integration tests by
adding `bufferization.dealloc_tensor` ops.
Note: Buffer deallocation will be automated in the future with the
ownership-based buffer deallocation pass, making `dealloc_tensor`
obsolete (on the codegen path only, not when using the runtime library).
This commit fixes the remaining memory leaks in the MLIR test suite.
`check-mlir` now passes when built with ASAN.
This change lifts the restriction that purely allocated empty sparse
tensors cannot escape the method. Instead it makes a best effort to add
a finalizing operation before the escape.
This assumes that
(1) we never build sparse tensors across method boundaries
    (e.g. allocate in one method, insert in another), and
(2) any other use of the empty allocation in the same method
    either fails or does the finalization for us.
This is best-effort, but fixes some very obvious missing cases.
This commit adds a new test-only op:
`sparse_tensor.has_runtime_library`. The op returns "1" if the sparse
compiler runs in runtime library mode.
This op is useful for writing test cases that require different IR
depending on whether the sparse compiler runs in runtime library or
codegen mode.
This commit fixes a memory leak in `sparse_pack_d.mlir`. This test case
uses `sparse_tensor.assemble` to create a sparse tensor SSA value from
existing buffers. The runtime library reallocates and copies the existing
buffers; the codegen path does not. Therefore, the test requires
additional deallocations when running in runtime library mode.
Alternatives considered:
- Make the codegen path allocate. "Codegen" is the "default" compilation
mode and it is handling `sparse_tensor.assemble` correctly. The issue is
with the runtime library path, which should not allocate. Therefore, it
is better to put a workaround in the runtime library path than to work
around the issue with a new flag in the codegen path.
- Add a `sparse_tensor.runtime_only` attribute to
`bufferization.dealloc_tensor`. Verifying that the attribute can only be
attached to `bufferization.dealloc_tensor` may introduce an unwanted
dependency of `MLIRSparseTensorDialect` on `MLIRBufferizationDialect`.
This is the first step (of many) in cleaning up our tests to use the new and
exciting sparse_tensor.print operation instead of lengthy extraction +
print ops.
1. Add Python test for n out of m
2. Add more methods to the Python bindings
3. Add verification for n:m and invalid encoding tests
4. Add e2e test for n:m
Previous PRs for n:m: #80501, #79935
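
As a rough illustration of the property an n:m test exercises, the
following Python sketch checks a single row. It assumes the usual reading
of n:m structured sparsity (at most n nonzeros in every consecutive block
of m elements); the function name `satisfies_n_out_of_m` is hypothetical
and is not part of the MLIR Python bindings.

```python
def satisfies_n_out_of_m(row, n, m):
    """Return True if every block of m consecutive elements of `row`
    holds at most n nonzero values (assumed n:m definition)."""
    if m <= 0 or not 0 <= n <= m:
        raise ValueError("expected 0 <= n <= m and m > 0")
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    for start in range(0, len(row), m):
        block = row[start:start + m]
        nonzeros = sum(1 for value in block if value != 0)
        if nonzeros > n:
            return False
    return True


# 2:4 example: each block of four elements has at most two nonzeros.
ok_row = [1, 0, 2, 0, 0, 0, 3, 4]
bad_row = [1, 2, 3, 0]
```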
The `memref.subview` verifier currently checks the shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank-reducing ops and fixes invalid IR in test cases.
Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit.
Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
* Fix a comment and add an additional assert to `ExpandStridedMetadata.cpp`, to make sure that the `memref.subview` verifier is in sync with the `memref.subview` -> `memref.reinterpret_cast` lowering.
Note: This change is identical to #79865, but with a fixed comment and an additional assert in `ExpandStridedMetadata.cpp`. (I reverted #79865 in #80116, but the implementation was actually correct, just the comment in `ExpandStridedMetadata.cpp` was confusing.)
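
For the non-rank-reducing case with fully static values, the expected
result layout can be sketched in a few lines of Python. This is an
illustration, not the verifier's implementation: the helper names are
hypothetical, dynamic offsets/strides are not modeled, and the sketch
assumes the standard strided-layout rule (result stride = source stride
times subview stride; result offset = source offset plus the sum of
subview offset times source stride).

```python
def infer_subview_layout(src_offset, src_strides, offsets, strides):
    """Compute the (offset, strides) a static, non-rank-reducing subview
    is expected to have. All arguments are plain integers / lists."""
    if not (len(src_strides) == len(offsets) == len(strides)):
        raise ValueError("rank mismatch between source and subview operands")
    result_offset = src_offset + sum(
        off * src_stride for off, src_stride in zip(offsets, src_strides)
    )
    result_strides = [
        src_stride * stride for src_stride, stride in zip(src_strides, strides)
    ]
    return result_offset, result_strides


def strides_match(expected_strides, declared_strides):
    """Compare inferred strides against the strides of the declared type."""
    return list(expected_strides) == list(declared_strides)


# Row-major 8x16 source (strides [16, 1], offset 0); subview starting at
# [2, 4] that takes every element of dim 0 and every second one of dim 1.
offset, result_strides = infer_subview_layout(0, [16, 1], [2, 4], [1, 2])
```

With these inputs, a declared result type whose strides are not `[16, 2]`
would be rejected by a stride check of this kind.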
Reverts llvm/llvm-project#79865
I think there is a bug in the stride computation in
`SubViewOp::inferResultType`. (Was already there before this change.)
Reverting this commit for now and updating the original pull request
with a fix and more test cases.
The `memref.subview` verifier currently checks result shape, element
type, memory space and offset of the result type. However, the strides
of the result type are currently not verified. This commit adds
verification of result strides for non-rank-reducing ops and fixes
invalid IR in test cases.
Verification of result strides for ops with rank reductions is more
complex (and there could be multiple possible result types). That is
left for a separate commit.
Also refactor the implementation a bit:
* If `computeMemRefRankReductionMask` could not compute the dropped
dimensions, there must be something wrong with the op. Return
`FailureOr` instead of `std::optional`.
* `isRankReducedMemRefType` did much more than just checking whether the
op has rank reductions or not. Inline the implementation into the
verifier and add better comments.
* `produceSubViewErrorMsg` does not have to be templatized.
Note, tensor.empty may feed into SPARSE output (meaning it truly has no
values yet), but for a DENSE output, it should always have an initial
value. We ran a verifier over all our tests and this is the only
remaining omission.
The "Dim" prefix is a legacy leftover that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.