Note that even though the sparse runtime support lib always uses SoA
storage for COO storage (and provides correct codegen by means of views
into this storage), in some rare cases we need the true physical AoS
storage as a coordinate buffer. This PR provides that functionality by
means of a (costly) coordinate buffer call.
Since this is currently only used for testing/debugging by means of the
sparse_tensor.print method, this solution is acceptable. If we ever want
a performant version of this, we should truly support AoS storage of COO
in addition to the SoA used right now.
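
For illustration only, a minimal Python sketch of why the conversion is costly. It assumes SoA means one coordinate array per dimension and AoS means a single buffer with the coordinates of each stored entry interleaved. The helper name `soa_to_aos` is hypothetical and not part of the runtime library.

```python
def soa_to_aos(coords_soa):
    """Interleave per-dimension coordinate arrays into one flat AoS buffer."""
    rank = len(coords_soa)
    nse = len(coords_soa[0]) if rank else 0
    # A fresh buffer is allocated and every coordinate is copied:
    # this is the costly part compared to handing out a view.
    aos = [0] * (rank * nse)
    for d, dim_coords in enumerate(coords_soa):
        for i, c in enumerate(dim_coords):
            aos[i * rank + d] = c
    return aos


# 2-D COO tensor with three stored entries at (0, 1), (2, 0), (2, 3).
rows = [0, 2, 2]
cols = [1, 0, 3]
coordinates = soa_to_aos([rows, cols])
print(coordinates)  # [0, 1, 2, 0, 2, 3]
```

The copy touches rank * nse coordinates on every call, which is tolerable for printing but not for a hot path.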
This is a follow-up for #81790. This patch basically extends:
* test/Integration/Dialect/Linalg/CPU/mmt4d.mlir
with pack/unpack ops so that the overall computation is a matrix
multiplication (as opposed to linalg.mmt4d). For comparison (and to make
it easier to verify correctness), linalg.matmul is also included in the
test.
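
As a rough sketch of the equivalence the test checks, the following pure-Python model packs both operands into tiles, runs an mmt4d-style contraction, unpacks the result, and compares it against a plain matmul. The tile sizes (2, 3, 2) and all helper names are assumptions chosen for illustration; they are not taken from the test itself.

```python
def matmul(a, b):
    """Reference: C[i][j] = sum_p A[i][p] * B[p][j]."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def pack(mat, t0, t1, transpose=False):
    """Tile a 2-D matrix into [R1][C1][t0][t1], optionally transposing first."""
    if transpose:
        mat = [list(col) for col in zip(*mat)]
    rows, cols = len(mat), len(mat[0])
    assert rows % t0 == 0 and cols % t1 == 0, "sketch assumes no padding"
    return [[[[mat[r1 * t0 + r0][c1 * t1 + c0] for c0 in range(t1)]
              for r0 in range(t0)]
             for c1 in range(cols // t1)]
            for r1 in range(rows // t0)]


def mmt4d(lhs, rhs):
    """acc[m1][n1][m0][n0] = sum over k1, k0 of lhs[m1][k1][m0][k0] * rhs[n1][k1][n0][k0]."""
    K1, M0, K0 = len(lhs[0]), len(lhs[0][0]), len(lhs[0][0][0])
    N0 = len(rhs[0][0])
    return [[[[sum(lhs[m1][k1][m0][k0] * rhs[n1][k1][n0][k0]
                   for k1 in range(K1) for k0 in range(K0))
               for n0 in range(N0)]
              for m0 in range(M0)]
             for n1 in range(len(rhs))]
            for m1 in range(len(lhs))]


def unpack(tiled):
    """Inverse of pack (without transpose): [R1][C1][t0][t1] -> 2-D matrix."""
    C1, t0, t1 = len(tiled[0]), len(tiled[0][0]), len(tiled[0][0][0])
    return [[tiled[r // t0][c // t1][r % t0][c % t1] for c in range(C1 * t1)]
            for r in range(len(tiled) * t0)]


A = [[i * 6 + j for j in range(6)] for i in range(4)]        # 4x6
B = [[(i + 2 * j) % 5 for j in range(4)] for i in range(6)]  # 6x4

packed_result = mmt4d(pack(A, 2, 3), pack(B, 2, 3, transpose=True))
assert unpack(packed_result) == matmul(A, B)
```

The right-hand side is transposed before tiling because mmt4d contracts over the innermost dimension of both operands.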
Buffers are no longer deallocated by One-Shot Bufferize - this is now
done by a separate buffer deallocation pass.
In order to see the leaks in SVE integration tests, use the following
CMake flags (enables the address sanitizer and SVE integration tests):
-DLLVM_USE_SANITIZER="Address"
-DMLIR_INCLUDE_INTEGRATION_TESTS=On
-DMLIR_RUN_ARM_SVE_TESTS=On
Follow-up for #85366
Transform interfaces are implemented, directly or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.
Split out the interfaces into a separate library. This in turn requires
flipping the dependency from the interface on the dialect that has crept
in because both co-existed in one library. The interface shouldn't
depend on the transform dialect either.
As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
This commit fixes memory leaks in sparse tensor integration tests by
adding `bufferization.dealloc_tensor` ops.
Note: Buffer deallocation will be automated in the future with the
ownership-based buffer deallocation pass, making `dealloc_tensor`
obsolete (only codegen path, not when using the runtime library).
This commit fixes the remaining memory leaks in the MLIR test suite.
`check-mlir` now passes when built with ASAN.
Buffers are no longer deallocated by One-Shot Bufferize. This is now
done by a separate buffer deallocation pass.
Also fix a bug in the `vector.mask` folding, which was triggered by
`-buffer-deallocation-pipeline`, which runs the canonicalizer.
This change lifts the restriction that purely allocated empty sparse
tensors cannot escape the method. Instead it makes a best effort to add
a finalizing operation before the escape.
This assumes that
(1) we never build sparse tensors across method boundaries
(e.g. allocate in one, insert in other method)
(2) if we have other uses of the empty allocation in the
same method, we assume that either that op will fail
or will do the finalization for us.
This is best-effort, but fixes some very obvious missing cases.
This commit adds a new test-only op:
`sparse_tensor.has_runtime_library`. The op returns "1" if the sparse
compiler runs in runtime library mode.
This op is useful for writing test cases that require different IR
depending on whether the sparse compiler runs in runtime library or
codegen mode.
This commit fixes a memory leak in `sparse_pack_d.mlir`. This test case
uses `sparse_tensor.assemble` to create a sparse tensor SSA value from
existing buffers. The runtime library reallocates and copies the existing
buffers; the codegen path does not. Therefore, the test requires
additional deallocations when running in runtime library mode.
Alternatives considered:
- Make the codegen path allocate. "Codegen" is the "default" compilation
mode and it is handling `sparse_tensor.assemble` correctly. The issue is
with the runtime library path, which should not allocate. Therefore, it
is better to put a workaround in the runtime library path than to work
around the issue with a new flag in the codegen path.
- Add a `sparse_tensor.runtime_only` attribute to
`bufferization.dealloc_tensor`. Verifying that the attribute can only be
attached to `bufferization.dealloc_tensor` may introduce an unwanted
dependency of `MLIRSparseTensorDialect` on `MLIRBufferizationDialect`.
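
To make the source of the leak concrete, here is a toy Python model of the two modes described above. It only mirrors the ownership behavior stated in this message (the runtime library copies, codegen reuses). The function names and buffer names are hypothetical and do not correspond to any MLIR API.

```python
def assemble(buffers, runtime_library):
    """Build a toy 'sparse tensor' (a dict of buffers) from existing buffers."""
    if runtime_library:
        # Runtime library mode: reallocate and copy every buffer.
        return {name: list(buf) for name, buf in buffers.items()}
    # Codegen mode: reuse the caller's buffers without copying.
    return dict(buffers)


def allocations_to_free(buffers, tensor):
    """Count the distinct allocations alive across the inputs and the tensor."""
    distinct = {id(b) for b in buffers.values()}
    distinct |= {id(b) for b in tensor.values()}
    return len(distinct)


inputs = {"positions": [0, 2], "coordinates": [1, 3], "values": [1.0, 2.0]}
codegen_tensor = assemble(inputs, runtime_library=False)
runtime_tensor = assemble(inputs, runtime_library=True)

print(allocations_to_free(inputs, codegen_tensor))  # 3
print(allocations_to_free(inputs, runtime_tensor))  # 6
```

In this model the runtime library path doubles the number of live allocations, so the test needs an extra deallocation that is guarded by the mode check.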
This commit fixes memory leaks in sparse tensor integration tests by
adding `bufferization.dealloc_tensor` ops.
Note: Buffer deallocation will be automated in the future with the
ownership-based buffer deallocation pass, making `dealloc_tensor`
obsolete (only codegen path, not when using the runtime library).
This is the first step (of many) cleaning up our tests to use the new and
exciting sparse_tensor.print operation instead of lengthy extraction +
print ops.
Since the vector.print str provides no punctuation control, it is
slightly more flexible to let the client of this operation decide
whether there should be a trailing newline. This allows for printing
like
vector.print str "nse = "
vector.print %nse : index
as
nse = 42
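
A small Python analogy of the same idea, assuming the string print emits no trailing newline while the value print ends the line. The helpers `print_str` and `print_value` are hypothetical stand-ins, not bindings to the vector dialect.

```python
import io


def print_str(out, text):
    """Write the string as-is; the caller decides about the newline."""
    out.write(text)


def print_value(out, value):
    """Write a value followed by a newline."""
    out.write(f"{value}\n")


out = io.StringIO()
print_str(out, "nse = ")
print_value(out, 42)
print(out.getvalue(), end="")  # nse = 42
```

Leaving the newline to the caller keeps the label and the value on one line without adding a separate punctuation option.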
The ArmSME compilation pipeline has evolved significantly and is now
complex enough that it warrants a proper lowering pipeline
that encapsulates the various passes and orderings. Currently the
pipeline is loosely defined in our integration tests, but these have
diverged and are not using the same passes or ordering everywhere.
This patch introduces a test-lower-to-arm-sme pipeline mirroring
test-lower-to-llvm that provides some sanity when running e2e examples
and can be used as a reference for targeting ArmSME in MLIR.
All the integration tests are updated to use this pipeline. The
intention is to productize the pipeline once it becomes more mature.
This commit fixes memory leaks in sparse tensor integration tests by
adding `bufferization.dealloc_tensor` ops.
Note: Buffer deallocation will be automated in the future with the
ownership-based buffer deallocation pass, making `dealloc_tensor`
obsolete (only codegen path, not when using the runtime library).