Commit Graph

758 Commits

Author SHA1 Message Date
Peiming Liu
a454d92c5a [mlir][sparse] rename files and unifies APIs (#88162) 2024-04-09 10:59:15 -07:00
Matthias Springer
a4c470555b [mlir][linalg] Fix builder API usage in RegionBuilderHelper (#87451)
Operations must be created with the supplied builder. Otherwise, the
dialect conversion / greedy pattern rewrite driver can break.

This commit fixes a crash in the dialect conversion:
```
within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: error: failed to legalize operation 'tosa.add'
  %0 = tosa.add %1, %arg2 : (tensor<10x10xf32>, tensor<*xf32>) -> tensor<*xf32>
       ^
within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: note: see current operation: %9 = "tosa.add"(%8, %arg2) : (tensor<10x10xf32>, tensor<*xf32>) -> tensor<*xf32>
mlir-opt: llvm-project/mlir/include/mlir/IR/UseDefLists.h:198: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed.
```

This commit is the proper fix for #87297 (which was reverted).
2024-04-04 11:17:59 +09:00
Peiming Liu
a54930e696 [mlir][sparse] allow YieldOp to yield multiple values. (#87261) 2024-04-01 10:30:36 -07:00
Aart Bik
dc4cfdbb8f [mlir][sparse] provide an AoS "view" into sparse runtime support lib (#87116)
Note that even though the sparse runtime support lib always uses SoA
storage for COO storage (and provides correct codegen by means of views
into this storage), in some rare cases we need the true physical SoA
storage as a coordinate buffer. This PR provides that functionality by
means of a (costly) coordinate buffer call.

Since this is currently only used for testing/debugging by means of the
sparse_tensor.print method, this solution is acceptable. If we ever want
a performing version of this, we should truly support AoS storage of COO
in addition to the SoA used right now.
2024-03-29 15:30:36 -07:00
Aart Bik
42c38b1cc5 [mlir][sparse] deallocate temporary transposed tensor (#85720)
Last resort resolution of cycles introduced a sparse conversion without
explicit sparse deallocation (which is not inserted by any automatic
means). This fixes 2 out of 5 remaining asan detected leaks in sparse
integration tests.
2024-03-18 17:10:23 -07:00
Aart Bik
f3a8af07fa [mlir][sparse] best effort finalization of escaping empty sparse tensors (#85482)
This change lifts the restriction that purely allocated empty sparse
tensors cannot escape the method. Instead it makes a best effort to add
a finalizing operation before the escape.

This assumes that
(1) we never build sparse tensors across method boundaries
    (e.g. allocate in one, insert in other method)
(2) if we have other uses of the empty allocation in the
    same method, we assume that either that op will fail
    or will do the finalization for us.

This is best-effort, but fixes some very obvious missing cases.
2024-03-15 16:43:09 -07:00
Matthias Springer
e8e8df4c1b [mlir][sparse] Add has_runtime_library test op (#85355)
This commit adds a new test-only op:
`sparse_tensor.has_runtime_library`. The op returns "1" if the sparse
compiler runs in runtime library mode.

This op is useful for writing test cases that require different IR
depending on whether the sparse compiler runs in runtime library or
codegen mode.

This commit fixes a memory leak in `sparse_pack_d.mlir`. This test case
uses `sparse_tensor.assemble` to create a sparse tensor SSA value from
existing buffers. This runtime library reallocates+copies the existing
buffers; the codegen path does not. Therefore, the test requires
additional deallocations when running in runtime library mode.

Alternatives considered:
- Make the codegen path allocate. "Codegen" is the "default" compilation
mode and it is handling `sparse_tensor.assemble` correctly. The issue is
with the runtime library path, which should not allocate. Therefore, it
is better to put a workaround in the runtime library path than to work
around the issue with a new flag in the codegen path.
- Add a `sparse_tensor.runtime_only` attribute to
`bufferization.dealloc_tensor`. Verifying that the attribute can only be
attached to `bufferization.dealloc_tensor` may introduce an unwanted
dependency of `MLIRSparseTensorDialect` on `MLIRBufferizationDialect`.
2024-03-15 13:35:48 +09:00
Matthias Springer
6ed4d15cf4 [mlir][sparse_tensor] Implement bufferization interface for foreach (#85183)
This commit fixes a memory leak in `sparse_codegen_foreach.mlir`. The
bufferization inserted a copy for the operand of `sparse_tensor.foreach`
because it conservatively assumed that the op writes to the operand.
2024-03-15 13:28:09 +09:00
Peiming Liu
94e27c265a [mlir][sparse] reuse tensor.insert operation to insert elements into … (#84987)
…a sparse tensor.
2024-03-12 16:59:17 -07:00
Peiming Liu
fc9f1d49aa [mlir][sparse] use a consistent order between [dis]assembleOp and sto… (#84079)
…rage layout.
2024-03-06 09:57:41 -08:00
Aart Bik
275fe3ae2d [mlir][sparse] support complex type for sparse_tensor.print (#83934)
With an integration test example
2024-03-04 17:14:31 -08:00
Peiming Liu
52b69aa32f [mlir][sparse] support sparsifying batch levels (#83898) 2024-03-04 14:39:06 -08:00
Aart Bik
691fc7cdcc [mlir][sparse] add dim/lvl information to sparse_tensor.print (#83913)
More information is more testing!
Also adjusts already migrated integration tests
2024-03-04 14:32:49 -08:00
Peiming Liu
1a0986f0f7 [mlir][sparse] code cleanup (using inferred type to construct to_[buf… (#83361)
…fer] op).
2024-02-28 16:55:28 -08:00
Peiming Liu
6bc7c9df7f [mlir][sparse] infer returned type for sparse_tensor.to_[buffer] ops (#83343)
The sparse structure buffers might not always be memrefs with rank == 1
with the presence of batch levels.
2024-02-28 16:10:20 -08:00
Aart Bik
d37affb06f [mlir][sparse] add a sparse_tensor.print operation (#83321)
This operation is mainly used for testing and debugging purposes but
provides a very convenient way to quickly inspect the contents of a
sparse tensor (all components over all stored levels).

Example:

[ [ 1, 0, 2, 0, 0, 0, 0, 0 ],
  [ 0, 0, 0, 0, 0, 0, 0, 0 ],
  [ 0, 0, 0, 0, 0, 0, 0, 0 ],
  [ 0, 0, 3, 4, 0, 5, 0, 0 ]

when stored sparse as DCSC prints as

---- Sparse Tensor ----
nse = 5
pos[0] : ( 0, 4,  )
crd[0] : ( 0, 2, 3, 5,  )
pos[1] : ( 0, 1, 3, 4, 5,  )
crd[1] : ( 0, 0, 3, 3, 3,  )
values : ( 1, 2, 3, 4, 5,  )
----
2024-02-28 12:33:26 -08:00
Peiming Liu
0d1f95760b [mlir][sparse] support type conversion from batched sparse tensors to… (#83163)
… memrefs.
2024-02-27 12:05:28 -08:00
Peiming Liu
56d58295dd [mlir][sparse] Introduce batch level format. (#83082) 2024-02-26 16:08:28 -08:00
Matthias Springer
91d5653e3a [mlir] Use OpBuilder::createBlock in op builders and patterns (#82770)
When creating a new block in (conversion) rewrite patterns,
`OpBuilder::createBlock` must be used. Otherwise, no
`notifyBlockInserted` notification is sent to the listener.

Note: The dialect conversion relies on listener notifications to keep
track of IR modifications. Creating blocks without the builder API can
lead to memory leaks during rollback.
2024-02-24 09:10:07 +01:00
Peiming Liu
f40ee6e83f [mlir][sparse] assemble SoA COO correctly. (#82449) 2024-02-20 18:46:34 -08:00
Peiming Liu
5248a98724 [mlir][sparse] support SoA COO in codegen path. (#82439)
*NOTE*: the `SoA` property only makes a difference on codegen path, and
is ignored in libgen path at the moment (only SoA COO is supported).
2024-02-20 17:06:21 -08:00
Peiming Liu
11705afc19 [mlir][sparse] deallocate tmp coo buffer generated during stage-spars… (#82017)
…e-ops pass.
2024-02-17 12:17:57 -08:00
Peiming Liu
aaf916456a Reapply "[mlir][sparse] remove LevelType enum, construct LevelType from LevelFormat and Properties" (#81923) (#81934) 2024-02-15 14:48:52 -08:00
Mehdi Amini
513448d28e Revert "[mlir][sparse] remove LevelType enum, construct LevelType from LevelF…" (#81923)
Reverts llvm/llvm-project#81799 ; this broke the mlir gcc7 bot.
2024-02-15 13:26:44 -08:00
Peiming Liu
235ec0f791 [mlir][sparse] remove LevelType enum, construct LevelType from LevelF… (#81799)
…ormat and properties instead.
2024-02-15 12:31:03 -08:00
Aart Bik
4d273b948e [mlir][sparse] ensure [dis]assembler wrapper methods properly inline (#81907) 2024-02-15 11:39:32 -08:00
Mehdi Amini
61f64d1c23 Apply clang-tidy fixes for llvm-qualified-auto in SparseTensorRewriting.cpp (NFC) 2024-02-12 13:27:49 -08:00
Mehdi Amini
56c385cd67 Apply clang-tidy fixes for modernize-loop-convert in SparseGPUCodegen.cpp (NFC) 2024-02-12 13:27:49 -08:00
Yinying Li
e5924d6499 [mlir][sparse] Implement parsing n out of m (#79935)
1. Add parsing methods for block[n, m].
2. Encode n and m with the newly extended 64-bit LevelType enum.
3. Update 2:4 methods names/comments to n:m.
2024-02-08 14:38:42 -05:00
Peiming Liu
35fae044c5 [mlir][sparse] using non-static field to avoid data races. (#81165) 2024-02-08 10:12:24 -08:00
Uday Bondhugula
fe8a62c463 [MLIR] Fix crash in AffineMap::replace for zero result maps (#80930)
Fix obvious bug in AffineMap::replace for the case of zero result maps.
Extend/complete inferExprsFromList to work with empty expression lists.
2024-02-08 19:16:29 +05:30
Aart Bik
5a9af39aab [mlir][sparse] made sparse vectorizer more robust on position of invariants (#80766)
Because the sparse vectorizer relies on the code coming out of the
sparsifier, the "patterns" are not always made very general. However, a
recent change in the generated code revealed an obvious situation where
the subscript analysis could be made a bit more robust.

Fixes:
https://github.com/llvm/llvm-project/issues/79897
2024-02-05 16:12:47 -08:00
Yinying Li
cd481fa827 [mlir][sparse] Change LevelType enum to 64 bit (#80501)
1. C++ enum is set through enum class LevelType : uint_64.
2. C enum is set through typedef uint_64 level_type. It is due to the
limitations in Windows build: setting enum width to ui64 is not
supported in C.
2024-02-05 17:00:52 -05:00
Aart Bik
d00e6d07b1 [mlir][sparse] refine sparse assembler strategy (#80521)
Rewrite *all* public methods, making original internal, private methods,
and exposing wrappers under the original name. This works a bit better
in practice (when combined with c-interface mechanism of torch-mlir for
example).
2024-02-05 10:48:18 -08:00
Peiming Liu
1ac6846263 [mlir][sparse] support sparse dilated convolution. (#80470) 2024-02-02 11:54:50 -08:00
Peiming Liu
4a653b4df5 [mlir][sparse] Support pretty print to debug sparse iteration. (#80207) 2024-02-01 15:28:36 -08:00
Peiming Liu
07bf1ddb4e [mlir][sparse] support non-id map for [Dis]assembleOp (#80355) 2024-02-01 15:11:33 -08:00
Aart Bik
33b463ad99 [mlir][sparse] external entry method wrapper for sparse tensors (#80326)
Similar to the emit_c_interface, this pull request adds a pass that
converts public entry methods that use sparse tensors as input
parameters and/or output return values into wrapper functions that
[dis]assemble the individual tensors that constitute the actual storage
used externally into MLIR sparse tensors. This pass can be used to
prepare the public entry methods of a program that is compiled by the
MLIR sparsifier to interface with an external runtime, e.g., when
passing sparse tensors as numpy arrays from and to Python. Note that
eventual bufferization decisions (e.g. who [de]allocates the underlying
memory) should be resolved in agreement with the external runtime
(Python, PyTorch, JAX, etc.)
2024-02-01 13:32:52 -08:00
Peiming Liu
1d3300d502 [mlir][sparse] use shared value storage between wrapped iterator and the wrapper. (#80046) 2024-01-30 12:01:19 -08:00
Peiming Liu
de5e4d7c69 [mlir][sparse] fix error when convolution stride is applied on a dens… (#79521)
…e level.
2024-01-25 17:11:24 -08:00
Peiming Liu
982c815aad [mlir][sparse] fix mismatch between enter/exitWhileLoop (#79493) 2024-01-25 12:21:47 -08:00
Peiming Liu
260e45cff0 [mlir][sparse] fix stack UAF (#79353) 2024-01-24 12:12:55 -08:00
Peiming Liu
298412b578 [mlir][sparse] setup SparseIterator to help generating code to traverse a sparse tensor level. (#78345) 2024-01-24 11:33:06 -08:00
Matthias Springer
5fcf907b34 [mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.

Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
2024-01-17 11:08:59 +01:00
Matthias Springer
0a8e3dd432 [mlir][Interfaces] DestinationStyleOpInterface: Rename hasTensor/BufferSemantics (#77574)
Rename interface functions as follows:
* `hasTensorSemantics` -> `hasPureTensorSemantics`
* `hasBufferSemantics` -> `hasPureBufferSemantics`

These two functions return "true" if the op has tensor/buffer operands
but not buffer/tensor operands.

Also drop the "ranked" part from the interface, i.e., do not distinguish
between ranked/unranked types.

The new function names describe the functions more accurately. They also
align their semantics with the notion of "tensor semantics" with the
bufferization framework. (An op is supposed to be bufferized if it has
tensor operands, and we don't care if it also has memref operands.)

This change is in preparation of #75273, which adds
`BufferizableOpInterface::hasTensorSemantics`. By renaming the functions
in the `DestinationStyleOpInterface`, we can avoid name clashes between
the two interfaces.
2024-01-12 10:02:54 +01:00
Aart Bik
aec73eade7 [mlir][sparse] allow unknown ops in one-shot bufferization in mini-pipeline (#77688)
Rationale:
Since this mini-pipeline may be used in alternative pipelines (viz.
different from the default "sparsifier" pipeline) where unknown ops are
handled by alternative bufferization methods that are downstream of this
mini-pipeline, we allow unknown ops by default (failure to bufferize is
eventually apparent by failing to convert to LLVM IR).

This is part of enabling e2e testing for TORCH-MLIR tests using a
sparsifier backend
2024-01-10 13:36:20 -08:00
Peiming Liu
d933b88b71 [mlir][sparse] use a common util function to query the tensor level s… (#76764)
…et in a lattice point.
2024-01-02 15:56:42 -08:00
Aart Bik
41a07e668c [mlir][sparse] recognize NVidia 2:4 type for matmul (#76758)
This removes the temporary DENSE24 attribute and replaces it with proper
recognition of dense to 24 conversion. The compressionh will be
performed on the device prior to performing the matrix mult. Note that
we no longer need to start with the linalg version, we can lift this to
the proper named linalg op. Also renames some files into more consistent
names.
2024-01-02 14:44:24 -08:00
Peiming Liu
cf4dd91165 [mlir][sparse] initialize slice-driven loop-related fields in one place (#76099) 2023-12-20 14:20:57 -08:00
Matthias Springer
10056c821a [mlir][SCF] scf.parallel: Make reductions part of the terminator (#75314)
This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.

Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
    -> f32, f32 {
  %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
  %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
  scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.addf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }, {
    ^bb0(%lhs : f32, %rhs: f32):
      %res = arith.mulf %lhs, %rhs : f32
      scf.reduce.return %res : f32
  }
}
```

`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
2023-12-20 11:06:27 +09:00