Commit Graph

702 Commits

Author SHA1 Message Date
Fangrui Song
6a684dbc44 [Support] Remove llvm::is_trivially_{copy/move}_constructible
This restores D132311, which was reverted in
29c841ce93 (Sep 2022) due to certain files
not buildable with GCC 7.3.0. The previous attempt was reverted by
6cd9608fb3 (Dec 2020).

This time, GCC 7.3.0 has existing build errors for a long time due to
structured bindings for many files, e.g.

```
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9098:13: error: cannot decompose class type ‘std::pair<llvm::Value*, const llvm::SCEV*>’: both it and it
s base class ‘std::pair<llvm::Value*, const llvm::SCEV*>’ have non-static data members
   for (auto [_, Stride] : Legal->getLAI()->getSymbolicStrides()) {
             ^~~~~~~~~~~
```

... and also some `error: duplicate initialization of` instances due to llvm/Transforms/IPO/Attributor.h.

---

GCC 7.5.0 has a bug that, without this change, certain `SmallVector` with a `std::pair` element type like `SmallVector<std::pair<Instruction * const, Info>, 0> X;` lead to spurious

```
/tmp/opt/gcc-7.5.0/include/c++/7.5.0/type_traits:878:48: error: constructor required before non-static data member for ‘...’ has been parsed
```

Switching to std::is_trivially_{copy/move}_constructible fixes the error.
2023-07-25 17:21:16 -07:00
Mehdi Amini
5e8a1164f2 Revert "[mlir][gpu] Fallback to JIT compilation" "[mlir][gpu] Increase default SM version from 35 to 50" and "[mlir][gpu] Improving Cubin Serialization with ptxas Compiler"
This reverts commit 2e0e00ed84
and reverts commit a6eb40692c
and reverts commit 585cbe3f63.

15 tests are broken on the mlir-nvidia buildbot:

'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_SOURCE'
'cuModuleGetFunction(&function, module, name)' failed with 'CUDA_ERROR_INVALID_HANDLE'
'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, smem, stream, params, extra)' failed with 'CUDA_ERROR_INVALID_HANDLE'
'cuModuleUnload(module)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2023-07-24 10:23:15 -07:00
Guray Ozen
585cbe3f63 [mlir][gpu] Improving Cubin Serialization with ptxas Compiler
This work improves how we compile the generated PTX code using the `ptxas` compiler. Currently, we rely on the driver's jit API to compile the PTX code. However, this approach has some limitations. It doesn't always produce the same binary output as the ptxas compiler, leading to potential inconsistencies in the generated Cubin files.

This work introduces a significant improvement by directly utilizing the ptxas compiler for PTX compilation. By doing so, we can achieve more consistent and reliable results in generating cubin files. Key Benefits:
- Using the Ptxas compiler directly ensures that the cubin files generated during the build process remain consistent with CUDA compilation using `nvcc` or `clang`.
- Another advantage of this work is that it allows developers to experiment with different ptxas compilers without the need to change the compiler. Performance among ptxas compiler versions are vary, therefore, one can easily try different ptxas compilers.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155563
2023-07-24 12:29:53 +02:00
wren romano
889f4bf264 [mlir][sparse] Improve DimLvlMapParser's handling of variable bindings
This commit comprises a number of related changes:

(1) Reintroduces the semantic distinction between `parseVarUsage` vs `parseVarBinding`, adds documentation explaining the distinction, and adds commentary to the one place that violates the desired/intended semantics.

(2) Improves documentation/commentary about the forward-declaration of level-vars, and about the meaning of the `bool` parameter to `parseLvlSpec`.

(2) Removes the `VarEnv::addVars` method, and instead has `DimLvlMapParser` handle the conversion issues directly.  In particular, the parser now stores and maintains the `{dims,lvls}AndSymbols` arrays, thereby avoiding the O(n^2) behavior of scanning through the entire `VarEnv` for each `parse{Dim,Lvl}Spec` call.  Unfortunately there still remains another source of O(n^2) behavior, namely: the `AsmParser::parseAffineExpr` method will copy the `DimLvlMapParser::{dims,lvls}AndSymbols` arrays into `AffineParser::dimsAndSymbols` on each `parse{Dim,Lvl}Spec` call; but fixing that would require extensive changes to `AffineParser` itself.

Depends On D155532

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D155533
2023-07-20 15:56:03 -07:00
wren romano
ad7a6b672c [mlir][sparse] Renaming CreationPolicy to Policy
This change is mostly for brevity's sake; but it also paves the way for the `Policy` enum to be reuseable for other situations that require the same three-way semantics.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D155532
2023-07-19 12:13:02 -07:00
Alex Zinenko
4a6b31b8d8 [mlir] NFC: untangle SCF Patterns.h and Transforms.h
These two headers both contained a strange mix of definitions related to
both patterns and non-pattern transforms. Put patterns and "populate"
functions into Patterns.h and standalone transforms into Transforms.h.

Depends On: D155223

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155454
2023-07-18 11:27:36 +00:00
Amir Bishara
1ad009ad3c [mlir][cmake] Comment out redundant static assert regarding VarInfo struct
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D154940
2023-07-18 00:14:24 -07:00
wren romano
47cf7a4ba5 [llvm] Allow SMLoc to be used in constexpr context
Since `StringRef::empty` can be used in constexpr context, it seems reasonable that `SMLoc::isValid` should be too.  The default-ctor and `operator==` are made constexpr for consistency.

In particular, the `constexpr` keyword is needed for silencing warnings on Windows (whereas Linux allows constexpr usage without the keyword).

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D154741
2023-07-17 11:45:22 -07:00
Peiming Liu
269c82d389 [mlir][sparse] introduce new 2:4 block sparsity level type.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D155128
2023-07-12 23:33:53 +00:00
K-Wu
e37fc3cc39 [mlir][sparse][gpu] Impl 2:4 SpMM rewrite for linalg op w/ DENSE24 attr
Differential Revision: https://reviews.llvm.org/D154772
2023-07-10 22:36:57 +00:00
Peiming Liu
fc5d8fce7d [mlir][sparse] support dual sparse convolution.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152601
2023-07-10 16:49:32 +00:00
wren romano
dcadb68a5c [mlir][sparse] Cleaning up OOB implementation details for VarSet
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D154674
2023-07-07 14:35:56 -07:00
wren romano
68785c1c44 [mlir][sparse] Correcting RTTI implementation for the Var class
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D154662
2023-07-06 18:57:02 -07:00
Aart Bik
03125e6894 [mlir][sparse][gpu] fix missing dealloc
This dealloc was incorrectly removed in
https://reviews.llvm.org/D153173

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D154564
2023-07-06 09:48:19 -07:00
Matthias Springer
cb7bda2ace [mlir][NFC] Use getConstantIntValue instead of casting to ConstantIndexOp
`getConstantIntValue` extracts constant values from all constant-like ops, not just `arith::ConstantIndexOp`.

Differential Revision: https://reviews.llvm.org/D154356
2023-07-04 14:08:37 +02:00
Kun Wu
be2dd22b8f [mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime
Differential Revision: https://reviews.llvm.org/D153173
2023-06-30 21:52:34 +00:00
Aart Bik
b939c015a4 [mlir][sparse] add affine parsing to new surface syntax for STEA
(1) uses the previously introduce API to reuse AffineExpr parser without codedup
(2) solves the look-ahead problem when parsing level spec

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D154254
2023-06-30 14:48:23 -07:00
Peiming Liu
a63d6a0014 [mlir][sparse] make UnpackOp return the actual filled length of unpacked memory
This might simplify frontend implementation by avoiding recomputation for the same value.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D154244
2023-06-30 21:35:15 +00:00
Aart Bik
6b88c852b6 [mlir][sparse] Start migration to new surface syntax for STEA
We are in the progress of migrating to a much improved surface syntax for the Sparse Tensor Encoding Attribute (STEA).

You can see a preview of this in the StableHLO RFC at

 https://github.com/openxla/stablehlo/blob/main/rfcs/20230210-sparsity.md

//**This design is courtesy Wren Romano.**//

This initial revision
(1) Introduces the first version of a new parser written by Wren Romano
(2) Introduces a simple "migration plan" using NEW_SYNTAX on the STEA, which will allow us to test the new parser with new examples, as well as migrate existing examples over without the need to rewrite them all

This first "drop" merely provides the entry points to parse the new syntax. The parser is still under active development. For example, we need to address the "lookahead" issue when parsing the lvl spec (viz. do we see l0 = d0 or a direct d0). Another larger task is to actually implement "affine" parsing (since the MLIR affine parser is not accessible in other parts of the tree).

EXAMPLE:

Currently, CSR looks like

  #CSR = #sparse_tensor.encoding<{
    lvlTypes = ["dense","compressed"],
    dimToLvl = affine_map<(i,j) -> (i,j)>
  }>

but you can "force" the new parser with

  #CSR = #sparse_tensor.encoding<{
    NEW_SYNTAX =
    (d0, d1) -> (l0 = d0 : dense, l1 = d1 : compressed)
  }>

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D153997
2023-06-29 11:32:07 -07:00
Peiming Liu
df11a2b41a [mlir][sparse] admit un-sparsifiable operations if all its operands are loaded from dense input
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D153998
2023-06-28 21:27:50 +00:00
Andrzej Warzynski
f22af204ed [mlir][VectorType] Remove numScalableDims from the vector type
This is a follow-up of https://reviews.llvm.org/D153372 in which
`numScalableDims` (single integer) was effectively replaced with
`isScalableDim` bitmask.

This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
  * https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/

Differential Revision: https://reviews.llvm.org/D153412
2023-06-28 13:53:45 +01:00
Andrzej Warzynski
79c83e12c8 [mlir][VectorType] Allow arbitrary dimensions to be scalable
At the moment, only the trailing dimensions in the vector type can be
scalable, i.e. this is supported:

    vector<2x[4]xf32>

and this is not allowed:

    vector<[2]x4xf32>

This patch extends the vector type so that arbitrary dimensions can be
scalable. To this end, an array of bool values is added to every vector
type to denote whether the corresponding dimensions are scalable or not.
For example, for this vector:

  vector<[2]x[3]x4xf32>

the following array would be created:

  {true, true, false}.

Additionally, the current syntax:

  vector<[2x3]x4xf32>

is replaced with:

  vector<[2]x[3]x4xf32>

This is primarily to simplify parsing (this way, the parser can easily
process one dimension at a time rather than e.g. tracking whether
"scalable block" has been entered/left).

NOTE: The `isScalableDim` parameter of `VectorType` (introduced in this
patch) makes `numScalableDims` redundant. For the time being,
`numScalableDims` is preserved to facilitate the transition between the
two parameters. `numScalableDims` will be removed in one of the
subsequent patches.

This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
  * https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/

Differential Revision: https://reviews.llvm.org/D153372
2023-06-27 19:21:59 +01:00
Aart Bik
11a4f5bdfb [mlir][sparse] minor code changes
Submitting for Wren

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D153804
2023-06-26 12:57:52 -07:00
Aart Bik
f14c8eb595 [mlir][sparse][gpu] refine SDDMM pattern for cuSPARSE
Old pattern was missing some cases (e.g. swapping the arguments)
but it also allowed too many cases (e.g. non-empty "absent" or
different arguments for add/mul). This fixes the issues.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D153487
2023-06-21 18:31:55 -07:00
Peiming Liu
e7df82816b [mlir][sparse] rewrite arith::SelectOp to semiring operations to sparsify it.
Reviewed By: aartbik, K-Wu

Differential Revision: https://reviews.llvm.org/D153397
2023-06-21 21:22:18 +00:00
Kun Wu
9167dd46ba [mlir][sparse][gpu] recognizing sddmm pattern in GPU libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151582
2023-06-15 23:48:11 +00:00
Peiming Liu
4e9526b9ea [mlir][sparse] using stable_sort to make sure the compiled code are consistent between different builds configuration
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D153074
2023-06-15 21:09:35 +00:00
Aart Bik
65bfd5cb25 [mlir][sparse] proper in-place SDDMM with spy function
This specific operation matches the cuSPARSE SDDMM semantics exactly.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D152969
2023-06-15 13:59:38 -07:00
Peiming Liu
faf7cd97d0 [mlir][sparse] merger extension to support sparsifying arith::CmpI/CmpF operation
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152761
2023-06-15 17:26:50 +00:00
Kazu Hirata
a39adc00db [mlir] Fix warnings in release builds
This patch fixes:

  mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp:846:16:
  error: unused variable 'lvlTp' [-Werror,-Wunused-variable]

  mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp:1059:13:
  error: unused variable '[t, l]' [-Werror,-Wunused-variable]
2023-06-14 14:22:17 -07:00
Peiming Liu
83b7f018fd [mlir][sparse] fix crashes when the tensor that defines the loop bound can not be found
Reviewed By: aartbik, K-Wu

Differential Revision: https://reviews.llvm.org/D152877
2023-06-14 20:27:50 +00:00
Peiming Liu
fd68d36109 [mlir][sparse] unifying enterLoopOverTensorAtLvl and enterCoIterationOverTensorsAtLvls
The tensor levels are now explicitly categorized into different `LoopCondKind` to instruct LoopEmitter generate different code for different kinds of condition (e.g., `SparseCond`, `SparseSliceCond`, `SparseAffineIdxCond`, etc)

The process of generating a while loop is now dissembled into three steps and they are dispatched to different LoopCondKind handler.
1. Generate LoopCondition (e.g., `pos <= posHi` for `SparseCond`, `slice.isNonEmpty` for `SparseAffineIdxCond`)
2. Generate LoopBody (e.g., compute the coordinates)
3. Generate ExtraChecks (e.g., `if (onSlice(crd))` for `SparseSliceCond`)

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152464
2023-06-14 20:03:10 +00:00
Aart Bik
debdf7e0ff [mlir][sparse] refine single condition set up for semi-ring ops
Reviewed By: Peiming, K-Wu

Differential Revision: https://reviews.llvm.org/D152874
2023-06-14 09:23:09 -07:00
Aart Bik
1ea903e164 [mlir][sparse][gpu] guard matvec COO AoS
Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152738
2023-06-12 16:49:58 -07:00
Aart Bik
80fe3168b5 [mlir][sparse] add support for direct prod/and/min/max reductions
We recently fixed a bug in "sparsifying" such reductions, since
it incorrectly changed this into reductions over stored elements
only , which only works for add/sub/or/xor. However, we still want
to be able to "sparsify" the reductions even in the general case,
and this is a first step by rewriting them into a custom reduction
that feeds in the implicit zeros. NOTE HOWEVER, that in the long run
we want to do this better and feed in any implicit zero only ONCE
for efficiency.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D152580
2023-06-12 09:27:47 -07:00
Peiming Liu
0258a53521 Brings back "[mlir][sparse] moving inbound check for slice driven loop into before block of the WhileOp"
This reverts commit 07b927902d.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152566
2023-06-09 17:45:46 +00:00
Peiming Liu
07b927902d Revert "[mlir][sparse] moving inbound check for slice driven loop into before block of the WhileOp"
This reverts commit 853d704fd0.

Differential Revision: https://reviews.llvm.org/D152562
2023-06-09 17:21:40 +00:00
Kun Wu
97f4c22b3a [mlir][sparse][gpu] unify dnmat and dnvec handle and ops
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152465
2023-06-09 17:16:48 +00:00
Peiming Liu
853d704fd0 [mlir][sparse] moving inbound check for slice driven loop into before block of the WhileOp
This patch changes the while loop generated for iterating over a fully reduced sparse level with affine index expression.
Before:
```
cont = true
while (cont) {
  if (inBound()) {
    ....
    cont = true;
  } else {
    cont = false;
  }
}
```
After:
```
while(inBound()) {
  ....
}
```

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152463
2023-06-09 17:03:15 +00:00
Kun Wu
8ed59c53de [mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity support
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151775
2023-06-06 23:13:21 +00:00
Aart Bik
9fc02a7a08 [mlir][sparse][gpu] add AoS COO support to cuSPARSE
Even though this feature was deprecated in release 11.2,
any library before this version still supports the feature,
which is why we are making it available under a macro.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152290
2023-06-06 12:32:46 -07:00
Aart Bik
e2167d89db [mlir][sparse] refine absent branch feeding into custom op
Document better that unary/binary may only feed to the output
or the input of a custom reduction (not even a regular reduction
since it may have "no value"!). Also fixes a bug when present
branch is empty and feeds into custom reduction.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D152224
2023-06-06 09:57:15 -07:00
Peiming Liu
23dc96bbe4 [mlir][sparse] fix crashes when using custom reduce with unary operation.
The tests case is directly copied from https://reviews.llvm.org/D152179 authored by @aartbik

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152204
2023-06-05 23:41:26 +00:00
Peiming Liu
7d9677a9bd [mlir][sparse] Make getNumTensors() consistent between LoopEmitter and Merger.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152178
2023-06-05 17:56:08 +00:00
Peiming Liu
e7b4c93f5e [mlir][sparse] fix crash when using sparse_tensor::UnaryOp and ReduceOp.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152048
2023-06-03 01:19:05 +00:00
Aart Bik
6a38c772d4 [mlir][sparse] fixed bug with unary op, dense output
Note that by sparse compiler convention, dense output
is zerod out when not set, so complement results in
zeros where elements were present.

Reviewed By: wrengr

Differential Revision: https://reviews.llvm.org/D152046
2023-06-02 18:15:33 -07:00
Kun Wu
fa98bdbd95 [mlir][sparse][gpu] make computeType mandatory
Differential Revision: https://reviews.llvm.org/D152018
2023-06-02 21:47:44 +00:00
Peiming Liu
ce6f8c5afe [mlir][sparse] fix various bug to support sparse pooling
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151776
2023-06-02 17:34:47 +00:00
Aart Bik
378f1885e3 [mlir][sparse] enhance sparse reduction support
Formerly, we accepted and/prod reductions as a standard
reduction but these change the semantics after sparsification
by not looking at implicit zeros. Therefore, we only accept
standard reductions that are insensitive to implicit vs.
explicit zeros, and leave the more complex reductions to
the sparse_tensor.reduce custom reduction implementation.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151929
2023-06-01 16:30:21 -07:00
Peiming Liu
54ac02dd16 [mlir][sparse] fix crashes when generation conv_2d_nchw_fchw with Compressed Dense Compressed Dense sparse encoding.
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151773
2023-05-31 18:06:01 +00:00