This might simplify frontend implementation by avoiding recomputation for the same value.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D154244
We recently fixed a bug in "sparsifying" such reductions, since
it incorrectly changed this into reductions over stored elements
only, which only works for add/sub/or/xor. However, we still want
to be able to "sparsify" the reductions even in the general case,
and this is a first step by rewriting them into a custom reduction
that feeds in the implicit zeros. NOTE HOWEVER, that in the long run
we want to do this better and feed in any implicit zero only ONCE
for efficiency.
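For illustration, here is a small Python sketch (not compiler code) of why reducing over stored elements only is safe for add but not for a reduction such as prod once implicit zeros are skipped:
```
# Plain Python illustration (not part of this patch) of the semantic issue.
from functools import reduce

dense = [0.0, 3.0, 0.0, 5.0]           # logical vector, implicit zeros included
stored = [v for v in dense if v != 0]  # what a sparse format actually stores

print(sum(dense), sum(stored))                    # 8.0 8.0  -> add is unaffected
print(reduce(lambda a, b: a * b, dense, 1.0),     # 0.0
      reduce(lambda a, b: a * b, stored, 1.0))    # 15.0     -> prod is not
```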
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D152580
Document better that unary/binary ops may only feed into the output
or the input of a custom reduction (not even a regular reduction,
since they may have "no value"!). Also fixes a bug where an empty
present branch feeds into a custom reduction.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D152224
Note that by sparse compiler convention, dense output
is zeroed out when not set, so complement results in
zeros where elements were present.
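A small Python illustration of that convention (purely illustrative, not compiler code): the dense output starts out as all zeros, and a complement-style op only writes where the input has no stored element, so positions that were present stay zero.
```
# Illustrative only: dense output starts zeroed; the "complement" writes
# only where the sparse input has no stored element.
size = 6
stored = {1: 3.0, 4: 5.0}      # coordinate -> value of the sparse input
out = [0.0] * size             # dense output, zero-initialized by convention
for i in range(size):
    if i not in stored:        # the "absent" branch of the complement
        out[i] = 1.0
print(out)  # [1.0, 0.0, 1.0, 1.0, 0.0, 1.0] -- zeros where elements were present
```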
Reviewed By: wrengr
Differential Revision: https://reviews.llvm.org/D152046
Formerly, we accepted and/prod reductions as standard
reductions, but these change the semantics after sparsification
by not looking at implicit zeros. Therefore, we only accept
standard reductions that are insensitive to implicit vs.
explicit zeros, and leave the more complex reductions to
the sparse_tensor.reduce custom reduction implementation.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D151929
This is a major step along the way towards the new STEA design. While a great deal of this patch is simple renaming, there are several significant changes as well. I've done my best to ensure that this patch retains the previous behavior and error-conditions, even though those are at odds with the eventual intended semantics of the `dimToLvl` mapping. Since the majority of the compiler does not yet support non-permutations, I've also added explicit assertions in places that previously had implicitly assumed it was dealing with permutations.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D151505
This is an ongoing series of commits that are reformatting our
Python code.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Differential Revision: https://reviews.llvm.org/D150782
We previously only supported packing two arrays (values and coordinates) into COO tensors.
This patch allows packing inputs into arbitrary sparse tensor formats.
It also deletes the "implicit" data canonicalization performed inside the sparse compiler,
and instead requires users to canonicalize the data before passing it to the sparse compiler.
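For illustration, a hedged sketch of the kind of canonical COO input a user is now expected to provide (the required ordering and the actual packing API may differ from what is shown here):
```
# Hypothetical pre-packing check a user could run; the exact packing API
# and required ordering are not shown here.
coordinates = [(0, 0), (0, 2), (1, 1)]   # sorted lexicographically, no duplicates
values = [1.1, 2.2, 3.3]                 # aligned with the coordinates

def is_canonical(coords):
    return list(coords) == sorted(set(coords))

assert is_canonical(coordinates)
assert len(coordinates) == len(values)
```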
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D150916
For some reason, even though D150822 passed the buildbot, the buildbot
failed to catch this test failure.
Reviewed By: anlunx
Differential Revision: https://reviews.llvm.org/D150830
This commit is part of the migration towards the new STEA syntax/design. In particular, this commit includes the following changes:
* Renaming compiler-internal functions/methods:
* `SparseTensorEncodingAttr::{getDimLevelType => getLvlTypes}`
* `Merger::{getDimLevelType => getLvlType}` (for consistency)
* `sparse_tensor::{getDimLevelType => buildLevelType}` (to help reduce confusion vs actual getter methods)
* Renaming external facets to match:
* the STEA parser and printer
* the C and Python bindings
* PyTACO
However, the actual renaming of the `DimLevelType` itself (along with all the "dlt" names) will be handled in a separate commit.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D150330
The MLIR SVE integration tests are now enabled in the
clang-aarch64-full-2stage buildbot under emulation (QEMU) and one of the
sparse integration tests is failing [1]:
* Integration/Dialect/SparseTensor/CPU/concatenate_dim_1.mlir
That test is failing because we don't have a LIT substitution to
replace:
```
; RUN: mlir-cpu-runner <command>
```
with
```
; RUN: <emulator> mlir-cpu-runner <command>
```
clang-aarch64-full-2stage does not support SVE natively and hence all
SVE integration tests require emulation. Other SVE tests use `lli` (for
which we do have the required substitution) and hence are not affected.
This patch simplifies concatenate_dim_1.mlir to always use fixed-width
vectorisation. We will re-enable scalable vectorisation once LIT
substitutions for `mlir-cpu-runner` are updated.
[1] https://lab.llvm.org/buildbot/#/builders/179/builds/6062
The MLIR SVE integration tests are now enabled in the
clang-aarch64-full-2stage buildbot under emulation (QEMU) and two of the
sparse integration tests are failing [1]:
* mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_sorted_coo.mlir
* mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_spmm.mlir
The reason for this is the SVE RUN lines use plain 'lli' rather than the
'%lli_host_or_aarch64_cmd' substitution that's necessary to run under
emulation. The CI doesn't support SVE so the tests will SIGILL unless
run under emulation.
I should note the logs don't show a SIGILL, only the nondescript:
FileCheck error: '<stdin>' is empty.
but I expect this is what's actually happening.
https://lab.llvm.org/buildbot/#/builders/179/builds/6051/steps/12/logs/stdio
The logic enabling the Arm SVE (and now SME) integration tests for
various dialects, which may run under emulation, is now duplicated in
several places.
This patch moves the configuration to the top-level MLIR integration
tests Lit config and renames the '%lli' substitution to
'%lli_aarch64_cmd' in contexts where it will run exclusively on AArch64
(ArmSVE, ArmSME), possibly under emulation, and to
'%lli_host_or_aarch64_cmd' in contexts where it may run on AArch64
(also possibly under emulation). The latter is for integration tests
that have target-specific and target-agnostic codepaths, such as
SparseTensor, which supports scalable vectors.
The two substitutions have the same effect but the names are different to
convey this information. The '%lli_aarch64_cmd' substitution could be
used in the SparseTensor tests but that would be a misnomer if the host
were x86 and MLIR_RUN_SVE_TESTS=OFF.
The reason for renaming the '%lli' substitution is to avoid preventing other
target-specific integration tests from running at the same time, since the same
'%lli' substitution is used for lli in other integration tests:
* mlir/test/Integration/Dialect/Vector/CPU/X86Vector - (AVX emulation via Intel SDE)
* mlir/test/Integration/Dialect/Vector/CPU/AMX - (AMX emulation via Intel SDE)
* mlir/test/Integration/Dialect/LLVMIR/CPU/test-vp-intrinsic.mlir - (RISCV emulation via QEMU if supported, native otherwise)
and substituting '%lli' at the top-level with Arm specific logic would override
this.
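For context, a hedged sketch of how such substitutions are typically registered in a lit config; the config variable names below (`mlir_run_arm_sve_tests`, `arm_emulator_executable`, `arm_emulator_lli_executable`) are illustrative, not necessarily the exact ones used by this patch.
```
# Illustrative lit.cfg.py fragment; variable names are hypothetical.
if config.mlir_run_arm_sve_tests and config.arm_emulator_executable:
    # AArch64 lli invocation, possibly under emulation (e.g. QEMU).
    aarch64_lli = config.arm_emulator_executable + " " + config.arm_emulator_lli_executable
    config.substitutions.append(("%lli_aarch64_cmd", aarch64_lli))
    config.substitutions.append(("%lli_host_or_aarch64_cmd", aarch64_lli))
else:
    # Target-agnostic tests fall back to the host's plain lli.
    config.substitutions.append(("%lli_host_or_aarch64_cmd", "lli"))
```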
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D148929
This patch adds a couple of tests for targeting Arm Streaming SVE (SSVE)
mode, part of the Arm Scalable Matrix Extension (SME).
SSVE is enabled in the backend at the function boundary by specifying
the `aarch64_pstate_sm_enabled` attribute, as documented here [1]. SSVE
can be targeted from MLIR by specifying this in the passthrough
attributes [2] and compiling with
-mattr=+sme,+sve -force-streaming-compatible-sve
The passthrough will propagate to the backend where `smstart/smstop`
will be emitted around the call to the SSVE function.
The set of legal instructions changes in SSVE;
`-force-streaming-compatible-sve` avoids the use of NEON entirely and
instead lowers to (streaming-compatible) SVE. The behaviour this flag
enables will be hooked up to the function attribute in the future,
such that simply specifying the attribute should lead to correct
code generation.
Two tests are added:
* A basic LLVMIR test verifying the attribute is passed through.
* An integration test calling an SSVE function.
The integration test can be run with QEMU.
[1] https://llvm.org/docs/AArch64SME.html
[2] https://mlir.llvm.org/docs/Dialects/LLVM/#attribute-pass-through
Reviewed By: awarzynski, aartbik
Differential Revision: https://reviews.llvm.org/D148111
The removed tests evaluate the same kernels as existing tests, namely `sparse_conv2d.mlir` and `sparse_conv3d.mlir`.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D148644
This patch adds support for `-mattr` and `-march` in mlir-cpu-runner.
With this change, one should be able to consistently use mlir-cpu-runner
for MLIR's integration tests (instead of e.g. resorting to lli when some
additional flags are needed). This is demonstrated in
concatenate_dim_1.mlir.
In order to support the new flags, this patch makes sure that
MLIR's ExecutionEngine/JITRunner (that mlir-cpu-runner is built on top of):
* takes into account the new command line flags when creating
TargetMachine,
* avoids recreating TargetMachine if one is already available,
* creates LLVM's DataLayout based on the previously configured
TargetMachine.
This is necessary in order to make sure that the command line
configuration is propagated correctly to the backend code generator.
A few additional updates are made in order to facilitate this change,
including support for debug dumps from JITRunner.
Differential Revision: https://reviews.llvm.org/D146917
The old "pointer/index" names often cause confusion since these names clash with names of unrelated things in MLIR; so this change rectifies this by changing everything to use "position/coordinate" terminology instead.
In addition to the basic terminology, there have also been various conventions for making certain distinctions like: (1) the overall storage for coordinates in the sparse-tensor, vs the particular collection of coordinates of a given element; and (2) particular coordinates given as a `Value` or `TypedValue<MemRefType>`, vs particular coordinates given as `ValueRange` or similar. I have striven to maintain these distinctions
as follows:
* "p/c" are used for individual position/coordinate values, when there is no risk of confusion. (Just like we use "d/l" to abbreviate "dim/lvl".)
* "pos/crd" are used for individual position/coordinate values, when a longer name is helpful to avoid ambiguity or to form compound names (e.g., "parentPos"). (Just like we use "dim/lvl" when we need a longer form of "d/l".)
I have also used these forms for a handful of compound names where the old name had been using a three-letter form previously, even though a longer form would be more appropriate. I've avoided renaming these to use a longer form purely for expediency's sake, since changing them would require a cascade of other renamings. They should be updated to follow the new naming scheme, but that can be done in future patches.
* "coords" is used for the complete collection of crd values associated with a single element. In the runtime library this includes both `std::vector` and raw pointer representations. In the compiler, this is used specifically for buffer variables with C++ type `Value`, `TypedValue<MemRefType>`, etc.
The bare form "coords" is discouraged, since it fails to make the dim/lvl distinction; so the compound names "dimCoords/lvlCoords" should be used instead. (Though there may exist a rare few cases where is is appropriate to be intentionally ambiguous about what coordinate-space the coords live in; in which case the bare "coords" is appropriate.)
There is seldom the need for the pos variant of this notion. In most circumstances we use the term "cursor", since the same buffer is reused for a 'moving' pos-collection.
* "dcvs/lcvs" is used in the compiler as the `ValueRange` analogue of "dimCoords/lvlCoords". (The "vs" stands for "`Value`s".) I haven't found the need for it, but "pvs" would be the obvious name for a pos-`ValueRange`.
The old "ind"-vs-"ivs" naming scheme does not seem to have been sustained in more recent code, which instead prefers other mnemonics (e.g., adding "Buf" to the end of the names for `TypeValue<MemRefType>`). I have cleaned up a lot of these to follow the "coords"-vs-"cvs" naming scheme, though haven't done an exhaustive cleanup.
* "positions/coordinates" are used for larger collections of pos/crd values; in particular, these are used when referring to the complete sparse-tensor storage components.
I also prefer to use these unabbreviated names in the documentation, unless there is some specific reason why using the abbreviated forms helps resolve ambiguity.
In addition to making this terminology change, this change also does some cleanup along the way:
* correcting the dim/lvl terminology in certain places.
* adding `const` when it requires no other code changes.
* miscellaneous cleanup that was entailed in order to make the proper distinctions. Most of these are in CodegenUtils.{h,cpp}
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D144773
Instead of always materializing a new sparse tensor after reshape, this patch tries to fuse the reshape (currently only on COO) with GenericOp and coiterates with the reshaped tensors without allocating a new sparse tensor.
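To give a flavor of the idea, a standalone Python sketch under simplifying assumptions (not the actual rewrite): each stored coordinate of the source COO tensor can be remapped to the reshaped coordinate space on the fly while iterating, instead of materializing a reshaped tensor first.
```
# Illustrative only: remap stored coordinates on the fly during iteration.
src_shape = (2, 6)
dst_shape = (2, 2, 3)
stored = {(0, 4): 1.0, (1, 5): 2.0}   # COO entries of the source tensor

def remap(coord, src_shape, dst_shape):
    # Linearize in the source shape, then delinearize in the destination shape.
    linear = 0
    for c, s in zip(coord, src_shape):
        linear = linear * s + c
    out = []
    for s in reversed(dst_shape):
        out.append(linear % s)
        linear //= s
    return tuple(reversed(out))

for coord, value in stored.items():
    print(remap(coord, src_shape, dst_shape), value)
# (0, 1, 1) 1.0
# (1, 1, 2) 2.0
```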
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D145016
No need for a temp COO and sort even when converting dense -> CSC; we can instead rotate the loop to yield ordered coordinates from the beginning.
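A minimal Python sketch of the idea (illustrative only): visiting the dense matrix column by column already produces entries in CSC order, so no temporary COO buffer or sort is required.
```
# Illustrative only: the rotated (column-outer) loop yields CSC order directly.
dense = [[1.0, 0.0],
         [0.0, 2.0],
         [3.0, 0.0]]
rows, cols = 3, 2

values, row_indices, col_pointers = [], [], [0]
for j in range(cols):            # outer loop over columns ("rotated" loop order)
    for i in range(rows):        # inner loop over rows
        v = dense[i][j]
        if v != 0.0:
            values.append(v)
            row_indices.append(i)
    col_pointers.append(len(values))

print(values)        # [1.0, 3.0, 2.0]
print(row_indices)   # [0, 2, 1]
print(col_pointers)  # [0, 2, 3]
```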
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D144213
We will support symmetric MTX without expanding the data in the sparse tensor
storage.
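A small Python sketch of the intent (illustrative; the actual storage layout and MTX reader are not shown): each stored off-diagonal entry of a symmetric matrix logically also represents its mirrored entry, without materializing the mirror in storage.
```
# Illustrative only: the lower triangle is stored; mirrored entries are
# produced on the fly rather than expanded in storage.
entries = [(0, 0, 4.0), (2, 0, 1.0), (2, 2, 3.0)]   # (row, col, value)

def logical_entries(stored):
    for i, j, v in stored:
        yield i, j, v
        if i != j:
            yield j, i, v   # mirrored entry, never materialized

print(list(logical_entries(entries)))
# [(0, 0, 4.0), (2, 0, 1.0), (0, 2, 1.0), (2, 2, 3.0)]
```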
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D144059