The MLIR SVE integration tests are now enabled in the
clang-aarch64-full-2stage buildbot under emulation (QEMU) and one of the
sparse integration tests is failing [1]:
* Integration/Dialect/SparseTensor/CPU/concatenate_dim_1.mlir
That test is failing because we we don't have a LIT substitution to
replace:
```
; RUN: mlir-cpu-runner <command>
```
with
```
; RUN: <emulator> mlir-cpu-runner <command>
```
clang-aarch64-full-2stage does not support SVE natively and hence all
SVE integration tests require emulation. Other SVE tests use `lli` (for
which we do have the required substitution) and hence are not affected.
This patch simplifies concatenate_dim_1.mlir to always use fixed-width
vectorisation. We will re-enable scalable vectorisation once LIT
substitutions for `mlir-cpu-runner` are updated.
[1] https://lab.llvm.org/buildbot/#/builders/179/builds/6062
The MLIR SVE integration tests are now enabled in the
clang-aarch64-full-2stage buildbot under emulation (QEMU) and two of the
sparse integration tests are failing [1]:
* mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_sorted_coo.mlir
* mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_spmm.mlir
The reason for this is the SVE RUN lines use plain 'lli' rather than the
'%lli_host_or_aarch64_cmd' substitution that's necessary to run under
emulation. The CI doesn't support SVE so the tests will SIGILL unless
run under emulation.
I should note the logs don't show a SIGILL, only the non-descript:
FileCheck error: '<stdin>' is empty.
but I expect this is what's actually happening.
https://lab.llvm.org/buildbot/#/builders/179/builds/6051/steps/12/logs/stdio
The logic enabling the Arm SVE (and now SME) integration tests for
various dialects, that may run under emulation, is now duplicated in
several places.
This patch moves the configuration to the top-level MLIR integration
tests Lit config and renames the '%lli' substitution in contexts where
it will run exclusively (ArmSVE, ArmSME) on AArch64 (and possibly under
emulation) to '%lli_aarch64_cmd', and '%lli_host_or_aarch64_cmd' for
contexts where it may run AArch64 (also possibly under emulation). The
latter is for integration tests that have target-specific and
target-agnostic codepaths such as SparseTensor, which supports scalable
vectors.
The two substitutions have the same effect but the names are different to
convey this information. The '%lli_aarch64_cmd' substitution could be
used in the SparseTensor tests but that would be a misnomer if the host
were x86 and the MLIR_RUN_SVE_TESTS=OFF.
The reason for renaming the '%lli' substitution is to not prevent running other
target-specific integration tests at the same time, since the same substitution
'%lli' is used for lli in other integration tests:
* mlir/test/Integration/Dialect/Vector/CPU/X86Vector - (AVX emulation via Intel SDE)
* mlir/test/Integration/Dialect/Vector/CPU/AMX - (AMX emulation via Intel SDE)
* mlir/test/Integration/Dialect/LLVMIR/CPU/test-vp-intrinsic.mlir - (RISCV emulation via QEMU if supported, native otherwise)
and substituting '%lli' at the top-level with Arm specific logic would override
this.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D148929
This patch adds a couple of tests for targeting Arm Streaming SVE (SSVE)
mode, part of the Arm Scalable Matrix Extension (SME).
SSVE is enabled in the backend at the function boundary by specifying
the `aarch64_pstate_sm_enabled` attribute, as documented here [1]. SSVE
can be targeted from MLIR by specifying this in the passthrough
attributes [2] and compiling with
-mattr=+sme,+sve -force-streaming-compatible-sve
The passthrough will propagate to the backend where `smstart/smstop`
will be emitted around the call to the SSVE function.
The set of legal instructions changes in SSVE,
`-force-streaming-compatible-sve` avoids the use of NEON entirely and
instead lowers to (streaming-compatible) SVE. The behaviour this flag
predicates will be hooked up to the function attribute in the future
such that simply specifying this (should) lead to correct
code-generation.
Two tests are added:
* A basic LLVMIR test verifying the attribute is passed through.
* An integration test calling a SSVE function.
The integration test can be run with QEMU.
[1] https://llvm.org/docs/AArch64SME.html
[2] https://mlir.llvm.org/docs/Dialects/LLVM/#attribute-pass-through
Reviewed By: awarzynski, aartbik
Differential Revision: https://reviews.llvm.org/D148111
The removed tests evaluate the same kernels in existing tests, namely `sparse_conv2d.mlir` and `spares_conv3d.mlir`.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D148644
This patch adds support for `-mattr` and `-march` in mlir-cpu-runner.
With this change, one should be able to consistently use mlir-cpu-runner
for MLIR's integration tests (instead of e.g. resorting to lli when some
additional flags are needed). This is demonstrated in
concatenate_dim_1.mlir.
In order to support the new flags, this patch makes sure that
MLIR's ExecutionEngine/JITRunner (that mlir-cpu-runner is built on top of):
* takes into account the new command line flags when creating
TargetMachine,
* avoids recreating TargetMachine if one is already available,
* creates LLVM's DataLayout based on the previously configured
TargetMachine.
This is necessary in order to make sure that the command line
configuration is propagated correctly to the backend code generator.
A few additional updates are made in order to facilitate this change,
including support for debug dumps from JITRunner.
Differential Revision: https://reviews.llvm.org/D146917
The old "pointer/index" names often cause confusion since these names clash with names of unrelated things in MLIR; so this change rectifies this by changing everything to use "position/coordinate" terminology instead.
In addition to the basic terminology, there have also been various conventions for making certain distinctions like: (1) the overall storage for coordinates in the sparse-tensor, vs the particular collection of coordinates of a given element; and (2) particular coordinates given as a `Value` or `TypedValue<MemRefType>`, vs particular coordinates given as `ValueRange` or similar. I have striven to maintain these distinctions
as follows:
* "p/c" are used for individual position/coordinate values, when there is no risk of confusion. (Just like we use "d/l" to abbreviate "dim/lvl".)
* "pos/crd" are used for individual position/coordinate values, when a longer name is helpful to avoid ambiguity or to form compound names (e.g., "parentPos"). (Just like we use "dim/lvl" when we need a longer form of "d/l".)
I have also used these forms for a handful of compound names where the old name had been using a three-letter form previously, even though a longer form would be more appropriate. I've avoided renaming these to use a longer form purely for expediency sake, since changing them would require a cascade of other renamings. They should be updated to follow the new naming scheme, but that can be done in future patches.
* "coords" is used for the complete collection of crd values associated with a single element. In the runtime library this includes both `std::vector` and raw pointer representations. In the compiler, this is used specifically for buffer variables with C++ type `Value`, `TypedValue<MemRefType>`, etc.
The bare form "coords" is discouraged, since it fails to make the dim/lvl distinction; so the compound names "dimCoords/lvlCoords" should be used instead. (Though there may exist a rare few cases where is is appropriate to be intentionally ambiguous about what coordinate-space the coords live in; in which case the bare "coords" is appropriate.)
There is seldom the need for the pos variant of this notion. In most circumstances we use the term "cursor", since the same buffer is reused for a 'moving' pos-collection.
* "dcvs/lcvs" is used in the compiler as the `ValueRange` analogue of "dimCoords/lvlCoords". (The "vs" stands for "`Value`s".) I haven't found the need for it, but "pvs" would be the obvious name for a pos-`ValueRange`.
The old "ind"-vs-"ivs" naming scheme does not seem to have been sustained in more recent code, which instead prefers other mnemonics (e.g., adding "Buf" to the end of the names for `TypeValue<MemRefType>`). I have cleaned up a lot of these to follow the "coords"-vs-"cvs" naming scheme, though haven't done an exhaustive cleanup.
* "positions/coordinates" are used for larger collections of pos/crd values; in particular, these are used when referring to the complete sparse-tensor storage components.
I also prefer to use these unabbreviated names in the documentation, unless there is some specific reason why using the abbreviated forms helps resolve ambiguity.
In addition to making this terminology change, this change also does some cleanup along the way:
* correcting the dim/lvl terminology in certain places.
* adding `const` when it requires no other code changes.
* miscellaneous cleanup that was entailed in order to make the proper distinctions. Most of these are in CodegenUtils.{h,cpp}
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D144773
Instead of always materializing a new sparse tensor after reshape, this patch tries to fuses the reshape (currently only on COO) with GenericOp and coiterates with the reshaped tensors without allocating a new sparse tensor.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D145016
No need for a temp COO and sort even when converting dense -> CSC, we can instead rotate the loop to yield a ordered coordinates at beginning.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D144213
We will support symmetric MTX without expanding the data in the sparse tensor
storage.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D144059
This reverts commit 5561e17411
The logic was moved from cmake into lit fixing the issue that lead to the revert and potentially others with multi-config cmake generators
Differential Revision: https://reviews.llvm.org/D143925
This patch contains the changes required to make the vast majority of integration and runner tests run on Windows.
Historically speaking, the JIT support for Windows has been lacking behind, but recent versions of ORC JIT have now caught up and works for basically all examples in repo.
Sadly due to these tests previously not working on Windows, basically all of them are making unix-like assumptions about things like filenames, paths, shell syntax etc.
This patch fixes all these issues in one big swoop and enables Windows support for the vast majority of integration tests.
More specifically, following changes had to be done:
* The various JIT runners used paths to the runtime libraries that assumed a Unix toolchain layout and filenames. I abstracted the specific path and filename of these runtime libraries away by making the paths to the runtime libraries be passed from cmake into lit. This now also allows a much more convenient syntax: `--shared-libs=%mlir_c_runner_utils` instead of `--shared-libs=%mlir_lib_dir/lib/libmlir_c_runner_utils%shlibext`
* Some tests using python set environment variables using the `ENV=VALUE cmd` format. This works on Unix, but on Windows it has to prefixed using `env ENV=VALUE cmd`
* Some tests used C functions that are simply not available or exported on Windows (`fabsf`, `aligned_alloc`). These tests have either been adjusted or explicitly marked as `UNSUPPORTED`
Some tests remain disabled on Windows as before:
* In SparseTensor some tests have non-trivial logic for finding the runtime libraries which seems to be required for the use of emulators. I do not have the time to port these so I simply kept them disabled
* Some tests requiring special hardware which I simply cannot test remain disabled on Windows. These include usage of AVX512 or AMX
The tests for `mlir-vulkan-runner` and `mlir-spirv-runner` all work now as well and so do the vast majority of `mlir-cpu-runner`.
Differential Revision: https://reviews.llvm.org/D143925
Previously, when performing a reduction on a sparse tensor, the result
would be different depending on iteration order. For expanded access pattern,
an empty row would contribute no entry in the output. For lex ordering, the
identity would end up in the output.
This code changes that behavior and keeps track of whether any entries were
actually reduced in lex ordering, making the output consistent between the
two iteration styles.
Differential Revision: https://reviews.llvm.org/D142050
This patch updates the remaining SparseCompiler integration tests to
target SVE when available.
Two tests will require some investigation in the future:
* sparse_matmul.mlir
* sparse_tanh.mlir
The former passes regardless - that's due to how `CHECK` lines are
defined. The latter fails when SVE is enabled, but passes when it's
disabled. I marked it as UNSUPPORTED as there is no mechanism to XFAIL a
test conditionally. Also, see [1] for more details.
[1] https://github.com/llvm/llvm-project/issues/60626
Differential Revision: https://reviews.llvm.org/D143514
Currently, all the non-stable sorting algorithms are implemented via the
straightforward quick sort. This will be fixed in the following PR.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D142678
Previously, we choose the value at (lo + hi)/2 as a pivot for partitioning the
data in [lo, hi). We now choose the median for the three values at lo, (lo +
hi)/2, and (hi-1) as a pivot to match the std::qsort implementation.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D142679
This patch adds the logic necessary to target the sparse-tensor dialect
integration tests for SVE. As the LLVM backend for AArch64 does not
currently support product reductions, the corresponding tests are
disabled for SVE.
Not all tests have been updated yet. The remaining tests will be
refactored in a separate patch shortly.
Differential Revision: https://reviews.llvm.org/D121304
Co-authored-by: Andrzej Warzynski <andrzej.warzynski@arm.com>
Use an array of structures to represent the indices for the tailing COO region
of a sparse tensor.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D140870