CHANGES SINCE THE ORIGINAL VERSION
----------------------------------
The default test set-up was extracted from
* SparseTensor/CPU/lit.local.cfg
and duplicated in all tests. This is to support downstream users that
don't use these local LIT config files.
SUMMARY OF CHANGES
------------------
This patch aims to reduce test duplication and to improve code re-use in
SparseTensor integration tests for CPU. This is a direct follow-up of:
1. https://reviews.llvm.org/D155403 (test duplication), and
2. https://reviews.llvm.org/D155405 (code re-use).
The key logic for this patch is implemented in:
* SparseTensor/CPU/lit.local.cfg
Essentially, the set-up that used to be repeated across all test files
has been extracted into a common LIT configuration file. This makes code
re-use straightforward.
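As a rough sketch of the mechanism (not the actual file contents), a LIT
local config registers substitutions on the `config` object that LIT
injects into every lit.local.cfg; the substitution name below is taken
from point 4 further down, while its value is only an assumed placeholder:

    # lit.local.cfg (sketch): define a shared option string once,
    # instead of repeating it in every test file.
    config.substitutions.append(
        ("%{sparse_compiler_opts}", "enable-runtime-library=true")
    )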
All SVE/VLA tests are now enabled _conditionally_ and refactored to use
`mlir-cpu-runner` rather than `lli`. The former helps reduce test
duplication and the latter improves code re-use (see the sketch below).
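A minimal sketch of what the conditional enabling can look like in the
LIT config; the `%{run_sve}` substitution, the `mlir_run_arm_sve_tests`
attribute, and the runner command are illustrative assumptions here, not
necessarily the patch's exact names:

    # Expand the SVE RUN-line prefix only when SVE testing was enabled
    # at CMake configure time; otherwise expand it to nothing, so the
    # corresponding RUN line becomes empty (see TEST RUNTIME IMPROVEMENT
    # below). The real invocation would carry more flags than shown.
    if config.mlir_run_arm_sve_tests:
        config.substitutions.append(("%{run_sve}", "mlir-cpu-runner"))
    else:
        config.substitutions.append(("%{run_sve}", ""))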
A few additional refactoring changes are included.
1. To reduce verbosity, long runtime library names like:
%mlir_native_utils_lib_dir/libmlir_c_runner_utils%shlibext
are replaced with:
%mlir_c_runner_utils
2. In order to keep the code and the comments in sync, and to maintain
consistency across the tests, the following:
enable-runtime-library=true
is swapped with (and vice versa):
enable-runtime-library=false
Note that this change doesn't affect test coverage. Only a few tests
required such an update.
3. A VLS vectorization `RUN` line is added in tests where there was a
VLA/VLS `RUN` line, but no VLS `RUN` line (with a few exceptions of
tests that only contained one `RUN` line to begin with).
4. A few test variables are renamed/added. Most notable example:
* `%{options}` --> `%{sparse_compiler_opts}`
TEST RUNTIME IMPROVEMENT
------------------------
TL;DR: This change improves test execution time by ~25%.
At the moment, the following `llvm-lit` invocation takes ~7.30s on my
AArch64 workstation (with SVE):
llvm-lit <llvm-project>/mlir/test/Integration/Dialect/SparseTensor/CPU/
This timing doesn't change regardless of the value of the following
CMake variable (which should disable some tests):
MLIR_RUN_ARM_SVE_TESTS
With this patch, the execution time will indeed depend on the value of
the above CMake variable:
* with `MLIR_RUN_ARM_SVE_TESTS=true` the timing remains intact,
* with `MLIR_RUN_ARM_SVE_TESTS=false` the timing drops to ~5.40s (~25%
improvement).
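(For reference: (7.30 - 5.40) / 7.30 ≈ 0.26, consistent with the quoted
~25% figure.)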
This is expected:
* on average there are 4 `RUN` lines per test,
* _without this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line would in most cases duplicate the 3rd `RUN` line,
* _with this change_ (and with `MLIR_RUN_ARM_SVE_TESTS=false`) the
4th `RUN` line becomes empty.
PATCH SIZE
----------
While rather large and touching many files, most changes in this patch
are mechanical. All test configurations have been preserved, and new
`RUN` lines were added only in a handful of cases.
Differential Revision: https://reviews.llvm.org/D156625
This commit is part of the migration towards the new STEA syntax/design. In particular, this commit includes the following changes:
* Renaming compiler-internal functions/methods:
* `SparseTensorEncodingAttr::{getDimLevelType => getLvlTypes}`
* `Merger::{getDimLevelType => getLvlType}` (for consistency)
* `sparse_tensor::{getDimLevelType => buildLevelType}` (to help reduce confusion vs actual getter methods)
* Renaming external facets to match:
* the STEA parser and printer
* the C and Python bindings
* PyTACO
However, the actual renaming of the `DimLevelType` itself (along with all the "dlt" names) will be handled in a separate commit.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D150330
The logic enabling the Arm SVE (and now SME) integration tests for
various dialects, which may run under emulation, is currently duplicated
in several places.
This patch moves the configuration to the top-level MLIR integration
tests Lit config. It renames the '%lli' substitution to
'%lli_aarch64_cmd' in contexts where it will run exclusively on AArch64
(ArmSVE, ArmSME), possibly under emulation, and to
'%lli_host_or_aarch64_cmd' in contexts where it may run on AArch64 (also
possibly under emulation). The latter is for integration tests that have
target-specific and target-agnostic codepaths, such as SparseTensor,
which supports scalable vectors.
The two substitutions have the same effect, but the names differ to
convey this distinction. The '%lli_aarch64_cmd' substitution could be
used in the SparseTensor tests, but that would be a misnomer if the host
were x86 and MLIR_RUN_SVE_TESTS=OFF.
The reason for renaming the '%lli' substitution is to avoid preventing other
target-specific integration tests from running at the same time, since the
same substitution '%lli' is used for lli in other integration tests:
* mlir/test/Integration/Dialect/Vector/CPU/X86Vector - (AVX emulation via Intel SDE)
* mlir/test/Integration/Dialect/Vector/CPU/AMX - (AMX emulation via Intel SDE)
* mlir/test/Integration/Dialect/LLVMIR/CPU/test-vp-intrinsic.mlir - (RISCV emulation via QEMU if supported, native otherwise)
and substituting '%lli' at the top level with Arm-specific logic would
override this.
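For illustration, a hedged sketch of how such substitutions can be
registered in the top-level Lit config; the `arm_emulator_lli_executable`
attribute is an assumed name, not necessarily the patch's exact one:

    # Wrap 'lli' in an AArch64 emulator command when one is configured;
    # otherwise fall back to the host 'lli'. Both substitutions expand
    # to the same command; the two names only signal where each one is
    # meant to be used.
    if config.arm_emulator_lli_executable:
        lli_cmd = config.arm_emulator_lli_executable
    else:
        lli_cmd = "lli"
    config.substitutions.append(("%lli_aarch64_cmd", lli_cmd))
    config.substitutions.append(("%lli_host_or_aarch64_cmd", lli_cmd))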
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D148929
The old "pointer/index" names often cause confusion since these names clash with names of unrelated things in MLIR; so this change rectifies this by changing everything to use "position/coordinate" terminology instead.
In addition to the basic terminology, there have also been various conventions for making certain distinctions like: (1) the overall storage for coordinates in the sparse-tensor, vs the particular collection of coordinates of a given element; and (2) particular coordinates given as a `Value` or `TypedValue<MemRefType>`, vs particular coordinates given as `ValueRange` or similar. I have striven to maintain these distinctions
as follows:
* "p/c" are used for individual position/coordinate values, when there is no risk of confusion. (Just like we use "d/l" to abbreviate "dim/lvl".)
* "pos/crd" are used for individual position/coordinate values, when a longer name is helpful to avoid ambiguity or to form compound names (e.g., "parentPos"). (Just like we use "dim/lvl" when we need a longer form of "d/l".)
I have also used these forms for a handful of compound names where the old name had been using a three-letter form previously, even though a longer form would be more appropriate. I've avoided renaming these purely for expediency's sake, since changing them would require a cascade of other renamings. They should be updated to follow the new naming scheme, but that can be done in future patches.
* "coords" is used for the complete collection of crd values associated with a single element. In the runtime library this includes both `std::vector` and raw pointer representations. In the compiler, this is used specifically for buffer variables with C++ type `Value`, `TypedValue<MemRefType>`, etc.
The bare form "coords" is discouraged, since it fails to make the dim/lvl distinction; the compound names "dimCoords/lvlCoords" should be used instead. (Though there may exist a rare few cases where it is appropriate to be intentionally ambiguous about which coordinate-space the coords live in; in those cases the bare "coords" is appropriate.)
There is seldom the need for the pos variant of this notion. In most circumstances we use the term "cursor", since the same buffer is reused for a 'moving' pos-collection.
* "dcvs/lcvs" is used in the compiler as the `ValueRange` analogue of "dimCoords/lvlCoords". (The "vs" stands for "`Value`s".) I haven't found the need for it, but "pvs" would be the obvious name for a pos-`ValueRange`.
The old "ind"-vs-"ivs" naming scheme does not seem to have been sustained in more recent code, which instead prefers other mnemonics (e.g., adding "Buf" to the end of the names for `TypeValue<MemRefType>`). I have cleaned up a lot of these to follow the "coords"-vs-"cvs" naming scheme, though haven't done an exhaustive cleanup.
* "positions/coordinates" are used for larger collections of pos/crd values; in particular, these are used when referring to the complete sparse-tensor storage components.
I also prefer to use these unabbreviated names in the documentation, unless there is some specific reason why using the abbreviated forms helps resolve ambiguity.
In addition to making this terminology change, this change also does some cleanup along the way:
* correcting the dim/lvl terminology in certain places.
* adding `const` when it requires no other code changes.
* miscellaneous cleanup that was entailed in order to make the proper distinctions. Most of these changes are in CodegenUtils.{h,cpp}.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D144773
This reverts commit 5561e17411.
The logic was moved from CMake into lit, fixing the issue that led to the
revert (and potentially others with multi-config CMake generators).
Differential Revision: https://reviews.llvm.org/D143925
This patch contains the changes required to make the vast majority of integration and runner tests run on Windows.
Historically speaking, JIT support for Windows has been lagging behind, but recent versions of ORC JIT have now caught up and work for basically all examples in the repo.
Sadly, because these tests previously did not work on Windows, basically all of them make Unix-like assumptions about things like filenames, paths, shell syntax, etc.
This patch fixes all these issues in one fell swoop and enables Windows support for the vast majority of integration tests.
More specifically, the following changes had to be made:
* The various JIT runners used paths to the runtime libraries that assumed a Unix toolchain layout and filenames. I abstracted the specific paths and filenames of these runtime libraries away by passing them from CMake into lit. This now also allows a much more convenient syntax: `--shared-libs=%mlir_c_runner_utils` instead of `--shared-libs=%mlir_lib_dir/lib/libmlir_c_runner_utils%shlibext`
* Some tests using Python set environment variables using the `ENV=VALUE cmd` format. This works on Unix, but on Windows it has to be prefixed, as in `env ENV=VALUE cmd`
* Some tests used C functions that are simply not available or exported on Windows (`fabsf`, `aligned_alloc`). These tests have either been adjusted or explicitly marked as `UNSUPPORTED`
Some tests remain disabled on Windows as before:
* In SparseTensor, some tests have non-trivial logic for finding the runtime libraries, which seems to be required for the use of emulators. I did not have the time to port these, so I simply kept them disabled
* Some tests requiring special hardware that I simply cannot test remain disabled on Windows. These include tests using AVX512 or AMX
The tests for `mlir-vulkan-runner` and `mlir-spirv-runner` all work now as well, and so do the vast majority of the `mlir-cpu-runner` tests.
Differential Revision: https://reviews.llvm.org/D143925
This patch updates the remaining SparseCompiler integration tests to
target SVE when available.
Two tests will require some investigation in the future:
* sparse_matmul.mlir
* sparse_tanh.mlir
The former passes regardless - that's due to how `CHECK` lines are
defined. The latter fails when SVE is enabled, but passes when it's
disabled. I marked it as UNSUPPORTED as there is no mechanism to XFAIL a
test conditionally. Also, see [1] for more details.
[1] https://github.com/llvm/llvm-project/issues/60626
Differential Revision: https://reviews.llvm.org/D143514
This patch adds the logic necessary to target the sparse-tensor dialect
integration tests for SVE. As the LLVM backend for AArch64 does not
currently support product reductions, the corresponding tests are
disabled for SVE.
Not all tests have been updated yet. The remaining tests will be
refactored in a separate patch shortly.
Differential Revision: https://reviews.llvm.org/D121304
Co-authored-by: Andrzej Warzynski <andrzej.warzynski@arm.com>
This patch fixes the re-revert of D135927 (which caused a Windows build failure) to re-enable parallel for/reduction. It also fixes a warning caused by D137442.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D137565
As pointed out by Matthias: "DenseBufferizationPass should be run right after TensorCopyInsertionPass. (Running it after bufferizing the sparse IR is also OK.) The reason for this is that whether copies are needed or not depends on the structure of the program (SSA use-def chains). In particular, running the canonicalizer in-between is problematic because it could introduce new RaW conflicts"
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D136980
Currently, dense tensors are initialized in sparse integration tests using
"bufferization.alloc_tensor" and "scf.for". This makes the code harder to read and maintain.
This diff uses tensor.generate instead to initialize dense tensors.
Testing: ran the integration tests after building with the -DLLVM_USE_SANITIZER=Address flag.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D131404
This op used to belong to the sparse dialect, but there are use cases for dense bufferization as well. (E.g., when a tensor alloc is returned from a function and should be deallocated at the call site.) This change moves the op to the bufferization dialect, which now has an `alloc_tensor` and a `dealloc_tensor` op.
Differential Revision: https://reviews.llvm.org/D129985
This change removes the partial bufferization passes from the sparse compilation pipeline and replaces them with One-Shot Bufferize. One-Shot Analysis (and TensorCopyInsertion) is used to resolve all out-of-place bufferizations, dense and sparse. Dense ops are then bufferized with BufferizableOpInterface. Sparse ops are still bufferized in the Sparsification pass.
Details:
* Dense allocations are automatically deallocated, unless they are yielded from a block. (In that case the alloc would leak.) All test cases are modified accordingly. E.g., some funcs now have an "out" tensor argument that is returned from the function. (That way, the allocation happens at the call site.)
* Sparse allocations are *not* automatically deallocated. They must be "released" manually. (No change, this will be addressed in a future change.)
* Sparse tensor copies are not supported yet. (Future change)
* Sparsification no longer has to consider inplacability. If necessary, allocations and/or copies are inserted during TensorCopyInsertion. All tensors are inplaceable by the time Sparsification is running. Instead of marking a tensor as "not inplaceable", it can be marked as "not writable", which will trigger an allocation and/or copy during TensorCopyInsertion.
Differential Revision: https://reviews.llvm.org/D129356
This was a carry-over from LLVM IR, where the alias definition can
be ambiguous, but MLIR type aliases have no such problem.
Having the `type` keyword is superfluous and doesn't add anything.
This commit drops it, which also nicely aligns with the syntax for
attribute aliases (which don't have a keyword).
Differential Revision: https://reviews.llvm.org/D125501
The SparseTensor passes currently use opaque numbers for the CLI, despite using an enum internally. This patch exposes the enum options directly, instead of numbered items that are matched back to the enum.
Fixes GitHub issue #53389
Reviewed by: aartbik, mehdi_amini
Differential Revision: https://reviews.llvm.org/D123876
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
We have several ways to materialize sparse tensors (new and convert) but no explicit operation to release the underlying sparse storage scheme at runtime (other than making an explicit delSparseTensor() library call). To simplify memory management, a sparse_tensor.release operation has been introduced that lowers to the runtime library call while keeping tensors, opaque pointers, and memrefs transparent in the initial IR.
*Note* There is obviously some tension between the concept of immutable tensors and memory management methods. This tension is addressed by simply stating that after the "release" call, no further memref related operations are allowed on the tensor value. We expect the design to evolve over time, however, and arrive at a more satisfactory view of tensors and buffers eventually.
Bug:
http://llvm.org/pr52046
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D111099
Conversion to the LLVM dialect is being refactored to be more progressive and
is now performed as a series of independent passes converting different
dialects. These passes may produce `unrealized_conversion_cast` operations that
represent pending conversions between built-in and LLVM dialect types.
Historically, a more monolithic Standard-to-LLVM conversion pass did not need
these casts as all operations were converted in one shot. Previous refactorings
have led to the requirement of running the Standard-to-LLVM conversion pass to
clean up `unrealized_conversion_cast`s even though the IR had no standard
operations in it. The pass also had to be run last among all to-LLVM
passes, in contradiction with the partial conversion logic. Additionally, the
way it was set up could produce invalid operations by removing casts between
LLVM and built-in types even when the consumer did not accept the uncasted
type, or could lead to cryptic conversion errors (recursive application of the
rewrite pattern on `unrealized_conversion_cast` as a means to indicate failure
to eliminate casts).
In fact, the need to eliminate A->B->A `unrealized_conversion_cast`s is not
specific to to-LLVM conversions and can be factored out into a separate type
reconciliation pass, which is achieved in this commit. While the cast operation
itself has a folder pattern, it is insufficient in most conversion passes as
the folder only applies to the second cast. Without complex legality setup in
the conversion target, the conversion infra will either consider the cast
operations valid and not fold them (a separate canonicalization would be
necessary to trigger the folding), or consider the first cast invalid upon
generation and stop with error. The pattern provided by the reconciliation pass
applies to the first cast operation instead. Furthermore, having a separate
pass makes it clear when `unrealized_conversion_cast`s could not have been
eliminated since it is the only reason why this pass can fail.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109507
Recent changes outside the sparse compiler exposed the requirement of running a
new pass (lower-affine), but this only became apparent in private testing.
By adding some vectorized runs to the integration tests, we will detect the
need for such changes earlier and also widen codegen coverage.
Reviewed By: gussmith23
Differential Revision: https://reviews.llvm.org/D108667
With the migration from linalg.copy to memref.copy, this pass
(which was there solely to handle the linalg.copy op) is no
longer required for the end-to-end path for sparse compilation.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D106073
After the MemRef has been split out of the Standard dialect, the
conversion to the LLVM dialect remained as a huge monolithic pass.
This is undesirable for the same complexity management reasons as having
a huge Standard dialect itself, and is even more confusing given the
existence of a separate dialect. Extract the conversion of the MemRef
dialect operations to LLVM into a separate library and a separate
conversion pass.
Reviewed By: herhut, silvas
Differential Revision: https://reviews.llvm.org/D105625
A very elaborate, but also very fun revision because all
puzzle pieces are finally "falling in place".
1. replaces linalg annotations + flags with proper sparse tensor types
2. adds rigorous verification of sparse tensor types and sparse primitives
3. removes glue and clutter on opaque pointers in favor of sparse tensor types
4. migrates all tests to use sparse tensor types
NOTE: next CL will remove *all* obsoleted sparse code in Linalg
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D102095
This revision migrates more code from Linalg into the new permanent home of
SparseTensor. It replaces the test passes with proper compiler passes.
NOTE: the actual removal of the last glue and clutter in Linalg will follow
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D101811