(1) without the check, the results may silently be wrong, so the check is needed
(2) add a pruning step to guarantee the 2:4 property
Note that in the longer run, we may want to split out the pruning step somehow,
or make it optional.
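For concreteness, a rough sketch of the prune-then-check sequence (the wrapper's actual code may differ; descriptor setup is elided and all names besides the cuSPARSELt API are hypothetical):
```
#include <cusparseLt.h>

// Prune dA in place to the 2:4 pattern (at most 2 nonzeros per group
// of 4), then verify it, since a matrix violating 2:4 yields silently
// wrong SpMMA results.
void pruneAndCheck(cusparseLtHandle_t *handle,
                   cusparseLtMatmulDescriptor_t *matmul, void *dA,
                   int *dValid, cudaStream_t stream) {
  cusparseLtSpMMAPrune(handle, matmul, dA, dA,
                       CUSPARSELT_PRUNE_SPMMA_STRIP, stream);
  // dValid is a device-side flag; nonzero after copying it back to the
  // host means the 2:4 check failed.
  cusparseLtSpMMAPruneCheck(handle, matmul, dA, dValid, stream);
}
```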
Reviewed By: K-Wu
Differential Revision: https://reviews.llvm.org/D155320
Also makes some minor consistency edits in the cuSparseLt wrapper lib.
Reviewed By: Peiming, K-Wu
Differential Revision: https://reviews.llvm.org/D155139
Currently crashes if the function isn't void when specifying
'-entry-point-result=void'.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D154352
SubtargetFeature.h is currently part of MC while it doesn't depend on
anything in MC. Since some LLVM components might have the need to work
with target features without necessarily needing MC, it might be
worthwhile to move SubtargetFeature.h to a different location. This will
reduce the dependencies of said components.
Note that I chose TargetParser as the destination because that's where
Triple lives and SubtargetFeatures feels related to that.
This issue came up during a JITLink review (D149522). JITLink would
like to avoid a dependency on MC while still needing to store target
features.
Reviewed By: MaskRay, arsenm
Differential Revision: https://reviews.llvm.org/D150549
There are two ways to make symbols from a shared library visible in the
execution engine: exporting the symbols with public visibility or
implementing a loading/unloading mechanism that registers the exported
symbols explicitly. Until recently, the latter was only available in the
JIT runner, but https://reviews.llvm.org/D153029 makes it available
in any usage of the execution engine (including the Python bindings).
This patch makes the runner utils library use the latter mechanism
instead of the former, i.e., it makes all of its symbols private and
implements the init/destroy functions of the loading mechanism to
control explicitly which symbols it registers.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D153250
The async runtime library explicitly registers the symbols it exports
with the loading mechanism of the execution engine. This works even
though these symbols were marked as hidden in the library. However, if
used outside the execution engine, such as with `lli --dlopen` or if AOT
compiled, these hidden symbols would not be found. This patch thus marks
all symbols that are part of the API as visible.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D153348
In https://reviews.llvm.org/D153029, I moved the loading/unloading
mechanisms of shared libraries from the JIT runner to the execution
engine in order to make that mechanism available in the latter
(including its Python bindings). However, I realized that I introduced a
small change in semantics: previously, the JIT runner checked for the
presence of init/destroy functions and only loaded the library as
JITDyLib if they were not present. After I moved the code, all libraries
were loaded as JITDyLib, even if they registered their symbols
explicitly in their init function. I am not sure if this is really a
problem but (1) the previous behavior was different and (2) I guess it
could cause a problem if some symbols are exported through the init
function *and* have public visibility. This patch reestablishes the
original behavior in the new location of the code.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D153249
This updates the code comments about the library registration mechanism,
which changed in https://reviews.llvm.org/D153029, and which should have
been updated as part of that patch.
Reviewed By: ingomueller-net
Differential Revision: https://reviews.llvm.org/D153147
Both the mlir-cpu-runner and the execution engine allow providing a
list of shared libraries that should be loaded into the process such
that the jitted code can use the symbols from those libraries. The
runner had implemented a protocol that allowed libraries to control
which symbols they want to provide in that context (with a function
called __mlir_runner_init). In the absence of that, the runner would rely on
the loading mechanism of the execution engine, which didn't do anything
particular with the symbols, i.e., only symbols with public visibility
were visible to jitted code.
Libraries used a mix of the two mechanisms: while the runner utils and C
runner utils libs (and potentially others) used public visibility, the
async runtime lib (as the only one in the monorepo) used the loading
protocol. As a consequence, the async runtime library could not be used
through the Python bindings of the execution engine.
This patch moves the loading protocol from the runner to the execution
engine. For the runner, this should not change anything: it lets the
execution engine handle the loading which now implements the same
protocol that the runner had implemented before. However, the Python
bindings now get to benefit from the loading protocol as well, so the
async runtime library (and potentially other out-of-tree libraries) can
now be used in that context.
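For illustration, a minimal sketch of a library participating in this protocol; the helper name is hypothetical, and the symbol names follow the runner-era protocol described here:
```
#include "llvm/ADT/StringMap.h"

// A symbol with hidden visibility that the library wants jitted code to see.
static void asyncHelper() {}

// Called when the library is loaded: register exactly the symbols that
// should be visible to jitted code.
extern "C" void __mlir_runner_init(llvm::StringMap<void *> &exportSymbols) {
  exportSymbols["asyncHelper"] = reinterpret_cast<void *>(&asyncHelper);
}

// Called when the library is unloaded: release any held resources.
extern "C" void __mlir_runner_destroy() {}
```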
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D153029
Add a 16-bit version of cudaMemset in cudaRuntimeWrappers and update the GPU-to-LLVM lowering.
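A minimal sketch of what the new wrapper can look like, modeled on the existing 32-bit wrapper; the name `mgpuMemset16` and the exact signature are assumptions, not the verbatim patch:
```
#include "cuda.h"
#include <cstdint>

extern "C" void mgpuMemset16(void *dst, unsigned short value,
                             uintptr_t count, CUstream stream) {
  // cuMemsetD16Async writes `count` 16-bit values to `dst` on `stream`.
  cuMemsetD16Async(reinterpret_cast<CUdeviceptr>(dst), value, count, stream);
}
```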
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D151642
Even though this feature was deprecated in release 11.2,
library versions before that still support it,
which is why we are making it available under a macro.
Reviewed By: K-Wu
Differential Revision: https://reviews.llvm.org/D152290
The cmake logic to find cuda paths exposes some paths to search for the cuda
library; we need to propagate these through the call to find_library.
This was already done for cuSparse but not for cuda.
Differential Revision: https://reviews.llvm.org/D151645
(1) keep all cuSparse ops on a single stream, in the right order, without wait()
(2) use more precisely typed memref types for COO
(3) use ToTensor on resulting memref (even though it folds away again)
Reviewed By: K-Wu
Differential Revision: https://reviews.llvm.org/D151404
The alpha/beta variables, residing on the host, should have the
32-bit or 64-bit width of the result type. They were formerly always
passed as double.
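For illustration, a hedged sketch (descriptor setup elided; the function name is hypothetical) of passing alpha/beta with the width matching the compute type instead of always double:
```
#include <cusparse.h>

void spmvF32(cusparseHandle_t handle, cusparseSpMatDescr_t matA,
             cusparseDnVecDescr_t vecX, cusparseDnVecDescr_t vecY,
             void *buffer) {
  // 32-bit host-side scalars, matching CUDA_R_32F; passing doubles here
  // would be read with the wrong width.
  float alpha = 1.0f, beta = 0.0f;
  cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA,
               vecX, &beta, vecY, CUDA_R_32F, CUSPARSE_SPMV_ALG_DEFAULT,
               buffer);
}
```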
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D151255
This no longer assumes just F64 output.
Note, however, that it will be cleaner to carry the data type in the corresponding operation (rather than tracking operands). That will also allow for mixed-type cases, where operand and result types differ.
This will be done in a follow-up revision where the result type is carried by the SpMV/SpMM op itself (and friends).
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D151005
This revision extends the GPU dialect with ops that can be lowered to
host-oriented sparse matrix library calls (cuSparse-focused in this case,
although the ops could in principle be generalized to support more GPUs).
This will allow the "sparse compiler pipeline" to accelerate sparse operations
(see follow-up revisions for examples of this).
For some background:
https://discourse.llvm.org/t/sparse-compiler-and-gpu-code-generation/69786/2
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D150152
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
machinery in addition to defining methods with the same name.
This change begins the migration from uses of the methods to the
corresponding function calls, which has been decided to be more consistent.
Note that there still exist classes that only define the methods directly,
such as AffineExpr, and this change does not currently include work to support
a functional cast/isa call for them.
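For illustration (this snippet is not part of the patch), the migration on an `mlir::Type` looks like this:
```
#include "mlir/IR/BuiltinTypes.h"

void example(mlir::Type type) {
  // Deprecated method form:
  //   auto intTy = type.dyn_cast<mlir::IntegerType>();
  // Preferred free-function form:
  auto intTy = llvm::dyn_cast<mlir::IntegerType>(type);
  if (intTy && llvm::isa<mlir::IntegerType>(type)) {
    // ... use intTy ...
  }
}
```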
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the relevant one. Run it on all header files as well.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files contain the definitions of the classes that have the casting
methods, so the code there must keep referring to the methods instead of the
functions unless we add a prefix or remove the method declarations
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
This patch is primarily about the change in
"mlir/tools/mlir-cpu-runner/CMakeLists.txt". LLJIT needs access to
symbols (e.g. llvm_orc_registerEHFrameSectionWrapper) that will be
defined in the executable when LLVM is linked statically. This change is
consistent with how other tools within LLVM use LLJIT. It is required to
make sure that:
```
$ mlir-cpu-runner --host-supports-jit
```
correctly returns `true` on platforms that do support JITting (in my
case that's AArch64 Linux).
The change in "mlir/lib/ExecutionEngine/CMakeLists.txt" is required to
avoid ODR violations when symbols from `mlir-cpu-runner` are exported
and when loading `libmlir_async_runtime.so` in `mlir-cpu-runner`.
Specifically, to avoid `EnableABIBreakingChecks` being defined twice.
For more context:
* https://github.com/llvm/llvm-project/issues/61712
* https://github.com/llvm/llvm-project/issues/61856
* https://reviews.llvm.org/D146935 (this PR)
This change relands ccdcfad081. Fixes #61856.
Differential Revision: https://reviews.llvm.org/D146935
Without explicitly unregistering, you will get
```
'cuMemHostRegister(ptr, sizeBytes, 0)' failed with 'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'
```
in CUDA (for example) after repeated runs (e.g., when benchmarking the same kernel).
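A hedged sketch of the pairing this patch establishes (assumed shape, not the verbatim wrapper code):
```
#include "cuda.h"
#include <cstddef>

void registerHostBuffer(void *ptr, size_t sizeBytes) {
  cuMemHostRegister(ptr, sizeBytes, /*Flags=*/0);
}

// Without this matching call, re-registering `ptr` on the next run
// fails with CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED.
void unregisterHostBuffer(void *ptr) {
  cuMemHostUnregister(ptr);
}
```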
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D147277
This patch adds support for `-mattr` and `-march` in mlir-cpu-runner.
With this change, one should be able to consistently use mlir-cpu-runner
for MLIR's integration tests (instead of e.g. resorting to lli when some
additional flags are needed). This is demonstrated in
concatenate_dim_1.mlir.
In order to support the new flags, this patch makes sure that
MLIR's ExecutionEngine/JITRunner (that mlir-cpu-runner is built on top of):
* takes into account the new command-line flags when creating the
TargetMachine,
* avoids recreating the TargetMachine if one is already available,
* creates LLVM's DataLayout based on the previously configured
TargetMachine.
This is necessary in order to make sure that the command line
configuration is propagated correctly to the backend code generator.
A few additional updates are made in order to facilitate this change,
including support for debug dumps from JITRunner.
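Roughly, the JITRunner-side logic now follows this shape (a sketch under assumptions, not the patch's actual code):
```
#include <memory>
#include <string>

#include "llvm/IR/DataLayout.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/Target/TargetMachine.h"

std::unique_ptr<llvm::TargetMachine>
createConfiguredMachine(const std::string &triple, const std::string &cpu,
                        const std::string &features) {
  std::string error;
  const llvm::Target *target =
      llvm::TargetRegistry::lookupTarget(triple, error);
  if (!target)
    return nullptr;
  // Honor -march/-mattr style settings when creating the machine.
  std::unique_ptr<llvm::TargetMachine> machine(target->createTargetMachine(
      triple, cpu, features, llvm::TargetOptions(), /*RM=*/std::nullopt));
  // Derive the DataLayout from the configured machine, not a default one.
  llvm::DataLayout layout = machine->createDataLayout();
  (void)layout;
  return machine;
}
```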
Differential Revision: https://reviews.llvm.org/D146917
This patch supports the processing of dialect attributes attached to top-level
module-type operations during MLIR-to-LLVMIR lowering.
This approach modifies the `mlir::translateModuleToLLVMIR()` function to call
`ModuleTranslation::convertOperation()` on the top-level operation, after its
body has been lowered. This, in turn, retrieves the
`LLVMTranslationDialectInterface` object associated with that operation's
dialect and uses it for lowering before processing the dialect attributes
attached to the operation.
Since there are no `LLVMTranslationDialectInterface`s for the builtin and GPU
dialects, which define their own module-type operations, this patch also adds
and registers them. This introduces the requirement to always call
`mlir::registerBuiltinDialectTranslation()` before any translation of MLIR to
LLVM IR in which builtin module operations are present. The purpose
of these new translation interfaces is to succeed when processing module-type
operations, allowing the lowering process to continue and to prevent the
introduction of failures related to not finding such interfaces.
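A minimal usage sketch under these new requirements (the boilerplate is assumed, not taken from the patch):
```
#include <memory>

#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/DialectRegistry.h"
#include "mlir/Target/LLVMIR/Dialect/Builtin/BuiltinToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
#include "mlir/Target/LLVMIR/Export.h"

std::unique_ptr<llvm::Module> lowerToLLVMIR(mlir::ModuleOp module,
                                            llvm::LLVMContext &ctx) {
  mlir::DialectRegistry registry;
  // Required now that the builtin module op is converted like any other.
  mlir::registerBuiltinDialectTranslation(registry);
  mlir::registerLLVMDialectTranslation(registry);
  module.getContext()->appendDialectRegistry(registry);
  return mlir::translateModuleToLLVMIR(module, ctx);
}
```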
Differential Revision: https://reviews.llvm.org/D145932
The default and pre-link pipeline builders currently require you to
call a separate method for optimization level O0, even though they
have perfectly well-defined O0 optimization pipelines.
Accept the O0 optimization level and call buildO0DefaultPipeline()
internally, so that consumers don't need to repeat this.
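A minimal sketch of the resulting usage (assumed, not from the patch):
```
#include "llvm/Passes/PassBuilder.h"

llvm::ModulePassManager buildDefaultO0(llvm::PassBuilder &builder) {
  // Previously this required a separate call to buildO0DefaultPipeline();
  // O0 is now handled internally.
  return builder.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O0);
}
```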
Differential Revision: https://reviews.llvm.org/D146200
Replace references to enumerate results with either result_pairs
(reference wrapper type) or structured bindings. I did not use
structured bindings everywhere as it wasn't clear to me it would
improve readability.
This is in preparation for the switch to zip semantics, which won't
support non-const lvalue reference to elements:
https://reviews.llvm.org/D144503.
I chose to use values instead of const lvalue-refs because MLIR is
biased towards avoiding `const` local variables. This won't degrade
performance because currently `result_pair` is cheap to copy (size_t
+ iterator), and in the future, the enumerator iterator dereference
will return temporaries anyway.
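For illustration (assumed usage, not from the patch), the two replacement styles:
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/Support/raw_ostream.h"

void printIndexed(llvm::ArrayRef<int> values) {
  // By-value result_pair; cheap to copy (size_t + iterator).
  for (auto en : llvm::enumerate(values))
    llvm::outs() << en.index() << ": " << en.value() << "\n";
  // Structured bindings, where they improve readability.
  for (auto [index, value] : llvm::enumerate(values))
    llvm::outs() << index << ": " << value << "\n";
}
```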
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D146006
The old "pointer/index" names often cause confusion since these names clash with names of unrelated things in MLIR; so this change rectifies this by changing everything to use "position/coordinate" terminology instead.
In addition to the basic terminology, there have also been various conventions for making certain distinctions like: (1) the overall storage for coordinates in the sparse-tensor, vs the particular collection of coordinates of a given element; and (2) particular coordinates given as a `Value` or `TypedValue<MemRefType>`, vs particular coordinates given as `ValueRange` or similar. I have striven to maintain these distinctions
as follows:
* "p/c" are used for individual position/coordinate values, when there is no risk of confusion. (Just like we use "d/l" to abbreviate "dim/lvl".)
* "pos/crd" are used for individual position/coordinate values, when a longer name is helpful to avoid ambiguity or to form compound names (e.g., "parentPos"). (Just like we use "dim/lvl" when we need a longer form of "d/l".)
I have also used these forms for a handful of compound names where the old name had previously used a three-letter form, even though a longer form would be more appropriate. I've avoided renaming these to use a longer form purely for expediency's sake, since changing them would require a cascade of other renamings. They should be updated to follow the new naming scheme, but that can be done in future patches.
* "coords" is used for the complete collection of crd values associated with a single element. In the runtime library this includes both `std::vector` and raw pointer representations. In the compiler, this is used specifically for buffer variables with C++ type `Value`, `TypedValue<MemRefType>`, etc.
The bare form "coords" is discouraged, since it fails to make the dim/lvl distinction; so the compound names "dimCoords/lvlCoords" should be used instead. (Though there may exist a rare few cases where it is appropriate to be intentionally ambiguous about what coordinate-space the coords live in; in which case the bare "coords" is appropriate.)
There is seldom the need for the pos variant of this notion. In most circumstances we use the term "cursor", since the same buffer is reused for a 'moving' pos-collection.
* "dcvs/lcvs" is used in the compiler as the `ValueRange` analogue of "dimCoords/lvlCoords". (The "vs" stands for "`Value`s".) I haven't found the need for it, but "pvs" would be the obvious name for a pos-`ValueRange`.
The old "ind"-vs-"ivs" naming scheme does not seem to have been sustained in more recent code, which instead prefers other mnemonics (e.g., adding "Buf" to the end of the names for `TypedValue<MemRefType>`). I have cleaned up a lot of these to follow the "coords"-vs-"cvs" naming scheme, though haven't done an exhaustive cleanup.
* "positions/coordinates" are used for larger collections of pos/crd values; in particular, these are used when referring to the complete sparse-tensor storage components.
I also prefer to use these unabbreviated names in the documentation, unless there is some specific reason why using the abbreviated forms helps resolve ambiguity.
In addition to making this terminology change, this change also does some cleanup along the way:
* correcting the dim/lvl terminology in certain places.
* adding `const` when it requires no other code changes.
* miscellaneous cleanup that was entailed in order to make the proper distinctions. Most of these are in CodegenUtils.{h,cpp}.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D144773