Commit Graph

581 Commits

Author SHA1 Message Date
Nikita Popov
979c275097 [IR] Store Triple in Module (NFC) (#129868)
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.

For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.

The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
2025-03-06 10:27:47 +01:00
Lang Hames
b18e5b6a36 Re-apply "[ORC] Remove the Triple argument from LLJITBuilder::..." with fixes.
This re-applies f905bf3e1e, which was reverted in
c861c1a046 due to compiler errors, with a fix for
MLIR.
2025-03-06 17:17:05 +11:00
Andrea Faulds
eb206e9ea8 [mlir] Rename mlir-cpu-runner to mlir-runner (#123776)
With the removal of mlir-vulkan-runner (as part of #73457) in
e7e3c45bc7, mlir-cpu-runner is now the
only runner for all CPU and GPU targets, and the "cpu" name has been
misleading for some time already. This commit renames it to mlir-runner.
2025-01-24 14:08:38 +01:00
Michał Górny
047e8e47c1 Reapply "[mlir] Link libraries that aren't included in libMLIR to libMLIR" (#123910)
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.

I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.

[v3 with more `-DBUILD_SHARED_LIBS=ON` fixes]
2025-01-22 09:01:50 +00:00
Michał Górny
9decc24c6b Revert "[mlir] Link libraries that aren't included in libMLIR to libMLIR (#123781)"
This reverts commit 4c6242ebf5.  More
BUILD_SHARED_LIBS=ON regressions, sigh.
2025-01-22 09:09:52 +01:00
Michał Górny
4c6242ebf5 [mlir] Link libraries that aren't included in libMLIR to libMLIR (#123781)
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.

I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.

[v2 with fixed `-DBUILD_SHARED_LIBS=ON` build]
2025-01-22 07:54:54 +00:00
Andrea Faulds
e7e3c45bc7 [mlir] Remove mlir-vulkan-runner and GPUToVulkan conversion passes (#123750)
This follows up on 733be4ed7d, which made
mlir-vulkan-runner and its associated passes redundant, and completes
the main goal of #73457. The mlir-vulkan-runner tests become part of the
integration test suite, and the Vulkan runner runtime components become
part of ExecutionEngine, just as was done when removing other
target-specific runners.
2025-01-21 16:51:27 +01:00
Michał Górny
8b879d106b Revert "[mlir] Link libraries that aren't included in libMLIR to libMLIR (#123477)"
This reverts commit af6616676f.  It broke
builds with `-DBUILD_SHARED_LIBS=ON`.
2025-01-20 19:33:51 +01:00
Michał Górny
af6616676f [mlir] Link libraries that aren't included in libMLIR to libMLIR (#123477)
Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.

I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
2025-01-20 17:25:20 +00:00
Andrea Faulds
0e39b1348e [mlir] Remove the mlir-spirv-cpu-runner (move to mlir-cpu-runner) (#114563)
This commit builds on and completes the work done in
9f6c632ecd to eliminate the need for a
separate mlir-spirv-cpu-runner binary. Since the MLIR processing is
already done outside this runner, the only real difference between it
and the mlir-cpu-runner is the final linking step between the nested
LLVM IR modules. By moving this step into mlir-cpu-runner behind a new
command-line flag (`--link-nested-modules`), this commit is able to
completely remove the runner component of the mlir-spirv-cpu-runner.

The runtime libraries and the tests are moved and renamed to fit into
the Execution Engine and Integration tests, following the model of the
similar migration done for the CUDA Runner in D97463.
2024-11-08 08:01:52 -05:00
Zentrik
74e1062e34 [MLIR] Don't build MLIRExecutionEngineShared on Windows (#109524)
This disabled the build of `MLIRExecutionEngineShared` because this causes linkage issues in windows for currently unknown reasons.
Related issue: https://github.com/llvm/llvm-project/issues/106859.
2024-10-09 21:43:11 +02:00
Umang Yadav
9f8f1d9890 [MLIR][AMDGPU] Add ability to do 16-bit Memset with HIP APIs (#108587)
CC: @krzysz00  @manupak
2024-09-20 09:53:41 -05:00
JOE1994
884221eddb [mlir] Tidy uses of llvm::raw_stream_ostream (NFC)
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly

( 65b13610a5 for further reference )

* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
2024-09-16 23:23:25 -04:00
Guray Ozen
20861f1f2f [mlir][gpu] Use alloc OP's host_shared in cuda runtime (#99035) 2024-07-17 07:25:11 +02:00
Christian Ulmann
631ae59d30 [MLIR][ExecutionEngine] Introduce shared library (#87067)
This commit introduces a shared library for the MLIR execution engine.
This library is only built when `LLVM_BUILD_LLVM_DYLIB` is set. Having
such a library allows downstream users to depend on the execution engine
without giving up dynamic linkage. This is especially important for CPU
runner-style tools, as they link against large parts of MLIR and LLVM.

It is alternatively possible to modify the `MLIRExecutionEngine` target
when `LLVM_BUILD_LLVM_DYLIB` is set, to avoid duplicated libraries.
2024-03-30 09:53:19 +01:00
Aart Bik
dc4cfdbb8f [mlir][sparse] provide an AoS "view" into sparse runtime support lib (#87116)
Note that even though the sparse runtime support lib always uses SoA
storage for COO storage (and provides correct codegen by means of views
into this storage), in some rare cases we need the true physical SoA
storage as a coordinate buffer. This PR provides that functionality by
means of a (costly) coordinate buffer call.

Since this is currently only used for testing/debugging by means of the
sparse_tensor.print method, this solution is acceptable. If we ever want
a performing version of this, we should truly support AoS storage of COO
in addition to the SoA used right now.
2024-03-29 15:30:36 -07:00
Kai Sasaki
cb898e26f3 [mlir] Make the print function in CRunnerUtil platform agnostic (#86767)
The platform running on Apple Silicon does not seem to support the
negative nan. It causes the test failure where we explicitly specify the
negative nan bit pattern and check the output printed by the CRunnerUtil
function.

We can make the print function in the utility platform agnostic by using
the standard library functions (i.e. `std::isnan` and `std::signbit`) so
that we can run the test across platforms that do not support the
negative bit pattern.

I have added two test cases that would fail in the Apple Silicon
platform without print function changes.

```
$ uname -a
Darwin Kernel Version 23.3.0: Wed Dec 20 21:30:44 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6000 arm64
```

See:
https://discourse.llvm.org/t/test-failure-of-sparse-sign-test-in-apple-silicon/77876/3
2024-03-28 09:40:17 +09:00
Justin Holewinski
5e78417db5 [MLIR][CUDA] Use _alloca instead of alloca on Windows (#85853)
MSVC/Windows does not support `alloca()`; instead it defines `_alloca()`
in `malloc.h`.
2024-03-20 00:32:19 -07:00
Benjamin Kramer
9a3ece232c [mlir][sparse] Fix the calling convention of __truncsfbf2 on windows x64
It also wants us to return the value in XMM0.
2024-03-19 13:48:10 +01:00
Guray Ozen
7d55b916a5 [mlir][nvgpu] Support strided memref when creating TMA descriptor (#85652) 2024-03-18 19:47:39 +01:00
Aart Bik
4daf86ef3f [mlir][sparse] refactoring sparse runtime lib into less paths (#85332)
Two constructors could be easily refactored into one after a lot of
previous deprecated code has been removed.
2024-03-14 17:06:39 -07:00
Mehdi Amini
716042a63f Rename llvm::ThreadPool -> llvm::DefaultThreadPool (NFC) (#83702)
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.

This is a breaking change: clients who use to create a ThreadPool must
now create a DefaultThreadPool instead.
2024-03-05 18:00:46 -08:00
Mehdi Amini
4a4fb930a5 Use the new ThreadPoolInterface base class instead of the concrete implementation (NFC) (#84056) 2024-03-05 12:37:11 -08:00
Aart Bik
1c2456d659 [mlir][sparse] remove very thin header file from sparse runtime support (#82820) 2024-02-23 12:37:36 -08:00
Aart Bik
f8ce460e48 [mlir][sparse] cleanup sparse runtime library (#82807)
remove some obsoleted APIs from the library that have been fully
replaced with actual direct IR codegen
2024-02-23 10:52:28 -08:00
Mehdi Amini
744616b3ae Rename ThreadPool::getThreadCount() to getMaxConcurrency() (NFC) (#82296)
This is addressing a long-time TODO to rename this misleading API. The
old one is preserved for now but marked deprecated.
2024-02-19 18:07:12 -08:00
Mehdi Amini
bf4480d923 Apply clang-tidy fixes for readability-identifier-naming in SparseTensorRuntime.cpp (NFC) 2024-02-14 10:11:37 -08:00
Yinying Li
e5924d6499 [mlir][sparse] Implement parsing n out of m (#79935)
1. Add parsing methods for block[n, m].
2. Encode n and m with the newly extended 64-bit LevelType enum.
3. Update 2:4 methods names/comments to n:m.
2024-02-08 14:38:42 -05:00
Benjamin Maxwell
e280c287e4 [mlir] Add mlir_arm_runner_utils library for use in integration tests (#78583)
This adds a new `mlir_arm_runner_utils` library that contains utils
specific to Arm/AArch64. This is for use in MLIR integration tests.

This initial patch adds `setArmVLBits()` and `setArmSVLBits()`. This
allows changing vector length or streaming vector length at runtime (or
setting it to a known minimum, i.e. 128-bits).
2024-01-22 09:28:13 +00:00
Fabian Mora
01dbc5da33 Reland [mlir][ExecutionEngine] Add support for global constructors and destructors #78070 (#78170)
This patch add support for executing global constructors and destructors
in the ExecutionEngine.
2024-01-15 12:10:14 -05:00
Cullen Rhodes
3295b88a66 Revert "[mlir][ExecutionEngine] Add support for global constructors and destructors" (#78164)
this is causing test failures on AArch64 linux, hitting the
following assert:

# | mlir-cpu-runner: /home/culrho01/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:519: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const SectionEntry &, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.

Seeing the same in buildbot as well, e.g.

https://lab.llvm.org/buildbot/#/builders/179/builds/9094/steps/12/logs/FAIL__MLIR__sparse_codegen_dim_mlir

Reverts llvm/llvm-project#78070
2024-01-15 14:21:41 +00:00
Fabian Mora
48e8cd8345 [mlir][ExecutionEngine] Add support for global constructors and destructors (#78070)
This patch add support for executing global constructors and destructors
in the `ExecutionEngine`.
2024-01-14 21:41:23 -05:00
Yinying Li
753dc0a01c [mlir][verifyMemref] Fix bug and support more types for verifyMemref (#77682)
1. Fix a bug in verifyMemref to pass in `data` instead of `baseptr`,
which didn't verify data correctly.
2. Add `==` for f16 and bf16.
3. Add a comprehensive test of verifyMemref for all supported types.
2024-01-10 20:04:43 -05:00
Yinying Li
412d784188 [mlir][sparse][CRunnerUtils] Add shuffle in CRunnerUtils (#77124)
Shuffle can generate an array of unique and random numbers from 0 to
size-1. It can be used to generate tensors with specified sparsity
level.
2024-01-09 19:46:35 -05:00
Aart Bik
41a07e668c [mlir][sparse] recognize NVidia 2:4 type for matmul (#76758)
This removes the temporary DENSE24 attribute and replaces it with proper
recognition of dense to 24 conversion. The compressionh will be
performed on the device prior to performing the matrix mult. Note that
we no longer need to start with the linalg version, we can lift this to
the proper named linalg op. Also renames some files into more consistent
names.
2024-01-02 14:44:24 -08:00
Adrian Kuegel
ac8b53fc92 [mlir] Apply ClangTidy performance fix
- Use '\n' instead of std::endl;

https://clang.llvm.org/extra/clang-tidy/checks/performance/avoid-endl.html
2024-01-02 10:00:29 +00:00
Adam Paszke
12e4332501 [mlir][nvgpu] Fix the TMA stride setup (#75838)
There were two issues with the previous computation:
* it never looked at dimensions past the second one
* the definition was recursive, making each dimension have an extra
`elementSize` power
2023-12-19 08:40:26 +01:00
Yinying Li
7bc6c4abe8 [mlir][print]Add functions for printing memref f16/bf16/i16 (#75094)
1. Added functions for printMemrefI16/f16/bf16.
2. Added a new integration test for all the printMemref functions.
2023-12-14 13:06:25 -05:00
Adam Paszke
65aab9e722 [mlir][gpu] Generate multiple rank-specializations for tensor map cre… (#74082)
…ation

The previous code was technically incorrect in that the type indicated
that the memref only has 1 dimension, while the code below was happily
dereferencing the size array out of bounds. Now, if the compiler doesn't
get too smart about optimizations, this code *might even work*. But, if
the compiler realizes that the array has 1 element it might starrt doing
silly things. This generates a specialization per each supported rank,
making sure we don't do any UB.
2023-12-01 15:51:48 +01:00
Aart Bik
6fb7c2d713 [mlir][sparse] bug fix on all-dense lex insertion (#73987)
Fixes a bug that appended values after insertion completed. Also slight
optimization by avoiding all-Dense computation for every lexInsert call
2023-11-30 14:19:02 -08:00
Adam Paszke
1c2a0768de [MLIR][CUDA] Update export macros in CudaRuntimeWrappers (#73932)
This fixes a few issues present in the current version:
1) The macro doesn't enforce the default visibility on exported
   functions, causing compilation to fail when using
   `-fvisibility=hidden`
2) Not all functions are exported
3) Sometimes the macro ended up weirdly interleaved with `extern "C"`
   declarations
2023-11-30 14:57:39 +01:00
Fangrui Song
a3ef858968 [mlir,polly] Replace uses of IRBuilder::getInt8PtrTy with getPtrTy. NFC 2023-11-27 20:58:25 -08:00
Aart Bik
1944c4f76b [mlir][sparse] rename DimLevelType to LevelType (#73561)
The "Dim" prefix is a legacy left-over that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.
2023-11-27 14:27:52 -08:00
Guray Ozen
f21a70f9fe [mlir][cuda] Guard mgpuLaunchClusterKernel for Cuda 12.0+ (NFC) (#73495) 2023-11-27 11:50:46 +01:00
Guray Ozen
edf5cae739 [mlir][gpu] Support Cluster of Thread Blocks in gpu.launch_func (#72871)
NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.

This PR enables support for CGA within the `gpu.launch_func` in the GPU
dialect. It extends `gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added CGA
functionality as optional parameters. We want to leverage mechanisms
that we have in the GPU dialects such as outlining and kernel launching,
making it a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```

The PR also introduces index and dimensions Ops specific to clusters,
binding them to NVVM Ops:

```
%cidX = gpu.cluster_id  x
%cidY = gpu.cluster_id  y
%cidZ = gpu.cluster_id  z

%cdimX = gpu.cluster_dim  x
%cdimY = gpu.cluster_dim  y
%cdimZ = gpu.cluster_dim  z
```

We will introduce cluster support in `gpu.launch` Op in an upcoming PR. 

See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
2023-11-27 11:05:07 +01:00
Ivan Butygin
0bda20b8be Reland [mlir] Workaround for export lib generation on Windows for mlir_arm_sme_abi_stubs #73147 (#73238)
https://github.com/llvm/llvm-project/pull/73147

Fixed the visibility macro
2023-11-23 16:59:17 +03:00
Ivan Butygin
bf353a71a2 Revert "[mlir] Workaround for export lib generation on Windows for mlir_arm_sme_abi_stubs (#73147)"
This reverts commit 6248c24876.

broke the bots
2023-11-23 13:32:42 +01:00
Ivan Butygin
6248c24876 [mlir] Workaround for export lib generation on Windows for mlir_arm_sme_abi_stubs (#73147)
Using mlir cmake in downstream project fails with error
```
CMake Error at D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRTargets.cmake:2537 (message):
  The imported target "mlir_arm_sme_abi_stubs" references the file

     "D:/projs/llvm/llvm-install/lib/mlir_arm_sme_abi_stubs.lib"

  but this file does not exist.  Possible reasons include:

  * The file was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and contained

     "D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRTargets.cmake"

  but not all the files it references.

Call Stack (most recent call first):
  D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRConfig.cmake:37 (include)
  mlir/CMakeLists.txt:5 (find_package)
```

Windows cmake needs export libaries but it seems they are only being
generated if you have at least one exported symbol.
Add export attributes to symbols.

Not sure what the best approach to fix this (probably we should just
disable this lib on windows entirely), but it fixed things for me
locally.
2023-11-23 15:23:01 +03:00
Benjamin Maxwell
783ac3b6fb [mlir][ArmSME] Make use of backend function attributes for enabling ZA storage (#71044)
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).

Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.

Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.

A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
2023-11-14 12:50:38 +00:00
Aart Bik
4f183b1f6e [mlir][sparse] remove obsoleted output methods from runtime (#70523)
Our CODE and LIB are more unified every day!
2023-10-27 16:58:41 -07:00