This commit introduces a shared library for the MLIR execution engine.
This library is only built when `LLVM_BUILD_LLVM_DYLIB` is set. Having
such a library allows downstream users to depend on the execution engine
without giving up dynamic linkage. This is especially important for CPU
runner-style tools, as they link against large parts of MLIR and LLVM.
Alternatively, it would be possible to modify the `MLIRExecutionEngine`
target itself when `LLVM_BUILD_LLVM_DYLIB` is set, to avoid duplicate
libraries.
Note that even though the sparse runtime support library always uses
SoA storage for COO (and provides correct codegen by means of views
into this storage), in some rare cases we need the true physical AoS
storage as a coordinate buffer. This PR provides that functionality by
means of a (costly) coordinate buffer call.
Since this is currently only used for testing/debugging by means of the
sparse_tensor.print method, this solution is acceptable. If we ever want
a performant version of this, we should truly support AoS storage of
COO in addition to the SoA used right now.
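For illustration, here is a minimal sketch (not the library's actual
API; names are hypothetical) of what such a coordinate buffer call has
to do: interleave the per-level SoA coordinate arrays into one AoS
buffer, which is why the call is costly.
```
#include <cstdint>
#include <vector>

// Hypothetical sketch: interleave per-level SoA coordinate vectors into
// a single AoS coordinate buffer, where the coordinates of each stored
// element are laid out consecutively. The full copy is what makes the
// call costly: O(nnz * rank) work plus a freshly allocated buffer.
std::vector<uint64_t>
buildAoSCoordinateBuffer(const std::vector<std::vector<uint64_t>> &soa) {
  const size_t rank = soa.size();
  const size_t nnz = rank == 0 ? 0 : soa[0].size();
  std::vector<uint64_t> aos;
  aos.reserve(nnz * rank);
  for (size_t i = 0; i < nnz; ++i)    // for each stored element
    for (size_t d = 0; d < rank; ++d) // emit all of its coordinates
      aos.push_back(soa[d][i]);
  return aos;
}
```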
The platform running on Apple Silicon does not seem to support negative
NaN. This causes a test failure where we explicitly specify the
negative NaN bit pattern and check the output printed by the
CRunnerUtils function.
We can make the print function in the utility platform-agnostic by
using standard library functions (i.e. `std::isnan` and
`std::signbit`), so that the test can run across platforms that do not
support the negative NaN bit pattern.
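A minimal sketch of the idea (the helper name here is illustrative, not
the exact CRunnerUtils signature): classify the value with
`std::isnan`/`std::signbit` instead of relying on how the platform
renders the NaN bit pattern.
```
#include <cmath>
#include <cstdio>

// Illustrative helper: print NaNs in a platform-agnostic way.
// std::signbit inspects the sign bit directly, so a negative NaN is
// reported consistently even on platforms whose printf would not emit
// the '-' for it.
void printFloat(float f) {
  if (std::isnan(f)) {
    std::fputs(std::signbit(f) ? "-nan" : "nan", stdout);
    return;
  }
  std::printf("%g", f);
}
```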
I have added two test cases that would fail on the Apple Silicon
platform without the print function changes.
```
$ uname -a
Darwin Kernel Version 23.3.0: Wed Dec 20 21:30:44 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6000 arm64
```
See:
https://discourse.llvm.org/t/test-failure-of-sparse-sign-test-in-apple-silicon/77876/3
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.
This is a breaking change: clients who used to create a ThreadPool must
now create a DefaultThreadPool instead.
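A sketch of the required client-side change (assuming the post-rename
API; the exact includes may differ):
```
#include "llvm/Support/ThreadPool.h"

void example() {
  // Before this change, clients constructed the concrete pool directly:
  //   llvm::ThreadPool Pool;
  // Now the concrete pool is DefaultThreadPool, while the abstract base
  // (ThreadPoolInterface, soon to be renamed ThreadPool) remains the
  // type used by APIs that merely consume a pool.
  llvm::DefaultThreadPool Pool;
  Pool.async([] { /* work */ });
  Pool.wait();
}
```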
This adds a new `mlir_arm_runner_utils` library that contains utils
specific to Arm/AArch64. This is for use in MLIR integration tests.
This initial patch adds `setArmVLBits()` and `setArmSVLBits()`. This
allows changing the vector length or streaming vector length at runtime
(or setting it to a known minimum, i.e. 128 bits).
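A sketch of how such utilities can be implemented on Linux
(illustrative; `prctl` takes the vector length in bytes, and
`PR_SME_SET_VL` requires recent kernel headers):
```
#include <cstdint>
#include <sys/prctl.h>

// Sketch: set the SVE vector length (VL) at runtime. prctl expects the
// length in bytes, so a 128-bit minimum VL is passed as 16.
extern "C" void setArmVLBits(uint32_t bits) {
  prctl(PR_SVE_SET_VL, static_cast<unsigned long>(bits / 8));
}

// Sketch: set the SME streaming vector length (SVL) analogously.
extern "C" void setArmSVLBits(uint32_t bits) {
  prctl(PR_SME_SET_VL, static_cast<unsigned long>(bits / 8));
}
```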
This is causing test failures on AArch64 Linux, hitting the following
assert:
```
# | mlir-cpu-runner: /home/culrho01/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:519: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const SectionEntry &, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.
```
Seeing the same in buildbot as well, e.g.
https://lab.llvm.org/buildbot/#/builders/179/builds/9094/steps/12/logs/FAIL__MLIR__sparse_codegen_dim_mlir
Reverts llvm/llvm-project#78070
1. Fix a bug in verifyMemref to pass in `data` instead of `baseptr`,
   which meant the data was not verified correctly.
2. Add `==` for f16 and bf16 (see the sketch after this list).
3. Add a comprehensive test of verifyMemref for all supported types.
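For point 2, a sketch of bit-pattern equality for the runner's 16-bit
float wrappers (the wrapper types here are illustrative stand-ins for
the ones in Float16bits.h):
```
#include <cstdint>

// Illustrative 16-bit float wrappers that store the raw bit pattern,
// similar in spirit to the runner utils' f16/bf16 types.
struct f16 { uint16_t bits; };
struct bf16 { uint16_t bits; };

// Comparing bit patterns is sufficient for verifyMemref-style checks,
// which compare an actual buffer element-wise against an expected one.
inline bool operator==(f16 a, f16 b) { return a.bits == b.bits; }
inline bool operator==(bf16 a, bf16 b) { return a.bits == b.bits; }
```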
This removes the temporary DENSE24 attribute and replaces it with proper
recognition of dense to 2:4 conversion. The compression will be
performed on the device prior to performing the matrix multiplication.
Note that we no longer need to start with the linalg version; we can
lift this to the proper named linalg op. Also renames some files to
more consistent names.
There were two issues with the previous computation:
* it never looked at dimensions past the second one
* the definition was recursive, making each dimension have an extra
`elementSize` power
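A sketch of the corrected computation (illustrative, not the exact
patch): walk all dimensions and apply `elementSize` exactly once.
```
#include <cstdint>

// The old computation (illustrative of its shape) stopped after the
// second dimension and folded elementSize into a recursive definition,
// multiplying it in once per dimension. The fix is to take the product
// of *all* dimension sizes and multiply by elementSize a single time.
int64_t memrefSizeInBytes(int64_t elementSize, int64_t rank,
                          const int64_t *sizes) {
  int64_t numElements = 1;
  for (int64_t d = 0; d < rank; ++d) // every dimension, not just two
    numElements *= sizes[d];
  return numElements * elementSize;  // elementSize applied exactly once
}
```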
The previous code was technically incorrect in that the type indicated
that the memref only has 1 dimension, while the code below was happily
dereferencing the size array out of bounds. Now, if the compiler doesn't
get too smart about optimizations, this code *might even work*. But if
the compiler realizes that the array has 1 element, it might start doing
silly things. This change generates a specialization for each supported
rank, making sure we don't invoke any UB.
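A sketch of the per-rank specialization approach (patterned after the
runner utils' `StridedMemRefType<T, N>`; the names below are
illustrative):
```
#include <cstdint>

// With the rank as a template parameter, the sizes/strides arrays have
// their true extent, so there is no out-of-bounds indexing for the
// optimizer to treat as UB.
template <typename T, int Rank>
struct RankedMemRef {
  T *allocated;
  T *aligned;
  int64_t offset;
  int64_t sizes[Rank];
  int64_t strides[Rank];
};

// One instantiation per supported rank; a runtime switch on the rank
// dispatches to the right one.
template <typename T>
int64_t numElements(const RankedMemRef<T, 2> &m) {
  return m.sizes[0] * m.sizes[1]; // in-bounds by construction
}
```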
This fixes a few issues present in the current version (a sketch of a
fixed macro follows the list):
1) The macro doesn't enforce default visibility on exported
   functions, causing compilation to fail when using
   `-fvisibility=hidden`
2) Not all functions are exported
3) Sometimes the macro ended up weirdly interleaved with `extern "C"`
declarations
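A sketch of an export macro addressing these points (illustrative; the
actual macro names in the runner utils differ):
```
// Illustrative export macro: on Windows, use dllexport/dllimport so an
// import library gets generated; elsewhere, force default visibility so
// the functions remain visible under -fvisibility=hidden.
#if defined(_WIN32)
  #ifdef BUILDING_RUNNER_UTILS
    #define RUNNER_UTILS_EXPORT __declspec(dllexport)
  #else
    #define RUNNER_UTILS_EXPORT __declspec(dllimport)
  #endif
#else
  #define RUNNER_UTILS_EXPORT __attribute__((visibility("default")))
#endif

// Attach the macro to each declaration inside the extern "C" block,
// rather than interleaving it with the block itself.
extern "C" {
RUNNER_UTILS_EXPORT void printNewline();
}
```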
The "Dim" prefix is a legacy left-over that no longer makes sense, since
we have a very strict "Dimension" vs. "Level" definition for sparse
tensor types and their storage.
NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.
This PR enables support for CGA within the `gpu.launch_func` in the GPU
dialect. It extends `gpu.launch_func` to accommodate this functionality.
The GPU dialect remains architecture-agnostic, so we've added the CGA
functionality as optional parameters. We want to leverage the mechanisms
that we already have in the GPU dialect, such as outlining and kernel
launching, making this a practical and convenient choice.
An example of this implementation can be seen below:
```
gpu.launch_func @kernel_module::@kernel
clusters in (%1, %0, %0) // <-- Optional
blocks in (%0, %0, %0)
threads in (%0, %0, %0)
```
The PR also introduces cluster-specific index and dimension Ops, binding
them to NVVM Ops:
```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z
%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```
We will introduce cluster support in `gpu.launch` Op in an upcoming PR.
See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
Using the MLIR CMake package in a downstream project fails with this
error:
```
CMake Error at D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRTargets.cmake:2537 (message):
The imported target "mlir_arm_sme_abi_stubs" references the file
"D:/projs/llvm/llvm-install/lib/mlir_arm_sme_abi_stubs.lib"
but this file does not exist. Possible reasons include:
* The file was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and contained
"D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRTargets.cmake"
but not all the files it references.
Call Stack (most recent call first):
D:/projs/llvm/llvm-install/lib/cmake/mlir/MLIRConfig.cmake:37 (include)
mlir/CMakeLists.txt:5 (find_package)
```
Windows CMake needs export libraries, but it seems they are only
generated if there is at least one exported symbol.
Add export attributes to the symbols.
Not sure what the best approach to fixing this is (probably we should
just disable this lib on Windows entirely), but this fixed things for me
locally.
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).
Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.
Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.
A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
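For reference, such a stub can be tiny; a sketch (routine names from the
AArch64 SME ABI, stub bodies and signatures here are illustrative):
```
#include <cstdint>

// Illustrative no-op stubs for the SME ABI support routines. Real
// implementations manage the TPIDR2 lazy-save scheme for ZA; for tests
// with no nested ZA-enabled calls, no-ops suffice.
extern "C" {
void __arm_tpidr2_save() {}
void __arm_tpidr2_restore(void *blk) { (void)blk; }
void __arm_za_disable() {}
}
```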
This PR guards the driver call with an if-statement, since the driver
calls are comparatively expensive.
As a future TODO, the if-statement could be generated by the compiler
and thus optimized away in some cases.
Printing strings within integration tests is currently quite annoyingly
verbose, and can't be tucked into shared helpers as the types depend on
the length of the string:
```
llvm.mlir.global internal constant @hello_world("Hello, World!\00")
func.func @entry() {
%0 = llvm.mlir.addressof @hello_world : !llvm.ptr<array<14 x i8>>
%1 = llvm.mlir.constant(0 : index) : i64
%2 = llvm.getelementptr %0[%1, %1]
: (!llvm.ptr<array<14 x i8>>, i64, i64) -> !llvm.ptr<i8>
llvm.call @printCString(%2) : (!llvm.ptr<i8>) -> ()
return
}
```
So this patch adds a simple extension to `vector.print` to simplify
this:
```
func.func @entry() {
// Print a vector of characters ;)
vector.print str "Hello, World!"
return
}
```
Most of the logic for this is now shared with `cf.assert` which already
does something similar.
Depends on #68694
This last revision completes the migration to non-permutation support in
the SparseTensor library. All mappings are now controlled by the MapRef
(forward and backward). Unused code has been removed, which simplifies
subsequent testing of block sparsity.
This cleans up all external entry points that will have to deal with
non-permutations, making any subsequent refactoring much more local to
the lib files.
Making the materialize-from-reader method part of the Swiss army knife
suite again removes a lot of redundant boilerplate code and unifies the
parameter setup into a single centralized utility. Furthermore, we have
now minimized the number of entry points into the library that need a
non-permutation map setup, simplifying what comes next.