The `test-lower-to-nvvm` pipeline serves as the common pipeline for NVVM+host
compilation and is used across our CUDA integration tests.
This PR renames the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves it into `InitAllPasses.h`. The aim is to make it callable from
Python and to standardize the compilation process for NVVM.
GPU dialect lowering to the SYCL runtime is driven by the `spirv.target_env`
attribute attached to `gpu.module`. As a result, `spirv.target_env` remains
an input to LLVM IR translation.
A `SPIRVToLLVMIRTranslation` that performs no actual translation is added to
avoid an unregistered-dialect error in `mlir-cpu-runner`.
`SelectObjectAttr.cpp` is updated to:
1) pass the binary size argument to `getModuleLoadFn`;
2) pass the parameter count to `getKernelLaunchFn`.
This change does not impact CUDA and ROCm usage, since both
`mlir_cuda_runtime` and `mlir_rocm_runtime` have already been updated to
accept and ignore the extra arguments.
The NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clusters of Cooperative
Thread Arrays (CTAs) to synchronize and communicate through shared memory
while running concurrently.
This PR enables support for CGA within `gpu.launch_func` in the GPU
dialect, extending `gpu.launch_func` to accommodate this functionality.
The GPU dialect remains architecture-agnostic, so the CGA functionality is
added as optional parameters. We want to leverage the mechanisms that we
already have in the GPU dialect, such as outlining and kernel launching,
making this a practical and convenient choice.
An example of this implementation can be seen below:
```
gpu.launch_func @kernel_module::@kernel
clusters in (%1, %0, %0) // <-- Optional
blocks in (%0, %0, %0)
threads in (%0, %0, %0)
```
The PR also introduces index and dimension Ops specific to clusters,
binding them to NVVM Ops:
```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z
%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```
We will introduce cluster support in the `gpu.launch` Op in an upcoming PR.
See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
PR #69913 added a GEMM test (128x128x128 F32 += F16 * F16) with an
if-statement. This PR adds the same test using predicates in PTX.
Predicate support is enabled using the _BasicPtxBuilderInterface_
(`nvgpu.opcode ..., predicate = %pred`).
The predicate condition is computed in `Step 2. [GPU] Elect fastest
thread in CTA`, inspired by CUTLASS. It is as follows:
```
lane_predicate = nvvm.elect.sync
warp_idx = __shfl_sync(0xffffffff, threadIdx.x / 32, 0)
warp_idx_in_warp_group = warp_idx % 4
predicate = (lane_predicate & warp_idx_in_warp_group)
```
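A minimal MLIR rendering of this scheme might look as follows (a sketch, not
code lifted from the patch; op syntax from the `gpu`, `arith`, and `nvvm`
dialects, with the warp-0 comparison made explicit):
```
// Compute which warp this thread belongs to.
%tid     = gpu.thread_id x
%tid_i32 = arith.index_cast %tid : index to i32
%c0  = arith.constant 0 : i32
%c4  = arith.constant 4 : i32
%c32 = arith.constant 32 : i32
%warp = arith.divui %tid_i32, %c32 : i32
// Broadcast lane 0's warp id so every lane in the warp agrees on it.
%warp_idx, %valid = gpu.shuffle idx %warp, %c0, %c32 : i32
%warp_in_group = arith.remui %warp_idx, %c4 : i32
%is_warp0 = arith.cmpi eq, %warp_in_group, %c0 : i32
// Elect a single lane per warp, then keep only warp 0's elected lane.
%lane_pred = nvvm.elect.sync -> i1
%pred = arith.andi %lane_pred, %is_warp0 : i1
```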
Depends on #70027, #69934, #69935, #69584
#70923 improved the verifier. The verifier caught that the tensor map type in the TMA descriptor in this test was incorrect. The program was nevertheless working correctly, since the offset was calculated correctly.
This work fixes the test.
This commit removes the last remnants of `use-opaque-pointers` from the
MLIR tests. Two of the tests seem to be disabled, while the CUDA one is
an integration test that didn't trigger a buildbot failure.
The test was meant to check `64x128xf16`, as the contiguous dimension
exceeds the cache line (128B). TMA requires cache-line-aligned loads, so
loading 64x128 can be done with two 64x64 loads, as documented in the
test.
However, there was a typo in the type: `memref<128x64xf16>` instead of
the correct `memref<64x128xf16>`. This PR corrects the issue and updates
the verification.
#69934 broke integration tests that rely on the
`kernel-bare-ptr-calling-convention` and `host-bare-ptr-calling-convention`
flags. This PR brings these flags back.
The `kernel-index-bitwidth` flag is also removed, as the kernel pointer
size depends on the host; separating a 64-bit host from a 32-bit kernel is
not viable.
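For context, a rough sketch of what the bare-pointer convention changes on
the kernel side (types simplified; illustrative, not lifted from the tests):
```
// Default convention: a memref kernel argument lowers to the full memref
// descriptor (allocated/aligned pointers, offset, sizes, strides).
// With the bare-pointer convention, this kernel:
gpu.func @kernel(%arg0: memref<8xf32>) kernel {
  gpu.return
}
// lowers, roughly, to a single pointer argument:
llvm.func @kernel(%arg0: !llvm.ptr) {
  llvm.return
}
```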
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR into a module when necessary.
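For illustration, the mechanical rewrite looks roughly like this (a minimal
sketch; `@__transform_main` is the entry point the interpreter pass looks up):
```
// Before: an unnamed top-level sequence.
transform.sequence failures(propagate) {
^bb0(%root: !transform.any_op):
  transform.yield
}

// After: a named entry point, wrapped in a module when necessary.
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    transform.yield
  }
}
```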
Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.
Only the tests in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of the testing coverage of the transform interpreter logic.
This PR enables the `test-lower-to-nvvm` pass pipeline for the integration
tests for the NVIDIA sm_90 architecture.
This PR adjusts the `test-lower-to-nvvm` pass in two ways:
1) It calls `createConvertNVGPUToNVVMPass` before the outlining process.
This particular pass is responsible for generating both device and host
code; on the host, it calls the CUDA driver to build the TMA descriptor
(`cuTensorMap`).
2) It integrates `createConvertNVVMToLLVMPass` to generate PTX for NVVM
Ops (see the sketch below).
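As a reminder of what `createConvertNVVMToLLVMPass` does: NVVM Ops
implementing `BasicPtxBuilderInterface` are rewritten into inline PTX.
A rough sketch (the exact asm string and attributes are approximate):
```
// Before convert-nvvm-to-llvm:
nvvm.cp.async.commit.group
// After, roughly: the op becomes inline PTX.
llvm.inline_asm has_side_effects "cp.async.commit_group;", "" : () -> ()
```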
MLIR has begun supporting many features of NVIDIA's sm_90 architecture,
and new tests have been added for them. Although the tests worked well,
there were redundancies in the pipeline. This PR cleans up the unnecessary
passes.
#65953 added a `128x64xf16` test that does a single TMA load. This PR adds
a more complex test that does two additional TMA loads with 128B
swizzling:
```
TMA Load: Matrix-A[0:128][0:64]
TMA Load: Matrix-B[0:64][0:64]
TMA Load: Matrix-B[64:128][0:64]
```
The program tests the loaded data for Matrix-B.
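For reference, the 128B-swizzled descriptor type involved looks roughly like
this (a sketch of the `nvgpu` tensormap descriptor; attribute spellings
approximate):
```
// TMA descriptor for a 128x64 f16 tile in workgroup (shared) memory,
// loaded with 128B swizzling.
!tmaDesc = !nvgpu.tensormap.descriptor<
    tensor = memref<128x64xf16, 3>, swizzle = swizzle_128b,
    l2promo = none, oob = zero, interleave = none>
```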
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module; see the sketch after this list. Likewise, a dictionary attribute
was added to hold any possible extra options.
2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.
3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.
4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
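A rough sketch of the resulting attribute (syntax approximate; the format
keyword and the properties dictionary are the new pieces):
```
// A GPU binary whose object carries PTX assembly plus extra options,
// instead of the default binary format ("..." elides the object blob).
gpu.binary @kernels [
  #gpu.object<#nvvm.target<chip = "sm_70">,
              properties = {O = 2 : i32},
              assembly = "...">
]
```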
The 'TargetAttr' workflow was recently introduced to serialization for
'MLIR->LLVM->PTX'. #65857 removes the previous passes (the
gpu::Serialization* passes) because they are duplicates.
This PR removes the use of the gpu::Serialization* passes in the SM_90
integration tests and enables the 'TargetAttr' workflow.
It also moves the transform-dialect-specific test to a new folder.
The revert happened due to a buildbot failure that threw 'CUDA_ERROR_UNSUPPORTED_PTX_VERSION'.
The failure's root cause was a pass using "+ptx76" for compilation and an old CUDA driver
on the bot. This commit relands the patch with "+ptx60".
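In terms of the target attribute, the fix amounts to something like this
(a sketch; the chip name is illustrative):
```
// Old: required a driver supporting PTX ISA 7.6 ("+ptx76").
// Relanded with a PTX version that old drivers accept:
gpu.module @kernels [#nvvm.target<chip = "sm_70", features = "+ptx60">] {
}
```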
Original GitHub PR: #65768
Original commit message:
Migrate tests referencing `gpu-to-cubin` to the new compilation workflow
using `TargetAttrs`. The `test-lower-to-nvvm` pass pipeline was modified
to use the new compilation workflow to simplify the introduction of
future tests.
The `createLowerGpuOpsToNVVMOpsPass` function was removed, as it didn't
allow for passing all options available in the `ConvertGpuOpsToNVVMOp`
pass.
TMA support was introduced to MLIR; however, it required the `ptxas` compiler. Recent work in D154117 introduced that!
This work runs the existing integration test.
Reviewed By: fmorac
Differential Revision: https://reviews.llvm.org/D159347
Reland of the original patch after updating the Python binding tests,
a few CUDA/GPU MLIR tests, and ensuring the assembly format is
round-trippable.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
```
vector.print punctuation <comma>
```
lowers to
```
llvm.call @printComma() : () -> ()
```
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
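Putting the two stages together, the lowering looks roughly like this (a
hedged sketch; the loop structure and punctuation cases are approximate):
```
// Stage 1 (VectorToSCF): a vector print becomes a loop of scalar prints
// bracketed by punctuation ops.
%v  = arith.constant dense<1.0> : vector<4xf32>
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
vector.print punctuation <open>
scf.for %i = %c0 to %c4 step %c1 {
  %elt = vector.extractelement %v[%i : index] : vector<4xf32>
  vector.print %elt : f32 punctuation <no_punctuation>
}
vector.print punctuation <close>
// Stage 2 (VectorToLLVM): each scalar print and punctuation op becomes a
// runtime call, e.g. llvm.call @printF32(...) or llvm.call @printOpen().
```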
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519
Reland of the original patch after updating the Python binding tests and
a few CUDA/GPU MLIR tests.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
```
vector.print <comma>
```
lowers to
```
llvm.call @printComma() : () -> ()
```
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519
This revision adds support for directly lowering a `linalg.copy` on buffers between global and shared memory to a TMA async load plus synchronization operations.
This uses the recently introduced Hopper NVVM and NVGPU abstractions to connect things end to end.
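The kind of source-level op this handles is simply the following (a sketch;
shapes and address spaces illustrative):
```
// Copy a tile from global to workgroup (shared) memory; this now lowers
// to a TMA async load plus mbarrier-based synchronization.
linalg.copy ins(%global : memref<128x64xf16>)
            outs(%shared : memref<128x64xf16, #gpu.address_space<workgroup>>)
```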
Differential Revision: https://reviews.llvm.org/D157087
This work introduces sm90 integration testing and adds a single test.
Depends on : D155825 D155680 D155563 D155453
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155838
This mirrors the `test-lower-to-llvm` pass pipeline, which provides some sanity when running e2e examples.
One peculiarity of the GPU pipeline is that we want to allow 32-bit indexing in kernels.
This is currently not straightforward, as there are dependencies between passes.
This new test pass orders the passes in a way that connects end to end.
Differential Revision: https://reviews.llvm.org/D155463
This PR adds support for the m16n8k16 f16 case.
At this point, the support is mostly mechanical and could be TableGen'd for all cases.
Until then, it can be populated as needed on a case-by-case basis.
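For reference, the m16n8k16 f16 case maps onto an `nvgpu.mma.sync` of
roughly this shape (a sketch; fragment types follow the NVGPU convention of
small per-thread vectors):
```
// One m16n8k16 mma.sync: each thread holds fragments of A (4x2),
// B (2x2), and the accumulator C (2x2) in f16.
%d = nvgpu.mma.sync(%a, %b, %c) {mmaShape = [16, 8, 16]}
  : (vector<4x2xf16>, vector<2x2xf16>, vector<2x2xf16>) -> vector<2x2xf16>
```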
Depends on: D153420
Differential Revision: https://reviews.llvm.org/D153428
Mapping to NVGPU operations such as `mma.sync` with mixed precision, and
`ldmatrix` with transposes and various data types, involves complex matching
from low-level IR.
This is akin to raising complex patterns after having unnecessarily lost
structural information.
To avoid such unnecessary complexity, introduce a direct mapping step from a
matmul on memrefs to distributed NVGPU vector abstractions.
In this context, mapping to specific `mma.sync` operations is trivial and
consists simply of translating the documentation into indexing expressions.
Correctness is demonstrated with an end-to-end integration test.
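As an example of the abstraction gap being bridged, shared-memory loads map
onto `nvgpu.ldmatrix` (a sketch; attribute spellings approximate):
```
// Load 4 8x8 tiles from shared memory into per-thread fragments,
// optionally transposed, ready to feed nvgpu.mma.sync.
%frag = nvgpu.ldmatrix %shmem[%c0, %c0] {numTiles = 4 : i32, transpose = false}
  : memref<16x16xf16, 3> -> vector<4x2xf16>
```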
Differential Revision: https://reviews.llvm.org/D153420
Provide the bare-pointer memref lowering option on the `gpu-to-nvvm` pass.
This is needed whenever we lower memrefs on the host function side and
the kernel calls on the host side (`gpu-to-llvm`) with the bare-pointer
convention. The GPU module side of the lowering should then also align and
use the bare-pointer convention.
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D152480
This is an ongoing series of commits that are reformatting our
Python code.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a Python file, the best way to handle that
is to run `git checkout --ours <yourfile>` and then reformat it
with `black`.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Differential Revision: https://reviews.llvm.org/D150782
This reverts commit 5561e17411
The logic was moved from CMake into lit, fixing the issue that led to the revert, and potentially others with multi-config CMake generators
Differential Revision: https://reviews.llvm.org/D143925
This patch contains the changes required to make the vast majority of integration and runner tests run on Windows.
Historically, JIT support on Windows has lagged behind, but recent versions of ORC JIT have caught up and work for basically all examples in the repo.
Sadly, because these tests previously did not work on Windows, basically all of them make Unix-like assumptions about things like filenames, paths, shell syntax, etc.
This patch fixes all these issues in one big swoop and enables Windows support for the vast majority of integration tests.
More specifically, the following changes had to be made:
* The various JIT runners used paths to the runtime libraries that assumed a Unix toolchain layout and filenames. I abstracted the specific path and filename of these runtime libraries away by passing the paths to the runtime libraries from CMake into lit. This now also allows a much more convenient syntax: `--shared-libs=%mlir_c_runner_utils` instead of `--shared-libs=%mlir_lib_dir/lib/libmlir_c_runner_utils%shlibext`
* Some tests using Python set environment variables using the `ENV=VALUE cmd` format. This works on Unix, but on Windows it has to be prefixed as `env ENV=VALUE cmd`
* Some tests used C functions that are simply not available or exported on Windows (`fabsf`, `aligned_alloc`). These tests have either been adjusted or explicitly marked as `UNSUPPORTED`
Some tests remain disabled on Windows as before:
* In SparseTensor, some tests have non-trivial logic for finding the runtime libraries, which seems to be required for the use of emulators. I do not have the time to port these, so I simply kept them disabled
* Some tests require special hardware that I simply cannot test; these remain disabled on Windows. They include usage of AVX512 or AMX
The tests for `mlir-vulkan-runner` and `mlir-spirv-runner` all work now as well, and so do the vast majority of the `mlir-cpu-runner` tests.
Differential Revision: https://reviews.llvm.org/D143925
When converting to NVVM, lowering `gpu.printf` to `vprintf` allows us to
support printing when running on CUDA.
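A minimal example of what this enables (a sketch; exact `gpu.printf` syntax
approximate):
```
// Inside a kernel: gpu.printf lowers to a device-side vprintf call.
%tid = gpu.thread_id x
%tid_i32 = arith.index_cast %tid : index to i32
gpu.printf "Hello from thread %d\n" %tid_i32 : i32
```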
Differential Revision: https://reviews.llvm.org/D141049
Add support for loading, computing, and storing `gpu.subgroup` WMMA ops
in transpose mode as well. Update the GPU-to-NVVM lowerings to support
`transpose` mode and update the integration tests accordingly.
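For example, loading an operand in transpose mode looks roughly like this
(a sketch; the `leadDimension` value is illustrative):
```
// The transpose unit attribute selects the transposed WMMA load.
%b = gpu.subgroup_mma_load_matrix %shmem[%c0, %c0]
    {leadDimension = 32 : index, transpose}
    : memref<32x32xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">
```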
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D139021
In D134622, the printed form of a pass manager was changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be passed directly to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.
This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D134900