Commit Graph

77 Commits

Author SHA1 Message Date
Guray Ozen
5caae72d1a [mlir][gpu] Productize test-lower-to-nvvm as gpu-lower-to-nvvm (#75775)
The `test-lower-to-nvvm` pipeline serves as the common and proper
pipeline for nvvm+host compilation, and it's used across our CUDA
integration tests.

This PR updates the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves it within `InitAllPasses.h`. The aim is to call it from
Python, also having a standardize compilation process for nvvm.
2023-12-19 08:40:46 +01:00
Sang Ik Lee
7fc792cba7 [MLIR] Enable GPU Dialect to SYCL runtime integration (#71430)
GPU Dialect lowering to SYCL runtime is driven by spirv.target_env
attached to gpu.module. As a result of this, spirv.target_env remains as
an input to LLVMIR Translation.
A SPIRVToLLVMIRTranslation without any actual translation is added to
avoid an unregistered error in mlir-cpu-runner.
SelectObjectAttr.cpp is updated to
1) Pass binary size argument to getModuleLoadFn
2) Pass parameter count to getKernelLaunchFn
This change does not impact CUDA and ROCM usage since both
mlir_cuda_runtime and mlir_rocm_runtime are already updated to accept
and ignore the extra arguments.
2023-12-05 16:55:24 -05:00
Jakub Kuderski
7eccd52842 Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423)"
This reverts commit dd09221a29 and relands
https://github.com/llvm/llvm-project/pull/73423.

* Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.
2023-11-27 11:38:18 -05:00
Guray Ozen
edf5cae739 [mlir][gpu] Support Cluster of Thread Blocks in gpu.launch_func (#72871)
NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.

This PR enables support for CGA within the `gpu.launch_func` in the GPU
dialect. It extends `gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added CGA
functionality as optional parameters. We want to leverage mechanisms
that we have in the GPU dialects such as outlining and kernel launching,
making it a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```

The PR also introduces index and dimensions Ops specific to clusters,
binding them to NVVM Ops:

```
%cidX = gpu.cluster_id  x
%cidY = gpu.cluster_id  y
%cidZ = gpu.cluster_id  z

%cdimX = gpu.cluster_dim  x
%cdimY = gpu.cluster_dim  y
%cdimZ = gpu.cluster_dim  z
```

We will introduce cluster support in `gpu.launch` Op in an upcoming PR. 

See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
2023-11-27 11:05:07 +01:00
Guray Ozen
51916f0c92 [mlir] Add sm_90a GEMM test 128x128x128 (F32 += F16 * F16) (#69913)
This PR adds a test that performs GEMM 128x128x128 (F32 += F16 * F16).
It uses `sm_90a` features in NVGPU dialect.

Simplified algorithm is as follows:

**Prologue** 
```
mgroup = mbarriers.init x 2
tma.load ... shmem_buffer_lhs<0 x 128 x 64>
tma.load ... shmem_buffer_rhs<0 x 64 x 64>
tma.load ... shmem_buffer_rhs<0 x 64 x 64>
mbarrier.expect_tx 32768
tma.load ... shmem_buffer_lhs<1 x 128 x 64>
tma.load ... shmem_buffer_rhs<1 x 64 x 64>
tma.load ... shmem_buffer_rhs<1 x 64 x 64>
mbarrier.expect_tx 32768
```
**Mainloop**
```
matrixD = 
 for(i = 0;...2) {   
   mbarrier.try_wait [i]
   lhs = shmem_buffer_lhs<pipe x 128 x 64>
   rhs = shmem_buffer_rhs<pipe x 64 x 128>
   yield nvgpu.warpgroup.mma (lhs, rhs)

//   Expanded : nvgpu.warpgroup.mma [128][128]+=[128][64]*[64][128]
//                  wgmma.m64n128k16(A[0:64][0:16]  *  B[0:16][0:128])
//                  wgmma.m64n128k16(A[0:64][16:32] *  B[16:32][0:128])
//                  wgmma.m64n128k16(A[0:64][32:48] *  B[32:48][0:128])
//                  wgmma.m64n128k16(A[0:64][48:64] *  B[48:64][0:128])
//                  wgmma.m64n128k16(A[64:128][0:16]  *  B[0:16][0:128])
//                  wgmma.m64n128k16(A[64:128][16:32] *  B[16:32][0:128])
//                  wgmma.m64n128k16(A[64:128][32:48] *  B[32:48][0:128])
//                  wgmma.m64n128k16(A[64:128][48:64] *  B[48:64][0:128])
```

**Epilogue** 
```
//reg->shmem
warpgroup.mma.store matrixD, shmem
//shmem->glbmem
parallel-for(i=0;...128)
 parallel-for(j=0;...128)
   store shmem, globalmem
```
2023-11-10 16:53:43 +01:00
Guray Ozen
a00caad6bf [mlir] Add sm_90a GEMM test 128x128x128 (F32 =F16*F16) with predicate (#70028)
PR #69913 added a GEMM test (128x128x128 F32 += F16 * F16) with
if-statement. This PR adds the same test using predicates in PTX.
Predicate support is enabled using _BasicPtxBuilderInterface_
`(nvgpu.opcode ..., predicate = %pred)`.

The predicate condition is computed in `Step 2. [GPU] Elect fastest
thread in CTA` inspired by cutlass. It is as follows:
```
lane_predicate = nvvm.elect.sync
warp_idx = __shfl_sync(0xffffffff, threadIdx.x / 32, 0)
warp_idx_in_warp_group = warp_idx % 4
predicate = (lane_predicate & warp_idx_in_warp_group)
```

Depends on #70027 #69934 #69935 #69584
2023-11-10 16:52:00 +01:00
Guray Ozen
f4d59522cf [mlir] Fix sm90 test for new verifier
#70923 improved verifier. The verifier caught that the tensor map type in the tma descriptor in this test isn't correct. The program was working correctly anway since the offset is calculated correctly.

This work fixes the test.
2023-11-10 16:50:01 +01:00
Christian Ulmann
52491c99fa [MLIR][LLVM] Remove typed pointer remnants from integration tests (#71208)
This commit removes all LLVM dialect typed pointers from the integration
tests. Typed pointers have been deprecated for a while now and it's
planned to soon remove them from the LLVM dialect.

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-11-03 21:21:25 +01:00
Christian Ulmann
6cc1c2c6f3 [MLIR][LLVM] Remove last remants of use-opaque-pointers from tests (#71076)
This commit removes the last remnants of `use-opaque-pointers` from the
mlir tests. Two of the tests seem to be disabled, while the CUDA one is
an integration test that didn't trigger a buildbot failure.
2023-11-02 21:25:05 +01:00
Guray Ozen
f7dc26cab2 [mlir] Fixed typo in type (128x64 -> 64x128) in TMA load test (#70022)
The test was meant to check `64x128xf16` as the contiguous dimension
exceeds the cache line (128b). TMA requires cache line-aligned loads, so
loading 64x128 can be done with two 64x64 loads, as documented in the
test.

However, there was a typo in the type, which was `memref<128x64xf16>`
instead of the correct `memref<64x128xf16>`. This PR corrects the issue
and updates the verification.
2023-10-26 11:02:54 +03:00
Guray Ozen
f8058a37ae [mlir] Fix nvvm integration tests build error (#70113)
#69934 broke integration tests that rely on the
kernel-bare-ptr-calling-convention and host-bare-ptr-calling-convention
flags. This PR brings these flags.

Also the kernel-index-bitwidth flag is removed, as kernel pointer size
depends on the host. Separating host (64-bit) and kernel (32-bit) is not
viable.
2023-10-24 22:32:46 +02:00
Oleksandr "Alex" Zinenko
e4384149b5 [mlir] use transform-interpreter in test passes (#70040)
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter-pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR in to a module when necessary.

Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.

Only the test in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of testing coverage of the transform interpreter logic.
2023-10-24 16:12:34 +02:00
Guray Ozen
afe400620f [MLIR] Use test-lower-to-nvvm for sm_90 Integration Tests on GitHub (#68184)
This PR enables `test-lower-to-nvvm` pass pipeline for the integration
tests for NVIDIA sm_90 architecture.

This PR adjusts `test-lower-to-nvvm` pass in two ways: 

1) Calls `createConvertNVGPUToNVVMPass` before the outlining process.
This particular pass is responsible for generating both device and host
code. On the host, it calls the CUDA driver to build the TMA descriptor
(`cuTensorMap`).

2) Integrates the `createConvertNVVMToLLVMPass` to generate PTXs for
NVVM Ops.
2023-10-04 09:50:48 +02:00
Guray Ozen
f9149a34d9 [mlir] adapt sm_90 integration test mbarrier.group (#67423)
#65951 improved mbarrier supports. This PR adapts that usage in the
integration test.
2023-09-26 15:50:17 +02:00
Guray Ozen
f4fb03937a [MLIR] Cleanup Pass Pipeline in sm_90 Integration Tests (#67416)
MLIR has begun supporting many features of Nvidia's sm_90 architecture,
and new tests have been added for it. Although the tests worked well,
there were redundancies in the pipeline. This PR cleans up unnecessary
passes.
2023-09-26 14:21:22 +02:00
Guray Ozen
9420fc4956 [MLIR] SM_90 integratation test of TMA 128x64xf16 and 64x64xf16 with 128b Swizzling (#65954)
The #65953 added a test `128x64xf16` that does a single TMA load. This
PR adds more complex test that does 2 additional TMA loads with 128B
Swizzling:
```
    TMA Load: Matrix-A[0:128][0:64]
    TMA Load: Matrix-B[0:64][0:64]
    TMA Load: Matrix-B[64:128][0:64]
```

The program tests the loaded data for Matrix-B.
2023-09-15 09:30:15 +02:00
Fabian Mora
5093413a50 [mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220)
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to add any possible
extra options.

2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.

3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.

4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
2023-09-14 18:00:27 -04:00
frgossen
1cddbf8cf5 Revert Add host-supports-nvptx requirement to lit tests (#66102 and #66129) (#66225) 2023-09-13 12:20:38 -04:00
Guray Ozen
ba81cd10d6 [MLIR] Fix the tma_load test (#66208)
clang was used for local testing. The PR changes it to `mlir-cpu-runner`
2023-09-13 15:33:42 +02:00
Guray Ozen
38faecc692 [MLIR] SM_90 integration test of TMA with 128b Swizzling (#65953)
An integration test for the 128b Swizzling TMA.

TMA with 128B Swizzle loads data as follows (each numbered cell is 16
bytes). The program tests this pattern for `128x64xf16` type.

```
|-------------------------------|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 1 | 0 | 3 | 2 | 5 | 4 | 7 | 6 |
| 2 | 3 | 0 | 1 | 6 | 7 | 4 | 5 |
| 3 | 2 | 1 | 0 | 7 | 6 | 5 | 4 |
| 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 |
| 5 | 4 | 7 | 6 | 1 | 0 | 3 | 2 |
| 6 | 7 | 4 | 5 | 2 | 3 | 0 | 1 |
|-------------------------------|
| ... pattern repeats ...       |
|-------------------------------|
```
2023-09-13 15:22:48 +02:00
frgossen
a3b894287f Add host-supports-nvptx requirement to lit tests (#66102) 2023-09-12 12:21:36 -04:00
Guray Ozen
ad4411230a [MLIR] Make SM_90 integration tests use TargetAttr (#65926)
The 'TargetAttr' workflow was recently introduced to serialization for
'MLIR->LLVM->PTX'. #65857 removes previous passes (gpu::Serialization*
passes) because they are duplicates.

This PR removes the use of gpu::Serialization* passes in SM_90
integration tests, and enables the 'TargetAttr' workflow.

It also moves the transform dialect specific test to a new folder.
2023-09-11 14:34:03 +02:00
Fabian Mora
119c489cc1 Reland [mlir][test][gpu] Migrate CUDA tests to the TargetAttr compilation workflow (llvm#65768)
The revert happened due to a build bot failure that threw 'CUDA_ERROR_UNSUPPORTED_PTX_VERSION'.
The failure's root cause was a pass using "+ptx76" for compilation and an old CUDA driver
on the bot. This commit relands the patch with "+ptx60".

Original Gh PR: #65768
Original commit message:
    Migrate tests referencing `gpu-to-cubin` to the new compilation workflow
    using `TargetAttrs`. The `test-lower-to-nvvm` pass pipeline was modified
    to use the new compilation workflow to simplify the introduction of
    future tests.

    The `createLowerGpuOpsToNVVMOpsPass` function was removed, as it didn't
    allow for passing all options available in the `ConvertGpuOpsToNVVMOp`
    pass.
2023-09-09 12:45:21 +00:00
Fabian Mora
2c596ea951 Revert "[mlir][test][gpu] Migrate CUDA tests to the TargetAttr compilation workflow (#65768) (#65848)
This reverts commit d21b67293b.
2023-09-09 07:14:19 -04:00
Fabian Mora
d21b67293b [mlir][test][gpu] Migrate CUDA tests to the TargetAttr compilation workflow (#65768)
Migrate tests referencing `gpu-to-cubin` to the new compilation workflow
using `TargetAttrs`. The `test-lower-to-nvvm` pass pipeline was modified
to use the new compilation workflow to simplify the introduction of
future tests.

The `createLowerGpuOpsToNVVMOpsPass` function was removed, as it didn't
allow for passing all options available in the `ConvertGpuOpsToNVVMOp`
pass.
2023-09-09 07:03:38 -04:00
Fabian Mora
5db8990cda [mlir][test][gpu] Migrate ROCM tests to the TargetAttr compilation workflow (#65805)
Migrate tests referencing `gpu-to-hsaco` to the new compilation workflow
using TargetAttrs.
2023-09-08 21:26:43 -04:00
Guray Ozen
8031a088eb [MLIR] Run the TMA test for sm_90
TMA was introduced to MLIR, however, it needed `ptxas` compiler. Recent work D154117 introduced that!

This work runs the existing integration test.

Reviewed By: fmorac

Differential Revision: https://reviews.llvm.org/D159347
2023-09-04 18:15:37 +02:00
Benjamin Maxwell
f36e909da0 [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Reland of the original patch after updating the Python binding tests,
a few CUDA/GPU MLIR tests, and ensuring the assembly format is
round-trippable.

This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print punctuation <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-11 09:29:54 +00:00
Mehdi Amini
bf57c1fa1e Revert "[tests] Fix gpu-to-cubin.mlir (NFC)"
This reverts commit 9ef6cffbb0.

This was an attempt to fix the bot for a commit I reverted since.
It isn't necessary anymore.
2023-08-09 19:55:19 -07:00
Mehdi Amini
1b272d21c8 Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors"
This reverts commit 490dae26cb.

Bot is broken, seems like there is a problem of ambiguity in the parser.
2023-08-09 19:37:01 -07:00
Mehdi Amini
5033ec0a9e Revert "[tests] Fix gpu-to-cubin.mlir (NFC)"
This reverts commit 4434bc5508.

It does not make sense to introduce more passes to fix a parsing issue.
More importantly: it didn't fix the test!
2023-08-09 19:37:01 -07:00
Anlun Xu
9ef6cffbb0 [tests] Fix gpu-to-cubin.mlir (NFC)
Differential Revision: https://reviews.llvm.org/D157561
2023-08-09 16:25:08 -07:00
Benjamin Maxwell
4434bc5508 [tests] Fix gpu-to-cubin.mlir (NFC) 2023-08-09 12:02:56 +00:00
Benjamin Maxwell
490dae26cb [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Reland of the original patch after updating the Python binding tests and
a few CUDA/GPU MLIR tests.

This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-09 11:47:18 +00:00
Nicolas Vasilache
a3cd2eeb2d [mlir][nvgpu] Add a nvgpu.rewrite_copy_as_tma transform operation.
This revision adds support for direct lowering of a linalg.copy on buffers between global and shared memory to a tma async load + synchronization operations.
This uses the recently introduced Hopper NVVM and NVGPU abstraction to connect things end to end.

Differential Revision: https://reviews.llvm.org/D157087
2023-08-08 12:07:59 +00:00
Guray Ozen
ca74ad8829 [mlir] Nvidia Hopper TMA load integration test
This work introduces sm90 integration testing and adds a single test.

Depends on : D155825 D155680 D155563 D155453

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155838
2023-07-28 21:04:07 +00:00
Nicolas Vasilache
7e78ecfe10 [mlir][cuda] Add a test-lower-to-nvvm catchall passpipeline.
This mirrors the test-lower-to-llvm pass pipeline that provides some sanity when running e2e examples.

One peculiarity of the GPU pipeline is that we want to allow 32b indexing in kernels.
This is currently not straightforward as there are dependencies between passes.
This new test pass orders passes in a way that connects end-to-end.

Differential Revision: https://reviews.llvm.org/D155463
2023-07-17 15:18:33 +00:00
Nicolas Vasilache
13f4e889c5 Revert "Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions""
This reverts commit 6506692fe6.

Differential Revision: https://reviews.llvm.org/D153845
2023-06-28 06:50:05 +00:00
Mehdi Amini
6506692fe6 Revert "[mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite." and "[mlir][Transform] Introduce nvgpu transform extensions"
This reverts commit 40deed40ae.
and commit 1660f2174d.

The buildbot is broken, the two tests aren't passing.
2023-06-27 08:46:18 +02:00
Nicolas Vasilache
1660f2174d [mlir][Transform] Add support for mma.sync m16n8k16 f16 rewrite.
This PR adds support for the m16n8k16 f16 case.
At this point, the support is mostly mechanical and could be Tablegen'd to all cases.
Until then, this can be populated as needed on a case-by-case basis.

Depends on: D153420

Differential Revision: https://reviews.llvm.org/D153428
2023-06-26 16:46:42 +00:00
Nicolas Vasilache
40deed40ae [mlir][Transform] Introduce nvgpu transform extensions
Mapping to NVGPU operations such as mma.sync with mixed precision and ldmatrix with transposes and
various data types involves complex matchings from low-level IR.
This is akin to raising complex patterns after unnecessarily having lost structural information.
To avoid such unnecessary complexity, introduce a direct mapping step from a matmul on memrefs
to distributed NVGPU vector abstractions.
In this context, mapping to specific mma.sync operations is trivial and consists in simply
translating the documentation into indexing expressions.

Correctness is demonstrated with an end-to-end integration test.

Differential Revision: https://reviews.llvm.org/D153420
2023-06-26 16:21:28 +00:00
Uday Bondhugula
27ccf0f407 [MLIR] Provide bare pointer memref lowering option on gpu-to-nvvm pass
Provide the bare pointer memref lowering option on gpu-to-nvvm pass.
This is needed whenever we lower memrefs on the host function side and
the kernel calls on the host-side (gpu-to-llvm) with the bare ptr
convention. The GPU module side of the lowering should also "align" and
use the bare pointer convention.

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D152480
2023-06-18 22:53:50 +05:30
Tobias Hieta
f9008e6366 [NFC][Py Reformat] Reformat python files in mlir subdir
This is an ongoing series of commits that are reformatting our
Python code.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Differential Revision: https://reviews.llvm.org/D150782
2023-05-26 08:05:40 +02:00
Markus Böck
9048ea28da Reland "[mlir] Make the vast majority of intgration and runner tests work on Windows"
This reverts commit 5561e17411

The logic was moved from cmake into lit fixing the issue that lead to the revert and potentially others with multi-config cmake generators

Differential Revision: https://reviews.llvm.org/D143925
2023-02-15 19:14:43 +01:00
Aart Bik
5561e17411 Revert "[mlir] Make the vast majority of integration and runner tests work on Windows"
This reverts commit 161b9d741a.

REASON:

cmake --build . --target check-mlir-integration

Failed Tests (186):
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-addi-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-cmpi-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-compare-results-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-constants-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-max-min-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-muli-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-shli-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-shrsi-i16.mlir
  MLIR :: Integration/Dialect/Arith/CPU/test-wide-int-emulation-shrui-i16.mlir
  MLIR :: Integration/Dialect/Async/CPU/microbench-linalg-async-parallel-for.mlir
  MLIR :: Integration/Dialect/Async/CPU/microbench-scf-async-parallel-for.mlir
  MLIR :: Integration/Dialect/Async/CPU/test-async-parallel-for-1d.mlir
  MLIR :: Integration/Dialect/Async/CPU/test-async-parallel-for-2d.mlir
  MLIR :: Integration/Dialect/Complex/CPU/correctness.mlir
  MLIR :: Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm-vector.mlir
  MLIR :: Integration/Dialect/LLVMIR/CPU/X86/test-inline-asm.mlir
  MLIR :: Integration/Dialect/LLVMIR/CPU/test-vector-reductions-fp.mlir
  MLIR :: Integration/Dialect/LLVMIR/CPU/test-vector-reductions-int.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/matmul-vs-matvec.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/rank-reducing-subview.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-collapse-tensor.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-1d-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-1d-nwc-wcf-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-2d-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-2d-nhwc-hwcf-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-3d-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-conv-3d-ndhwc-dhwcf-call.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-elementwise.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-expand-tensor.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-one-shot-bufferize.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-padtensor.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-subtensor-insert-multiple-uses.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-subtensor-insert.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-tensor-e2e.mlir
  MLIR :: Integration/Dialect/Linalg/CPU/test-tensor-matmul.mlir
  MLIR :: Integration/Dialect/Memref/cast-runtime-verification.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/concatenate.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/dense_output.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/dense_output_bf16.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/dense_output_f16.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_abs.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_binary.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_cast.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_codegen_dim.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_codegen_foreach.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_complex32.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_complex64.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_complex_ops.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_constant_to_sparse_tensor.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_1d_nwc_wcf.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_2d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_2d_nhwc_hwcf.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_3d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conv_3d_ndhwc_dhwcf.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion_dyn.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion_ptr.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion_sparse2dense.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_conversion_sparse2sparse.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_dot.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_expand.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_file_io.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_filter_conv2d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_flatten.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_foreach_slices.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_index.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_index_dense.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_insert_1d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_insert_2d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_insert_3d.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_matmul.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_matrix_ops.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_matvec.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_mttkrp.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_out_mult_elt.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_out_reduction.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_out_simple.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_pack.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_quantized_matmul.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_re_im.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom_prod.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reductions.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reductions_prod.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_reshape.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_rewrite_push_back.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_rewrite_sort.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_rewrite_sort_coo.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sampled_matmul.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sampled_mm_fusion.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_scale.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_scf_nested.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_select.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sign.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sorted_coo.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_spmm.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_storage.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sum.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sum_bf16.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sum_c32.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_sum_f16.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_tanh.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_tensor_mul.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_tensor_ops.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_transpose.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_unary.mlir
  MLIR :: Integration/Dialect/SparseTensor/CPU/sparse_vector_ops.mlir
  MLIR :: Integration/Dialect/SparseTensor/python/test_SDDMM.py
  MLIR :: Integration/Dialect/SparseTensor/python/test_SpMM.py
  MLIR :: Integration/Dialect/SparseTensor/python/test_elementwise_add_sparse_output.py
  MLIR :: Integration/Dialect/SparseTensor/python/test_output.py
  MLIR :: Integration/Dialect/SparseTensor/python/test_stress.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_MTTKRP.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_SDDMM.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_SpMM.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_SpMV.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_Tensor.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_scalar_tensor_algebra.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_simple_tensor_algebra.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_tensor_complex.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_tensor_types.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_tensor_unary_ops.py
  MLIR :: Integration/Dialect/SparseTensor/taco/test_true_dense_tensor_algebra.py
  MLIR :: Integration/Dialect/SparseTensor/taco/unit_test_tensor_core.py
  MLIR :: Integration/Dialect/SparseTensor/taco/unit_test_tensor_io.py
  MLIR :: Integration/Dialect/SparseTensor/taco/unit_test_tensor_utils.py
  MLIR :: Integration/Dialect/Standard/CPU/test-ceil-floor-pos-neg.mlir
  MLIR :: Integration/Dialect/Standard/CPU/test_subview.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-mulf-full.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-mulf.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-muli-ext.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-muli-full.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-muli.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-tilezero-block.mlir
  MLIR :: Integration/Dialect/Vector/CPU/AMX/test-tilezero.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-dot.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-inline-asm-vector-avx512.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-mask-compress.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-rsqrt.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-sparse-dot-product.mlir
  MLIR :: Integration/Dialect/Vector/CPU/X86Vector/test-vp2intersect-i32.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-0-d-vectors.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-broadcast.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-compress.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-constant-mask.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-contraction.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-create-mask-v4i1.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-create-mask.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-expand.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-extract-strided-slice.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-flat-transpose-col.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-flat-transpose-row.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-fma.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-gather.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-index-vectors.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-insert-strided-slice.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-maskedload.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-maskedstore.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-matrix-multiply-col.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-matrix-multiply-row.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-outerproduct-f32.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-outerproduct-i64.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-print-int.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-realloc.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-f32-reassoc.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-f32.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-f64-reassoc.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-f64.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-i32.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-i4.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-i64.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-si4.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-reductions-ui4.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-scan.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-scatter.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-shape-cast.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-shuffle.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-sparse-dot-matvec.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-sparse-saxpy-jagged-matvec.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-read-3d.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-read.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-to-loops.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transfer-write.mlir
  MLIR :: Integration/Dialect/Vector/CPU/test-transpose.mlir

Testing Time: 0.29s
  Unsupported:  31
  Passed     :   5
  Failed     : 186

Differential Revision: https://reviews.llvm.org/D143970
2023-02-13 18:30:52 -08:00
Markus Böck
161b9d741a [mlir] Make the vast majority of integration and runner tests work on Windows
This patch contains the changes required to make the vast majority of integration and runner tests run on Windows.
Historically speaking, the JIT support for Windows has been lacking behind, but recent versions of ORC JIT have now caught up and works for basically all examples in repo.

Sadly due to these tests previously not working on Windows, basically all of them are making unix-like assumptions about things like filenames, paths, shell syntax etc.
This patch fixes all these issues in one big swoop and enables Windows support for the vast majority of integration tests.

More specifically, following changes had to be done:
* The various JIT runners used paths to the runtime libraries that assumed a Unix toolchain layout and filenames. I abstracted the specific path and filename of these runtime libraries away by making the paths to the runtime libraries be passed from cmake into lit. This now also allows a much more convenient syntax: `--shared-libs=%mlir_c_runner_utils` instead of `--shared-libs=%mlir_lib_dir/lib/libmlir_c_runner_utils%shlibext`
* Some tests using python set environment variables using the `ENV=VALUE cmd` format. This works on Unix, but on Windows it has to prefixed using `env ENV=VALUE cmd`
* Some tests used C functions that are simply not available or exported on Windows (`fabsf`, `aligned_alloc`). These tests have either been adjusted or explicitly marked as `UNSUPPORTED`

Some tests remain disabled on Windows as before:
* In SparseTensor some tests have non-trivial logic for finding the runtime libraries which seems to be required for the use of emulators. I do not have the time to port these so I simply kept them disabled
* Some tests requiring special hardware which I simply cannot test remain disabled on Windows. These include usage of AVX512 or AMX

The tests for `mlir-vulkan-runner` and `mlir-spirv-runner` all work now as well and so do the vast majority of `mlir-cpu-runner`.

Differential Revision: https://reviews.llvm.org/D143925
2023-02-13 22:24:20 +01:00
Thomas Raoux
7efdc117b1 [mlir][nvvm] Add lowering of gpu.printf to nvvm
When converting to nvvm lowering gpu.printf to vprintf allows us to
support printing when running on cuda.

Differential Revision: https://reviews.llvm.org/D141049
2023-01-06 17:29:30 +00:00
Ivan Butygin
befd167050 [mlir][gpu] Fix cuda integration tests
https://reviews.llvm.org/D138758 has added `uniform` flag to gpu reduce ops, update integration tests.

Differential Revision: https://reviews.llvm.org/D140014
2022-12-14 14:01:00 +01:00
Navdeep Katel
3d35546cd1 Support transpose mode for gpu.subgroup WMMA ops
Add support for loading, computing, and storing `gpu.subgroup` WMMA ops
in transpose mode as well. Update the GPU to NVVM lowerings to support
`transpose` mode and update integration tests as well.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D139021
2022-12-05 22:37:02 +05:30
rkayaith
13bd410962 [mlir][Pass] Include anchor op in -pass-pipeline
In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.

This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D134900
2022-11-03 11:36:12 -04:00