Commit Graph

517 Commits

Jakub Kuderski
971b852546 [mlir][NFC] Simplify type checks with isa predicates (#87183)
For more context on isa predicates, see:
https://github.com/llvm/llvm-project/pull/83753.
2024-04-01 11:40:09 -04:00
Justin Fargnoli
35d55f2894 [NFC][mlir] Reorder declarePromisedInterface() operands (#86628)
Reorder the template arguments of `declarePromisedInterface()` to match
`declarePromisedInterfaces()`.
2024-03-27 10:30:17 -07:00
Oleksandr "Alex" Zinenko
5a9bdd85ee [mlir] split transform interfaces into a separate library (#85221)
Transform interfaces are implemented, directly or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes a growing number of ops and a transitive dependency footprint.

Split out the interfaces into a separate library. This in turn requires
flipping the dependency: the interface had come to depend on the dialect
because both co-existed in one library, but the interface shouldn't
depend on the transform dialect either.

As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
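
For reference, a minimal sketch of the replacement, assuming a named-sequence entry point (the matched op name is illustrative):

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%root: !transform.any_op {transform.readonly}) {
    // Select payload ops explicitly instead of relying on the interpreter
    // walking the payload IR based on the entry argument type.
    %matmuls = transform.structured.match ops{["linalg.matmul"]} in %root
        : (!transform.any_op) -> !transform.any_op
    transform.yield
  }
}
```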
2024-03-20 22:15:17 +01:00
Justin Fargnoli
513cdb8222 [mlir] Declare promised interfaces for all dialects (#78368)
This PR adds promised interface declarations for all interfaces declared
in `InitAllDialects.h`.

Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
2024-03-15 20:23:20 -07:00
Justin Lebar
fab2bb8bfd Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678)
For some reason this was missing from STLExtras.
2024-03-10 20:00:13 -07:00
Fangrui Song
886ecb3078 [mlir] Remove setRelaxELFRelocations. NFC
The option is always true (see 2aedfdd9b8)
and the MCAsmInfo option is going away in favor of MCTargetOptions.
2024-03-06 23:12:40 -08:00
Ingo Müller
6e27dd47e1 [mlir][gpu] Replace MLIR_GPU_TO_HSACO_PASS_ENABLE by more generic one. (#84001)
This is another follow-up of #83004. The PR replaces the macro
`MLIR_GPU_TO_HSACO_PASS_ENABLE` with the more generic macro
`MLIR_ENABLE_ROCM_CONVERSIONS`. Until now, the former has been defined
if and only if the latter evaluated to true in CMake. However, the
former was not defined when the latter evaluated to false, in which case
a warning was raised if compiled with `-Wundef`. Using a single macro
relies on the `#cmakedefine01` mechanism that ensures the macro is
always set to either 0 or 1.
2024-03-06 09:53:30 +01:00
Ingo Müller
f3be842728 [mlir] Expose MLIR_ROCM_CONVERSIONS_ENABLED in mlir-config.h. (#83977)
This is a follow-up of #83004, which made the same change for
`MLIR_CUDA_CONVERSIONS_ENABLED`. As in the previous PR, this commit
exposes the mentioned CMake variable through `mlir-config.h` and uses the
newly introduced macro of the same name. This replaces the macro
`MLIR_ROCM_CONVERSIONS_ENABLED`, which the CMake files previously
defined manually.
2024-03-05 15:37:14 +01:00
Ingo Müller
5f2097dbed [mlir] Expose MLIR_CUDA_CONVERSIONS_ENABLED in mlir-config.h. (#83004)
That macro was not defined in some cases and thus yielded warnings if
compiled with `-Wundef`. In particular, it was not defined in the
BUILD files, so the GPU targets were broken when built with Bazel. This
commit exposes the corresponding CMake variable through `mlir-config.h`
and uses the newly introduced macro of the same name. This replaces the
macro `MLIR_CUDA_CONVERSIONS_ENABLED`, which the CMake files previously
defined manually.
2024-02-28 14:48:40 +01:00
Matthias Springer
492e8ba038 [mlir] Fix memory leaks after #81759 (#82762)
This commit fixes memory leaks that were introduced by #81759. The way
ops and blocks are erased changed slightly.

The leaks were caused by an incorrect implementation of op builders:
blocks must be created with the supplied builder object. Otherwise, they
are not properly tracked by the dialect conversion and can leak during
rollback.
2024-02-23 14:28:57 +01:00
Fabian Mora
f204aee1b9 [mlir][GPU] Remove the SerializeToCubin pass (#82486)
The `SerializeToCubin` pass was deprecated in September 2023 in favor of
GPU compilation attributes; see the [GPU
compilation](https://mlir.llvm.org/docs/Dialects/GPU/#gpu-compilation)
section in the `gpu` dialect MLIR docs.
This patch removes `SerializeToCubin` from the repo.
2024-02-21 20:47:19 -05:00
Thomas Preud'homme
76e79b0bef Fix duplicate mapping detection in gpu::setMappingAttr() (#77499) 2024-02-20 09:54:00 +00:00
Guray Ozen
d7f59c8fb8 [mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#81489)
This PR moves the lowering of the math dialect later in the pipeline,
because the math dialect is lowered correctly by
`createConvertGpuOpsToNVVMOps` for the GPU target, which needs to run
first.

Reland #78556
2024-02-13 08:31:42 +01:00
Benjamin Kramer
98dbc688de Revert "[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556)"
This reverts commit 74bf0b1cd9. The test
always fails.

 | mlir/test/Dialect/GPU/test-nvvm-pipeline.mlir:23:16: error: CHECK-PTX: expected string not found in input
 |  // CHECK-PTX: __nv_expf

https://lab.llvm.org/buildbot/#/builders/61/builds/53789
2024-01-31 17:41:21 +01:00
Guray Ozen
74bf0b1cd9 [mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556)
This PR moves the lowering of the math dialect later in the pipeline,
because the math dialect is lowered correctly by
`createConvertGpuOpsToNVVMOps` for the GPU target, which needs to run
first.
2024-01-31 15:24:32 +01:00
Saiyedul Islam
082f87c9d4 [AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Corresponding llvm-objdump AMDGPU lit tests are updated
in a follow-up PR.
2024-01-23 17:08:18 +05:30
Mehdi Amini
e730f76005 Apply clang-tidy fixes for llvm-qualified-auto in DecomposeMemrefs.cpp (NFC) 2024-01-17 08:51:41 -08:00
Mehdi Amini
9a2a6a728a Apply clang-tidy fixes for readability-identifier-naming in Utils.cpp (NFC) 2024-01-17 08:51:41 -08:00
Mehdi Amini
74cf9bcf71 Apply clang-tidy fixes for performance-unnecessary-value-param in Utils.cpp (NFC) 2024-01-17 08:51:41 -08:00
Matthias Springer
5fcf907b34 [mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.

Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
2024-01-17 11:08:59 +01:00
Mehdi Amini
d8ed736c0e Apply clang-tidy fixes for bugprone-macro-parentheses in Utils.cpp (NFC) 2024-01-15 20:59:13 -08:00
Fabian Mora
5b4f2b906b [mlir][gpu] Add an offloading handler attribute to gpu.module (#78047)
This patch adds an optional offloading handler attribute to
the `gpu.module` op. This attribute will be used during the
`gpu-module-to-binary` pass to override the offloading handler used in
the `gpu.binary` op.
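
A hedged sketch of what this could look like, assuming the `#gpu.select_object` offloading handler and an NVVM target attribute (both chosen for illustration):

```mlir
// The optional offloading handler appears in angle brackets; during
// gpu-module-to-binary it overrides the handler on the resulting gpu.binary.
gpu.module @kernels <#gpu.select_object<0>> [#nvvm.target] {
  gpu.func @kernel() kernel {
    gpu.return
  }
}
```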
2024-01-15 16:58:10 -05:00
Fabian Mora
a1eaed7a21 [mlir][gpu] Fix GPU YieldOP format and traits (#78006)
This patch adds an assembly format to `gpu::YieldOp`. It also adds the
return-like trait to make it compatible with `RegionBranchOpInterface`.
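
As a sketch, the formatted terminator in a `gpu.all_reduce` region would look like this (the surrounding op is chosen for illustration):

```mlir
%sum = gpu.all_reduce %val {
^bb(%lhs : f32, %rhs : f32):
  %add = arith.addf %lhs, %rhs : f32
  // Printed with the assembly format instead of the generic
  // "gpu.yield"(%add) : (f32) -> () form.
  gpu.yield %add : f32
} : (f32) -> (f32)
```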
2024-01-14 21:19:20 -05:00
Guray Ozen
5b33cff397 [mlir][gpu] Add Support for Cluster of Thread Blocks in gpu.launch (#76924) 2024-01-06 11:17:01 +01:00
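A hedged sketch of the extended syntax, mirroring the optional `clusters` section of `gpu.launch_func` shown further down this log (SSA names are illustrative):

```mlir
// The clusters(...) section is optional and mirrors blocks/threads.
gpu.launch clusters(%cx, %cy, %cz) in (%sz_cx = %c2, %sz_cy = %c1, %sz_cz = %c1)
           blocks(%bx, %by, %bz) in (%sz_bx = %c1, %sz_by = %c1, %sz_bz = %c1)
           threads(%tx, %ty, %tz) in (%sz_tx = %c1, %sz_ty = %c1, %sz_tz = %c1) {
  gpu.terminator
}
```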
Guray Ozen
ace69e6b94 [mlir][gpu] Improve gpu-lower-to-nvvm-pipeline Documentation (#77062)
This PR improves the documentation for the `gpu-lower-to-nvvm-pipeline`
(as it was the remaining item for #75775)

- Changes pipeline `gpu-lower-to-nvvm` -> `gpu-lower-to-nvvm-pipeline`
- Adds a section to the GPU dialect docs on the website, clarifying the
pipeline's functionality in lowering primary dialects to NVVM targets.
2024-01-05 12:51:25 +01:00
Jakub Kuderski
c0345b4648 [mlir][gpu] Add subgroup_reduce to shuffle lowering (#76530)
This supports both the scalar and the vector multi-reduction cases.
2024-01-02 16:14:22 -05:00
Jakub Kuderski
2af186f9bd [mlir][gpu] Add patterns to break down subgroup reduce (#76271)
The new patterns break down subgroup reduce ops with vector values into
a sequence of subgroup reductions that fit the native shuffle size. The
maximum/native shuffle size is parametrized.

The overall goal is to be able to perform multi-element reductions with
a sequence of `gpu.shuffle` ops.
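
A hedged sketch of the rewrite, assuming a native shuffle width of four f16 elements (shapes are illustrative):

```mlir
// Original reduction over six elements:
%r = gpu.subgroup_reduce add %v : (vector<6xf16>) -> (vector<6xf16>)

// ...broken down into reductions that fit the native shuffle size:
%v0 = vector.extract_strided_slice %v {offsets = [0], sizes = [4], strides = [1]} : vector<6xf16> to vector<4xf16>
%r0 = gpu.subgroup_reduce add %v0 : (vector<4xf16>) -> (vector<4xf16>)
%v1 = vector.extract_strided_slice %v {offsets = [4], sizes = [2], strides = [1]} : vector<6xf16> to vector<2xf16>
%r1 = gpu.subgroup_reduce add %v1 : (vector<2xf16>) -> (vector<2xf16>)
// %r0 and %r1 are then recombined into a vector<6xf16> with
// vector.insert_strided_slice ops (omitted).
```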
2023-12-28 14:39:46 -05:00
Jakub Kuderski
72003adf6b [mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015)
Each vector element is reduced independently, which is a form of
multi-reduction.

The plan is to allow for gradual lowering of multi-reduction that
results in fewer `gpu.shuffle` ops at the end:
1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d
`gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32

For example, we can perform 2 independent f16 reductions with a series of
`gpu.shuffle` ops over i32, reducing the final number of `gpu.shuffle` ops by 2x.
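
A minimal sketch of the multi-reduction semantics (types are illustrative):

```mlir
// Each of the four f16 lanes is reduced independently across the
// subgroup; the result is again a vector<4xf16>.
%r = gpu.subgroup_reduce add %v : (vector<4xf16>) -> (vector<4xf16>)
```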
2023-12-21 11:55:43 -05:00
Krzysztof Parzyszek
8b231d73bd [mlir] Fix build break with shared libraries
When project components are built as separate shared libraries, many
errors about undefined symbols appear, e.g.

```
/usr/bin/ld: CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToNVVMPipeline.cpp.o: in function `(anonymous namespace)::buildCommonPassPipeline(mlir::OpPassManager&, (anonymous namespace)::GPUToNVVMPipelineOptions const&)':
GPUToNVVMPipeline.cpp:(.text._ZN12_GLOBAL__N_123buildCommonPassPipelineERN4mlir13OpPassManagerERKNS_24GPUToNVVMPipelineOptionsE+0xa5): undefined reference to `mlir::createConvertLinalgToLoopsPass()'
```

Add the necessary dependencies to Dialect/GPU/Pipelines/CMakeLists.txt
2023-12-20 12:49:25 -06:00
Jakub Kuderski
560564f51c [mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901)
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.

Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.

Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.

Issue: https://github.com/llvm/llvm-project/issues/72354
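
For illustration, a before/after sketch on `vector.reduction` (the op is chosen for illustration; the same kind names appear on the gpu reduction ops):

```mlir
// Before: unclear whether this has arith.maxnumf or arith.maximumf semantics.
%0 = vector.reduction <maxf>, %v : vector<4xf32> into f32
// After: the kind name maps 1:1 to the arith op.
%1 = vector.reduction <maxnumf>, %v : vector<4xf32> into f32
```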
2023-12-20 00:14:43 -05:00
Jakub Kuderski
9f74e6e615 [mlir][vector][gpu] Use makeArithReduction in lowering patterns. NFC. (#75952)
Use the `vector::makeArithReduction` helper as the source of truth for
lowering reductions to arith ops.
2023-12-19 19:04:27 -05:00
Guray Ozen
5caae72d1a [mlir][gpu] Productize test-lower-to-nvvm as gpu-lower-to-nvvm (#75775)
The `test-lower-to-nvvm` pipeline serves as the common and proper
pipeline for nvvm+host compilation, and it's used across our CUDA
integration tests.

This PR renames the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves it into `InitAllPasses.h`. The aim is to make it callable from
Python and to have a standardized compilation process for NVVM.
2023-12-19 08:40:46 +01:00
Fabian Mora
419c45a325 [mlir][gpu] Fix crash in gpu-module-to-binary (#75477)
This patch fixes the error in issue #75434. The crash was caused by not
checking for the absence of target attributes in a GPU module. It is now
considered an error to invoke the pass on a GPU module with no target
attributes.
2023-12-14 14:03:10 -05:00
Kazu Hirata
88d319a29f [mlir] Use StringRef::{starts,ends}_with (NFC)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 22:58:30 -08:00
Adrian Kuegel
8a5b448fa0 [mlir][GPU] Apply ClangTidy fixes
Use const reference in loops if possible.
2023-12-12 07:34:03 +00:00
Mehdi Amini
6402706a00 [mlir] Fix the link of libcuda.so in MLIRGPUTransforms to not use fully qualified path (#74018)
At the moment we find libcuda.so in a path like:

  /usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so

and directly add this to `target_link_libraries`. The problem is that
our installed MLIR package will include the full path to the library,
and a downstream user including our installed CMake package will
inherit this full path.

We're changing this to instead

 -L /usr/local/cuda/targets/x86_64-linux/lib/stubs/ -lcuda
2023-11-30 19:30:05 -08:00
Jakub Kuderski
7eccd52842 Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423)"
This reverts commit dd09221a29 and relands
https://github.com/llvm/llvm-project/pull/73423.

* Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.
2023-11-27 11:38:18 -05:00
Jakub Kuderski
dd09221a29 Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423)"
This reverts commit e0aac8c88d.

I'm seeing some nvidia integration test failures:
https://lab.llvm.org/buildbot/#/builders/61/builds/52334.
2023-11-27 11:29:23 -05:00
Jakub Kuderski
e0aac8c88d [mlir][gpu] Align reduction operations with vector combining kinds (#73423)
The motivation for this change is explained in
https://github.com/llvm/llvm-project/issues/72354.

Before this change, we could not tell between signed/unsigned
minimum/maximum and NaN treatment for floating point values.

The mapping of old reduction operations to the new ones is as follows:
*  `min` --> `minsi` for ints, `minf` for floats
*  `max` --> `maxsi` for ints, `maxf` for floats

New reduction kinds not represented in the old enum: `minui`, `maxui`,
`minimumf`, `maximumf`.

As a next step, I would like to have a common definition of combining
kinds used by the `vector` and `gpu` dialects. Separately, the GPU to
SPIR-V lowering does not yet properly handle zero and NaN values -- the
behavior of floating point min/max group reductions is not specified by
the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. 

Issue: https://github.com/llvm/llvm-project/issues/72354
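
A before/after sketch of the mapping described above (operands are illustrative):

```mlir
// Before: `min` was implicitly signed for integers and had unspecified
// NaN semantics for floats.
%r0 = gpu.all_reduce min %a {} : (i32) -> (i32)
// After: the combining kind is explicit.
%r1 = gpu.all_reduce minsi %a {} : (i32) -> (i32)
%r2 = gpu.all_reduce minf %b {} : (f32) -> (f32)
```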
2023-11-27 11:19:20 -05:00
Guray Ozen
edf5cae739 [mlir][gpu] Support Cluster of Thread Blocks in gpu.launch_func (#72871)
The NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.

This PR enables support for CGA within the `gpu.launch_func` in the GPU
dialect. It extends `gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added the CGA
functionality as optional parameters. We want to leverage mechanisms
that we have in the GPU dialect, such as outlining and kernel launching,
making it a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```

The PR also introduces index and dimension Ops specific to clusters,
binding them to NVVM Ops:

```
%cidX = gpu.cluster_id  x
%cidY = gpu.cluster_id  y
%cidZ = gpu.cluster_id  z

%cdimX = gpu.cluster_dim  x
%cdimY = gpu.cluster_dim  y
%cdimZ = gpu.cluster_dim  z
```

We will introduce cluster support in the `gpu.launch` Op in an upcoming PR.

See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
2023-11-27 11:05:07 +01:00
Guray Ozen
ea84897ba3 [mlir][gpu] Introduce gpu.dynamic_shared_memory Op (#71546)
While the `gpu.launch` Op allows setting the size via the
`dynamic_shared_memory_size` argument, accessing the dynamic shared
memory is very convoluted. This PR implements the proposed Op,
`gpu.dynamic_shared_memory`, which aims to simplify the utilization of
dynamic shared memory.

RFC:
https://discourse.llvm.org/t/rfc-simplifying-dynamic-shared-memory-access-in-gpu/

**Proposal from RFC**
This PR adds the `gpu.dynamic_shared_memory` Op to use the dynamic shared
memory feature efficiently. It is a powerful feature that enables the
allocation of shared memory at runtime with the kernel launch on the
host. Afterwards, the memory can be accessed directly from the device. I
believe a similar story exists for AMDGPU.

**Current Way of Using Dynamic Shared Memory with MLIR**

Let me illustrate the challenges of using dynamic shared memory in MLIR
with an example below. The process involves several steps:
- `memref.global`: create the 0-sized array that LLVM's NVPTX backend expects
- `dynamic_shared_memory_size`: set the size of the dynamic shared memory
- `memref.get_global`: access the global symbol
- `reinterpret_cast` and `subview`: many ops for pointer arithmetic

```
// Step 1. Create 0-sized global symbol. Manually set the alignment
memref.global "private" @dynamicShmem  : memref<0xf16, 3> { alignment = 16 }
func.func @main() {
  // Step 2. Allocate shared memory
  gpu.launch blocks(...) threads(...)
    dynamic_shared_memory_size %c10000 {
    // Step 3. Access the global object
    %shmem = memref.get_global @dynamicShmem : memref<0xf16, 3>
    // Step 4. A sequence of `memref.reinterpret_cast` and `memref.subview` operations.
    %4 = memref.reinterpret_cast %shmem to offset: [0], sizes: [14, 64, 128],  strides: [8192,128,1] : memref<0xf16, 3> to memref<14x64x128xf16,3>
    %5 = memref.subview %4[7, 0, 0][7, 64, 128][1,1,1] : memref<14x64x128xf16,3> to memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3>
    %6 = memref.subview %5[2, 0, 0][1, 64, 128][1,1,1] : memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> to memref<64x128xf16, strided<[128, 1], offset: 73728>, 3>
    %7 = memref.subview %6[0, 0][64, 64][1,1]  : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>
    %8 = memref.subview %6[32, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>
    // Step.5 Use
    "test.use.shared.memory"(%7) : (memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>) -> (index)
    "test.use.shared.memory"(%8) : (memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>) -> (index)
    gpu.terminator
  }
```

Let's write the program above using the new Op:

```
func.func @main() {
  gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 {
    // Step 1: Obtain the shared memory directly.
    %shmem = gpu.dynamic_shared_memory : memref<?xi8, 3>
    %c147456 = arith.constant 147456 : index
    %c155648 = arith.constant 155648 : index
    %7 = memref.view %shmem[%c147456][] : memref<?xi8, 3> to memref<64x64xf16, 3>
    %8 = memref.view %shmem[%c155648][] : memref<?xi8, 3> to memref<64x64xf16, 3>

    // Step 2: Utilize the shared memory.
    "test.use.shared.memory"(%7) : (memref<64x64xf16, 3>) -> (index)
    "test.use.shared.memory"(%8) : (memref<64x64xf16, 3>) -> (index)
    gpu.terminator
  }
}
```

This PR resolves #72513
2023-11-16 14:42:17 +01:00
long.chen
1609f1c2a5 [mlir][affine][nfc] cleanup deprecated T.cast style functions (#71269)
For details, see the documentation: https://mlir.llvm.org/deprecation/

Not all changes were made manually; most were made with a clang
tool I wrote: https://github.com/lipracer/cpp-refactor.
2023-11-14 13:01:19 +08:00
drazi
9a3d3c7093 generalize pass gpu-kernel-outlining for symbol op (#72074)
This PR generalizes the gpu-kernel-outlining pass to handle ops
implementing `SymbolOpInterface` instead of just `func::FuncOp`.

Before this change, the gpu-kernel-outlining pass would skip `llvm.func`.
```mlir
module {
  llvm.func @main() {
    %c1 = arith.constant 1 : index
    gpu.launch blocks(%arg0, %arg1, %arg2) in (%arg6 = %c1, %arg7 = %c1, %arg8 = %c1) threads(%arg3, %arg4, %arg5) in (%arg9 = %c1, %arg10 = %c1, %arg11 = %c1) {
      gpu.terminator
    }
    llvm.return
  }
}
```

After this change, the gpu-kernel-outlining pass can handle `llvm.func` as well.
2023-11-12 21:48:49 -08:00
spaceotter
45f669252e [mlir][gpu] Fix build error after barrier elimination code moved (#72019)
Should fix
https://lab.llvm.org/buildbot/#/builders/61/builds/51692/steps/5/logs/stdio
2023-11-11 00:57:30 -08:00
spaceotter
00c3c73189 [mlir][gpu] Separate the barrier elimination code from transform ops (#71762)
Allows the barrier elimination code to be run from C++ as well. The code
from the transform dialect is copied as-is; the pass and populate
functions have been added at the end.

Co-authored-by: Eric Eaton <eric@nod-labs.com>
2023-11-10 17:59:09 -08:00
spaceotter
51af040b22 [mlir][gpu] Eliminate redundant gpu.barrier ops (#71575)
Adds a canonicalizer for gpu.barrier that gets rid of duplicates.

Co-authored-by: Eric Eaton <eric@nod-labs.com>
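
A sketch of the effect:

```mlir
// Before canonicalization:
gpu.barrier
gpu.barrier

// After canonicalization, the duplicate is removed:
gpu.barrier
```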
2023-11-09 18:06:20 -05:00
Krzysztof Drewniak
05fa923a9b Fix SmallVector usage in SerializeToHsaco (#71702)
Enable merging #71439 by replacing a definitely-wrong usage of
`std::unique_ptr<SmallVectorImpl<char>>` as a return value with passing
in a `SmallVectorImpl<char>&`.

Also change the following function to take `ArrayRef<char>` instead of
`const SmallVectorImpl<char>&`.
2023-11-08 13:57:41 -06:00
Fabian Mora
42630689e2 [mlir][gpu] Clean GPU Passes.h from external SPIRV includes (#71331)
Removes the `SPIRVAttributes.h` header from `GPU/Transforms/Passes.h`
2023-11-05 17:06:04 -08:00
Sang Ik Lee
2dace04521 [mlir][spirv] Implement gpu::TargetAttrInterface (#69949)
This commit implements `gpu::TargetAttrInterface` for the SPIR-V target
attribute. The plan is to use this to enable the GPU compilation pipeline
for OpenCL kernels later.

The changes do not impact Vulkan shaders using mlir-vulkan-runner.
A new GPU dialect transform pass, spirv-attach-target, is implemented
for attaching the attribute from the CLI.

The gpu-module-to-binary pass now works with GPU modules that have a
SPIR-V module with OpenCL kernel functions inside.
2023-11-05 08:11:53 -08:00
Mehdi Amini
6883343843 [mlir] Guard NVPTX backend initialization on it being configured (NFC)
This just helps with build failures in some new configurations.
2023-11-03 22:23:01 -07:00