Commit Graph

217 Commits

Author SHA1 Message Date
Justin Fargnoli
35d55f2894 [NFC][mlir] Reorder declarePromisedInterface() operands (#86628)
Reorder the template operands of `declarePromisedInterface()` to match
`declarePromisedInterfaces()`.
2024-03-27 10:30:17 -07:00
Justin Fargnoli
513cdb8222 [mlir] Declare promised interfaces for all dialects (#78368)
This PR adds promised interface declarations for all interfaces declared
in `InitAllDialects.h`.

Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
2024-03-15 20:23:20 -07:00
Matthias Springer
492e8ba038 [mlir] Fix memory leaks after #81759 (#82762)
This commit fixes memory leaks that were introduced by #81759. The way
ops and blocks are erased changed slightly.

The leaks were caused by an incorrect implementation of op builders:
blocks must be created with the supplied builder object. Otherwise, they
are not properly tracked by the dialect conversion and can leak during
rollback.
2024-02-23 14:28:57 +01:00
Matthias Springer
5fcf907b34 [mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.

Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
2024-01-17 11:08:59 +01:00
Fabian Mora
5b4f2b906b [mlir][gpu] Add an offloading handler attribute to gpu.module (#78047)
This patch adds an optional offloading handler attribute to
the`gpu.module` op. This attribute will be used during
`gpu-module-to-binary` pass to override the offloading handler used in
the `gpu.binary` op.
2024-01-15 16:58:10 -05:00
Guray Ozen
5b33cff397 [mlir][gpu] Add Support for Cluster of Thread Blocks in gpu.launch (#76924) 2024-01-06 11:17:01 +01:00
Jakub Kuderski
72003adf6b [mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015)
Each vector element is reduced independently, which is a form of
multi-reduction.

The plan is to allow for gradual lowering of multi-reduction that
results in fewer `gpu.shuffle` ops at the end:
1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d
`gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32

For example we can perform 2 independent f16 reductions with a series of
`gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.
2023-12-21 11:55:43 -05:00
Jakub Kuderski
560564f51c [mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901)
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.

Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.

Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.

Issue: https://github.com/llvm/llvm-project/issues/72354
2023-12-20 00:14:43 -05:00
Jakub Kuderski
7eccd52842 Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423)"
This reverts commit dd09221a29 and relands
https://github.com/llvm/llvm-project/pull/73423.

* Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.
2023-11-27 11:38:18 -05:00
Jakub Kuderski
dd09221a29 Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423)"
This reverts commit e0aac8c88d.

I'm seeing some nvidia integration test failures:
https://lab.llvm.org/buildbot/#/builders/61/builds/52334.
2023-11-27 11:29:23 -05:00
Jakub Kuderski
e0aac8c88d [mlir][gpu] Align reduction operations with vector combining kinds (#73423)
The motivation for this change is explained in
https://github.com/llvm/llvm-project/issues/72354.

Before this change, we could not tell between signed/unsigned
minimum/maximum and NaN treatment for floating point values.

The mapping of old reduction operations to the new ones is as follows:
*  `min` --> `minsi` for ints, `minf` for floats
*  `max` --> `maxsi` for ints, `maxf` for floats

New reduction kinds not represented in the old enum: `minui`, `maxui`,
`minimumf`, `maximumf`.

As a next step, I would like to have a common definition of combining
kinds used by the `vector` and `gpu` dialects. Separately, the GPU to
SPIR-V lowering does not yet properly handle zero and NaN values -- the
behavior of floating point min/max group reductions is not specified by
the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. 

Issue: https://github.com/llvm/llvm-project/issues/72354
2023-11-27 11:19:20 -05:00
Guray Ozen
edf5cae739 [mlir][gpu] Support Cluster of Thread Blocks in gpu.launch_func (#72871)
NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.

This PR enables support for CGA within the `gpu.launch_func` in the GPU
dialect. It extends `gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added CGA
functionality as optional parameters. We want to leverage mechanisms
that we have in the GPU dialects such as outlining and kernel launching,
making it a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```

The PR also introduces index and dimensions Ops specific to clusters,
binding them to NVVM Ops:

```
%cidX = gpu.cluster_id  x
%cidY = gpu.cluster_id  y
%cidZ = gpu.cluster_id  z

%cdimX = gpu.cluster_dim  x
%cdimY = gpu.cluster_dim  y
%cdimZ = gpu.cluster_dim  z
```

We will introduce cluster support in `gpu.launch` Op in an upcoming PR. 

See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
2023-11-27 11:05:07 +01:00
Guray Ozen
ea84897ba3 [mlir][gpu] Introduce gpu.dynamic_shared_memory Op (#71546)
While the `gpu.launch` Op allows setting the size via the
`dynamic_shared_memory_size` argument, accessing the dynamic shared
memory is very convoluted. This PR implements the proposed Op,
`gpu.dynamic_shared_memory` that aims to simplify the utilization of
dynamic shared memory.

RFC:
https://discourse.llvm.org/t/rfc-simplifying-dynamic-shared-memory-access-in-gpu/

**Proposal from RFC**
This PR `gpu.dynamic.shared.memory` Op to use dynamic shared memory
feature efficiently. It is is a powerful feature that enables the
allocation of shared memory at runtime with the kernel launch on the
host. Afterwards, the memory can be accessed directly from the device. I
believe similar story exists for AMDGPU.

**Current way Using Dynamic Shared Memory with MLIR**

Let me illustrate the challenges of using dynamic shared memory in MLIR
with an example below. The process involves several steps:
- memref.global 0-sized array LLVM's NVPTX backend expects
- dynamic_shared_memory_size Set the size of dynamic shared memory
- memref.get_global Access the global symbol
- reinterpret_cast and subview Many OPs for pointer arithmetic

```
// Step 1. Create 0-sized global symbol. Manually set the alignment
memref.global "private" @dynamicShmem  : memref<0xf16, 3> { alignment = 16 }
func.func @main() {
  // Step 2. Allocate shared memory
  gpu.launch blocks(...) threads(...)
    dynamic_shared_memory_size %c10000 {
    // Step 3. Access the global object
    %shmem = memref.get_global @dynamicShmem : memref<0xf16, 3>
    // Step 4. A sequence of `memref.reinterpret_cast` and `memref.subview` operations.
    %4 = memref.reinterpret_cast %shmem to offset: [0], sizes: [14, 64, 128],  strides: [8192,128,1] : memref<0xf16, 3> to memref<14x64x128xf16,3>
    %5 = memref.subview %4[7, 0, 0][7, 64, 128][1,1,1] : memref<14x64x128xf16,3> to memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3>
    %6 = memref.subview %5[2, 0, 0][1, 64, 128][1,1,1] : memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> to memref<64x128xf16, strided<[128, 1], offset: 73728>, 3>
    %7 = memref.subview %6[0, 0][64, 64][1,1]  : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>
    %8 = memref.subview %6[32, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>
    // Step.5 Use
    "test.use.shared.memory"(%7) : (memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>) -> (index)
    "test.use.shared.memory"(%8) : (memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>) -> (index)
    gpu.terminator
  }
```

Let’s write the program above with that:

```
func.func @main() {
    gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 {
    	%i = arith.constant 18 : index
        // Step 1: Obtain shared memory directly
        %shmem = gpu.dynamic_shared_memory : memref<?xi8, 3>
        %c147456 = arith.constant 147456 : index
        %c155648 = arith.constant 155648 : index
        %7 = memref.view %shmem[%c147456][] : memref<?xi8, 3> to memref<64x64xf16, 3>
        %8 = memref.view %shmem[%c155648][] : memref<?xi8, 3> to memref<64x64xf16, 3>

        // Step 2: Utilize the shared memory
        "test.use.shared.memory"(%7) : (memref<64x64xf16, 3>) -> (index)
        "test.use.shared.memory"(%8) : (memref<64x64xf16, 3>) -> (index)
    }
}
```

This PR resolves #72513
2023-11-16 14:42:17 +01:00
spaceotter
51af040b22 [mlir][gpu] Eliminate redundant gpu.barrier ops (#71575)
Adds a canonicalizer for gpu.barrier that gets rid of duplicates.

Co-authored-by: Eric Eaton <eric@nod-labs.com>
2023-11-09 18:06:20 -05:00
Fabian Mora
5093413a50 [mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220)
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to add any possible
extra options.

2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.

3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.

4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
2023-09-14 18:00:27 -04:00
Fabian Mora
444abb396c [mlir][gpu] Add a symbol table field to TargetOptions and adjust GpuModuleToBinary (#65797)
This patch adds the option of building an optional symbol table for the
top operation in the `gpu-module-to-binary` pass. The table is not
created by default as most targets don't need it; instead, it is lazily
built. The table is passed through a callback in `TargetOptions`.

This patch is required to integrate #65539 .
2023-09-09 19:59:20 -04:00
Fabian Mora
ec9f218173 [mlir][gpu][target] Use promises to verify TargetAttrs IR correctness. (#65787)
This patch employs the updated promise mechanism to enforce Target
Attribute IR constraints. Due to this patch, TargetAttributes
implementations no longer have to be registered before executing
translation to LLVM IR in cases where they are not needed, like when
translating `gpu.binary` operations.
2023-09-08 17:21:45 -04:00
Fabian Mora
e22f04b597 [mlir][gpu] Fix option parsing in TargetOptions
`TargetOptions` includes a field for passing additional command line options to
the GPU compilation process. This field is typically used during the 'gpu-module-to-binary`
pass:
```
--gpu-module-to-binary=opts="-v -c"
```

The problem is that `tokenizeCmdOptions` receives the quoted string, which produces an
incorrect tokenization for `"-v -c"`. This patch removes quotes, fixing this issue.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D159434
2023-09-04 20:54:29 -04:00
Martin Erhart
34a35a8b24 [mlir] Move FunctionInterfaces to Interfaces directory and inherit from CallableOpInterface
Functions are always callable operations and thus every operation
implementing the `FunctionOpInterface` also implements the
`CallableOpInterface`. The only exception was the FuncOp in the toy
example. To make implementation of the `FunctionOpInterface` easier,
this commit lets `FunctionOpInterface` inherit from
`CallableOpInterface` and merges some of their methods. More precisely,
the `CallableOpInterface` has methods to get the argument and result
attributes and a method to get the result types of the callable region.
These methods are always implemented the same way as their analogues in
`FunctionOpInterface` and thus this commit moves all the argument and
result attribute handling methods to the callable interface as well as
the methods to get the argument and result types. The
`FuntionOpInterface` then does not have to declare them as well, but
just inherits them from the `CallableOpInterface`.
Adding the inheritance relation also required to move the
`FunctionOpInterface` from the IR directory to the Interfaces directory
since IR should not depend on Interfaces.

Reviewed By: jpienaar, springerm

Differential Revision: https://reviews.llvm.org/D157988
2023-08-31 11:28:23 +00:00
Adrian Kuegel
b454ecc84f [mlir] Apply ClangTidy fix (NFC)
Prefer to use .empty() instead of checking for size() > 0.
2023-08-28 10:53:15 +02:00
Fabian Mora
8ae074b195 [mlir][gpu] Add the Select Object compilation attribute.
**For an explanation of these patches see D154153.**

Commit message:
This patch adds the default offloading handler for GPU binary ops: `#gpu.select_object`,
it selects the object to embed based on an index or a target attribute, embedding
the object as a global string and launches the kernel using the scheme used in the
GPU to LLVM pass.

Depends on D154137

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D154147
2023-08-11 22:00:35 +00:00
Fabian Mora
a63db3f5f5 [mlir][gpu] Modifies gpu.launch_func to allow lowering it after gpu-to-llvm.
**For an explanation of these patches see D154153.**

Commit message:
In order to lower `gpu.launch_func` after running `gpu-to-llvm` it must be
able to handle lowered types -eg. index -> i64. This patch also allows the op
to refer to GPU binaries and not only GPU modules.

Depends on D154132.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D154137
2023-08-11 21:56:37 +00:00
Fabian Mora
bf24fb81ac [mlir][gpu] Add gpu.binary op and #gpu.object attribute.
**For an explanation of these patches see D154153.**

Commit message:
Adds the `#gpu.object` attribute for holding a binary object and the target
attribute used to create it. Also adds the `gpu.binary` operation used to
store GPU objects.

Depends on D154108

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D154132
2023-08-11 19:48:18 +00:00
Fabian Mora
9fa7b9ef21 [mlir][gpu] Add target attribute to GPU modules.
**For an explanation of these patches see D154153.**

Commit message:
Adds support for Target attributes in GPU modules. This change enables attaching
an optional non empty array of GPU target attributes to the module.

Depends on D154104

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D154113
2023-08-08 13:19:47 +00:00
Fabian Mora
86c4dfa209 [mlir][gpu] Add GPU target attribute interface.
**For an explanation of these patches see D154153.**

Commit message:
This patch adds the `GPUTargetAttrInterface` attribute interface, this interface
is meant to be used as an opaque interface for serializing GPU modules into
binary strings.

Reviewed By: mehdi_amini, krzysz00

Differential Revision: https://reviews.llvm.org/D154104
2023-08-08 13:10:54 +00:00
Kun Wu
dfe2942909 [mlir][sparse][gpu] add spgemm operator
Differential Revision: https://reviews.llvm.org/D152981
2023-08-08 00:29:23 +00:00
Alex Zinenko
e98e59955e Revert "Foo"
This reverts commit 3c9aa10c57.

No proper description of the commit.
2023-08-04 13:30:12 +00:00
Nicolas Vasilache
3c9aa10c57 Foo 2023-08-04 11:06:17 +00:00
Nicolas Vasilache
44e6318cea [mlir][transforms] Revamp the implementation of mapping loops to GPUs
This revision significantly simplifies the specification and implementation of mapping loops to GPU ids.

Each type of mapping (block, warpgroup, warp, thread) now comes with 2 mapping modes:
  1. a 3-D "grid-like" mode, subject to alignment considerations on threadIdx.x, on which predication
     may occur on a per-dimension 3-D sub-rectangle basis.
  2. a n-D linearized mode, on which predication may only occur on a linear basis.

In the process, better size and alignment requirement inference are introduced along with improved runtime verification messages.

The `warp_dims` attribute was deemed confusing and is removed from the transform in favor of better size inference.

Differential Revision: https://reviews.llvm.org/D155941
2023-07-26 00:09:08 +02:00
Matthias Springer
b1d2687501 [mlir][IR] Remove duplicate isLastMemrefDimUnitStride functions
This function is duplicated in various dialects.

Differential Revision: https://reviews.llvm.org/D155462
2023-07-17 16:31:04 +02:00
Matthias Springer
cb7bda2ace [mlir][NFC] Use getConstantIntValue instead of casting to ConstantIndexOp
`getConstantIntValue` extracts constant values from all constant-like ops, not just `arith::ConstantIndexOp`.

Differential Revision: https://reviews.llvm.org/D154356
2023-07-04 14:08:37 +02:00
Kun Wu
be2dd22b8f [mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime
Differential Revision: https://reviews.llvm.org/D153173
2023-06-30 21:52:34 +00:00
Kun Wu
97f4c22b3a [mlir][sparse][gpu] unify dnmat and dnvec handle and ops
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152465
2023-06-09 17:16:48 +00:00
Kun Wu
810c7410b5 [MLIR][sparse][GPU] fixing windows build break caused by D151014
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151405
2023-05-25 17:12:18 +00:00
Kazu Hirata
6455242570 [mlir] Fix a warning
This patch fixes:

  mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:175:2: error: extra ';'
  outside of a function is incompatible with C++98
  [-Werror,-Wc++98-compat-extra-semi]
2023-05-24 10:36:38 -07:00
Kun Wu
86bf710cf7 [mlir] [gpu] [sparse] refined SparseHandle type
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151014
2023-05-24 10:16:07 -07:00
Aart Bik
b700a90cc0 [mlir][gpu][sparse] add gpu ops for sparse matrix computations
This revision extends the GPU dialect with ops that can be lowered to
host-oriented sparse matrix library calls (in this case cuSparse focused
although the ops could be generalized to support more GPUs in principle).
This will allow the "sparse compiler pipeline" to accelerate sparse operations
(see follow up revisions with examples of this).

For some background;

https://discourse.llvm.org/t/sparse-compiler-and-gpu-code-generation/69786/2

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D150152
2023-05-12 10:44:36 -07:00
Tres Popp
c1fa60b4cd [mlir] Update method cast calls to function calls
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.

Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.

Context:

* https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
* Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Implementation:
This follows a previous patch that updated calls
`op.cast<T>()-> cast<T>(op)`. However some cases could not handle an
unprefixed `cast` call due to occurrences of variables named cast, or
occurring inside of class definitions which would resolve to the method.
All C++ files that did not work automatically with `cast<T>()` are
updated here to `llvm::cast` and similar with the intention that they
can be easily updated after the methods are removed through a
find-replace.

See https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
for the clang-tidy check that is used and then update printed
occurrences of the function to include `llvm::` before.

One can then run the following:
```
ninja -C $BUILD_DIR clang-tidy

run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
                 -export-fixes /tmp/cast/casts.yaml mlir/*\
                 -header-filter=mlir/ -fix

rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```

Differential Revision: https://reviews.llvm.org/D150348
2023-05-12 11:21:30 +02:00
Ivan Butygin
4142952352 [mlir][gpu] Reduction ops canonicalizatios
Make group ops uniform if `gpu.launch` is their direct parent.

Differential Revision: https://reviews.llvm.org/D149183
2023-05-10 00:33:42 +02:00
Krzysztof Drewniak
94058c41d4 [mlir][GPU] Allow specifying alignment of memory attributions
Add support for argument attributes on workgroup and private
attributions for GPU functions. These arguments are outside the range
of getNumArguments() and get printed separately, so the default
mechanism for function argument attributes can't be used on them.

Having done this, check for the `llvm.align` attribute on workgroup or
private attributions in a `gpu.func` and pass it through to the
relevant allocation op (creating a global or alloca). This allows
people creating kernels that use multiple workgroup buffers to set an
alignment.

(This could, in the future, be a GPU dialect `alignment` attribute,
but I've taken the simpler route of using the LLVM version instead for
simplicity and because I don't know how this might impact backends
like Vulkan)

Reviewed By: nirvedhmeshram

Differential Revision: https://reviews.llvm.org/D148965
2023-05-03 21:51:15 +00:00
Fabian Mora
54e96f4f97 [mlir][GPUDialect] Implement memory attributions for LaunchOp
Currently memory attributions are not supported for gpu::LaunchOp, this patch implements memory attributions for gpu::LaunchOp and modifies the KernelOutlining pass to make the attributions available in GPUFuncOp.

Reviewed By: makslevental

Differential Revision: https://reviews.llvm.org/D147809
2023-04-26 17:53:18 -05:00
Nicolas Vasilache
c59465e120 [mlir][Transform] Add support for mapping to GPU warps and to linear ids
This revisions refactors the implementation of mapping to threads to additionally allow warps and linear ids to be specified.

`warp_dims` is currently specified along with `block_dims` as a transform attribute.

Linear ids on th other hand use the flattened block_dims to predicate on the first (linearized) k threads.
An additional GPULinearIdMappingAttr is added to the GPU dialect to allow specifying loops mapped to this new scheme.

Various implementation and transform op semantics cleanups are also applied.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D146130
2023-03-20 01:05:32 -07:00
Amir Mohammad Tavakkoli
115711c19c [mlir][LinAlg][Transform][GPU] Add GPU memory hierarchy to the transform.promote op
In this patch we are adding the support of copying a a `memref.subview` to the shared or private memory in GPU. The global to shared memory copy is adopted from codes implemented in IREE (https://github.com/iree-org/iree), but the private memory copy part has not been implemented in IREE. This patch enables transferring a subview from `global->shared`, `global->private`, and `shared->private`.

Our final aim is to provide a copy layout as an affine map to the `transform.promote` op to support transpose memory copy. This map is a permutation of the original affine index map. Although this has been implemented and user can copy data to arbitrary layout , this attempt is not included in this patch since we have still problem with `linalg.generic` operations to change their index map to the transformed index map. You can find more in following links ([[ 4fd5f93355 | Initial attempt to support layout map in promote op in transform dialect ]]) ([[ 9062b5849f | Fix data transpose in shared memory ]])

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D144666
2023-02-27 16:33:58 +01:00
Matthias Springer
c080c1f482 [mlir][GPU] Fix incorrect API usage in RewritePatterns
Incorrect API usage was detected by D144552.

Differential Revision: https://reviews.llvm.org/D144637
2023-02-23 18:20:37 +01:00
Quinn Dawkins
985f7ff632 [mlir][gpu] Add support for integer types in gpu.subgroup_mma ops
The signedness is carried by `!gpu.mma_matrix` types to most closely
match the Cooperative Matrix specification which determines signedness
with the type (and sometimes the operation).

See: https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_cooperative_matrix.html

To handle the lowering from vector to gpu, ops such as arith.extsi are
pattern matched next to `vector.transfer_read` and `vector.contract` to
determine the signedness of the matrix type.

Enables s8 and u8 WMMA types in NVVM for the GPUToNVVM conversion.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D143223
2023-02-07 17:58:01 -05:00
Thomas Raoux
a7686db801 [mlir][gpu] Allow distributing to different level of IDs without failing
Change map_nested_foreach_to_threads to ignore foreach_thread not
mapping to threads, this will allow us to call
mapNestedForeachToThreadsImpl with different set of ids to lower
multiple levels. Also adds warpIds attributes.

Differential Revision: https://reviews.llvm.org/D143298
2023-02-04 02:03:05 +00:00
Alex Zinenko
3f2f83ef41 [mlir] accept values with result numbers in gpu.launch_func
The parser of gpu.launch_func was incorrectly rejecting SSA values with
result numbers (`%0#0`) in the list of function arguments by using the
`parseArgument` function intended for region argument declarations, not
operands. Fix this by directly parsing comma-separated operands and
types.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D141851
2023-01-16 19:26:42 +00:00
Christopher Bate
6ca1a09f03 [mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu::AddressSpaceAttr)
This is a purely mechanical change that introduces an enum attribute in the GPU
dialect to represent the various memref memory spaces as opposed to the
hard-coded integer attributes that are currently used.

The following steps were taken to make the transition across the codebase:

1. Introduce a pass "gpu-lower-memory-space-attributes":

The pass updates all memref types that have a memory space attribute that is a
`gpu::AddressSpaceAttr`. These attributes are changed to `IntegerAttr`'s using a
mapping that is given by the caller. This pass is based on the
"map-memref-spirv-storage-class" pass and the common functions can probably
be refactored into a set of utilities under the MemRef dialect.

2. Update the verifiers of GPU/NVGPU dialect operations.

If a verifier currently checks the address space of an operand using
e.g.`getWorkspaceAddressSpace`, then it can continue to do so. However, the
checks are changed to only fail if the memory space is either missing or a wrong
value of type `gpu::AddressSpaceAttr`. Otherwise, it just assumes the address
space is correct because it was specifically lowered to something other than a
`gpu::AddressSpaceAttr`.

3. Update existing gpu-to-llvm conversion infrastructure.

In the existing gpu-to-X passes, we add a full conversion equivalent to
`gpu-lower-memory-space-attributes` just before doing the conversion to the
LLVMDialect. This is done because currently both the gpu-to-llvm passes
(rocdl,nvvm) run gpu-to-gpu rewrites within the pass, which introduce
`AddressSpaceAttr` memory space annotations. Therefore, I inserted the
memory space conversion between the gpu-to-gpu rewrites and the LLVM
conversion.

For more context see the below discourse discussion:
https://discourse.llvm.org/t/gpu-workgroup-shared-memory-address-space-is-hard-coded/

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D140644
2023-01-13 11:00:10 -07:00
Jeff Niu
4d67b27817 [mlir] Add operations to BlockAndValueMapping and rename it to IRMapping
The patch adds operations to `BlockAndValueMapping` and renames it to `IRMapping`. When operations are cloned, old operations are mapped to the cloned operations. This allows mapping from an operation to a cloned operation. Example:

```
Operation *opWithRegion = ...
Operation *opInsideRegion = &opWithRegion->front().front();

IRMapping map
Operation *newOpWithRegion = opWithRegion->clone(map);
Operation *newOpInsideRegion = map.lookupOrNull(opInsideRegion);
```

Migration instructions:
All includes to `mlir/IR/BlockAndValueMapping.h` should be replaced with `mlir/IR/IRMapping.h`. All uses of `BlockAndValueMapping` need to be renamed to `IRMapping`.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D139665
2023-01-12 13:16:05 -08:00
Markus Böck
7df761217c [mlir][NFC] Migrate rest of the dialects to the new fold API 2023-01-11 21:47:25 +01:00