Commit Graph

83 Commits

Author SHA1 Message Date
marina kolpakova a.k.a. geexie
0080d2aa55 [mlir][gpu] folds memref.dim of gpu.alloc
implements canonicalization which folds memref.dim(gpu.alloc(%size), %idx) -> %size

Differential Revision: https://reviews.llvm.org/D108892
2021-08-31 12:33:10 +03:00
Chris Lattner
0dd4c4b5ae [Testsuite] Change these tests to only have a single verification error, NFC.
These are testing for various verification failures, but have missing returns
at the end of their function.  Add the returns to focus the tests better.
2021-06-13 21:36:31 -07:00
thomasraoux
428a62f65f [mlir][gpu] Add op to create MMA constant matrix
This allow creating a matrix with all elements set to a given value. This is
needed to be able to implement a simple dot op.

Differential Revision: https://reviews.llvm.org/D103870
2021-06-10 08:34:04 -07:00
Christian Sigg
0b21371e12 [mlir] Support pre-existing tokens in 'gpu-async-region'
Allow gpu ops implementing the async interface to already be async when running the GpuAsyncRegionPass.
That pass threads a 'current token' through a block with ops implementing the gpu async interface.

After this change, existing async ops (returning a !gpu.async.token) set the current token.
Existing synchronous `gpu.wait` ops reset the current token.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D103396
2021-06-10 08:43:45 +02:00
William S. Moses
00b6463b26 [MLIR][GPU] Simplify memcpy of cast
Introduce a simplification that allows memcpy of a cast to simply use the underlying op

Differential Revision: https://reviews.llvm.org/D103830
2021-06-07 14:00:13 -04:00
thomasraoux
b44007bec2 [mlir][gpu] Relax restriction on MMA store op to allow chain of mma ops.
In order to allow large matmul operations using the MMA ops we need to chain
operations this is not possible unless "DOp" and "COp" type have matching
layout so remove the "DOp" layout and force accumulator and result type to
match.
Added a test for the case where the MMA value is accumulated.

Differential Revision: https://reviews.llvm.org/D103023
2021-05-27 09:13:51 -07:00
Navdeep Kumar
875eb523c1 [MLIR][GPU][NVVM] Add warp synchronous matrix-multiply accumulate ops
Add warp synchronous matrix-multiply accumulate ops in GPU and NVVM
dialect. Add following three ops to GPU dialect :-
  1.) subgroup_mma_load_matrix
  2.) subgroup_mma_store_matrix
  3.) subgroup_mma_compute
Add following three ops to NVVM dialect :-
  1.) wmma.m16n16k16.load.[a,b,c].[f16,f32].row.stride
  2.) wmma.m16n16k16.store.d.[f16,f32].row.stride
  3.) wmma.m16n16k16.mma.row.row.[f16,f32].[f16,f32]

Reviewed By: bondhugula, ftynse, ThomasRaoux

Differential Revision: https://reviews.llvm.org/D95330
2021-05-06 12:06:25 +05:30
Julian Gross
e2310704d8 [MLIR] Create memref dialect and move dialect-specific ops from std.
Create the memref dialect and move dialect-specific ops
from std dialect to this dialect.

Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
AssumeAlignmentOp -> MemRef_AssumeAlignmentOp
DeallocOp -> MemRef_DeallocOp
DimOp -> MemRef_DimOp
MemRefCastOp -> MemRef_CastOp
MemRefReinterpretCastOp -> MemRef_ReinterpretCastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
LoadOp -> MemRef_LoadOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
SubViewOp -> MemRef_SubViewOp
TransposeOp -> MemRef_TransposeOp
TensorLoadOp -> MemRef_TensorLoadOp
TensorStoreOp -> MemRef_TensorStoreOp
TensorToMemRefOp -> MemRef_BufferCastOp
ViewOp -> MemRef_ViewOp

The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667

Differential Revision: https://reviews.llvm.org/D98041
2021-03-15 11:14:09 +01:00
Christian Sigg
f03826f896 Pass GPU events instead of streams across async regions.
Lower !gpu.async.tokens returned from async.execute regions to events instead of streams.

Make !gpu.async.token returned from !async.execute single-use.
This allows creating one event per use and destroying them without leaking or ref-counting.
Technically we only need this for stream/event-based lowering. I kept the code separate
from the rest of the gpu-async-region pass so that we can make this optional or move
to a separate pass as needed.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D96965
2021-02-25 13:18:18 +01:00
Alexander Belyaev
a89035d750 Revert "[MLIR] Create memref dialect and move several dialect-specific ops from std."
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h

Working on a fix.

This reverts commit 8aa6c3765b.
2021-02-18 12:49:52 +01:00
Julian Gross
8aa6c3765b [MLIR] Create memref dialect and move several dialect-specific ops from std.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.

Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp

The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667

Differential Revision: https://reviews.llvm.org/D96425
2021-02-18 11:29:39 +01:00
River Riddle
93592b726c [mlir][OpFormatGen] Format enum attribute cases as keywords when possible
In the overwhelmingly common case, enum attribute case strings represent valid identifiers in MLIR syntax. This revision updates the format generator to format as a keyword in these cases, removing the need to wrap values in a string. The parser still retains the ability to parse the string form, but the printer will use the keyword form when applicable.

Differential Revision: https://reviews.llvm.org/D94575
2021-01-14 11:35:49 -08:00
Alex Zinenko
dd5165a920 [mlir] replace LLVM dialect float types with built-ins
Continue the convergence between LLVM dialect and built-in types by replacing
the bfloat, half, float and double LLVM dialect types with their built-in
counterparts. At the API level, this is a direct replacement. At the syntax
level, we change the keywords to `bf16`, `f16`, `f32` and `f64`, respectively,
to be compatible with the built-in type syntax. The old keywords can still be
parsed but produce a deprecation warning and will be eventually removed.

Depends On D94178

Reviewed By: mehdi_amini, silvas, antiagainst

Differential Revision: https://reviews.llvm.org/D94179
2021-01-08 17:38:12 +01:00
Alex Zinenko
2230bf99c7 [mlir] replace LLVMIntegerType with built-in integer type
The LLVM dialect type system has been closed until now, i.e. did not support
types from other dialects inside containers. While this has had obvious
benefits of deriving from a common base class, it has led to some simple types
being almost identical with the built-in types, namely integer and floating
point types. This in turn has led to a lot of larger-scale complexity: simple
types must still be converted, numerous operations that correspond to LLVM IR
intrinsics are replicated to produce versions operating on either LLVM dialect
or built-in types leading to quasi-duplicate dialects, lowering to the LLVM
dialect is essentially required to be one-shot because of type conversion, etc.
In this light, it is reasonable to trade off some local complexity in the
internal implementation of LLVM dialect types for removing larger-scale system
complexity. Previous commits to the LLVM dialect type system have adapted the
API to support types from other dialects.

Replace LLVMIntegerType with the built-in IntegerType plus additional checks
that such types are signless (these are isolated in a utility function that
replaced `isa<LLVMType>` and in the parser). Temporarily keep the possibility
to parse `!llvm.i32` as a synonym for `i32`, but add a deprecation notice.

Reviewed By: mehdi_amini, silvas, antiagainst

Differential Revision: https://reviews.llvm.org/D94178
2021-01-07 19:48:31 +01:00
Sanjoy Das
bd166c813c Nit: fix spacing
Differential Revision: https://reviews.llvm.org/D93996
2021-01-06 09:40:50 -08:00
Christian Sigg
0955d8df06 [mlir] Add gpu.memcpy op.
Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D93197
2020-12-22 17:39:55 +01:00
Christian Sigg
a79b26db0e [mlir] Fix for gpu-async-region pass.
- the !gpu.async.token is the second result of 'gpu.alloc async', not the first.
- async.execute construction takes operand types not yet wrapped in !async.value.
- fix typo

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D93156
2020-12-16 19:08:10 +01:00
Christian Sigg
d9adde5ae2 [mlir][gpu] Move gpu.wait ops from async.execute regions to its dependencies.
This can prevent unnecessary host synchronization.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D90346
2020-12-03 08:52:28 +01:00
Christian Sigg
5535696c38 [mlir] Add gpu.allocate, gpu.deallocate ops with LLVM lowering to runtime function calls.
The ops are very similar to the std variants, but support async GPU execution.

gpu.alloc does not currently support an alignment attribute, and the new ops do not have
canonicalizers/folders like their std siblings do.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D91698
2020-11-27 09:40:59 +01:00
Rahul Joshi
b7382ed3fe [MLIR] Extend Symbol verification to reject public symbol declarations.
- Extend the Symbol interface with `isDeclaration` to identify operations that declare
  a symbol as opposed to define it.
- Extend verification to disallow public declarations as per the discussion in
   https://llvm.discourse.group/t/rfc-symbol-definition-declaration-x-visibility-checks/2140
- Adopt the new interface for `FuncOp` and fix test and code to not have/create public
  function declarations.

Differential Revision: https://reviews.llvm.org/D91456
2020-11-16 16:05:32 -08:00
River Riddle
ebcc022507 [mlir][AsmPrinter] Refactor printing to only print aliases for attributes/types that will exist in the output.
This revision refactors the way that attributes/types are considered when generating aliases. Instead of considering all of the attributes/types of every operation, we perform a "fake" print step that prints the operations using a dummy printer to collect the attributes and types that would actually be printed during the real process. This removes a lot of attributes/types from consideration that generally won't end up in the final output, e.g. affine map attributes in an `affine.apply`/`affine.for`.

This resolves a long standing TODO w.r.t aliases, and helps to have a much cleaner textual output format. As a datapoint to the latter, as part of this change several tests were identified as testing for the presence of attributes aliases that weren't actually referenced by the custom form of any operation.

To ensure that this wouldn't cause a large degradation in compile time due to the second full print, I benchmarked this change on a very large module with a lot of operations(The file is ~673M/~4.7 million lines long). This file before this change take ~6.9 seconds to print in the custom form, and ~7 seconds after this change. In the custom assembly case, this added an average of a little over ~100 miliseconds to the compile time. This increase was due to the way that argument attributes on functions are structured and how they get printed; i.e. with a better representation the negative impact here can be greatly decreased. When printing in the generic form, this revision had no observable impact on the compile time. This benchmarking leads me to believe that the impact of this change on compile time w.r.t printing is closely related to `print` methods that perform a lot of additional/complex processing outside of the OpAsmPrinter.

Differential Revision: https://reviews.llvm.org/D90512
2020-11-09 21:54:47 -08:00
Christian Sigg
db7129a005 [mlir][gpu] Add pass to make GPU ops within a region execute asynchronously.
Do not use the pass yet, except in a test.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D89937
2020-10-29 22:17:50 +01:00
Christian Sigg
3556114083 [mlir][gpu] Allow gpu.launch_func to be async.
This is a roll-forward of rGec7780ebdab4, now that the remaining
gpu.launch_func have been converted to custom form in rGb22f111023ba.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D90420
2020-10-29 21:48:38 +01:00
Mehdi Amini
834618a2ff Revert "[mlir][gpu] Allow gpu.launch_func to be async."
This reverts commit ec7780ebda.

One of the bot is crashing in a test related to this change.
2020-10-29 17:30:27 +00:00
Christian Sigg
ec7780ebda [mlir][gpu] Allow gpu.launch_func to be async.
Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D89324
2020-10-29 17:54:56 +01:00
Christian Sigg
1c1803dbb0 [mlir][gpu] Add customer printer/parser for gpu.launch_func.
Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D89262
2020-10-21 18:19:00 +02:00
Christian Sigg
ad3ecc24b1 [mlir][gpu] NFC: Make room for more than one GPU rewrite pattern.
AllReduceLowering is currently the only GPU rewrite pattern, but more are coming. This is a preparation change.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D89370
2020-10-19 07:52:47 +02:00
Christian Sigg
db1cf3d9ab [mlir][gpu] Add gpu.wait op.
This combines two separate ops (D88972: `gpu.create_token`, D89043: `gpu.host_wait`) into one.

I do after all like the idea of combining the two ops, because it matches exactly the pattern we are
going to have in the other gpu ops that will implement the AsyncOpInterface (launch_func, copies, alloc):

If the op is async, we return a !gpu.async.token. Otherwise, we synchronize with the host and don't return a token.

The use cases for `gpu.wait async` and `gpu.wait` are further apart than those of e.g. `gpu.h2d async` and `gpu.h2d`,
but I like the consistent meaning of the `async` keyword in GPU ops.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D89160
2020-10-13 17:30:59 +02:00
Christian Sigg
473b364a19 Add GPU async op interface and token type.
See https://llvm.discourse.group/t/rfc-new-dialect-for-modelling-asynchronous-execution-at-a-higher-level/1345

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D88954
2020-10-09 22:37:13 +02:00
Stephan Herhut
366d8435b4 [mlir][gpu] Fix bug in kernel outlining
The updated version of kernel outlining did not handle cases correctly
where an operand of a candidate for sinking itself was defined by an operation
that is a sinking candidate. In such cases, it could happen that sunk
operations were inserted in the wrong order, breaking ssa properties.

Differential Revision: https://reviews.llvm.org/D89112
2020-10-09 15:03:14 +02:00
Stephan Herhut
edeff6e642 [mlir][GPU] Improve constant sinking in kernel outlining
The previous implementation did not support sinking simple expressions. In particular,
it is often beneficial to sink dim operations.

Differential Revision: https://reviews.llvm.org/D88439
2020-09-29 14:46:15 +02:00
Alex Zinenko
ec1f4e7c3b [mlir] switch the modeling of LLVM types to use the new mechanism
A new first-party modeling for LLVM IR types in the LLVM dialect has been
developed in parallel to the existing modeling based on wrapping LLVM `Type *`
instances. It resolves the long-standing problem of modeling identified
structure types, including recursive structures, and enables future removal of
LLVMContext and related locking mechanisms from LLVMDialect.

This commit only switches the modeling by (a) renaming LLVMTypeNew to LLVMType,
(b) removing the old implementaiton of LLVMType, and (c) updating the tests. It
is intentionally minimal. Separate commits will remove the infrastructure built
for the transition and update API uses where appropriate.

Depends On D85020

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D85021
2020-08-04 14:29:25 +02:00
Frederik Gossen
904f91db5f [MLIR][Standard] Make the dim operation index an operand.
Allow for dynamic indices in the `dim` operation.
Rather than an attribute, the index is now an operand of type `index`.
This allows to apply the operation to dynamically ranked tensors.
The correct lowering of dynamic indices remains to be implemented.

Differential Revision: https://reviews.llvm.org/D81551
2020-06-10 13:54:47 +00:00
Mehdi Amini
d31c9e5a46 Change filecheck default to dump input on failure
Having the input dumped on failure seems like a better
default: I debugged FileCheck tests for a while without knowing
about this option, which really helps to understand failures.

Remove `-dump-input-on-failure` and the environment variable
FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete.

Differential Revision: https://reviews.llvm.org/D81422
2020-06-09 18:57:46 +00:00
Wen-Heng (Jack) Chung
603b974cf7 [mlir][gpu] Fix logic error in D79508 computing number of private attributions.
Fix logic error in D79508. The old logic would make the first check in
`GPUFuncOp::verifyBody` always pass.
2020-06-08 07:40:34 -05:00
Jacques Pienaar
b0921f68e1 [mlir] Add verify method to adaptor
This allows verifying op-indepent attributes (e.g., attributes that do not require the op to have been created) before constructing an operation. These include checking whether required attributes are defined or constraints on attributes (such as I32 attribute). This is not perfect (e.g., if one had a disjunctive constraint where one part relied on the op and the other doesn't, then this would not try and extract the op independent from the op dependent).

The next step is to move these out to a trait that could be verified earlier than in the generated method. The first use case is for inferring the return type while constructing the op. At that point you don't have an Operation yet and that ends up in one having to duplicate the same checks, e.g., verify that attribute A is defined before querying A in shape function which requires that duplication. Instead this allows one to invoke a method to verify all the traits and, if this is checked first during verification, then all other traits could use attributes knowing they have been verified.

It is a little bit funny to have these on the adaptor, but I see the adaptor as a place to collect information about the op before the op is constructed (e.g., avoiding stringly typed accessors, verifying what is possible to verify before the op is constructed) while being cheap to use even with constructed op (so layer of indirection between the op constructed/being constructed). And from that point of view it made sense to me.

Differential Revision: https://reviews.llvm.org/D80842
2020-06-05 09:47:37 -07:00
Thomas Raoux
661235e126 [mlir][gpu] Add subgroup Id/Size/Num to GPU dialect
Add SubgroupId, SubgroupSize and NumSubgroups to GPU dialect ops and add the
lowering of those ops to SPIRV.

Differential Revision: https://reviews.llvm.org/D81042
2020-06-04 10:52:40 -07:00
Alex Zinenko
60f443bb3b [mlir] Change dialect namespace loop->scf
All ops of the SCF dialect now use the `scf.` prefix instead of `loop.`. This
is a part of dialect renaming.

Differential Revision: https://reviews.llvm.org/D79844
2020-05-13 19:20:21 +02:00
Tres Popp
2d2d696137 [MLIR] Propagate input side effect information
Summary:
Previously operations like std.load created methods for obtaining their
effects but did not inherit from the SideEffect interfaces when their
parameters were decorated with the information. The resulting situation
was that passes had no information on the SideEffects of std.load/store
and had to treat them more cautiously. This adds the inheritance
information when creating the methods.

As a side effect, many tests are modified, as they were using std.load
for testing and this oepration would be folded away as part of pattern
rewriting. Tests are modified to use store or to reutn the result of the
std.load.

Reviewers: mravishankar, antiagainst, nicolasvasilache, herhut, aartbik, ftynse!

Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, bader, grosul1, frgossen, Kayjukh, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78802
2020-04-27 11:35:52 +02:00
Frederik Gossen
7e4b139a04 [MLIR] Ensure gpu.func must be inside a gpu.module.
Ensure that `gpu.func` is only used within the dedicated `gpu.module`.
Implement the constraint to the GPU dialect and adopt test cases.

Differential Revision: https://reviews.llvm.org/D78541
2020-04-24 07:17:48 +00:00
Frederik Gossen
0372db05bb [MLIR] Use nested symbol to identify kernel in LaunchFuncOp.
Summary:
Use a nested symbol to identify the kernel to be invoked by a `LaunchFuncOp` in the GPU dialect.
This replaces the two attributes that were used to identify the kernel module and the kernel within seperately.

Differential Revision: https://reviews.llvm.org/D78551
2020-04-22 07:44:29 +00:00
Frederik Gossen
648fc95083 [MLIR] Use kernel as a short hand for gpu.kernel attribute.
Summary:
Use the shortcu `kernel` for the `gpu.kernel` attribute of `gpu.func`.
The parser supports this and test cases are easier to read.

Differential Revision: https://reviews.llvm.org/D78542
2020-04-22 07:38:30 +00:00
Frederik Gossen
2813802746 [MLIR] Fix test case for kernel attribute.
Summary:
Fix a broken test case in the `invalid.mlir` lit test case.
`expect` was missing its `e`.

Differential Revision: https://reviews.llvm.org/D78540
2020-04-22 07:27:39 +00:00
Mehdi Amini
bab5bcf8fd Add a flag on the context to protect against creation of operations in unregistered dialects
Differential Revision: https://reviews.llvm.org/D76903
2020-03-30 19:37:31 +00:00
Valentin Clement
d4d62fcab6 [MLIR] Add test for multiple gpu.all_reduce in the same kernel when lowering to NVVM
Summary: This patch add tests when lowering multiple `gpu.all_reduce` operations in the same kernel. This was previously failing.

Differential Revision: https://reviews.llvm.org/D75930
2020-03-19 16:36:38 +01:00
Valentin Clement
c7380995f8 [MLIR] Add and, or, xor, min, max too gpu.all_reduce and the nvvm lowering
Summary:
This patch add some builtin operation for the gpu.all_reduce ops.
- for Integer only: `and`, `or`, `xor`
- for Float and Integer: `min`, `max`

This is useful for higher level dialect like OpenACC or OpenMP that can lower to the GPU dialect.

Differential Revision: https://reviews.llvm.org/D75766
2020-03-11 14:07:04 +01:00
Stephan Herhut
f6790a1c63 Revert "[MLIR] Add and, or, xor, min, max too gpu.all_reduce and the nvvm lowering"
Attribution to original author got lost.
2020-03-11 14:07:04 +01:00
Stephan Herhut
2eff566b07 [MLIR] Add and, or, xor, min, max too gpu.all_reduce and the nvvm lowering
Summary:
This patch add some builtin operation for the gpu.all_reduce ops.
- for Integer only: `and`, `or`, `xor`
- for Float and Integer: `min`, `max`

This is useful for higher level dialect like OpenACC or OpenMP that can lower to the GPU dialect.

Differential Revision: https://reviews.llvm.org/D75766
2020-03-10 21:09:06 +01:00
MaheshRavishankar
3f44495dfd [mlir][GPU] Expose the functionality to create a GPUFuncOp from a LaunchOp
The current setup of the GPU dialect is to model both the host and
device side codegen. For cases (like IREE) the host side modeling
might not directly fit its use case, but device-side codegen is still
valuable. First step in accessing just the device-side functionality
of the GPU dialect is to allow just creating a gpu.func operation from
a gpu.launch operation. In addition this change also "inlines"
operations into the gpu.func op at time of creation instead of this
being a later step.

Differential Revision: https://reviews.llvm.org/D75287
2020-03-05 11:03:51 -08:00
Stephan Herhut
7a7eacc797 [MLIR][GPU] Implement a simple greedy loop mapper.
Summary:
The mapper assigns annotations to loop.parallel operations that
are compatible with the loop to gpu mapping pass. The outermost
loop uses the grid dimensions, followed by block dimensions. All
remaining loops are mapped to sequential loops.

Differential Revision: https://reviews.llvm.org/D74963
2020-02-25 11:42:42 +01:00