Commit Graph

93 Commits

Author SHA1 Message Date
Justin Fargnoli
35d55f2894 [NFC][mlir] Reorder declarePromisedInterface() operands (#86628)
Reorder the template operands of `declarePromisedInterface()` to match
`declarePromisedInterfaces()`.
2024-03-27 10:30:17 -07:00
Guray Ozen
8819f87998 [MLIR][NVVM] Add barrier.arrive (#85412)
PR adds `nvvm.barrier.arrive` Op. It is useful op for producer consumer
modeling.
2024-03-19 16:51:32 +01:00
Guray Ozen
b5d694ba14 [mlir][nvvm] Introduce nvvm.barrier OP (#81487)
This PR that introduces the `nvvm.barrier` OP to the NVVM dialect.
Currently, NVVM only supports the `nvvm.barrier0`, which synchronizes
all threads using barrier resource 0.

The new `nvvm.barrier` has two essential arguments: the barrier resource
and the number of threads. This added flexibility allows for selective
synchronization of threads within a CTA, aligning with the capabilities
provided by LLVM intrinsics or the PTX model.

I think we can deprecate `nvvm.barrier0` in favor of the more generic
`nvvm.barrier`.

```
// Equivalent to nvvm.barrier0 (or __syncthreads() in CUDA)
nvvm.barrier

// Synchronize all threads using the 3rd barrier resource.
nvvm.barrier id = 3

// Synchronize %numberOfThreads threads using the 3rd barrier resource.
nvvm.barrier id = 3 number_of_threads = %numberOfThreads
```
2024-02-14 08:28:45 +01:00
Rishi Surendran
fa6850a998 [mlir][nvvm]Add support for grid_constant attribute on LLVM function arguments (#78228)
Add support for attribute nvvm.grid_constant on LLVM function arguments.
The attribute can be attached only to arguments of type llvm.ptr that
have llvm.byval attribute.
Generate LLVM metadata for functions with nvvm.grid_constant arguments.
The metadata node is a list of integers, where each integer n denotes
that the nth parameter has the
grid_constant annotation (numbering from 1). The generated metadata node
will be handled by NVVM compiler. See
https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#supported-properties
for documentation on grid_constant property.

This patch also adds convertParameterAttr to
LLVMTranslationDialectInterface for supporting the translation of
derived dialect attributes on function parameters 
2024-02-12 13:16:59 -08:00
Guray Ozen
12c241b365 [MLIR][NVVM] Explicit Data Type for Output in wgmma.mma_async (#78713)
The current implementation of `nvvm.wgmma.mma_async` Op deduces the data
type of the output matrix from the data type of struct member, which can be
non-intuitive, especially in cases where types like `2xf16` are packed
into `i32`.

This PR addresses this issue by improving the Op to include an explicit
data type for the output matrix.

The modified Op now includes an explicit data type for Matrix-D (<f16>),
and looks as follows:

```
%result = llvm.mlir.undef : !llvm.struct<(struct<(i32, i32, ...
nvvm.wgmma.mma_async
    %descA, %descB, %result,
    #nvvm.shape<m = 64, n = 32, k = 16>,
    D [<f16>, #nvvm.wgmma_scale_out<zero>],
    A [<f16>, #nvvm.wgmma_scale_in<neg>, <col>],
    B [<f16>, #nvvm.wgmma_scale_in<neg>, <col>]
```
2024-01-22 08:37:20 +01:00
Guray Ozen
2aec7083ad [mlir][gpu] Use DenseI32Array for NVVM's maxntid and reqntid (NFC) (#77466) 2024-01-09 16:44:25 +01:00
Adam Paszke
85b2327192 [mlir][nvvm] Fix the PTX lowering of wgmma.mma_async (#76150) 2023-12-22 14:46:34 +01:00
Guray Ozen
80ff67be81 [mlir][nvvm] Introduce nvvm.fence.proxy (#74057)
This PR introduce `nvvm.fence.proxy` OP for the following cases:

```
nvvm.fence.proxy { kind = #nvvm.proxy_kind<alias>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async.global>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async.shared>, space = #nvvm.shared_space<cta>}
nvvm.fence.proxy { kind = #nvvm.proxy_kind<async.shared>, space = #nvvm.shared_space<cluster>}
```
2023-12-04 16:49:07 +01:00
Guray Ozen
68433f6b27 [mlir][nvvm] Introduce setmaxregister.sync.aligned Op (#73780)
This PR introduce `setmaxregister.sync.aligned` Op to increase or
decrease the register size.


https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg
2023-11-29 15:26:30 +01:00
Guray Ozen
9ceea08859 [mlir] im2col & l2cache on cp.async.bulk.tensor.shared.cluster.global` (#72967)
PR adds support of `im2col` and `l2cache` to
`cp.async.bulk.tensor.shared.cluster.global`. The Op is now supports all
the traits of the corresponding PTX instruction.

The current structure of this operation looks somewhat like below. The
PR also simplifies types so we don't need to write obvious types after
`:` anymore.
```
nvvm.cp.async.bulk.tensor.shared.cluster.global
		%dest, %tmaDescriptor, %barrier,
		box[%crd0,%crd1,%crd2,%crd3,%crd4]
		im2col[%off0,%off1,%off2] 			<-- PR introduces
		multicast_mask = %ctamask
		l2_cache_hint = %cacheHint			<-- PR introduces
		: !llvm.ptr<3>, !llvm.ptr
```
2023-11-22 16:08:09 +01:00
Guray Ozen
5316d19ed5 [mlir][nvvm] Introduce nvvm.stmatrix Op (#69467)
This PR adds `nvvm.stmatrix` Op to NVVM dialect. The Op collectively
store one or more matrices across all threads in a warp to the given
address location in shared memory.
2023-10-19 10:26:28 +02:00
Guray Ozen
dc27d21890 [mlir][nvvm] Use NVVMMemorySpace instead of hardcoded values (nfc) 2023-10-18 15:17:51 +02:00
Markus Böck
9779a731a6 [mlir] Fix some cmake dependencies in LLVMIR Dialect (#66956)
While looking into reducing needless interdependencies between upstream
MLIR dialects and passes, I discovered that the ROCDL Dialect
redundantely uses links in `VectorToLLVM` conversion pass when it
actually requires just the LLVM Dialect. Furthermore, after a build
failure, I ran `ninja -t missingdeps` which revealed that the NVVM
Dialect depends on headers of the GPU dialect
(211c9752c8/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h (L18))
without stating so in CMake.
This causes flaky builds as it is not guaranteed that the header exists
prior to the dialect being compiled.
2023-09-22 23:32:20 +02:00
JingZe Cui
10477be8a3 Add TMA Store operation to the NVVM dialect
Reviewed By: guraypp

Differential Revision: https://reviews.llvm.org/D159535
2023-09-21 08:23:33 -07:00
Guray Ozen
6360d09531 [MLIR][NVVM] Fix the register number of predicate (#65970)
The register number of predicate is calculated incorrectly. This PR
fixes that.
2023-09-13 15:04:31 +02:00
Fabian Mora
d0e6fd99aa [mlir] Extend the promise interface mechanism
This patch pairs a promised interface with the object (Op/Attr/Type/Dialect) requesting the promise, ie:
```
declarePromisedInterface<MyAttr, MyInterface>();
```
Allowing to make fine grained promises. It also adds a mechanism to query if `Op/Attr/Type` has an specific promise returning true if the promise is there or if an implementation has been added. Finally it adds a couple of `Attr|TypeConstraints` that can be used in ODS to query if the promise or an implementation is there.

This patch tries to solve 2 issues:
1. Different entities cannot use the same promise.
```
declarePromisedInterface<MyInterface>();
// Resolves a promise.
MyAttr1::attachInterface<MyInterface>(ctx);
// Doesn't resolves a promise, as the previous attachment removed the promise.
MyAttr2::attachInterface<MyInterface>(ctx);
```
2. Is not possible to query if a promise has been declared.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D158464
2023-09-05 09:55:27 -04:00
Kazu Hirata
b95028ee2d [mlir] Use llvm::is_contained (NFC) 2023-09-02 12:12:12 -07:00
Guray Ozen
a5925eee5a [MLIR][NVVM] Fix register mapping in wgmma.mma_async
WgmmaMmaAsync Op generates `wgmma.mma_async` PTX instruction that uses the same registers as read and write with mapping. Therefore, the registers count needs to be increased 2 times for the following registers.

This works changes this:
```
llvm.inline_asm has_side_effects asm_dialect = att "{wgmma.mma_async... {$0, $1, $2, $3, $4}, $5, $6, p", "=f,=f,=f,=f,0,1,2,3,l,l"
```

Into this one below. The only different is the number of registers ($8 and $9) that comes after read/write.
```
llvm.inline_asm has_side_effects asm_dialect = att "{wgmma.mma_async... {$0, $1, $2, $3, $4}, $8, $9, p", "=f,=f,=f,=f,0,1,2,3,l,l"
```

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D157843
2023-08-14 14:08:49 +02:00
Guray Ozen
5c3150e584 [MLIR][NVVM] Introduce WGMMA Types
This work introduces `WGMMATypes` attributes for the `WgmmaMmaSyncOp`. This op, having been recently added to MLIR, previously used `MMATypes`. However, there arises a disparity in supported types between `MmaOp` and `WgmmaMmaSyncOp`. To address this discrepancy more effectively, a new set of attributes is introduced.

Furthermore, this patch refines and optimizing the verification mechanisms of `WgmmaMmaSyncOp` Op.

It also adds support for f8 types, including `e4m3` and `e5m2`, within the `WgmmaMmaSyncOp`.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D157695
2023-08-12 12:47:45 +02:00
Guray Ozen
db11fd8bf4 [MLIR][NVVM] Change WgmmaMmaSyncOp to WgmmaMmaAsyncOp (NFC)
WgmmaMmaSyncOp is asynchronous operation. There was a typo named op. This work fixes that.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D157697
2023-08-12 09:55:11 +02:00
Guray Ozen
18e161f9e1 [MLIR][NVVM] Introduction of the wgmma.mma_async Op
This work introduces the `wgmma.mma_async` Op along PTX generation using `BasicPtxBuilderOpInterface`. The Op is designed to execute the matrix multiply-and-accumulate operation across a warpgroup (128 threads). It's important to note that this operation works for devices with the sm_90a capability.

The matrix multiply-and-accumulate operation can take one of the following forms. In both cases, matrix D is referred to as the accumulator:
	D = A * B + D 	: Result is added to the accumulator matrix D.
	D = A * B 		: The input from the accumulator matrix D is not utilized.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D157370
2023-08-09 23:08:00 +02:00
Fabian Mora
211c9752c8 [mlir][NVVM] Adds the NVVM target attribute.
**For an explanation of these patches see D154153.**

Commit message:
This patch adds the NVVM target attribute for serializing GPU modules into
strings containing cubin.

Depends on D154113 and D154100 and D154097

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D154117
2023-08-08 19:21:36 +00:00
Mehdi Amini
4529797a9d Add a generic "convert-to-llvm" pass delegating to an interface
The multiple -convert-XXX-to-llvm passes are really nice testing tools for
individual dialects, but the expectation is that a proper conversion should
assemble the conversion patterns using `populateXXXToLLVMConversionPatterns()
APIs. However most customers just chain the conversion passes by convenience.

This pass makes it composable more transparently to assemble the required
patterns for conversion to LLVM dialect by using an interface.
The Pass will scan the input and collect all the dialect present, and for
those who implement the `ConvertToLLVMPatternInterface` it will use it to
populate the conversion pattern, and possible the conversion target.

Since these conversions can involve intermediate dialects, or target other
dialects than LLVM (for example AVX or NVVM), this pass can't statically
declare the required `getDependentDialects()` before the pass pipeline
begins. This is worked around by using an extension in the dialectRegistry
that will be invoked for every new loaded dialects in the context. This
allows to lookup the interface ahead of time and use it to query the
dependent dialects.

Differential Revision: https://reviews.llvm.org/D157183
2023-08-07 18:46:08 -07:00
Guray Ozen
28555793b1 [mlir][nvvm] Add cp.async.bulk.tensor.shared.cluster.global
This work introduce `cp.async.bulk.tensor.shared.cluster.global` in NVVM dialect that executes load using TMA.

Depends on D155056

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155060
2023-07-17 17:10:39 +02:00
Guray Ozen
2c5739675c [mlir][nvgpu] Implement nvgpu.device_async_copy by NVVMToLLVM Pass
`nvgpu.device_async_copy` is lowered into `cp.async` PTX instruction. However, NVPTX backend does not support its all mode especially when zero padding is needed. Therefore, current MLIR implementation genereates inline assembly for that.

This work simplifies PTX generation for `nvgpu.device_async_copy`, and implements it by `NVVMToLLVM` Pass.

Depends on D154060

Reviewed By: nicolasvasilache, manishucsd

Differential Revision: https://reviews.llvm.org/D154345
2023-07-11 12:18:28 +02:00
Tres Popp
c1fa60b4cd [mlir] Update method cast calls to function calls
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.

Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.

Context:

* https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
* Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Implementation:
This follows a previous patch that updated calls
`op.cast<T>()-> cast<T>(op)`. However some cases could not handle an
unprefixed `cast` call due to occurrences of variables named cast, or
occurring inside of class definitions which would resolve to the method.
All C++ files that did not work automatically with `cast<T>()` are
updated here to `llvm::cast` and similar with the intention that they
can be easily updated after the methods are removed through a
find-replace.

See https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
for the clang-tidy check that is used and then update printed
occurrences of the function to include `llvm::` before.

One can then run the following:
```
ninja -C $BUILD_DIR clang-tidy

run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
                 -export-fixes /tmp/cast/casts.yaml mlir/*\
                 -header-filter=mlir/ -fix

rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```

Differential Revision: https://reviews.llvm.org/D150348
2023-05-12 11:21:30 +02:00
Quinn Dawkins
985f7ff632 [mlir][gpu] Add support for integer types in gpu.subgroup_mma ops
The signedness is carried by `!gpu.mma_matrix` types to most closely
match the Cooperative Matrix specification which determines signedness
with the type (and sometimes the operation).

See: https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_cooperative_matrix.html

To handle the lowering from vector to gpu, ops such as arith.extsi are
pattern matched next to `vector.transfer_read` and `vector.contract` to
determine the signedness of the matrix type.

Enables s8 and u8 WMMA types in NVVM for the GPUToNVVM conversion.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D143223
2023-02-07 17:58:01 -05:00
Kazu Hirata
0a81ace004 [mlir] Use std::optional instead of llvm::Optional (NFC)
This patch replaces (llvm::|)Optional< with std::optional<.  I'll post
a separate patch to remove #include "llvm/ADT/Optional.h".

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-14 01:25:58 -08:00
Kazu Hirata
a1fe1f5f77 [mlir] Add #include <optional> (NFC)
This patch adds #include <optional> to those files containing
llvm::Optional<...> or Optional<...>.

I'll post a separate patch to actually replace llvm::Optional with
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-13 21:05:06 -08:00
Ramkumar Ramachandra
22426110c5 mlir/tblgen: use std::optional in generation
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.

A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.

See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>

Differential Revision: https://reviews.llvm.org/D138934
2022-12-17 11:13:26 +01:00
Kazu Hirata
1a36588ec6 [mlir] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-03 18:50:27 -08:00
Guray Ozen
3ac17449cf [mlir][nvvm] Introduce performance tuning directives
PTX programming models provides some performance tuning directives; see https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives

The downstream compiler namely `ptxas` leverages these information for better register allocation or to handle other resource management that improves the performance.

This revision introduce all the kernel based directives to MLIR's NVVM dialect. The list is below
```
maxnreg			-> 	max register per thread in CTA
maxntid			-> 	max threads per CTA
reqntid			-> 	exact number of threads per CTA
minnctapersm		-> 	min CTA per SM
```

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D136931
2022-10-28 14:02:40 +02:00
Kazu Hirata
33b9304435 Use llvm::is_contained (NFC) 2022-08-27 21:21:00 -07:00
Jeff Niu
58a47508f0 (Reland) [mlir] Switch segment size attributes to DenseI32ArrayAttr
This reland includes changes to the Python bindings.

Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.

Depends on D131801

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D131803
2022-08-12 19:44:52 -04:00
Alex Zinenko
e8e718fa4b Revert "[mlir] Switch segment size attributes to DenseI32ArrayAttr"
This reverts commit 30171e76f0.

Breaks Python tests in MLIR, missing C API and Python changes.
2022-08-12 10:22:47 +02:00
Jeff Niu
30171e76f0 [mlir] Switch segment size attributes to DenseI32ArrayAttr
Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.

Depends on D131738

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D131702
2022-08-11 20:56:45 -04:00
Kazu Hirata
c27d815249 [mlir] Use value instead of getValue (NFC) 2022-07-14 00:19:59 -07:00
Kazu Hirata
491d27013d [mlir] Use has_value instead of hasValue (NFC) 2022-07-13 00:57:02 -07:00
Kazu Hirata
3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3.
2022-06-25 11:56:50 -07:00
Kazu Hirata
aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Kazu Hirata
6d5fc1e3d5 [mlir] Don't use Optional::getValue (NFC) 2022-06-20 23:20:25 -07:00
Mogball
d883a02a7c [mlir][ods] Remove StructAttr
Depends on D127373

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D127375
2022-06-21 01:10:05 +00:00
Kazu Hirata
037f09959a [mlir] Don't use Optional::hasValue (NFC) 2022-06-20 11:22:37 -07:00
Jacques Pienaar
8df54a6a03 [mlir] Update accessors to prefixed form (NFC)
Follow up from flipping dialects to both, flip accessor used to prefixed
variant ahead to flipping from _Both to _Prefixed. This just flips to
the accessors introduced in the preceding change which are just prefixed
forms of the existing accessor changed from.

Mechanical change using helper script
https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp and clang-format.
2022-06-18 17:53:22 -07:00
Mogball
ba79bb4973 [mlir][nvvm] Change MMAShapeAttr to AttrDef
MMAShapeAttr was a StructAttr

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D127348
2022-06-09 22:14:45 +00:00
Chris Lattner
1d7b5cd5bf [ParseResult] Mark this as LLVM_NODISCARD (like LogicalResult) and fix issues.
There are a lot of cases where we accidentally ignored the result of some
parsing hook.  Mark ParseResult as LLVM_NODISCARD just like ParseResult is.
This exposed some stuff to clean up, so do.

Differential Revision: https://reviews.llvm.org/D125549
2022-05-13 16:28:53 +01:00
Thomas Raoux
09fc685ce6 [mlir][nvvm] Add attribute to nvvm.cpAsyncOp to control l1 bypass
Add attribute to be able to generate the intrinsic version of async copy
generating a copy with l1 bypass. This correspond to
cp.async.cg.shared.global in ptx.

Differential Revision: https://reviews.llvm.org/D125241
2022-05-09 19:34:48 +00:00
Christopher Bate
22c6e7b277 [mlir][nvvm] Fix support for tf32 data type in mma.sync
The NVVM dialect test coverage for all possible type/shape combinations
in the `nvvm.mma.sync` op is mostly complete. However, there were tests
missing for TF32 datatype support. This change adds tests for the one
relevant shape/type combination. This uncovered a small bug in the op
verifier, which this change also fixes.

Differential Revision: https://reviews.llvm.org/D124975
2022-05-05 11:02:03 -06:00
Adrian Kuegel
72fe439a4e [mlir] Fix 1 ClangTidyPerformance finding (NFC) 2022-04-05 09:29:35 +02:00
Christopher Bate
3be7c28917 [mlir][NVVM] Add support for nvvm mma.sync ops
This patch adds MLIR NVVM support for the various NVPTX `mma.sync`
operations. There are a number of possible data type, shape,
and other attribute combinations supported by the operation, so a
custom assebmly format is added and attributes are inferred where
possible.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D122410
2022-03-25 17:28:05 +00:00