This PR introduces the `nvvm.barrier` Op to the NVVM dialect. Currently, NVVM only supports `nvvm.barrier0`, which synchronizes
all threads using barrier resource 0.
The new `nvvm.barrier` has two essential arguments: the barrier resource
and the number of threads. This added flexibility allows for selective
synchronization of threads within a CTA, aligning with the capabilities
provided by LLVM intrinsics or the PTX model.
I think we can deprecate `nvvm.barrier0` in favor of the more generic
`nvvm.barrier`.
```
// Equivalent to nvvm.barrier0 (or __syncthreads() in CUDA)
nvvm.barrier
// Synchronize all threads using the 3rd barrier resource.
nvvm.barrier id = 3
// Synchronize %numberOfThreads threads using the 3rd barrier resource.
nvvm.barrier id = 3 number_of_threads = %numberOfThreads
```
Add support for the `nvvm.grid_constant` attribute on LLVM function arguments. The attribute can be attached only to arguments of type `llvm.ptr` that have the `llvm.byval` attribute.
Generate LLVM metadata for functions with `nvvm.grid_constant` arguments. The metadata node is a list of integers, where each integer n denotes that the nth parameter has the `grid_constant` annotation (numbering from 1). The generated metadata node is handled by the NVVM compiler. See
https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#supported-properties
for documentation on the `grid_constant` property.
This patch also adds `convertParameterAttr` to `LLVMTranslationDialectInterface` to support the translation of derived dialect attributes on function parameters.
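For illustration, a hedged sketch of what such a function might look like (the byval struct type and the `nvvm.kernel` marker here are illustrative assumptions):
```
// Hypothetical kernel: %arg0 is a byval pointer marked as a grid constant.
llvm.func @kernel(%arg0: !llvm.ptr {llvm.byval = !llvm.struct<(i32)>, nvvm.grid_constant},
                  %arg1: f32) attributes {nvvm.kernel} {
  llvm.return
}
```
On translation to LLVM IR, this should yield an `nvvm.annotations` entry roughly of the form `!{ptr @kernel, !"grid_constant", !{i32 1}}`, marking parameter 1.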
The current implementation of the `nvvm.wgmma.mma_async` Op deduces the data type of the output matrix from the data type of the result struct's members, which can be unintuitive, especially in cases where types like `2xf16` are packed into `i32`.
This PR addresses the issue by extending the Op with an explicit data type for the output matrix.
The modified Op now includes an explicit data type for Matrix-D (`<f16>`) and looks as follows:
```
%result = llvm.mlir.undef : !llvm.struct<(struct<(i32, i32, ...
nvvm.wgmma.mma_async
%descA, %descB, %result,
#nvvm.shape<m = 64, n = 32, k = 16>,
D [<f16>, #nvvm.wgmma_scale_out<zero>],
A [<f16>, #nvvm.wgmma_scale_in<neg>, <col>],
B [<f16>, #nvvm.wgmma_scale_in<neg>, <col>]
```
This PR adds support for `im2col` and `l2cache` to `cp.async.bulk.tensor.shared.cluster.global`. The Op now supports all the traits of the corresponding PTX instruction.
The updated structure of this operation is shown below. The PR also simplifies the types so we no longer need to write obvious types after `:`.
```
nvvm.cp.async.bulk.tensor.shared.cluster.global
%dest, %tmaDescriptor, %barrier,
box[%crd0,%crd1,%crd2,%crd3,%crd4]
im2col[%off0,%off1,%off2] <-- PR introduces
multicast_mask = %ctamask
l2_cache_hint = %cacheHint <-- PR introduces
: !llvm.ptr<3>, !llvm.ptr
```
This PR adds the `nvvm.stmatrix` Op to the NVVM dialect. The Op collectively stores one or more matrices across all threads in a warp to the given address location in shared memory.
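A minimal sketch of a single-matrix store, assuming the layout attribute and operand types follow the dialect's existing conventions:
```
// Store one 8x8 matrix fragment: each thread in the warp contributes one
// i32 register; %addr points into shared memory (address space 3).
nvvm.stmatrix %addr, %r {layout = #nvvm.mma_layout<row>} : !llvm.ptr<3>, i32
```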
While looking into reducing needless interdependencies between upstream MLIR dialects and passes, I discovered that the ROCDL Dialect redundantly links against the `VectorToLLVM` conversion pass when it actually requires just the LLVM Dialect. Furthermore, after a build failure, I ran `ninja -t missingdeps`, which revealed that the NVVM Dialect depends on headers of the GPU dialect
(211c9752c8/mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h (L18))
without stating so in CMake.
This causes flaky builds, as it is not guaranteed that the header exists before the dialect is compiled.
This patch pairs a promised interface with the object (Op/Attr/Type/Dialect) requesting the promise, for example:
```
declarePromisedInterface<MyAttr, MyInterface>();
```
This allows making fine-grained promises. It also adds a mechanism to query whether an `Op/Attr/Type` has a specific promise, returning true if the promise is present or if an implementation has been added. Finally, it adds a couple of `Attr|TypeConstraints` that can be used in ODS to query whether the promise or an implementation is present.
This patch tries to solve 2 issues:
1. Different entities cannot use the same promise.
```
declarePromisedInterface<MyInterface>();
// Resolves a promise.
MyAttr1::attachInterface<MyInterface>(ctx);
// Doesn't resolve a promise, as the previous attachment removed it.
MyAttr2::attachInterface<MyInterface>(ctx);
```
2. It is not possible to query whether a promise has been declared.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D158464
The WgmmaMmaAsync Op generates the `wgmma.mma_async` PTX instruction, which uses the same registers for both reads and writes via a tied-register mapping. Therefore, each such register is counted twice when numbering the registers that follow.
This work changes this:
```
llvm.inline_asm has_side_effects asm_dialect = att "{wgmma.mma_async... {$0, $1, $2, $3, $4}, $5, $6, p", "=f,=f,=f,=f,0,1,2,3,l,l"
```
Into the one below. The only difference is the numbers of the registers ($8 and $9) that come after the tied read/write registers.
```
llvm.inline_asm has_side_effects asm_dialect = att "{wgmma.mma_async... {$0, $1, $2, $3, $4}, $8, $9, p", "=f,=f,=f,=f,0,1,2,3,l,l"
```
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D157843
This work introduces `WGMMATypes` attributes for the `WgmmaMmaSyncOp`. This op, recently added to MLIR, previously used `MMATypes`. However, the sets of types supported by `MmaOp` and `WgmmaMmaSyncOp` differ, so a new set of attributes is introduced to address the discrepancy.
Furthermore, this patch refines and optimizes the verification mechanisms of `WgmmaMmaSyncOp`.
It also adds support for f8 types, including `e4m3` and `e5m2`, within the `WgmmaMmaSyncOp`.
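For instance, the A/B fragments of the op's assembly might then be spelled as follows (a hedged sketch reusing the attribute syntax shown earlier in this log; the exact spelling of the f8 type attributes is an assumption):
```
// Hypothetical A/B fragments using the new f8 WGMMATypes:
A [<e4m3>, #nvvm.wgmma_scale_in<neg>, <row>],
B [<e5m2>, #nvvm.wgmma_scale_in<neg>, <col>]
```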
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D157695
WgmmaMmaSyncOp is an asynchronous operation, but its name did not reflect that. This work fixes the misnamed op.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D157697
This work introduces the `wgmma.mma_async` Op along with PTX generation using the `BasicPtxBuilderOpInterface`. The Op is designed to execute the matrix multiply-and-accumulate operation across a warpgroup (128 threads). It is important to note that this operation works only on devices with the sm_90a capability.
The matrix multiply-and-accumulate operation can take one of the following forms. In both cases, matrix D is referred to as the accumulator:
D = A * B + D : the result is added to the accumulator matrix D.
D = A * B : the input from the accumulator matrix D is not used.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D157370
**For an explanation of these patches see D154153.**
Commit message:
This patch adds the NVVM target attribute for serializing GPU modules into
strings containing cubin.
Depends on D154113 and D154100 and D154097
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D154117
The multiple -convert-XXX-to-llvm passes are really nice testing tools for individual dialects, but the expectation is that a proper conversion should assemble the conversion patterns using the `populateXXXToLLVMConversionPatterns()` APIs. However, most users just chain the conversion passes out of convenience.
This pass makes it more transparently composable to assemble the required patterns for conversion to the LLVM dialect by using an interface.
The pass scans the input and collects all the dialects present; for those that implement the `ConvertToLLVMPatternInterface`, it uses the interface to populate the conversion patterns, and possibly the conversion target.
Since these conversions can involve intermediate dialects, or target dialects other than LLVM (for example AVX or NVVM), this pass can't statically declare the required `getDependentDialects()` before the pass pipeline begins. This is worked around by using an extension in the dialectRegistry that is invoked for every newly loaded dialect in the context. This makes it possible to look up the interface ahead of time and use it to query the dependent dialects.
Differential Revision: https://reviews.llvm.org/D157183
This work introduces `cp.async.bulk.tensor.shared.cluster.global` in the NVVM dialect, which executes a load using TMA.
Depends on D155056
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155060
`nvgpu.device_async_copy` is lowered into the `cp.async` PTX instruction. However, the NVPTX backend does not support all of its modes, especially when zero padding is needed. Therefore, the current MLIR implementation generates inline assembly for it.
This work simplifies PTX generation for `nvgpu.device_async_copy` and implements it via the `NVVMToLLVM` pass.
Depends on D154060
Reviewed By: nicolasvasilache, manishucsd
Differential Revision: https://reviews.llvm.org/D154345
The MLIR classes Type/Attribute/Operation/Op/Value support cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast functionality in addition to defining methods with the same name.
This change begins the migration from uses of the methods to the corresponding free-function calls, which has been decided to be more consistent.
Note that there still exist classes that only define the methods directly, such as AffineExpr, and this does not currently include work to support the functional cast/isa calls for them.
Context:
* https://mlir.llvm.org/deprecation/ at "Use the free function variants for dyn_cast/cast/isa/…"
* Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This follows a previous patch that updated calls from `op.cast<T>()` to `cast<T>(op)`. However, some cases could not handle an unprefixed `cast` call, due to occurrences of variables named `cast` or uses inside class definitions that would resolve to the method.
All C++ files that did not work automatically with `cast<T>()` are updated here to `llvm::cast` and similar, with the intention that they can be easily updated via find-and-replace after the methods are removed.
See https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
for the clang-tidy check that was used; it updates the printed occurrences of the function to include the `llvm::` prefix.
One can then run the following:
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-export-fixes /tmp/cast/casts.yaml mlir/*\
-header-filter=mlir/ -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```
Differential Revision: https://reviews.llvm.org/D150348
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
The PTX programming model provides some performance-tuning directives; see https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#performance-tuning-directives
The downstream compiler, namely `ptxas`, leverages this information for better register allocation and for other resource-management decisions that improve performance.
This revision introduces all the kernel-based directives to MLIR's NVVM dialect. The list is below:
```
maxnreg -> max register per thread in CTA
maxntid -> max threads per CTA
reqntid -> exact number of threads per CTA
minnctapersm -> min CTA per SM
```
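As a hedged sketch, these could surface as function-level attributes on an `llvm.func` (the attribute names mirror the directives; the exact value syntax here is an assumption):
```
// Hypothetical kernel tuned to at most 256 threads per CTA
// and at least 2 CTAs per SM.
llvm.func @tuned_kernel() attributes {nvvm.kernel,
    nvvm.maxntid = array<i32: 256, 1, 1>, nvvm.minctasm = 2 : i32} {
  llvm.return
}
```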
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D136931
This reland includes changes to the Python bindings.
Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.
Depends on D131801
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D131803
Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.
Depends on D131738
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D131702
Follow-up from flipping dialects to `_Both`: flip the accessors used to the prefixed variant ahead of flipping from `_Both` to `_Prefixed`. This just flips to the accessors introduced in the preceding change, which are simply prefixed forms of the existing accessors.
Mechanical change using the helper script
https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp and clang-format.
There are a lot of cases where we accidentally ignored the result of some parsing hook. Mark ParseResult as LLVM_NODISCARD, just like LogicalResult is.
This exposed some things to clean up, so this patch does so.
Differential Revision: https://reviews.llvm.org/D125549
Add an attribute to enable generating the intrinsic version of async copy, which produces a copy with L1 bypass. This corresponds to `cp.async.cg.shared.global` in PTX.
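A hedged sketch of the two variants in the dialect's present-day syntax (the `cache` modifier spelling is an assumption):
```
// Stage the copy through L1 (cp.async.ca.shared.global).
nvvm.cp.async.shared.global %dst, %src, 16, cache = ca : !llvm.ptr<3>, !llvm.ptr<1>
// Bypass L1 (cp.async.cg.shared.global); PTX allows this only for 16-byte copies.
nvvm.cp.async.shared.global %dst, %src, 16, cache = cg : !llvm.ptr<3>, !llvm.ptr<1>
```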
Differential Revision: https://reviews.llvm.org/D125241
The NVVM dialect test coverage for all possible type/shape combinations
in the `nvvm.mma.sync` op is mostly complete. However, there were tests
missing for TF32 datatype support. This change adds tests for the one
relevant shape/type combination. This uncovered a small bug in the op
verifier, which this change also fixes.
Differential Revision: https://reviews.llvm.org/D124975
This patch adds MLIR NVVM support for the various NVPTX `mma.sync` operations. There are a number of possible data type, shape, and other attribute combinations supported by the operation, so a custom assembly format is added and attributes are inferred where possible.
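For illustration, a sketch of one f16 combination in the custom format, adapted from the dialect documentation (the particular shape and types shown are just one of the many supported combinations):
```
// m16n8k16 f16 MMA: A uses 4 registers, B uses 2, C/D use 2.
%d = nvvm.mma.sync A[%a0, %a1, %a2, %a3] B[%b0, %b1] C[%c0, %c1]
       {layoutA = #nvvm.mma_layout<row>, layoutB = #nvvm.mma_layout<col>,
        shape = #nvvm.shape<m = 16, n = 8, k = 16>}
     : (vector<2xf16>, vector<2xf16>, vector<2xf16>)
       -> !llvm.struct<(vector<2xf16>, vector<2xf16>)>
```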
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D122410