clang-p2996

Author	SHA1	Message	Date
Matthias Springer	2da417e7f6	[mlir][GPU] gpu.printf: Do not emit duplicate format strings (#110504 ) Even if the same format string is used multiple times, emit just one `LLVM::GlobalOp`.	2024-10-01 09:12:08 +02:00
Benjamin Kramer	ac11945386	[mlir][GPU] block_id has the grid size as its range	2024-09-17 18:38:04 +02:00
Krzysztof Drewniak	a953982cb7	[mlir][GPU] Plumb range information through the NVVM lowerings (#107659 ) Update the GPU to NVVM lowerings to correctly propagate range information on IDs and dimension queries, either from known_{block,grid}_size attributes or from `upperBound` annotations on the operations themselves.	2024-09-13 12:07:51 -05:00
Matthias Springer	7030280329	[mlir][GPU] Improve `gpu.module` op implementation (#102866 ) - Replace hand-written parser/printer with auto-generated assembly format. - Remove implicit `gpu.module_end` terminator and use the `NoTerminator` trait instead. (Same as `builtin.module`.) - Turn the region into a graph region. (Same as `builtin.module`.)	2024-08-13 09:37:36 +02:00
runseny	f6431f0c52	[MLIR][GPUToNVVM] support fastMath and other non-supported mathOp (#99890 ) Support fastMath and other previously unsupported math ops that only require float operands by lowering them directly to libdevice function calls. 1. Lower math ops with the fastMath attribute to the correct libdevice intrinsic. 2. Some math ops in the math dialect are already lowered to libdevice, but not all of them, so this PR lowers all the remaining math ops that only require float operands.	2024-07-25 13:54:58 +02:00
Matthias Springer	3f33d2f3ca	[mlir][GPUToNVVM] Fix memref function args/results (#96392 ) The `gpu.func` op lowering accounts for memref arguments/results (both "normal" and bare-pointer supported), but the `gpu.return` op lowering did not. The lowering produced invalid IR that did not verify. This commit uses the same lowering strategy as for `func.return` in the `gpu.return` lowering. (The C++ implementation is copied. We may want to share some code between `func` and `gpu` lowerings in the future.)	2024-06-23 09:51:12 +02:00
Krzysztof Drewniak	43fd4c49bd	[mlir][GPU] Improve handling of GPU bounds (#95166 ) This change reworks how range information for GPU dispatch IDs (block IDs, thread IDs, and so on) is handled. 1. `known_block_size` and `known_grid_size` become inherent attributes of GPU functions. This makes them less clunky to work with. As a consequence, the `gpu.func` lowering patterns now only look at the inherent attributes when setting target-specific attributes on the `llvm.func` that they lower to. 2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size` are made official dialect-level discardable attributes which can be placed on arbitrary functions. This allows for progressive lowerings (without this, a lowering for `gpu.thread_id` couldn't know about the bounds if it had already been moved from a `gpu.func` to an `llvm.func`) and allows for range information to be provided even when `gpu.*_{id,dim}` are being used outside of a `gpu.func` context. 3. All of these index operations have gained an optional `upper_bound` attribute, allowing for an alternate mode of operation where the bounds are specified locally and not inherited from the operation's context. These also allow handling of cases where the precise launch sizes aren't known, but can be bounded more precisely than the maximum of what any platform's API allows. (I'd like to thank @benvanik for pointing out that this could be useful.) When inferring bounds (either for range inference or for setting `range` during lowering) these sources of information are consulted in order of specificity (`upper_bound` > inherent attribute > discardable attribute, except that dimension sizes check for `known_*_bounds` to see if they can be constant-folded before checking their `upper_bound`). This patch also updates the documentation about the bounds and inference behavior to clarify what these attributes do when set and the consequences of setting them up incorrectly. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-06-17 23:47:38 -05:00
klensy	a5985ca51d	[mlir][test] Fix filecheck annotation typos [2/n] (#93476 ) A few more fixes, following the previous PR (https://github.com/llvm/llvm-project/pull/92897). Issues from https://github.com/llvm/llvm-project/issues/93154 remain unfixed. --------- Co-authored-by: klensy <nightouser@gmail.com>	2024-06-14 17:16:02 +02:00
Johannes Reifferscheid	a6a9215b93	Lower shuffle to single-result form if possible. (#84321 ) We currently always lower shuffle to the struct-returning variant. I saw some cases where this survived all the way through ptx, resulting in increased register usage. The easiest fix is to simply lower to the single-result version when the predicate is unused.	2024-03-21 10:33:49 +01:00
David Majnemer	0039a2ff4f	[mlir][gpu] Add support for lowering math.erf to __nv_erf (#79848 )	2024-01-29 19:35:23 +00:00
Guray Ozen	2aec7083ad	[mlir][gpu] Use DenseI32Array for NVVM's maxntid and reqntid (NFC) (#77466 )	2024-01-09 16:44:25 +01:00
Guray Ozen	763109e346	[mlir][gpu] Use `known_block_size` to set `maxntid` for NVVM target (#77301 ) Setting thread block size with `maxntid` on the kernel has great performance benefits. This way, the downstream PTX compiler can do better register allocation. MLIR's `gpu.launch` and `gpu.launch_func` already have an attribute (`known_block_size`) that keeps the thread block size when it is known. This PR simply uses this attribute to set `maxntid`.	2024-01-08 14:49:19 +01:00
Jakub Kuderski	7eccd52842	Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `dd09221a29` and relands https://github.com/llvm/llvm-project/pull/73423. * Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.	2023-11-27 11:38:18 -05:00
Jakub Kuderski	dd09221a29	Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `e0aac8c88d`. I'm seeing some nvidia integration test failures: https://lab.llvm.org/buildbot/#/builders/61/builds/52334.	2023-11-27 11:29:23 -05:00
Jakub Kuderski	e0aac8c88d	[mlir][gpu] Align reduction operations with vector combining kinds (#73423 ) The motivation for this change is explained in https://github.com/llvm/llvm-project/issues/72354. Before this change, we could not tell between signed/unsigned minimum/maximum and NaN treatment for floating point values. The mapping of old reduction operations to the new ones is as follows: * `min` --> `minsi` for ints, `minf` for floats * `max` --> `maxsi` for ints, `maxf` for floats New reduction kinds not represented in the old enum: `minui`, `maxui`, `minimumf`, `maximumf`. As a next step, I would like to have a common definition of combining kinds used by the `vector` and `gpu` dialects. Separately, the GPU to SPIR-V lowering does not yet properly handle zero and NaN values -- the behavior of floating point min/max group reductions is not specified by the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-11-27 11:19:20 -05:00
Christian Ulmann	02307a1444	[MLIR][GPUToNVVM] Remove typed pointer support (#70861 ) This commit removes the support for lowering GPU to NVVM dialect with typed pointers. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502	2023-11-01 08:40:32 +01:00
Oleksandr "Alex" Zinenko	e4384149b5	[mlir] use transform-interpreter in test passes (#70040 ) Update most test passes to use the transform-interpreter pass instead of the test-transform-dialect-interpreter-pass. The new "main" interpreter pass has a named entry point instead of looking up the top-level op with `PossibleTopLevelOpTrait`, which is arguably a more understandable interface. The change is mechanical, rewriting an unnamed sequence into a named one and wrapping the transform IR in to a module when necessary. Add an option to the transform-interpreter pass to target a tagged payload op instead of the root anchor op, which is also useful for repro generation. Only the test in the transform dialect proper and the examples have not been updated yet. These will be updated separately after a more careful consideration of testing coverage of the transform interpreter logic.	2023-10-24 16:12:34 +02:00
Christian Ulmann	484668c759	Reland "[MLIR][LLVM] Change addressof builders to use opaque pointers" (#69292 ) This relands `fbde19a664`, which was broken due to incorrect GEP element type creation. This commit changes the builders of the `llvm.mlir.addressof` operations to no longer produce typed pointers. As a consequence, a GPU to NVVM pattern had to be updated, that still relied on typed pointers.	2023-10-17 11:33:45 +02:00
Christian Ulmann	9397e5f581	Revert "[MLIR][LLVM] Change addressof builders to use opaque pointers (#69215 )" This reverts commit `fbde19a664` due to breaking integration tests.	2023-10-17 06:31:48 +00:00
Christian Ulmann	fbde19a664	[MLIR][LLVM] Change addressof builders to use opaque pointers (#69215 ) This commit changes the builders of the `llvm.mlir.addressof` operations to no longer produce typed pointers. As a consequence, a GPU to NVVM pattern and the toy example LLVM lowerings had to be updated, as they still relied on typed pointers.	2023-10-17 07:55:00 +02:00
Adrian Kuegel	640b95da80	[mlir][GPU] Lower arith.remf to GPU intrinsic. Differential Revision: https://reviews.llvm.org/D159422	2023-09-04 13:38:45 +02:00
Nicolas Vasilache	888717e853	[mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD This revision untangles a few more conversion pieces and allows rewriting the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass in a declarative fashion that provides a much better understanding and control. Differential Revision: https://reviews.llvm.org/D157617	2023-08-10 15:30:48 +00:00
Christopher Bate	14858cf05d	[mlir][Conversion/GPUCommon] Fix bug in conversion of `math` ops The common GPU operation transformation that lowers `math` operations to function calls in the `gpu-to-nvvm` and `gpu-to-rocdl` passes handles `vector` types by applying the function to each scalar and returning a new vector. However, there was a typo that results in incorrectly accumulating the result vector, and the rewrite returns an `llvm.mlir.undef` result instead of the correct vector. A patch is added and tests are strengthened. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D154269	2023-07-03 13:26:51 -06:00
Uday Bondhugula	27ccf0f407	[MLIR] Provide bare pointer memref lowering option on gpu-to-nvvm pass Provide the bare pointer memref lowering option on gpu-to-nvvm pass. This is needed whenever we lower memrefs on the host function side and the kernel calls on the host-side (gpu-to-llvm) with the bare ptr convention. The GPU module side of the lowering should also "align" and use the bare pointer convention. Reviewed By: krzysz00 Differential Revision: https://reviews.llvm.org/D152480	2023-06-18 22:53:50 +05:30
Fabian Mora	041f1abee1	[mlir][memref] Fix num elements in lowering of memref.alloca op to LLVM Fixes a mistake in the lowering of memref.alloca to llvm.alloca, as llvm.alloca uses the number of elements to allocate in the stack and not the size in bytes. Reference: LLVM IR: https://llvm.org/docs/LangRef.html#alloca-instruction LLVM MLIR: https://mlir.llvm.org/docs/Dialects/LLVM/#llvmalloca-mlirllvmallocaop Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D150705	2023-05-22 16:23:00 +00:00
Markus Böck	0e5aeae6f5	[mlir][GPUToLLVM] Add support for emitting opaque pointers Part of https://discourse.llvm.org/t/rfc-switching-the-llvm-dialect-and-dialect-lowerings-to-opaque-pointers/68179 This patch adds the new pass option `use-opaque-pointers` to the GPU to LLVM lowerings (including ROCD and NVVM) and adapts the code to support using opaque pointers in addition to typed pointers. The required changes mostly boil down to avoiding `getElementType` and specifying base types in GEP and Alloca. In the future opaque pointers will be the only supported model, hence tests have been ported to using opaque pointers by default. Additional regression tests for typed-pointers have been added to avoid breaking existing clients. Note: This does not yet port the `GpuToVulkan` passes. Differential Revision: https://reviews.llvm.org/D144448	2023-02-21 20:46:33 +01:00
Quinn Dawkins	985f7ff632	[mlir][gpu] Add support for integer types in gpu.subgroup_mma ops The signedness is carried by `!gpu.mma_matrix` types to most closely match the Cooperative Matrix specification which determines signedness with the type (and sometimes the operation). See: https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_cooperative_matrix.html To handle the lowering from vector to gpu, ops such as arith.extsi are pattern matched next to `vector.transfer_read` and `vector.contract` to determine the signedness of the matrix type. Enables s8 and u8 WMMA types in NVVM for the GPUToNVVM conversion. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D143223	2023-02-07 17:58:01 -05:00
Guray Ozen	a3388f3e2a	[mlir] Introduce a pattern to lower `gpu.subgroup_reduce` to `nvvm.redux_op` This revision introduces a pattern to lower the `gpu.subgroup_reduce` op to the `nvvm.redux_sync` op. The op must be run by the entire subgroup, otherwise it is undefined behaviour. It also adds a flag and populate function, because the op is not available on every GPU (sm80+ only), so it can be used when it is desired. Depends on D142088 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D142103	2023-01-20 13:56:23 +01:00
Goran Flegar	b7fe0d346b	Lower math.tan to __nv_tan[f] / __ocml_tan_f{32|64} At present math.tan fails to lower for NVVM and ROCDL. Differential Revision: https://reviews.llvm.org/D141505	2023-01-12 15:26:11 +01:00
Johannes Reifferscheid	059cf735a9	Lower math.cbrt to NVVM/ROCDL. Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D141270	2023-01-09 13:17:35 +01:00
Thomas Raoux	7efdc117b1	[mlir][nvvm] Add lowering of gpu.printf to nvvm When converting to nvvm lowering gpu.printf to vprintf allows us to support printing when running on cuda. Differential Revision: https://reviews.llvm.org/D141049	2023-01-06 17:29:30 +00:00
Ivan Butygin	247d8d4f7a	[mlir][gpu] Add `uniform` flag to gpu reduction ops Differential Revision: https://reviews.llvm.org/D138758	2022-12-14 13:15:58 +01:00
Navdeep Katel	3d35546cd1	Support `transpose` mode for `gpu.subgroup` WMMA ops Add support for loading, computing, and storing `gpu.subgroup` WMMA ops in transpose mode as well. Update the GPU to NVVM lowerings to support `transpose` mode and update integration tests as well. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D139021	2022-12-05 22:37:02 +05:30
Christian Sigg	b251b608b5	[mlir][gpu] Unroll ops on vectors which map to intrinsic calls Unroll ops that map to intrinsics when lowering to LLVM, because intrinsics don't support vector operands/results. Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D136345	2022-10-28 10:33:38 +02:00
Jeff Niu	5c5af910fe	[mlir][LLVMIR] "Modernize" Insert/ExtractValueOp This patch "modernizes" the LLVM `insertvalue` and `extractvalue` operations to use DenseI64ArrayAttr, since they only require an array of indices and previously there was confusion about whether to use i32 or i64 arrays, and to use assembly format. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D131537	2022-08-10 12:51:11 -04:00
Jeff Niu	00f7096d31	[mlir][math] Rename math.abs -> math.absf To make room for introducing `math.absi`. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D131325	2022-08-08 11:04:58 -04:00
Alex Zinenko	610139d2d9	[mlir] replace 'emit_c_wrappers' func->llvm conversion option with a pass The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface wrappers to be emitted for every builtin function in the module. While this has been useful to bootstrap the interface, it is problematic in the longer term as it may unintentionally affect the functions that should retain their existing interface, e.g., libm functions obtained by lowering math operations (see D126964 for an example). Since D77314, we have a finer-grain control over interface generation via an attribute that avoids the problem entirely. Remove the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass that can be run in any pipeline that needs blanket emission of functions to annotate all builtin functions with the attribute before performing the usual lowering that accounts for the attribute. Reviewed By: chelini Differential Revision: https://reviews.llvm.org/D127952	2022-06-17 11:10:31 +02:00
Thomas Raoux	a6f2c2291e	[mlir][GPUToNVVM] Fix bug in mma elementwise lowering The maxf implementation of the wmma elementwise op was incorrect, as the operands of the select that checks for NaN were swapped. Differential Revision: https://reviews.llvm.org/D127879	2022-06-15 17:23:17 +00:00
Thomas Raoux	15bcc36eed	[mlir][gpu] Move async copy ops to NVGPU and add caching hints Move async copy operations to NVGPU as they only exist on NV target and are designed to match ptx semantic. This allows us to also add more fine grain caching hint attribute to the op. Add hint to bypass L1 and hook it up to NVVM op. Differential Revision: https://reviews.llvm.org/D125244	2022-05-10 22:30:24 +00:00
Thomas Raoux	894a591cf6	[mlir][nvgpu] Move mma.sync and ldmatrix in nvgpu dialect Move gpu operation mma.sync and ldmatrix in nvgpu as they are specific to nvidia target. Differential Revision: https://reviews.llvm.org/D123824	2022-04-14 23:44:52 +00:00
Christopher Bate	77d2c815f5	[MLIR][GPU] Add GPU ops nvvm.mma.sync, nvvm.mma.ldmatrix, lane_id This change adds three new operations to the GPU dialect: gpu.mma.sync, gpu.mma.ldmatrix, and gpu.lane_id. The former two are meant to target the lower level nvvm.mma.sync and nvvm.ldmatrix instructions, respectively. Lowerings are added for the new GPU operations for conversion to NVVM. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D123647	2022-04-13 22:50:07 +00:00
River Riddle	3655069234	[mlir] Move the Builtin FuncOp to the Func dialect This commit moves FuncOp out of the builtin dialect, and into the Func dialect. This move has been planned in some capacity from the moment we made FuncOp an operation (years ago). This commit handles the functional aspects of the move, but various aspects are left untouched to ease migration: func::FuncOp is re-exported into mlir to reduce the actual API churn, the assembly format still accepts the unqualified `func`. These temporary measures will remain for a little while to simplify migration before being removed. Differential Revision: https://reviews.llvm.org/D121266	2022-03-16 17:07:03 -07:00
River Riddle	23aa5a7446	[mlir] Rename the Standard dialect to the Func dialect The last remaining operations in the standard dialect all revolve around FuncOp/function related constructs. This patch simply handles the initial renaming (which by itself is already huge), but there are a large number of cleanups unlocked/necessary afterwards: * Removing a bunch of unnecessary dependencies on Func * Cleaning up the From/ToStandard conversion passes * Preparing for the move of FuncOp to the Func dialect See the discussion at https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061 Differential Revision: https://reviews.llvm.org/D120624	2022-03-01 12:10:04 -08:00
Thomas Raoux	5ab04bc068	[mlir][gpu] Add device side async copy operations Add new operations to the gpu dialect to represent device side asynchronous copies. This also adds the lowering of those operations to the nvvm dialect. Those ops are meant to be low level and map directly to llvm dialects like nvvm or rocdl. We can further add higher levels of abstraction by building on top of those operations. This has been discussed here: https://discourse.llvm.org/t/modeling-gpu-async-copy-ampere-feature/4924 Differential Revision: https://reviews.llvm.org/D119191	2022-02-10 17:25:59 -08:00
River Riddle	ace01605e0	[mlir] Split out a new ControlFlow dialect from Standard This dialect is intended to model lower level/branch based control-flow constructs. The initial set of operations are: AssertOp, BranchOp, CondBranchOp, SwitchOp; all split out from the current standard dialect. See https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061 Differential Revision: https://reviews.llvm.org/D118966	2022-02-06 14:51:16 -08:00
harsh	e01e4c9115	Fix bugs in GPUToNVVM lowering The current lowering from GPU to NVVM does not correctly handle the following cases when lowering the gpu shuffle op. 1. When the active width is set to 32 (all lanes), then the current approach computes (1 << 32) -1 which results in poison values in the LLVM IR. We fix this by defining the active mask as (-1) >> (32 - width). 2. In the case of shuffle up, the computation of the third operand c has to be different from the other 3 modes due to the op definition in the ISA reference. (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html) Specifically, the predicate value is computed as j >= maxLane for up and j <= maxLane for all other modes. We fix this by computing maskAndClamp as 32 - width for this mode. TEST: We modify the existing test and add more checks for the up mode. Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D118086	2022-01-25 03:24:14 +00:00
Mogball	aae5125550	[mlir] Replace StrEnumAttr -> EnumAttr in core dialects Removes uses of `StrEnumAttr` in core dialects Reviewed By: mehdi_amini, rriddle Differential Revision: https://reviews.llvm.org/D117514	2022-01-18 17:15:00 +00:00
Thomas Raoux	47555d73f6	[mlir][gpu] Extend shuffle op modes and add nvvm lowering Add up, down and idx modes to gpu shuffle ops, also change the mode from string to enum Differential Revision: https://reviews.llvm.org/D114188	2021-11-19 11:14:31 -08:00
thomasraoux	f309939d06	[mlir][nvvm] Remove special case ptr arithmetic lowering in gpu to nvvm Use existing helper instead of handling only a subset of indices lowering arithmetic. Also relax the restriction on the memref rank for the GPU mma ops as we can now support any rank. Differential Revision: https://reviews.llvm.org/D113383	2021-11-10 10:00:12 -08:00
thomasraoux	8a992b20db	[mlir][gpu] Add basic support to do elementwise ops on mma matrix type In order to support fusion with the mma matrix type we need to be able to execute elementwise operations on it. This adds an op to support some basic elementwise operations. This is not a full solution, as it only supports a limited scope of operations. Ideally we would want to be able to fuse with more kinds of operations. Differential Revision: https://reviews.llvm.org/D112857	2021-11-01 11:51:19 -07:00
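
The active-mask fix described in the entry for e01e4c9115 ("Fix bugs in GPUToNVVM lowering") can be illustrated with a small model of 32-bit arithmetic. This is only a sketch, not the code of the lowering itself: the function names are hypothetical, Python integers stand in for i32 values, and the right shift is assumed to be a logical (zero-filling) shift.

```python
MASK32 = 0xFFFFFFFF  # all ones: the 32-bit representation of -1


def active_mask(width: int) -> int:
    """Return a mask with the low `width` bits set, as (-1) >> (32 - width).

    Hypothetical helper; assumes a logical right shift on a 32-bit value.
    """
    if not 1 <= width <= 32:
        raise ValueError("width must be in [1, 32]")
    # The shift amount is in [0, 31], so it is always in range for 32 bits.
    return MASK32 >> (32 - width)


def naive_shift_in_range(width: int) -> bool:
    """Report whether (1 << width) - 1 uses a valid 32-bit shift amount.

    Shifting a 32-bit value by 32 or more is out of range, which is the
    case the commit message says produced poison values for width == 32.
    """
    return width < 32
```

For `width == 32` the naive form needs an out-of-range shift by 32, while the fixed form shifts by 0 and yields all 32 bits set. For smaller widths both forms agree, for example `active_mask(16)` equals `(1 << 16) - 1`.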

107 Commits