Commit Graph

107 Commits

Author SHA1 Message Date
Matthias Springer
2da417e7f6 [mlir][GPU] gpu.printf: Do not emit duplicate format strings (#110504)
Even if the same format string is used multiple times, emit just one
`LLVM:GlobalOp`.
2024-10-01 09:12:08 +02:00
Benjamin Kramer
ac11945386 [mlir][GPU] block_id has the grid size as its range 2024-09-17 18:38:04 +02:00
Krzysztof Drewniak
a953982cb7 [mlir][GPU] Plumb range information through the NVVM lowerings (#107659)
Update the GPU to NVVM lowerings to correctly propagate range
information on IDs and dimension queries, etiher from
known_{block,grid}_size attributes or from `upperBound` annotations on
the operations themselves.
2024-09-13 12:07:51 -05:00
Matthias Springer
7030280329 [mlir][GPU] Improve gpu.module op implementation (#102866)
- Replace hand-written parser/printer with auto-generated assembly
format.
- Remove implicit `gpu.module_end` terminator and use the `NoTerminator`
trait instead. (Same as `builtin.module`.)
- Turn the region into a graph region. (Same as `builtin.module`.)
2024-08-13 09:37:36 +02:00
runseny
f6431f0c52 [MLIR][GPUToNVVM] support fastMath and other non-supported mathOp (#99890)
Support fastMath and other non-supported mathOp which only require float
operands and call libdevice function directly to nvvm.

1. lowering mathOp with fastMath attribute to correct libdevice
intrinsic.
2. some mathOp in math dialect has been lowered to libdevice now, but it
doesn't cover all mathOp. so this mr lowers all the remaining mathOp
which only require float operands.
2024-07-25 13:54:58 +02:00
Matthias Springer
3f33d2f3ca [mlir][GPUToNVVM] Fix memref function args/results (#96392)
The `gpu.func` op lowering accounts for memref arguments/results (both
"normal" and bare-pointer supported), but the `gpu.return` op lowering
did not. The lowering produced invalid IR that did not verify.

This commit uses the same lowering strategy as for `func.return` in the
`gpu.return` lowering. (The C++ implementation is copied. We may want to
share some code between `func` and `gpu` lowerings in the future.)
2024-06-23 09:51:12 +02:00
Krzysztof Drewniak
43fd4c49bd [mlir][GPU] Improve handling of GPU bounds (#95166)
This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.

1. `known_block_size` and `known_grid_size` become inherent attributes
of GPU functions. This makes them less clunky to work with. As a
consequence, the `gpu.func` lowering patterns now only look at the
inherent attributes when setting target-specific attributes on the
`llvm.func` that they lower to.
2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size`
are made official dialect-level discardable attributes which can be
placed on arbitrary functions. This allows for progressive lowerings
(without this, a lowering for `gpu.thread_id` couldn't know about the
bounds if it had already been moved from a `gpu.func` to an `llvm.func`)
and allows for range information to be provided even when
`gpu.*_{id,dim}` are being used outside of a `gpu.func` context.
3. All of these index operations have gained an optional `upper_bound`
attribute, allowing for an alternate mode of operation where the bounds
are specified locally and not inherited from the operation's context.
These also allow handling of cases where the precise launch sizes aren't
known, but can be bounded more precisely than the maximum of what any
platform's API allows. (I'd like to thank @benvanik for pointing out
that this could be useful.)

When inferring bounds (either for range inference or for setting `range`
during lowering) these sources of information are consulted in order of
specificity (`upper_bound` > inherent attribute > discardable attribute,
except that dimension sizes check for `known_*_bounds` to see if they
can be constant-folded before checking their `upper_bound`).

This patch also updates the documentation about the bounds and inference
behavior to clarify what these attributes do when set and the
consequences of setting them up incorrectly.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-06-17 23:47:38 -05:00
klensy
a5985ca51d [mlir][test] Fix filecheck annotation typos [2/n] (#93476)
Few more fixes
previous: https://github.com/llvm/llvm-project/pull/92897 pr
Issues from https://github.com/llvm/llvm-project/issues/93154 unfixed.

---------

Co-authored-by: klensy <nightouser@gmail.com>
2024-06-14 17:16:02 +02:00
Johannes Reifferscheid
a6a9215b93 Lower shuffle to single-result form if possible. (#84321)
We currently always lower shuffle to the struct-returning variant. I saw
some cases where this survived all the way through ptx, resulting in
increased register usage. The easiest fix is to simply lower to the
single-result version when the predicate is unused.
2024-03-21 10:33:49 +01:00
David Majnemer
0039a2ff4f [mlir][gpu] Add support for lowering math.erf to __nv_erf (#79848) 2024-01-29 19:35:23 +00:00
Guray Ozen
2aec7083ad [mlir][gpu] Use DenseI32Array for NVVM's maxntid and reqntid (NFC) (#77466) 2024-01-09 16:44:25 +01:00
Guray Ozen
763109e346 [mlir][gpu] Use known_block_size to set maxntid for NVVM target (#77301)
Setting thread block size with `maxntid` on the kernel has great
performance benefits. In this way, downstream PTX compiler can do better
register allocation.

MLIR's `gpu.launch` and `gpu.launch_func` already has an attribute
(`known_block_size`) that keeps the thread block size when it is known.
This PR simply uses this attribute to set `maxntid`.
2024-01-08 14:49:19 +01:00
Jakub Kuderski
7eccd52842 Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423)"
This reverts commit dd09221a29 and relands
https://github.com/llvm/llvm-project/pull/73423.

* Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.
2023-11-27 11:38:18 -05:00
Jakub Kuderski
dd09221a29 Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423)"
This reverts commit e0aac8c88d.

I'm seeing some nvidia integration test failures:
https://lab.llvm.org/buildbot/#/builders/61/builds/52334.
2023-11-27 11:29:23 -05:00
Jakub Kuderski
e0aac8c88d [mlir][gpu] Align reduction operations with vector combining kinds (#73423)
The motivation for this change is explained in
https://github.com/llvm/llvm-project/issues/72354.

Before this change, we could not tell between signed/unsigned
minimum/maximum and NaN treatment for floating point values.

The mapping of old reduction operations to the new ones is as follows:
*  `min` --> `minsi` for ints, `minf` for floats
*  `max` --> `maxsi` for ints, `maxf` for floats

New reduction kinds not represented in the old enum: `minui`, `maxui`,
`minimumf`, `maximumf`.

As a next step, I would like to have a common definition of combining
kinds used by the `vector` and `gpu` dialects. Separately, the GPU to
SPIR-V lowering does not yet properly handle zero and NaN values -- the
behavior of floating point min/max group reductions is not specified by
the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. 

Issue: https://github.com/llvm/llvm-project/issues/72354
2023-11-27 11:19:20 -05:00
Christian Ulmann
02307a1444 [MLIR][GPUToNVVM] Remove typed pointer support (#70861)
This commit removes the support for lowering GPU to NVVM dialect with
typed pointers. Typed pointers have been deprecated for a while now and
it's planned to soon remove them from the LLVM dialect.

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-11-01 08:40:32 +01:00
Oleksandr "Alex" Zinenko
e4384149b5 [mlir] use transform-interpreter in test passes (#70040)
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter-pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR in to a module when necessary.

Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.

Only the test in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of testing coverage of the transform interpreter logic.
2023-10-24 16:12:34 +02:00
Christian Ulmann
484668c759 Reland "[MLIR][LLVM] Change addressof builders to use opaque pointers" (#69292)
This relands fbde19a664, which was broken due to incorrect GEP element type creation.

This commit changes the builders of the `llvm.mlir.addressof` operations
to no longer produce typed pointers.

As a consequence, a GPU to NVVM pattern had to be updated, that still
relied on typed pointers.
2023-10-17 11:33:45 +02:00
Christian Ulmann
9397e5f581 Revert "[MLIR][LLVM] Change addressof builders to use opaque pointers (#69215)"
This reverts commit fbde19a664 due to
breaking integration tests.
2023-10-17 06:31:48 +00:00
Christian Ulmann
fbde19a664 [MLIR][LLVM] Change addressof builders to use opaque pointers (#69215)
This commit changes the builders of the `llvm.mlir.addressof` operations
to no longer produce typed pointers.

As a consequence, a GPU to NVVM pattern and the toy example LLVM lowerings had to be updated, as they still relied on typed pointers.
2023-10-17 07:55:00 +02:00
Adrian Kuegel
640b95da80 [mlir][GPU] Lower arith.remf to GPU intrinsic.
Differential Revision: https://reviews.llvm.org/D159422
2023-09-04 13:38:45 +02:00
Nicolas Vasilache
888717e853 [mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD
This revision untangles a few more conversion pieces and allows rewriting
the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass
in a declarative fashion that provides a much better understanding and control.

Differential Revision: https://reviews.llvm.org/D157617
2023-08-10 15:30:48 +00:00
Christopher Bate
14858cf05d [mlir][Conversion/GPUCommon] Fix bug in conversion of math ops
The common GPU operation transformation that lowers `math` operations
to function calls in the `gpu-to-nvvm` and `gpu-to-rocdl` passes handles
`vector` types by applying the function to each scalar and returning a
new vector. However, there was a typo that results in incorrectly
accumulating the result vector, and the rewrite returns an `llvm.mlir.undef`
result instead of the correct vector. A patch is added and tests are
strengthened.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D154269
2023-07-03 13:26:51 -06:00
Uday Bondhugula
27ccf0f407 [MLIR] Provide bare pointer memref lowering option on gpu-to-nvvm pass
Provide the bare pointer memref lowering option on gpu-to-nvvm pass.
This is needed whenever we lower memrefs on the host function side and
the kernel calls on the host-side (gpu-to-llvm) with the bare ptr
convention. The GPU module side of the lowering should also "align" and
use the bare pointer convention.

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D152480
2023-06-18 22:53:50 +05:30
Fabian Mora
041f1abee1 [mlir][memref] Fix num elements in lowering of memref.alloca op to LLVM
Fixes a mistake in the lowering of memref.alloca to llvm.alloca, as llvm.alloca uses the number of elements to allocate in the stack and not the size in bytes.

Reference:
LLVM IR: https://llvm.org/docs/LangRef.html#alloca-instruction
LLVM MLIR: https://mlir.llvm.org/docs/Dialects/LLVM/#llvmalloca-mlirllvmallocaop

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D150705
2023-05-22 16:23:00 +00:00
Markus Böck
0e5aeae6f5 [mlir][GPUToLLVM] Add support for emitting opaque pointers
Part of https://discourse.llvm.org/t/rfc-switching-the-llvm-dialect-and-dialect-lowerings-to-opaque-pointers/68179

This patch adds the new pass option `use-opaque-pointers` to the GPU to LLVM lowerings (including ROCD and NVVM) and adapts the code to support using opaque pointers in addition to typed pointers.
The required changes mostly boil down to avoiding `getElementType` and specifying base types in GEP and Alloca.

In the future opaque pointers will be the only supported model, hence tests have been ported to using opaque pointers by default. Additional regression tests for typed-pointers have been added to avoid breaking existing clients.

Note: This does not yet port the `GpuToVulkan` passes.

Differential Revision: https://reviews.llvm.org/D144448
2023-02-21 20:46:33 +01:00
Quinn Dawkins
985f7ff632 [mlir][gpu] Add support for integer types in gpu.subgroup_mma ops
The signedness is carried by `!gpu.mma_matrix` types to most closely
match the Cooperative Matrix specification which determines signedness
with the type (and sometimes the operation).

See: https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_cooperative_matrix.html

To handle the lowering from vector to gpu, ops such as arith.extsi are
pattern matched next to `vector.transfer_read` and `vector.contract` to
determine the signedness of the matrix type.

Enables s8 and u8 WMMA types in NVVM for the GPUToNVVM conversion.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D143223
2023-02-07 17:58:01 -05:00
Guray Ozen
a3388f3e2a [mlir] Introduce a pattern to lower gpu.subgroup_reduce to nvvm.redux_op
This revision introduces a pattern to lower `gpu.subgroup_reduce` op into to the `nvvm.redux_sync` op. The op must be run by the entire subgroup, otherwise it is undefined behaviour.

It also adds a flag and populate function, because the op is not avaiable for every gpu (sm80+), so it can be used when it is desired.

Depends on D142088

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D142103
2023-01-20 13:56:23 +01:00
Goran Flegar
b7fe0d346b Lower math.tan to __nv_tan[f] / __ocml_tan_f{32|64}
At present math.tan fails to lower for NVVM and ROCDL.

Differential Revision: https://reviews.llvm.org/D141505
2023-01-12 15:26:11 +01:00
Johannes Reifferscheid
059cf735a9 Lower math.cbrt to NVVM/ROCDL.
Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D141270
2023-01-09 13:17:35 +01:00
Thomas Raoux
7efdc117b1 [mlir][nvvm] Add lowering of gpu.printf to nvvm
When converting to nvvm lowering gpu.printf to vprintf allows us to
support printing when running on cuda.

Differential Revision: https://reviews.llvm.org/D141049
2023-01-06 17:29:30 +00:00
Ivan Butygin
247d8d4f7a [mlir][gpu] Add uniform flag to gpu reduction ops
Differential Revision: https://reviews.llvm.org/D138758
2022-12-14 13:15:58 +01:00
Navdeep Katel
3d35546cd1 Support transpose mode for gpu.subgroup WMMA ops
Add support for loading, computing, and storing `gpu.subgroup` WMMA ops
in transpose mode as well. Update the GPU to NVVM lowerings to support
`transpose` mode and update integration tests as well.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D139021
2022-12-05 22:37:02 +05:30
Christian Sigg
b251b608b5 [mlir][gpu] Unroll ops on vectors which map to intrinsic calls
Unroll ops that map to intrinsics when lowering to LLVM, because intrinsics don't support vector operands/results.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D136345
2022-10-28 10:33:38 +02:00
Jeff Niu
5c5af910fe [mlir][LLVMIR] "Modernize" Insert/ExtractValueOp
This patch "modernizes" the LLVM `insertvalue` and `extractvalue`
operations to use DenseI64ArrayAttr, since they only require an array of
indices and previously there was confusion about whether to use i32 or
i64 arrays, and to use assembly format.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D131537
2022-08-10 12:51:11 -04:00
Jeff Niu
00f7096d31 [mlir][math] Rename math.abs -> math.absf
To make room for introducing `math.absi`.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D131325
2022-08-08 11:04:58 -04:00
Alex Zinenko
610139d2d9 [mlir] replace 'emit_c_wrappers' func->llvm conversion option with a pass
The 'emit_c_wrappers' option in the FuncToLLVM conversion requests C interface
wrappers to be emitted for every builtin function in the module. While this has
been useful to bootstrap the interface, it is problematic in the longer term as
it may unintentionally affect the functions that should retain their existing
interface, e.g., libm functions obtained by lowering math operations (see
D126964 for an example). Since D77314, we have a finer-grain control over
interface generation via an attribute that avoids the problem entirely. Remove
the 'emit_c_wrappers' option. Introduce the '-llvm-request-c-wrappers' pass
that can be run in any pipeline that needs blanket emission of functions to
annotate all builtin functions with the attribute before performing the usual
lowering that accounts for the attribute.

Reviewed By: chelini

Differential Revision: https://reviews.llvm.org/D127952
2022-06-17 11:10:31 +02:00
Thomas Raoux
a6f2c2291e [mlir][GPUToNVVM] Fix bug in mma elementwise lowering
The maxf implementation of wmma elementwise op was incorrect as the
operands of the select to check for Nan were swapped.

Differential Revision: https://reviews.llvm.org/D127879
2022-06-15 17:23:17 +00:00
Thomas Raoux
15bcc36eed [mlir][gpu] Move async copy ops to NVGPU and add caching hints
Move async copy operations to NVGPU as they only exist on NV target and are
designed to match ptx semantic. This allows us to also add more fine grain
caching hint attribute to the op.
Add hint to bypass L1 and hook it up to NVVM op.

Differential Revision: https://reviews.llvm.org/D125244
2022-05-10 22:30:24 +00:00
Thomas Raoux
894a591cf6 [mlir][nvgpu] Move mma.sync and ldmatrix in nvgpu dialect
Move gpu operation mma.sync and ldmatrix in nvgpu as they are specific
to nvidia target.

Differential Revision: https://reviews.llvm.org/D123824
2022-04-14 23:44:52 +00:00
Christopher Bate
77d2c815f5 [MLIR][GPU] Add GPU ops nvvm.mma.sync, nvvm.mma.ldmatrix, lane_id
This change adds three new operations to the GPU dialect: gpu.mma.sync,
gpu.mma.ldmatrix, and gpu.lane_id. The former two are meant to target
the lower level nvvm.mma.sync and nvvm.ldmatrix instructions, respectively.
Lowerings are added for the new GPU operations for conversion to
NVVM.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D123647
2022-04-13 22:50:07 +00:00
River Riddle
3655069234 [mlir] Move the Builtin FuncOp to the Func dialect
This commit moves FuncOp out of the builtin dialect, and into the Func
dialect. This move has been planned in some capacity from the moment
we made FuncOp an operation (years ago). This commit handles the
functional aspects of the move, but various aspects are left untouched
to ease migration: func::FuncOp is re-exported into mlir to reduce
the actual API churn, the assembly format still accepts the unqualified
`func`. These temporary measures will remain for a little while to
simplify migration before being removed.

Differential Revision: https://reviews.llvm.org/D121266
2022-03-16 17:07:03 -07:00
River Riddle
23aa5a7446 [mlir] Rename the Standard dialect to the Func dialect
The last remaining operations in the standard dialect all revolve around
FuncOp/function related constructs. This patch simply handles the initial
renaming (which by itself is already huge), but there are a large number
of cleanups unlocked/necessary afterwards:

* Removing a bunch of unnecessary dependencies on Func
* Cleaning up the From/ToStandard conversion passes
* Preparing for the move of FuncOp to the Func dialect

See the discussion at https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061

Differential Revision: https://reviews.llvm.org/D120624
2022-03-01 12:10:04 -08:00
Thomas Raoux
5ab04bc068 [mlir][gpu] Add device side async copy operations
Add new operations to the gpu dialect to represent device side
asynchronous copies. This also add the lowering of those operations to
nvvm dialect.
Those ops are meant to be low level and map directly to llvm dialects
like nvvm or rocdl.

We can further add higher level of abstraction by building on top of
those operations.
This has been discuss here:
https://discourse.llvm.org/t/modeling-gpu-async-copy-ampere-feature/4924

Differential Revision: https://reviews.llvm.org/D119191
2022-02-10 17:25:59 -08:00
River Riddle
ace01605e0 [mlir] Split out a new ControlFlow dialect from Standard
This dialect is intended to model lower level/branch based control-flow constructs. The initial set
of operations are: AssertOp, BranchOp, CondBranchOp, SwitchOp; all split out from the current
standard dialect.

See https://discourse.llvm.org/t/standard-dialect-the-final-chapter/6061

Differential Revision: https://reviews.llvm.org/D118966
2022-02-06 14:51:16 -08:00
harsh
e01e4c9115 Fix bugs in GPUToNVVM lowering
The current lowering from GPU to NVVM does
not correctly handle the following cases when
lowering the gpu shuffle op.

1. When the active width is set to 32 (all lanes),
then the current approach computes (1 << 32) -1 which
results in poison values in the LLVM IR. We fix this by
defining the active mask as (-1) >> (32 - width).

2. In the case of shuffle up, the computation of the third
operand c has to be different from the other 3 modes due to
the op definition in the ISA reference.
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html)
Specifically, the predicate value is computed as j >= maxLane
for up and j <= maxLane for all other modes. We fix this by
computing maskAndClamp as 32 - width for this mode.

TEST: We modify the existing test and add more checks for the up mode.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D118086
2022-01-25 03:24:14 +00:00
Mogball
aae5125550 [mlir] Replace StrEnumAttr -> EnumAttr in core dialects
Removes uses of `StrEnumAttr` in core dialects

Reviewed By: mehdi_amini, rriddle

Differential Revision: https://reviews.llvm.org/D117514
2022-01-18 17:15:00 +00:00
Thomas Raoux
47555d73f6 [mlir][gpu] Extend shuffle op modes and add nvvm lowering
Add up, down and idx modes to gpu shuffle ops, also change the mode from
string to enum

Differential Revision: https://reviews.llvm.org/D114188
2021-11-19 11:14:31 -08:00
thomasraoux
f309939d06 [mlir][nvvm] Remove special case ptr arithmetic lowering in gpu to nvvm
Use existing helper instead of handling only a subset of indices lowering
arithmetic. Also relax the restriction on the memref rank for the GPU mma ops
as we can now support any rank.

Differential Revision: https://reviews.llvm.org/D113383
2021-11-10 10:00:12 -08:00
thomasraoux
8a992b20db [mlir][gpu] Add basic support to do elementwise ops on mma matrix type
In order to support fusion with mma matrix type we need to be able to
execute elementwise operations on them. This add an op to be able to
support some basic elementwise operations. This is a is not a full
solution as it only supports a limited scope or operations. Ideally we would
want to be able to fuse with more kind of operations.

Differential Revision: https://reviews.llvm.org/D112857
2021-11-01 11:51:19 -07:00