Commit Graph

1163 Commits

Author SHA1 Message Date
Jakub Kuderski
80d5400d92 [mlir][spirv] Account for type conversion failures in scf-to-spirv
Fixes: https://github.com/llvm/llvm-project/issues/59136

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D141292
2023-01-09 11:35:47 -05:00
Johannes Reifferscheid
059cf735a9 Lower math.cbrt to NVVM/ROCDL.
Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D141270
2023-01-09 13:17:35 +01:00
Alexander Shaposhnikov
9e1a344155 [MLIR][TOSA] Switch Tosa to DenseArrayAttr
This diff completes switching Tosa to DenseArrayAttr.

Test plan: ninja check-mlir check-all

Differential revision: https://reviews.llvm.org/D141111
2023-01-06 22:57:14 +00:00
Rob Suderman
7ce53e3102 [mlir][tosa] Add tosa.conv3d lowering to Linalg
Conv3D has an existing linalg operation for floating point. This adds a quantized
variant and the corresponding lowering from TOSA. Numerical correctness was validated
using the TOSA conformance tests.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D140919
2023-01-06 10:47:45 -08:00
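As background for the quantized variant, quantized convolutions conventionally subtract the input and weight zero points from every operand before multiplying and accumulating. The following is a minimal Python sketch of that arithmetic for a single-channel 3-D convolution (stride 1, no padding). `conv3d_q` and its parameter names are hypothetical and do not reflect the linalg op's signature, layout, or accumulator type.

```python
def conv3d_q(inp, ker, in_zp, ker_zp):
    """Naive 'valid' 3-D convolution (stride 1, no padding) with zero points.

    inp: D x H x W nested lists of ints; ker: KD x KH x KW nested lists of ints.
    Every product uses (x - in_zp) * (w - ker_zp).
    """
    d, h, w = len(inp), len(inp[0]), len(inp[0][0])
    kd, kh, kw = len(ker), len(ker[0]), len(ker[0][0])
    out = []
    for z in range(d - kd + 1):
        plane = []
        for y in range(h - kh + 1):
            row = []
            for x in range(w - kw + 1):
                # Python ints never overflow; a real lowering uses a fixed-width accumulator.
                acc = 0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += (inp[z + dz][y + dy][x + dx] - in_zp) * (ker[dz][dy][dx] - ker_zp)
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out
```

With all-ones data and zero points of 0, each output element equals the kernel volume (8 for a 2x2x2 kernel); raising the input zero point to 1 makes every output 0.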
Thomas Raoux
7efdc117b1 [mlir][nvvm] Add lowering of gpu.printf to nvvm
When converting to NVVM, lowering gpu.printf to vprintf allows us to
support printing when running on CUDA.

Differential Revision: https://reviews.llvm.org/D141049
2023-01-06 17:29:30 +00:00
mariecwhite
76dc9a853a [mlir][tosa] Remove clamping behavior in tosa.cast for integer truncation
Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D141015
2023-01-04 15:10:06 -08:00
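To make the title concrete, narrowing a signed integer can either saturate (clamp) to the destination range or keep only the low bits (wrap). Below is a plain-Python sketch of both behaviours, assuming two's-complement semantics. The helper names are hypothetical, and this is not the TOSA-to-Linalg lowering itself.

```python
def cast_clamp(value: int, bits: int) -> int:
    """Saturate a signed integer into the signed `bits`-bit range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def cast_truncate(value: int, bits: int) -> int:
    """Keep only the low `bits` bits and reinterpret them as a signed value."""
    low = value & ((1 << bits) - 1)
    # If the new top bit is set, the truncated pattern represents a negative number.
    return low - (1 << bits) if low >> (bits - 1) else low
```

For example, narrowing 300 to 8 bits gives 127 when clamping but 44 when truncating.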
Alexander Shaposhnikov
11030c7d67 [MLIR][TOSA] Switch Tosa_IntArrayAttr[N], Tosa_IntArrayAttrUpto[N] to DenseI64ArrayAttr
Switch Tosa_IntArrayAttr[N], Tosa_IntArrayAttrUpto[N] to DenseI64ArrayAttr.

Test plan: ninja check-mlir check-all

Differential revision: https://reviews.llvm.org/D140748, https://reviews.llvm.org/D140829, https://reviews.llvm.org/D140832, https://reviews.llvm.org/D140833, https://reviews.llvm.org/D140834
2023-01-04 21:58:20 +00:00
Robert Walker
ca21499526 [mlir][tosa] Fix floating point offset for tosa.resize
Offset is a signed value, so use `arith.sitofp`

See also https://github.com/llvm/llvm-project/issues/59585

Reviewed By: NatashaKnk, jpienaar

Differential Revision: https://reviews.llvm.org/D140958
2023-01-04 12:53:54 -08:00
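The signedness of the conversion matters because the same 32-bit pattern means very different things to `arith.uitofp` and `arith.sitofp` when the offset is negative. Here is a minimal Python model of the two interpretations. The helper names are hypothetical, and Python floats are 64-bit, so f32 rounding is not modelled.

```python
import struct


def uitofp_i32(value: int) -> float:
    """Model arith.uitofp on an i32: read the 32-bit pattern as unsigned."""
    return float(value & 0xFFFFFFFF)


def sitofp_i32(value: int) -> float:
    """Model arith.sitofp on an i32: read the same 32-bit pattern as signed."""
    bits = value & 0xFFFFFFFF
    (signed,) = struct.unpack("<i", struct.pack("<I", bits))
    return float(signed)
```

An offset of -1 becomes -1.0 with the signed conversion but 4294967295.0 with the unsigned one.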
Rob Suderman
b5a1de9c98 [mlir][tosa] Add broadcasting case for tosa.resize to linalg implementation
When lowering tosa.resize, it is possible that an input dimension is unary (size 1).
Lowering to a new tosa.resize and an explicit broadcast simplifies the
tosa.resize operation and avoids recomputing the identical broadcasted values.

This change reworks the broadcast optimization to reuse the tosa.resize generic
implementation.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D139963
2023-01-03 14:29:06 -08:00
Johannes Reifferscheid
998a3a3894 Add a math.cbrt instruction and lowering to libm.
There's currently no way to get accurate cube roots in the math dialect.
powf(x, 1/3.0) is too inaccurate in some cases.

Reviewed By: akuegel

Differential Revision: https://reviews.llvm.org/D140842
2023-01-03 08:44:12 +01:00
Krzysztof Drewniak
f6076bd81f [mlir][ROCDL] Translate known block size attributes to ROCDL
1. When converting from the GPU dialect to the ROCDL dialect, if the
function that contains a gpu.thread_id or gpu.block_id op is annotated
with gpu.known_{block,grid}_size, use that size to set a "range"
attribute on the corresponding rocdl intrinsic so that the LLVM
frontend can optimize based on that range information.
1b. When translating from the rocdl dialect to LLVM IR, use the
"range" attribute, if present, to set !range metadata on the relevant
function call.
2. Deprecate the old rocdl.max_flat_work_group_size attribute, which
was used in a tensorflow backend. Instead, use
rocdl.flat_work_group_size going forward to allow kernel generators to
specify the minimum and maximum work group sizes a kernel may be
launched with in one attribute, thus more closely matching the backend.
3. When translating from gpu.func to llvm.func within gpu-to-rocdl,
copy the known_block_size attribute as rocdl.reqd_work_group_size to
enable further translations to set the corresponding metadata on the
LLVM IR function. Also, set the rocdl.flat_work_group_size attribute
to ensure that the reqd_work_group_size metadata and the
amdgpu-flat-work-group-size metadata are consistent.
3b. Extend the ROCDL to LLVM IR translation to set the
!reqd_work_group_size metadata on LLVM functions.

Also update tests and add functions to the ROCDL dialect to ensure
attribute names are used consistently.

Depends on D139865

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D139866
2023-01-02 21:04:13 +00:00
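The relationship between a known block size and the derived hints can be sketched as follows. The sketch assumes that each thread-id dimension gets a half-open [0, size) range, which is the convention of LLVM `!range` metadata. It also assumes that the flat work-group size is the product of the dimensions and serves as both minimum and maximum in a "min,max" string. `known_block_size_metadata` is a hypothetical helper, not part of the ROCDL dialect API.

```python
from math import prod


def known_block_size_metadata(block_size):
    """Derive launch-bound hints from a known (x, y, z) block size.

    Returns half-open [lo, hi) ranges per thread-id dimension and an assumed
    "min,max" flat work-group-size string.
    """
    if len(block_size) != 3 or any(d <= 0 for d in block_size):
        raise ValueError("expected three positive dimensions")
    ranges = {dim: (0, size) for dim, size in zip("xyz", block_size)}
    flat = prod(block_size)
    # A fixed block size implies the same minimum and maximum flat size.
    return ranges, f"{flat},{flat}"
```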
Ivan Butygin
2e4aa3bd83 [mlir][gpu][spirv] Lower gpu reduction ops to spirv
Supports only "add" and "mul" ops for now. More ops will be added later.

Differential Revision: https://reviews.llvm.org/D140576
2022-12-30 17:44:08 +01:00
Lei Zhang
56c069887b [mlir][spirv] Fail vector.bitcast conversion with different bitwidth
Depending on the target environment, we may need to emulate certain
types, which can cause issues with bitcast.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D140437
2022-12-29 15:43:55 -08:00
Benoit Jacob
eec575e548 Allow non-constant divisors in affine mod, floordiv, ceildiv.
The requirement that divisor>0 is not enforced here outside of the
constant case, but how to enforce it? If I understand correctly, it is
UB and while it is nice to be able to deterministically intercept UB,
that isn't always feasible. Hopefully, keeping the existing
enforcement in the constant case is enough.

Differential Revision: https://reviews.llvm.org/D140079
2022-12-17 02:24:02 +00:00
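For reference, these operators have the following affine semantics with a positive divisor: `floordiv` rounds toward negative infinity, `ceildiv` rounds toward positive infinity, and `mod` always yields a value in [0, divisor). Below is a minimal Python sketch. The `assert` statements stand in for the divisor > 0 precondition that, per the message above, is only checked for constant divisors.

```python
def floordiv(a: int, b: int) -> int:
    """Affine floordiv: quotient rounded toward negative infinity."""
    assert b > 0, "affine division requires a positive divisor"
    return a // b  # Python's // already floors


def ceildiv(a: int, b: int) -> int:
    """Affine ceildiv: quotient rounded toward positive infinity."""
    assert b > 0, "affine division requires a positive divisor"
    return -((-a) // b)


def mod(a: int, b: int) -> int:
    """Affine mod: result lies in [0, b) for a positive divisor."""
    assert b > 0, "affine mod requires a positive divisor"
    return a % b  # Python's % takes the sign of the divisor
```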
Rob Suderman
b37a0318cb [mlir][tosa] Make tosa.resize to linalg avoid redundant loads for unit width
When using tosa.resize to go from ?x1x1x? to ?x1x?x?, we should avoid doing a 2D
interpolation, as only two unique values are loaded. As the extract operation
performs numerical computation on its values, the superfluous extracts may
fail to be coalesced. Instead, we only interpolate between the values if there
are multiple values to interpolate between.

For the integer case we also perform scaling by the scaling-factor to apply
the same integer scaling behavior as interpolation.

Reviewed By: jpienaar, NatashaKnk

Differential Revision: https://reviews.llvm.org/D139979
2022-12-15 16:22:46 -08:00
Lei Zhang
f1db4aec30 [mlir][VectorToGPU] Support transposed+broadcasted 2D MMA load
This handles loading from a 2-D memref, in addition to D139655, which
handles the 1-D memref case.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D140136
2022-12-15 19:34:32 +00:00
Lei Zhang
dbddd4f6a4 [mlir][VectorToGPU] Support transposed+broadcasted 1D MMA load
This is now possible with transpose semantics on subgroup MMA
load ops.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D139655
2022-12-15 19:22:35 +00:00
Quinn Dawkins
b05b8970d8 [mlir][gpu][spirv] Verify elementwise op type as mulf when converting to spirv.MatrixTimesScalar
Conversion from gpu.subgroup_mma_constant_matrix to spirv.MatrixTimesScalar didn't check that the op type was a multiplication and thus would incorrectly convert other elementwise scalar operations.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D140081
2022-12-15 03:15:04 +00:00
Jakub Kuderski
4f47677dee [mlir][arith][spirv] Account for possible type conversion failures
Check results of all type conversions in `--convert-arith-to-spirv`.

Fixes: https://github.com/llvm/llvm-project/issues/59496

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D140033
2022-12-14 19:32:40 -05:00
Slava Zakharin
70174b8035 [mlir][math] Added math::FPowI conversion to LLVM dialect.
The operations are converted into LLVM::PowIOp.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D129812
2022-12-14 10:15:05 -08:00
Ivan Butygin
247d8d4f7a [mlir][gpu] Add uniform flag to gpu reduction ops
Differential Revision: https://reviews.llvm.org/D138758
2022-12-14 13:15:58 +01:00
Slava Zakharin
22702cc76c [mlir][math] Added math::FPowI conversion to calls of outlined implementations.
Power functions are implemented as linkonce_odr scalar functions
for the FPowI operations found in a module.
The vector form of FPowI is linearized into a sequence of calls
to the scalar functions.

Option {min-width-of-fpowi-exponent} controls which FPowI operations
are converted by MathToFuncs: if the width of the exponent's integer
type is less than the specified value, then the operation is not converted.

Flang will specify {min-width-of-fpowi-exponent=33} to make sure that
math::FPowI operations with exponents wider than 32 bits will be converted
by MathToFuncs, and operations with narrower exponents will be left
for MathToLLVM to convert to LLVM::PowIOp.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D139804
2022-12-13 12:15:35 -08:00
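The message does not say which algorithm the outlined functions use, so the sketch below assumes binary exponentiation (square-and-multiply), a common choice for float-to-integer powers. It also encodes the conversion rule for {min-width-of-fpowi-exponent} as described above, along with the per-element linearization of the vector form. All function names are hypothetical.

```python
def fpowi(base: float, exp: int) -> float:
    """base ** exp for an integer exponent via square-and-multiply.

    Note: a zero base with a negative exponent raises ZeroDivisionError here,
    whereas IEEE float arithmetic would produce an infinity.
    """
    negative = exp < 0
    n = -exp if negative else exp
    result = 1.0
    while n:
        if n & 1:  # multiply in the current power of the base for each set bit
            result *= base
        base *= base
        n >>= 1
    return 1.0 / result if negative else result


def converted_by_math_to_funcs(exp_width: int, min_width: int) -> bool:
    """Exponents narrower than min_width are left for other conversions."""
    return exp_width >= min_width


def fpowi_vector(bases, exps):
    """Linearize a vector FPowI into one scalar call per element."""
    return [fpowi(b, e) for b, e in zip(bases, exps)]
```

With the Flang setting of 33, an FPowI with an i32 exponent is left for MathToLLVM, while one with an i64 exponent is outlined.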
Rob Suderman
78503e1a2f [mlir][tosa] Refactor tosa.resize
Moved to using helper lambdas to avoid code repetition. The IR needed to be reordered to
accommodate this, which should be the only change to the existing tests.

This changes the quantized test to target `i48` types to guarantee types are extended
correctly when necessary.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D136500
2022-12-12 14:38:38 -08:00
Jakub Kuderski
f39b47264e [mlir][arith][tosa] Use extended mul in 32-bit tosa.apply_scale
This avoids introducing 64-bit types, which may be difficult to handle on some
targets.

Reviewed By: rsuderman, antiagainst

Differential Revision: https://reviews.llvm.org/D139777
2022-12-12 14:39:58 -05:00
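The idea behind this change can be sketched in plain Python. The 64-bit product is kept as two 32-bit halves (as an extended multiply produces), the rounding term is added with an explicit carry, and the shifted result is reassembled from the halves. This assumes the TOSA apply_scale formula `(value * multiplier + (1 << (shift - 1))) >> shift` without double rounding, with shift restricted to [2, 31] for brevity. The helper names are hypothetical, and the real lowering may be structured differently.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(bits: int) -> int:
    """Reinterpret the low 32 bits of an integer as a signed value."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def mulsi_extended32(a: int, b: int):
    """(low, high) 32-bit halves of the signed 64-bit product."""
    product = to_signed32(a) * to_signed32(b)
    return product & MASK32, (product >> 32) & MASK32


def apply_scale32(value: int, multiplier: int, shift: int) -> int:
    """Round-and-shift a 32x32-bit product using only 32-bit pieces."""
    assert 2 <= shift <= 31
    lo, hi = mulsi_extended32(value, multiplier)
    # Add the rounding term to the low half and propagate the carry upward.
    total = lo + (1 << (shift - 1))
    lo, carry = total & MASK32, total >> 32
    hi = (hi + carry) & MASK32
    # Bits [shift, shift + 31] of the 64-bit value come from both halves.
    return to_signed32((lo >> shift) | (hi << (32 - shift)))
```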
Benjamin Chetioui
a6c8f06f55 [mlir] Clean up typos in FileCheck directives in various tests.
Reviewed By: tpopp

Differential Revision: https://reviews.llvm.org/D139698
2022-12-12 09:29:14 +01:00
Jakub Kuderski
285d321a85 [mlir][arith] Define mulsi_extended op
Extend D139688 with the signed version of the extended multiplication
op. Add conversion to the SPIR-V and LLVM dialects.

This was originally proposed in:
https://discourse.llvm.org/t/rfc-arith-add-extended-multiplication-ops/66869.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D139743
2022-12-09 20:25:31 -05:00
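Below is a plain-Python model of what the two extended multiplications return: the low and high halves of the double-width product, as raw bit patterns of the operand width. The low half is the same for signed and unsigned operands, and only the high half depends on signedness. The helper names are hypothetical.

```python
def mului_extended(a: int, b: int, width: int):
    """Unsigned extended multiply: (low, high) halves of the 2*width-bit product."""
    mask = (1 << width) - 1
    product = (a & mask) * (b & mask)
    return product & mask, (product >> width) & mask


def mulsi_extended(a: int, b: int, width: int):
    """Signed extended multiply: operands are read as two's complement."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)

    def signed(x: int) -> int:
        x &= mask
        return x - (1 << width) if x & sign else x

    product = signed(a) * signed(b)
    # Both halves are returned as raw `width`-bit patterns.
    return product & mask, (product >> width) & mask
```

For 32-bit all-ones operands, the unsigned product has halves (1, 0xFFFFFFFE), while the signed product (-1 * -1) has halves (1, 0).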
Jakub Kuderski
b4bdcea214 [mlir][arith] Define mului_extended op
Add conversion to the SPIR-V and LLVM dialects.

This was originally proposed in:
https://discourse.llvm.org/t/rfc-arith-add-extended-multiplication-ops/66869.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D139688
2022-12-09 17:37:06 -05:00
Guray Ozen
b2bba5b65c [mlir][spirv] Support conversion of CopySignOp to spirv for 1D vector with 1 element
Conversion of CopySignOp to SPIRV is supported for scalars and vectors, but not for 1-D vectors with 1 element (e.g. vector<1xf32>). This revision adds support for them by treating them as scalars.

An alternative solution would be to allow 0D vectors for SPIRV, but the spec [0] strictly defines the vector type as non-0D.
"Vector: An ordered homogeneous collection of two or more scalars. Vector sizes are quite restrictive and dependent on the execution model."

[0] https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_types

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D139518
2022-12-08 09:11:27 +01:00
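A common way to lower copysign without a dedicated instruction is to bitcast to integers and splice in the sign bit. The sketch below assumes that approach and f32 inputs, and models it in Python with `struct`. It is not taken from the patch. `copysign_vec1` (hypothetical) shows the idea of converting a vector<1xf32> exactly like a scalar.

```python
import struct


def copysign_f32(magnitude: float, sign: float) -> float:
    """copysign on f32 values by combining their integer bit patterns."""
    mag_bits = struct.unpack("<I", struct.pack("<f", magnitude))[0]
    sign_bits = struct.unpack("<I", struct.pack("<f", sign))[0]
    # All bits of `magnitude` except its sign bit, plus the sign bit of `sign`.
    out = (mag_bits & 0x7FFFFFFF) | (sign_bits & 0x80000000)
    return struct.unpack("<f", struct.pack("<I", out))[0]


def copysign_vec1(mags, signs):
    """A one-element vector is handled by the scalar path on its only element."""
    assert len(mags) == len(signs) == 1
    return [copysign_f32(mags[0], signs[0])]
```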
Jakub Kuderski
28246b7e75 [mlir][arith] Rename addui_carry to addui_extended
The goal is to make the naming of the future `_extended` ops more
consistent. With unsigned addition, the carry value/flag and overflow
bit are the same, but this is not true when it comes to signed addition.

Also rename the second result from `carry` to `overflow`.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D139569
2022-12-07 17:15:56 -05:00
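The point about carry versus overflow can be checked with a small Python model (helper names hypothetical). For unsigned addition, the second result is simply the carry out of the top bit. For signed addition, overflow depends on whether the true sum leaves the signed range, which is a different condition.

```python
def addui_extended(a: int, b: int, width: int):
    """Unsigned add returning (sum mod 2**width, overflow flag)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total > mask  # for unsigned add, overflow == carry out


def signed_add_overflows(a: int, b: int, width: int) -> bool:
    """Signed overflow: the true sum of two signed values leaves the signed range."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return not (lo <= a + b <= hi)
```

On 8-bit values, 0x7F + 0x01 produces no carry but overflows as a signed addition, while 0xFF + 0x01 (that is, -1 + 1 when signed) carries but does not overflow.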
Rob Suderman
8e7630ece1 [mlir][tosa] Fix tosa.resize for i48 accumulator
Implementation assumed an i32 accumulator. Fixed the implementation to
work with an i48 accumulator.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D139365
2022-12-07 11:27:33 -08:00
Ramkumar Ramachandra
2a19625424 mlir/tosa: move tosa.pad from Linalg to Tensor conversion
Since tosa.pad is lowered strictly to arith and tensor ops, move
ConvertPad from TosaToLinalg to TosaToTensor, benefiting non-Linalg
Tosa targets. TensorToLinalg exists, and is trivial, so nothing is lost.

Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>

Differential Revision: https://reviews.llvm.org/D139091
2022-12-06 07:39:29 +01:00
Lei Zhang
2c7827da4f [mlir][spirv] Add GPU subgroup MMA to spirv.MMAMatrixTimesScalar
Along the way, make the default pattern fail instead of crashing
when an elementwise op is not supported yet.

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D139280
2022-12-05 22:30:50 +00:00
Rob Suderman
58fa8426ff [mlir][tosa] Handle tosa.resize nearest rounding correctly
Rounding of tosa.resize did not handle rounding to the nearest pixel correctly.
Rather than dividing the scale by 2 we should double the partial pixel to
guarantee we include a check on the lowest bit.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D139162
2022-12-05 13:10:08 -08:00
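The rounding fix can be illustrated with integers. With an odd scale, `scale // 2` truncates away the lowest bit, so comparing the remainder against it rounds up too early. Comparing twice the remainder against the full scale keeps that bit. Below is a minimal Python sketch, assuming the decision is "round up when the fractional part is at least one half". `remainder` and `scale` are hypothetical names for the partial pixel and the scale denominator.

```python
def round_up_naive(remainder: int, scale: int) -> bool:
    """Compare against scale // 2, which drops the lowest bit of an odd scale."""
    return remainder >= scale // 2


def round_up_fixed(remainder: int, scale: int) -> bool:
    """Double the partial pixel instead, so the comparison is exact."""
    return 2 * remainder >= scale
```

With scale 3 and remainder 1 (a fraction of 1/3), the naive check rounds up while the fixed check does not. For even scales, the two agree.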
Navdeep Katel
3d35546cd1 Support transpose mode for gpu.subgroup WMMA ops
Add support for loading, computing, and storing `gpu.subgroup` WMMA ops
in transpose mode as well. Update the GPU to NVVM lowerings to support
`transpose` mode and update integration tests as well.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D139021
2022-12-05 22:37:02 +05:30
Ramkumar Ramachandra
1e33330e29 mlir/TosaToTensor: fix typos in test
This patch fixes a misspelt CHECK-LABEL in tosa-to-tensor.mlir.

Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>

Differential Revision: https://reviews.llvm.org/D139085
2022-12-03 09:57:10 +01:00
Quentin Colombet
786cbb09ed Re-apply "[mlir][MemRefToLLVM] Remove the code for lowering subview"
This reverts commit d0650d1089.

Original commit message:
Subviews are supposed to be expanded before we hit the lowering code.
The expansion is done with the pass called expand-strided-metadata.

Add a test that demonstrates how these passes can be linked up to achieve
the desired lowering.

This patch is NFC in spirit but not in practice because `subview` gets
lowered into `reinterpret_cast(extract_strided_metadata, <some math>)`,
which lowers into two memref descriptors (one for `reinterpret_cast` and
one for `extract_strided_metadata`). This creates some noise of the
form `extractvalue(unrealized_cast(extractvalue[0]))[0]` that is
currently not simplified within MLIR but is really just a no-op in
that case.

Differential Revision: https://reviews.llvm.org/D136377
2022-12-02 15:26:58 +00:00
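The "<some math>" in the expansion recomputes the view's layout from the source's strided metadata. Under the usual strided-layout rules (no rank reduction, all values known), it amounts to the sketch below. `subview_metadata` is a hypothetical helper, not an MLIR API.

```python
def subview_metadata(offset, strides, sub_offsets, sub_sizes, sub_strides):
    """Compute (offset, sizes, strides) of a subview from its source's layout."""
    assert len(strides) == len(sub_offsets) == len(sub_sizes) == len(sub_strides)
    # Linear position of the view's first element within the source buffer.
    new_offset = offset + sum(o * s for o, s in zip(sub_offsets, strides))
    # One step along a view dimension skips sub_stride source elements.
    new_strides = [s * ss for s, ss in zip(strides, sub_strides)]
    return new_offset, list(sub_sizes), new_strides
```

For a row-major 8x16 source (offset 0, strides [16, 1]), a subview at [2, 4] with sizes [4, 4] and strides [1, 2] starts at element 36 and has strides [16, 2].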
Quentin Colombet
d0650d1089 Revert "[mlir][MemRefToLLVM] Remove the code for lowering subview"
This reverts commit c8e15afa4c.

This breaks some integration tests, see
https://lab.llvm.org/buildbot/#/builders/220/builds/10446

I have to update a bunch of RUN lines in the tests to use the new
lowering scheme. Nothing complicated but let's keep the build clean
while I'm fixing that.
2022-12-02 14:19:37 +00:00
Quentin Colombet
c8e15afa4c [mlir][MemRefToLLVM] Remove the code for lowering subview
Subviews are supposed to be expanded before we hit the lowering code.
The expansion is done with the pass called expand-strided-metadata.

Add a test that demonstrates how these passes can be linked up to achieve
the desired lowering.

This patch is NFC in spirit but not in practice because `subview` gets
lowered into `reinterpret_cast(extract_strided_metadata, <some math>)`,
which lowers into two memref descriptors (one for `reinterpret_cast` and
one for `extract_strided_metadata`). This creates some noise of the
form `extractvalue(unrealized_cast(extractvalue[0]))[0]` that is
currently not simplified within MLIR but is really just a no-op in
that case.

Differential Revision: https://reviews.llvm.org/D136377
2022-12-02 10:17:06 +00:00
Manish Gupta
9774cd17e8 [mlir][nvgpu] Fix affine maps computing indices for LdMatrixOp srcMemref
This patch fixes and simplifies the ldmatrix affine map arithmetic by
abstracting the affine expressions in terms of pitch-linear layout
(strided and contiguous dimensions). Then it applies the maps for
strided and contiguous dimensions in row-major and col-major.

LdMatrixOp collaboratively (32 threads in a warp) loads tiles
(8 rows x 128b cols) of data. It can load x1, x2, or x4 tiles.
Additionally, it can transpose at 16-bit granularity when moving
data from shared memory to registers.

This patch fixes the affine map
(laneid -> coordinate index a thread points to in a tile).

- Loading x4 tiles needs all 32 lanes T0-31 to point to a contiguous
  chunk of 128b. The issue was exposed when running this case.
- Loading x2 and x1 tiles needs threads T0-15 and T0-7, respectively, to
  point to a contiguous chunk of 128b. The patch is NFC for these cases.

Differential Revision: https://reviews.llvm.org/D138978
2022-12-01 18:26:33 -08:00
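The lane-to-coordinate idea can be sketched for one assumed case: 16-bit elements, non-transposed row-major data, and tiles stacked as in a 16x16 fragment. In that case, lanes 0-15 point at rows 0-15 of the left 8 columns and lanes 16-31 (x4 only) at the right 8 columns. The exact arrangement depends on the operand and layout, so this is not the patch's affine map. `ldmatrix_lane_coord` is hypothetical.

```python
def ldmatrix_lane_coord(lane: int, num_tiles: int):
    """(row, col) element coordinate a lane points to, or None if unused.

    Each tile is 8 rows x 8 halfs (128 bits); each active lane supplies the
    address of one contiguous 128-bit row chunk.
    """
    assert 0 <= lane < 32 and num_tiles in (1, 2, 4)
    active = 8 * num_tiles  # x1 uses T0-7, x2 uses T0-15, x4 uses T0-31
    if lane >= active:
        return None
    row = lane % 16
    col = (lane // 16) * 8  # lanes 16-31 start at column 8
    return row, col
```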
Nicolas Vasilache
3af6438372 Revert "[WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D"
This reverts commit 7db25f78db.

This was mistakenly stacked below (and committed) along with an NFC change.
2022-12-01 02:57:03 -08:00
Nicolas Vasilache
7db25f78db [WIP] Add support for MMA conversion for 1-D vector.transfer followed by a broadcast to 2-D
Differential Revision: https://reviews.llvm.org/D139040
2022-12-01 02:49:47 -08:00
Lei Zhang
ff81cc824f [mlir][spirv] Improve vector extract/insert element conversion
* Fix type conversions around positions--we need to use the
  converted value from the adaptor.
* Convert constant position cases to composite extract/insert.

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D139057
2022-12-01 00:35:41 +00:00
Jakub Kuderski
9ad215bb3d [mlir][spirv] Drop experimental LinalgToSPIRV pass
This experimental pass is unused and obsolete.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D139056
2022-11-30 19:25:40 -05:00
Lei Zhang
52ca149931 [mlir][spirv] Allow controlling subgroup size
This commit extends the `ResourceLimitsAttr` to support specifying
a minimal and maximal subgroup size, and extends `EntryPointABIAttr`
to support specifying the requested subgroup size. This is possible
now in Vulkan with the VK_EXT_subgroup_size_control extension.
For OpenCL it's possible to use the `SubgroupSize` execution mode
directly.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D138962
2022-11-30 12:34:09 -05:00
Diego Caballero
eb7e2998d1 Reland "[mlir][Vector] Re-define masking semantics in vector.transfer ops""
This relands commit 847b5f82a4.

Differential Revision: https://reviews.llvm.org/D138079
2022-11-29 03:36:54 +00:00
Quinn Dawkins
c0321edc26 [mlir][gpu] Adding support for transposed mma_load_matrix
Enables transposed gpu.subgroup_mma_load_matrix and updates the lowerings in Vector to GPU and GPU to SPIRV. Needed to enable B transpose matmuls lowering to wmma ops.

Taken over from author: stanley-nod <stanley@nod-labs.com>

Reviewed By: ThomasRaoux, antiagainst

Differential Revision: https://reviews.llvm.org/D138770
2022-11-29 03:35:49 +00:00
Diego Caballero
f6d90055fd [mlir][Vector] Remove 'lower-permutation-maps' option from VectorToSCF
This patch is part of a larger simplification effort of vector transfer
operations. It removes the flag `lower-permutation-maps` from
VectorToSCF conversion and enables the lowering of permutation maps
by default. This means that VectorToSCF will always lower permutation
maps to independent broadcast/transpose operations before lowering
vector operations to SCF.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D138742
2022-11-28 23:56:43 +00:00
Hanhan Wang
0a1569a400 [mlir][NFC] Remove trailing whitespaces from *.td and *.mlir files.
This is generated by running

```
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir
```

Reviewed By: rriddle, dcaballe

Differential Revision: https://reviews.llvm.org/D138866
2022-11-28 15:26:30 -08:00
Thomas Raoux
df47f3ea0d [mlir][spirv] Add lowering for gpu shuffle idx
Differential Revision: https://reviews.llvm.org/D138863
2022-11-28 22:17:19 +00:00
Luca Boasso
4f9c9295a6 [mlir][index] Add and, or, and xor ops
This patch adds the and, or, and xor bitwise operations to
the index dialect with folders and LLVM lowerings.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D138590
2022-11-23 13:26:02 -06:00
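Because the width of `index` depends on the target, a folder must produce a result that is valid whether `index` ends up 32 or 64 bits wide. Bitwise and/or/xor satisfy trunc(f(a, b)) == f(trunc(a), trunc(b)), so a constant fold computed at 64 bits is also correct at 32 bits. The sketch checks this property in Python and contrasts it with unsigned division, which lacks it. Framing the folders this way is an assumption for illustration, not taken from the patch.

```python
import operator

BITWISE_OPS = (operator.and_, operator.or_, operator.xor)


def fold_is_width_independent(op, a: int, b: int) -> bool:
    """Check trunc32(op64(a, b)) == op32(trunc32(a), trunc32(b)) on unsigned values."""
    m64, m32 = (1 << 64) - 1, (1 << 32) - 1
    wide = op(a & m64, b & m64) & m64
    narrow = op(a & m32, b & m32) & m32
    return wide & m32 == narrow
```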