Commit Graph

1884 Commits

Manupa Karunaratne
a6e72f9392 [MLIR][Vector] Add Lowering for vector.step (#113655)
Currently, the lowering for vector.step lives
under a folder. This is not ideal if we want
to do transformations on it and defer the
materialization of the constants much later.

This commit adds a rewrite pattern that
can be applied via the
`transform.structured.vectorize_children_and_apply_patterns`
transform dialect operation.

Moreover, the vector.step rewrite pattern is now
also used in the -convert-vector-to-llvm pass, where
it handles scalable and non-scalable types as
LLVM expects.

As a consequence of removing the vector.step
lowering from its folder, linalg vectorization
will keep vector.step intact.
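A rough sketch of the rewrite for the fixed-length case (value names here are illustrative, not taken from the tests):

```mlir
// The pattern materializes the step sequence as a dense constant.
%step = vector.step : vector<4xindex>
// ... is rewritten to ...
%cst = arith.constant dense<[0, 1, 2, 3]> : vector<4xindex>
```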
2024-11-01 16:38:36 +00:00
Krzysztof Drewniak
3452149c05 [mlir][AMDGPU] Support vector<2xbf16> packed atomic fadd (#113929)
Now that we use LLVM's native bfloat types in the AMDGPU lowering,
enable vector<2xbf16> for AMDGPU.
2024-10-31 10:52:53 -05:00
Longsheng Mou
262afc8aec [mlir][TosaToLinalg] RescaleConverter only support integer type (#114239)
This PR fixes a bug in the `RescaleConverter` that allows non-integer
types, which leads to a crash.
Fixes #61383.
2024-10-31 11:32:19 +00:00
Simon Camphausen
95c2d79814 [mlir][EmitC] memref-to-emitc: insert conversion_casts (#114204)
Add materializations to the conversion pass, such that types of
non-converted operands are legalized.
2024-10-30 15:27:23 +01:00
Longsheng Mou
7ad63c0e44 [mlir][MathToFuncs] MathToFuncs only support integer type (#113693)
This PR fixes a bug in `MathToFuncs` where it incorrectly converted the
`index` type for `math.ctlz` and `math.ipowi`, leading to a crash. Fixes
#108150.
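For reference, only integer-typed inputs such as the following are meant to be rewritten into function calls by this pass (illustrative sketch; value names are made up):

```mlir
// Integer-typed math ops that MathToFuncs outlines into software
// implementations.
%lz = math.ctlz %x : i32
%pw = math.ipowi %base, %exp : i32
// index-typed operands (the case behind the crash) are no longer converted
// by this pass.
```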
2024-10-28 09:54:51 +08:00
Durgadoss R
e33aec89ef [MLIR][NVVM] Update the elect.sync Op to use intrinsics (#113757)
Recently, we added an intrinsic for the elect.sync PTX instruction (PR
104780). This patch updates the corresponding Op in NVVM Dialect
to lower to the intrinsic instead of inline-ptx.

The existing test under Conversion/ is migrated to check for the new
pattern. A separate test is added to verify the lowered intrinsic under
the Target/ directory.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-10-27 22:24:31 +05:30
Longsheng Mou
5aa741d7ca [mlir][SPIRVToLLVM] Erase empty spirv.mlir.loop in LoopPattern (#113527)
This PR erases `spirv.mlir.loop` with an empty region in `LoopPattern`,
resolving a crash. Fixes #113404.
2024-10-26 11:22:57 +08:00
Longsheng Mou
8f9fc6ce47 [mlir][GPU] Add FunctionOpInterface check for OpToFuncCallLowering (#113449)
This PR adds a `FunctionOpInterface` check in `OpToFuncCallLowering` to
resolve a crash when ops are not inside a function. Fixes #113334.
2024-10-26 11:22:08 +08:00
donald chen
889b67c9d3 [mlir] [memref] add more checks to the memref.reinterpret_cast (#112669)
The memref.reinterpret_cast operation used to accept input like:

%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10],
       strides: [1] : memref<?xf32> to memref<10xf32>

A problem arises: while lowering, the true offset of %out is %offset,
but its data type indicates an offset of 0. Permitting this
inconsistency can result in incorrect outcomes, as certain passes might
erroneously extract the offset from the data type of %out.

This patch fixes this by enforcing that the return value's data type
aligns with the input parameter.
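With the added checks, a cast whose true offset is dynamic has to carry that offset in the result type; a sketch assuming the standard strided layout notation:

```mlir
// The dynamic offset is now reflected in the result type.
%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10], strides: [1]
    : memref<?xf32> to memref<10xf32, strided<[1], offset: ?>>
```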
2024-10-26 08:07:51 +08:00
Sayan Saha
b91ce7bc0e [Tosa] : Fix integer overflow for computing intmax+1 in tosa.cast to linalg. (#112455)
This PR fixes an issue related to integer overflow when computing
`(intmax+1)` for `i64` during `tosa-to-linalg` pass for `tosa.cast`.

Found this issue while debugging a numerical mismatch for the `deeplabv3`
model from `torchvision`, represented in the `tosa` dialect using the
`TorchToTosa` pipeline in the `torch-mlir` repository. `torch.aten.to.dtype`
is converted to `tosa.cast` that casts `f32` to `i64` type. Technically
by the specification, `tosa.cast` doesn't handle casting `f32` to `i64`.
So it's possible to add a verifier to error out for such tosa ops
instead of producing incorrect code. However, I chose to fix the
overflow issue to still be able to represent the `deeplabv3` model with
`tosa` ops in the above-mentioned pipeline. Open to suggestions if
adding the verifier is more appropriate instead.
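For illustration (the tensor shape here is hypothetical), the problematic pattern is a cast such as:

```mlir
// f32 -> i64 cast as emitted by the TorchToTosa pipeline.
%0 = tosa.cast %arg0 : (tensor<1x64x64xf32>) -> tensor<1x64x64xi64>
// The linalg lowering clamps against the destination range; computing
// intmax + 1 for i64 in 64-bit signed arithmetic overflows, which is the
// bug fixed here.
```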
2024-10-25 10:51:09 +01:00
Longsheng Mou
927559d27d [mlir][vector] Fix a crash in VectorToGPU (#113454)
This PR fixes a crash in `VectorToGPU` when the operand of `extOp` is a
function argument, which cannot be retrieved using `getDefiningOp`.
Fixes #107967.
2024-10-24 20:28:42 +08:00
Dmitriy Smirnov
27158edaa4 [MLIR][SPIRV] Update cast from IntN to Bool (#113329)
This PR updates the cast to bool from IntN to treat any non-zero value
as TRUE. This makes the cast more resilient to non-generic (i.e. non-1)
TRUE values.

Signed-off-by: Dmitriy Smirnov <dmitriy.smirnov@arm.com>
2024-10-23 09:47:33 +01:00
Kunwar Grover
1004865f1c [mlir][Vector] Support 0-d vectors natively in TransferOpReduceRank (#112907)
Since ddf2d62c7d, 0-d vectors are supported in VectorType. This patch removes 0-d vector
handling with scalars for the TransferOpReduceRank pattern. This pattern
specifically introduces tensor.extract_slice during vectorization,
causing vectorization to not fold transfer_read/transfer_write slices
properly. The changes in vectorization test files reflect this.

There are other places where lowering patterns still side-step handling
0-d vectors properly by turning them into scalars, but this patch only
focuses on the vector.transfer_x patterns.
2024-10-22 15:50:16 +01:00
Finlay
1775b98de7 [mlir][spirv] Add spirv-to-llvm conversion for OpControlBarrier (#111864)
The conversion is based on the expected llvm function from the
LLVM/SPIRV translation tool.
2024-10-19 11:55:04 +01:00
Benjamin Kramer
4d228e1ebd [mlir][vector] Escape variable usage in test
Otherwise the shell might expand this in the command line.
2024-10-17 12:43:32 +02:00
Sirui Mu
1dfb104eac [mlir][LLVMIR] Add operand bundle support for llvm.intr.assume (#112143)
This patch adds operand bundle support for `llvm.intr.assume`.

This patch actually contains two parts:

- `llvm.intr.assume` now accepts operand bundle related attributes and
operands. `llvm.intr.assume` does not impose constraints on the operand
bundles, but obviously only a small set of operand bundles is meaningful.
I plan to add some of those (e.g. `aligned` and `separate_storage` are
what interest me, but other people may be interested in other operand
bundles as well) in future patches.

- The definitions of `llvm.call`, `llvm.invoke`, and
`llvm.call_intrinsic` actually define `op_bundle_tags` as an operation
property. It turns out this approach would introduce some unnecessary
burden if applied equally to the intrinsic operations, because properties
are not available through `Operation *` while we have to operate on
`Operation *` during the import/export of intrinsics, so this PR changes
it from a property to an array attribute.

This patch relands commit d8fadad07c.
2024-10-16 20:49:02 +08:00
Andrzej Warzyński
37ad65ffb6 [mlir][arith] Remove some e2e tests (#112012)
I am removing the recently added integration test for various Arith Ops.
These operations and their lowerings are effectively already verified by
the Arith-to-LLVM conversion tests in:
  * "mlir/test/Conversion/ArithToLLVM/arith-to-llvm.mlir"

I've noticed that a few variants of `arith.cmpi` were missing in that
file - those are added here as well.
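For reference, the added coverage is for predicate variants of the form (illustrative):

```mlir
// arith.cmpi with an explicit predicate, converted to llvm.icmp.
%lt = arith.cmpi slt, %a, %b : i32
// ... lowers to ...
%lt_llvm = llvm.icmp "slt" %a, %b : i32
```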

This is a follow-up for this discussion:
  * https://github.com/llvm/llvm-project/pull/92272

See also the recent update to our guidelines on e2e tests in MLIR:
  * https://github.com/llvm/mlir-www/pull/203
2024-10-16 07:43:49 +01:00
Sirui Mu
484c02780b Revert "[mlir][LLVMIR] Add operand bundle support for llvm.intr.assume (#112143)"
This reverts commit d8fadad07c.

The commit breaks the following CI builds:
- ppc64le-mlir-rhel-clang: https://lab.llvm.org/buildbot/#/builders/129/builds/7685
- ppc64le-flang-rhel-clang: https://lab.llvm.org/buildbot/#/builders/157/builds/10338
2024-10-16 14:15:31 +08:00
Sirui Mu
d8fadad07c [mlir][LLVMIR] Add operand bundle support for llvm.intr.assume (#112143)
This patch adds operand bundle support for `llvm.intr.assume`.

This patch actually contains two parts:

- `llvm.intr.assume` now accepts operand bundle related attributes and
operands. `llvm.intr.assume` does not impose constraints on the operand
bundles, but obviously only a small set of operand bundles is meaningful.
I plan to add some of those (e.g. `aligned` and `separate_storage` are
what interest me, but other people may be interested in other operand
bundles as well) in future patches.

- The definitions of `llvm.call`, `llvm.invoke`, and
`llvm.call_intrinsic` actually define `op_bundle_tags` as an operation
property. It turns out this approach would introduce some unnecessary
burden if applied equally to the intrinsic operations, because properties
are not available through `Operation *` while we have to operate on
`Operation *` during the import/export of intrinsics, so this PR changes
it from a property to an array attribute.
2024-10-16 12:51:50 +08:00
Andrzej Warzyński
3187a4917d [mlir][vector] Add more tests for ConvertVectorToLLVM (8/n) (#111997)
Adds tests with scalable vectors for the Vector-To-LLVM conversion pass.
Covers the following Ops:

* `vector.transfer_read`,
* `vector.transfer_write`.

In addition:

* Duplicate tests from "vector-mask-to-llvm.mlir" are removed.
* Tests for xfer_read/xfer_write are moved to a newly created test file,
  "vector-xfer-to-llvm.mlir". This follows an existing pattern among
  VectorToLLVM conversion tests.
* Tests that test both xfer_read and xfer_write have their names updated
  to capture that (e.g. @transfer_read_1d_mask ->
  @transfer_read_write_1d_mask)
* @transfer_write_1d_scalable_mask and @transfer_read_1d_scalable_mask
  are re-written as @transfer_read_write_1d_mask_scalable. This is to
  make it clear that this case is meant to complement
  @transfer_read_write_1d_mask.
* @transfer_write_tensor is updated to also test xfer_read.
2024-10-15 13:15:36 +01:00
Sergio Afonso
0a17bdfc36 [MLIR][OpenMP] Remove terminators from loop wrappers (#112229)
This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating
accordingly the verifier for the `LoopWrapperInterface`.

Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.

There is an extensive list of Lit tests that needed updating to remove
the `omp.terminator`s, adding some noise to this patch, but the actual
changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering for
those, `LoopWrapperInterface::verifyImpl()`, SCF to OpenMP conversion
and OpenMP dialect documentation.
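A minimal before/after sketch of a loop wrapper (operand names are illustrative):

```mlir
// Before: the wrapper block needed a trailing omp.terminator.
omp.wsloop {
  omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) {
    omp.yield
  }
  omp.terminator
}

// After: the NoTerminator trait removes it.
omp.wsloop {
  omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) {
    omp.yield
  }
}
```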
2024-10-15 11:28:39 +01:00
Fabian Mora
58d97034c9 [mlir][OpenMP] Implement the ConvertToLLVMPatternInterface (#101997)
This patch implements the `ConvertToLLVMPatternInterface` for the OpenMP
dialect, allowing `convert-to-llvm` to act on the OpenMP dialect.
2024-10-11 15:07:08 -04:00
Andrzej Warzyński
f7eb271542 [mlir][vector] Add more tests for ConvertVectorToLLVM (7/n) (#111895)
Adds tests with scalable vectors for the Vector-To-LLVM conversion pass.
Covers the following Ops:
  * vector.fma
  * vector.reduce
2024-10-11 14:36:26 +01:00
Simon Camphausen
777142937a [mlir][EmitC] Fail on memrefs with 0 dims in type conversion (#111965)
This lets the type conversion fail instead of generating invalid array
types.
2024-10-11 11:45:25 +02:00
Petr Kurapov
f8b7a65395 [MLIR][GPU-LLVM] Add in-pass signature update for opencl kernels (#105664)
Default to Global address space for memrefs that do not have an explicit address space set in the IR.

---------

Co-authored-by: Victor Perez <victor.perez@intel.com>
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Victor Perez <victor.perez@codeplay.com>
2024-10-10 14:04:52 +02:00
Adam Siemieniuk
ec450b1900 [mlir][xegpu] Allow out-of-bounds writes (#110811)
Relaxes vector.transfer_write lowering to allow out-of-bound writes.

This aligns lowering with the current hardware specification which does
not update bytes in out-of-bound locations during block stores.
2024-10-09 18:59:14 +02:00
Dmitriy Smirnov
b9314a8219 [mlir][spirv] Update math.powf lowering (#111388)
The PR updates math.powf lowering to produce NaN result for a negative
base with a fractional exponent which matches the actual behaviour of
the C/C++ implementation.
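As an illustrative case (value names are made up): a negative base with a fractional exponent has no real result, so the lowering now yields NaN, matching C/C++ powf.

```mlir
// E.g. base = -2.0, exponent = 0.5 produces NaN after this change.
%r = math.powf %neg_base, %frac_exp : f32
```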
2024-10-09 09:04:31 +01:00
Benoit Jacob
d8a656ffaf [MLIR] AMDGPUToROCDL: Use a bitcast op to reintepret a vector of i8 as single integer. (#111400)
Found by inspecting AMDGPU assembly - so the arithmetic ops created
there were definitely making their way into the target ISA. A
`LLVM::BitcastOp` seems equivalent, and evaporates as expected in the
target asm.

Along the way, I thought that this helper function `mfmaConcatIfNeeded`
could be renamed to `convertMFMAVectorOperand` to better convey its
contract; so I don't need to think about whether a bitcast is a
legitimate "concat" :-)

---------

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
2024-10-07 14:14:18 -04:00
Andrzej Warzyński
f58e85a972 [mlir][vector] Add more tests for ConvertVectorToLLVM (6/n) (#111121)
Adds tests with scalable vectors for the Vector-To-LLVM conversion pass.
Covers the following Ops:
  * `vector.insert_strided_slice`

With this change, for every test with fixed-width vectors, there should
be a corresponding example with scalable vectors (for
`vector.insert_strided_slice`). In addition:
  * Test function names are updated to more accurately reflect the case
    being exercised (e.g. `@insert_strided_index_slice1` ->
    `@insert_strided_index_slice_index_2d_into_3d`)
  * For consistency, took the liberty of updating some of the function
    names for `vector.extract_strided_slice`
  * `@insert_strided_slice_scalable` is effectively replaced with
    `@insert_strided_slice_f32_2d_into_3d_scalable`
2024-10-07 10:05:16 +01:00
Matthias Springer
6937dbbe51 [mlir][memref] Fix alloca lowering with 0 dimensions (#111119)
The `memref.alloca` lowering computed the allocation size incorrectly
when there were 0 dimensions.

Previously:
```
memref.alloca() : memref<10x0x2xf32>
--> llvm.alloca 20xf32
```

Now:
```
memref.alloca() : memref<10x0x2xf32>
--> llvm.alloca 0xf32
```

From the `llvm.alloca` documentation:
```
Allocating zero bytes is legal, but the returned pointer may not be unique.
```
2024-10-04 17:32:31 +02:00
Andrzej Warzyński
56d6b56739 [mlir][vector] Relax the requirements on broadcast dims (#99341)
NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.

This PR updates the semantics of the `in_bounds` attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:

```mlir
  %read = vector.transfer_read %A[%base1, %base2], %pad
      {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
      : memref<?x?xf32>, vector<9xf32>

  vector.transfer_write %vec, %A[%base1, %base2]
      {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
      : vector<9xf32>, memref<?x?xf32>
```

Note that the value `false` merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to `true` if all non-broadcast dims
are marked as "in bounds". 

Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making `in_bounds` mandatory and to work around the extra requirements
for broadcast dims (those requirements are removed in this PR). All
changes in tests are also reverts of changes from #97049.

For context, here's a PR in which "broadcast" dims were forced to
always be "in-bounds":
  * https://reviews.llvm.org/D102566

(*) See `foldTransferInBoundsAttribute`.
2024-10-04 07:41:20 +01:00
Adam Siemieniuk
6c25604df2 [mlir][xegpu] Convert Vector load and store to XeGPU (#110826)
Adds patterns to lower vector.load|store to XeGPU operations.
2024-10-03 08:59:39 +02:00
Jack Frankland
8a57d82120 [mlir] Add Scalar Broadcast TOSA Depthwise Conv (#110806)
Support broadcasting of depthwise conv2d bias in tosa->linalg named
lowering in the case that bias is a rank-1 tensor with exactly 1
element. In this case TOSA specifies the value should first be broadcast
across the bias dimension and then across the result tensor.

Add `lit` tests for depthwise conv2d with scalar bias and for conv3d
which was already supported but missing coverage.

Signed-off-by: Jack Frankland <jack.frankland@arm.com>
2024-10-03 06:40:15 +01:00
Sergio Afonso
cdb3ebf1e6 [MLIR][OpenMP] Normalize representation of entry block arg-defining clauses (#109809)
This patch updates printing and parsing of operations including clauses
that define entry block arguments to the operation's region. This
impacts `in_reduction`, `map`, `private`, `reduction` and
`task_reduction`.

The proposed representation to be used by all such clauses is the
following:
```
<clause_name>([byref] [@<sym>] %value -> %block_arg [, ...] : <type>[, ...]) {
  ...
}
```

The `byref` tag is only allowed for reduction-like clauses and the
`@<sym>` is required and only allowed for the `private` and
reduction-like clauses. The `map` clause does not accept any of these
two.

This change fixes some currently broken op representations, like
`omp.teams` or `omp.sections` reduction:
```
omp.teams reduction([byref] @<sym> -> %value : <type>) {
^bb0(%block_arg : <type>):
  ...
}
```

Additionally, it addresses some redundancy in the representation of the
previously mentioned cases, as well as e.g. `map` in `omp.target`. The
problem is that the block argument name after the arrow is not checked
in any way, which makes some misleading representations legal:
```mlir
omp.target map_entries(%x -> %arg1, %y -> %arg0, %z -> %doesnt_exist : !llvm.ptr, !llvm.ptr, !llvm.ptr) {
^bb0(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr, %arg2 : !llvm.ptr):
  ...
}
```

In that case, `%x` maps to `%arg0`, contrary to what the representation
states, and `%z` maps to `%arg2`. `%doesnt_exist` is not resolved, so it
would likely cause issues if used anywhere inside of the operation's
region.

The solution implemented in this patch makes it so that values
introduced after the arrow on the representation of these clauses
implicitly define the corresponding entry block arguments, removing the
potential for these problematic representations. This is what is already
implemented for the `private` and `reduction` clauses of `omp.parallel`.

There are a couple of consequences of this change:
- Entry block argument-defining clauses must come at the end of the
operation's representation and in alphabetical order. This is because
they are printed/parsed as part of the region and a standardized
ordering is needed to reliably match op arguments with their
corresponding entry block arguments via the `BlockArgOpenMPOpInterface`.
- We can no longer define per-clause assembly formats to be reused by
all operations that take these clauses, since they must be passed to a
custom printer including the region and arguments of all other entry
block argument-defining clauses. Code duplication and potential for
introducing issues is minimized by providing the generic
`{print,parse}BlockArgRegion` helpers and associated structures.

MLIR and Flang lowering unit tests are updated due to changes in the
order and formatting of impacted operations.
2024-10-01 16:18:36 +01:00
Matthias Springer
2da417e7f6 [mlir][GPU] gpu.printf: Do not emit duplicate format strings (#110504)
Even if the same format string is used multiple times, emit just one
`LLVM:GlobalOp`.
2024-10-01 09:12:08 +02:00
Dimple Prajapati
f8ba021e64 [mlir][spirv] Add gpu printf op lowering to spirv.CL.printf op (#78510)
This change contains the following:
	- adds lowering of the printf op to the spirv.CL.printf op in the GPUToSPIRV pass.
	- fixes Constant decoration parsing for spirv GlobalVariable.
	- minor modification to the spirv.CL.printf op assembly format.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2024-09-30 15:39:13 -04:00
Finlay
af7aa223d2 [MLIR][GPU] Lower subgroup query ops in gpu-to-llvm-spv (#108839)
These ops are:
* gpu.subgroup_id
* gpu.lane_id
* gpu.num_subgroups
* gpu.subgroup_size

---------

Signed-off-by: Finlay Marno <finlay.marno@codeplay.com>
2024-09-26 14:52:12 +01:00
Daniel Hernandez-Juarez
1c47fa9b62 [mlir][AMDGPU] Add support for AMD f16 math library calls (#108809)
In this PR we add support for AMD f16 math library calls
(`__ocml_*_f16`)
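For instance (the op choice is illustrative), an f16 op such as:

```mlir
%s = math.sin %x : f16
// ... can now lower to a device-library call like __ocml_sin_f16 instead of
// requiring promotion to f32.
```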

CC: @krzysz00 @manupak
2024-09-23 12:52:00 -05:00
Daniel Hernandez-Juarez
b014265d99 [mlir][AMDGPU] New gfx12 barrier instructions and update lowering LDSBarrierOp (#109273)
New gfx12 barrier instructions: s.barrier.signal, s.barrier.wait and
s.wait.dscnt, and update the LDSBarrierOp lowering accordingly.

CC: @krzysz00 @manupak @giuseros
2024-09-20 17:41:36 -05:00
Umang Yadav
9f8f1d9890 [MLIR][AMDGPU] Add ability to do 16-bit Memset with HIP APIs (#108587)
CC: @krzysz00  @manupak
2024-09-20 09:53:41 -05:00
Adam Siemieniuk
02d34d800b [mlir][vector][xegpu] Vector to XeGPU conversion pass (#107419)
Add pass for Vector to XeGPU dialect conversion and initial conversion
patterns for vector.transfer_read|write operations.
2024-09-19 15:16:23 -05:00
Benjamin Kramer
ac11945386 [mlir][GPU] block_id has the grid size as its range
2024-09-17 18:38:04 +02:00
Krzysztof Drewniak
a953982cb7 [mlir][GPU] Plumb range information through the NVVM lowerings (#107659)
Update the GPU to NVVM lowerings to correctly propagate range
information on IDs and dimension queries, either from
known_{block,grid}_size attributes or from `upperBound` annotations on
the operations themselves.
2024-09-13 12:07:51 -05:00
Sergio Afonso
6568062ff1 [MLIR][OpenMP] Improve assemblyFormat handling for clause-based ops (#108023)
This patch modifies the representation of `OpenMP_Clause` to allow
definitions to incorporate both required and optional arguments while
still allowing operations including them and overriding the
`assemblyFormat` to take advantage of automatically-populated format
strings.

The proposed approach is to split the `assemblyFormat` clause property
into `reqAssemblyFormat` and `optAssemblyFormat`, and remove the
`isRequired` template and associated `required` property. The
`OpenMP_Op` class, in turn, populates the new `clausesReqAssemblyFormat`
and `clausesOptAssemblyFormat` properties in addition to
`clausesAssemblyFormat`. These properties can be used by clause-based
OpenMP operation definitions to reconstruct parts of the
clause-inherited format string in a more flexible way when overriding
it.

Clause definitions are updated to follow this new approach and some
operation definitions overriding the `assemblyFormat` are simplified by
taking advantage of the improved flexibility, reducing code duplication.
The `verify-openmp-ops` tablegen pass is updated for the new
`OpenMP_Clause` representation.

Some MLIR and Flang unit tests had to be updated due to changes to the
default printing order of clauses on updated operations.
2024-09-13 12:57:41 +01:00
Krzysztof Drewniak
6292ea6879 [mlir][AMDGPU] Remove an old bf16 workaround (#108409)
The AMDGPU backend now implements LLVM's `bfloat` type. Therefore, we no
longer need to type convert MLIR's `bf16` to `i16` during lowerings to
ROCDL.

As a result of this change, we discovered that, while the code for MFMA
and WMMA intrinsics was mainly prepared for this change, we were failing
to bitcast the bf16 results of WMMA operations out from the i16 they're
natively represented as. This commit also fixes that issue.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2024-09-12 17:45:39 -05:00
Nirvedh Meshram
a16164d0c2 [MLIR][ROCDL] Add dynamically legal ops to LowerGpuOpsToROCDLOpsPass (#108302)
Similar to https://github.com/llvm/llvm-project/pull/108266.
After https://github.com/llvm/llvm-project/pull/102971, it is legal to
generate `LLVM::ExpOp` and `LLVM::LogOp` if the type is a float16 or
float32.
2024-09-12 11:20:27 -05:00
Krzysztof Drewniak
9596e83b2a [mlir][AMDGPU] Enable emulating vector buffer_atomic_fadd on gfx11 (#108312)
* Fix a bug introduced by the Chipset refactoring in #107720 where
atomics emulation for adds was mistakenly applied to gfx11+
* Add the case needed for gfx11+ atomic emulation, namely that gfx11
doesn't support atomically adding a v2f16 or v2bf16, thus requiring
MLIR-level legalization for buffer intrinsics that attempt to do such an
addition
* Add tests, including tests for gfx11 atomic emulation

Co-authored-by: Manupa Karunaratne <manupa.karunaratne@amd.com>
2024-09-12 09:47:52 -05:00
Krzysztof Drewniak
90a0be9482 [mlir][LLVM] Refactor how range() annotations are handled for ROCDL intrinsics (#107658)
This commit introduces a ConstantRange attribute to match the
ConstantRange attribute type present in LLVM IR.

It then refactors the LLVM_IntrOpBase so that the basic part of the
intrinsic builder code can be re-used without needing to copy it or
get rid of important context. This, along with adding code for
handling an optional `range` attribute to that same base, allows us to
make the support for range() annotations generic without adding
another bit to IntrOpBase.

This commit then updates the lowering of index intrinsic operations to
use the new ConstantRange attribute and fixes a bug (where we'd be
subtracting 1 from upper bounds instead of adding it on operations
like gpu.block_dim) along the way.

The point of these changes is to enable these range annotations to be
used for the corresponding NVVM operations in a future commit.
2024-09-12 09:46:42 -05:00
Matthias Springer
b9674cb10f [mlir][SPIRV] Make test case more robust (#108388)
This commit is in preparation for #108381, which changes the insertion
point of source materializations during a block type conversion slightly.
2024-09-12 15:34:59 +02:00
Krzysztof Drewniak
aa60a3e4d0 [mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic fadd (#108286)
Extend the lowering of atomic.fadd to support the v2f16 variant
available on some AMDGPU chips.

Re-lands #108238 (and addresses review comments from there)

Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>
2024-09-11 17:51:07 -05:00