Currently, the lowering for vector.step lives
under a folder. This is not ideal if we want
to transform it and defer the
materialization of the constants until much later.
This commit adds a rewrite pattern that
can be applied via the
`transform.structured.vectorize_children_and_apply_patterns`
transform dialect operation.
Moreover, the vector.step rewrite pattern is also
now used in the -convert-vector-to-llvm pass, where
it handles scalable and non-scalable types as
LLVM expects.
As a consequence of removing the vector.step
lowering from its folder, linalg vectorization
will keep vector.step intact.
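As a sketch (hypothetical IR), this is the kind of op that now survives
linalg vectorization instead of being folded into a constant right away:
```mlir
func.func @step() -> vector<4xindex> {
  %0 = vector.step : vector<4xindex>
  return %0 : vector<4xindex>
}
```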
Recently, we added an intrinsic for the elect.sync PTX instruction (PR
104780). This patch updates the corresponding Op in the NVVM dialect
to lower to the intrinsic instead of inline PTX.
The existing test under Conversion/ is migrated to check for the new
pattern. A separate test is added to verify the lowered intrinsic under
the Target/ directory.
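For illustration only (the exact assembly is assumed here), the op in
question looks roughly like this and now maps to the
llvm.nvvm.elect.sync intrinsic rather than an inline-PTX block:
```mlir
%pred = nvvm.elect.sync -> i1
```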
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
The memref.reinterpret_cast operation used to accept input like:
%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10],
strides: [1]
: memref<?xf32> to memref<10xf32>
A problem arises: while lowering, the true offset of %out is %offset,
but its data type indicates an offset of 0. Permitting this
inconsistency can result in incorrect outcomes, as certain passes might
erroneously extract the offset from the data type of %out.
This patch fixes this by enforcing that the return value's data type is
consistent with the input parameters.
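As a sketch (hypothetical IR), the accepted form now carries the dynamic
offset in the result type's layout instead of implying a static offset
of 0:
```mlir
%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10], strides: [1]
    : memref<?xf32> to memref<10xf32, strided<[1], offset: ?>>
```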
This PR fixes an issue related to integer overflow when computing
`(intmax+1)` for `i64` during `tosa-to-linalg` pass for `tosa.cast`.
Found this issue while debugging a numerical mismatch for the
`deeplabv3` model from `torchvision`, represented in the `tosa` dialect
using the `TorchToTosa` pipeline in the `torch-mlir` repository.
`torch.aten.to.dtype` is converted to a `tosa.cast` that casts `f32` to
`i64`. Technically, by the specification, `tosa.cast` doesn't handle
casting `f32` to `i64`.
So it's possible to add a verifier to error out for such tosa ops
instead of producing incorrect code. However, I chose to fix the
overflow issue to still be able to represent the `deeplabv3` model with
`tosa` ops in the above-mentioned pipeline. Open to suggestions if
adding the verifier is more appropriate instead.
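For reference, a sketch of the kind of cast that triggered the overflow
(shapes are hypothetical); its clamping bound `(intmax + 1)` cannot be
represented in `i64`:
```mlir
%0 = tosa.cast %arg0 : (tensor<1x513x513xf32>) -> tensor<1x513x513xi64>
```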
This PR fixes a crash in `VectorToGPU` when the operand of `extOp` is a
function argument, whose defining op cannot be retrieved via
`getDefiningOp`.
Fixes #107967.
This PR updates the cast to bool from IntN to treat any non-zero value
as TRUE. This makes the cast more resilient to non-generic (i.e.
"non-1") TRUE values.
Signed-off-by: Dmitriy Smirnov <dmitriy.smirnov@arm.com>
Since ddf2d62c7d, 0-d vectors have been supported in VectorType. This
patch removes the 0-d vector handling with scalars for the
TransferOpReduceRank pattern. This pattern specifically introduces
tensor.extract_slice during vectorization, which prevents vectorization
from folding transfer_read/transfer_write slices properly. The changes
in the vectorization test files reflect this.
There are other places where lowering patterns still side-step handling
0-d vectors properly by turning them into scalars, but this patch only
focuses on the vector.transfer_x patterns.
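As a sketch (hypothetical IR), a 0-d transfer is now kept as a 0-d
vector rather than being rank-reduced to a scalar:
```mlir
%v = vector.transfer_read %src[], %pad : tensor<f32>, vector<f32>
```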
This patch adds operand bundle support for `llvm.intr.assume`.
This patch actually contains two parts:
- `llvm.intr.assume` now accepts operand bundle related attributes and
operands. `llvm.intr.assume` does not impose constraints on the operand
bundles, but only a small set of operand bundles is meaningful. I plan
to add some of those (e.g. `aligned` and `separate_storage` are what
interest me, but other people may be interested in other operand
bundles as well) in future patches.
- The definitions of `llvm.call`, `llvm.invoke`, and
`llvm.call_intrinsic` define `op_bundle_tags` as an operation property.
It turns out this approach would introduce unnecessary burden if applied
equally to the intrinsic operations, because properties are not
available through `Operation *`, yet we have to operate on
`Operation *` during the import/export of intrinsics. This PR therefore
changes it from a property to an array attribute.
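For illustration only (the operand bundle assembly is assumed here to
mirror the one used by `llvm.call`), an assume carrying an `align`
bundle would look roughly like:
```mlir
llvm.intr.assume %cond ["align"(%ptr, %alignment : !llvm.ptr, i32)] : i1
```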
This patch relands commit d8fadad07c.
I am removing the recently added integration test for various Arith Ops.
These operations and their lowerings are effectively already verified by
the Arith-to-LLVM conversion tests in:
* "mlir/test/Conversion/ArithToLLVM/arith-to-llvm.mlir"
I've noticed that a few variants of `arith.cmpi` were missing in that
file - those are added here as well.
This is a follow-up for this discussion:
* https://github.com/llvm/llvm-project/pull/92272
See also the recent update to our guidelines on e2e tests in MLIR:
* https://github.com/llvm/mlir-www/pull/203
This patch adds operand bundle support for `llvm.intr.assume`.
This patch actually contains two parts:
- `llvm.intr.assume` now accepts operand bundle related attributes and
operands. `llvm.intr.assume` does not impose constraints on the operand
bundles, but only a small set of operand bundles is meaningful. I plan
to add some of those (e.g. `aligned` and `separate_storage` are what
interest me, but other people may be interested in other operand
bundles as well) in future patches.
- The definitions of `llvm.call`, `llvm.invoke`, and
`llvm.call_intrinsic` define `op_bundle_tags` as an operation property.
It turns out this approach would introduce unnecessary burden if applied
equally to the intrinsic operations, because properties are not
available through `Operation *`, yet we have to operate on
`Operation *` during the import/export of intrinsics. This PR therefore
changes it from a property to an array attribute.
Adds tests with scalable vectors for the Vector-To-LLVM conversion pass.
Covers the following Ops:
* `vector.transfer_read`,
* `vector.transfer_write`.
In addition:
* Duplicate tests from "vector-mask-to-llvm.mlir" are removed.
* Tests for xfer_read/xfer_write are moved to a newly created test file,
"vector-xfer-to-llvm.mlir". This follows an existing pattern among
VectorToLLVM conversion tests.
* Tests that exercise both xfer_read and xfer_write have their names
updated to reflect that (e.g. @transfer_read_1d_mask ->
@transfer_read_write_1d_mask)
* @transfer_write_1d_scalable_mask and @transfer_read_1d_scalable_mask
are re-written as @transfer_read_write_1d_mask_scalable. This is to
make it clear that this case is meant to complement
@transfer_read_write_1d_mask.
* @transfer_write_tensor is updated to also test xfer_read.
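For reference, a sketch (hypothetical IR) of the scalable-vector
transfer shape now covered by these tests:
```mlir
%v = vector.transfer_read %mem[%i], %pad {in_bounds = [true]}
    : memref<?xf32>, vector<[4]xf32>
```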
This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating the
verifier for the `LoopWrapperInterface` accordingly.
Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.
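As a sketch (hypothetical IR), a wrapper region now holds only the
nested `omp.loop_nest` and no trailing `omp.terminator`:
```mlir
omp.wsloop {
  omp.loop_nest (%iv) : i32 = (%lb) to (%ub) step (%step) {
    omp.yield
  }
}
```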
There is an extensive list of Lit tests that needed updating to remove
`omp.terminator`s, which adds some noise to this patch, but the actual
changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering
for those ops, `LoopWrapperInterface::verifyImpl()`, the SCF to OpenMP
conversion, and the OpenMP dialect documentation.
Default to Global address space for memrefs that do not have an explicit address space set in the IR.
---------
Co-authored-by: Victor Perez <victor.perez@intel.com>
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Victor Perez <victor.perez@codeplay.com>
Relaxes the vector.transfer_write lowering to allow out-of-bounds
writes. This aligns the lowering with the current hardware
specification, which does not update bytes at out-of-bounds locations
during block stores.
The PR updates the math.powf lowering to produce a NaN result for a
negative base with a fractional exponent, which matches the behaviour
of the C/C++ implementation.
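As a sketch (names assumed), a case like the following, i.e.
(-2.0)^0.5, now lowers to code that yields NaN, matching C/C++ powf:
```mlir
%base = arith.constant -2.0 : f32
%exp = arith.constant 0.5 : f32
%r = math.powf %base, %exp : f32
```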
This was found by inspecting AMDGPU assembly: the arithmetic ops created
there were definitely making their way into the target ISA. A
`LLVM::BitcastOp` seems equivalent, and evaporates as expected in the
target asm.
Along the way, I thought this helper function `mfmaConcatIfNeeded`
could be renamed to `convertMFMAVectorOperand` to better convey its
contract; that way I don't need to think about whether a bitcast is a
legitimate "concat" :-)
---------
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
Adds tests with scalable vectors for the Vector-To-LLVM conversion pass.
Covers the following Ops:
* `vector.insert_strided_slice`
With this change, for every test with fixed-width vectors, there should
be a corresponding example with scalable vectors (for
`vector.insert_strided_slice`). In addition:
* Test function names are updated to more accurately reflect the case
being exercised (e.g. `@insert_strided_index_slice1` ->
`@insert_strided_index_slice_index_2d_into_3d`)
* For consistency, took the liberty of updating some of the function
names for `vector.extract_strided_slice`
* `@insert_strided_slice_scalable` is effectively replaced with
`@insert_strided_slice_f32_2d_into_3d_scalable`
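For reference, a sketch (hypothetical IR) of the scalable case now
exercised:
```mlir
%0 = vector.insert_strided_slice %src, %dst
    {offsets = [0, 0], strides = [1]}
    : vector<[4]xf32> into vector<2x[4]xf32>
```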
The `memref.alloca` lowering computed the allocation size incorrectly
when one of the dimensions had size 0.
Previously:
```
memref.alloca() : memref<10x0x2xf32>
--> llvm.alloca 20xf32
```
Now:
```
memref.alloca() : memref<10x0x2xf32>
--> llvm.alloca 0xf32
```
From the `llvm.alloca` documentation:
```
Allocating zero bytes is legal, but the returned pointer may not be unique.
```
NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.
This PR updates the semantics of the `in_bounds` attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:
```mlir
%read = vector.transfer_read %A[%base1, %base2], %pad
    {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
    : memref<?x?xf32>, vector<9xf32>
vector.transfer_write %vec, %A[%base1, %base2]
    {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
    : vector<9xf32>, memref<?x?xf32>
```
Note that the value `false` merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to `true` if all non-broadcast dims
are marked as "in bounds".
Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making `in_bounds` mandatory and to work around the extra requirements
for broadcast dims (those requirements were removed in this PR). All
changes in tests are also reverts of changes from #97049.
For context, here's a PR in which "broadcast" dims were forced to
always be "in-bounds":
* https://reviews.llvm.org/D102566
(*) See `foldTransferInBoundsAttribute`.
Support broadcasting of the depthwise conv2d bias in the tosa->linalg
named lowering in the case where the bias is a rank-1 tensor with
exactly one element. In this case, TOSA specifies that the value should
first be broadcast across the bias dimension and then across the result
tensor.
Add `lit` tests for depthwise conv2d with a scalar bias and for conv3d,
which was already supported but missing coverage.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
This patch updates the printing and parsing of operations that include
clauses defining entry block arguments for the operation's region. This
impacts `in_reduction`, `map`, `private`, `reduction` and
`task_reduction`.
The proposed representation to be used by all such clauses is the
following:
```
<clause_name>([byref] [@<sym>] %value -> %block_arg [, ...] : <type>[, ...]) {
...
}
```
The `byref` tag is only allowed for reduction-like clauses, and the
`@<sym>` is required and only allowed for the `private` and
reduction-like clauses. The `map` clause does not accept either of
these.
This change fixes some currently broken op representations, like
`omp.teams` or `omp.sections` reduction:
```
omp.teams reduction([byref] @<sym> -> %value : <type>) {
^bb0(%block_arg : <type>):
...
}
```
Additionally, it addresses some redundancy in the representation of the
previously mentioned cases, as well as e.g. `map` in `omp.target`. The
problem is that the block argument name after the arrow is not checked
in any way, which makes some misleading representations legal:
```mlir
omp.target map_entries(%x -> %arg1, %y -> %arg0, %z -> %doesnt_exist : !llvm.ptr, !llvm.ptr, !llvm.ptr) {
^bb0(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr, %arg2 : !llvm.ptr):
...
}
```
In that case, `%x` maps to `%arg0`, contrary to what the representation
states, and `%z` maps to `%arg2`. `%doesnt_exist` is not resolved, so it
would likely cause issues if used anywhere inside of the operation's
region.
The solution implemented in this patch makes it so that values
introduced after the arrow on the representation of these clauses
implicitly define the corresponding entry block arguments, removing the
potential for these problematic representations. This is what is already
implemented for the `private` and `reduction` clauses of `omp.parallel`.
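As a sketch (values and types assumed), the earlier `omp.target`
example would now be written without an explicit entry block header,
since the clause itself defines the block arguments:
```mlir
omp.target map_entries(%x -> %arg0, %y -> %arg1, %z -> %arg2 : !llvm.ptr, !llvm.ptr, !llvm.ptr) {
  ...
}
```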
There are a couple of consequences of this change:
- Entry block argument-defining clauses must come at the end of the
operation's representation and in alphabetical order. This is because
they are printed/parsed as part of the region and a standardized
ordering is needed to reliably match op arguments with their
corresponding entry block arguments via the `BlockArgOpenMPOpInterface`.
- We can no longer define per-clause assembly formats to be reused by
all operations that take these clauses, since they must be passed to a
custom printer including the region and arguments of all other entry
block argument-defining clauses. Code duplication and potential for
introducing issues is minimized by providing the generic
`{print,parse}BlockArgRegion` helpers and associated structures.
MLIR and Flang lowering unit tests are updated due to changes in the
order and formatting of impacted operations.
This change contains the following:
- Adds lowering of the printf op to the spirv.CL.printf op in the
GPUToSPIRV pass.
- Fixes Constant decoration parsing for spirv GlobalVariable.
- Makes a minor modification to the spirv.CL.printf op assembly format.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Update the GPU to NVVM lowerings to correctly propagate range
information on IDs and dimension queries, either from
known_{block,grid}_size attributes or from `upperBound` annotations on
the operations themselves.
This patch modifies the representation of `OpenMP_Clause` to allow
definitions to incorporate both required and optional arguments, while
still allowing operations that include them to override the
`assemblyFormat` and take advantage of automatically-populated format
strings.
The proposed approach is to split the `assemblyFormat` clause property
into `reqAssemblyFormat` and `optAssemblyFormat`, and remove the
`isRequired` template and associated `required` property. The
`OpenMP_Op` class, in turn, populates the new `clausesReqAssemblyFormat`
and `clausesOptAssemblyFormat` properties in addition to
`clausesAssemblyFormat`. These properties can be used by clause-based
OpenMP operation definitions to reconstruct parts of the
clause-inherited format string in a more flexible way when overriding
it.
Clause definitions are updated to follow this new approach and some
operation definitions overriding the `assemblyFormat` are simplified by
taking advantage of the improved flexibility, reducing code duplication.
The `verify-openmp-ops` tablegen pass is updated for the new
`OpenMP_Clause` representation.
Some MLIR and Flang unit tests had to be updated due to changes to the
default printing order of clauses on updated operations.
The AMDGPU backend now implements LLVM's `bfloat` type. Therefore, we no
longer need to type convert MLIR's `bf16` to `i16` during lowerings to
ROCDL.
As a result of this change, we discovered that, while the code for MFMA
and WMMA intrinsics was mostly prepared for this change, we were failing
to bitcast the bf16 results of WMMA operations out of the i16 values
they are natively represented as. This commit also fixes that issue.
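For illustration only (lane count and types assumed), this is the kind
of bitcast now inserted to reinterpret a WMMA result, natively carried
as i16 lanes, as bf16:
```mlir
%res = llvm.bitcast %raw : vector<8xi16> to vector<8xbf16>
```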
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
* Fix a bug introduced by the Chipset refactoring in #107720 where
atomics emulation for adds was mistakenly applied to gfx11+
* Add the case needed for gfx11+ atomic emulation, namely that gfx11
doesn't support atomically adding a v2f16 or v2bf16, thus requiring
MLIR-level legalization for buffer intrinsics that attempt to do such an
addition
* Add tests, including tests for gfx11 atomic emulation
Co-authored-by: Manupa Karunaratne <manupa.karunaratne@amd.com>
This commit introduces a ConstantRange attribute to match the
ConstantRange attribute type present in LLVM IR.
It then refactors the LLVM_IntrOpBase so that the basic part of the
intrinsic builder code can be re-used without needing to copy it or
get rid of important context. This, along with adding code for
handling an optional `range` attribute to that same base, allows us to
make the support for range() annotations generic without adding
another bit to IntrOpBase.
This commit then updates the lowering of index intrinsic operations to
use the new ConstantRange attribute and fixes a bug (where we'd be
subtracting 1 from upper bounds instead of adding it on operations
like gpu.block_dim) along the way.
The point of these changes is to enable these range annotations to be
used for the corresponding NVVM operations in a future commit.
Extend the lowering of atomic.fadd to support the v2f16 variant
available on some AMDGPU chips.
Re-lands #108238 (and addresses review comments from there)
Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>