clang-p2996

Author	SHA1	Message	Date
Alexander Belyaev	bfcd3fa825	[mlir] Add result name for gpu.block_id and gpu.thread_id ops. (#83393 ) expand-arith-ops.mlir fails on windows, but this is unrelated to this PR	2024-02-29 10:57:09 +01:00
Guray Ozen	d7f59c8fb8	[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#81489 ) This PR moves lowering of the math dialect later in the pipeline, because the math dialect is lowered correctly by `createConvertGpuOpsToNVVMOps` for GPU targets, so that conversion needs to run first. Reland #78556	2024-02-13 08:31:42 +01:00
Benjamin Kramer	98dbc688de	Revert "[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556 )" This reverts commit `74bf0b1cd9`. The test always fails. | mlir/test/Dialect/GPU/test-nvvm-pipeline.mlir:23:16: error: CHECK-PTX: expected string not found in input | // CHECK-PTX: __nv_expf https://lab.llvm.org/buildbot/#/builders/61/builds/53789	2024-01-31 17:41:21 +01:00
Guray Ozen	74bf0b1cd9	[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556 ) This PR moves lowering of the math dialect later in the pipeline, because the math dialect is lowered correctly by `createConvertGpuOpsToNVVMOps` for GPU targets, so that conversion needs to run first.	2024-01-31 15:24:32 +01:00
Matthias Springer	ce7cc723b9	[mlir][memref] `memref.subview`: Verify result strides The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases. Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit. Also refactor the implementation a bit: * If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`. * `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments. * `produceSubViewErrorMsg` does not have to be templatized. * Fix comment and add additional assert to `ExpandStridedMetadata.cpp`, to make sure that the memref.subview verifier is in sync with the memref.subview -> memref.reinterpret_cast lowering. Note: This change is identical to #79865, but with a fixed comment and an additional assert in `ExpandStridedMetadata.cpp`. (I reverted #79865 in #80116, but the implementation was actually correct, just the comment in `ExpandStridedMetadata.cpp` was confusing.)	2024-01-31 09:28:53 +00:00
Matthias Springer	96c907dbce	Revert "[mlir][memref] `memref.subview`: Verify result strides" (#80116 ) Reverts llvm/llvm-project#79865 I think there is a bug in the stride computation in `SubViewOp::inferResultType`. (Was already there before this change.) Reverting this commit for now and updating the original pull request with a fix and more test cases.	2024-01-31 09:35:13 +01:00
Matthias Springer	db49319264	[mlir][memref] `memref.subview`: Verify result strides (#79865 ) The `memref.subview` verifier currently checks result shape, element type, memory space and offset of the result type. However, the strides of the result type are currently not verified. This commit adds verification of result strides for non-rank reducing ops and fixes invalid IR in test cases. Verification of result strides for ops with rank reductions is more complex (and there could be multiple possible result types). That is left for a separate commit. Also refactor the implementation a bit: * If `computeMemRefRankReductionMask` could not compute the dropped dimensions, there must be something wrong with the op. Return `FailureOr` instead of `std::optional`. * `isRankReducedMemRefType` did much more than just checking whether the op has rank reductions or not. Inline the implementation into the verifier and add better comments. * `produceSubViewErrorMsg` does not have to be templatized.	2024-01-31 09:14:48 +01:00
Matthias Springer	fbb62d449c	[mlir][bufferization] Buffer deallocation: Make op preconditions stricter (#75127 ) The buffer deallocation pass checks the IR ("operation preconditions") to make sure that there is no IR that is unsupported. In such a case, the pass signals a failure. The pass now rejects all ops with unknown memory effects. We do not know whether such an op allocates memory or not. Therefore, the buffer deallocation pass does not know whether a deallocation op should be inserted or not. Memory effects are queried from the `MemoryEffectOpInterface` interface. Ops that do not implement this interface but have the `RecursiveMemoryEffects` trait do not have any side effects (apart from the ones that their nested ops may have). Unregistered ops are now rejected by the pass because they do not implement the `MemoryEffectOpInterface` and neither do we know if they have `RecursiveMemoryEffects` or not. All test cases that currently have unregistered ops are updated to use registered ops.	2024-01-21 11:10:09 +01:00
Fabian Mora	5b4f2b906b	[mlir][gpu] Add an offloading handler attribute to `gpu.module` (#78047 ) This patch adds an optional offloading handler attribute to the `gpu.module` op. This attribute will be used during the `gpu-module-to-binary` pass to override the offloading handler used in the `gpu.binary` op.	2024-01-15 16:58:10 -05:00
Fabian Mora	a1eaed7a21	[mlir][gpu] Fix GPU YieldOP format and traits (#78006 ) This patch adds assembly format to `gpu::YieldOp`. It also adds the return like trait, to make it compatible with `RegionBranchOpInterface`.	2024-01-14 21:19:20 -05:00
Guray Ozen	5b33cff397	[mlir][gpu] Add Support for Cluster of Thread Blocks in `gpu.launch` (#76924 )	2024-01-06 11:17:01 +01:00
Andrzej Warzyński	ca5d34ec71	[mlir][TD] Fix the order of return handles (#76929 ) Replace (in tests and docs): %forall, %tiled = transform.structured.tile_using_forall with (updated order of return handles): %tiled, %forall = transform.structured.tile_using_forall Similar change is applied to (in the TD tutorial): transform.structured.fuse_into_containing_op This update makes sure that the tests/documentation are consistent with the Op specifications. Follow-up for #67320 which updated the order of the return handles for `tile_using_forall`.	2024-01-04 12:54:16 +00:00
Jakub Kuderski	c0345b4648	[mlir][gpu] Add subgroup_reduce to shuffle lowering (#76530 ) This supports both the scalar and the vector multi-reduction cases.	2024-01-02 16:14:22 -05:00
Jakub Kuderski	2af186f9bd	[mlir][gpu] Add patterns to break down subgroup reduce (#76271 ) The new patterns break down subgroup reduce ops with vector values into a sequence of subgroup reductions that fit the native shuffle size. The maximum/native shuffle size is parametrized. The overall goal is to be able to perform multi-element reductions with a sequence of `gpu.shuffle` ops.	2023-12-28 14:39:46 -05:00
Jakub Kuderski	72003adf6b	[mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015 ) Each vector element is reduced independently, which is a form of multi-reduction. The plan is to allow for gradual lowering of multi-reduction that results in fewer `gpu.shuffle` ops at the end: 1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d `gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32 For example we can perform 2 independent f16 reductions with a series of `gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.	2023-12-21 11:55:43 -05:00
Matthias Springer	f7096428b4	[mlir][GPU] Add `RecursiveMemoryEffects` to `gpu.launch` (#75315 ) Infer the side effects of `gpu.launch` from its body.	2023-12-20 15:25:25 +09:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Fabian Mora	419c45a325	[mlir][gpu] Fix crash in `gpu-module-to-binary` (#75477 ) This patch fixes the error in issue #75434. The crash was being caused by not checking for a lack of target attributes in a GPU module. It's now considered an error to invoke the pass with a GPU module with no target attributes.	2023-12-14 14:03:10 -05:00
Jakub Kuderski	7eccd52842	Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `dd09221a29` and relands https://github.com/llvm/llvm-project/pull/73423. * Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.	2023-11-27 11:38:18 -05:00
Jakub Kuderski	dd09221a29	Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `e0aac8c88d`. I'm seeing some nvidia integration test failures: https://lab.llvm.org/buildbot/#/builders/61/builds/52334.	2023-11-27 11:29:23 -05:00
Jakub Kuderski	e0aac8c88d	[mlir][gpu] Align reduction operations with vector combining kinds (#73423 ) The motivation for this change is explained in https://github.com/llvm/llvm-project/issues/72354. Before this change, we could not tell between signed/unsigned minimum/maximum and NaN treatment for floating point values. The mapping of old reduction operations to the new ones is as follows: * `min` --> `minsi` for ints, `minf` for floats * `max` --> `maxsi` for ints, `maxf` for floats New reduction kinds not represented in the old enum: `minui`, `maxui`, `minimumf`, `maximumf`. As a next step, I would like to have a common definition of combining kinds used by the `vector` and `gpu` dialects. Separately, the GPU to SPIR-V lowering does not yet properly handle zero and NaN values -- the behavior of floating point min/max group reductions is not specified by the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-11-27 11:19:20 -05:00
Guray Ozen	edf5cae739	[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871 ) NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA). It is a new level of parallelism, allowing clustering of Cooperative Thread Arrays (CTA) to synchronize and communicate through shared memory while running concurrently. This PR enables support for CGA within the `gpu.launch_func` in the GPU dialect. It extends `gpu.launch_func` to accommodate this functionality. The GPU dialect remains architecture-agnostic, so we've added CGA functionality as optional parameters. We want to leverage mechanisms that we have in the GPU dialects such as outlining and kernel launching, making it a practical and convenient choice. An example of this implementation can be seen below: ``` gpu.launch_func @kernel_module::@kernel clusters in (%1, %0, %0) // <-- Optional blocks in (%0, %0, %0) threads in (%0, %0, %0) ``` The PR also introduces index and dimensions Ops specific to clusters, binding them to NVVM Ops: ``` %cidX = gpu.cluster_id x %cidY = gpu.cluster_id y %cidZ = gpu.cluster_id z %cdimX = gpu.cluster_dim x %cdimY = gpu.cluster_dim y %cdimZ = gpu.cluster_dim z ``` We will introduce cluster support in `gpu.launch` Op in an upcoming PR. See [the documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays) provided by NVIDIA for details.	2023-11-27 11:05:07 +01:00
Guray Ozen	ea84897ba3	[mlir][gpu] Introduce `gpu.dynamic_shared_memory` Op (#71546 ) While the `gpu.launch` Op allows setting the size via the `dynamic_shared_memory_size` argument, accessing the dynamic shared memory is very convoluted. This PR implements the proposed Op, `gpu.dynamic_shared_memory`, which aims to simplify the utilization of dynamic shared memory. RFC: https://discourse.llvm.org/t/rfc-simplifying-dynamic-shared-memory-access-in-gpu/ Proposal from RFC: This PR proposes the `gpu.dynamic.shared.memory` Op to use the dynamic shared memory feature efficiently. It is a powerful feature that enables the allocation of shared memory at runtime with the kernel launch on the host. Afterwards, the memory can be accessed directly from the device. I believe a similar story exists for AMDGPU. Current way of using dynamic shared memory with MLIR: Let me illustrate the challenges of using dynamic shared memory in MLIR with an example below. The process involves several steps: - memref.global 0-sized array LLVM's NVPTX backend expects - dynamic_shared_memory_size Set the size of dynamic shared memory - memref.get_global Access the global symbol - reinterpret_cast and subview Many OPs for pointer arithmetic ``` // Step 1. Create 0-sized global symbol. Manually set the alignment memref.global "private" @dynamicShmem : memref<0xf16, 3> { alignment = 16 } func.func @main() { // Step 2. Allocate shared memory gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 { // Step 3. Access the global object %shmem = memref.get_global @dynamicShmem : memref<0xf16, 3> // Step 4. A sequence of `memref.reinterpret_cast` and `memref.subview` operations.
%4 = memref.reinterpret_cast %shmem to offset: [0], sizes: [14, 64, 128], strides: [8192,128,1] : memref<0xf16, 3> to memref<14x64x128xf16,3> %5 = memref.subview %4[7, 0, 0][7, 64, 128][1,1,1] : memref<14x64x128xf16,3> to memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> %6 = memref.subview %5[2, 0, 0][1, 64, 128][1,1,1] : memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> to memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> %7 = memref.subview %6[0, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 73728>, 3> %8 = memref.subview %6[32, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 77824>, 3> // Step.5 Use "test.use.shared.memory"(%7) : (memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>) -> (index) "test.use.shared.memory"(%8) : (memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>) -> (index) gpu.terminator } ``` Let’s write the program above with that: ``` func.func @main() { gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 { %i = arith.constant 18 : index // Step 1: Obtain shared memory directly %shmem = gpu.dynamic_shared_memory : memref<?xi8, 3> %c147456 = arith.constant 147456 : index %c155648 = arith.constant 155648 : index %7 = memref.view %shmem[%c147456][] : memref<?xi8, 3> to memref<64x64xf16, 3> %8 = memref.view %shmem[%c155648][] : memref<?xi8, 3> to memref<64x64xf16, 3> // Step 2: Utilize the shared memory "test.use.shared.memory"(%7) : (memref<64x64xf16, 3>) -> (index) "test.use.shared.memory"(%8) : (memref<64x64xf16, 3>) -> (index) } } ``` This PR resolves #72513	2023-11-16 14:42:17 +01:00
drazi	9a3d3c7093	generalize pass gpu-kernel-outlining for symbol op (#72074 ) This PR generalizes the gpu-kernel-outlining pass to handle ops implementing `SymbolOpInterface` instead of just `func::FuncOp`. Before this change, the pass skipped `llvm.func`. ```mlir module { llvm.func @main() { %c1 = arith.constant 1 : index gpu.launch blocks(%arg0, %arg1, %arg2) in (%arg6 = %c1, %arg7 = %c1, %arg8 = %c1) threads(%arg3, %arg4, %arg5) in (%arg9 = %c1, %arg10 = %c1, %arg11 = %c1) { gpu.terminator } llvm.return } } ``` After this change, the pass can handle `llvm.func` as well.	2023-11-12 21:48:49 -08:00
spaceotter	00c3c73189	[mlir][gpu] Separate the barrier elimination code from transform ops (#71762 ) Allows the barrier elimination code to be run from C++ as well. The code from the transform dialect is copied as-is; the pass and populate functions have been added at the end. Co-authored-by: Eric Eaton <eric@nod-labs.com>	2023-11-10 17:59:09 -08:00
spaceotter	51af040b22	[mlir][gpu] Eliminate redundant gpu.barrier ops (#71575 ) Adds a canonicalizer for gpu.barrier that gets rid of duplicates. Co-authored-by: Eric Eaton <eric@nod-labs.com>	2023-11-09 18:06:20 -05:00
Fabian Mora	42630689e2	[mlir][gpu] Clean GPU `Passes.h` from external SPIRV includes (#71331 ) Removes the `SPIRVAttributes.h` header from `GPU/Transforms/Passes.h`	2023-11-05 17:06:04 -08:00
Sang Ik Lee	2dace04521	[mlir][spirv] Implement gpu::TargetAttrInterface (#69949 ) This commit implements gpu::TargetAttrInterface for the SPIR-V target attribute. The plan is to use this to enable the GPU compilation pipeline for OpenCL kernels later. The changes do not impact Vulkan shaders using mlir-vulkan-runner. A new GPU dialect transform pass, spirv-attach-target, is implemented for attaching the attribute from the CLI. The gpu-module-to-binary pass now works with a GPU module that has a SPIR-V module with OpenCL kernel functions inside.	2023-11-05 08:11:53 -08:00
Christian Ulmann	7ed96b1c0d	[MLIR][LLVM] Remove last typed pointer remnants from tests (#71232 ) This commit removes all LLVM dialect typed pointers from the lit tests. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502	2023-11-04 14:13:31 +01:00
Oleksandr "Alex" Zinenko	e4384149b5	[mlir] use transform-interpreter in test passes (#70040 ) Update most test passes to use the transform-interpreter pass instead of the test-transform-dialect-interpreter-pass. The new "main" interpreter pass has a named entry point instead of looking up the top-level op with `PossibleTopLevelOpTrait`, which is arguably a more understandable interface. The change is mechanical, rewriting an unnamed sequence into a named one and wrapping the transform IR into a module when necessary. Add an option to the transform-interpreter pass to target a tagged payload op instead of the root anchor op, which is also useful for repro generation. Only the tests in the transform dialect proper and the examples have not been updated yet. These will be updated separately after a more careful consideration of testing coverage of the transform interpreter logic.	2023-10-24 16:12:34 +02:00
Aviad Cohen	7060422265	[mlir][Linalg]: Optimize linalg generic in transform::PromoteOp to avoid unnecessary copies (#68555 ) If the operands are not used in the payload of linalg generic operations, there is no need to copy them before the operation.	2023-10-14 10:40:45 +03:00
Aart Bik	39038177ee	[mlir][sparse][gpu] add CSC and BSR format to cuSparse GPU ops (#67509 ) This adds two cuSparse formats to the GPU dialect support, together with proper lowering and runtime CUDA support. Also fixes a few minor omissions.	2023-09-27 09:32:25 -07:00
Oleksandr "Alex" Zinenko	96ff0255f2	[mlir] cleanup of structured.tile* transform ops (#67320 ) Rename and restructure tiling-related transform ops from the structured extension to be more homogeneous. In particular, all ops now follow a consistent naming scheme: - `transform.structured.tile_using_for`; - `transform.structured.tile_using_forall`; - `transform.structured.tile_reduction_using_for`; - `transform.structured.tile_reduction_using_forall`. This drops the "_op" naming artifact from `tile_to_forall_op` that shouldn't have been included in the first place, consistently specifies the name of the control flow op to be produced for loops (instead of `tile_reduction_using_scf` since `scf.forall` also belongs to `scf`), and opts for the `using` connector to avoid ambiguity. The loops produced by tiling are now systematically placed as trailing results of the transform op. While this required changing 3 out of 4 ops (except for `tile_using_for`), this is the only choice that makes sense when producing multiple `scf.for` ops that can be associated with a variadic number of handles. This choice is also most consistent with other transform ops from the structured extension, in particular with fusion ops, that produce the structured op as the leading result and the loop as the trailing result.	2023-09-26 09:14:29 +02:00
Tobias Gysi	85175edd4e	[mlir][llvm] Replace NullOp by ZeroOp (#67183 ) This revision replaces the LLVM dialect NullOp by the recently introduced ZeroOp. The ZeroOp is more generic in the sense that it represents zero values of any LLVM type rather than null pointers only. This is a follow-up to https://github.com/llvm/llvm-project/pull/65508	2023-09-25 11:11:52 +02:00
Martin Erhart	522c1d0eea	[mlir][gpu][bufferization] Implement BufferDeallocationOpInterface for gpu.terminator (#66880 ) This is necessary to support deallocation of IR with gpu.launch operations because it does not implement the RegionBranchOpInterface. Implementing the interface would require it to support regions with unstructured control flow and produced arguments/results.	2023-09-20 12:28:28 +02:00
Fabian Mora	5093413a50	[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220 ) This patch adds an NVPTX compilation path that enables JIT compilation on NVIDIA targets. The following modifications were performed: 1. Adding a format field to the GPU object attribute, allowing the translation attribute to use the correct runtime function to load the module. Likewise, a dictionary attribute was added to add any possible extra options. 2. Adding the `createObject` method to `GPUTargetAttrInterface`; this method returns a GPU object from a binary string. 3. Adding the function `mgpuModuleLoadJIT`, which is only available for NVIDIA GPUs, as there is no equivalent for AMD. 4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify the format to use during testing.	2023-09-14 18:00:27 -04:00
Nicolas Vasilache	92f088d335	[mlir][gpu][transform] Provide better error messages and avoid crashing in MapForallToBlocks. This revision addresses issues surfaced in https://reviews.llvm.org/D159093	2023-09-04 14:11:38 +00:00
Aart Bik	289f7231f9	[mlir][sparse][gpu] minor code cleanup for sparse gpu ops Consistent order of ops and related methods. Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp since this is a general utility for sparse matrices, not specific to GEMM ops only. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D157922	2023-08-14 15:08:57 -07:00
Fabian Mora	43752a2aa3	[mlir][gpu] Add the `gpu-module-to-binary` pass. For an explanation of these patches see D154153. Commit message: This pass converts GPU modules into GPU binaries, serializing all targets present in a GPU module by invoking the `serializeToObject` target attribute method. Depends on D154147 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154149	2023-08-12 00:24:53 +00:00
Fabian Mora	8ae074b195	[mlir][gpu] Add the Select Object compilation attribute. For an explanation of these patches see D154153. Commit message: This patch adds the default offloading handler for GPU binary ops: `#gpu.select_object`, it selects the object to embed based on an index or a target attribute, embedding the object as a global string and launches the kernel using the scheme used in the GPU to LLVM pass. Depends on D154137 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154147	2023-08-11 22:00:35 +00:00
Fabian Mora	a63db3f5f5	[mlir][gpu] Modifies `gpu.launch_func` to allow lowering it after gpu-to-llvm. For an explanation of these patches see D154153. Commit message: In order to lower `gpu.launch_func` after running `gpu-to-llvm`, it must be able to handle lowered types (e.g., index -> i64). This patch also allows the op to refer to GPU binaries and not only GPU modules. Depends on D154132. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154137	2023-08-11 21:56:37 +00:00
Fabian Mora	068213130d	[mlir][ROCDL] Adds the ROCDL target attribute. For an explanation of these patches see D154153. Commit message: This patch adds the ROCDL target attribute for serializing GPU modules into strings containing HSAco. Depends on D154117 Differential Revision: https://reviews.llvm.org/D154129	2023-08-11 21:44:05 +00:00
Fabian Mora	1e77536e1d	Revert "[mlir][ROCDL] Adds the ROCDL target attribute." This reverts commit `6a0feb1503`.	2023-08-11 19:50:05 +00:00
Fabian Mora	bf24fb81ac	[mlir][gpu] Add `gpu.binary` op and `#gpu.object` attribute. For an explanation of these patches see D154153. Commit message: Adds the `#gpu.object` attribute for holding a binary object and the target attribute used to create it. Also adds the `gpu.binary` operation used to store GPU objects. Depends on D154108 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154132	2023-08-11 19:48:18 +00:00
Fabian Mora	6a0feb1503	[mlir][ROCDL] Adds the ROCDL target attribute. For an explanation of these patches see D154153. Commit message: This patch adds the ROCDL target attribute for serializing GPU modules into strings containing HSAco. Depends on D154117 Reviewed By: mehdi_amini, krzysz00 Differential Revision: https://reviews.llvm.org/D154129	2023-08-11 19:43:59 +00:00
Aart Bik	6c4cd7a13e	[mlir][sparse][gpu] refine sparse gpu round-trip and lowering test Tests had become inconsistent, and contained a few slip ups (e.g. non-async versions did not lower) Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157666	2023-08-10 17:18:59 -07:00
Aart Bik	95a6c509c9	[mlir][sparse][gpu] add set csr pointers, remove estimate op, fix bugs Rationale: Since we only support default algorithm for SpGEMM, we can remove the estimate op (for now at least). This also introduces the set csr pointers op, and fixes a few bugs in the existing lowering for the SpGEMM breakdown. This revision paves the way for actual recognition of SpGEMM in the sparsifier. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157645	2023-08-10 13:52:47 -07:00
Ivan Butygin	793ee2bf08	[mlir][gpu] Add DecomposeMemrefsPass Some GPU backends (SPIR-V) lower memrefs to bare pointers, so lowering fails for dynamically sized/strided memrefs. This pass extracts sizes and strides via `memref.extract_strided_metadata` outside the `gpu.launch` body, does the index/offset calculation explicitly, and then reconstructs memrefs via `memref.reinterpret_cast`. `memref.reinterpret_cast` is then lowered via https://reviews.llvm.org/D155011 Differential Revision: https://reviews.llvm.org/D155247	2023-08-10 22:28:05 +02:00
Mehdi Amini	363b655920	Finish renaming getOperandSegmentSizeAttr() from `operand_segment_sizes` to `operandSegmentSizes` This renaming started with the native ODS support for properties; this completes it. A mass automated textual rename seems safe for most codebases. Also drop the ods prefix to keep the accessors the same as they were before this change: properties.odsOperandSegmentSizes reverts back to: properties.operandSegmentSizes The ODS prefix was creating divergence between all the places and made it harder to be consistent. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D157173	2023-08-09 19:37:01 -07:00
Ivan Butygin	b13248f997	Revert "[mlir][gpu] Add DecomposeMemrefsPass" Broke some bots This reverts commit `2b5b2bfef1`.	2023-08-10 03:07:28 +02:00

238 Commits