clang-p2996

Author	SHA1	Message	Date
Jakub Kuderski	72003adf6b	[mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015 ) Each vector element is reduced independently, which is a form of multi-reduction. The plan is to allow for gradual lowering of multi-reduction that results in fewer `gpu.shuffle` ops at the end: 1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d `gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32 For example we can perform 2 independent f16 reductions with a series of `gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.	2023-12-21 11:55:43 -05:00
Matthias Springer	f7096428b4	[mlir][GPU] Add `RecursiveMemoryEffects` to `gpu.launch` (#75315 ) Infer the side effects of `gpu.launch` from its body.	2023-12-20 15:25:25 +09:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Fabian Mora	419c45a325	[mlir][gpu] Fix crash in `gpu-module-to-binary` (#75477 ) This patch fixes the error in issue #75434. The crash was being caused by not checking for a lack of target attributes in a GPU module. It's now considered an error to invoke the pass with a GPU module with no target attributes.	2023-12-14 14:03:10 -05:00
Jakub Kuderski	7eccd52842	Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `dd09221a29` and relands https://github.com/llvm/llvm-project/pull/73423. * Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.	2023-11-27 11:38:18 -05:00
Jakub Kuderski	dd09221a29	Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `e0aac8c88d`. I'm seeing some nvidia integration test failures: https://lab.llvm.org/buildbot/#/builders/61/builds/52334.	2023-11-27 11:29:23 -05:00
Jakub Kuderski	e0aac8c88d	[mlir][gpu] Align reduction operations with vector combining kinds (#73423 ) The motivation for this change is explained in https://github.com/llvm/llvm-project/issues/72354. Before this change, we could not tell between signed/unsigned minimum/maximum and NaN treatment for floating point values. The mapping of old reduction operations to the new ones is as follows: * `min` --> `minsi` for ints, `minf` for floats * `max` --> `maxsi` for ints, `maxf` for floats New reduction kinds not represented in the old enum: `minui`, `maxui`, `minimumf`, `maximumf`. As a next step, I would like to have a common definition of combining kinds used by the `vector` and `gpu` dialects. Separately, the GPU to SPIR-V lowering does not yet properly handle zero and NaN values -- the behavior of floating point min/max group reductions is not specified by the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-11-27 11:19:20 -05:00
Guray Ozen	edf5cae739	[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871 ) NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA). It is a new level of parallelism, allowing clustering of Cooperative Thread Arrays (CTA) to synchronize and communicate through shared memory while running concurrently. This PR enables support for CGA within the `gpu.launch_func` in the GPU dialect. It extends `gpu.launch_func` to accommodate this functionality. The GPU dialect remains architecture-agnostic, so we've added CGA functionality as optional parameters. We want to leverage mechanisms that we have in the GPU dialects such as outlining and kernel launching, making it a practical and convenient choice. An example of this implementation can be seen below: ``` gpu.launch_func @kernel_module::@kernel clusters in (%1, %0, %0) // <-- Optional blocks in (%0, %0, %0) threads in (%0, %0, %0) ``` The PR also introduces index and dimensions Ops specific to clusters, binding them to NVVM Ops: ``` %cidX = gpu.cluster_id x %cidY = gpu.cluster_id y %cidZ = gpu.cluster_id z %cdimX = gpu.cluster_dim x %cdimY = gpu.cluster_dim y %cdimZ = gpu.cluster_dim z ``` We will introduce cluster support in `gpu.launch` Op in an upcoming PR. See [the documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays) provided by NVIDIA for details.	2023-11-27 11:05:07 +01:00
Guray Ozen	ea84897ba3	[mlir][gpu] Introduce `gpu.dynamic_shared_memory` Op (#71546 ) While the `gpu.launch` Op allows setting the size via the `dynamic_shared_memory_size` argument, accessing the dynamic shared memory is very convoluted. This PR implements the proposed Op, `gpu.dynamic_shared_memory`, which aims to simplify the use of dynamic shared memory. RFC: https://discourse.llvm.org/t/rfc-simplifying-dynamic-shared-memory-access-in-gpu/ Proposal from RFC: This PR proposes the `gpu.dynamic.shared.memory` Op to use the dynamic shared memory feature efficiently. Dynamic shared memory is a powerful feature that enables the allocation of shared memory at runtime with the kernel launch on the host. Afterwards, the memory can be accessed directly from the device. I believe a similar story exists for AMDGPU. Current way of using dynamic shared memory with MLIR: Let me illustrate the challenges of using dynamic shared memory in MLIR with an example below. The process involves several steps: - memref.global: 0-sized array that LLVM's NVPTX backend expects - dynamic_shared_memory_size: set the size of dynamic shared memory - memref.get_global: access the global symbol - reinterpret_cast and subview: many Ops for pointer arithmetic ``` // Step 1. Create 0-sized global symbol. Manually set the alignment memref.global "private" @dynamicShmem : memref<0xf16, 3> { alignment = 16 } func.func @main() { // Step 2. Allocate shared memory gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 { // Step 3. Access the global object %shmem = memref.get_global @dynamicShmem : memref<0xf16, 3> // Step 4. A sequence of `memref.reinterpret_cast` and `memref.subview` operations. 
%4 = memref.reinterpret_cast %shmem to offset: [0], sizes: [14, 64, 128], strides: [8192,128,1] : memref<0xf16, 3> to memref<14x64x128xf16,3> %5 = memref.subview %4[7, 0, 0][7, 64, 128][1,1,1] : memref<14x64x128xf16,3> to memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> %6 = memref.subview %5[2, 0, 0][1, 64, 128][1,1,1] : memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> to memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> %7 = memref.subview %6[0, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 73728>, 3> %8 = memref.subview %6[32, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 77824>, 3> // Step.5 Use "test.use.shared.memory"(%7) : (memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>) -> (index) "test.use.shared.memory"(%8) : (memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>) -> (index) gpu.terminator } ``` Let’s write the program above with that: ``` func.func @main() { gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 { %i = arith.constant 18 : index // Step 1: Obtain shared memory directly %shmem = gpu.dynamic_shared_memory : memref<?xi8, 3> %c147456 = arith.constant 147456 : index %c155648 = arith.constant 155648 : index %7 = memref.view %shmem[%c147456][] : memref<?xi8, 3> to memref<64x64xf16, 3> %8 = memref.view %shmem[%c155648][] : memref<?xi8, 3> to memref<64x64xf16, 3> // Step 2: Utilize the shared memory "test.use.shared.memory"(%7) : (memref<64x64xf16, 3>) -> (index) "test.use.shared.memory"(%8) : (memref<64x64xf16, 3>) -> (index) } } ``` This PR resolves #72513	2023-11-16 14:42:17 +01:00
drazi	9a3d3c7093	generalize pass gpu-kernel-outlining for symbol op (#72074 ) This PR generalizes the gpu-kernel-outlining pass to handle ops implementing `SymbolOpInterface` instead of just `func::FuncOp`. Before this change, the pass skipped `llvm.func`. ```mlir module { llvm.func @main() { %c1 = arith.constant 1 : index gpu.launch blocks(%arg0, %arg1, %arg2) in (%arg6 = %c1, %arg7 = %c1, %arg8 = %c1) threads(%arg3, %arg4, %arg5) in (%arg9 = %c1, %arg10 = %c1, %arg11 = %c1) { gpu.terminator } llvm.return } } ``` After this change, the pass can handle `llvm.func` as well.	2023-11-12 21:48:49 -08:00
spaceotter	00c3c73189	[mlir][gpu] Separate the barrier elimination code from transform ops (#71762 ) Allows the barrier elimination code to be run from C++ as well. The code from the transform dialect is copied as-is; the pass and populate functions have been added at the end. Co-authored-by: Eric Eaton <eric@nod-labs.com>	2023-11-10 17:59:09 -08:00
spaceotter	51af040b22	[mlir][gpu] Eliminate redundant gpu.barrier ops (#71575 ) Adds a canonicalizer for gpu.barrier that gets rid of duplicates. Co-authored-by: Eric Eaton <eric@nod-labs.com>	2023-11-09 18:06:20 -05:00
Fabian Mora	42630689e2	[mlir][gpu] Clean GPU `Passes.h` from external SPIRV includes (#71331 ) Removes the `SPIRVAttributes.h` header from `GPU/Transforms/Passes.h`	2023-11-05 17:06:04 -08:00
Sang Ik Lee	2dace04521	[mlir][spirv] Implement gpu::TargetAttrInterface (#69949 ) This commit implements gpu::TargetAttrInterface for the SPIR-V target attribute. The plan is to use this to enable the GPU compilation pipeline for OpenCL kernels later. The changes do not impact Vulkan shaders using mlir-vulkan-runner. A new GPU dialect transform pass, spirv-attach-target, is implemented for attaching the attribute from the CLI. The gpu-module-to-binary pass now works with a GPU module that has a SPIR-V module with OpenCL kernel functions inside.	2023-11-05 08:11:53 -08:00
Christian Ulmann	7ed96b1c0d	[MLIR][LLVM] Remove last typed pointer remnants from tests (#71232 ) This commit removes all LLVM dialect typed pointers from the lit tests. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502	2023-11-04 14:13:31 +01:00
Oleksandr "Alex" Zinenko	e4384149b5	[mlir] use transform-interpreter in test passes (#70040 ) Update most test passes to use the transform-interpreter pass instead of the test-transform-dialect-interpreter-pass. The new "main" interpreter pass has a named entry point instead of looking up the top-level op with `PossibleTopLevelOpTrait`, which is arguably a more understandable interface. The change is mechanical, rewriting an unnamed sequence into a named one and wrapping the transform IR into a module when necessary. Add an option to the transform-interpreter pass to target a tagged payload op instead of the root anchor op, which is also useful for repro generation. Only the tests in the transform dialect proper and the examples have not been updated yet. These will be updated separately after a more careful consideration of testing coverage of the transform interpreter logic.	2023-10-24 16:12:34 +02:00
Aviad Cohen	7060422265	[mlir][Linalg]: Optimize linalg generic in transform::PromoteOp to avoid unnecessary copies (#68555 ) If the operands are not used in the payload of linalg generic operations, there is no need to copy them before the operation.	2023-10-14 10:40:45 +03:00
Aart Bik	39038177ee	[mlir][sparse][gpu] add CSC and BSR format to cuSparse GPU ops (#67509 ) This adds two cuSparse formats to the GPU dialect support, together with proper lowering and runtime CUDA support. Also fixes a few minor omissions.	2023-09-27 09:32:25 -07:00
Oleksandr "Alex" Zinenko	96ff0255f2	[mlir] cleanup of structured.tile* transform ops (#67320 ) Rename and restructure tiling-related transform ops from the structured extension to be more homogeneous. In particular, all ops now follow a consistent naming scheme: - `transform.structured.tile_using_for`; - `transform.structured.tile_using_forall`; - `transform.structured.tile_reduction_using_for`; - `transform.structured.tile_reduction_using_forall`. This drops the "_op" naming artifact from `tile_to_forall_op` that shouldn't have been included in the first place, consistently specifies the name of the control flow op to be produced for loops (instead of `tile_reduction_using_scf` since `scf.forall` also belongs to `scf`), and opts for the `using` connector to avoid ambiguity. The loops produced by tiling are now systematically placed as trailing results of the transform op. While this required changing 3 out of 4 ops (except for `tile_using_for`), this is the only choice that makes sense when producing multiple `scf.for` ops that can be associated with a variadic number of handles. This choice is also most consistent with other transform ops from the structured extension, in particular with fusion ops, that produce the structured op as the leading result and the loop as the trailing result.	2023-09-26 09:14:29 +02:00
Tobias Gysi	85175edd4e	[mlir][llvm] Replace NullOp by ZeroOp (#67183 ) This revision replaces the LLVM dialect NullOp by the recently introduced ZeroOp. The ZeroOp is more generic in the sense that it represents zero values of any LLVM type rather than null pointers only. This is a follow-up to https://github.com/llvm/llvm-project/pull/65508	2023-09-25 11:11:52 +02:00
Martin Erhart	522c1d0eea	[mlir][gpu][bufferization] Implement BufferDeallocationOpInterface for gpu.terminator (#66880 ) This is necessary to support deallocation of IR with gpu.launch operations because it does not implement the RegionBranchOpInterface. Implementing the interface would require it to support regions with unstructured control flow and produced arguments/results.	2023-09-20 12:28:28 +02:00
Fabian Mora	5093413a50	[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220 ) This patch adds an NVPTX compilation path that enables JIT compilation on NVIDIA targets. The following modifications were performed: 1. Adding a format field to the GPU object attribute, allowing the translation attribute to use the correct runtime function to load the module. Likewise, a dictionary attribute was added to add any possible extra options. 2. Adding the `createObject` method to `GPUTargetAttrInterface`; this method returns a GPU object from a binary string. 3. Adding the function `mgpuModuleLoadJIT`, which is only available for NVIDIA GPUs, as there is no equivalent for AMD. 4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify the format to use during testing.	2023-09-14 18:00:27 -04:00
Nicolas Vasilache	92f088d335	[mlir][gpu][transform] Provide better error messages and avoid crashing in MapForallToBlocks. This revision addresses issues surfaced in https://reviews.llvm.org/D159093	2023-09-04 14:11:38 +00:00
Aart Bik	289f7231f9	[mlir][sparse][gpu] minor code cleanup for sparse gpu ops Consistent order of ops and related methods. Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp since this is a general utility for sparse matrices, not specific to GEMM ops only. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D157922	2023-08-14 15:08:57 -07:00
Fabian Mora	43752a2aa3	[mlir][gpu] Add the `gpu-module-to-binary` pass. For an explanation of these patches see D154153. Commit message: This pass converts GPU modules into GPU binaries, serializing all targets present in a GPU module by invoking the `serializeToObject` target attribute method. Depends on D154147 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154149	2023-08-12 00:24:53 +00:00
Fabian Mora	8ae074b195	[mlir][gpu] Add the Select Object compilation attribute. For an explanation of these patches see D154153. Commit message: This patch adds the default offloading handler for GPU binary ops: `#gpu.select_object`, it selects the object to embed based on an index or a target attribute, embedding the object as a global string and launches the kernel using the scheme used in the GPU to LLVM pass. Depends on D154137 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154147	2023-08-11 22:00:35 +00:00
Fabian Mora	a63db3f5f5	[mlir][gpu] Modifies `gpu.launch_func` to allow lowering it after gpu-to-llvm. For an explanation of these patches see D154153. Commit message: In order to lower `gpu.launch_func` after running `gpu-to-llvm`, it must be able to handle lowered types, e.g. index -> i64. This patch also allows the op to refer to GPU binaries and not only GPU modules. Depends on D154132. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154137	2023-08-11 21:56:37 +00:00
Fabian Mora	068213130d	[mlir][ROCDL] Adds the ROCDL target attribute. For an explanation of these patches see D154153. Commit message: This patch adds the ROCDL target attribute for serializing GPU modules into strings containing HSAco. Depends on D154117 Differential Revision: https://reviews.llvm.org/D154129	2023-08-11 21:44:05 +00:00
Fabian Mora	1e77536e1d	Revert "[mlir][ROCDL] Adds the ROCDL target attribute." This reverts commit `6a0feb1503`.	2023-08-11 19:50:05 +00:00
Fabian Mora	bf24fb81ac	[mlir][gpu] Add `gpu.binary` op and `#gpu.object` attribute. For an explanation of these patches see D154153. Commit message: Adds the `#gpu.object` attribute for holding a binary object and the target attribute used to create it. Also adds the `gpu.binary` operation used to store GPU objects. Depends on D154108 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154132	2023-08-11 19:48:18 +00:00
Fabian Mora	6a0feb1503	[mlir][ROCDL] Adds the ROCDL target attribute. For an explanation of these patches see D154153. Commit message: This patch adds the ROCDL target attribute for serializing GPU modules into strings containing HSAco. Depends on D154117 Reviewed By: mehdi_amini, krzysz00 Differential Revision: https://reviews.llvm.org/D154129	2023-08-11 19:43:59 +00:00
Aart Bik	6c4cd7a13e	[mlir][sparse][gpu] refine sparse gpu round-trip and lowering test Tests had become inconsistent, and contained a few slip ups (e.g. non-async versions did not lower) Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157666	2023-08-10 17:18:59 -07:00
Aart Bik	95a6c509c9	[mlir][sparse][gpu] add set csr pointers, remove estimate op, fix bugs Rationale: Since we only support default algorithm for SpGEMM, we can remove the estimate op (for now at least). This also introduces the set csr pointers op, and fixes a few bugs in the existing lowering for the SpGEMM breakdown. This revision paves the way for actual recognition of SpGEMM in the sparsifier. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157645	2023-08-10 13:52:47 -07:00
Ivan Butygin	793ee2bf08	[mlir][gpu] Add DecomposeMemrefsPass Some GPU backends (SPIR-V) lower memrefs to bare pointers, so lowering fails for dynamically sized/strided memrefs. This pass extracts sizes and strides via `memref.extract_strided_metadata` outside the `gpu.launch` body, does the index/offset calculation explicitly, and then reconstructs memrefs via `memref.reinterpret_cast`. `memref.reinterpret_cast` is then lowered via https://reviews.llvm.org/D155011 Differential Revision: https://reviews.llvm.org/D155247	2023-08-10 22:28:05 +02:00
Mehdi Amini	363b655920	Finish renaming getOperandSegmentSizeAttr() from `operand_segment_sizes` to `operandSegmentSizes` This renaming started with the native ODS support for properties; this completes it. A mass automated textual rename seems safe for most codebases. Also drop the ods prefix to keep the accessors the same as they were before this change: properties.odsOperandSegmentSizes reverts back to: properties.operandSegmentSizes The ODS prefix was creating divergence between all the places and made it harder to be consistent. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D157173	2023-08-09 19:37:01 -07:00
Ivan Butygin	b13248f997	Revert "[mlir][gpu] Add DecomposeMemrefsPass" Broke some bots This reverts commit `2b5b2bfef1`.	2023-08-10 03:07:28 +02:00
Ivan Butygin	2b5b2bfef1	[mlir][gpu] Add DecomposeMemrefsPass Some GPU backends (SPIR-V) lower memrefs to bare pointers, so lowering fails for dynamically sized/strided memrefs. This pass extracts sizes and strides via `memref.extract_strided_metadata` outside the `gpu.launch` body, does the index/offset calculation explicitly, and then reconstructs memrefs via `memref.reinterpret_cast`. `memref.reinterpret_cast` is then lowered via https://reviews.llvm.org/D155011 Differential Revision: https://reviews.llvm.org/D155247	2023-08-10 02:28:03 +02:00
Aart Bik	e7e4ed0d7a	[mlir][sparse][gpu] only support default algorithm for SpGEMM Rationale: This is the approach taken for all the others too (SpMV, SpMM, SDDMM), so it is more consistent to follow the same path (until we have a need for more algorithms). Also, in a follow up revision, this will allow us to remove some unused GEMM ops. Reviewed By: K-Wu Differential Revision: https://reviews.llvm.org/D157542	2023-08-09 12:49:47 -07:00
Fabian Mora	211c9752c8	[mlir][NVVM] Adds the NVVM target attribute. For an explanation of these patches see D154153. Commit message: This patch adds the NVVM target attribute for serializing GPU modules into strings containing cubin. Depends on D154113 and D154100 and D154097 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154117	2023-08-08 19:21:36 +00:00
Fabian Mora	9fa7b9ef21	[mlir][gpu] Add target attribute to GPU modules. For an explanation of these patches see D154153. Commit message: Adds support for Target attributes in GPU modules. This change enables attaching an optional non empty array of GPU target attributes to the module. Depends on D154104 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154113	2023-08-08 13:19:47 +00:00
Kun Wu	dfe2942909	[mlir][sparse][gpu] add spgemm operator Differential Revision: https://reviews.llvm.org/D152981	2023-08-08 00:29:23 +00:00
Nicolas Vasilache	44e6318cea	[mlir][transforms] Revamp the implementation of mapping loops to GPUs This revision significantly simplifies the specification and implementation of mapping loops to GPU ids. Each type of mapping (block, warpgroup, warp, thread) now comes with 2 mapping modes: 1. a 3-D "grid-like" mode, subject to alignment considerations on threadIdx.x, on which predication may occur on a per-dimension 3-D sub-rectangle basis. 2. a n-D linearized mode, on which predication may only occur on a linear basis. In the process, better size and alignment requirement inference are introduced along with improved runtime verification messages. The `warp_dims` attribute was deemed confusing and is removed from the transform in favor of better size inference. Differential Revision: https://reviews.llvm.org/D155941	2023-07-26 00:09:08 +02:00
Quinn Dawkins	ff8775f3ff	[mlir][GPU] Add op for unrolling contractions to a native size Adds `apply_patterns.gpu.unroll_vectors_subgroup_mma` which allows specifying a native MMA shape of `m`, `n`, and `k` to unroll to, greedily unrolling the inner most dimension of contractions and other vector operations based on expected usage. Differential Revision: https://reviews.llvm.org/D156079	2023-07-25 13:11:32 -04:00
Alex Zinenko	9ab34689b0	[mlir] add a simple gpu barrier elimination mechanism GPU code generation, and specifically the shared memory copy insertion may introduce spurious barriers guarding read-after-read dependencies or read-after-write on non-aliasing data, which degrades performance due to unnecessary synchronization. Add a pattern and transform op that removes such barriers by analyzing memory effects that the barrier actually guards that are not also guarded by other barriers. The code is adapted from the Polygeist incubator project. Co-authored-by: William Moses <gh@wsmoses.com> Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp> Reviewed By: nicolasvasilache, wsmoses Differential Revision: https://reviews.llvm.org/D154720	2023-07-07 18:51:49 +00:00
Kun Wu	be2dd22b8f	[mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime Differential Revision: https://reviews.llvm.org/D153173	2023-06-30 21:52:34 +00:00
Matthias Springer	dae8c72495	[mlir][linalg] TileToForallOp: Support memref ops Support tiling of ops with memref semantics. Differential Revision: https://reviews.llvm.org/D153353	2023-06-21 09:12:34 +02:00
Kun Wu	97f4c22b3a	[mlir][sparse][gpu] unify dnmat and dnvec handle and ops Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D152465	2023-06-09 17:16:48 +00:00
Kun Wu	8ed59c53de	[mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity support Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D151775	2023-06-06 23:13:21 +00:00
Kun Wu	fa98bdbd95	[mlir][sparse][gpu] make computeType mandatory Differential Revision: https://reviews.llvm.org/D152018	2023-06-02 21:47:44 +00:00
Kun Wu	cc402de0b1	[mlir][sparse][gpu] add result type to spmv and spmm gpu libgen path Differential Revision: https://reviews.llvm.org/D151592	2023-06-01 17:17:40 +00:00


224 Commits