clang-p2996

Author	SHA1	Message	Date
Jakub Kuderski	971b852546	[mlir][NFC] Simplify type checks with isa predicates (#87183 ) For more context on isa predicates, see: https://github.com/llvm/llvm-project/pull/83753.	2024-04-01 11:40:09 -04:00
Fangrui Song	886ecb3078	[mlir] Remove setRelaxELFRelocations. NFC The option is always true (see `2aedfdd9b8`) and the MCAsmInfo option is going away in favor of MCTargetOptions.	2024-03-06 23:12:40 -08:00
Ingo Müller	6e27dd47e1	[mlir][gpu] Replace MLIR_GPU_TO_HSACO_PASS_ENABLE by more generic one. (#84001 ) This is another follow-up of #83004. The PR replaces the macro `MLIR_GPU_TO_HSACO_PASS_ENABLE` with the more generic macro `MLIR_ENABLE_ROCM_CONVERSIONS`. Until now, the former has been defined if and only if the latter evaluated to true in CMake. However, the former was not defined when the latter evaluated to false, in which case a warning was raised if compiled with `-Wundef`. Using a single macro relies on the `#cmakedefine01` mechanism that ensures the macro is always set to either 0 or 1.	2024-03-06 09:53:30 +01:00
Ingo Müller	f3be842728	[mlir] Expose MLIR_ROCM_CONVERSIONS_ENABLED in mlir-config.h. (#83977 ) This is a follow-up of #83004, which made the same change for `MLIR_CUDA_CONVERSIONS_ENABLED`. Like the previous PR, this commit exposes the mentioned CMake variable through `mlir-config.h` and uses the macro that is introduced with the same name. This replaces the macro `MLIR_ROCM_CONVERSIONS_ENABLED`, which the CMake files previously defined manually.	2024-03-05 15:37:14 +01:00
Ingo Müller	5f2097dbed	[mlir] Expose MLIR_CUDA_CONVERSIONS_ENABLED in mlir-config.h. (#83004 ) That macro was not defined in some cases and thus yielded warnings if compiled with `-Wundef`. In particular, it was not defined in the BUILD files, so the GPU targets were broken when built with Bazel. This commit exposes the mentioned CMake variable through mlir-config.h and uses the macro that is introduced with the same name. This replaces the macro MLIR_CUDA_CONVERSIONS_ENABLED, which the CMake files previously defined manually.	2024-02-28 14:48:40 +01:00
Fabian Mora	f204aee1b9	[mlir][GPU] Remove the SerializeToCubin pass (#82486 ) The `SerializeToCubin` pass was deprecated in September 2023 in favor of GPU compilation attributes; see the [GPU compilation](https://mlir.llvm.org/docs/Dialects/GPU/#gpu-compilation) section in the `gpu` dialect MLIR docs. This patch removes `SerializeToCubin` from the repo.	2024-02-21 20:47:19 -05:00
Thomas Preud'homme	76e79b0bef	Fix duplicate mapping detection in gpu::setMappingAttr() (#77499 )	2024-02-20 09:54:00 +00:00
Saiyedul Islam	082f87c9d4	[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Corresponding llvm-objdump AMDGPU lit tests are updated in a follow-up PR.	2024-01-23 17:08:18 +05:30
Mehdi Amini	e730f76005	Apply clang-tidy fixes for llvm-qualified-auto in DecomposeMemrefs.cpp (NFC)	2024-01-17 08:51:41 -08:00
Fabian Mora	5b4f2b906b	[mlir][gpu] Add an offloading handler attribute to `gpu.module` (#78047 ) This patch adds an optional offloading handler attribute to the `gpu.module` op. This attribute will be used during the `gpu-module-to-binary` pass to override the offloading handler used in the `gpu.binary` op.	2024-01-15 16:58:10 -05:00
Guray Ozen	5b33cff397	[mlir][gpu] Add Support for Cluster of Thread Blocks in `gpu.launch` (#76924 )	2024-01-06 11:17:01 +01:00
Jakub Kuderski	c0345b4648	[mlir][gpu] Add subgroup_reduce to shuffle lowering (#76530 ) This supports both the scalar and the vector multi-reduction cases.	2024-01-02 16:14:22 -05:00
Jakub Kuderski	2af186f9bd	[mlir][gpu] Add patterns to break down subgroup reduce (#76271 ) The new patterns break down subgroup reduce ops with vector values into a sequence of subgroup reductions that fit the native shuffle size. The maximum/native shuffle size is parametrized. The overall goal is to be able to perform multi-element reductions with a sequence of `gpu.shuffle` ops.	2023-12-28 14:39:46 -05:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Jakub Kuderski	9f74e6e615	[mlir][vector][gpu] Use `makeArithReduction` in lowering patterns. NFC. (#75952 ) Use the `vector::makeArithReduction` helper as the source-of-truth of reduction to arith ops lowering.	2023-12-19 19:04:27 -05:00
Fabian Mora	419c45a325	[mlir][gpu] Fix crash in `gpu-module-to-binary` (#75477 ) This patch fixes the error in issue #75434. The crash was being caused by not checking for a lack of target attributes in a GPU module. It's now considered an error to invoke the pass with a GPU module with no target attributes.	2023-12-14 14:03:10 -05:00
Kazu Hirata	88d319a29f	[mlir] Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 22:58:30 -08:00
Adrian Kuegel	8a5b448fa0	[mlir][GPU] Apply ClangTidy fixes Use const reference in loops if possible.	2023-12-12 07:34:03 +00:00
Jakub Kuderski	7eccd52842	Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `dd09221a29` and relands https://github.com/llvm/llvm-project/pull/73423. * Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.	2023-11-27 11:38:18 -05:00
Jakub Kuderski	dd09221a29	Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `e0aac8c88d`. I'm seeing some nvidia integration test failures: https://lab.llvm.org/buildbot/#/builders/61/builds/52334.	2023-11-27 11:29:23 -05:00
Jakub Kuderski	e0aac8c88d	[mlir][gpu] Align reduction operations with vector combining kinds (#73423 ) The motivation for this change is explained in https://github.com/llvm/llvm-project/issues/72354. Before this change, we could not tell between signed/unsigned minimum/maximum and NaN treatment for floating point values. The mapping of old reduction operations to the new ones is as follows: * `min` --> `minsi` for ints, `minf` for floats * `max` --> `maxsi` for ints, `maxf` for floats New reduction kinds not represented in the old enum: `minui`, `maxui`, `minimumf`, `maximumf`. As a next step, I would like to have a common definition of combining kinds used by the `vector` and `gpu` dialects. Separately, the GPU to SPIR-V lowering does not yet properly handle zero and NaN values -- the behavior of floating point min/max group reductions is not specified by the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-11-27 11:19:20 -05:00
drazi	9a3d3c7093	generalize pass gpu-kernel-outlining for symbol op (#72074 ) This PR generalizes the gpu-kernel-outlining pass to handle ops with `SymbolOpInterface` instead of just `func::FuncOp`. Before this change, the gpu-kernel-outlining pass skipped `llvm.func`. ```mlir module { llvm.func @main() { %c1 = arith.constant 1 : index gpu.launch blocks(%arg0, %arg1, %arg2) in (%arg6 = %c1, %arg7 = %c1, %arg8 = %c1) threads(%arg3, %arg4, %arg5) in (%arg9 = %c1, %arg10 = %c1, %arg11 = %c1) { gpu.terminator } llvm.return } } ``` After this change, the gpu-kernel-outlining pass can handle `llvm.func` as well.	2023-11-12 21:48:49 -08:00
spaceotter	00c3c73189	[mlir][gpu] Separate the barrier elimination code from transform ops (#71762 ) Allows the barrier elimination code to be run from C++ as well. The code from the transform dialect is copied as-is; the pass and populate functions have been added at the end. Co-authored-by: Eric Eaton <eric@nod-labs.com>	2023-11-10 17:59:09 -08:00
Krzysztof Drewniak	05fa923a9b	Fix SmallVector usage in SerailzeToHsaco (#71702 ) Enable merging #71439 by replacing a definitely-wrong usage of std::unique_ptr<SmallVectorImpl<char>> as a return value with passing in a SmallVectorImpl<char>&. Also change the following function to take ArrayRef<char> instead of const SmallVectorImpl<char>&.	2023-11-08 13:57:41 -06:00
Fabian Mora	42630689e2	[mlir][gpu] Clean GPU `Passes.h` from external SPIRV includes (#71331 ) Removes the `SPIRVAttributes.h` header from `GPU/Transforms/Passes.h`	2023-11-05 17:06:04 -08:00
Sang Ik Lee	2dace04521	[mlir][spirv] Implement gpu::TargetAttrInterface (#69949 ) This commit implements gpu::TargetAttrInterface for the SPIR-V target attribute. The plan is to use this to enable the GPU compilation pipeline for OpenCL kernels later. The changes do not impact Vulkan shaders using mlir-vulkan-runner. A new GPU dialect transform pass, spirv-attach-target, is implemented for attaching the attribute from the CLI. The gpu-module-to-binary pass now works with a GPU module that has a SPIR-V module with OpenCL kernel functions inside.	2023-11-05 08:11:53 -08:00
Mehdi Amini	6883343843	[mlir] Guard NVPTX backend initialization on it being configured (NFC) This is just helping with some build failure in some new configurations.	2023-11-03 22:23:01 -07:00
Rohan Yadav	71bdd2c238	mlir/lib/Dialect/GPU/Transforms: improve context management in SerializeToCubin (#65779 ) This commit adjusts the CUDA context management in the SerializeToCubin pass. In particular, it uses the device 0 primary context instead of creating a new CUDA context on each invocation of SerializeToCubin. This yields very large improvements in compile time, especially if an application (like a JIT compiler) is calling SerializeToCubin repeatedly. Differential Revision: https://reviews.llvm.org/D159487 Co-authored-by: Rohan Yadav <rohany@cs.stanford.edu>	2023-10-20 23:05:10 +05:30
Krzysztof Drewniak	0463e00ac6	[mlir][ROCDL] Fix file leak in SeralizeToHsaco and its newer form (#67711 ) SerializeToHsaco, as currently implemented, leaks the file descriptor of the .hsaco temporary file, which causes issues in long-running parallel compilation setups. See also https://github.com/ROCmSoftwarePlatform/rocMLIR/pull/1257	2023-09-29 17:24:40 -05:00
Martin Erhart	522c1d0eea	[mlir][gpu][bufferization] Implement BufferDeallocationOpInterface for gpu.terminator (#66880 ) This is necessary to support deallocation of IR with gpu.launch operations because it does not implement the RegionBranchOpInterface. Implementing the interface would require it to support regions with unstructured control flow and produced arguments/results.	2023-09-20 12:28:28 +02:00
Fabian Mora	5093413a50	[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220 ) This patch adds an NVPTX compilation path that enables JIT compilation on NVIDIA targets. The following modifications were performed: 1. Adding a format field to the GPU object attribute, allowing the translation attribute to use the correct runtime function to load the module. Likewise, a dictionary attribute was added to add any possible extra options. 2. Adding the `createObject` method to `GPUTargetAttrInterface`; this method returns a GPU object from a binary string. 3. Adding the function `mgpuModuleLoadJIT`, which is only available for NVIDIA GPUs, as there is no equivalent for AMD. 4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify the format to use during testing.	2023-09-14 18:00:27 -04:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Fabian Mora	444abb396c	[mlir][gpu] Add a symbol table field to TargetOptions and adjust GpuModuleToBinary (#65797 ) This patch adds the option of building an optional symbol table for the top operation in the `gpu-module-to-binary` pass. The table is not created by default as most targets don't need it; instead, it is lazily built. The table is passed through a callback in `TargetOptions`. This patch is required to integrate #65539 .	2023-09-09 19:59:20 -04:00
Fabian Mora	c16adb0dcb	[mlir][Target][NVPTX] Add fatbin support to NVPTX compilation. (#65398 ) Currently, the NVPTX tool compilation path only calls `ptxas`; thus, the GPU running the binary must be an exact match of the arch of the target, or else the runtime throws an error due to the arch mismatch. This patch adds a call to `fatbinary`, creating a fat binary with the cubin object and the PTX code, allowing the driver to JIT the PTX at runtime if there's an arch mismatch.	2023-09-07 07:44:41 -04:00
Adrian Kuegel	bf92a7655c	[mlir] Apply ClangTidy fixes (NFC) Prefer to use .empty() instead of checking size().	2023-08-23 17:18:59 +02:00
Adrian Kuegel	93228cff8f	[mlir] Apply ClangTidy fix (NFC) Use .empty() instead of checking for size().	2023-08-22 13:55:09 +02:00
Nicolas Vasilache	7c4e8c6a27	[mlir] Disentangle dialect and extension registrations. This revision avoids the registration of dialect extensions in Pass::getDependentDialects. Such registration of extensions can be dangerous because `DialectRegistry::isSubsetOf` is always guaranteed to return false for extensions (i.e. there is no mechanism to track whether a lambda is already in the list of already registered extensions). When the context is already in a multi-threaded mode, this is guaranteed to assert. Arguably a more structured registration mechanism for extensions with a unique ExtensionID could be envisioned in the future. In the process of cleaning this up, multiple usage inconsistencies surfaced around the registration of translation extensions that this revision also cleans up. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D157703	2023-08-22 00:40:09 +00:00
Fabian Mora	fbbb8adef1	[mlir][gpu] Add passes to attach (NVVM\|ROCDL) target attributes to GPU Modules Adds the passes `nvvm-attach-target` & `rocdl-attach-target` for attaching `nvvm.target` & `rocdl.target` attributes to GPU Modules. These passes search GPU Modules in the immediate region of the Op being acted on, attaching the target attribute to the module. Modules can be selected using a regex string, allowing fine-grained attachment of targets; see the test `attach-target.mlir` for an example. Depends on D154153 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D157351	2023-08-12 00:45:26 +00:00
Fabian Mora	43752a2aa3	[mlir][gpu] Add the `gpu-module-to-binary` pass. For an explanation of these patches see D154153. Commit message: This pass converts GPU modules into GPU binaries, serializing all targets present in a GPU module by invoking the `serializeToObject` target attribute method. Depends on D154147 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D154149	2023-08-12 00:24:53 +00:00
Ingo Müller	616eb0b2c4	[mlir][gpu] Fix error message on unknown CUDA error code. This patch fixes the output of the error message that is printed when the CUDA library cannot identify the error code. In that case, no error message is provided by the library, and the previous implementation just printed the content of a randomly initialized pointer. This patch initializes the pointer to nullptr and only prints the content if that has changed. Reviewed By: Mogball Differential Revision: https://reviews.llvm.org/D156791	2023-08-11 08:04:58 +00:00
Ivan Butygin	793ee2bf08	[mlir][gpu] Add DecomposeMemrefsPass Some GPU backends (SPIR-V) lower memrefs to bare pointers, so lowering fails for dynamically sized/strided memrefs. This pass extracts sizes and strides via `memref.extract_strided_metadata` outside the `gpu.launch` body, does the index/offset calculation explicitly, and then reconstructs memrefs via `memref.reinterpret_cast`. `memref.reinterpret_cast` is then lowered via https://reviews.llvm.org/D155011 Differential Revision: https://reviews.llvm.org/D155247	2023-08-10 22:28:05 +02:00
Nicolas Vasilache	888717e853	[mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD This revision untangles a few more conversion pieces and allows rewriting the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass in a declarative fashion that provides a much better understanding and control. Differential Revision: https://reviews.llvm.org/D157617	2023-08-10 15:30:48 +00:00
Ivan Butygin	b13248f997	Revert "[mlir][gpu] Add DecomposeMemrefsPass" Broke some bots This reverts commit `2b5b2bfef1`.	2023-08-10 03:07:28 +02:00
Ivan Butygin	2b5b2bfef1	[mlir][gpu] Add DecomposeMemrefsPass Some GPU backends (SPIR-V) lower memrefs to bare pointers, so lowering fails for dynamically sized/strided memrefs. This pass extracts sizes and strides via `memref.extract_strided_metadata` outside the `gpu.launch` body, does the index/offset calculation explicitly, and then reconstructs memrefs via `memref.reinterpret_cast`. `memref.reinterpret_cast` is then lowered via https://reviews.llvm.org/D155011 Differential Revision: https://reviews.llvm.org/D155247	2023-08-10 02:28:03 +02:00
Mehdi Amini	5e8a1164f2	Revert "[mlir][gpu] Fallback to JIT compilation" "[mlir][gpu] Increase default SM version from 35 to 50" and "[mlir][gpu] Improving Cubin Serialization with ptxas Compiler" This reverts commit `2e0e00ed84` and reverts commit `a6eb40692c` and reverts commit `585cbe3f63`. 15 tests are broken on the mlir-nvidia buildbot: 'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_SOURCE' 'cuModuleGetFunction(&function, module, name)' failed with 'CUDA_ERROR_INVALID_HANDLE' 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, smem, stream, params, extra)' failed with 'CUDA_ERROR_INVALID_HANDLE' 'cuModuleUnload(module)' failed with 'CUDA_ERROR_INVALID_HANDLE'	2023-07-24 10:23:15 -07:00
Guray Ozen	a6eb40692c	[mlir][gpu] Increase default SM version from 35 to 50 The current default SM version is 35, but it was deprecated a long time ago. D155563 introduced ptxas compilation, and using sm_35 causes failures on the buildbot. This change increases the default SM version to 50. Differential Revision: https://reviews.llvm.org/D156098	2023-07-24 15:11:30 +02:00
Guray Ozen	2e0e00ed84	[mlir][gpu] Fallback to JIT compilation A recent change introduced compilation with the ptxas compiler. That change is important for being able to use different versions of the ptxas compiler without changing the compiler. It causes some failures on the buildbot. This change adds a fallback mechanism to JIT compilation, which is the original path. Differential Revision: https://reviews.llvm.org/D156096	2023-07-24 15:11:05 +02:00
Guray Ozen	585cbe3f63	[mlir][gpu] Improving Cubin Serialization with ptxas Compiler This work improves how we compile the generated PTX code using the `ptxas` compiler. Currently, we rely on the driver's JIT API to compile the PTX code. However, this approach has some limitations. It doesn't always produce the same binary output as the ptxas compiler, leading to potential inconsistencies in the generated cubin files. This work introduces a significant improvement by directly utilizing the ptxas compiler for PTX compilation. By doing so, we can achieve more consistent and reliable results in generating cubin files. Key Benefits: - Using the ptxas compiler directly ensures that the cubin files generated during the build process remain consistent with CUDA compilation using `nvcc` or `clang`. - Another advantage of this work is that it allows developers to experiment with different ptxas compilers without the need to change the compiler. Performance varies among ptxas compiler versions, so one can easily try different ptxas compilers. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D155563	2023-07-24 12:29:53 +02:00
Krzysztof Drewniak	db647f5bd8	[mlir][GPU] Initialize LLVM exactly once during GPU compiles No matter how one constructs their SerializeTo* pass, we want to ensure that the LLVM initialization code runs once and only once. This commit adds a static once_flag to ensure that. I've run into mysterious segfaults when calling MLIR GPU compiles from multiple threads, and this commit is a potential fix for the issue. Reviewed By: fmorac Differential Revision: https://reviews.llvm.org/D155226	2023-07-14 19:10:52 +00:00
Guray Ozen	22a32f7d9c	[mlir][gpu] Add dump-ptx option When targeting NVIDIA GPUs, seeing the generated PTX is important. Currently, we don't have a simple way to do it. This work adds dump-ptx to the gpu-to-cubin pass. One can use it like `gpu-to-cubin{chip=sm_90 features=+ptx80 dump-ptx}`. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D155166	2023-07-13 21:14:57 +02:00

264 Commits