clang-p2996

Author	SHA1	Message	Date
Matthias Springer	6aaa8f25b6	[mlir][IR][NFC] Move free-standing functions to `MemRefType` (#123465 ) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.	2025-01-21 08:48:09 +01:00
Jacques Pienaar	09dfc5713d	[mlir] Enable decoupling two kinds of greedy behavior. (#104649 ) The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and lead to inefficiency. E.g., cases where one needs to create different phases in lowering and is required to apply patterns in a specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a commonly requested feature that folks keep on working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated function should just be a find and replace (e.g., `find ./ -type f -exec sed -i 's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety) as the API arguments haven't changed between the two.	2024-12-20 08:15:48 -08:00
Mehdi Amini	72e8b9aeaa	[MLIR] Add a BlobAttr interface for attribute to wrap arbitrary content and use it as linkLibs for ModuleToObject (#120116 ) This change allows exposing, through an interface, attributes that wrap content as external resources, and the usage inside ModuleToObject shows how we will be able to provide runtime libraries without relying on the filesystem.	2024-12-17 01:30:56 +01:00
Renaud Kauffmann	9919295cfd	[mlir][gpu] Adding ELF section option to the gpu-module-to-binary pass (#119440 ) This is a follow-up of #117246. I thought then it would be easy to edit a DictionaryAttr but it turns out that these attributes are immutable and need to be passed during the construction of the gpu.binary Op. The first commit was using the NVVMTargetAttr to pass the information. After feedback from @fabianmcg, this PR now passes the information through a new option of the gpu-module-to-binary pass. Please add reviewers, as you see fit.	2024-12-16 09:09:41 -08:00
Petr Kurapov	bc29fc937c	[MLIR] Create GPU utils library & move distribution utils (#119264 ) Continue the move of `warp_execute_on_lane_0` op to the gpu dialect (#116994). This patch creates a utils library in GPU and moves generic helper functions there.	2024-12-13 10:26:57 +01:00
Zhen Wang	516d6ede12	[mlir][gpu] Add optional attributes of kernelModule and kernelFunc for outlining kernels. (#118861 ) Adding optional attributes so we can specify the kernel function names and the kernel module names generated.	2024-12-06 12:33:34 -08:00
Oleksandr "Alex" Zinenko	5ce4d4c775	[mlir] fix memory effects in GPU barrier elimination (#117432 ) Existing implementation may trigger infinite cycles when collecting effects above or below the current block after wrapping around a loop-like construct. Limit this case to only looking at the immediate block (loop body). This is correct because wrap around is intended to consider effects of different iterations of the same loop and shouldn't be exiting the loop block. Reported-by: Fabian Mora <fmora.dev@gmail.com> Co-authored-by: Fabian Mora <fmora.dev@gmail.com>	2024-11-24 23:25:11 +01:00
donald chen	889b67c9d3	[mlir] [memref] add more checks to the memref.reinterpret_cast (#112669 ) The memref.reinterpret_cast operation previously accepted input like: %out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10], strides: [1] : memref<?xf32> to memref<10xf32> A problem arises: while lowering, the true offset of %out is %offset, but its data type indicates an offset of 0. Permitting this inconsistency can result in incorrect outcomes, as certain passes might erroneously extract the offset from the data type of %out. This patch fixes this by enforcing that the return value's data type aligns with the input parameter.	2024-10-26 08:07:51 +08:00
Andrea Faulds	a800ffac41	[mlir][gpu] Disjoint patterns for lowering clustered subgroup reduce (#109158 ) Making the existing populateGpuLowerSubgroupReduceToShufflePatterns() function also cover the new "clustered" subgroup reductions is proving to be inconvenient, because certain backends may have more specific lowerings that only cover the non-clustered type, and this creates pass ordering constraints. This commit removes coverage of clustered reductions from this function in favour of a new separate function, which makes controlling the lowering much more straightforward.	2024-09-18 15:55:53 -04:00
Andrea Faulds	fd26f8444a	[mlir][gpu] Rename two misspelled pattern population functions (#109015 )	2024-09-17 15:26:14 -04:00
Andrea Faulds	3d01f0a33b	[mlir][gpu] Add 'cluster_stride' attribute to gpu.subgroup_reduce (#107142 ) Follow-up to `7aa22f013e`, adding an additional attribute needed in some applications.	2024-09-05 09:03:22 -04:00
Fabian Mora	fd36a7b944	[mlir][gpu] Pass GPU module to `TargetAttrInterface::createObject`. (#94910 ) This patch adds an argument to `gpu::TargetAttrInterface::createObject` to pass the GPU module. This is useful as `gpu::ObjectAttr` contains a property dict for metadata, hence the module can be used for extracting things like the symbol table and adding it to the property dict. --------- Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>	2024-08-27 11:05:04 -04:00
Andrea Faulds	7aa22f013e	[mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851 ) This enables performing several reductions in parallel, each smaller than the size of the subgroup. One potential application is flash attention with subgroup-wide matrix multiplication and reduction combined in one kernel. The multiplication operation requires a 2D matrix to be distributed over the lanes of the subgroup, which then constrains the shape the following reduction can have if we want to keep data in registers.	2024-08-20 13:37:03 -04:00
Angel Zhang	863a2ed440	[mlir][memref] Rename `MemRef` directories and files. NFC. (#102337 ) This PR renames the `MemRef` integration test directory and the `DecomposeMemref.s.cpp` file so that they can be found when doing a case-sensitive search on file paths.	2024-08-07 15:41:40 -04:00
Kazu Hirata	5262865aac	[mlir] Construct SmallVector with ArrayRef (NFC) (#101896 )	2024-08-04 11:43:05 -07:00
Ramkumar Ramachandra	db791b278a	mlir/LogicalResult: move into llvm (#97309 ) This patch is part of a project to move the Presburger library into LLVM.	2024-07-02 10:42:33 +01:00
Mehdi Amini	d7da0ae4f4	[MLIR][NVVM] Reduce the scope of the LLVM_HAS_NVPTX_TARGET guard (#97349 ) Most of the code here does not depend on the NVPTX target. In particular the simple offload can just emit LLVM IR and we can use this without the NVVM backend being built, which can be useful for a frontend that just needs to serialize the IR and leave it up to the runtime to JIT further.	2024-07-02 11:31:12 +02:00
Kazu Hirata	b7b337fb91	[mlir] Use llvm::unique (NFC) (#96415 )	2024-06-24 11:54:02 -07:00
Fabian Mora	9ddf3b835c	[mlir][gpu] Remove old GPU serialization passes (#94998 ) This patch removes the last vestiges of the old gpu serialization pipeline. To compile GPU code use target attributes instead. See [Compilation overview | 'gpu' Dialect - MLIR docs](https://mlir.llvm.org/docs/Dialects/GPU/#compilation-overview) for additional information on the target attributes compilation pipeline that replaced the old serialization pipeline.	2024-06-20 08:01:40 -05:00
Krzysztof Drewniak	43fd4c49bd	[mlir][GPU] Improve handling of GPU bounds (#95166 ) This change reworks how range information for GPU dispatch IDs (block IDs, thread IDs, and so on) is handled. 1. `known_block_size` and `known_grid_size` become inherent attributes of GPU functions. This makes them less clunky to work with. As a consequence, the `gpu.func` lowering patterns now only look at the inherent attributes when setting target-specific attributes on the `llvm.func` that they lower to. 2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size` are made official dialect-level discardable attributes which can be placed on arbitrary functions. This allows for progressive lowerings (without this, a lowering for `gpu.thread_id` couldn't know about the bounds if it had already been moved from a `gpu.func` to an `llvm.func`) and allows for range information to be provided even when `gpu.*_{id,dim}` are being used outside of a `gpu.func` context. 3. All of these index operations have gained an optional `upper_bound` attribute, allowing for an alternate mode of operation where the bounds are specified locally and not inherited from the operation's context. These also allow handling of cases where the precise launch sizes aren't known, but can be bounded more precisely than the maximum of what any platform's API allows. (I'd like to thank @benvanik for pointing out that this could be useful.) When inferring bounds (either for range inference or for setting `range` during lowering) these sources of information are consulted in order of specificity (`upper_bound` > inherent attribute > discardable attribute, except that dimension sizes check for `known_*_bounds` to see if they can be constant-folded before checking their `upper_bound`). This patch also updates the documentation about the bounds and inference behavior to clarify what these attributes do when set and the consequences of setting them up incorrectly. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-06-17 23:47:38 -05:00
tyb0807	8178a3ad1b	[mlir] Replace MLIR_ENABLE_CUDA_CONVERSIONS with LLVM_HAS_NVPTX_TARGET (#93008 ) LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX was enabled when building LLVM. Use this instead of manually defining MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).	2024-05-24 17:31:28 +02:00
Benjamin Kramer	29c2475f21	[mlir] Fix the build after `03c53c69a3`	2024-05-15 18:34:59 +02:00
Mehdi Amini	d566a5cd22	[MLIR] Improve KernelOutlining to avoid introducing an extra block (#90359 ) This fixes a TODO in the code.	2024-04-29 19:30:38 +02:00
Fabian Mora	dc6ce60801	[mlir][gpu] Remove `offloadingHandler` from `ModuleToBinary` (#90368 ) This patch removes the `offloadingHandler` option from the `ModuleToBinary` pass. The option is removed as it cannot be parsed from textual form. This fixes issue #90344.	2024-04-28 10:03:12 -04:00
Jakub Kuderski	971b852546	[mlir][NFC] Simplify type checks with isa predicates (#87183 ) For more context on isa predicates, see: https://github.com/llvm/llvm-project/pull/83753.	2024-04-01 11:40:09 -04:00
Fangrui Song	886ecb3078	[mlir] Remove setRelaxELFRelocations. NFC The option is always true (see `2aedfdd9b8`) and the MCAsmInfo option is going away in favor of MCTargetOptions.	2024-03-06 23:12:40 -08:00
Ingo Müller	6e27dd47e1	[mlir][gpu] Replace MLIR_GPU_TO_HSACO_PASS_ENABLE by more generic one. (#84001 ) This is another follow-up of #83004. The PR replaces the macro `MLIR_GPU_TO_HSACO_PASS_ENABLE` with the more generic macro `MLIR_ENABLE_ROCM_CONVERSIONS`. Until now, the former has been defined if and only if the latter evaluated to true in CMake. However, the former was not defined when the latter evaluated to false, in which case a warning was raised if compiled with `-Wundef`. Using a single macro relies on the `#cmakedefine01` mechanism that ensures the macro is always set to either 0 or 1.	2024-03-06 09:53:30 +01:00
Ingo Müller	f3be842728	[mlir] Expose MLIR_ROCM_CONVERSIONS_ENABLED in mlir-config.h. (#83977 ) This is a follow-up of #83004, which made the same change for `MLIR_CUDA_CONVERSIONS_ENABLED`. Like the previous PR, this commit exposes the mentioned CMake variable through `mlir-config.h` and uses the macro that is introduced with the same name. This replaces the macro `MLIR_ROCM_CONVERSIONS_ENABLED`, which the CMake files previously defined manually.	2024-03-05 15:37:14 +01:00
Ingo Müller	5f2097dbed	[mlir] Expose MLIR_CUDA_CONVERSIONS_ENABLED in mlir-config.h. (#83004 ) That macro was not defined in some cases and thus yielded warnings if compiled with `-Wundef`. In particular, it was not defined in the BUILD files, so the GPU targets were broken when built with Bazel. This commit exposes the mentioned CMake variable through mlir-config.h and uses the macro that is introduced with the same name. This replaces the macro MLIR_CUDA_CONVERSIONS_ENABLED, which the CMake files previously defined manually.	2024-02-28 14:48:40 +01:00
Fabian Mora	f204aee1b9	[mlir][GPU] Remove the SerializeToCubin pass (#82486 ) The `SerializeToCubin` pass was deprecated in September 2023 in favor of GPU compilation attributes; see the [GPU compilation](https://mlir.llvm.org/docs/Dialects/GPU/#gpu-compilation) section in the `gpu` dialect MLIR docs. This patch removes `SerializeToCubin` from the repo.	2024-02-21 20:47:19 -05:00
Thomas Preud'homme	76e79b0bef	Fix duplicate mapping detection in gpu::setMappingAttr() (#77499 )	2024-02-20 09:54:00 +00:00
Saiyedul Islam	082f87c9d4	[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Corresponding llvm-objdump AMDGPU lit tests are updated in a follow-up PR.	2024-01-23 17:08:18 +05:30
Mehdi Amini	e730f76005	Apply clang-tidy fixes for llvm-qualified-auto in DecomposeMemrefs.cpp (NFC)	2024-01-17 08:51:41 -08:00
Fabian Mora	5b4f2b906b	[mlir][gpu] Add an offloading handler attribute to `gpu.module` (#78047 ) This patch adds an optional offloading handler attribute to the `gpu.module` op. This attribute will be used during the `gpu-module-to-binary` pass to override the offloading handler used in the `gpu.binary` op.	2024-01-15 16:58:10 -05:00
Guray Ozen	5b33cff397	[mlir][gpu] Add Support for Cluster of Thread Blocks in `gpu.launch` (#76924 )	2024-01-06 11:17:01 +01:00
Jakub Kuderski	c0345b4648	[mlir][gpu] Add subgroup_reduce to shuffle lowering (#76530 ) This supports both the scalar and the vector multi-reduction cases.	2024-01-02 16:14:22 -05:00
Jakub Kuderski	2af186f9bd	[mlir][gpu] Add patterns to break down subgroup reduce (#76271 ) The new patterns break down subgroup reduce ops with vector values into a sequence of subgroup reductions that fit the native shuffle size. The maximum/native shuffle size is parametrized. The overall goal is to be able to perform multi-element reductions with a sequence of `gpu.shuffle` ops.	2023-12-28 14:39:46 -05:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Jakub Kuderski	9f74e6e615	[mlir][vector][gpu] Use `makeArithReduction` in lowering patterns. NFC. (#75952 ) Use the `vector::makeArithReduction` helper as the source-of-truth of reduction to arith ops lowering.	2023-12-19 19:04:27 -05:00
Fabian Mora	419c45a325	[mlir][gpu] Fix crash in `gpu-module-to-binary` (#75477 ) This patch fixes the error in issue #75434. The crash was being caused by not checking for a lack of target attributes in a GPU module. It's now considered an error to invoke the pass with a GPU module with no target attributes.	2023-12-14 14:03:10 -05:00
Kazu Hirata	88d319a29f	[mlir] Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 22:58:30 -08:00
Adrian Kuegel	8a5b448fa0	[mlir][GPU] Apply ClangTidy fixes Use const reference in loops if possible.	2023-12-12 07:34:03 +00:00
Jakub Kuderski	7eccd52842	Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `dd09221a29` and relands https://github.com/llvm/llvm-project/pull/73423. * Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.	2023-11-27 11:38:18 -05:00
Jakub Kuderski	dd09221a29	Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit `e0aac8c88d`. I'm seeing some nvidia integration test failures: https://lab.llvm.org/buildbot/#/builders/61/builds/52334.	2023-11-27 11:29:23 -05:00
Jakub Kuderski	e0aac8c88d	[mlir][gpu] Align reduction operations with vector combining kinds (#73423 ) The motivation for this change is explained in https://github.com/llvm/llvm-project/issues/72354. Before this change, we could not tell between signed/unsigned minimum/maximum and NaN treatment for floating point values. The mapping of old reduction operations to the new ones is as follows: * `min` --> `minsi` for ints, `minf` for floats * `max` --> `maxsi` for ints, `maxf` for floats New reduction kinds not represented in the old enum: `minui`, `maxui`, `minimumf`, `maximumf`. As a next step, I would like to have a common definition of combining kinds used by the `vector` and `gpu` dialects. Separately, the GPU to SPIR-V lowering does not yet properly handle zero and NaN values -- the behavior of floating point min/max group reductions is not specified by the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-11-27 11:19:20 -05:00
drazi	9a3d3c7093	generalize pass gpu-kernel-outlining for symbol op (#72074 ) This PR generalizes the gpu-out-lining pass to handle `SymbolOpInterface` ops instead of just `func::FuncOp`. Before this change, the gpu-out-lining pass would skip `llvm.func`. ```mlir module { llvm.func @main() { %c1 = arith.constant 1 : index gpu.launch blocks(%arg0, %arg1, %arg2) in (%arg6 = %c1, %arg7 = %c1, %arg8 = %c1) threads(%arg3, %arg4, %arg5) in (%arg9 = %c1, %arg10 = %c1, %arg11 = %c1) { gpu.terminator } llvm.return } } ``` After this change, the gpu-out-lining pass can handle llvm.func as well.	2023-11-12 21:48:49 -08:00
spaceotter	00c3c73189	[mlir][gpu] Separate the barrier elimination code from transform ops (#71762 ) Allows the barrier elimination code to be run from C++ as well. The code from the transform dialect is copied as-is; the pass and populate functions have been added at the end. Co-authored-by: Eric Eaton <eric@nod-labs.com>	2023-11-10 17:59:09 -08:00
Krzysztof Drewniak	05fa923a9b	Fix SmallVector usage in SerializeToHsaco (#71702 ) Enable merging #71439 by replacing a definitely-wrong usage of std::unique_ptr<SmallVectorImpl<char>> as a return value with passing in a SmallVectorImpl<char>&. Also change the following function to take ArrayRef<char> instead of const SmallVectorImpl<char>&.	2023-11-08 13:57:41 -06:00
Fabian Mora	42630689e2	[mlir][gpu] Clean GPU `Passes.h` from external SPIRV includes (#71331 ) Removes the `SPIRVAttributes.h` header from `GPU/Transforms/Passes.h`	2023-11-05 17:06:04 -08:00
Sang Ik Lee	2dace04521	[mlir][spirv] Implement gpu::TargetAttrInterface (#69949 ) This commit implements gpu::TargetAttrInterface for the SPIR-V target attribute. The plan is to use this to enable the GPU compilation pipeline for OpenCL kernels later. The changes do not impact Vulkan shaders using mlir-vulkan-runner. A new GPU dialect transform pass, spirv-attach-target, is implemented for attaching the attribute from the CLI. The gpu-module-to-binary pass now works with a GPU module that has a SPIR-V module with OpenCL kernel functions inside.	2023-11-05 08:11:53 -08:00

288 Commits