clang-p2996

Author	SHA1	Message	Date
donald chen	889b67c9d3	[mlir] [memref] add more checks to the memref.reinterpret_cast (#112669 ) Operation memref.reinterpret_cast previously accepted input like: %out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10], strides: [1] : memref<?xf32> to memref<10xf32> A problem arises: while lowering, the true offset of %out is %offset, but its data type indicates an offset of 0. Permitting this inconsistency can result in incorrect outcomes, as certain passes might erroneously extract the offset from the data type of %out. This patch fixes this by enforcing that the return value's data type aligns with the input parameter.	2024-10-26 08:07:51 +08:00
Mehdi Amini	8b47711e84	Revert "CMake: Remove unnecessary dependencies on LLVM/MLIR" (#110594 ) Reverts llvm/llvm-project#110362 Multiple bots are broken.	2024-10-01 00:44:21 +02:00
BARRET	4980f2177e	CMake: Remove unnecessary dependencies on LLVM/MLIR (#110362 ) There are some spurious libraries which can be removed. I'm trying to bundle MLIR/LLVM library dependencies for our own libraries. We're utilizing cmake function to recursively collect MLIR/LLVM related dependencies. However, we identified certain library dependencies as redundant and safe for removal.	2024-09-30 23:57:13 +02:00
Andrzej Warzyński	bfde17834d	[mlir] Update the return type of `getNum{Dynamic|Scalable}Dims` (#110472 ) Updates the return type of `getNumDynamicDims` and `getNumScalableDims` from `int64_t` to `size_t`. This is for consistency with other helpers/methods that return "size" and to reduce the number of `static_cast`s in various places.	2024-09-30 14:53:50 +01:00
Andrea Faulds	a800ffac41	[mlir][gpu] Disjoint patterns for lowering clustered subgroup reduce (#109158 ) Making the existing populateGpuLowerSubgroupReduceToShufflePatterns() function also cover the new "clustered" subgroup reductions is proving to be inconvenient, because certain backends may have more specific lowerings that only cover the non-clustered type, and this creates pass ordering constraints. This commit removes coverage of clustered reductions from this function in favour of a new separate function, which makes controlling the lowering much more straightforward.	2024-09-18 15:55:53 -04:00
Andrea Faulds	fd26f8444a	[mlir][gpu] Rename two misspelled pattern population functions (#109015 )	2024-09-17 15:26:14 -04:00
Andrea Faulds	3d01f0a33b	[mlir][gpu] Add 'cluster_stride' attribute to gpu.subgroup_reduce (#107142 ) Follow-up to `7aa22f013e`, adding an additional attribute needed in some applications.	2024-09-05 09:03:22 -04:00
Fabian Mora	016e1eb9c8	[mlir][gpu] Add metadata attributes for storing kernel metadata in GPU objects (#95292 ) This patch adds the `#gpu.kernel_metadata` and `#gpu.kernel_table` attributes. The `#gpu.kernel_metadata` attribute allows storing metadata related to a compiled kernel, for example, the number of scalar registers used by the kernel. The attribute only has 2 required parameters, the name and function type. It also has 2 optional parameters, the arguments attributes and generic dictionary for storing all other metadata. The `#gpu.kernel_table` stores a table of `#gpu.kernel_metadata`, mapping the name of the kernel to the metadata. Finally, the function `ROCDL::getAMDHSAKernelsELFMetadata` was added to collect ELF metadata from a binary, and to test the class methods in both attributes. Example: ```mlir gpu.binary @binary [#gpu.object<#rocdl.target<chip = "gfx900">, kernels = #gpu.kernel_table<[ #gpu.kernel_metadata<"kernel0", (i32) -> (), metadata = {sgpr_count = 255}>, #gpu.kernel_metadata<"kernel1", (i32, f32) -> (), arg_attrs = [{llvm.read_only}, {}]> ]> , bin = "BLOB">] ``` The motivation behind these attributes is to provide useful information for things like tuning. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-08-27 18:44:50 -04:00
Fabian Mora	fd36a7b944	[mlir][gpu] Pass GPU module to `TargetAttrInterface::createObject`. (#94910 ) This patch adds an argument to `gpu::TargetAttrInterface::createObject` to pass the GPU module. This is useful as `gpu::ObjectAttr` contains a property dict for metadata, hence the module can be used for extracting things like the symbol table and adding it to the property dict. --------- Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>	2024-08-27 11:05:04 -04:00
Andrea Faulds	7aa22f013e	[mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851 ) This enables performing several reductions in parallel, each smaller than the size of the subgroup. One potential application is flash attention with subgroup-wide matrix multiplication and reduction combined in one kernel. The multiplication operation requires a 2D matrix to be distributed over the lanes of the subgroup, which then constrains the shape the following reduction can have if we want to keep data in registers.	2024-08-20 13:37:03 -04:00
Matthias Springer	7030280329	[mlir][GPU] Improve `gpu.module` op implementation (#102866 ) - Replace hand-written parser/printer with auto-generated assembly format. - Remove implicit `gpu.module_end` terminator and use the `NoTerminator` trait instead. (Same as `builtin.module`.) - Turn the region into a graph region. (Same as `builtin.module`.)	2024-08-13 09:37:36 +02:00
Matthias Springer	7359a6b799	[mlir][ODS] Verify type constraints in Types and Attributes (#102326 ) When a type/attribute is defined in TableGen, a type constraint can be used for parameters, but the type constraint verification was missing. Example: ``` def TestTypeVerification : Test_Type<"TestTypeVerification"> { let parameters = (ins AnyTypeOf<[I16, I32]>:$param); // ... } ``` No verification code was generated to ensure that `$param` is I16 or I32. When type constraints are present, a new method will be generated for types and attributes: `verifyInvariantsImpl`. (The naming is similar to op verifiers.) The user-provided verifier is called `verify` (no change). There is now a new entry point to type/attribute verification: `verifyInvariants`. This function calls both `verifyInvariantsImpl` and `verify`. If neither of those two verifications is present, the `verifyInvariants` function is not generated. When a type/attribute is not defined in TableGen, but a verifier is needed, users can implement the `verifyInvariants` function. (This function was previously called `verify`.) Note for LLVM integration: If you have an attribute/type that is not defined in TableGen (i.e., just C++), you have to rename the verification function from `verify` to `verifyInvariants`. (Most attributes/types have no verification, in which case there is nothing to do.) Depends on #102657.	2024-08-09 22:04:40 +02:00
Angel Zhang	863a2ed440	[mlir][memref] Rename `MemRef` directories and files. NFC. (#102337 ) This PR renames the `MemRef` integration test directory and the `DecomposeMemref.s.cpp` file so that they can be found when doing a case-sensitive search on file paths.	2024-08-07 15:41:40 -04:00
Nikhil Kalra	84cc1865ef	[mlir] Support DialectRegistry extension comparison (#101119 ) `PassManager::run` loads the dependent dialects for each pass into the current context prior to invoking the individual passes. If the dependent dialect is already loaded into the context, this should be a no-op. However, if there are extensions registered in the `DialectRegistry`, the dependent dialects are unconditionally registered into the context. This poses a problem for dynamic pass pipelines, however, because they will likely be executing while the context is in an immutable state (because of the parent pass pipeline being run). To solve this, we'll update the extension registration API on `DialectRegistry` to require a type ID for each extension that is registered. Then, instead of unconditionally registering dialects into a context if extensions are present, we'll check against the extension type IDs already present in the context's internal `DialectRegistry`. The context will only be marked as dirty if there are net-new extension types present in the `DialectRegistry` populated by `PassManager::getDependentDialects`. Note: this PR removes the `addExtension` overload that utilizes `std::function` as the parameter. This is because `std::function` is copyable and potentially allocates memory for the contained function so we can't use the function pointer as the unique type ID for the extension. Downstream changes required: - Existing `DialectExtension` subclasses will need a type ID to be registered for each subclass. More details on how to register a type ID can be found here: `8b68e06731/mlir/include/mlir/Support/TypeID.h (L30)` - Existing uses of the `std::function` overload of `addExtension` will need to be refactored into dedicated `DialectExtension` classes with associated type IDs. The attached `std::function` can either be inlined into or called directly from `DialectExtension::apply`.
--------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-08-06 01:32:36 +02:00
Kazu Hirata	5262865aac	[mlir] Construct SmallVector with ArrayRef (NFC) (#101896 )	2024-08-04 11:43:05 -07:00
Ramkumar Ramachandra	db791b278a	mlir/LogicalResult: move into llvm (#97309 ) This patch is part of a project to move the Presburger library into LLVM.	2024-07-02 10:42:33 +01:00
Mehdi Amini	d7da0ae4f4	[MLIR][NVVM] Reduce the scope of the LLVM_HAS_NVPTX_TARGET guard (#97349 ) Most of the code here does not depend on the NVPTX target. In particular the simple offload can just emit LLVM IR and we can use this without the NVVM backend being built, which can be useful for a frontend that just needs to serialize the IR and leave it up to the runtime to JIT further.	2024-07-02 11:31:12 +02:00
Mehdi Amini	59d7d4bc32	[MLIR][NVVM] Remove irrelevant guards (#97345 ) This code does not seem to involve the NVPTX backend anywhere.	2024-07-02 00:35:59 +02:00
Kazu Hirata	b7b337fb91	[mlir] Use llvm::unique (NFC) (#96415 )	2024-06-24 11:54:02 -07:00
Fabian Mora	9ddf3b835c	[mlir][gpu] Remove old GPU serialization passes (#94998 ) This patch removes the last vestiges of the old gpu serialization pipeline. To compile GPU code use target attributes instead. See [Compilation overview | 'gpu' Dialect - MLIR docs](https://mlir.llvm.org/docs/Dialects/GPU/#compilation-overview) for additional information on the target attributes compilation pipeline that replaced the old serialization pipeline.	2024-06-20 08:01:40 -05:00
Michael Kruse	6244d87f42	Avoid object libraries in the VS IDE (#93519 ) As discussed in #89743, when using the Visual Studio solution generators, object library projects are displayed as a collection of non-editable *.obj files. To look for the corresponding source files, one has to browse (or search) to the library's obj.libname project. This patch tries to avoid this as much as possible. For Clang, there is already an exception for XCode. We handle MSVC_IDE the same way. For MLIR, this is more complicated. There are explicit references to the obj.libname target that only work when there is an object library. This patch cleans up the reasons for why an object library is needed: 1. The obj.libname is modified in the calling CMakeLists.txt. Note that with use-only references, `add_library(<name> ALIAS <target>)` could have been used. 2. A `libMLIR.so` (mlir-shlib) is also created. This works by linking the object libraries' object files into the libMLIR.so (in addition to the library's own .so/.a). XCode is handled using the `-force_load` linker option instead. Windows is not supported. This mechanism is different from LLVM's llvm-shlib that is created by linking static libraries with `-Wl,--whole-archive` (and `-Wl,-all_load` on MacOS). 3. The library might be added to an aggregate library. In-tree, this seems to be only `libMLIR-C.so` and the standalone example. In XCode, it uses the object library and `-force_load` mechanism as above. Again, this is different from `libLLVM-C.so`. 4. Build an object library whenever it was before this patch, except when generating a Visual Studio solution. This condition could be removed, but I am trying to avoid build breakages of whatever configurations others use. This seems to never have worked with XCode because of the explicit references to obj.libname (reason 1.). I don't have access to XCode, but I tried to preserve the current working.
IMHO there should be a common mechanism to build aggregate libraries for all LLVM projects instead of the 4 that we have now. As far as I can see, this means for LLVM there are the following changes on whether object libraries are created: 1. An object library is created even in XCode if FORCE_OBJECT_LIBRARY is set. I do not know how XCode handles it, but I also know CMake will abort otherwise. 2. An object library is created even for explicitly SHARED libraries for building `libMLIR.so`. Again, mlir-shlib does not work otherwise. `libMLIR.so` itself is created using SHARED so this patch is marking it as EXCLUDE_FROM_LIBMLIR. 3. For the second condition, it is now sensitive to whether the mlir-shlib is built at all (LLVM_BUILD_LLVM_DYLIB). However, an object library is still built using the fourth condition unless using the MSVC solution generator. That is, except with MSVC_IDE, when an object library was built before, it will also be an object library now.	2024-06-19 14:30:01 +02:00
Krzysztof Drewniak	43fd4c49bd	[mlir][GPU] Improve handling of GPU bounds (#95166 ) This change reworks how range information for GPU dispatch IDs (block IDs, thread IDs, and so on) is handled. 1. `known_block_size` and `known_grid_size` become inherent attributes of GPU functions. This makes them less clunky to work with. As a consequence, the `gpu.func` lowering patterns now only look at the inherent attributes when setting target-specific attributes on the `llvm.func` that they lower to. 2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size` are made official dialect-level discardable attributes which can be placed on arbitrary functions. This allows for progressive lowerings (without this, a lowering for `gpu.thread_id` couldn't know about the bounds if it had already been moved from a `gpu.func` to an `llvm.func`) and allows for range information to be provided even when `gpu.*_{id,dim}` are being used outside of a `gpu.func` context. 3. All of these index operations have gained an optional `upper_bound` attribute, allowing for an alternate mode of operation where the bounds are specified locally and not inherited from the operation's context. These also allow handling of cases where the precise launch sizes aren't known, but can be bounded more precisely than the maximum of what any platform's API allows. (I'd like to thank @benvanik for pointing out that this could be useful.) When inferring bounds (either for range inference or for setting `range` during lowering) these sources of information are consulted in order of specificity (`upper_bound` > inherent attribute > discardable attribute, except that dimension sizes check for `known_*_bounds` to see if they can be constant-folded before checking their `upper_bound`). This patch also updates the documentation about the bounds and inference behavior to clarify what these attributes do when set and the consequences of setting them up incorrectly.
--------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-06-17 23:47:38 -05:00
Fabian Mora	3a2f7d8a9f	Revert "Reland [mlir][Target] Improve ROCDL gpu serialization API" (#95847 ) Reverts llvm/llvm-project#95813	2024-06-17 16:19:21 -05:00
Fabian Mora	dcb6c0d71c	Reland [mlir][Target] Improve ROCDL gpu serialization API (#95813 ) Reland: https://github.com/llvm/llvm-project/pull/95456 This patch improves the ROCDL gpu serialization API by: - Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN device code libraries to use during linking. - Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`. Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode libraries; now it will only load the requested libraries. - Exposing the `compileToBinary` method and making it virtual, allowing downstream users to re-use this method. - Exposing `moduleToObjectImpl`, this method provides a prototype flow for compiling to binary, allowing downstream users to re-use this method. - It also avoids constructing the control variables if no device libraries are being used. - Changes the style of the error messages to be composable, i.e., no full stops. - Adds an error message for when the ROCm toolkit can't be found but it was required.	2024-06-17 15:44:35 -05:00
Fabian Mora	57b8be463a	Revert [mlir][Target] Improve ROCDL gpu serialization API (#95790 ) Reverts llvm/llvm-project#95456	2024-06-17 09:09:34 -05:00
Fabian Mora	954cb5f9a2	[mlir][Target] Improve ROCDL gpu serialization API (#95456 ) This patch improves the ROCDL gpu serialization API by: - Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN device code libraries to use during linking. - Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`. Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode libraries; now it will only load the requested libraries. - Exposing the `compileToBinary` method and making it virtual, allowing downstream users to re-use this method. - Exposing `moduleToObjectImpl`, this method provides a prototype flow for compiling to binary, allowing downstream users to re-use this method. - It also avoids constructing the control variables if no device libraries are being used. This patch also changes the behavior of the CMake flag `DEFAULT_ROCM_PATH`. Before it would fall back to a default value of `/opt/rocm` if not specified. However, that default value causes fragile builds in environments with ROCm. Now, the flag falls back to the empty string, making it clear that the user must provide a value at LLVM build time.	2024-06-17 09:02:55 -05:00
Fabian Mora	f3b4c00304	[mlir][gpu] Add builder to `gpu.launch_func` (#95541 ) This patch adds a builder to `gpu.launch_func` allowing it to be created using `SymbolRefAttr` instead of `GPUFuncOp`. This allows creating `launch_func` when only a `gpu.binary` is present, instead of the full `gpu.module {...}`.	2024-06-15 12:40:14 -05:00
Pradeep Kumar	bd6568c98a	[MLIR][GPU] Add gpu.cluster_dim_blocks and gpu.cluster_block_id Ops (#95245 ) This commit adds support for `gpu.cluster_dim_blocks` and `gpu.cluster_block_id` Ops to represent number of blocks per cluster and block id inside a cluster respectively. Also, fixed the description of `gpu.cluster_dim` Op and updated the `cga_cluster.mlir` test file to use `gpu.cluster_dim_blocks` Co-authored-by: pradeepku <pradeepku@nvidia.com> Co-authored-by: Guray Ozen <guray.ozen@gmail.com>	2024-06-14 10:35:35 +05:30
Fabian Mora	54373e0f40	[NFC][mlir][gpu] Make sym_name an inherent attr in GPUModuleOp (#94918 ) Make `sym_name` an inherent attr in GPUModuleOp so that it doesn't show in the discardable attributes. The change is safe as the attribute is always expected to be present.	2024-06-09 17:36:19 -05:00
tyb0807	8178a3ad1b	[mlir] Replace MLIR_ENABLE_CUDA_CONVERSIONS with LLVM_HAS_NVPTX_TARGET (#93008 ) LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX was enabled when building LLVM. Use this instead of manually defining MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).	2024-05-24 17:31:28 +02:00
Benjamin Kramer	29c2475f21	[mlir] Fix the build after `03c53c69a3`	2024-05-15 18:34:59 +02:00
Kazu Hirata	dec8055a1e	[mlir] Use StringRef::operator== instead of StringRef::equals (NFC) (#91560 ) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 10 under mlir/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-08 23:52:22 -07:00
Mehdi Amini	d566a5cd22	[MLIR] Improve KernelOutlining to avoid introducing an extra block (#90359 ) This fixes a TODO in the code.	2024-04-29 19:30:38 +02:00
Fabian Mora	dc6ce60801	[mlir][gpu] Remove `offloadingHandler` from `ModuleToBinary` (#90368 ) This patch removes the `offloadingHandler` option from the `ModuleToBinary` pass. The option is removed as it cannot be parsed from textual form. This fixes issue #90344.	2024-04-28 10:03:12 -04:00
Christian Sigg	a5757c5b65	Switch member calls to `isa/dyn_cast/cast/...` to free function calls. (#89356 ) This change cleans up call sites. Next step is to mark the member functions deprecated. See https://mlir.llvm.org/deprecation and https://discourse.llvm.org/t/preferred-casting-style-going-forward.	2024-04-19 15:58:27 +02:00
Jakub Kuderski	971b852546	[mlir][NFC] Simplify type checks with isa predicates (#87183 ) For more context on isa predicates, see: https://github.com/llvm/llvm-project/pull/83753.	2024-04-01 11:40:09 -04:00
Justin Fargnoli	35d55f2894	[NFC][mlir] Reorder `declarePromisedInterface()` operands (#86628 ) Reorder the template operands of `declarePromisedInterface()` to match `declarePromisedInterfaces()`.	2024-03-27 10:30:17 -07:00
Oleksandr "Alex" Zinenko	5a9bdd85ee	[mlir] split transform interfaces into a separate library (#85221 ) Transform interfaces are implemented, directly or via extensions, in libraries belonging to multiple other dialects. Those dialects don't need to depend on the non-interface part of the transform dialect, which includes the growing number of ops and transitive dependency footprint. Split out the interfaces into a separate library. This in turn requires flipping the dependency from the interface on the dialect that has crept in because both co-existed in one library. The interface shouldn't depend on the transform dialect either. As a consequence of splitting, the capability of the interpreter to automatically walk the payload IR to identify payload ops of a certain kind based on the type used for the entry point symbol argument is disabled. This is a good move by itself as it simplifies the interpreter logic. This functionality can be trivially replaced by a `transform.structured.match` operation.	2024-03-20 22:15:17 +01:00
Justin Fargnoli	513cdb8222	[mlir] Declare promised interfaces for all dialects (#78368 ) This PR adds promised interface declarations for all interfaces declared in `InitAllDialects.h`. Promised interfaces allow a dialect to declare that it will have an implementation of a particular interface, crashing the program if one isn't provided when the interface is used.	2024-03-15 20:23:20 -07:00
Justin Lebar	fab2bb8bfd	Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678 ) For some reason this was missing from STLExtras.	2024-03-10 20:00:13 -07:00
Fangrui Song	886ecb3078	[mlir] Remove setRelaxELFRelocations. NFC The option is always true (see `2aedfdd9b8`) and the MCAsmInfo option is going away in favor of MCTargetOptions.	2024-03-06 23:12:40 -08:00
Ingo Müller	6e27dd47e1	[mlir][gpu] Replace MLIR_GPU_TO_HSACO_PASS_ENABLE by more generic one. (#84001 ) This is another follow-up of #83004. The PR replaces the macro `MLIR_GPU_TO_HSACO_PASS_ENABLE` with the more generic macro `MLIR_ENABLE_ROCM_CONVERSIONS`. Until now, the former has been defined if and only if the latter evaluated to true in CMake. However, the former was not defined when the latter evaluated to false, in which case a warning was raised if compiled with `-Wundef`. Using a single macro relies on the `#cmakedefine01` mechanism that ensures the macro is always set to either 0 or 1.	2024-03-06 09:53:30 +01:00
Ingo Müller	f3be842728	[mlir] Expose MLIR_ROCM_CONVERSIONS_ENABLED in mlir-config.h. (#83977 ) This is a follow-up of #83004, which made the same change for `MLIR_CUDA_CONVERSIONS_ENABLED`. Like the previous PR, this commit exposes the mentioned CMake variable through `mlir-config.h` and uses the macro that is introduced with the same name. This replaces the macro `MLIR_ROCM_CONVERSIONS_ENABLED`, which the CMake files previously defined manually.	2024-03-05 15:37:14 +01:00
Ingo Müller	5f2097dbed	[mlir] Expose MLIR_CUDA_CONVERSIONS_ENABLED in mlir-config.h. (#83004 ) That macro was not defined in some cases and thus yielded warnings if compiled with `-Wundef`. In particular, it was not defined in the BUILD files, so the GPU targets were broken when built with Bazel. This commit exposes the mentioned CMake variable through mlir-config.h and uses the macro that is introduced with the same name. This replaces the macro MLIR_CUDA_CONVERSIONS_ENABLED, which the CMake files previously defined manually.	2024-02-28 14:48:40 +01:00
Matthias Springer	492e8ba038	[mlir] Fix memory leaks after #81759 (#82762 ) This commit fixes memory leaks that were introduced by #81759. The way ops and blocks are erased changed slightly. The leaks were caused by an incorrect implementation of op builders: blocks must be created with the supplied builder object. Otherwise, they are not properly tracked by the dialect conversion and can leak during rollback.	2024-02-23 14:28:57 +01:00
Fabian Mora	f204aee1b9	[mlir][GPU] Remove the SerializeToCubin pass (#82486 ) The `SerializeToCubin` pass was deprecated in September 2023 in favor of GPU compilation attributes; see the [GPU compilation](https://mlir.llvm.org/docs/Dialects/GPU/#gpu-compilation) section in the `gpu` dialect MLIR docs. This patch removes `SerializeToCubin` from the repo.	2024-02-21 20:47:19 -05:00
Thomas Preud'homme	76e79b0bef	Fix duplicate mapping detection in gpu::setMappingAttr() (#77499 )	2024-02-20 09:54:00 +00:00
Guray Ozen	d7f59c8fb8	[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#81489 ) This PR moves the lowering of the math dialect later in the pipeline, because the math dialect is lowered correctly by `createConvertGpuOpsToNVVMOps` for GPU targets, and that pass needs to run first. Reland #78556	2024-02-13 08:31:42 +01:00
Benjamin Kramer	98dbc688de	Revert "[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556 )" This reverts commit `74bf0b1cd9`. The test always fails. | mlir/test/Dialect/GPU/test-nvvm-pipeline.mlir:23:16: error: CHECK-PTX: expected string not found in input | // CHECK-PTX: __nv_expf https://lab.llvm.org/buildbot/#/builders/61/builds/53789	2024-01-31 17:41:21 +01:00
Guray Ozen	74bf0b1cd9	[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556 ) This PR moves the lowering of the math dialect later in the pipeline, because the math dialect is lowered correctly by `createConvertGpuOpsToNVVMOps` for GPU targets, and that pass needs to run first.	2024-01-31 15:24:32 +01:00

552 Commits