Commit Graph

552 Commits

Author SHA1 Message Date
donald chen
889b67c9d3 [mlir] [memref] add more checks to the memref.reinterpret_cast (#112669)
The `memref.reinterpret_cast` operation used to accept input like:

%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10], strides: [1]
         : memref<?xf32> to memref<10xf32>

A problem arises: while lowering, the true offset of %out is %offset,
but its data type indicates an offset of 0. Permitting this
inconsistency can result in incorrect outcomes, as certain passes might
erroneously extract the offset from the data type of %out.

This patch fixes this by enforcing that the return value's type is
consistent with the input parameters.
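
For comparison, a form whose result type stays consistent with the dynamic offset operand might look like the following (an illustrative sketch, not taken from the patch; the strided layout is an assumption about how one would express it):

```mlir
// The strided layout carries a dynamic offset, so the result type agrees with
// the %offset operand instead of implying an offset of 0.
%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10], strides: [1]
    : memref<?xf32> to memref<10xf32, strided<[1], offset: ?>>
```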
2024-10-26 08:07:51 +08:00
Mehdi Amini
8b47711e84 Revert "CMake: Remove unnecessary dependencies on LLVM/MLIR" (#110594)
Reverts llvm/llvm-project#110362

Multiple bots are broken.
2024-10-01 00:44:21 +02:00
BARRET
4980f2177e CMake: Remove unnecessary dependencies on LLVM/MLIR (#110362)
There are some spurious libraries which can be removed.

I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We're using a CMake function to recursively collect
MLIR/LLVM-related dependencies. In the process, we identified certain
library dependencies as redundant and safe to remove.
2024-09-30 23:57:13 +02:00
Andrzej Warzyński
bfde17834d [mlir] Update the return type of getNum{Dynamic|Scalable}Dims (#110472)
Updates the return type of `getNumDynamicDims` and `getNumScalableDims`
from `int64_t` to `size_t`. This is for consistency with other
helpers/methods that return "size" and to reduce the number of
`static_cast`s in various places.
2024-09-30 14:53:50 +01:00
Andrea Faulds
a800ffac41 [mlir][gpu] Disjoint patterns for lowering clustered subgroup reduce (#109158)
Making the existing populateGpuLowerSubgroupReduceToShufflePatterns()
function also cover the new "clustered" subgroup reductions is proving
to be inconvenient, because certain backends may have more specific
lowerings that only cover the non-clustered type, and this creates pass
ordering constraints. This commit removes coverage of clustered
reductions from this function in favour of a new separate function,
which makes controlling the lowering much more straightforward.
2024-09-18 15:55:53 -04:00
Andrea Faulds
fd26f8444a [mlir][gpu] Rename two misspelled pattern population functions (#109015) 2024-09-17 15:26:14 -04:00
Andrea Faulds
3d01f0a33b [mlir][gpu] Add 'cluster_stride' attribute to gpu.subgroup_reduce (#107142)
Follow-up to 7aa22f013e, adding an
additional attribute needed in some applications.
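
As a sketch (using the current `gpu` dialect assembly; treat the exact spelling as an assumption), a strided clustered reduction might be written as:

```mlir
// Lanes {0,2,4,6}, {1,3,5,7}, ... each form a cluster of four and reduce
// independently.
%sum = gpu.subgroup_reduce add %val cluster(size = 4, stride = 2) : (f32) -> f32
```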
2024-09-05 09:03:22 -04:00
Fabian Mora
016e1eb9c8 [mlir][gpu] Add metadata attributes for storing kernel metadata in GPU objects (#95292)
This patch adds the `#gpu.kernel_metadata` and `#gpu.kernel_table`
attributes. The `#gpu.kernel_metadata` attribute allows storing metadata
related to a compiled kernel, for example, the number of scalar
registers used by the kernel. The attribute has only two required
parameters, the name and the function type, and two optional parameters,
the argument attributes and a generic dictionary for storing all other
metadata.

The `#gpu.kernel_table` stores a table of `#gpu.kernel_metadata`,
mapping the name of the kernel to the metadata.

Finally, the function `ROCDL::getAMDHSAKernelsELFMetadata` was added to
collect ELF metadata from a binary, and to test the class methods in
both attributes.

Example:
```mlir
gpu.binary @binary [#gpu.object<#rocdl.target<chip = "gfx900">, kernels = #gpu.kernel_table<[
    #gpu.kernel_metadata<"kernel0", (i32) -> (), metadata = {sgpr_count = 255}>,
    #gpu.kernel_metadata<"kernel1", (i32, f32) -> (), arg_attrs = [{llvm.read_only}, {}]>
  ]> , bin = "BLOB">]

```
The motivation behind these attributes is to provide useful information
for things like tuning.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-08-27 18:44:50 -04:00
Fabian Mora
fd36a7b944 [mlir][gpu] Pass GPU module to TargetAttrInterface::createObject. (#94910)
This patch adds an argument to `gpu::TargetAttrInterface::createObject`
to pass the GPU module. This is useful as `gpu::ObjectAttr` contains a
property dict for metadata, hence the module can be used for extracting
things like the symbol table and adding it to the property dict.

---------

Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
2024-08-27 11:05:04 -04:00
Andrea Faulds
7aa22f013e [mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)
This enables performing several reductions in parallel, each smaller
than the size of the subgroup.

One potential application is flash attention with subgroup-wide matrix
multiplication and reduction combined in one kernel. The multiplication
operation requires a 2D matrix to be distributed over the lanes of the
subgroup, which then constrains the shape the following reduction can
have if we want to keep data in registers.
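
A minimal sketch of a clustered reduction (written in the current assembly form, which may differ slightly from the syntax when this commit first landed):

```mlir
// In a 32-lane subgroup, this performs 8 independent reductions over
// contiguous groups of 4 lanes each.
%partial = gpu.subgroup_reduce add %val cluster(size = 4) : (f32) -> f32
```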
2024-08-20 13:37:03 -04:00
Matthias Springer
7030280329 [mlir][GPU] Improve gpu.module op implementation (#102866)
- Replace hand-written parser/printer with auto-generated assembly
format.
- Remove implicit `gpu.module_end` terminator and use the `NoTerminator`
trait instead. (Same as `builtin.module`.)
- Turn the region into a graph region. (Same as `builtin.module`.)
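
For reference, a minimal `gpu.module` as accepted by the generated format, with no explicit terminator (an illustrative sketch):

```mlir
gpu.module @kernels {
  gpu.func @kernel(%arg0: f32) kernel {
    gpu.return
  }
}
```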
2024-08-13 09:37:36 +02:00
Matthias Springer
7359a6b799 [mlir][ODS] Verify type constraints in Types and Attributes (#102326)
When a type/attribute is defined in TableGen, a type constraint can be
used for parameters, but the type constraint verification was missing.

Example:
```
def TestTypeVerification : Test_Type<"TestTypeVerification"> {
  let parameters = (ins AnyTypeOf<[I16, I32]>:$param);
  // ...
}
```

No verification code was generated to ensure that `$param` is I16 or
I32.

When type constraints are present, a new method will be generated for types
and attributes: `verifyInvariantsImpl`. (The naming is similar to op
verifiers.) The user-provided verifier is called `verify` (no change).
There is now a new entry point to type/attribute verification:
`verifyInvariants`. This function calls both `verifyInvariantsImpl` and
`verify`. If neither of those two verifications is present, the
`verifyInvariants` function is not generated.

When a type/attribute is not defined in TableGen, but a verifier is
needed, users can implement the `verifyInvariants` function. (This
function was previously called `verify`.)

Note for LLVM integration: If you have an attribute/type that is not
defined in TableGen (i.e., just C++), you have to rename the
verification function from `verify` to `verifyInvariants`. (Most
attributes/types have no verification, in which case there is nothing to
do.)

Depends on #102657.
2024-08-09 22:04:40 +02:00
Angel Zhang
863a2ed440 [mlir][memref] Rename MemRef directories and files. NFC. (#102337)
This PR renames the `MemRef` integration test directory and the
`DecomposeMemref.s.cpp` file so that they can be found when doing a
case-sensitive search on file paths.
2024-08-07 15:41:40 -04:00
Nikhil Kalra
84cc1865ef [mlir] Support DialectRegistry extension comparison (#101119)
`PassManager::run` loads the dependent dialects for each pass into the
current context prior to invoking the individual passes. If the
dependent dialect is already loaded into the context, this should be a
no-op. However, if there are extensions registered in the
`DialectRegistry`, the dependent dialects are unconditionally registered
into the context.

This poses a problem for dynamic pass pipelines, however, because they
will likely be executing while the context is in an immutable state
(because of the parent pass pipeline being run).

To solve this, we'll update the extension registration API on
`DialectRegistry` to require a type ID for each extension that is
registered. Then, instead of unconditionally registering dialects into a
context if extensions are present, we'll check against the extension
type IDs already present in the context's internal `DialectRegistry`.
The context will only be marked as dirty if there are net-new extension
types present in the `DialectRegistry` populated by
`PassManager::getDependentDialects`.

Note: this PR removes the `addExtension` overload that utilizes
`std::function` as the parameter. This is because `std::function` is
copyable and potentially allocates memory for the contained function so
we can't use the function pointer as the unique type ID for the
extension.

Downstream changes required:
- Existing `DialectExtension` subclasses will need a type ID to be
registered for each subclass. More details on how to register a type ID
can be found here:
8b68e06731/mlir/include/mlir/Support/TypeID.h (L30)
- Existing uses of the `std::function` overload of `addExtension` will
need to be refactored into dedicated `DialectExtension` classes with
associated type IDs. The attached `std::function` can either be inlined
into or called directly from `DialectExtension::apply`.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-08-06 01:32:36 +02:00
Kazu Hirata
5262865aac [mlir] Construct SmallVector with ArrayRef (NFC) (#101896) 2024-08-04 11:43:05 -07:00
Ramkumar Ramachandra
db791b278a mlir/LogicalResult: move into llvm (#97309)
This patch is part of a project to move the Presburger library into
LLVM.
2024-07-02 10:42:33 +01:00
Mehdi Amini
d7da0ae4f4 [MLIR][NVVM] Reduce the scope of the LLVM_HAS_NVPTX_TARGET guard (#97349)
Most of the code here does not depend on the NVPTX target. In particular
the simple offload can just emit LLVM IR and we can use this without the
NVVM backend being built, which can be useful for a frontend that just
needs to serialize the IR and leave it up to the runtime to JIT further.
2024-07-02 11:31:12 +02:00
Mehdi Amini
59d7d4bc32 [MLIR][NVVM] Remove irrelevant guards (#97345)
This code does not seem to involve the NVPTX backend anywhere.
2024-07-02 00:35:59 +02:00
Kazu Hirata
b7b337fb91 [mlir] Use llvm::unique (NFC) (#96415) 2024-06-24 11:54:02 -07:00
Fabian Mora
9ddf3b835c [mlir][gpu] Remove old GPU serialization passes (#94998)
This patch removes the last vestiges of the old gpu serialization
pipeline. To compile GPU code, use target attributes instead.

See [Compilation overview | 'gpu' Dialect - MLIR
docs](https://mlir.llvm.org/docs/Dialects/GPU/#compilation-overview) for
additional information on the target attributes compilation pipeline
that replaced the old serialization pipeline.
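
A minimal sketch of the target-attribute flow (the chip string is a placeholder): target attributes on the module select the backend, and the `gpu-module-to-binary` pass then serializes the module into a `gpu.binary`.

```mlir
// Serialized into a gpu.binary by the gpu-module-to-binary pass.
gpu.module @kernels [#nvvm.target<chip = "sm_80">] {
  gpu.func @kernel(%arg0: memref<?xf32>) kernel {
    gpu.return
  }
}
```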
2024-06-20 08:01:40 -05:00
Michael Kruse
6244d87f42 Avoid object libraries in the VS IDE (#93519)
As discussed in #89743, when using the Visual Studio solution
generators, object library projects are displayed as a collection of
non-editable *.obj files. To look for the corresponding source files,
one has to browse (or search) to the library's obj.libname project. This
patch tries to avoid this as much as possible.

For Clang, there is already an exception for XCode. We handle MSVC_IDE
the same way.

For MLIR, this is more complicated. There are explicit references to the
obj.libname target that only work when there is an object library. This
patch cleans up the reasons why an object library is needed:

1. The obj.libname is modified in the calling CMakeLists.txt. Note that
with use-only references, `add_library(<name> ALIAS <target>)` could
have been used.

2. A `libMLIR.so` (mlir-shlib) is also created. This works by linking
the object libraries' object files into the libMLIR.so (in
addition to the library's own .so/.a). XCode is handled using the
`-force_load` linker option instead. Windows is not supported. This
mechanism is different from LLVM's llvm-shlib that is created by linking
static libraries with `-Wl,--whole-archive` (and `-Wl,-all_load` on
MacOS).

3. The library might be added to an aggregate library. In-tree, this
seems to be only `libMLIR-C.so` and the standalone example. In XCode, it
uses the object library and `-force_load` mechanism as above. Again,
this is different from `libLLVM-C.so`.

4. Build an object library whenever one was built before this patch, except
when generating a Visual Studio solution. This condition could be
removed, but I am trying to avoid build breakages of whatever
configurations others use.

This seems to never have worked with XCode because of the explicit
references to obj.libname (reason 1.). I don't have access to XCode, but
I tried to preserve the current behavior. IMHO there should be a common
mechanism to build aggregate libraries for all LLVM projects instead of
the 4 that we have now.

As far as I can see, this means for LLVM there are the following changes
on whether object libraries are created:

1. An object library is created even in XCode if FORCE_OBJECT_LIBRARY is
set. I do not know how XCode handles it, but I also know CMake will
abort otherwise.

2. An object library is created even for explicitly SHARED libraries for
building `libMLIR.so`. Again, mlir-shlib does not work otherwise.
`libMLIR.so` itself is created using SHARED so this patch is marking it
as EXCLUDE_FROM_LIBMLIR.

3. For the second condition, it is now sensitive to whether the
mlir-shlib is built at all (LLVM_BUILD_LLVM_DYLIB). However, an object
library is still built using the fourth condition unless using the MSVC
solution generator. That is, except with MSVC_IDE, when an object
library was built before, it will also be an object library now.
2024-06-19 14:30:01 +02:00
Krzysztof Drewniak
43fd4c49bd [mlir][GPU] Improve handling of GPU bounds (#95166)
This change reworks how range information for GPU dispatch IDs (block
IDs, thread IDs, and so on) is handled.

1. `known_block_size` and `known_grid_size` become inherent attributes
of GPU functions. This makes them less clunky to work with. As a
consequence, the `gpu.func` lowering patterns now only look at the
inherent attributes when setting target-specific attributes on the
`llvm.func` that they lower to.
2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size`
are made official dialect-level discardable attributes which can be
placed on arbitrary functions. This allows for progressive lowerings
(without this, a lowering for `gpu.thread_id` couldn't know about the
bounds if it had already been moved from a `gpu.func` to an `llvm.func`)
and allows for range information to be provided even when
`gpu.*_{id,dim}` are being used outside of a `gpu.func` context.
3. All of these index operations have gained an optional `upper_bound`
attribute, allowing for an alternate mode of operation where the bounds
are specified locally and not inherited from the operation's context.
These also allow handling of cases where the precise launch sizes aren't
known, but can be bounded more precisely than the maximum of what any
platform's API allows. (I'd like to thank @benvanik for pointing out
that this could be useful.)

When inferring bounds (either for range inference or for setting `range`
during lowering) these sources of information are consulted in order of
specificity (`upper_bound` > inherent attribute > discardable attribute,
except that dimension sizes check for `known_*_bounds` to see if they
can be constant-folded before checking their `upper_bound`).

This patch also updates the documentation about the bounds and inference
behavior to clarify what these attributes do when set and the
consequences of setting them up incorrectly.
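
A rough sketch of the two attribute placements and the new `upper_bound` form (illustrative only; the exact assembly is an assumption, see the `gpu` dialect docs):

```mlir
// Inherent attribute: the launch is known to use 128x1x1 blocks, so the range
// of %tid can be inferred as [0, 128).
gpu.func @kernel() kernel attributes {known_block_size = array<i32: 128, 1, 1>} {
  %tid = gpu.thread_id x
  gpu.return
}

// Locally specified bound on the op itself, usable outside a gpu.func as well.
func.func @outlined_helper() {
  %tid = gpu.thread_id x upper_bound 64
  return
}
```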

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-06-17 23:47:38 -05:00
Fabian Mora
3a2f7d8a9f Revert "Reland [mlir][Target] Improve ROCDL gpu serialization API" (#95847)
Reverts llvm/llvm-project#95813
2024-06-17 16:19:21 -05:00
Fabian Mora
dcb6c0d71c Reland [mlir][Target] Improve ROCDL gpu serialization API (#95813)
Reland: https://github.com/llvm/llvm-project/pull/95456

This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN
device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`.
Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode
libraries; now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing
downstream users to re-use this method.
- Exposing `moduleToObjectImpl`, this method provides a prototype flow
for compiling to binary, allowing downstream users to re-use this
method.
- It also avoids constructing the control variables if no device
libraries are being used.
- Changes the style of the error messages to be composable, i.e., no full
stops.
- Adds an error message for when the ROCm toolkit can't be found but it
was required.
2024-06-17 15:44:35 -05:00
Fabian Mora
57b8be463a Revert [mlir][Target] Improve ROCDL gpu serialization API (#95790)
Reverts llvm/llvm-project#95456
2024-06-17 09:09:34 -05:00
Fabian Mora
954cb5f9a2 [mlir][Target] Improve ROCDL gpu serialization API (#95456)
This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN
device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`.
Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode
libraries; now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing
downstream users to re-use this method.
- Exposing `moduleToObjectImpl`, this method provides a prototype flow
for compiling to binary, allowing downstream users to re-use this
method.
- It also avoids constructing the control variables if no device
libraries are being used.

This patch also changes the behavior of the CMake flag
`DEFAULT_ROCM_PATH`. Before it would fall back to a default value of
`/opt/rocm` if not specified. However, that default value causes fragile
builds in environments with ROCm. Now, the flag falls back to the empty
string, making it clear that **the user must provide a value at LLVM
build time**.
2024-06-17 09:02:55 -05:00
Fabian Mora
f3b4c00304 [mlir][gpu] Add builder to gpu.launch_func (#95541)
This patch adds a builder to `gpu.launch_func` allowing it to be created
using `SymbolRefAttr` instead of `GPUFuncOp`. This allows creating
`launch_func` when only a `gpu.binary` is present, instead of the full
`gpu.module {...}`.
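
The IR this enables might look roughly like the following, with the kernel referenced by symbol into a `gpu.binary` (names and the blob are placeholders):

```mlir
// The enclosing builtin.module needs the gpu.container_module attribute.
gpu.binary @kernels [#gpu.object<#nvvm.target, "BLOB">]

func.func @main(%data: memref<?xf32>) {
  %c1 = arith.constant 1 : index
  gpu.launch_func @kernels::@kernel
      blocks in (%c1, %c1, %c1) threads in (%c1, %c1, %c1)
      args(%data : memref<?xf32>)
  return
}
```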
2024-06-15 12:40:14 -05:00
Pradeep Kumar
bd6568c98a [MLIR][GPU] Add gpu.cluster_dim_blocks and gpu.cluster_block_id Ops (#95245)
This commit adds support for `gpu.cluster_dim_blocks` and
`gpu.cluster_block_id` Ops to represent number of blocks per cluster and
block id inside a cluster respectively. Also, fixed the description of
`gpu.cluster_dim` Op and updated the `cga_cluster.mlir` test file to use
`gpu.cluster_dim_blocks`.
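
As a sketch, the new ops follow the same form as the other id/dim ops (treat the exact spelling as an assumption):

```mlir
// Number of thread blocks in the cluster along x, and this block's id within
// its cluster along x.
%cdim = gpu.cluster_dim_blocks x
%cid = gpu.cluster_block_id x
```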

Co-authored-by: pradeepku <pradeepku@nvidia.com>
Co-authored-by: Guray Ozen <guray.ozen@gmail.com>
2024-06-14 10:35:35 +05:30
Fabian Mora
54373e0f40 [NFC][mlir][gpu] Make sym_name an inherent attr in GPUModuleOp (#94918)
Make `sym_name` an inherent attr in GPUModuleOp so that it doesn't show
in the discardable attributes.
The change is safe as the attribute is always expected to be present.
2024-06-09 17:36:19 -05:00
tyb0807
8178a3ad1b [mlir] Replace MLIR_ENABLE_CUDA_CONVERSIONS with LLVM_HAS_NVPTX_TARGET (#93008)
LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX
was enabled when building LLVM. Use this instead of manually defining
MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).
2024-05-24 17:31:28 +02:00
Benjamin Kramer
29c2475f21 [mlir] Fix the build after 03c53c69a3 2024-05-15 18:34:59 +02:00
Kazu Hirata
dec8055a1e [mlir] Use StringRef::operator== instead of StringRef::equals (NFC) (#91560)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  10 under mlir/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-08 23:52:22 -07:00
Mehdi Amini
d566a5cd22 [MLIR] Improve KernelOutlining to avoid introducing an extra block (#90359)
This fixes a TODO in the code.
2024-04-29 19:30:38 +02:00
Fabian Mora
dc6ce60801 [mlir][gpu] Remove offloadingHandler from ModuleToBinary (#90368)
This patch removes the `offloadingHandler` option from the
`ModuleToBinary` pass. The option is removed as it cannot be parsed from
textual form.

This fixes issue #90344.
2024-04-28 10:03:12 -04:00
Christian Sigg
a5757c5b65 Switch member calls to isa/dyn_cast/cast/... to free function calls. (#89356)
This change cleans up call sites. Next step is to mark the member
functions deprecated.

See https://mlir.llvm.org/deprecation and
https://discourse.llvm.org/t/preferred-casting-style-going-forward.
2024-04-19 15:58:27 +02:00
Jakub Kuderski
971b852546 [mlir][NFC] Simplify type checks with isa predicates (#87183)
For more context on isa predicates, see:
https://github.com/llvm/llvm-project/pull/83753.
2024-04-01 11:40:09 -04:00
Justin Fargnoli
35d55f2894 [NFC][mlir] Reorder declarePromisedInterface() operands (#86628)
Reorder the template operands of `declarePromisedInterface()` to match
`declarePromisedInterfaces()`.
2024-03-27 10:30:17 -07:00
Oleksandr "Alex" Zinenko
5a9bdd85ee [mlir] split transform interfaces into a separate library (#85221)
Transform interfaces are implemented, directly or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.

Split out the interfaces into a separate library. This in turn requires
flipping the dependency from the interface on the dialect that has crept
in because both co-existed in one library. The interface shouldn't
depend on the transform dialect either.

As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
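
A minimal sketch of that replacement, matching payload ops explicitly inside the named sequence instead of relying on the entry point's argument type (the payload op name is just an example):

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%root: !transform.any_op {transform.readonly}) {
    // Explicitly gather the payload ops of interest.
    %matmuls = transform.structured.match ops{["linalg.matmul"]} in %root
        : (!transform.any_op) -> !transform.any_op
    transform.yield
  }
}
```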
2024-03-20 22:15:17 +01:00
Justin Fargnoli
513cdb8222 [mlir] Declare promised interfaces for all dialects (#78368)
This PR adds promised interface declarations for all interfaces declared
in `InitAllDialects.h`.

Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
2024-03-15 20:23:20 -07:00
Justin Lebar
fab2bb8bfd Add llvm::min/max_element and use it in llvm/ and mlir/ directories. (#84678)
For some reason this was missing from STLExtras.
2024-03-10 20:00:13 -07:00
Fangrui Song
886ecb3078 [mlir] Remove setRelaxELFRelocations. NFC
The option is always true (see 2aedfdd9b8)
and the MCAsmInfo option is going away in favor of MCTargetOptions.
2024-03-06 23:12:40 -08:00
Ingo Müller
6e27dd47e1 [mlir][gpu] Replace MLIR_GPU_TO_HSACO_PASS_ENABLE by more generic one. (#84001)
This is another follow-up of #83004. The PR replaces the macro
`MLIR_GPU_TO_HSACO_PASS_ENABLE` with the more generic macro
`MLIR_ENABLE_ROCM_CONVERSIONS`. Until now, the former has been defined
if and only if the latter evaluated to true in CMake. However, the
former was not defined when the latter evaluated to false, in which case
a warning was raised if compiled with `-Wundef`. Using a single macro
relies on the `#cmakedefine01` mechanism that ensures the macro is
always set to either 0 or 1.
2024-03-06 09:53:30 +01:00
Ingo Müller
f3be842728 [mlir] Expose MLIR_ROCM_CONVERSIONS_ENABLED in mlir-config.h. (#83977)
This is a follow up of #83004, which made the same change for
`MLIR_CUDA_CONVERSIONS_ENABLED`. Like the previous PR, this commit
exposes the mentioned CMake variable through `mlir-config.h` and uses the
macro that is introduced with the same name. This replaces the macro
`MLIR_ROCM_CONVERSIONS_ENABLED`, which the CMake files previously
defined manually.
2024-03-05 15:37:14 +01:00
Ingo Müller
5f2097dbed [mlir] Expose MLIR_CUDA_CONVERSIONS_ENABLED in mlir-config.h. (#83004)
That macro was not defined in some cases and thus yielded warnings if
compiled with `-Wundef`. In particular, they were not defined in the
BUILD files, so the GPU targets were broken when built with Bazel. This
commit exposes the mentioned CMake variable through mlir-config.h and uses
the macro that is introduced with the same name. This replaces the macro
MLIR_CUDA_CONVERSIONS_ENABLED, which the CMake files previously defined
manually.
2024-02-28 14:48:40 +01:00
Matthias Springer
492e8ba038 [mlir] Fix memory leaks after #81759 (#82762)
This commit fixes memory leaks that were introduced by #81759. The way
ops and blocks are erased changed slightly.

The leaks were caused by an incorrect implementation of op builders:
blocks must be created with the supplied builder object. Otherwise, they
are not properly tracked by the dialect conversion and can leak during
rollback.
2024-02-23 14:28:57 +01:00
Fabian Mora
f204aee1b9 [mlir][GPU] Remove the SerializeToCubin pass (#82486)
The `SerializeToCubin` pass was deprecated in September 2023 in favor of
GPU compilation attributes; see the [GPU
compilation](https://mlir.llvm.org/docs/Dialects/GPU/#gpu-compilation)
section in the `gpu` dialect MLIR docs.
This patch removes `SerializeToCubin` from the repo.
2024-02-21 20:47:19 -05:00
Thomas Preud'homme
76e79b0bef Fix duplicate mapping detection in gpu::setMappingAttr() (#77499) 2024-02-20 09:54:00 +00:00
Guray Ozen
d7f59c8fb8 [mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#81489)
This PR moves the lowering of the math dialect later in the pipeline:
the math dialect is lowered correctly by createConvertGpuOpsToNVVMOps
for the GPU target, so that conversion needs to run first.

Reland #78556
2024-02-13 08:31:42 +01:00
Benjamin Kramer
98dbc688de Revert "[mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556)"
This reverts commit 74bf0b1cd9. The test
always fails.

 | mlir/test/Dialect/GPU/test-nvvm-pipeline.mlir:23:16: error: CHECK-PTX: expected string not found in input
 |  // CHECK-PTX: __nv_expf

https://lab.llvm.org/buildbot/#/builders/61/builds/53789
2024-01-31 17:41:21 +01:00
Guray Ozen
74bf0b1cd9 [mlir] Lower math dialect later in gpu-lower-to-nvvm-pipeline (#78556)
This PR moves the lowering of the math dialect later in the pipeline:
the math dialect is lowered correctly by `createConvertGpuOpsToNVVMOps`
for the GPU target, so that conversion needs to run first.
2024-01-31 15:24:32 +01:00