Commit Graph

11030 Commits

Author SHA1 Message Date
xiaoleis-nv
d03f35f9b6 [MLIR][NVVM] Fix the datatype error for nvvm.mma.sync when the operand is bf16 (#122664)
The PR fixes the datatype error for `nvvm.mma.sync` when the operand is
`bf16`. This operation originally required the A/B type to be `f16x2`
for the `bf16` MMA. However, this violates the NVVM intrinsic definition
[here](372044ee09/llvm/include/llvm/IR/IntrinsicsNVVM.td (L119)),
where the A/B operand type should be `i32`. This is a bug, and there are
no tests in MLIR that cover this datatype.

```
    // mma bf16 -> s32 @ m16n8k16/m16n8k8
    !eq(gft,"m16n8k16:a:bf16") : !listsplat(llvm_i32_ty, 4),
    !eq(gft,"m16n8k16:b:bf16") : !listsplat(llvm_i32_ty, 2),
    !eq(gft,"m16n8k8:a:bf16") : !listsplat(llvm_i32_ty, 2),
    !eq(gft,"m16n8k8:b:bf16") : [llvm_i32_ty],
```

This PR addresses this bug and adds tests to guarantee correctness.

Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>
2025-01-13 15:03:05 +05:30
Kazu Hirata
4f4e2abb1a [mlir] Migrate away from PointerUnion::{is,get} (NFC) (#122591)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
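
For reference, a minimal sketch of what the migration looks like (illustrative example, not code from the patch):

```c++
#include "llvm/ADT/PointerUnion.h"

int readThroughUnion(llvm::PointerUnion<int *, double *> u) {
  // Before (soft-deprecated members):
  //   if (u.is<int *>())
  //     return *u.get<int *>();
  // After (free-standing casting functions):
  if (llvm::isa<int *>(u))
    return *llvm::cast<int *>(u);
  return static_cast<int>(*llvm::cast<double *>(u));
}
```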
2025-01-11 13:16:43 -08:00
William Moses
38fcf62483 [MLIR] Import LLVM add flag to disable loadalldialects (#122574)
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
2025-01-11 09:11:22 -05:00
Kazu Hirata
35e89897a4 [Dialect] Migrate away from PointerUnion::{is,get} (NFC) (#122568)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>
2025-01-11 02:06:33 -08:00
Matthias Springer
5d26a6d759 [mlir][Interfaces] ViewLikeOpInterface: Remove parser/printer overloads (#122436)
#115808 adds additional `custom<>` parser/printer variants. The overall
list of overloads/variants is getting larger.

This commit removes overloads that are not needed, to keep the
parser/printer simple.
2025-01-10 17:18:53 +01:00
Guray Ozen
66e41a1a20 [MLIR][NVVM] Declare InferIntRangeInterface for RangeableRegisterOp (#122263) 2025-01-10 10:32:25 +01:00
Krzysztof Drewniak
0aa831e0ed [mlir][GPU] Implement ValueBoundsOpInterface for GPU ID operations (#122190)
The GPU ID operations already implement InferIntRangeInterface, which
gives constant lower and upper bounds on those IDs when appropriate
metadata is present on the operations or in the surrounding context.

This commit uses that existing code to implement the
ValueBoundsOpInterface, which is used when analyzing affine operations
(unlike the integer range interface, which is used for arithmetic
optimization).

It also implements the interface for gpu.launch, where we can use it to
express the constraint that block/grid sizes are equal to their value
from outside the launch op and that the corresponding IDs are bounded
above by that size.

As a consequence, the test pass for this inference is updated to work on
a FunctionOpInterface and not a func.func, creating minor churn in other
tests.
2025-01-09 11:42:22 -08:00
Razvan Lupusoru
cbcb7ad32e [mlir][acc] Introduce MappableType interface (#122146)
OpenACC data clause operations previously required that the variable
operand implemented PointerLikeType interface. This was a reasonable
constraint because the dialects currently mixed with `acc` do use
pointers to represent variables. However, this forces the "pointer"
abstraction to be exposed too early and some cases are not cleanly
representable through this approach (more specifically FIR's `fir.box`
abstraction).

Thus, relax this by allowing a variable to be a type which implements
either `PointerLikeType` interface or `MappableType` interface.
2025-01-09 10:27:37 -08:00
Andrea Faulds
7724be9728 [mlir][spirv] Do SPIR-V serialization in -test-vulkan-runner-pipeline (#121494)
This commit is a further incremental step toward moving the whole
mlir-vulkan-runner MLIR pass pipeline into mlir-opt (see #73457). The
previous step was b225b3adf7b78387c9fcb97a3ff0e0a1e26eafe2, which moved
all device passes prior to SPIR-V serialization into a new mlir-opt test
pass, `-test-vulkan-runner-pipeline`.

This commit changes how SPIR-V serialization is accomplished for Vulkan
runner tests. Until now, this was done by the Vulkan-specific
ConvertGpuLaunchFuncToVulkanLaunchFunc pass. With this commit, this
responsibility is removed from that pass, and is instead done with the
existing generic GpuModuleToBinaryPass. In addition, the SPIR-V
serialization step is no longer done inside mlir-vulkan-runner, but
rather inside mlir-opt (in the `-test-vulkan-runner-pipeline` pass).
Both of these changes represent a greater alignment between
mlir-vulkan-runner and the other GPU integration tests. Notably, the IR
shapes produced by the mlir-opt pipelines for the Vulkan and SYCL
runners are now much more similar, with both using a gpu.binary op for
the serialized SPIR-V kernel.

In order to enable this, this commit includes these supporting changes:

- ConvertToSPIRVPass is enhanced to support producing the IR shape where
a spirv.module is nested inside a gpu.module, since this is what
GpuModuleToBinaryPass expects.
- ConvertGPULaunchFuncToVulkanLaunchFunc is changed to remove its SPIR-V
serialization functionality, and instead now extracts the SPIR-V from a
gpu.binary operation (as produced by ConvertToSPIRVPass).
- `-test-vulkan-runner-pipeline` now attaches SPIR-V target information
required by GpuModuleToBinaryPass.
- The WebGPU pass option, which had been removed from mlir-vulkan-runner
in the previous commit in this series, is restored as an option to
`-test-vulkan-runner-pipeline` instead, so that the WebGPU pass
continues being inserted into the pipeline just before SPIR-V
serialization.
2025-01-09 17:58:51 +01:00
Arda Unal
b3ce6dc723 [mlir][licm] Make scf.if recursively speculatable (#122031)
This change:

-  makes **scf.if** recursively speculatable like **affine.if** is. 

- also introduces related LICM tests for both **scf.if** and
**affine.if**
2025-01-08 09:54:18 -08:00
Matthias Springer
4751f47c7a [mlir][Transforms] Dialect conversion: Turn LLVM_DEPRECATED into comments (#122073)
Some functions of the deprecated 1:N dialect conversion were marked as
`LLVM_DEPRECATED`. This caused compilation warnings because there are
still test cases of the 1:N dialect conversion framework. (These test
cases will be deleted at the same time as the 1:N driver.)
2025-01-08 17:10:06 +01:00
William Moses
1c067a513c [MLIR] Enable import of non self referential alias scopes (#121987)
Fixes #121965.

---------

Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
Co-authored-by: Alex Zinenko <git@ozinenko.com>
2025-01-08 13:40:05 +01:00
Jack Frankland
360a03c980 [mlir][tosa] Add acc_type to Tosa-v1.0 Conv Ops (#121466)
Tosa v1.0 adds accumulator type attributes to the various convolution
operations defined in the spec. Update the dialect and any lit tests to
include these attributes.

Signed-off-by: Tai Ly <tai.ly@arm.com>
Co-authored-by: Tai Ly <tai.ly@arm.com>
2025-01-08 12:12:26 +02:00
Longsheng Mou
c1d01b2fc2 [mlir][tosa] Add missing verifier for tosa.pad (#120934)
This PR adds a missing verifier for `tosa.pad`, ensuring that the
padding shape matches [2 * rank(shape1)], as required by the TOSA v1.0.0
specification. Fixes #119840.
2025-01-08 10:45:59 +02:00
Guray Ozen
f50f9698ad [MLIR][GPU] Fix gpu.printf (#121940) 2025-01-08 08:25:57 +01:00
Michael Jungmair
1fb98b5a7e [mlir][Transforms] Make LocationSnapshotPass respect OpPrintingFlags (#119373)
The current implementation of LocationSnapshotPass takes an
OpPrintingFlags argument and stores it as a member, but does not use it
for printing.

Properly implement the printing flags, also supporting command line args.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2025-01-07 12:14:35 +01:00
William Moses
5656cbca52 [MLIR][CAPI] export LLVMFunctionType param getter and setters (#121888) 2025-01-07 02:39:44 -05:00
Ian Wood
fe42e63d7b [mlir][NFC] Refactor eraseState to take constant time (#121670)
Refactors `analysisStates` to use two nested maps. This prevents
`eraseState` from having to scan through every analysis state which can
be costly when there are many analysis states and/or `eraseState` is
called frequently.
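
A simplified sketch of the idea (the types here are stand-ins, not the actual MLIR ones): keying the outer map by the anchor makes erasing all states for one anchor a single map operation instead of a scan over every state.

```c++
#include <map>
#include <memory>
#include <string>

// Stand-in types for illustration only.
struct AnalysisState {};
using Anchor = const void *;
using StateKind = std::string;

// Nested maps: anchor -> (state kind -> state). Erasing every state attached
// to one anchor is now a single erase on the outer map.
std::map<Anchor, std::map<StateKind, std::unique_ptr<AnalysisState>>>
    analysisStates;

void eraseState(Anchor anchor) { analysisStates.erase(anchor); }
```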

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-01-06 10:05:14 -08:00
Maksim Levental
0c1cf75300 [mlir] DCE RegisteredOperationName::parseAssembly decl (#121730) 2025-01-06 07:12:59 -05:00
Maksim Levental
9ce8f4b70b [mlir] DCE friend Dialect::registerDialect (#121728) 2025-01-06 07:12:07 -05:00
Matthias Springer
599c739905 [mlir][GPU] Add NVVM-specific cf.assert lowering (#120431)
This commit adds an NVIDIA-specific lowering of `cf.assert` to
`__assertfail`.

Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and
`getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can
be reused.
2025-01-06 12:00:11 +01:00
William Moses
b5f21671ef MLIR: Enable importing inlineasm calls (#121624) 2025-01-05 11:02:49 -05:00
Matthias Springer
95c5c5d4ba [mlir][Transforms][NFC] Use DominanceInfo to compute materialization insertion point (#120746)
In the dialect conversion driver, use `DominanceInfo` to compute a
suitable insertion point for N:1 source materializations.
2025-01-04 09:23:15 +01:00
Matthias Springer
2d424765f4 [mlir][IR][NFC] DominanceInfo: Share same impl for block/op dominance (#115587)
The `properlyDominates` implementations for blocks and ops are very
similar. This commit replaces them with a single implementation that
operates on block iterators. That implementation can be used to
implement both `properlyDominates` variants.

Before:
```c++
template <bool IsPostDom>
bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl(Block *a,
                                                         Block *b) const;
template <bool IsPostDom>
bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl(
    Operation *a, Operation *b, bool enclosingOpOk) const;
```

After:
```c++
template <bool IsPostDom>
bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl(
    Block *aBlock, Block::iterator aIt, Block *bBlock, Block::iterator bIt,
    bool enclosingOk) const;
```

Note: A subsequent commit will add a new public `properlyDominates`
overload that accepts block iterators. That functionality can then be
used to find a valid insertion point at which a range of values is
defined (by utilizing post dominance).
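
For illustration, a sketch (not the actual patch) of how the op-based query maps onto the iterator-based implementation:

```c++
// Illustrative sketch only: the op-based variant can be phrased in terms of
// the iterator-based implementation by converting each operation to its
// position within its parent block. The real code additionally handles ops
// in different regions (the `enclosingOpOk` case).
template <bool IsPostDom>
bool DominanceInfoBase<IsPostDom>::properlyDominatesImpl(
    Operation *a, Operation *b, bool enclosingOpOk) const {
  return properlyDominatesImpl(a->getBlock(), a->getIterator(),
                               b->getBlock(), b->getIterator(),
                               /*enclosingOk=*/enclosingOpOk);
}
```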
2025-01-04 09:12:03 +01:00
Krzysztof Drewniak
9f5cefebb4 [mlir][Affine] Generalize the linearize(delinearize()) simplifications (#117637)
The existing canonicalization patterns would only cancel out cases where
the entire result list of an affine.delinearize_index was passed to an
affine.linearize_index and the basis elements matched exactly (except
possibly for the outer bounds).

This was correct, but limited, and left open many cases where a
delinearize_index would take a series of divisions and modulos only for
a subsequent linearize_index to use additions and multiplications to
undo all that work.

This sort of simplification is reasonably easy to observe at the level of
splitting and merging indexes, but difficult to perform once the
underlying arithmetic operations have been created.

Therefore, this commit generalizes the existing simplification logic.

Now, any run of two or more delinearize_index results that appears
within the argument list of a linearize_index operation with the same
basis (or where they're both at the outermost position and so can be
unbounded, or when `linearize_index disjoint` implies a bound not present
on the `delinearize_index`) will be reduced to a single
delinearize_index output, whose basis element (that is, size or length)
is equal to the product of the sizes that were simplified away.

That is, we can now simplify

    %0:2 = affine.delinearize_index %n into (8, 8) : index, index
    %1 = affine.linearize_index [%x, %0#0, %0#1, %y] by (3, 8, 8, 5) : index

to the simpler

    %1 = affine.linearize_index [%x, %n, %y] by (3, 64, 5) : index

This new pattern also works with dynamically-sized basis values.

While I'm here, I fixed a bunch of typos in existing tests, and added a
new getPaddedBasis() method to make processing
potentially-underspecified basis elements simpler in some cases.
2025-01-03 15:12:39 -06:00
Matthias Springer
3ace685105 [mlir][Transforms] Support 1:N mappings in ConversionValueMapping (#116524)
This commit updates the internal `ConversionValueMapping` data structure
in the dialect conversion driver to support 1:N replacements. This is
the last major commit for adding 1:N support to the dialect conversion
driver.

Since #116470, the infrastructure already supports 1:N replacements. But
the `ConversionValueMapping` still stored 1:1 value mappings. To that
end, the driver inserted temporary argument materializations (converting
N SSA values into 1 value). This is no longer the case. Argument
materializations are now entirely gone. (They will be deleted from the
type converter after some time, when we delete the old 1:N dialect
conversion driver.)

Note for LLVM integration: Replace all occurrences of
`addArgumentMaterialization` (except for 1:N dialect conversion passes)
with `addSourceMaterialization`.
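
As a sketch of that note (the callback body shown is just the common unrealized-cast fallback, and the exact materialization callback signature may differ between LLVM versions):

```c++
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Sketch only: register a source materialization where an argument
// materialization used to be registered (outside of 1:N dialect conversion
// passes). The callback body is the usual unrealized-cast fallback.
static void registerMaterializations(TypeConverter &converter) {
  converter.addSourceMaterialization(
      [](OpBuilder &builder, Type resultType, ValueRange inputs,
         Location loc) -> Value {
        return builder
            .create<UnrealizedConversionCastOp>(loc, resultType, inputs)
            .getResult(0);
      });
}
```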

---------

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
2025-01-03 16:11:56 +01:00
hatoo
cbff02b101 [mlir][emitc] Fix invalid syntax in example of emitc.return (#121112)
A return type of `emitc.func` must be specified with `->` instead of
`:`. I've verified the syntax using `mlir-translate --mlir-to-cpp`.
2025-01-02 18:13:27 +01:00
josel-amd
d622b66a82 Re-introduce Type Conversion on EmitC (#121476)
This PR reintroduces https://github.com/llvm/llvm-project/pull/118940
with a fix for the build issues on
cd9caf3aeed55280537052227f08bb1b41154efd
2025-01-02 14:58:15 +01:00
Marius Brehler
8178e72188 [mlir][func] Fix return op example (#121470)
Similar to #121112.
2025-01-02 14:02:07 +01:00
Matthias Gehre
df728cf1d7 Revert "[MLIR][SCFToEmitC] Convert types while converting from SCF to EmitC (#118940)"
This reverts commit 450c6b02d2.
2025-01-02 11:55:35 +01:00
josel-amd
450c6b02d2 [MLIR][SCFToEmitC] Convert types while converting from SCF to EmitC (#118940)
Switch from rewrite patterns to conversion patterns. This allows type
conversions to be performed together with the conversion of other parts of
the IR. For example, this allows converting from `index` to `emitc.size_t`
types.
2025-01-02 11:36:23 +01:00
Matthias Springer
80ecbaa3c0 [mlir][Transforms] Mark 1:N conversion driver as deprecated (#121102)
The 1:N conversion driver will be removed soon.

Note for LLVM integration: Please migrate your code base to the regular dialect conversion driver.
2024-12-31 13:11:19 +01:00
Maksim Levental
fb365ac86c [mlir][linalg] DCE unimplemented extra decl (#121272) 2024-12-30 13:51:11 -06:00
Amir Bishara
d9111f19d2 [mlir][bufferization]-Refactor findValueInReverseUseDefChain to accept opOperand (#121304)
Edit the `findValueInReverseUseDefChain` method to accept an `OpOperand`
instead of a `Value`. This change makes sure that the populated
`visitedOpOperands` argument is fully accurate and contains the OpOperand
from which the reverse chain was started.
2024-12-30 21:18:38 +02:00
Longsheng Mou
79af7bdd4e [mlir][tosa] Add AllElementTypesMatch trait for tosa.transpose (#120964)
This PR adds the `AllElementTypesMatch` trait to `tosa.transpose` to ensure
the output tensor has the same element type as the input tensor. Fixes #119364.
2024-12-30 23:12:55 +08:00
Maksim Levental
60d20603e4 [mlir][xegpu] DCE decl in TD (#121249) 2024-12-30 06:21:07 -05:00
Maksim Levental
8487d2460e [mlir][shape] DCE unimplemented extra decl (#121275) 2024-12-29 12:13:46 -05:00
Maksim Levental
f1bc3afb6c [mlir][scf] DCE unimplemented decls in TDs (#121237)
More dead code in headers...
2024-12-28 14:53:05 -06:00
Maksim Levental
4a6fcc17c6 [mlir][emitc] DCE unimplemented decls (#121253) 2024-12-28 13:42:16 -05:00
Amir Bishara
7e749d4fb7 [mlir][bufferization]-Add ControlBuildSubsetExtractionFn to TensorEmptyElimination (#120851)
This PR adds a `ControlBuildSubsetExtractionFn` to the tensor empty
elimination util. It controls how the subset extractions of the
`SubsetInsertionOpInterface` are built.

The control function returns the subset extraction value that will
replace the `emptyTensorOp` use consumed by a specific user (which the
util expects to eliminate).

The default control function preserves today's behavior without any
additional changes.
2024-12-28 13:28:09 +02:00
Kunwar Grover
91bbebc7e1 [mlir][scf] Add getPartialResultTilePosition to PartialReductionOpInterface (#120465)
This PR adds a new interface method to PartialReductionOpInterface which
allows it to query the result tile position for the partial result.
Previously, tiling the reduction dimension with
SplitReductionOuterReduction when the result has transposed parallel
dimensions would produce wrong results.

Other fixes that were needed to make this PR work:

- Instead of ad-hoc logic to decide where to place the new reduction
dimensions in the partial result based on the iteration space, the
reduction dimensions are always appended to the partial result tensor.
- Remove usage of PartialReductionOpInterface in Mesh dialect. The
implementation was trying to just get a neutral element, but ended up
trying to use PartialReductionOpInterface for it, which is not right. It
was also passing the wrong sizes to it.
2024-12-27 16:52:34 +00:00
Kunwar Grover
5ad4213ef4 [mlir][Linalg] Allow PartialReductionOpInterface ops in tile_reduction_using_for (#120118)
The API used internally expects PartialReductionOpInterface. This patch
allows any operation implementing this interface to use this transform
op (instead of just LinalgOp).
2024-12-27 13:19:58 +00:00
Maksim Levental
6b53a9546c [mlir][arith] DCE getPredicateByName (#121165) 2024-12-26 17:38:18 -08:00
Oleksandr "Alex" Zinenko
776ac21c7f [mlir] minor documentation fix in GPUTransformOps.td (#121157)
- do not refer to handles as `PDLOperation`; this is an outdated and
incorrect view of what they are, based on the type used in the early
days;
 - use backticks around inline code.
2024-12-26 20:18:35 +01:00
srcarroll
8906b7be91 Enable custom alloc-like ops in promoteBufferResultsToOutParams (#120288)
In `buffer-results-to-out-params`, when the `hoist-static-allocs` option is
enabled, the pass looked only for `memref.alloc`s when attempting to avoid
copies, which makes it not extensible to external ops that have
allocation-like properties. This patch simply changes the check from
`memref::AllocOp` to `AllocationOpInterface` to enable any allocation op.
Moreover, for function call updates, we enable setting an allocation
function callback in `BufferResultsToOutParamsOpts` to allow users to
emit their own alloc-like op.
2024-12-26 11:32:51 -06:00
Thirumalai Shaktivel
cbe583b0bd [Flang] Add translation support for MutexInOutSet and InOutSet (#120715)
Implementation details:
Both Mutexinoutset and Inoutset are recognized as flags 0x4
and 0x8 respectively; the flag is set in `kmp_depend_info` and
passed as an argument to the `__kmpc_omp_task_with_deps` runtime call.
2024-12-26 15:02:09 +05:30
Krzysztof Drewniak
378e179337 [mlir][Properties] Shorten "Property" to "Prop" in most places (#120368)
Since the property system isn't currently in heavy use, it's probably
the right time to fix a choice I made when expanding ODS property
support.

Specifically, most of the property subclasses, like OptionalProperty or
IntProperty, wrote out the word "Property" in full. The corresponding
classes in the Attribute hierarchy use the short form "Attr" in those
cases, as in OptionalAttr or DefaultValuedAttr.

This commit changes all those uses of "Property" to "Prop" in order to
prevent excessively verbose tablegen files that needlessly repeat the
full name of a core concept that can be abbreviated.

So, this commit renames all the FooProperty classes to FooProp, and
keeps the existing names as alias with a Deprecated<> on them to warn
people.

In addition, this commit updates the documentation around properties to
mention the constraint support.
2024-12-23 15:57:34 -06:00
Hongren Zheng
a60050cf19 [mlir][dataflow] Allow re-run all analyses in DataFlowSolver (#120881)
In downstream (check https://github.com/google/heir/pull/1228,
especially [this
commit](fbf0b2733f);
also check https://github.com/google/heir/pull/1154) we often need to
re-run the analysis during the transformation pass, as the IR gets changed
based on the analysis result and the analysis continuously gets invalidated.

There are solutions for this, like `getOrCreateState` for a newly created
`Value` (`AnchorT`), but the caveat is that the new state does not
propagate. This is quite unexpected, as users of the analysis would expect
it to propagate. We downstream used to use `solver->propagateIfChanged`, but
that turned out to be not working, see detailed writeup in
https://github.com/google/heir/issues/1153.

Just calling `initializeAndRun` repeatedly also does not solve the problem,
as `analysisStates` is not cleared, and the monotonicity of
`AnalysisState` makes the analysis invalid because `join` no longer works
as expected (the first join is no longer `join(uninitialized, init
value)`; instead it becomes `join(higher value, init value)`).

To correctly re-run the analysis, either a new `DataFlowSolver` is
created, or we can just clear the `analysisState`.
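
A minimal sketch of the first option, re-running with a fresh solver (the analyses loaded here are just examples):

```c++
#include "mlir/Analysis/DataFlow/ConstantPropagationAnalysis.h"
#include "mlir/Analysis/DataFlow/DeadCodeAnalysis.h"
#include "mlir/Analysis/DataFlowFramework.h"

using namespace mlir;

// Sketch: after the IR has been transformed, build a fresh solver and run
// the analyses again instead of reusing stale states.
static LogicalResult rerunAnalyses(Operation *top) {
  DataFlowSolver solver;
  solver.load<dataflow::DeadCodeAnalysis>();
  solver.load<dataflow::SparseConstantPropagation>();
  return solver.initializeAndRun(top);
}
```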
2024-12-23 12:33:23 -08:00
Matthias Springer
df31fd8a36 [mlir] Fix use-after-return in #117513 (#120968)
Fix a use-after-return in #117513. Free-standing lambdas should not be
defined inside of the `LLVMTypeConverter` constructor because they go
out of scope.
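
For context, a generic illustration of this bug class (not the actual code from #117513): a non-owning callable reference bound to a constructor-local lambda dangles once the constructor returns.

```c++
#include "llvm/ADT/STLFunctionalExtras.h"

struct Converter {
  Converter() {
    // A free-standing lambda local to the constructor...
    auto increment = [](int x) { return x + 1; };
    // ...bound to a non-owning llvm::function_ref that outlives it.
    callback = increment; // dangles after the constructor returns
  }
  llvm::function_ref<int(int)> callback;
};
```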
2024-12-23 15:13:42 +01:00
Srinivasa Ravi
5f98dd5dd5 [MLIR][NVVM] Update Wgmma.fence Ops to use intrinsics (#120956)
This PR updates the WgmmaFenceAlignedOp, WgmmaGroupSyncAlignedOp, and
WgmmaWaitGroupSyncOp Ops in the NVVM Dialect to lower to the
corresponding intrinsics instead of inline-ptx.

The existing test under Conversion/NVVMToLLVM is updated to check for
the new patterns and separate tests are added under Target/LLVMIR to
verify the lowered intrinsics.
2024-12-23 18:56:48 +05:30