This tutorial gives an introduction to the `mlir-opt` tool, focusing on how to run basic passes with and without options, run pass pipelines from the CLI, and point out particularly useful flags. --------- Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com> Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
295 lines
9.4 KiB
Markdown
295 lines
9.4 KiB
Markdown
# Using `mlir-opt`
|
|
|
|
`mlir-opt` is a command-line entry point for running passes and lowerings on MLIR code.
|
|
This tutorial will explain how to use `mlir-opt`, show some examples of its usage,
|
|
and mention some useful tips for working with it.
|
|
|
|
Prerequisites:
|
|
|
|
- [Building MLIR from source](/getting_started/)
|
|
- [MLIR Language Reference](/docs/LangRef/)
|
|
|
|
[TOC]
|
|
|
|
## `mlir-opt` basics
|
|
|
|
The `mlir-opt` tool loads a textual IR or bytecode into an in-memory structure,
|
|
and optionally executes a sequence of passes
|
|
before serializing back the IR (textual form by default).
|
|
It is intended as a testing and debugging utility.
|
|
|
|
After building the MLIR project,
|
|
the `mlir-opt` binary (located in `build/bin`)
|
|
is the entry point for running passes and lowerings,
|
|
as well as emitting debug and diagnostic data.
|
|
|
|
Running `mlir-opt` with no flags will consume textual or bytecode IR
|
|
from the standard input, parse and run verifiers on it,
|
|
and write the textual format back to the standard output.
|
|
This is a good way to test if an input MLIR is well-formed.
|
|
|
|
`mlir-opt --help` shows a complete list of flags
|
|
(there are nearly 1000).
|
|
Each pass has its own flag,
|
|
though it is recommended to use `--pass-pipeline`
|
|
to run passes rather than bare flags.
|
|
|
|
## Running a pass
|
|
|
|
Next we run [`convert-to-llvm`](/docs/Passes/#-convert-to-llvm),
|
|
which converts all supported dialects to the `llvm` dialect,
|
|
on the following IR:
|
|
|
|
```mlir
|
|
// mlir/test/Examples/mlir-opt/ctlz.mlir
|
|
module {
|
|
func.func @main(%arg0: i32) -> i32 {
|
|
%0 = math.ctlz %arg0 : i32
|
|
func.return %0 : i32
|
|
}
|
|
}
|
|
```
|
|
|
|
After building MLIR, and from the `llvm-project` base directory, run
|
|
|
|
```bash
|
|
build/bin/mlir-opt --pass-pipeline="builtin.module(convert-math-to-llvm)" mlir/test/Examples/mlir-opt/ctlz.mlir
|
|
```
|
|
|
|
which produces
|
|
|
|
```mlir
|
|
module {
|
|
func.func @main(%arg0: i32) -> i32 {
|
|
%0 = "llvm.intr.ctlz"(%arg0) <{is_zero_poison = false}> : (i32) -> i32
|
|
return %0 : i32
|
|
}
|
|
}
|
|
```
|
|
|
|
Note that `llvm` here is MLIR's `llvm` dialect,
|
|
which would still need to be processed through `mlir-translate`
|
|
to generate LLVM-IR.
|
|
|
|
## Running a pass with options
|
|
|
|
Next we will show how to run a pass that takes configuration options.
|
|
Consider the following IR containing loops with poor cache locality.
|
|
|
|
```mlir
|
|
// mlir/test/Examples/mlir-opt/loop_fusion.mlir
|
|
module {
|
|
func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
|
|
%0 = memref.alloc() : memref<10xf32>
|
|
%1 = memref.alloc() : memref<10xf32>
|
|
%cst = arith.constant 0.000000e+00 : f32
|
|
affine.for %arg2 = 0 to 10 {
|
|
affine.store %cst, %0[%arg2] : memref<10xf32>
|
|
affine.store %cst, %1[%arg2] : memref<10xf32>
|
|
}
|
|
affine.for %arg2 = 0 to 10 {
|
|
%2 = affine.load %0[%arg2] : memref<10xf32>
|
|
%3 = arith.addf %2, %2 : f32
|
|
affine.store %3, %arg0[%arg2] : memref<10xf32>
|
|
}
|
|
affine.for %arg2 = 0 to 10 {
|
|
%2 = affine.load %1[%arg2] : memref<10xf32>
|
|
%3 = arith.mulf %2, %2 : f32
|
|
affine.store %3, %arg1[%arg2] : memref<10xf32>
|
|
}
|
|
return
|
|
}
|
|
}
|
|
```
|
|
|
|
Running this with the [`affine-loop-fusion`](/docs/Passes/#-affine-loop-fusion) pass
|
|
produces a fused loop.
|
|
|
|
```bash
|
|
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion)" mlir/test/Examples/mlir-opt/loop_fusion.mlir
|
|
```
|
|
|
|
```mlir
|
|
module {
|
|
func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
|
|
%alloc = memref.alloc() : memref<1xf32>
|
|
%alloc_0 = memref.alloc() : memref<1xf32>
|
|
%cst = arith.constant 0.000000e+00 : f32
|
|
affine.for %arg2 = 0 to 10 {
|
|
affine.store %cst, %alloc[0] : memref<1xf32>
|
|
affine.store %cst, %alloc_0[0] : memref<1xf32>
|
|
%0 = affine.load %alloc_0[0] : memref<1xf32>
|
|
%1 = arith.mulf %0, %0 : f32
|
|
affine.store %1, %arg1[%arg2] : memref<10xf32>
|
|
%2 = affine.load %alloc[0] : memref<1xf32>
|
|
%3 = arith.addf %2, %2 : f32
|
|
affine.store %3, %arg0[%arg2] : memref<10xf32>
|
|
}
|
|
return
|
|
}
|
|
}
|
|
```
|
|
|
|
This pass has options that allow the user to configure its behavior.
|
|
For example, the `fusion-compute-tolerance` option
|
|
is described as the "fractional increase in additional computation tolerated while fusing."
|
|
If this value is set to zero on the command line,
|
|
the pass will not fuse the loops.
|
|
|
|
```bash
|
|
build/bin/mlir-opt --pass-pipeline="builtin.module(affine-loop-fusion{fusion-compute-tolerance=0})" \
|
|
mlir/test/Examples/mlir-opt/loop_fusion.mlir
|
|
```
|
|
|
|
```mlir
|
|
module {
|
|
func.func @producer_consumer_fusion(%arg0: memref<10xf32>, %arg1: memref<10xf32>) {
|
|
%alloc = memref.alloc() : memref<10xf32>
|
|
%alloc_0 = memref.alloc() : memref<10xf32>
|
|
%cst = arith.constant 0.000000e+00 : f32
|
|
affine.for %arg2 = 0 to 10 {
|
|
affine.store %cst, %alloc[%arg2] : memref<10xf32>
|
|
affine.store %cst, %alloc_0[%arg2] : memref<10xf32>
|
|
}
|
|
affine.for %arg2 = 0 to 10 {
|
|
%0 = affine.load %alloc[%arg2] : memref<10xf32>
|
|
%1 = arith.addf %0, %0 : f32
|
|
affine.store %1, %arg0[%arg2] : memref<10xf32>
|
|
}
|
|
affine.for %arg2 = 0 to 10 {
|
|
%0 = affine.load %alloc_0[%arg2] : memref<10xf32>
|
|
%1 = arith.mulf %0, %0 : f32
|
|
affine.store %1, %arg1[%arg2] : memref<10xf32>
|
|
}
|
|
return
|
|
}
|
|
}
|
|
```
|
|
|
|
Options passed to a pass
|
|
are specified via the syntax `{option1=value1 option2=value2 ...}`,
|
|
i.e., use space-separated `key=value` pairs for each option.
|
|
|
|
## Building a pass pipeline on the command line
|
|
|
|
The `--pass-pipeline` flag supports combining multiple passes into a pipeline.
|
|
So far we have used the trivial pipeline with a single pass
|
|
that is "anchored" on the top-level `builtin.module` op.
|
|
[Pass anchoring](/docs/PassManagement/#oppassmanager)
|
|
is a way for passes to specify
|
|
that they only run on particular ops.
|
|
While many passes are anchored on `builtin.module`,
|
|
if you try to run a pass that is anchored on some other op
|
|
inside `--pass-pipeline="builtin.module(pass-name)"`,
|
|
it will not run.
|
|
|
|
Multiple passes can be chained together
|
|
by providing the pass names in a comma-separated list
|
|
in the `--pass-pipeline` string,
|
|
e.g.,
|
|
`--pass-pipeline="builtin.module(pass1,pass2)"`.
|
|
The passes will be run sequentially.
|
|
|
|
To use passes that have nontrivial anchoring,
|
|
the appropriate level of nesting must be specified
|
|
in the pass pipeline.
|
|
For example, consider the following IR which has the same redundant code,
|
|
but in two different levels of nesting.
|
|
|
|
```mlir
|
|
module {
|
|
module {
|
|
func.func @func1(%arg0: i32) -> i32 {
|
|
%0 = arith.addi %arg0, %arg0 : i32
|
|
%1 = arith.addi %arg0, %arg0 : i32
|
|
%2 = arith.addi %0, %1 : i32
|
|
func.return %2 : i32
|
|
}
|
|
}
|
|
|
|
gpu.module @gpu_module {
|
|
gpu.func @func2(%arg0: i32) -> i32 {
|
|
%0 = arith.addi %arg0, %arg0 : i32
|
|
%1 = arith.addi %arg0, %arg0 : i32
|
|
%2 = arith.addi %0, %1 : i32
|
|
gpu.return %2 : i32
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
The following pipeline runs `cse` (common subexpression elimination)
|
|
but only on the `func.func` inside the two `builtin.module` ops.
|
|
|
|
```bash
|
|
build/bin/mlir-opt mlir/test/Examples/mlir-opt/ctlz.mlir --pass-pipeline='
|
|
builtin.module(
|
|
builtin.module(
|
|
func.func(cse,canonicalize),
|
|
convert-to-llvm
|
|
)
|
|
)'
|
|
```
|
|
|
|
The output leaves the `gpu.module` alone
|
|
|
|
```mlir
|
|
module {
|
|
module {
|
|
llvm.func @func1(%arg0: i32) -> i32 {
|
|
%0 = llvm.add %arg0, %arg0 : i32
|
|
%1 = llvm.add %0, %0 : i32
|
|
llvm.return %1 : i32
|
|
}
|
|
}
|
|
gpu.module @gpu_module {
|
|
gpu.func @func2(%arg0: i32) -> i32 {
|
|
%0 = arith.addi %arg0, %arg0 : i32
|
|
%1 = arith.addi %arg0, %arg0 : i32
|
|
%2 = arith.addi %0, %1 : i32
|
|
gpu.return %2 : i32
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
Specifying a pass pipeline with nested anchoring
|
|
is also beneficial for performance reasons:
|
|
passes with anchoring can run on IR subsets in parallel,
|
|
which provides better threaded runtime and cache locality
|
|
within threads.
|
|
For example,
|
|
even if a pass is not restricted to anchor on `func.func`,
|
|
running `builtin.module(func.func(cse, canonicalize))`
|
|
is more efficient than `builtin.module(cse, canonicalize)`.
|
|
|
|
For a spec of the pass-pipeline textual description language,
|
|
see [the docs](/docs/PassManagement/#textual-pass-pipeline-specification).
|
|
For more general information on pass management, see [Pass Infrastructure](/docs/PassManagement/#).
|
|
|
|
## Useful CLI flags
|
|
|
|
- `--debug` prints all debug information produced by `LLVM_DEBUG` calls.
|
|
- `--debug-only="my-tag"` prints only the debug information produced by `LLVM_DEBUG`
|
|
in files that have the macro `#define DEBUG_TYPE "my-tag"`.
|
|
This often allows you to print only debug information associated with a specific pass.
|
|
- `"greedy-rewriter"` only prints debug information
|
|
for patterns applied with the greedy rewriter engine.
|
|
- `"dialect-conversion"` only prints debug information
|
|
for the dialect conversion framework.
|
|
- `--emit-bytecode` emits MLIR in the bytecode format.
|
|
- `--mlir-pass-statistics` print statistics about the passes run.
|
|
These are generated via [pass statistics](/docs/PassManagement/#pass-statistics).
|
|
- `--mlir-print-ir-after-all` prints the IR after each pass.
|
|
- See also `--mlir-print-ir-after-change`, `--mlir-print-ir-after-failure`,
|
|
and analogous versions of these flags with `before` instead of `after`.
|
|
- When using `print-ir` flags, adding `--mlir-print-ir-tree-dir` writes the
|
|
IRs to files in a directory tree, making them easier to inspect versus a
|
|
large dump to the terminal.
|
|
- `--mlir-timing` displays execution times of each pass.
|
|
|
|
## Further readering
|
|
|
|
- [List of passes](/docs/Passes/)
|
|
- [List of dialects](/docs/Dialects/)
|