Commit Graph

1887 Commits

Author SHA1 Message Date
Matthias Springer
8c4bc1e75d [mlir][Transforms] Merge 1:1 and 1:N type converters (#113032)
The 1:N type converter derives from the 1:1 type converter and extends
it with 1:N target materializations. This commit merges the two type
converters and stores 1:N target materializations in the 1:1 type
converter. This is in preparation for merging the 1:1 and 1:N dialect
conversion infrastructures.

1:1 target materializations (producing a single `Value`) will remain
valid. An additional API is added to the type converter to register 1:N
target materializations (producing a `SmallVector<Value>`). Internally,
all target materializations are stored as 1:N materializations.

The 1:N type converter is removed.

Note for LLVM integration: If you are using the `OneToNTypeConverter`,
simply switch all occurrences to `TypeConverter`.
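
For illustration, a minimal sketch of registering a 1:N target
materialization on the merged `TypeConverter` (the callback signature
shown here is an assumption based on the description above):

```cpp
// Hypothetical sketch, assuming `using namespace mlir;`. A real converter
// would build dialect-specific IR instead of a builtin cast op.
TypeConverter converter;
converter.addTargetMaterialization(
    [](OpBuilder &builder, TypeRange resultTypes, ValueRange inputs,
       Location loc) -> SmallVector<Value> {
      auto cast = builder.create<UnrealizedConversionCastOp>(loc, resultTypes,
                                                             inputs);
      return SmallVector<Value>(cast.getResults().begin(),
                                cast.getResults().end());
    });
```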

---------

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
2024-10-25 11:44:20 -07:00
Andrea Faulds
9f6c632ecd [mlir][mlir-spirv-cpu-runner] Move MLIR pass pipeline to mlir-opt (#113594)
Adds a new mlir-opt test-only pass, -test-spirv-cpu-runner-pipeline,
which runs the set of MLIR passes needed for the mlir-spirv-cpu-runner,
and removes them from the runner. The tests are changed to invoke
mlir-opt with this flag before running the runner. The eventual goal is
to move all host/device code generation steps out of the runner, like
with some of the other runners.

Recommit of 17e9752267. It was reverted
due to a build failure, but the build failure had in fact already been
fixed in e7302319b5.
2024-10-25 07:21:59 -07:00
Matthias Springer
f18c3e4e73 [mlir][Transforms] Dialect Conversion: Simplify materialization fn result type (#113031)
This commit simplifies the result type of materialization functions.

Previously: `std::optional<Value>`
Now: `Value`

The previous implementation allowed 3 possible return values:
- Non-null value: The materialization function produced a valid
materialization.
- `std::nullopt`: The materialization function failed, but another
materialization can be attempted.
- `Value()`: The materialization failed and so should the dialect
conversion. (Previously: Dialect conversion can roll back.)

This commit removes the last variant. It is not particularly useful
because the dialect conversion will fail anyway if all other
materialization functions produced `std::nullopt`.

Furthermore, in contrast to type conversions, at least one
materialization callback is expected to succeed. In case of a failing
type conversion, the current dialect conversion can roll back and try a
different pattern. This also used to be the case for materializations,
but that functionality was removed with #107109: failed materializations
can no longer trigger a rollback. (They can just make the entire dialect
conversion fail without rollback.) With this in mind, it is even less
useful to have an additional error state for materialization functions.

This commit is in preparation for merging the 1:1 and 1:N type
converters. Target materializations will have to return multiple values
instead of a single one. With this commit, we can keep the API simple:
`SmallVector<Value>` instead of `std::optional<SmallVector<Value>>`.

Note for LLVM integration: All 1:1 materializations should return
`Value` instead of `std::optional<Value>`. Instead of `std::nullopt`
return `Value()`.
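
A minimal sketch of an updated 1:1 target materialization under the new
contract (the body is illustrative; returning `Value()` now plays the role
that `std::nullopt` used to play):

```cpp
// Hypothetical sketch, assuming `using namespace mlir;`.
TypeConverter converter;
converter.addTargetMaterialization(
    [](OpBuilder &builder, Type resultType, ValueRange inputs,
       Location loc) -> Value {
      if (inputs.size() != 1)
        return Value(); // Not handled here; let another materialization try.
      return builder
          .create<UnrealizedConversionCastOp>(loc, resultType, inputs)
          .getResult(0);
    });
```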
2024-10-23 07:29:17 -07:00
lorenzo chelini
34d4f660fe [mlir] Fix the emission of prop-dict when operations have no properties (#112851)
When an operation has no properties, no property struct is emitted. To avoid a compilation error, we should also skip emitting `setPropertiesFromParsedAttr`, `parseProperties` and `printProperties` in such cases.
    
Compilation error:
    
```
    error: ‘Properties’ has not been declared
      static ::llvm::LogicalResult setPropertiesFromParsedAttr(Properties &prop, ::mlir::Attribute attr, ::llvm::function_ref<::mlir::InFlightDiagnostic()> emitError);
    
```
2024-10-21 13:43:55 -07:00
Jakub Kuderski
17e9752267 Revert "[mlir][mlir-spirv-cpu-runner] Move MLIR pass pipeline to mlir-opt" (#113176)
Reverts llvm/llvm-project#111575

This caused build failures:
https://lab.llvm.org/buildbot/#/builders/138/builds/5244
2024-10-21 08:10:22 -07:00
Michael Liao
e7302319b5 [mlir] Fix shared build. NFC 2024-10-21 10:55:17 -04:00
Andrea Faulds
f0312d962d [mlir][mlir-spirv-cpu-runner] Move MLIR pass pipeline to mlir-opt (#111575)
Adds a new mlir-opt test-only pass, -test-spirv-cpu-runner-pipeline,
which runs the set of MLIR passes needed for the mlir-spirv-cpu-runner,
and removes them from the runner. The tests are changed to invoke
mlir-opt with this flag before running the runner. The eventual goal is
to move all host/device code generation steps out of the runner, like
with some of the other runners.
2024-10-21 06:55:40 -07:00
donald chen
4b3f251bad [mlir] [dataflow] unify semantics of program point (#110344)
The concept of a 'program point' in the original data flow framework is
ambiguous. It can refer to either an operation or a block itself. This
representation has different interpretations in forward and backward
data-flow analysis. In forward data-flow analysis, the program point of
an operation represents the state after the operation, while in backward
data flow analysis, it represents the state before the operation. When
using forward or backward data-flow analysis, it is crucial to carefully
handle this distinction to ensure correctness.

This patch refactors the definition of program point, unifying the
interpretation of program points in both forward and backward data-flow
analysis.

How to integrate this patch?

For dense forward data-flow analysis and other analysis (except dense
backward data-flow analysis), the program point corresponding to the
original operation can be obtained by `getProgramPointAfter(op)`, and
the program point corresponding to the original block can be obtained by
`getProgramPointBefore(block)`.

For dense backward data-flow analysis, the program point corresponding
to the original operation can be obtained by
`getProgramPointBefore(op)`, and the program point corresponding to the
original block can be obtained by `getProgramPointAfter(block)`.
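
As a rough sketch (inside a hypothetical analysis class, with `op` and
`block` assumed to be in scope):

```cpp
// Forward and non-dense-backward analyses: the old "point of an op/block".
auto *opPoint = getProgramPointAfter(op);
auto *blockPoint = getProgramPointBefore(block);
// Dense backward analyses use the flipped mapping:
auto *opPointBwd = getProgramPointBefore(op);
auto *blockPointBwd = getProgramPointAfter(block);
```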

NOTE: If you need to get the lattice of other data-flow analyses in
dense backward data-flow analysis, you should still use the dense
forward data-flow approach. For example, to get the Executable state of
a block in dense backward data-flow analysis and add the dependency of
the current operation, you should write:

``getOrCreateFor<Executable>(getProgramPointBefore(op),
getProgramPointBefore(block))``

In the case above, we use getProgramPointBefore(op) because the analysis we
rely on is dense backward data-flow, and we use
getProgramPointBefore(block) because the lattice we query is the result
of a non-dense backward data flow computation.

Related discussion:
https://discourse.llvm.org/t/rfc-unify-the-semantics-of-program-points/80671/8
Corresponding PSA:
https://discourse.llvm.org/t/psa-program-point-semantics-change/81479
2024-10-11 21:59:05 +08:00
Benoit Jacob
a9ebdbb5ac [MLIR] Vector: turn the ExtractStridedSlice rewrite pattern from #111541 into a canonicalization (#111614)
This is a reasonable canonicalization because `extract` is more
constrained than `extract_strided_slices`, so there is no loss of
semantics here, just lifting an op to a special-case higher/constrained
op. And the additional `shape_cast` is merely adding leading unit dims
to match the original result type.

Context: discussion on #111541. I wasn't sure how this would turn out,
but in the process of writing this PR, I discovered at least 2 bugs in
the pattern introduced in #111541, which shows the value of shared
canonicalization patterns which are exercised on a high number of
testcases.

---------

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
2024-10-09 09:24:23 -04:00
Benoit Jacob
10054ba4ac [mlir][vector] Add pattern to rewrite contiguous ExtractStridedSlice into Extract (#111541)
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2024-10-08 11:51:01 -04:00
Matthias Springer
206fad0e21 [mlir][NFC] Mark type converter in populate... functions as const (#111250)
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.

Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate patterns now take
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.

Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
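
A sketch of the resulting convention (the populate function and pattern
names here are hypothetical):

```cpp
// A const type converter signals that this function only adds patterns and
// registers no new type conversion rules.
void populateMyDialectToLLVMConversionPatterns(const TypeConverter &converter,
                                               RewritePatternSet &patterns) {
  // MyOpLowering is a hypothetical OpConversionPattern subclass.
  patterns.add<MyOpLowering>(converter, patterns.getContext());
}
```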
2024-10-05 21:32:40 +02:00
Aman LaChapelle
759a7b5933 [mlir] Add the ability to define dialect-specific location attrs. (#105584)
This patch adds the capability to define dialect-specific location
attrs. This is useful in particular for defining location structure that
doesn't necessarily fit within the core MLIR location hierarchy, but
doesn't make sense to push upstream (i.e. a custom use case).

This patch adds an AttributeTrait, `IsLocation`, which is tagged onto
all the builtin location attrs, as well as the test location attribute.
This is necessary because previously LocationAttr::classof only returned
true if the attribute was one of the builtin location attributes, and
well, the point of this patch is to allow dialects to define their own
location attributes.

There was an alternate implementation I considered wherein LocationAttr
becomes an AttrInterface, but that was discarded because there are
likely to be *many* locations in a single program, and I was concerned
that forcing every MLIR user to pay the cost of the additional
lookup/dispatch was unacceptable. It also would have been a *much* more
invasive change. It would have allowed for more flexibility in terms of
pretty printing, but it's unclear how useful/necessary that flexibility
would be given how much customizability there already is for attribute
definitions.
2024-10-03 10:25:44 -07:00
Billy Zhu
5b21fd298c [MLIR][Pass] Full & deterministic diagnostics (#110311)
Today, when the pass infra schedules a pass/nested-pipeline on a set of
ops, it exits early as soon as it fails on one of the ops. This leads to
non-exhaustive, and more importantly, non-deterministic error reporting
(under async).

This PR removes the early termination behavior so that all ops have a
chance to run through the current pass/nested-pipeline, and all errors
are reported (async diagnostics are already ordered). This guarantees
deterministic & full error reporting. As a result, it's also no longer
necessary to -split-input-file with one error per split when testing
with -verify-diagnostics.
2024-10-01 19:07:52 -07:00
Andrea Faulds
a800ffac41 [mlir][gpu] Disjoint patterns for lowering clustered subgroup reduce (#109158)
Making the existing populateGpuLowerSubgroupReduceToShufflePatterns()
function also cover the new "clustered" subgroup reductions is proving
to be inconvenient, because certain backends may have more specific
lowerings that only cover the non-clustered type, and this creates pass
ordering constraints. This commit removes coverage of clustered
reductions from this function in favour of a new separate function,
which makes controlling the lowering much more straightforward.
2024-09-18 15:55:53 -04:00
Andrea Faulds
fd26f8444a [mlir][gpu] Rename two misspelled pattern population functions (#109015) 2024-09-17 15:26:14 -04:00
MaheshRavishankar
d5f0969c96 [mlir][TilingInterface] Avoid looking at operands for getting slices to continue tile + fuse. (#107882)
Current implementation of `scf::tileConsumerAndFuseProducerUsingSCF`
looks at operands of tiled/tiled+fused operations to see if they are
produced by `extract_slice` operations to populate the worklist used to
continue fusion. This implicit assumption does not always work. Instead
make the implementations of `getTiledImplementation` return the slices
to use to continue fusion.

This is a breaking change

- To continue to get the same behavior of
`scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree
implementation of `TilingInterface::getTiledImplementation` to return
the slices to continue fusion on. All in-tree implementations have been
adapted to this.
- This change touches parts that required a simplification to the
`ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a
`std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that
should be `std::nullopt` if fusion is not to be performed.

Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
2024-09-11 22:15:43 -07:00
Amy Wang
6634d44e5e [MLIR][Transform] Allow stateInitializer and stateExporter for applyTransforms (#101186)
This is discussed in RFC:

https://discourse.llvm.org/t/rfc-making-the-constructor-of-the-transformstate-class-protected/80377
2024-09-09 10:57:13 -04:00
Matthias Springer
3815f478bb [mlir][Transforms] Dialect conversion: Make materializations optional (#107109)
This commit makes source/target/argument materializations (via the
`TypeConverter` API) optional.

By default (`ConversionConfig::buildMaterializations = true`), the
dialect conversion infrastructure tries to legalize all unresolved
materializations right after the main transformation process has
succeeded. If at least one unresolved materialization fails to resolve,
the dialect conversion fails. (With an error message such as `failed to
legalize unresolved materialization ...`.) Automatic materializations
through the `TypeConverter` API can now be deactivated. In that case,
every unresolved materialization will show up as a
`builtin.unrealized_conversion_cast` op in the output IR.

There used to be a complex and error-prone analysis in the dialect
conversion that predicted the future uses of unresolved
materializations. Based on that logic, some casts (that were deemed
unnecessary) were folded. This analysis was needed because folding
happened at a point in time when some IR changes (e.g., op replacements)
had not materialized yet.

This commit removes that analysis. Any folding of cast ops now happens
after all other IR changes have been materialized and the uses can
directly be queried from the IR. This simplifies the analysis
significantly. And certain helper data structures such as
`inverseMapping` are no longer needed for the analysis. The folding
itself is done by `reconcileUnrealizedCasts` (which also exists as a
standalone pass).

After casts have been folded, the remaining casts are materialized
through the `TypeConverter`, as usual. This last step can be deactivated
in the `ConversionConfig`.

`ConversionConfig::buildMaterializations = false` can be used to debug
error messages such as `failed to legalize unresolved materialization
...`. (It is also useful in case automatic materializations are not
needed.) The materializations that failed to resolve can then be seen as
`builtin.unrealized_conversion_cast` ops in the resulting IR. (This is
better than running with `-debug`, because `-debug` shows IR where some
IR changes have not been materialized yet.)
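
A minimal sketch of using this from a conversion pass (assuming the
`ConversionConfig` overload of `applyPartialConversion`; `target` and
`patterns` are set up elsewhere):

```cpp
// Keep unresolved materializations as builtin.unrealized_conversion_cast
// ops so they can be inspected in the output IR.
ConversionConfig config;
config.buildMaterializations = false;
if (failed(applyPartialConversion(getOperation(), target, std::move(patterns),
                                  config)))
  signalPassFailure();
```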

Note: This is a reupload of #104668, but with correct handling of cyclic
unrealized_conversion_casts that may be generated by the dialect
conversion.
2024-09-05 19:40:58 +02:00
SJW
ebf0599314 [MLIR][SCF] Add support for loop pipeline peeling for dynamic loops. (#106436)
Allow speculative execution and predicate results per stage.
2024-09-04 12:24:58 -07:00
donald chen
b6603e1bf1 [mlir] [dataflow] Refactoring the definition of program points in data flow analysis (#105656)
This patch distinguishes between program points and lattice anchors in
data flow analysis, where lattice anchors represent locations where a
lattice can be attached, while program points denote points in program
execution.

Related discussions:
https://discourse.llvm.org/t/rfc-unify-the-semantics-of-program-points/80671/8
2024-08-25 19:21:47 +08:00
MaheshRavishankar
4dbaef6d5e [mlir][Linalg] Avoid doing op replacement in linalg::dropUnitDims. (#105749)
It is better to do the replacement in the caller. This avoids the
footgun if the caller needs the original operation. Instead return the
produced operation and replacement values.

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2024-08-23 13:43:33 -07:00
Théo Degioanni
b084111c8e [mlir][mem2reg] Fix Mem2Reg attempting to promote in graph regions (#104910)
Mem2Reg assumes SSA dependencies but did not check for graph regions.
This fixes it.

---------

Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
2024-08-23 15:15:10 +02:00
Ivan Butygin
15e915a44f [mlir][dataflow] Propagate errors from visitOperation (#105448)
Base `DataFlowAnalysis::visit` returns `LogicalResult`, but the
Sparse/Dense/Forward/Backward wrappers' `visitOperation` doesn't.

Sometimes it is necessary to abort the solver early if an unrecoverable
condition is detected inside an analysis.

Update `visitOperation` to return `LogicalResult` and propagate it to
`solver.initializeAndRun()`. Only `visitOperation` is updated for now;
it's possible to update other hooks like `visitNonControlFlowArguments`,
but that's not needed immediately, so let's keep this PR small.

Hijacked `UnderlyingValueAnalysis` test analysis to test it.
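
A rough sketch of an analysis hook after this change (the analysis class,
lattice, and helper names are hypothetical):

```cpp
LogicalResult MyAnalysis::visitOperation(Operation *op,
                                         ArrayRef<const MyLattice *> operands,
                                         ArrayRef<MyLattice *> results) {
  if (isUnsupported(op))
    // Emitting an error returns failure, which aborts
    // solver.initializeAndRun() early.
    return op->emitError("unsupported operation in MyAnalysis");
  // ... propagate lattice state from operands to results ...
  return success();
}
```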
2024-08-22 12:16:03 +03:00
Andrzej Warzyński
42944da5ba [mlir][vector] Group re-order patterns together (#102856)
Group all patterns that re-order vector.transpose and vector.broadcast
Ops (*) under `populateSinkVectorOpsPatterns`. These patterns are
normally used to "sink" redundant Vector Ops, hence grouping together.
Example:

```mlir
%at = vector.transpose %a, [1, 0]: vector<4x2xf32> to vector<2x4xf32>
%bt = vector.transpose %b, [1, 0]: vector<4x2xf32> to vector<2x4xf32>
%r = arith.addf %at, %bt : vector<2x4xf32>
```
would get converted to:
```mlir
%0 = arith.addf %a, %b : vector<4x2xf32>
%r = vector.transpose %0, [1, 0] : vector<4x2xf32> to vector<2x4xf32>
```

This patch also moves all tests for these patterns so that all of them
are:
  * run under one test-flag: `test-vector-sink-patterns`,
  * located in one file: "vector-sink.mlir".

To facilitate this change:
  * `-test-sink-vector-broadcast` is renamed as
    `test-vector-sink-patterns`,
  * "sink-vector-broadcast.mlir" is renamed as "vector-sink.mlir",
  * tests for `ReorderCastOpsOnBroadcast` and
    `ReorderElementwiseOpsOnTranspose` patterns are moved from
    "vector-reduce-to-contract.mlir" to "vector-sink.mlir",
  * `ReorderElementwiseOpsOnTranspose` patterns are removed from
    `populateVectorReductionToContractPatterns` and added to (newly
    created) `populateSinkVectorOpsPatterns`,
  * `ReorderCastOpsOnBroadcast` patterns are removed from
    `populateVectorReductionToContractPatterns` - these are already
    present in `populateSinkVectorOpsPatterns`.

This should allow us better layering and more straightforward testing.
For the latter, the goal is to be able to easily identify which pattern
a particular test is exercising (especially when it's a specific
pattern).

NOTES FOR DOWNSTREAM USERS

In order to preserve the current functionality, please make sure to add
`populateSinkVectorOpsPatterns` wherever you are using
`populateVectorReductionToContractPatterns`.
Also, rename `populateSinkVectorBroadcastPatterns` as
`populateSinkVectorOpsPatterns`.
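
A sketch of a downstream pattern setup following these notes (assuming the
populate functions live in the `vector` namespace):

```cpp
RewritePatternSet patterns(ctx);
vector::populateVectorReductionToContractPatterns(patterns);
// Previously pulled in implicitly; now must be added explicitly:
vector::populateSinkVectorOpsPatterns(patterns);
```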

(*) I didn't notice any other re-order patterns.
2024-08-16 16:53:53 +01:00
Ian Wood
a95ad2da36 [mlir] Add bubbling patterns for non intersecting reshapes (#103401)
Refactored @Max191's PR https://github.com/llvm/llvm-project/pull/94637
to move it to `Tensor`

From the original PR
>This PR adds fusion by expansion patterns to push a tensor.expand_shape
up through a tensor.collapse_shape with non-intersecting reassociations.
Sometimes parallel collapse_shape ops like this can block propagation of
expand_shape ops, so this allows them to pass through each other.

I'm not sure if I put the code/tests in the right places, so let me know
where those go if they aren't.

cc @MaheshRavishankar @hanhanW

---------

Co-authored-by: Max Dawkins <max.dawkins@gmail.com>
2024-08-14 13:58:35 -07:00
Frank Schlimbach
baabcb2898 [mlir][mesh] Shardingcontrol (#102598)
This is a fixed copy of #98145 (necessary after it got reverted).

@sogartar @yaochengji
This PR adds the following to #98145:
- `UpdateHaloOp` accepts a `memref` (instead of a tensor) and does not
return a result, to clarify its in-place semantics
- `UpdateHaloOp` accepts `split_axis` to allow multiple mesh-axes per
tensor/memref-axis (similar to `mesh.sharding`)
- The implementation of `ShardingInterface` for tensor operations
(`tensor.empty` for now) moved from the tensor library to the mesh
interface library. `spmdize` uses features from the `mesh` dialect.
@rengolin agreed that `tensor` should not depend on `mesh`, so this
functionality cannot live in a `tensor` lib. The unfulfilled dependency
caused the issues leading to reverting #98145. Such cases are generally
possible and might lead to re-considering the current structure (like
for tosa ops).
- rebased onto latest main
--------------------------
Replacing `#mesh.sharding` attribute with operation `mesh.sharding`
- extended semantics now allow providing optional `halo_sizes` and
`sharded_dims_sizes`
- internally a sharding is represented as a non-IR class
`mesh::MeshSharding`

What previously was
```mlir
%sharded0 = mesh.shard %arg0 <@mesh0, [[0]]> : tensor<4x8xf32>
%sharded1 = mesh.shard %arg1 <@mesh0, [[0]]> annotate_for_users : tensor<16x8xf32>
```
is now
```mlir
%sharding = mesh.sharding @mesh0, [[0]] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding : tensor<4x8xf32>
%1 = mesh.shard %arg1 to %sharding annotate_for_users : tensor<16x8xf32>
```
and allows additional annotations to control the shard sizes:
```mlir
mesh.mesh @mesh0 (shape = 4)
%sharding0 = mesh.sharding @mesh0, [[0]] halo_sizes = [1, 2] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding0 : tensor<4x8xf32>
%sharding1 = mesh.sharding @mesh0, [[0]] sharded_dims_sizes = [3, 5, 5, 3] : !mesh.sharding
%1 = mesh.shard %arg1 to %sharding1 annotate_for_users : tensor<16x8xf32>
```
- `mesh.shard` op accepts an additional optional attribute `force`, useful
for halo updates
- Some initial spmdization support for the new semantics
- Support for `tensor.empty` reacting to `sharded_dims_sizes` and
`halo_sizes` in the sharding
- New collective operation `mesh.update_halo` as a spmdized target for
shardings with `halo_sizes`

---------

Co-authored-by: frank.schlimbach <fschlimb@smtp.igk.intel.com>
Co-authored-by: Jie Fu <jiefu@tencent.com>
2024-08-12 12:20:58 +01:00
Nikhil Kalra
165c6d1251 [mlir] Add support for parsing nested PassPipelineOptions (#101118)
- Added a default parsing implementation to `PassOptions` to allow
`Option`/`ListOption` to wrap PassOption objects. This is helpful when
creating meta-pipelines (pass pipelines composed of pass pipelines).
- Updated `ListOption` printing to enable round-tripping the output of
`dump-pass-pipeline` back into `mlir-opt` for more complex structures.
2024-08-09 13:54:00 -07:00
Matthias Springer
7359a6b799 [mlir][ODS] Verify type constraints in Types and Attributes (#102326)
When a type/attribute is defined in TableGen, a type constraint can be
used for parameters, but the type constraint verification was missing.

Example:
```
def TestTypeVerification : Test_Type<"TestTypeVerification"> {
  let parameters = (ins AnyTypeOf<[I16, I32]>:$param);
  // ...
}
```

No verification code was generated to ensure that `$param` is I16 or
I32.

When type constraints are present, a new method will be generated for types
and attributes: `verifyInvariantsImpl`. (The naming is similar to op
verifiers.) The user-provided verifier is called `verify` (no change).
There is now a new entry point to type/attribute verification:
`verifyInvariants`. This function calls both `verifyInvariantsImpl` and
`verify`. If neither of those two verifications is present, the
`verifyInvariants` function is not generated.

When a type/attribute is not defined in TableGen, but a verifier is
needed, users can implement the `verifyInvariants` function. (This
function was previously called `verify`.)

Note for LLVM integration: If you have an attribute/type that is not
defined in TableGen (i.e., just C++), you have to rename the
verification function from `verify` to `verifyInvariants`. (Most
attributes/types have no verification, in which case there is nothing to
do.)

Depends on #102657.
2024-08-09 22:04:40 +02:00
Benjamin Maxwell
9b06e25e73 [mlir][vector] Add mask elimination transform (#99314)
This adds a new transform `eliminateVectorMasks()` which aims at
removing scalable `vector.create_masks` that will be all-true at
runtime. It attempts to do this by simply pattern-matching the mask
operands (similar to some canonicalizations); if that does not lead to
an answer (all-true: yes/no), then value bounds analysis will be used
to find the lower bound of the unknown operands. If the lower bound is
greater than or equal to the corresponding mask vector type dimension,
then that dimension of the mask is all-true.

Note that the pattern matching prevents expensive value-bounds analysis
in cases where the mask won't be all true.

For example:
```mlir
%mask = vector.create_mask %dynamicValue, %c2 : vector<8x4xi1>
```
From looking at `%c2` we can tell this is not going to be an all-true
mask, so we don't need to run the value-bounds analysis for
`%dynamicValue` (and can exit the transform early).

Note: Eliminating create_masks here means replacing them with all-true
constants (which will then lead to the masks folding away).
2024-08-09 10:51:49 +01:00
Diego Caballero
2ac2e9a5b6 [mlir][LLVM] Improve lowering of llvm.byval function arguments (#100028)
When a function argument is annotated with the `llvm.byval` attribute,
[LLVM expects](https://llvm.org/docs/LangRef.html#parameter-attributes)
the function argument type to be an `llvm.ptr`. For example:

```
func.func @example(%arg0 : !llvm.ptr {llvm.byval = !llvm.struct<(i32)>}) {
  ...
}
```

Unfortunately, this makes the type conversion context-dependent, which
is something that the type conversion infrastructure (i.e.,
`LLVMTypeConverter` in this particular case) doesn't support. For
example, we may want to convert `MyType` to `llvm.struct<(i32)>` in
general, but to an `llvm.ptr` type only when it's a function argument
passed by value.

To fix this problem, this PR changes the FuncToLLVM conversion logic to
generate an `llvm.ptr` when the function argument has a `llvm.byval`
attribute. An `llvm.load` is inserted into the function to retrieve the
value expected by the argument users.
2024-08-08 19:27:54 -07:00
Renato Golin
3968942f10 Revert "[mlir][mesh] adding shard-size control (#98145)"
This reverts commit fca69838ca.

Also reverts the fixup: "[mlir] Fix -Wunused-variable in MeshOps.cpp (NFC)"

This reverts commit fc737368fe.
2024-08-07 15:12:37 +01:00
Frank Schlimbach
fca69838ca [mlir][mesh] adding shard-size control (#98145)
- Replacing `#mesh.sharding` attribute with operation `mesh.sharding`
- extended semantics now allow providing optional `halo_sizes` and
`sharded_dims_sizes`
- internally a sharding is represented as a non-IR class
`mesh::MeshSharding`

What previously was
```mlir
%sharded0 = mesh.shard %arg0 <@mesh0, [[0]]> : tensor<4x8xf32>
%sharded1 = mesh.shard %arg1 <@mesh0, [[0]]> annotate_for_users : tensor<16x8xf32>
```
is now
```mlir
%sharding = mesh.sharding @mesh0, [[0]] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding : tensor<4x8xf32>
%1 = mesh.shard %arg1 to %sharding annotate_for_users : tensor<16x8xf32>
```
and allows additional annotations to control the shard sizes:
```mlir
mesh.mesh @mesh0 (shape = 4)
%sharding0 = mesh.sharding @mesh0, [[0]] halo_sizes = [1, 2] : !mesh.sharding
%0 = mesh.shard %arg0 to %sharding0 : tensor<4x8xf32>
%sharding1 = mesh.sharding @mesh0, [[0]] sharded_dims_sizes = [3, 5, 5, 3] : !mesh.sharding
%1 = mesh.shard %arg1 to %sharding1 annotate_for_users : tensor<16x8xf32>
```
- `mesh.shard` op accepts an additional optional attribute `force`, useful
for halo updates
- Some initial spmdization support for the new semantics
- Support for `tensor.empty` reacting to `sharded_dims_sizes` and
`halo_sizes` in the sharding
- New collective operation `mesh.update_halo` as a spmdized target for
shardings with `halo_sizes`

@sogartar @yaochengji
2024-08-07 13:34:57 +01:00
Nikhil Kalra
84cc1865ef [mlir] Support DialectRegistry extension comparison (#101119)
`PassManager::run` loads the dependent dialects for each pass into the
current context prior to invoking the individual passes. If the
dependent dialect is already loaded into the context, this should be a
no-op. However, if there are extensions registered in the
`DialectRegistry`, the dependent dialects are unconditionally registered
into the context.

This poses a problem for dynamic pass pipelines, however, because they
will likely be executing while the context is in an immutable state
(because of the parent pass pipeline being run).

To solve this, we'll update the extension registration API on
`DialectRegistry` to require a type ID for each extension that is
registered. Then, instead of unconditionally registering dialects into a
context if extensions are present, we'll check against the extension
type IDs already present in the context's internal `DialectRegistry`.
The context will only be marked as dirty if there are net-new extension
types present in the `DialectRegistry` populated by
`PassManager::getDependentDialects`.

Note: this PR removes the `addExtension` overload that utilizes
`std::function` as the parameter. This is because `std::function` is
copyable and potentially allocates memory for the contained function so
we can't use the function pointer as the unique type ID for the
extension.

Downstream changes required:
- Existing `DialectExtension` subclasses will need a type ID to be
registered for each subclass. More details on how to register a type ID
can be found here:
8b68e06731/mlir/include/mlir/Support/TypeID.h (L30)
- Existing uses of the `std::function` overload of `addExtension` will
need to be refactored into dedicated `DialectExtension` classes with
associated type IDs. The attached `std::function` can either be inlined
into or called directly from `DialectExtension::apply`.
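
A hypothetical sketch of that refactoring (dialect and extension names are
illustrative; see the TypeID.h reference above for registering the required
type ID):

```cpp
// Was: registry.addExtension(+[](MLIRContext *ctx, MyDialect *dialect) {...});
// Now: a dedicated extension class with an associated type ID.
struct MyDialectExtension
    : public DialectExtension<MyDialectExtension, MyDialect> {
  void apply(MLIRContext *ctx, MyDialect *dialect) const final {
    // Body of the previous std::function callback goes here.
  }
};

// Registration:
//   registry.addExtensions<MyDialectExtension>();
```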

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-08-06 01:32:36 +02:00
Kazu Hirata
5262865aac [mlir] Construct SmallVector with ArrayRef (NFC) (#101896) 2024-08-04 11:43:05 -07:00
MaheshRavishankar
6740d701bd [mlir][Linalg] Deprecate linalg::tileToForallOp and linalg::tileToForallOpUsingTileSizes (#91878)
The implementations of these methods are legacy, and they are removed in
favor of using the `scf::tileUsingSCF` methods as replacements. To get
the latter on par with requirements of the deprecated methods, the
tiling allows one to specify the maximum number of tiles to use instead
of specifying the tile sizes. When tiling to `scf.forall` this
specification is used to generate the `num_threads` version of the
operation.

A slight deviation from the previous implementation is that the deprecated
method always generated the `num_threads` variant of the `scf.forall`
operation. Now this is instead driven by the tiling options specified.
This reduces the indexing math generated when the tile sizes are
specified.

**Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF`**

```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> numThreads;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOp(b, op, numThreads, mapping);
```

can be replaced by
```
scf::SCFTilingOptions options;
options.setNumThreads(numThreads);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```

This generates the `numThreads` version of the `scf.forall` for the
inter-tile loops, i.e.

```
... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...)
```

**Moving from `linalg::tileToForallOpUsingTileSizes` to
`scf::tileUsingSCF`**

```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> tileSizes;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping);
```

can be replaced by
```
scf::SCFTilingOptions options;
options.setTileSizes(tileSizes);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```

Also note that `linalg::tileToForallOpUsingTileSizes` would effectively
call `linalg::tileToForallOp` by computing the `numThreads` from the
`op` and `tileSizes` and generate the `numThreads` version of the
`scf.forall`. That is not the case anymore. Instead, this will directly
generate the `tileSizes` version of the `scf.forall` op:

```
... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...)
```

If you actually want to use the `numThreads` version, it is up to the
caller to compute the `numThreads` and call `options.setNumThreads`
instead of `options.setTileSizes`. Note that there is a slight
difference in the num threads version and tile size version. The former
requires an additional `affine.max` on the tile size to ensure
non-negative tile sizes. When lowering to `numThreads` version this
`affine.max` is not needed since by construction the tile sizes are
non-negative. In previous implementations, the `numThreads` version
generated when using the `linalg::tileToForallOpUsingTileSizes` method
would avoid generating the `affine.max` operation. To get the same
state, downstream users will have to additionally normalize the
`scf.forall` operation.

**Changes to `transform.structured.tile_using_forall`**

The transform dialect op that called into `linalg::tileToForallOp` and
`linalg::tileToForallOpUsingTileSizes` has been modified to call
`scf::tileUsingSCF`. The transform dialect op always generates the
`numThreads` version of the `scf.forall` op. So when `tile_sizes` are
specified for the transform dialect op, first the `tile_sizes` version
of the `scf.forall` is generated by the `scf::tileUsingSCF` method which
is then further normalized to get back to the same state. So there is no
functional change to `transform.structured.tile_using_forall`. It always
generates the `numThreads` version of the `scf.forall` op (as it did
before this change).

---------

Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
2024-07-31 12:32:07 -07:00
Matthias Springer
8fc329421b [mlir][Transforms] Dialect conversion: Add missing "else if" branch (#101148)
This code got lost in #97213 and there was no test for it. Add it back
with an MLIR test.

When a pattern is run without a type converter, we can assume that the
new block argument types of a signature conversion are legal. That's
because they were specified by the user. This won't work for 1->N
conversions due to limitations in the dialect conversion infrastructure,
so the original `FIXME` has to stay in place.
2024-07-30 16:36:47 +02:00
Victor Perez
e8f07cdb57 [MLIR][SCF] Define -scf-rotate-while pass (#99850)
Define SCF dialect patterns rotating `scf.while` loops leveraging
existing `mlir::scf::wrapWhileLoopInZeroTripCheck`. `forceCreateCheck`
is always `false` as the pattern would lead to an infinite recursion
otherwise.

This pattern rotates `scf.while` ops, mutating them from "while" loops to
"do-while" loops. A guard checking the condition for the first iteration
is inserted. Note this guard can be optimized away if the compiler can
prove the loop will be executed at least once.

Using this pattern, the following while loop:

```mlir
scf.while (%arg0 = %init) : (i32) -> i64 {
  %val = .., %arg0 : i64
  %cond = arith.cmpi .., %arg0 : i32
  scf.condition(%cond) %val : i64
} do {
^bb0(%arg1: i64):
  %next = .., %arg1 : i32
  scf.yield %next : i32
}
```

Can be transformed into:

```mlir
%pre_val = .., %init : i64
%pre_cond = arith.cmpi .., %init : i32
scf.if %pre_cond -> i64 {
  %res = scf.while (%arg1 = %pre_val) : (i64) -> i64 {
    // Original after block
    %next = .., %arg1 : i32
    // Original before block
    %val = .., %next : i64
    %cond = arith.cmpi .., %next : i32
    scf.condition(%cond) %val : i64
  } do {
  ^bb0(%arg2: i64):
    scf.yield %arg2 : i32
  }
  scf.yield %res : i64
} else {
  scf.yield %pre_val : i64
}
```

The test pass for `wrapWhileLoopInZeroTripCheck` has been modified to
use the new pattern when `forceCreateCheck=false`.

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
2024-07-30 10:06:01 +02:00
Krzysztof Drewniak
8955e285e1 [mlir] Add property combinators, initial ODS support (#94732)
While we have had a Properties.td that allowed for defining
non-attribute-backed properties, such properties were not plumbed
through the basic autogeneration facilities available to attributes,
forcing those who want to migrate to the new system to write such code
by hand.

## Potentially breaking changes

- The `setFoo()` methods on the `Properties` struct no longer take their
inputs by const reference. Those wishing to pass non-owned values of a
property by reference to constructors and setters should set the
interface type to `const [storageType]&`
- Adapters and operations now define getters and setters for properties
listed in ODS, which may conflict with custom getters.
- Builders now include properties listed in ODS specifications,
potentially conflicting with custom builders with the same type
signature.

## Extensions to the `Property` class

This commit adds several fields to the `Property` class, including:
- `parser`, `optionalParser`, and `printer` (for parsing/printing
properties of a given type in ODS syntax)
- `storageTypeValueOverride`, an extension of `defaultValue` to allow
the storage and interface type defaults to differ
- `baseProperty` (allowing for classes like `DefaultValuedProperty`)

Existing fields have also had their documentation comments updated.

This commit does not add a `PropertyConstraint` analogous to
`AttrConstraint`, but this is a natural evolution of the work here.

This commit also adds the concrete property kinds `I32Property`,
`I64Property`, `UnitProperty` (and special handling for it like for
UnitAttr), and `BoolProperty`.

## Property combinators

`Properties.td` also now includes several ways to combine properties.

One is `ArrayProperty<Property elem>`, which now stores a
variable-length array of some property as
`SmallVector<elem.storageType>` and uses `ArrayRef<elem.storageType>` as
its interface type. It has `IntArrayProperty` subclasses that change its
conversion to attributes to use `DenseI[N]Attr`s instead of an
`ArrayAttr`.

Similarly, `OptionalProperty<Property p>` wraps a property's storage in
`std::optional<>` and adds a `std::nullopt` default value. In the case
where the underlying property can be parsed optionally but doesn't have
its own default value, `OptionalProperty` can piggyback off the optional
parser to produce a cleaner syntax, as opposed to its general form,
which is either `none` or `some<[value]>`.

(Note that `OptionalProperty` can be nested if desired).

## Autogeneration changes

Operations and adaptors now support getters and setters for properties
like those for attributes. Unlike for attributes, there aren't separate
value and attribute forms, since there is no `FooAttr()` available for a
`getFooAttr()` to return.

The largest change is to operation formats. Previously, properties could
only be used in custom directives. Now, they can be used anywhere an
attribute could be used, and have parsers and printers defined in their
tablegen records.

These updates include special `UnitProperty` logic like that used for
`UnitAttr`.

## Misc.

Some attempt has been made to test the new functionality.

This commit takes tentative steps towards updating the documentation to
account for properties. A full update will be in order once any followup
work has been completed and the interfaces have stabilized.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
2024-07-26 09:35:06 -05:00
Matthias Springer
684a5a30e1 [mlir][Transforms] Dialect conversion: fix crash when converting detached region (#100633)
This commit fixes a crash in the dialect conversion when applying a
signature conversion to a block inside of a detached region.

This fixes an issue reported in
4114d5be87 (r1691809730).
2024-07-25 22:14:15 +02:00
weiwei chen
12dba4d484 [mlir] Add metadata to Diagnostic. (#99398)
Add metadata to Diagnostic. 

Motivation: we have a use case where we want to do some filtering in our
customized diagnostic handler based on custom info that is not
`location`, `severity`, or the `diagnostic arguments` that are member
variables of `Diagnostic`. Specifically, we want to add a unique ID to
the `Diagnostic` so that the handler can filter on it in a compiler pass
that emits errors from async tasks under multithreading, where the
diagnostic handling is associated with the task.

This patch adds a `metadata` field to `mlir::Diagnostic` as a general
solution. `metadata` is of type `SmallVector<DiagnosticArgument, 0>` to
save memory and to reuse the existing `DiagnosticArgument` as the
metadata type.
2024-07-25 10:01:46 -04:00
Angel Zhang
f83950ab8d [mlir][spirv] Implement vector unrolling for convert-to-spirv pass (#100138)
### Description
This PR builds on #99872. It implements a minimal version of function
body vector unrolling to convert vector types into 1-D vectors with a size
supported by SPIR-V (2, 3, or 4, depending on the original dimension). The
ops that are currently supported include those with elementwise traits
(e.g. `arith.addi`), `vector.reduction` and `vector.transpose`. This PR
also includes new LIT tests that only check for vector unrolling.

### Future Plans
- Support more ops

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2024-07-24 10:41:19 -04:00
quartersdg
690dc4eff1 Add AsmParser::parseDecimalInteger. (#96255)
An attribute parser needs to parse lists of possibly negative integers
separated by `x`. This is foiled by `parseInteger`, which handles hex
formats, and by `parseIntegerInDimensionList`, which does not allow negatives.

---------

Co-authored-by: Jacques Pienaar <jpienaar@google.com>
2024-07-23 20:12:40 -07:00
Hsiangkai Wang
27ee33d136 [mlir][linalg] Decompose winograd operators (#96183)
Convert Linalg winograd_filter_transform, winograd_input_transform, and
winograd_output_transform into nested loops with matrix multiplication
with constant transform matrices.

Support several configurations of Winograd Conv2D, including F(2, 3),
F(4, 3) and F(2, 5). These configurations show that the implementation
can support different kernel sizes (3 and 5) and different output sizes
(2 and 4). Besides symmetric kernel sizes 3x3 and 5x5, this patch also
supports 1x3, 3x1, 1x5, and 5x1 kernels.

The implementation is based on the paper, Fast Algorithms for
Convolutional Neural Networks. (https://arxiv.org/abs/1509.09308)

Reviewers: ftynse, Max191, GeorgeARM, nicolasvasilache, MaheshRavishankar, dcaballe, rengolin

Reviewed By: ftynse, Max191

Pull Request: https://github.com/llvm/llvm-project/pull/96183
2024-07-18 06:04:53 +01:00
Angel Zhang
6867e49fc8 [mlir][spirv] Implement vector type legalization for function signatures (#98337)
### Description
This PR implements a minimal version of function signature conversion to
unroll vectors into 1-D vectors with a size supported by SPIR-V (2, 3, or 4,
depending on the original dimension). This PR also includes new unit
tests that only check for function signature conversion.

### Future Plans
- Check for capabilities that support vectors of size 8 or 16.
- Set up `OneToNTypeConversion` and `DialectConversion` to replace the
current implementation that uses `GreedyPatternRewriteDriver`.
- Introduce other vector unrolling patterns to cancel out the
`vector.insert_strided_slice` and `vector.extract_strided_slice` ops and
fully legalize the vector types in the function body.
- Handle `func::CallOp` and declarations.
- Restructure the code in `SPIRVConversion.cpp`.
- Create test passes for testing sets of patterns in isolation.
- Optimize the way the original shape is split into target shapes, e.g.
`vector<5xi32>` can be split into `vector<4xi32>` and
`vector<1xi32>`.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2024-07-17 13:09:15 -04:00
Hsiangkai Wang
7d246e84a4 [mlir][linalg] Implement Conv2D using Winograd Conv2D algorithm (#96181)
Define high level winograd operators and convert conv_2d_nhwc_fhwc into
winograd operators. According to Winograd Conv2D algorithm, we need
three transform operators for input, filter, and output transformation.

The formula of Winograd Conv2D algorithm is

Y = A^T x [(G x g x G^T) @ (B^T x d x B)] x A

filter transform: G x g x G^T
input transform: B^T x d x B
output transform: A^T x y x A

The implementation is based on the paper, Fast Algorithms for
Convolutional Neural Networks. (https://arxiv.org/abs/1509.09308)

Reviewers: stellaraccident, ftynse, Max191, GeorgeARM, cxy-1993, nicolasvasilache, MaheshRavishankar, dcaballe, rengolin

Reviewed By: ftynse, Max191, stellaraccident

Pull Request: https://github.com/llvm/llvm-project/pull/96181
2024-07-10 07:30:45 +01:00
Han-Chung Wang
04fc471f48 [mlir][linalg] Switch to use OpOperand* in ControlPropagationFn. (#96697)
It's not easy to determine whether we want to propagate pack/unpack ops
because we don't know the (producer, consumer) information. The
revisions switch it to `OpOperand*`, so the control function can capture
the (producer, consumer) pair. E.g.,

```
Operation *producer = opOperand->get().getDefiningOp();
Operation *consumer = opOperand->getOwner();
```
2024-07-08 09:53:09 -07:00
Jeremy Kun
07c157a435 [mlir] load dialect in parser for optional parameters (#96667)
https://github.com/llvm/llvm-project/pull/96242 fixed an issue where the
auto-generated parsers were not loading dialects whose namespaces are
not present in the textual IR. This required the attribute parameter to
be a tablegen def with its dialect information attached.

This fails when using parameter wrapper classes like
`OptionalParameter`. This came up because `RingAttr` uses
`OptionalParameter` for its second and third attributes.
`OptionalParameter` takes as input the C++ type as a string instead of
the tablegen def, and so it doesn't have a dialect member value to
trigger the fix from https://github.com/llvm/llvm-project/pull/96242.
The docs on this topic say the appropriate solution is to overload
`FieldParser` for a particular type.

This PR updates `FieldParser` for generic attributes to load the dialect
on demand. This requires `mlir-tblgen` to emit a `dialectName` static
field on the generated attribute class, and check for it with template
metaprogramming, since not all attribute types go through `mlir-tblgen`.

---------

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
2024-07-07 09:44:07 -07:00
Ramkumar Ramachandra
db791b278a mlir/LogicalResult: move into llvm (#97309)
This patch is part of a project to move the Presburger library into
LLVM.
2024-07-02 10:42:33 +01:00
Théo Degioanni
69d3793ffb [mlir][sroa] Update name of subelement types in destructurable slots (#97226)
The `elementPtrs` field has changed meaning over time, and the name is now
outdated, which may be confusing. This PR updates it to a name
representative of current usage.
2024-06-30 20:24:56 +02:00
srcarroll
431213c99d [mlir][linalg] Implement patterns for reducing rank of named linalg contraction ops (#95710)
This patch introduces pattern rewrites for reducing the rank of named
linalg contraction ops with unit spatial dim(s) to other named
contraction ops. For example `linalg.batch_matmul` with batch size 1 ->
`linalg.matmul` and `linalg.matmul` with unit LHS spatial dim ->
`linalg.vecmat`, etc. These patterns don't support reducing the rank
along the reduction dimension, as those cases don't convert to other named
contraction ops.
2024-06-24 13:06:31 -05:00