Commit Graph

274 Commits

Author SHA1 Message Date
Guray Ozen
5caae72d1a [mlir][gpu] Productize test-lower-to-nvvm as gpu-lower-to-nvvm (#75775)
The `test-lower-to-nvvm` pipeline serves as the common and proper
pipeline for nvvm+host compilation, and it's used across our CUDA
integration tests.

This PR renames the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves its registration into `InitAllPasses.h`. The aim is to make it
callable from Python and to provide a standardized compilation process for
NVVM.
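
As an illustration of the intended use, here is a minimal C++ sketch of
invoking a registered pipeline by its textual name; the exact pipeline name
and the surrounding setup are assumptions, not part of the patch:

```
// Minimal sketch: run a registered pipeline by name on a module.
// Assumes the pipeline has been registered (e.g. via InitAllPasses.h).
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Pass/PassRegistry.h"

mlir::LogicalResult runNvvmPipeline(mlir::ModuleOp module) {
  mlir::PassManager pm(module.getContext());
  // Parse the textual pipeline name into the pass manager.
  if (mlir::failed(mlir::parsePassPipeline("gpu-lower-to-nvvm", pm)))
    return mlir::failure();
  return pm.run(module);
}
```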
2023-12-19 08:40:46 +01:00
Stella Laurenzo
8eff570482 Add missing dep on MLIRToLLVMIRTranslationRegistration to mlir-opt. (#75111)
I was not able to fully triage why this just started failing on one of
our bots, as it seems that the use was added 4 months ago. I would assume
that it was accidentally coming in transitively in some way, as the
dependency was definitely missing.

For context, this started failing in [our
byo_llvm](https://github.com/openxla/iree/blob/main/build_tools/llvm/byo_llvm.sh)
build on a stock build of MLIR on top of an existing LLVM. We were
getting:

```
ld.lld: error: undefined symbol: mlir::registerSPIRVDialectTranslation(mlir::DialectRegistry&)                                                        >>> referenced by mlir-opt.cpp
>>>               tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o:(main)
```
2023-12-12 14:10:06 -08:00
Boian Petkantchin
4b3446771f [mlir][mesh] Add endomorphism simplification for all-reduce (#73150)
Performs transformations like
all_reduce(x) + all_reduce(y) -> all_reduce(x + y)

max(all_reduce(x), all_reduce(y)) -> all_reduce(max(x, y))
when the element-wise reduction op of the all_reduce is max.

Added general rewrite patterns HomomorphismSimplification and
EndomorphismSimplification that encapsulate the general algorithm.
Made a specialization for all-reduce with respect to
addf, addi, minsi, maxsi, minimumf, and maximumf
in the Arithmetic dialect.
2023-12-12 10:21:52 -08:00
Matteo Franciolini
7ad9e9dcf5 [mlir][bytecode] Implements back deployment capability for MLIR dialects (#70724)
When emitting bytecode, clients can specify a target dialect version to
emit in `BytecodeWriterConfig`. This exposes a target dialect version to
the DialectBytecodeWriter, which can be queried by name and used to
back-deploy attributes, types, and properties.
2023-10-31 15:41:29 -07:00
Fabian Mora
1828deb752 [mlir][gpu] Deprecate gpu::Serialization* passes. (#65857)
Deprecate the `gpu-to-cubin` & `gpu-to-hsaco` passes in favor of the
`TargetAttr` workflow. This patch removes remaining upstream uses of the
aforementioned passes, including the option to use them in `mlir-opt`. A
future patch will remove these passes entirely.

The passes can be re-enabled in `mlir-opt` by adding the CMake flag: `-DMLIR_ENABLE_DEPRECATED_GPU_SERIALIZATION=1`.
2023-09-11 16:32:15 -04:00
Will Dietz
08ed557714 [mlir] mlir-opt: Fix linking after 7c4e8c6a27 .
Without this, there are undefined references to the LLVMIR translations:
```
ld: mlir-opt.cpp:(.text.startup.main+0x49): undefined reference to `mlir::registerAMXDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x51): undefined reference to `mlir::registerArmSMEDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x59): undefined reference to `mlir::registerArmSVEDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x81): undefined reference to `mlir::registerOpenACCDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x89): undefined reference to `mlir::registerOpenMPDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x99): undefined reference to `mlir::registerX86VectorDialectTranslation(mlir::DialectRegistry&)'
```
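
For reference, a hedged sketch of how these translation interfaces are
typically pulled into a tool; `registerAllToLLVMIRTranslations` from
`mlir/Target/LLVMIR/Dialect/All.h` is a real helper, but whether mlir-opt
uses exactly this one is an assumption:

```
// Sketch: register the per-dialect LLVM IR translation interfaces whose
// symbols appear in the linker errors above. The corresponding libraries
// still have to be linked into the tool.
#include "mlir/IR/DialectRegistry.h"
#include "mlir/Target/LLVMIR/Dialect/All.h"

void registerTranslationInterfaces(mlir::DialectRegistry &registry) {
  mlir::registerAllToLLVMIRTranslations(registry);
}
```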

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D158606
2023-08-25 20:28:27 -05:00
Nicolas Vasilache
7c4e8c6a27 [mlir] Disentangle dialect and extension registrations.
This revision avoids the registration of dialect extensions in Pass::getDependentDialects.

Such registration of extensions can be dangerous because `DialectRegistry::isSubsetOf` is
always guaranteed to return false for extensions (i.e. there is no mechanism to track
whether a lambda is already in the list of registered extensions).
When the context is already in a multi-threaded mode, this is guaranteed to assert.

Arguably a more structured registration mechanism for extensions with a unique ExtensionID
could be envisioned in the future.

In the process of cleaning this up, multiple usage inconsistencies surfaced around the
registration of translation extensions that this revision also cleans up.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D157703
2023-08-22 00:40:09 +00:00
Matteo Franciolini
bff6a4292f Expose callbacks for encoding of types/attributes
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.

Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, the fallback path will execute the default parsers and printers for the dialect.

Testing shows how to leverage this functionality to support back-deployment and backward-compatibility use cases when round-tripping to bytecode a client dialect with type/attribute dependencies on upstream.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D153383
2023-07-28 16:45:42 -07:00
Mehdi Amini
b86a13211f Revert "Expose callbacks for encoding of types/attributes"
This reverts commit b299ec1666.

The authorship information was incorrect.
2023-07-28 16:45:42 -07:00
Mehdi Amini
b299ec1666 Expose callbacks for encoding of types/attributes
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.

Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, the fallback path will execute the default parsers and printers for the dialect.

Testing shows how to leverage this functionality to support back-deployment and backward-compatibility use cases when round-tripping to bytecode a client dialect with type/attribute dependencies on upstream.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D153383
2023-07-28 10:44:02 -07:00
Srishti Srivastava
de826ea35d [MLIR][ANALYSIS] Add liveness analysis utility
This commit adds a utility to implement liveness analysis using the
sparse backward data-flow analysis framework. Theoretically, liveness
analysis assigns liveness to each (value, program point) pair in the
program and it is thus a dense analysis. However, since values are
immutable in MLIR, a sparse analysis, which will assign liveness to
each value in the program, suffices here.

Liveness analysis has many applications. It can be used to avoid the
computation of extraneous operations that have no effect on the memory
or the final output of a program. It can also be used to optimize
register allocation. Both of these applications help achieve one very
important goal: reducing runtime.

A value is considered "live" iff it:
  (1) has memory effects OR
  (2) is returned by a public function OR
  (3) is used to compute a value of type (1) or (2).
It is also to be noted that a value could be of multiple types (1/2/3) at
the same time.

A value "has memory effects" iff it:
  (1.a) is an operand of an op with memory effects OR
  (1.b) is a non-forwarded branch operand and a block to which its op
  could transfer control has an op with memory effects.

A value `A` is said to be "used to compute" value `B` iff `B` cannot be
computed in the absence of `A`. Thus, in this implementation, we say that
value `A` is used to compute value `B` iff:
  (3.a) `B` is a result of an op with operand `A` OR
  (3.b) `A` is used to compute some value `C` and `C` is used to compute
  `B`.

---

It is important to note that there already exists an MLIR liveness
utility here: llvm-project/mlir/include/mlir/Analysis/Liveness.h. So,
what is the need for this new liveness analysis utility being added by
this commit? That need is explained as follows:

The similarity between these two utilities is that both use the
fixpoint iteration method to converge to the final result of liveness,
and both have the same theoretical understanding of liveness.

However, the main difference between (a) the existing utility and (b)
the added utility is the "scope of the analysis". (a) is restricted to
analysing each block independently while (b) analyses blocks together,
i.e., it looks at how the control flows from one block to the other,
how a caller calls a callee, etc. The restriction in the former implies
that some potentially non-live values could be marked live and thus the
full potential of liveness analysis will not be realised.

This can be understood using the example below:

```
1 func.func private @private_dead_return_value_removal_0() -> (i32, i32) {
2   %0 = arith.constant 0 : i32
3   %1 = arith.addi %0, %0 : i32
4   return %0, %1 : i32, i32
5 }
6 func.func @public_dead_return_value_removal_0() -> (i32) {
7   %0:2 = func.call @private_dead_return_value_removal_0() : () -> (i32, i32)
8   return %0#0 : i32
9 }
```

Here, if we just restrict our analysis to a per-block basis like (a), we
will say that %1 on line 3 is live because it is computed and then
returned outside its block by the function. But, if we perform a
backward data-flow analysis like (b) does, we will say that %0#1 of line
7 is not live because it isn't returned by the public function and thus,
%1 of line 3 is also not live. So, while (a) will be unable to suggest
any IR optimizations, (b) can enable this IR to convert to:

```
1 func.func private @private_dead_return_value_removal_0() -> i32 {
2   %0 = arith.constant 0 : i32
3   return %0 : i32
4 }
5 func.func @public_dead_return_value_removal_0() -> i32 {
6   %0 = call @private_dead_return_value_removal_0() : () -> i32
7   return %0 : i32
8 }
```

One operation was removed, one unnecessary return value of the function
was removed, and the function signature was modified accordingly. This is
an optimization that (b) can enable but (a) cannot. Such optimizations
can help remove a lot of extraneous computations that are currently being
done.
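
As a usage illustration, here is a hedged sketch of querying the new
utility through the data-flow framework; the class names (LivenessAnalysis,
Liveness), the header path, the constructor arguments, and the lattice
field are assumptions based on this description:

```
// Sketch: run liveness as a sparse backward data-flow analysis and query it.
#include "mlir/Analysis/DataFlow/DeadCodeAnalysis.h"
#include "mlir/Analysis/DataFlow/LivenessAnalysis.h"
#include "mlir/Analysis/DataFlowFramework.h"
#include "mlir/IR/SymbolTable.h"

// Returns true if `value` is live, conservatively treating it as live when
// the solver fails or has no state for it.
bool isLive(mlir::Operation *top, mlir::Value value) {
  mlir::SymbolTableCollection symbolTables;
  mlir::DataFlowSolver solver;
  solver.load<mlir::dataflow::DeadCodeAnalysis>();
  // Assumed constructor shape for a sparse backward analysis.
  solver.load<mlir::dataflow::LivenessAnalysis>(symbolTables);
  if (mlir::failed(solver.initializeAndRun(top)))
    return true;
  const auto *state = solver.lookupState<mlir::dataflow::Liveness>(value);
  return !state || state->isLive; // the `isLive` field name is an assumption
}
```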

Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com>

Reviewed By: matthiaskramm, jcai19

Differential Revision: https://reviews.llvm.org/D153779
2023-07-21 13:29:14 -07:00
Mahesh Ravishankar
67399932c7 [mlir][Linalg] Cleanup the drop unit dims pass in Linalg.
TL;DR the following API functions have been merged

```
void populateFoldUnitExtentDimsViaReshapesPatterns(RewritePatternSet &patterns);
void populateFoldUnitExtentDimsViaSlicesPatterns(RewritePatternSet &patterns);
```

into

```
void populateFoldUnitExtentDimsPatterns(RewritePatternSet &patterns,
                                        ControlDropUnitDims &options);
```

To use the previous functionality use

```
ControlDropUnitDims options;
// By default options.rankReductionStrategy is
// ControlDropUnitDims::RankReductionStrategy::ReassociativeReshape.
populateFoldUnitExtentDimsPatterns(patterns, options);
```

and

```
ControlDropUnitDims options;
options.rankReductionStrategy = ControlDropUnitDims::RankReductionStrategy::ExtractInsertSlice;
populateFoldUnitExtentDimsPatterns(patterns, options);

```

This pass is quite old and needed to be updated based on the current
approach to transformations in Linalg.

- Instead of two patterns, one to just remove loop dimensions that are
  unit extent (using 0 in the indexing maps) and another to drop the
  unit extents in the operand shapes, combine them into a single
  transformation. This avoids creating an intermediate step with
  indexing maps having 0's in the domain expressions.

- Expose the core transformation as a utility function and add a
  pattern that calls this transformation.

This is a mostly NFC change, apart from the API change and dropping
the patterns/tests that only dropped the loops that are unit extents.
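
For completeness, a hedged sketch of driving the merged entry point from a
pass; the `linalg` namespace, the header path, and the greedy-driver usage
are assumptions, while the populate call mirrors the snippets above:

```
// Sketch: populate the unit-dim folding patterns and apply them greedily.
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

void foldUnitExtentDims(mlir::Operation *root) {
  // Defaults to the reassociative-reshape strategy, as noted above.
  mlir::linalg::ControlDropUnitDims options;
  mlir::RewritePatternSet patterns(root->getContext());
  mlir::linalg::populateFoldUnitExtentDimsPatterns(patterns, options);
  (void)mlir::applyPatternsAndFoldGreedily(root, std::move(patterns));
}
```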

Differential Revision: https://reviews.llvm.org/D155518
2023-07-19 17:47:18 +00:00
Nicolas Vasilache
7e78ecfe10 [mlir][cuda] Add a test-lower-to-nvvm catchall passpipeline.
This mirrors the test-lower-to-llvm pass pipeline that provides some sanity when running e2e examples.

One peculiarity of the GPU pipeline is that we want to allow 32b indexing in kernels.
This is currently not straightforward as there are dependencies between passes.
This new test pass orders passes in a way that connects end-to-end.

Differential Revision: https://reviews.llvm.org/D155463
2023-07-17 15:18:33 +00:00
Alex Zinenko
8a918c54bb [mlir] add backward dense dataflow analysis
This is the counterpart to the forward dense dataflow analysis and
integrates into the dataflow framework. The implementation follows the
structure of existing dataflow analyses.

Reviewed By: Mogball, phisiart

Differential Revision: https://reviews.llvm.org/D154713
2023-07-11 16:47:53 +00:00
Tobias Gysi
728a8d5a81 [mlir] Add a builtin distinct attribute
A distinct attribute associates a referenced attribute with a unique
identifier. Every call to its create function allocates a new
distinct attribute instance. The address of the attribute instance
temporarily serves as its unique identifier. Similar to the names
of SSA values, the final unique identifiers are generated during
pretty printing.

Examples:
 #distinct = distinct[0]<42.0 : f32>
 #distinct1 = distinct[1]<42.0 : f32>
 #distinct2 = distinct[2]<array<i32: 10, 42>>

This mechanism is meant to generate attributes with a unique
identifier, which can be used to mark groups of operations
that share common properties, such as whether they are aliasing.

The design of the distinct attribute ensures a minimal memory
footprint per distinct attribute, since it only contains a reference
to another attribute. All distinct attributes are stored outside of
the storage uniquer in a thread-local store that is part of the
context. It uses one bump-pointer allocator per thread to ensure
distinct attributes can be created in parallel.
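
A small sketch of the corresponding C++ API, assuming it mirrors the
printed form above; `DistinctAttr::create` and the headers used here are
assumptions:

```
// Sketch: two distinct attributes wrapping the same referenced attribute.
#include <cassert>

#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/MLIRContext.h"

void makeDistinct(mlir::MLIRContext &ctx) {
  mlir::Builder b(&ctx);
  mlir::FloatAttr value = b.getF32FloatAttr(42.0f);
  // Each create call allocates a new identity, so d1 != d2 even though both
  // reference 42.0 : f32 (printed as distinct[0]<...> and distinct[1]<...>).
  mlir::DistinctAttr d1 = mlir::DistinctAttr::create(value);
  mlir::DistinctAttr d2 = mlir::DistinctAttr::create(value);
  assert(d1 != d2 && "each call yields a fresh distinct attribute");
}
```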

Reviewed By: rriddle, Dinistro, zero9178

Differential Revision: https://reviews.llvm.org/D153360
2023-07-11 07:33:16 +00:00
yzhang93
5a1cdcbd86 [mlir] Narrow bitwidth emulation for MemRef load
This patch adds support for narrow bitwidth storage emulation. The goal is to support sub-byte type
codegen for LLVM CPU. Specifically, a type converter is added to convert memrefs of narrow bitwidth
(e.g., i4) into a supported wider bitwidth (e.g., i8). Another focus of this patch is to populate the
pattern for int4 memref.load; the memref.store pattern should be added in a separate patch.

Reviewed By: hanchung, mravishankar

Differential Revision: https://reviews.llvm.org/D151519
2023-06-26 14:18:30 -07:00
River Riddle
a5ef51d786 [mlir] Add support for "promised" interfaces
Promised interfaces allow for a dialect to "promise" the implementation of an interface, i.e.
declare that it supports an interface, but have the interface defined in an extension in a library
separate from the dialect itself. A promised interface is powerful in that it alerts the user when
an attempt is made to use the interface (e.g. via cast/dyn_cast/etc.) before the implementation has
been provided. This makes the system much more robust against misconfiguration,
and ensures that we do not lose the benefit we currently have of defining the interface in
the dialect library.

Differential Revision: https://reviews.llvm.org/D120368
2023-06-09 11:30:13 -07:00
Matteo Franciolini
612781918f Preserve use-list orders in mlir bytecode
This patch implements a mechanism to read/write use-list orders from/to the mlir bytecode format. When producing bytecode, use-list orders are appended to each value of the IR. When reading bytecode, use-list orders are loaded into memory and used at the end of parsing to sort the existing use-list chains.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D149755
2023-05-21 16:48:12 -07:00
Mehdi Amini
3128b3105d Add support for Lazyloading to the MLIR bytecode
IsolatedRegions are emitted in sections in order for the reader to be
able to skip over them. A new class is exposed to manage the state and
allow the readers to load these IsolatedRegions on-demand.

Differential Revision: https://reviews.llvm.org/D149515
2023-05-20 15:24:33 -07:00
Mehdi Amini
9c8db444bc Remove deprecated preloadDialectInContext flag for MlirOptMain that has been deprecated for 2 years
See https://discourse.llvm.org/t/psa-preloaddialectincontext-has-been-deprecated-for-1y-and-will-be-removed/68992

Differential Revision: https://reviews.llvm.org/D149039
2023-04-24 14:37:31 -07:00
Mahesh Ravishankar
da784e77da [mlir] Add a utility function to make a region isolated from above.
The utility function takes a region and makes it isolated from above
by appending, to the entry block, arguments that represent the captured
values and replacing all uses of the captured values within the region
with the newly added arguments. The captured values are returned.

The utility function also takes an optional callback that allows
cloning operations that define the captured values into the region
during the process of making it isolated from above. A cloned value
is no longer a captured value; the operands of the cloned operation
then become the captured values. This is applied transitively, allowing
a DAG of operations to be cloned into the region based on the callback.
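
A hedged usage sketch; the header (assumed to be RegionUtils.h), the return
type, and the callback shape are assumptions based on the description above:

```
// Sketch: isolate a region, cloning constant-like defining ops into it
// instead of capturing their results as new entry block arguments.
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/RegionUtils.h"
#include "llvm/ADT/SmallVector.h"

void isolateRegion(mlir::RewriterBase &rewriter, mlir::Region &region) {
  auto cloneIntoRegion = [](mlir::Operation *op) {
    return op->hasTrait<mlir::OpTrait::ConstantLike>();
  };
  llvm::SmallVector<mlir::Value> captured =
      mlir::makeRegionIsolatedFromAbove(rewriter, region, cloneIntoRegion);
  // `captured` now holds the values that became entry block arguments.
  (void)captured;
}
```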

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D148684
2023-04-20 16:40:25 +00:00
Matthias Springer
8c885658ed [mlir][Interfaces] Add ValueBoundsOpInterface
Ops can implement this interface to specify lower/upper bounds for their result values and block arguments. Bounds can be specified for:
* Index-type values
* Dimension sizes of shaped values

The bounds are added to a constraint set. Users can query this constraint set to compute bounds with respect to a user-specified set of values. Only EQ bounds are supported at the moment.

This revision also contains interface implementations for various tensor dialect ops, which illustrates how to implement this interface.
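
A hedged sketch of a typical query; the entry point
(ValueBoundsConstraintSet::computeConstantBound) and its exact signature are
assumptions based on this description, and only EQ bounds are expected to
succeed at this point:

```
// Sketch: try to prove that dimension `dim` of a shaped value has a constant
// size by querying the interface-backed constraint set.
#include "mlir/Interfaces/ValueBoundsOpInterface.h"

mlir::FailureOr<int64_t> getConstantDimSize(mlir::Value shapedValue,
                                            int64_t dim) {
  return mlir::ValueBoundsConstraintSet::computeConstantBound(
      mlir::presburger::BoundType::EQ, shapedValue, dim);
}
```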

Differential Revision: https://reviews.llvm.org/D145681
2023-04-06 02:57:14 +02:00
Christian Ulmann
1ef51e0452 [mlir][Analysis] Introduce LoopInfo in mlir
This commit introduces an instantiation of LLVM's LoopInfo for CFGs in
MLIR. To test the LoopInfo, a test pass is added that checks the analysis
results for a set of CFGs.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D147323
2023-04-05 12:57:16 +00:00
Ingo Müller
0ceb7a12db [mlir] Implement pass utils for 1:N type conversions.
The current dialect conversion does not support 1:N type conversions.
This commit implements a (poor-man's) dialect conversion pass that does
just that. To keep the pass independent of the "real" dialect conversion
infrastructure, it provides a specialization of the TypeConverter class
that allows for N:1 target materializations, a specialization of the
RewritePattern and PatternRewriter classes that automatically add
appropriate unrealized casts supporting 1:N type conversions and provide
converted operands for implementing subclasses, and a conversion driver
that applies the provided patterns and replaces the unrealized casts
that haven't folded away with user-provided materializations.

The current pass is powerful enough to express many existing manual
solutions for 1:N type conversions or extend transforms that previously
didn't support them, out of which this patch implements call graph type
decomposition (which is currently implemented with a ValueDecomposer
that is only used there).

The goal of this pass is to illustrate the effect that 1:N type
conversions could have, gain experience in how patterns should be
written that achieve that effect, and get feedback on how the APIs of
the dialect conversion should be extended or changed to support such
patterns. The hope is that the "real" dialect conversion eventually
supports such patterns, at which point, this pass could be removed
again.
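
A hedged sketch of how such a driver might be invoked; the
OneToNTypeConverter and applyPartialOneToNConversion names are assumptions
based on this description, and the tuple-splitting rule is purely
illustrative:

```
// Sketch: register a 1:N type conversion rule and run the driver.
#include <optional>

#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/OneToNTypeConversion.h"
#include "llvm/ADT/STLExtras.h"

mlir::LogicalResult decomposeTuples(mlir::Operation *op) {
  mlir::OneToNTypeConverter converter;
  // Illustrative 1:N rule: a tuple type converts to the list of its elements.
  converter.addConversion(
      [](mlir::TupleType tuple, llvm::SmallVectorImpl<mlir::Type> &results)
          -> std::optional<mlir::LogicalResult> {
        llvm::append_range(results, tuple.getTypes());
        return mlir::success();
      });
  mlir::RewritePatternSet patterns(op->getContext());
  // ...populate 1:N patterns for the ops of interest here...
  return mlir::applyPartialOneToNConversion(op, converter,
                                            std::move(patterns));
}
```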

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D144469
2023-03-27 16:04:26 +00:00
Ingo Müller
a8416e3c04 Revert "[mlir] Implement pass utils for 1:N type conversions."
This reverts commit 9c4611f9c7.
2023-03-27 09:23:57 +00:00
Ingo Müller
9c4611f9c7 [mlir] Implement pass utils for 1:N type conversions.
The current dialect conversion does not support 1:N type conversions.
This commit implements a (poor-man's) dialect conversion pass that does
just that. To keep the pass independent of the "real" dialect conversion
infrastructure, it provides a specialization of the TypeConverter class
that allows for N:1 target materializations, a specialization of the
RewritePattern and PatternRewriter classes that automatically add
appropriate unrealized casts supporting 1:N type conversions and provide
converted operands for implementing subclasses, and a conversion driver
that applies the provided patterns and replaces the unrealized casts
that haven't folded away with user-provided materializations.

The current pass is powerful enough to express many existing manual
solutions for 1:N type conversions or extend transforms that previously
didn't support them, out of which this patch implements call graph type
decomposition (which is currently implemented with a ValueDecomposer
that is only used there).

The goal of this pass is to illustrate the effect that 1:N type
conversions could have, gain experience in how patterns should be
written that achieve that effect, and get feedback on how the APIs of
the dialect conversion should be extended or changed to support such
patterns. The hope is that the "real" dialect conversion eventually
supports such patterns, at which point, this pass could be removed
again.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D144469
2023-03-27 09:02:28 +00:00
Nicolas Vasilache
0fa20ecafe [mlir][Affine] Add helper functions to allow reordering affine.apply operands and decompose the ops into smaller components
Care is taken to order operands from least hoistable to most hoistable and to process subexpressions in the same
order.

This allows exposing more opportunities for licm, cse and strength reduction.

Such a step should typically be applied while we still have loops in the IR and just before lowering affine ops to arith.
This is because the affine.apply canonicalization currently tries to maximally compose chains of affine.apply operations
and could undo the effects of these decompositions.

Depends on: D145784

Differential Revision: https://reviews.llvm.org/D145685
2023-03-14 04:07:32 -07:00
Jakub Kuderski
b194ef692c [mlir][spirv][vector] Add pattern to convert reduction to SPIR-V dot prod
This converts a specific form of `vector.reduction` to SPIR-V integer
dot product ops.

Add a new test pass to exercise this outside of the main vector to
spirv conversion pass.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D145760
2023-03-10 13:54:16 -05:00
Nicolas Vasilache
c624027633 [mlir][linalg][TransformOps] Connect hoistRedundantVectorTransfers
Connect the hoistRedundantVectorTransfers functionality to the transform
dialect.

Authored-by: Quentin Colombet <quentin.colombet@gmail.com>

Differential Revision: https://reviews.llvm.org/D144260
2023-02-20 01:50:29 -08:00
Tom Eccles
81a79ee446 [mlir] Add function for checking if a block is inside a loop
This function returns whether a block is nested inside of a loop. There
can be three kinds of loop:
  1) The block is nested inside of a LoopLikeOpInterface
  2) The block is nested inside another block which is in a loop
  3) There is a cycle in the control flow graph

This will be useful for Flang's stack arrays pass, which moves array
allocations from the heap to the stack. Special handling is needed when
allocations occur inside of loops to ensure additional stack space is
not allocated on each loop iteration.
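
A tiny usage sketch, assuming the helper lands as a static member of
LoopLikeOpInterface (the exact location and name are assumptions):

```
// Sketch: decide whether an allocation in `block` may execute repeatedly.
#include "mlir/IR/Block.h"
#include "mlir/Interfaces/LoopLikeInterface.h"

bool mayExecuteRepeatedly(mlir::Block *block) {
  // Covers all three cases above: an enclosing LoopLikeOpInterface, nesting
  // inside a block that is itself in a loop, or a CFG cycle.
  return mlir::LoopLikeOpInterface::blockIsInLoop(block);
}
```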

Differential Revision: https://reviews.llvm.org/D141401
2023-02-10 16:14:17 +00:00
Kiran Chandramohan
bacf1aa3c0 Revert "[mlir] Add function for checking if a block is inside a loop"
Reverting since the shared library builds are failing.

This reverts commit dcee187522.
2023-02-09 18:36:28 +00:00
Tom Eccles
dcee187522 [mlir] Add function for checking if a block is inside a loop
This function returns whether a block is nested inside of a loop. There
can be three kinds of loop:
  1) The block is nested inside of a LoopLikeOpInterface
  2) The block is nested inside another block which is in a loop
  3) There is a cycle in the control flow graph

This will be useful for Flang's stack arrays pass, which moves array
allocations from the heap to the stack. Special handling is needed when
allocations occur inside of loops to ensure additional stack space is
not allocated on each loop iteration.

Differential Revision: https://reviews.llvm.org/D141401
2023-02-09 15:18:54 +00:00
Ingo Müller
b716bf84ea [mlir][scf] Fix builder of WhileOp with region builder arguments.
The overload of WhileOp::build with arguments for builder functions for
the regions of the op was broken: it did not correctly compute the types
(and locations) of the region arguments, which led to failed assertions
when the result types were different from the operand types.
Specifically, it used the result types (and operand locations) for *both*
regions, instead of the operand types (and locations) for the 'before'
region and the result types (and locations) for the 'after' region.
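
To make the fixed behavior concrete, here is a hedged sketch that builds an
scf.while whose result type (f32) differs from its operand type (i32) using
the region-builder overload; the arith/scf builder signatures used below are
the commonly seen ones and are otherwise assumptions:

```
// Sketch: iterate an i32 counter while it is < 100, yielding it as f32.
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/IR/Builders.h"

mlir::scf::WhileOp buildWhile(mlir::OpBuilder &b, mlir::Location loc,
                              mlir::Value init /* i32 */) {
  mlir::Type f32Ty = b.getF32Type();
  return b.create<mlir::scf::WhileOp>(
      loc, mlir::TypeRange{f32Ty}, mlir::ValueRange{init},
      // Before region: receives the *operand* types and locations (i32).
      [&](mlir::OpBuilder &nb, mlir::Location nl, mlir::ValueRange args) {
        mlir::Value limit = nb.create<mlir::arith::ConstantIntOp>(nl, 100, 32);
        mlir::Value cond = nb.create<mlir::arith::CmpIOp>(
            nl, mlir::arith::CmpIPredicate::slt, args[0], limit);
        mlir::Value asF32 =
            nb.create<mlir::arith::SIToFPOp>(nl, f32Ty, args[0]);
        nb.create<mlir::scf::ConditionOp>(nl, cond, mlir::ValueRange{asF32});
      },
      // After region: receives the *result* types and locations (f32).
      [&](mlir::OpBuilder &nb, mlir::Location nl, mlir::ValueRange args) {
        mlir::Value asI32 = nb.create<mlir::arith::FPToSIOp>(
            nl, nb.getI32Type(), args[0]);
        mlir::Value one = nb.create<mlir::arith::ConstantIntOp>(nl, 1, 32);
        mlir::Value next = nb.create<mlir::arith::AddIOp>(nl, asI32, one);
        nb.create<mlir::scf::YieldOp>(nl, mlir::ValueRange{next});
      });
}
```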

Reviewed By: Mogball, mehdi_amini

Differential Revision: https://reviews.llvm.org/D142952
2023-02-07 13:40:54 +00:00
Matthias Springer
325b58d59f [mlir][cf] Print message in cf.assert to LLVM lowering
The assert message was previously ignored. The lowered IR now calls `puts` on it in case of a failed assertion.

Differential Revision: https://reviews.llvm.org/D138647
2022-12-15 17:45:34 +01:00
Matthias Kramm
4e98d611ef [mlir] Implement backward dataflow.
This enables interprocedural liveness analysis, very busy expression
analysis, etc.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D138935
2022-12-13 18:35:27 +01:00
Hanhan Wang
0f297cad4d [mlir][tensor][linalg] Introduce DataLayoutPropagation pass.
It introduces a pattern that swaps `linalg.generic + tensor.pack` to
`tensor.pack + linalg.generic`. It requires all iteration types to be
parallel and the indexing map of the output operand to be the identity.
These restrictions can be relaxed in the future.

The user can decide whether the propagation should be applied or not by
passing a control function.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D138882
2022-12-06 15:00:07 -08:00
Matthias Springer
c1fef4e88a [mlir][bufferization] Make TensorCopyInsertionPass a test pass
TensorCopyInsertion should not have been exposed as a pass. This was a flaw in the original design. It is a preparation step for bufferization and certain transforms (that would otherwise be legal) are illegal between TensorCopyInsertion and actual rewrite to MemRef ops. Therefore, even if broken down as two separate steps internally, they should be exposed as a single pass.

This change affects the sparse compiler, which uses `TensorCopyInsertionPass`. A new `SparsificationAndBufferizationPass` is added to replace all passes in the sparse tensor pipeline from `TensorCopyInsertionPass` until the actual bufferization (rewrite to memref/non-tensor). It is generally unsafe to run arbitrary passes in-between, in particular passes that hoist tensor ops out of loops or change SSA use-def chains along tensor ops.

Differential Revision: https://reviews.llvm.org/D138915
2022-12-02 15:38:02 +01:00
Nicolas Vasilache
6e92d3fead [mlir][Test] Add a test pass to act as a sink towards LLVM conversion
This allows writing simple e2e tests where we can check for the proper materialization
of specific LLVM IR (e.g. `llvm.intr.fmuladd`).

Differential Revision: https://reviews.llvm.org/D138776
2022-11-28 00:59:55 -08:00
River Riddle
8c66344ee9 [mlir:PDL] Add support for DialectConversion with pattern configurations
Up until now PDL(L) has not supported dialect conversion because we had no
way of remapping values or integrating with type conversions. This commit
rectifies that by adding a new "pattern configuration" concept to PDL. This
essentially allows for attaching external configurations to patterns, which
can hook into pattern events (for now just the scope of a rewrite, but we
could also pass configs to native rewrites as well). This allows for injecting
the type converter into the conversion pattern rewriter.

Differential Revision: https://reviews.llvm.org/D133142
2022-11-08 01:57:57 -08:00
Nicolas Vasilache
44cfea0279 [mlir][Linalg] Retire LinalgStrategyTilePass and filter-based pattern.
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785

Uses of `LinalgTilingPattern::returningMatchAndRewrite` are replaced by a top-level `tileWithLinalgTilingOptions` function that is marked obsolete and serves
as a temporary means to transition away from `LinalgTilingOptions`-based tiling.
LinalgTilingOptions supports too many options that have been orthogonalized with the use of the transform dialect.

Additionally, the revision introduces a `transform.structured.tile_to_scf_for` structured transform operation that is needed to properly tile `tensor.pad`
via the TilingInterface. Uses of `transform.structured.tile` will be deprecated and replaced by this new op.
This will achieve the deprecation of `linalg::tileLinalgOp`.
Context: https://discourse.llvm.org/t/psa-retire-tileandfuselinalgops-method/63850

In the process of transitioning, tests that were performing tile and distribute on tensors are retired: transformations should be orthogonalized better in the future.
In particular, tiling to specific loop types and tileAndDistribute behavior are not available via the transform ops.
The behavior is still available as part of the `tileWithLinalgTilingOptions` method to allow downstream clients to transition without breakages but is meant to be retired soon.

As more tests are ported to the transform dialect, it became necessary to introduce a test-transform-dialect-erase-schedule-pass to discard the transform specification
once applied so that e2e lowering and execution is possible.

Lastly, a number of redundant tests that were testing composition of patterns are retired as they are available with a better mechanism via the transform dialect.

Differential Revision: https://reviews.llvm.org/D135573
2022-10-11 02:42:56 -07:00
Yuanqiang Liu
9f77909a5e [mlir][shape] add outline-shape-computation pass
Add the outline-shape-computation pass. This pass outlines the
shape computation part in high-level IR by adding shape.func ops and
populates the corresponding mapping information into ShapeMappingAnalysis.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D131810
2022-10-02 20:24:49 -07:00
Jakub Kuderski
abc362a107 [mlir][arith] Change dialect name from Arithmetic to Arith
Suggested by @lattner in https://discourse.llvm.org/t/rfc-define-precise-arith-semantics/65507/22.

Tested with:
`ninja check-mlir check-mlir-integration check-mlir-mlir-spirv-cpu-runner check-mlir-mlir-vulkan-runner check-mlir-examples`

and `bazel build --config=generic_clang @llvm-project//mlir:all`.

Reviewed By: lattner, Mogball, rriddle, jpienaar, mehdi_amini

Differential Revision: https://reviews.llvm.org/D134762
2022-09-29 11:23:28 -04:00
Jakub Kuderski
242d558658 [mlir][arith] Add test pass for wide integer emulation
The new test pass allows for running wide integer emulation conversion
within specified functions only.

I intend to use it in integration tests in a way that allows me to print both
original and emulated results in the same format, or even compare both results
at runtime and print on mismatch only.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D134120
2022-09-20 11:22:28 -04:00
Mathieu Fehr
ba8424a251 [mlir] Add Dynamic Dialects
Dynamic dialects are dialects that can be defined at runtime and are
extensible with new operations, types, and attributes at runtime.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D125201
2022-09-19 09:58:18 -07:00
Matthias Springer
31fbdab376 [mlir][transforms] Add topological sort analysis
This change adds a helper function for computing a topological sorting of a list of ops. E.g. this can be useful in transforms where a subset of ops should be cloned without dominance errors.

The analysis reuses the existing implementation in TopologicalSortUtils.cpp.
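
A hedged usage sketch; the helper's name (computeTopologicalSorting), its
header location, and the in-place signature are assumptions based on this
description:

```
// Sketch: sort a set of ops so producers precede consumers, then clone them
// without introducing dominance errors.
#include "mlir/IR/Builders.h"
#include "mlir/Transforms/TopologicalSortUtils.h"
#include "llvm/ADT/SmallVector.h"

void cloneSorted(mlir::OpBuilder &builder,
                 llvm::SmallVector<mlir::Operation *> toClone) {
  // Assumed to sort `toClone` in place.
  (void)mlir::computeTopologicalSorting(toClone);
  for (mlir::Operation *op : toClone)
    builder.clone(*op);
}
```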

Differential Revision: https://reviews.llvm.org/D131669
2022-08-15 21:09:18 +02:00
Manish Gupta
14d79afeae [mlir][NVGPU] nvgpu.mmasync on F32 through TF32
Adds an optional attribute to support tensor cores on the F32 datatype by lowering to `mma.sync` with TF32 operands. Since TF32 is not a native datatype in LLVM, we are adding `tf32Enabled` as an attribute to make the IR aware of the `MmaSyncOp` datatype. Additionally, this patch adds placeholders for an nvgpu-to-nvgpu transformation targeting higher-precision tf32x3.

For mma.sync on f32 input using tensor cores there are two possibilities:
(a) tf32   (1 `mma.sync` per warp-level matrix-multiply-accumulate)
(b) tf32x3 (3 `mma.sync` per warp-level matrix-multiply-accumulate)

Typically, tf32 tensor core acceleration comes at a cost of accuracy from missing precision bits. While f32 has 23 precision bits, tf32 has only 10 precision bits. tf32x3 aims to recover the precision bits by splitting each operand into two tf32 values and issuing three `mma.sync` tensor core operations.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D130294
2022-08-01 23:23:27 +00:00
srishti-cb
b508c5649f [MLIR] Add a utility to sort the operands of commutative ops
Added a commutativity utility pattern and a function to populate it. The pattern sorts the operands of an op in ascending order of the "key" associated with each operand iff the op is commutative. This sorting is stable.

The function is intended to be used inside passes to simplify the matching of commutative operations. After the application of the above-mentioned pattern, since the commutative operands now have a deterministic order in which they occur in an op, the matching of large DAGs becomes much simpler, i.e., it requires far fewer checks to be written by the user in their pattern-matching function.
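
A hedged sketch of such in-pass use; the populate function name and header
are assumptions based on this description:

```
// Sketch: sort commutative operands before running a matcher.
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/CommutativityUtils.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

void canonicalizeCommutativeOperandOrder(mlir::Operation *root) {
  mlir::RewritePatternSet patterns(root->getContext());
  mlir::populateCommutativityUtilsPatterns(patterns);
  // The sort is stable and deterministic, so matching code downstream can
  // rely on the resulting operand order.
  (void)mlir::applyPatternsAndFoldGreedily(root, std::move(patterns));
}
```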

The "key" associated with an operand is the list of the "AncestorKeys" associated with the ancestors of this operand, in a breadth-first order.

The operand of any op is produced by a set of ops and block arguments. Each of these ops and block arguments is called an "ancestor" of this operand.

Now, the "AncestorKey" associated with:
1. A block argument is `{type: BLOCK_ARGUMENT, opName: ""}`.
2. A non-constant-like op, for example, `arith.addi`, is `{type: NON_CONSTANT_OP, opName: "arith.addi"}`.
3. A constant-like op, for example, `arith.constant`, is `{type: CONSTANT_OP, opName: "arith.constant"}`.

So, if an operand, say `A`, was produced as follows:

```
`<block argument>`  `<block argument>`
             \          /
              \        /
              `arith.subi`           `arith.constant`
                         \            /
                         `arith.addi`
                                |
                           returns `A`
```

Then, the block arguments and operations present in the backward slice of `A`, in the breadth-first order are:
`arith.addi`, `arith.subi`, `arith.constant`, `<block argument>`, and `<block argument>`.

Thus, the "key" associated with operand `A` is:
```
{
 {type: NON_CONSTANT_OP, opName: "arith.addi"},
 {type: NON_CONSTANT_OP, opName: "arith.subi"},
 {type: CONSTANT_OP, opName: "arith.constant"},
 {type: BLOCK_ARGUMENT, opName: ""},
 {type: BLOCK_ARGUMENT, opName: ""}
}
```

Now, if "keyA" is the key associated with operand `A` and "keyB" is the key associated with operand `B`, then:
"keyA" < "keyB" iff:
1. In the first unequal pair of corresponding AncestorKeys, the AncestorKey in operand `A` is smaller, or,
2. Both the AncestorKeys in every pair are the same and the size of operand `A`'s "key" is smaller.

AncestorKeys of type `BLOCK_ARGUMENT` are considered the smallest, those of type `CONSTANT_OP`, the largest, and `NON_CONSTANT_OP` types come in between. Within the types `NON_CONSTANT_OP` and `CONSTANT_OP`, the smaller ones are the ones with smaller op names (lexicographically).

---

Some examples of such a sorting:

Assume that the sorting is being applied to `foo.commutative`, which is a commutative op.

Example 1:

> %1 = foo.const 0
> %2 = foo.mul <block argument>, <block argument>
> %3 = foo.commutative %1, %2

Here,
1. The key associated with %1 is:
```
    {
     {CONSTANT_OP, "foo.const"}
    }
```
2. The key associated with %2 is:
```
    {
     {NON_CONSTANT_OP, "foo.mul"},
     {BLOCK_ARGUMENT, ""},
     {BLOCK_ARGUMENT, ""}
    }
```

The key of %2 < the key of %1
Thus, the sorted `foo.commutative` is:
> %3 = foo.commutative %2, %1

Example 2:

> %1 = foo.const 0
> %2 = foo.mul <block argument>, <block argument>
> %3 = foo.mul %2, %1
> %4 = foo.add %2, %1
> %5 = foo.commutative %1, %2, %3, %4

Here,
1. The key associated with %1 is:
```
    {
     {CONSTANT_OP, "foo.const"}
    }
```
2. The key associated with %2 is:
```
    {
     {NON_CONSTANT_OP, "foo.mul"},
     {BLOCK_ARGUMENT, ""},
     {BLOCK_ARGUMENT, ""}
    }
```
3. The key associated with %3 is:
```
    {
     {NON_CONSTANT_OP, "foo.mul"},
     {NON_CONSTANT_OP, "foo.mul"},
     {CONSTANT_OP, "foo.const"},
     {BLOCK_ARGUMENT, ""},
     {BLOCK_ARGUMENT, ""}
    }
```
4. The key associated with %4 is:
```
    {
     {NON_CONSTANT_OP, "foo.add"},
     {NON_CONSTANT_OP, "foo.mul"},
     {CONSTANT_OP, "foo.const"},
     {BLOCK_ARGUMENT, ""},
     {BLOCK_ARGUMENT, ""}
    }
```

Thus, the sorted `foo.commutative` is:
> %5 = foo.commutative %4, %3, %2, %1

Signed-off-by: Srishti Srivastava <srishti.srivastava@polymagelabs.com>

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D124750
2022-07-30 19:25:18 -04:00
Mahesh Ravishankar
485190df95 [mlir][Linalg] Deprecate tileAndFuseLinalgOps method and associated patterns.
The `tileAndFuseLinalgOps` method is a legacy approach for tiling + fusion of
Linalg operations. Since it was also intended to work on operations
with buffer operands, this method had fairly complex logic to make
sure tile and fuse was correct even with side-effecting linalg ops.
While complex, it still wasn't robust enough. This patch deprecates
this method, thereby deprecating the tiling + fusion method for ops
with buffer semantics. Note that the core transformation to do fusion
of a producer with a tiled consumer still exists. The deprecation here
only removes methods that auto-magically tried to tile and fuse
correctly in the presence of side effects.

The `tileAndFuseLinalgOps` method also works with operations with tensor
semantics. There are at least two other ways in which the same
functionality exists:
1) The `tileConsumerAndFuseProducers` method. This does a similar
   transformation, but using a slightly different logic to
   automatically figure out the legal tile + fuse code. Note that this
   is also to be deprecated soon.
2) The preferred way uses the `TilingInterface` for tile + fuse, and
   relies on the caller to set the tiling options correctly to ensure
   that the generated code is correct.
As proof that (2) is equivalent to the functionality provided by
`tileAndFuseLinalgOps`, relevant tests have been moved to use the
interface, where the test driver sets the tile sizes appropriately to
generate the expected code.

Differential Revision: https://reviews.llvm.org/D129901
2022-07-21 05:05:06 +00:00
Mahesh Ravishankar
3139cc766c [mlir][Linalg] Add a pattern to decompose linalg.generic ops.
This patch adds a pattern to decompose `linalg.generic` operations
that
- has only parallel iterator types
- has more than 2 statements (including the yield)

into multiple `linalg.generic` operations such that each operation has
a single statement and a yield.
The pattern added here just splits the matching `linalg.generic` into
two `linalg.generic`s, one containing the first statement, and the
other containing the remaining statements. The same pattern can be applied
repeatedly on the second op to ultimately fully decompose the generic
op.

Differential Revision: https://reviews.llvm.org/D129704
2022-07-15 23:01:18 +00:00
Nicolas Vasilache
cd6e02eebc [mlir][Linalg] Retire TestLinalgCodegenStrategy pass.
This pass tests patterns that are already tested elsewhere by applying them in a semi-targeted
fashion using anchor function and op names.

From now on, targeted tests should use the transform dialect interpreter.

Differential Revision: https://reviews.llvm.org/D129627
2022-07-13 04:20:42 -07:00