Commit Graph

256 Commits

Author SHA1 Message Date
Martin Erhart
6bf043e743 [mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute (#66619)
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
2023-09-18 16:44:48 +02:00
Matthias Springer
0f952cfe24 [mlir][IR] Change MutableOperandRange::operator[] to return an OpOperand & (#66515)
`operator[]` returns `OpOperand &` instead of `Value`.

* This allows users to get OpOperands by name instead of "magic" number.
E.g., `extractSliceOp->getOpOperand(0)` can be written as
`extractSliceOp.getSourceMutable()[0]`.
* `OperandRange` provides a read-only API to operands: `operator[]`
returns `Value`. `MutableOperandRange` now provides a mutable API:
`operator[]` returns `OpOperand &`, which can be used to set operands.

Note: The TableGen code generator could be changed to return `OpOperand
&` (instead of `MutableOperandRange`) for non-variadic and non-optional
arguments in a subsequent change. Then the `[0]` part in the above
example would no longer be necessary.
2023-09-18 09:43:03 +02:00
Matthias Springer
5cf714bb2f [mlir][SCF] scf.for: Consistent API around initArgs (#66512)
* Always use the auto-generated `getInitArgs` function. Remove the
hand-written `getInitOperands` duplicate.
* Remove `hasIterOperands` and `getNumIterOperands`. The names were
inconsistent because the "arg" is called `initArgs` in TableGen. Use
`getInitArgs().size()` instead.
* Fix verification around ops with no results.
2023-09-18 09:13:43 +02:00
Christopher Bate
e2d39f799b [mlir][Transform] Add updateConversionTarget to ConversionPatternDescriptorOpInterface
This change adds a method to modify the ConversionTarget used during
`transform.apply_conversion_patterns` to the
`ConversionPatternDescriptorOpInterface`. This is needed when the TypeConverter
is used to dictate the dynamic legality of operations, as in "structural"
conversion patterns present in, for example, the SCF and func dialects.

As a first use case/test, this change also adds a
`transform.apply_patterns.scf.structural_conversions` operation to the SCF
dialect.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D158672
2023-09-14 11:39:47 -06:00
Martin Erhart
66aa9a2517 [mlir][bufferization] Implement BufferDeallocationopInterface for scf.forall.in_parallel (#66351)
The scf.forall.in_parallel terminator operation has a nested graph region with the NoTerminator trait. Such regions are not supported by the default implementations. Therefore, this commit adds a specialized implementation for
this operation which only covers the case where the nested region is empty.
This is because after bufferization, ops like tensor.parallel_insert_slice were already converted to memref operations residing int the scf.forall only and the nested region of scf.forall.in_parallel ends up empty.
2023-09-14 16:20:24 +02:00
Martin Erhart
c199f7dc62 Revert "[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute"
This reverts commit 6a91dfedeb.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:48 +00:00
Martin Erhart
ccb16acd46 Revert "[mlir][bufferization] Implement BufferDeallocationopInterface for scf.forall.in_parallel"
This reverts commit 1356e853d4.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:47 +00:00
Martin Erhart
1356e853d4 [mlir][bufferization] Implement BufferDeallocationopInterface for scf.forall.in_parallel
The scf.forall.in_parallel terminator operation has a nested graph region with
the NoTerminator trait. Such regions are not supported by the default
implementations. Therefore, this commit adds a specialized implementation for
this operation which only covers the case where the nested region is empty.
This is because after bufferization, ops like tensor.parallel_insert_slice were
already converted to memref operations residing int the scf.forall only and the
nested region of scf.forall.in_parallel ends up empty.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D158979
2023-09-13 09:30:24 +00:00
Martin Erhart
6a91dfedeb [mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases
such as returning memrefs which this series of commits aims to add
support for as well.

This first commit removes the deallocation capabilities of
one-shot-bufferization.One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.

The documentation should w.r.t. these pass option changes should also be
updated in this commit.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D156662
2023-09-13 09:30:22 +00:00
Matthias Springer
1e1a3112f1 [mlir][bufferization] Privatize buffers for parallel regions
One-Shot Bufferize correctly handles RaW conflicts around repetitive regions (loops). Specical handling is needed for parallel regions. These are a special kind of repetitive regions that can have additional RaW conflicts that would not be present if the regions would be executed sequentially.

Example:
```
%0 = bufferization.alloc_tensor()
scf.forall ... {
  %1 = linalg.fill ins(...) outs(%0)
  ...
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %1 into ...
  }
}
```

A separate (private) buffer must be allocated for each iteration of the `scf.forall` loop.

This change adds a new interface method to `BufferizableOpInterface` to detect parallel regions. By default, regions are assumed to be sequential.

A buffer is privatized if an OpOperand bufferizes to a memory read inside a parallel region that is different from the parallel region where operand's value is defined.

Differential Revision: https://reviews.llvm.org/D159286
2023-09-06 14:28:43 +02:00
Matthias Springer
6ecebb496c [mlir][bufferization] Support unstructured control flow
This revision adds support for unstructured control flow to the bufferization infrastructure. In particular: regions with multiple blocks, `cf.br`, `cf.cond_br`.

Two helper templates are added to `BufferizableOpInterface.h`, which can be implemented by ops that supported unstructured control flow in their regions (e.g., `func.func`) and ops that branch to another block (e.g., `cf.br`).

A block signature is always bufferized together with the op that owns the block.

Differential Revision: https://reviews.llvm.org/D158094
2023-08-31 12:55:53 +02:00
Groverkss
2cc5f5d43c [mlir][Linalg] Implement tileReductionUsingScf for multiple reductions
This patch improves the reduction tiling for linalg to support multiple
reduction dimensions.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D158005
2023-08-17 02:17:03 +05:30
Matthias Springer
878950b82c [mlir][bufferization] Simplify getBufferType
`getBufferType` computes the bufferized type of an SSA value without bufferizing any IR. This is useful for predicting the bufferized type of iter_args of a loop.

To avoid endless recursion (e.g., in the case of "scf.for", the type of the iter_arg depends on the type of init_arg and the type of the yielded value; the type of the yielded value depends on the type of the iter_arg again), `fixedTypes` was used to fall back to "fixed" type. A simpler way is to maintain an "invocation stack". `getBufferType` implementations can then inspect the invocation stack to detect repetitive computations (typically when computing the bufferized type of a block argument).

Also improve error messages in case of inconsistent memory spaces inside of a loop.

Differential Revision: https://reviews.llvm.org/D158060
2023-08-16 15:02:07 +02:00
Matthias Springer
a02ad6c177 [mlir][bufferization] Generalize getAliasingOpResults to getAliasingValues
This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops.

This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block.

If an op does not implement the `BufferizableOpInterface`, the analysis in conservative. It previously assumed that an OpOperand may alias with each OpResult. It now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block.

Differential Revision: https://reviews.llvm.org/D157957
2023-08-15 15:02:47 +02:00
Matthias Springer
7c74a2507c [mlir][SCF][NFC] Add helper functions to get body of scf.while
Add two new helper functions `getBeforeBody` and `getAfterBody` to be consistent with "scf.for" (`getBody`) and to show in the API that both regions have exactly one block. Also simplify some code that assumed that there can be more than one block in a region.

Differential Revision: https://reviews.llvm.org/D157860
2023-08-14 14:57:09 +02:00
Alex Zinenko
4a6b31b8d8 [mlir] NFC: untangle SCF Patterns.h and Transforms.h
These two headers both contained a strange mix of definitions related to
both patterns and non-pattern transforms. Put patterns and "populate"
functions into Patterns.h and standalone transforms into Transforms.h.

Depends On: D155223

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155454
2023-07-18 11:27:36 +00:00
Alex Zinenko
371366ce27 [mlir][nvgpu] add simple pipelining for shared memory copies
Add a simple transform operation to the NVGPU extension that performs
software pipelining of copies to shared memory. The functionality is
extremely minimalistic in this version and only supports copies from
global to shared memory inside an `scf.for` loop with either
`vector.transfer` or `nvgpu.device_async_copy` operations when
pipelining preconditions are already satisfied in the IR. This is the
minimally useful version that uses the more general loop pipeliner in an
NVGPU-specific way. Further extensions and orthogonalizations will be
necessary.

This required a change to the loop pipeliner itself to properly
propagate errors should the predicate generator fail.

This is loosely inspired from the vesion in IREE, but has less unsafe
assumptions and more principled way of communicating decisions.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155223
2023-07-17 14:29:12 +00:00
Spenser Bauman
272cf8f7b2 [mlir] Implement one-to-n structural conversion for ForOp
Add the missing one-to-n structural type conversion pattern for the
scf.for operation.

Reviewed By: ingomueller-net

Differential Revision: https://reviews.llvm.org/D154299
2023-07-07 15:50:54 +00:00
Ingo Müller
c000b403fc [mlir] Avoid unnecessary copies in SCF's OneToNTypeConversions. (NFC)
In two places, a ResultRange was copied into a SmallVector just to be
passed as a ValueRange argument. With this patch, the ResultRanges are
passed directly, avoiding a copy.

Reviewed By: ingomueller-net

Differential Revision: https://reviews.llvm.org/D154685
2023-07-07 09:15:30 +00:00
Matthias Springer
6596b0dde8 [mlir][tensor] Clean up tensor::DimOp usage
* Remove duplicate functions. `tensor::getMixedSize` and `tensor::getMixedSizes` should be used.
* Use `tensor::getMixedSize` instead of `createOrFold<tensor::DimOp>`. This is more efficient. `createOrFold` will create an op an immediately try to fold it. In case of a static dimension size, an attribute can be used directly.

Differential Revision: https://reviews.llvm.org/D153332
2023-06-22 10:56:17 +02:00
Nicolas Vasilache
c7592c7714 [mlir][scf] NFC - Add debug information to scf pipelining 2023-05-30 01:02:51 -07:00
Tres Popp
68f58812e3 [mlir] Move casting calls from methods to function calls
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.

Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
  for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.

Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
   additional check:
   main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
   and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
   them to a pure state.

```
ninja -C $BUILD_DIR clang-tidy

run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
               -header-filter=mlir/ mlir/* -fix

rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```

Differential Revision: https://reviews.llvm.org/D151542
2023-05-26 10:29:55 +02:00
Matthias Springer
ae8cb64372 [mlir][scf][bufferize] Fix bug in WhileOp analysis verification
Block arguments and yielded values are not equivalent if there are not enough block arguments. This fixes #59442.

Differential Revision: https://reviews.llvm.org/D145575
2023-05-15 15:42:56 +02:00
Tres Popp
5550c82189 [mlir] Move casting calls from methods to function calls
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.

Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.

Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
  for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.

Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
   additional check:
   https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
   and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
   them to a pure state.
4. Some changes have been deleted for the following reasons:
   - Some files had a variable also named cast
   - Some files had not included a header file that defines the cast
     functions
   - Some files are definitions of the classes that have the casting
     methods, so the code still refers to the method instead of the
     function without adding a prefix or removing the method declaration
     at the same time.

```
ninja -C $BUILD_DIR clang-tidy

run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
               -header-filter=mlir/ mlir/* -fix

rm -rf $BUILD_DIR/tools/mlir/**/*.inc

git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
            mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
            mlir/lib/**/IR/\
            mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
            mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
            mlir/test/lib/Dialect/Test/TestTypes.cpp\
            mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
            mlir/test/lib/Dialect/Test/TestAttributes.cpp\
            mlir/unittests/TableGen/EnumsGenTest.cpp\
            mlir/test/python/lib/PythonTestCAPI.cpp\
            mlir/include/mlir/IR/
```

Differential Revision: https://reviews.llvm.org/D150123
2023-05-12 11:21:25 +02:00
Matthias Springer
4c48f016ef [mlir][Affine][NFC] Wrap dialect in "affine" namespace
This cleanup aligns the affine dialect with all the other dialects.

Differential Revision: https://reviews.llvm.org/D148687
2023-04-20 11:19:21 +09:00
Tres Popp
981932bc57 [MLIR] Clarify (test-scf-)parallel-loop-collapsing
1. parallel-loop-collapsing is renamed to test-scf-parallel-loop-collapsing.
2. The pass adds various checks to provide error messages instead of
   hitting assert failures.
3. Testing is added to verify these error messages

This is roughly an NFC. The name changes, but all checked behavior
previously would have resulted in an assertion failure. Almost no new
support is added, so this pass is still limited in scope to testing the
transform behaves correctly with input arguments that perfectly match
the ParallelLoop's iterator arg set. The one new piece of functionality
is that invalid operations will now be skipped with an error messages
instead of producing an assertion failure, so the pass can be used with
expected failures for pieces of the IR not cared about with a specific
RUN command.

Differential Revision: https://reviews.llvm.org/D147514
2023-04-05 13:41:15 +02:00
Oleg Shyshkov
f080f1122f [mlir][scf] Create constants for tiling in parent with isolated region.
FuncOp is IsolatedFromAbove, so this change doesn't alter current behaviour, but the current code fails if the tile op is in an op with IsolatedFromAbove trait.

An alternative would be to create constant in the same region where they're used a rely on CSE to figure out where to move them.

Differential Revision: https://reviews.llvm.org/D147273
2023-03-31 18:27:30 +02:00
Ingo Müller
4e563616a5 [mlir] Use GenericAdaptor to simplify 1:N type conversion API.
For 1:N type conversion, there is a 1:N relationship between the
original operands and the converted operands. The same is true for the
results. The previous design passed an instance of a "mapping" class
into each pattern that helped with handling this 1:N correspondance.
However, this was still rather manual and, in particular, it required
the use of magic constants for the indices of the different operands.

This commits uses the generated GenericAdaptor class that is generated
for each op class in order to simplify this relationship further. The
GenericAdaptor allows to wrap around a list of arbitrary types for each
operand (via templating); for 1:N type conversion, this allows the
operand accessors of the adaptor class to return a ValueRange that
corresponds to the N values in the converted types. Patterns can thus
use the named accessors instead of magic constants, which eliminates a
common class of errors.

This commit further simplifies the API that patterns need to implement
by making the operand and result type mappings part of the adaptor.
Since many patterns only need one of the two (or even neither), this
reduces the number of unnecessary arguments in many cases.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D147225
2023-03-31 10:43:09 +00:00
Thomas Raoux
1d448e1f24 [mlir][bufferization] Use rewriter to erase ops in scf.forall bufferization.
Without this bufferization cannot track operations removed during bufferization.
Unfortunately there is currently no way to enforce that ops need to be erased through
the rewriter and this causes sporadic errors when tracking pointers in Bufferization pass.
Therefore there is no easy way to test that the pattern is doing the right thing.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D147095
2023-03-28 23:59:45 +00:00
Ingo Müller
586cebef27 [mlir][scf] Implement structural conversion for 1:N type conversions.
This patch implements patterns for the newly introduced 1:N type
conversion utils for several ops of the SCF dialect. It also adds an
option to the existing test pass as well as test cases that applies the
patterns through the test pass.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D146959
2023-03-28 08:33:00 +00:00
Mahesh Ravishankar
3af1c48c66 Changes to SCFFuseProducerOfSliceResult to also return the operations created during fusion.
This is follow up to https://reviews.llvm.org/D145133 that allows
propogating information about ops that are fused back to the caller.

Reviewed By: hanchung

Differential Revision: https://reviews.llvm.org/D146254
2023-03-20 20:55:48 +00:00
Mahesh Ravishankar
809e3d8c98 [mlir][TilingInterface] Modify TilingInterface methods to better return the state of the transformed IR.
Currently the `getTiledImplementation` and `generateResultTileValue`
return just `SmallVector<Operation *>` and `FailureOr<Value>`.

- For `getTiledImplementation` returning empty implies tiling wasnt
  done. There is also an implicit assumption that the tiled operation
  results correspond to the tiled values of the result of the original
  operation. This cannot handle cases where the tiled implementation
  might use multiple operations to compute the tiled value for the
  results of the untiled operation. Sometimes, the tiled operation
  might not directly give the tiled values, and might require casts,
  etc to get a replacement.
- For `generateResultTileValue`, it is assumed that the op defining
  the returned `Value` is the operation that represents the tiled
  computation. Again presence of casts, etc violate this.

Instead make these methods return
```
struct TilingResult {
  SmallVector<Operation *> tiledOps;
  SmallVector<Value> tiledValues;
};
```

The `tiledOps` represent the operations generated that are relevant
for subsequent transformations. The `tiledValues` represent the tiled
values for the results of the original operation. This better
transmits the state of the transformed IR.

As a consequence the following methods also return `FailureOr<TilingResult>`
- `tensor::replaceExtractSliceWithTiledProducer`
- `tensor::bubbleUpPadSlice`

Differential Revision: https://reviews.llvm.org/D145133
2023-03-16 14:29:03 +00:00
Jakub Kuderski
a0a76804c4 [ADT] Allow llvm::enumerate to enumerate over multiple ranges
This does not work by a mere composition of `enumerate` and `zip_equal`,
because C++17 does not allow for recursive expansion of structured
bindings.

This implementation uses `zippy` to manage the iteratees and adds the
stream of indices as the first zipped range. Because we have an upfront
assertion that all input ranges are of the same length, we only need to
check if the second range has ended during iteration.

As a consequence of using `zippy`, `enumerate` will now follow the
reference and lifetime semantics of the `zip*` family of functions. The
main difference is that `enumerate` exposes each tuple of references
through a new tuple-like type `enumerate_result`, with the familiar
`.index()` and `.value()` member functions.

Because the `enumerate_result` returned on dereference is a
temporary, enumeration result can no longer be used through an
lvalue ref.

Reviewed By: dblaikie, zero9178

Differential Revision: https://reviews.llvm.org/D144503
2023-03-15 19:34:22 -04:00
Kazu Hirata
ce14f7b18f [mlir] Use Use *{Set,Map}::contains (NFC) 2023-03-14 21:48:49 -07:00
Nicolas Vasilache
1cff4cbda3 [mlir][Transform] NFC - Various API cleanups and use RewriterBase in lieu of PatternRewriter
Depends on: D145685

Differential Revision: https://reviews.llvm.org/D145977
2023-03-14 04:23:12 -07:00
Thomas Raoux
117db47d02 [mlir][scf] Fix bug in software pipeliner and simplify the logic
Fix bug when pipelining while interleaving stages. Re-do the logic to
only consider cloned operands when updating the use-def chain.

Differential Revision: https://reviews.llvm.org/D145598
2023-03-08 20:06:07 +00:00
Nicolas Vasilache
bb2ae98581 [mlir][Linalg] NFC - Apply cleanups to transforms
Depends on: D144656

Differential Revision: https://reviews.llvm.org/D144717
2023-02-28 03:25:01 -08:00
Matthias Springer
61f3775804 [mlir][SCF] Fix incorrect API usage in RewritePatterns
Incorrect API usage was detected by D144552.

Differential Revision: https://reviews.llvm.org/D144636
2023-02-27 09:36:14 +01:00
Matthias Springer
93d640f392 [mlir][SCF][Utils][NFC] Make some utils public for better reuse
These functions will be used in a subsequent change. Also some minor refactoring.

Differential Revision: https://reviews.llvm.org/D143909
2023-02-22 10:35:48 +01:00
Alexander Belyaev
eb2f946e78 [mlir][scf] Rename ForeachThreadOp->ForallOp, PerformConcurrentlyOp->InParallelOp.
Differential Revision: https://reviews.llvm.org/D144242
2023-02-17 09:59:39 +01:00
Alexander Belyaev
310deca248 [mlir] Add loop bounds to scf.foreach_thread.
https://discourse.llvm.org/t/rfc-parallel-loops-on-tensors-in-mlir/68332

Differential Revision: https://reviews.llvm.org/D144072
2023-02-17 08:57:52 +01:00
Matthias Springer
9fa6b3504b [mlir][bufferization] Improve aliasing OpOperand/OpResult property
`getAliasingOpOperands`/`getAliasingOpResults` now encodes OpOperand/OpResult, buffer relation and a degree of certainty. E.g.:
```
// aliasingOpOperands(%r) = {(%t, EQUIV, DEFINITE)}
// aliasingOpResults(%t) = {(%r, EQUIV, DEFINITE)}
%r = tensor.insert %f into %t[%idx] : tensor<?xf32>

// aliasingOpOperands(%r) = {(%t0, EQUIV, MAYBE), (%t1, EQUIV, MAYBE)}
// aliasingOpResults(%t0) = {(%r, EQUIV, MAYBE)}
// aliasingOpResults(%t1) = {(%r, EQUIV, MAYBE)}
%r = arith.select %c, %t0, %t1 : tensor<?xf32>
```

`BufferizableOpInterface::bufferRelation` is removed, as it is now part of `getAliasingOpOperands`/`getAliasingOpResults`.

This change allows for better analysis, in particular wrt. equivalence. This allows additional optimizations and better error checking (which is sometimes overly conservative). Examples:

* EmptyTensorElimination can eliminate `tensor.empty` inside `scf.if` blocks. This requires a modeling of equivalence: It is not a per-OpResult property anymore. Instead, it can be specified for each OpOperand and OpResult. This is important because `tensor.empty` may be eliminated only if all values on the SSA use-def chain to the final consumer (`tensor.insert_slice`) are equivalent.
* The detection of "returning allocs from a block" can be improved. (Addresses a TODO in `assertNoAllocsReturned`.) This allows us to bufferize IR such as "yielding a `tensor.extract_slice` result from an `scf.if` branch", which currently fails to bufferize because the alloc detection is too conservative.
* Better bufferization of loops. Aliases of the iter_arg can be yielded (even if they are not equivalent) without having to realloc and copy the entire buffer on each iteration.

The above-mentioned examples are not yet implemented with this change. This change just improves the BufferizableOpInterface, its implementations and related helper functions, so that better aliasing information is available for each op.

Differential Revision: https://reviews.llvm.org/D142129
2023-02-09 11:35:03 +01:00
Matthias Springer
1ac248e485 [mlir][bufferization][NFC] Rename getAliasingOpOperand/getAliasingOpResult
* `getAliasingOpOperand` => `getAliasingOpOperands`
* `getAliasingOpResult` => `getAliasingOpResults`

Also a few minor code cleanups and better documentation.

Differential Revision: https://reviews.llvm.org/D142979
2023-02-01 10:07:41 +01:00
Matthias Springer
148432ea84 [mlir][bufferization][NFC] Rename BufferRelation::None to BufferRelation::Unknown
The previous name was incorrect. `None` does not mean that there is no buffer relation between two buffers (seems to imply that they do not alias for sure); instead it means that there is no further information available.

Differential Revision: https://reviews.llvm.org/D142870
2023-01-30 11:09:28 +01:00
Matthias Springer
34d65e81e8 [mlir][bufferization] Generalize and rename isMemoryWrite
The name of the method was confusing. It is bufferizesToMemoryWrite, but from the perspective of OpResults.

`bufferizesToMemoryWrite(OpResult)` now supports ops with regions that do not have aliasing OpOperands (such as `scf.if`). These ops no longer need to implement `isMemoryWrite`.

Differential Revision: https://reviews.llvm.org/D141684
2023-01-30 09:34:04 +01:00
Kazu Hirata
5c9013e266 Use std::optional instead of llvm::Optional (NFC) 2023-01-28 00:45:19 -08:00
Ingo Müller
b0e4d389d3 [mlir][scf] Fix typo in comment in BufferizableOpInterfaceImpl.cpp (NFC).
Reviewed By: ingomueller-net

Differential Revision: https://reviews.llvm.org/D142554
2023-01-25 19:06:06 +00:00
Mahesh Ravishankar
dbbd907015 [mlir][TilingInterface] Fix use after free error from D141028.
The `candidateSliceOp` was replaces and used in a subsequent
call. Instead just replace its uses. The op is dead and will be
removed with CSE.

Differential Revision: https://reviews.llvm.org/D141869
2023-01-16 20:59:50 +00:00
Mahesh Ravishankar
9db7d4edd8 [mlir][TilingInterface] Add an option to tile and fuse to yield replacement for the fused producer.
This patch adds an option to the method that fuses a producer with a
tiled consumer, to also yield from the tiled loops a value that can be
used to replace the original producer. This is only valid if it can be
assertained that the slice of the producer computed within each
iteration of the tiled loop nest does not compute slices of the
producer redundantly. The analysis to derive this is very involved. So
this is left to the caller to assertain.  A test is added that mimics
the `scf::tileConsumerAndFuseProducersGreedilyUsingSCFForOp`, but also
yields the values of all fused producers. This can be used as a
reference for how a caller could use this functionality.

Differential Revision: https://reviews.llvm.org/D141028
2023-01-16 18:30:13 +00:00
Mahesh Ravishankar
ce349ff1a4 [mlir][TilingInterface] NFC: Separate out a utility method to perform one step of tile + fuse.
Differential Revision: https://reviews.llvm.org/D141027
2023-01-16 05:03:41 +00:00