Commit Graph

81 Commits

Author SHA1 Message Date
Matthias Springer
98a6edd38f [mlir][Interfaces] LoopLikeOpInterface: Expose tied loop results (#70535)
Expose loop results, which correspond to the region iter_arg values that
are returned from the loop when there are no more iterations. Exposing
loop results is optional because some loops (e.g., `scf.while`) do not
have a 1-to-1 mapping between region iter_args and op results.

Also add additional helper functions to query tied
results/iter_args/inits.
2023-11-01 08:34:14 +09:00
Matthias Springer
3cd2a0bc1a [mlir][Interfaces] LoopLikeOpInterface: Add helpers to query tied inits/iter_args (#70408)
The `LoopLikeOpInterface` already has interface methods to query inits
and iter_args. This commit adds helper functions to query tied
init/iter_arg pairs and removes the corresponding functions for
`scf::ForOp`.
2023-10-28 12:10:36 +09:00
Matthias Springer
ab737a8699 [mlir][Interfaces] LoopLikeOpInterface: Add helper to get yielded values (#67305)
Add a new interface method that returns the yielded values.
    
Also add a verifier that checks the number of inits/iter_args/yielded
values. Most of the checked invariants (but not all of them) are already
covered by the `RegionBranchOpInterface`, but the `LoopLikeOpInterface`
now provides (additional) error messages that are easier to read.
2023-10-16 08:45:48 +09:00
Matthias Springer
173fd67a12 [mlir][scf][bufferize] Improve bufferization of allocs yielded from scf.for (#68089)
The `BufferizableOpInterface` implementation of `scf.for` currently
assumes that an OpResult does not alias with any tensor apart from the
corresponding init OpOperand. Newly allocated buffers (inside of the
loop) are also allowed. The current implementation checks whether the
respective init_arg and yielded value are equivalent. This is overly
strict and causes extra buffer allocations/copies when yielding a new
buffer allocation from a loop.
2023-10-03 16:08:50 +02:00
Matthias Springer
e52899ea52 [mlir][SCF] Bufferize scf.index_switch (#67666)
Add the `BufferizableOpInterface` implementation of `scf.index_switch`.
2023-09-28 19:05:14 +02:00
Matthias Springer
9b5ef2bea8 [mlir][Interfaces] LoopLikeOpInterface: Support ops with multiple regions (#66754)
This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.

`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.

Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false"
for some values that are defined outside of the loop (in a nested op, in
such a way that the value does not dominate the loop). This interface is
currently only used for LICM and there is no way to trigger this bug, so
no test is added.
2023-09-19 17:35:38 +02:00
Matthias Springer
6923a31542 [mlir][IR] Change MutableArrayRange to enumerate OpOperand & (#66622)
In line with #66515, change `MutableArrayRange::begin`/`end` to
enumerate `OpOperand &` instead of `Value`. Also remove
`ForOp::getIterOpOperands`/`setIterArg`, which are now redundant.

Note: `MutableOperandRange` cannot be made a derived class of
`indexed_accessor_range_base` (like `OperandRange`), because
`MutableOperandRange::assign` can change the number of operands in the
range.
2023-09-19 09:09:21 +02:00
Martin Erhart
6bf043e743 [mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute (#66619)
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
2023-09-18 16:44:48 +02:00
Martin Erhart
c199f7dc62 Revert "[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute"
This reverts commit 6a91dfedeb.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:48 +00:00
Martin Erhart
6a91dfedeb [mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases
such as returning memrefs which this series of commits aims to add
support for as well.

This first commit removes the deallocation capabilities of
one-shot-bufferization.One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.

The documentation should w.r.t. these pass option changes should also be
updated in this commit.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D156662
2023-09-13 09:30:22 +00:00
Matthias Springer
1e1a3112f1 [mlir][bufferization] Privatize buffers for parallel regions
One-Shot Bufferize correctly handles RaW conflicts around repetitive regions (loops). Specical handling is needed for parallel regions. These are a special kind of repetitive regions that can have additional RaW conflicts that would not be present if the regions would be executed sequentially.

Example:
```
%0 = bufferization.alloc_tensor()
scf.forall ... {
  %1 = linalg.fill ins(...) outs(%0)
  ...
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %1 into ...
  }
}
```

A separate (private) buffer must be allocated for each iteration of the `scf.forall` loop.

This change adds a new interface method to `BufferizableOpInterface` to detect parallel regions. By default, regions are assumed to be sequential.

A buffer is privatized if an OpOperand bufferizes to a memory read inside a parallel region that is different from the parallel region where operand's value is defined.

Differential Revision: https://reviews.llvm.org/D159286
2023-09-06 14:28:43 +02:00
Matthias Springer
6ecebb496c [mlir][bufferization] Support unstructured control flow
This revision adds support for unstructured control flow to the bufferization infrastructure. In particular: regions with multiple blocks, `cf.br`, `cf.cond_br`.

Two helper templates are added to `BufferizableOpInterface.h`, which can be implemented by ops that supported unstructured control flow in their regions (e.g., `func.func`) and ops that branch to another block (e.g., `cf.br`).

A block signature is always bufferized together with the op that owns the block.

Differential Revision: https://reviews.llvm.org/D158094
2023-08-31 12:55:53 +02:00
Matthias Springer
878950b82c [mlir][bufferization] Simplify getBufferType
`getBufferType` computes the bufferized type of an SSA value without bufferizing any IR. This is useful for predicting the bufferized type of iter_args of a loop.

To avoid endless recursion (e.g., in the case of "scf.for", the type of the iter_arg depends on the type of init_arg and the type of the yielded value; the type of the yielded value depends on the type of the iter_arg again), `fixedTypes` was used to fall back to "fixed" type. A simpler way is to maintain an "invocation stack". `getBufferType` implementations can then inspect the invocation stack to detect repetitive computations (typically when computing the bufferized type of a block argument).

Also improve error messages in case of inconsistent memory spaces inside of a loop.

Differential Revision: https://reviews.llvm.org/D158060
2023-08-16 15:02:07 +02:00
Matthias Springer
a02ad6c177 [mlir][bufferization] Generalize getAliasingOpResults to getAliasingValues
This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops.

This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block.

If an op does not implement the `BufferizableOpInterface`, the analysis in conservative. It previously assumed that an OpOperand may alias with each OpResult. It now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block.

Differential Revision: https://reviews.llvm.org/D157957
2023-08-15 15:02:47 +02:00
Matthias Springer
7c74a2507c [mlir][SCF][NFC] Add helper functions to get body of scf.while
Add two new helper functions `getBeforeBody` and `getAfterBody` to be consistent with "scf.for" (`getBody`) and to show in the API that both regions have exactly one block. Also simplify some code that assumed that there can be more than one block in a region.

Differential Revision: https://reviews.llvm.org/D157860
2023-08-14 14:57:09 +02:00
Tres Popp
68f58812e3 [mlir] Move casting calls from methods to function calls
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.

Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
  for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Implementation:
This patch updates all remaining uses of the deprecated functionality in
mlir/. This was done with clang-tidy as described below and further
modifications to GPUBase.td and OpenMPOpsInterfaces.td.

Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
   additional check:
   main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
   and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
   them to a pure state.

```
ninja -C $BUILD_DIR clang-tidy

run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
               -header-filter=mlir/ mlir/* -fix

rm -rf $BUILD_DIR/tools/mlir/**/*.inc
```

Differential Revision: https://reviews.llvm.org/D151542
2023-05-26 10:29:55 +02:00
Matthias Springer
ae8cb64372 [mlir][scf][bufferize] Fix bug in WhileOp analysis verification
Block arguments and yielded values are not equivalent if there are not enough block arguments. This fixes #59442.

Differential Revision: https://reviews.llvm.org/D145575
2023-05-15 15:42:56 +02:00
Tres Popp
5550c82189 [mlir] Move casting calls from methods to function calls
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.

Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.

Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.

Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
  for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443

Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.

Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
   additional check:
   https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
   and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
   them to a pure state.
4. Some changes have been deleted for the following reasons:
   - Some files had a variable also named cast
   - Some files had not included a header file that defines the cast
     functions
   - Some files are definitions of the classes that have the casting
     methods, so the code still refers to the method instead of the
     function without adding a prefix or removing the method declaration
     at the same time.

```
ninja -C $BUILD_DIR clang-tidy

run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
               -header-filter=mlir/ mlir/* -fix

rm -rf $BUILD_DIR/tools/mlir/**/*.inc

git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
            mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
            mlir/lib/**/IR/\
            mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
            mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
            mlir/test/lib/Dialect/Test/TestTypes.cpp\
            mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
            mlir/test/lib/Dialect/Test/TestAttributes.cpp\
            mlir/unittests/TableGen/EnumsGenTest.cpp\
            mlir/test/python/lib/PythonTestCAPI.cpp\
            mlir/include/mlir/IR/
```

Differential Revision: https://reviews.llvm.org/D150123
2023-05-12 11:21:25 +02:00
Thomas Raoux
1d448e1f24 [mlir][bufferization] Use rewriter to erase ops in scf.forall bufferization.
Without this bufferization cannot track operations removed during bufferization.
Unfortunately there is currently no way to enforce that ops need to be erased through
the rewriter and this causes sporadic errors when tracking pointers in Bufferization pass.
Therefore there is no easy way to test that the pattern is doing the right thing.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D147095
2023-03-28 23:59:45 +00:00
Alexander Belyaev
eb2f946e78 [mlir][scf] Rename ForeachThreadOp->ForallOp, PerformConcurrentlyOp->InParallelOp.
Differential Revision: https://reviews.llvm.org/D144242
2023-02-17 09:59:39 +01:00
Alexander Belyaev
310deca248 [mlir] Add loop bounds to scf.foreach_thread.
https://discourse.llvm.org/t/rfc-parallel-loops-on-tensors-in-mlir/68332

Differential Revision: https://reviews.llvm.org/D144072
2023-02-17 08:57:52 +01:00
Matthias Springer
9fa6b3504b [mlir][bufferization] Improve aliasing OpOperand/OpResult property
`getAliasingOpOperands`/`getAliasingOpResults` now encodes OpOperand/OpResult, buffer relation and a degree of certainty. E.g.:
```
// aliasingOpOperands(%r) = {(%t, EQUIV, DEFINITE)}
// aliasingOpResults(%t) = {(%r, EQUIV, DEFINITE)}
%r = tensor.insert %f into %t[%idx] : tensor<?xf32>

// aliasingOpOperands(%r) = {(%t0, EQUIV, MAYBE), (%t1, EQUIV, MAYBE)}
// aliasingOpResults(%t0) = {(%r, EQUIV, MAYBE)}
// aliasingOpResults(%t1) = {(%r, EQUIV, MAYBE)}
%r = arith.select %c, %t0, %t1 : tensor<?xf32>
```

`BufferizableOpInterface::bufferRelation` is removed, as it is now part of `getAliasingOpOperands`/`getAliasingOpResults`.

This change allows for better analysis, in particular wrt. equivalence. This allows additional optimizations and better error checking (which is sometimes overly conservative). Examples:

* EmptyTensorElimination can eliminate `tensor.empty` inside `scf.if` blocks. This requires a modeling of equivalence: It is not a per-OpResult property anymore. Instead, it can be specified for each OpOperand and OpResult. This is important because `tensor.empty` may be eliminated only if all values on the SSA use-def chain to the final consumer (`tensor.insert_slice`) are equivalent.
* The detection of "returning allocs from a block" can be improved. (Addresses a TODO in `assertNoAllocsReturned`.) This allows us to bufferize IR such as "yielding a `tensor.extract_slice` result from an `scf.if` branch", which currently fails to bufferize because the alloc detection is too conservative.
* Better bufferization of loops. Aliases of the iter_arg can be yielded (even if they are not equivalent) without having to realloc and copy the entire buffer on each iteration.

The above-mentioned examples are not yet implemented with this change. This change just improves the BufferizableOpInterface, its implementations and related helper functions, so that better aliasing information is available for each op.

Differential Revision: https://reviews.llvm.org/D142129
2023-02-09 11:35:03 +01:00
Matthias Springer
1ac248e485 [mlir][bufferization][NFC] Rename getAliasingOpOperand/getAliasingOpResult
* `getAliasingOpOperand` => `getAliasingOpOperands`
* `getAliasingOpResult` => `getAliasingOpResults`

Also a few minor code cleanups and better documentation.

Differential Revision: https://reviews.llvm.org/D142979
2023-02-01 10:07:41 +01:00
Matthias Springer
148432ea84 [mlir][bufferization][NFC] Rename BufferRelation::None to BufferRelation::Unknown
The previous name was incorrect. `None` does not mean that there is no buffer relation between two buffers (seems to imply that they do not alias for sure); instead it means that there is no further information available.

Differential Revision: https://reviews.llvm.org/D142870
2023-01-30 11:09:28 +01:00
Matthias Springer
34d65e81e8 [mlir][bufferization] Generalize and rename isMemoryWrite
The name of the method was confusing. It is bufferizesToMemoryWrite, but from the perspective of OpResults.

`bufferizesToMemoryWrite(OpResult)` now supports ops with regions that do not have aliasing OpOperands (such as `scf.if`). These ops no longer need to implement `isMemoryWrite`.

Differential Revision: https://reviews.llvm.org/D141684
2023-01-30 09:34:04 +01:00
Ingo Müller
b0e4d389d3 [mlir][scf] Fix typo in comment in BufferizableOpInterfaceImpl.cpp (NFC).
Reviewed By: ingomueller-net

Differential Revision: https://reviews.llvm.org/D142554
2023-01-25 19:06:06 +00:00
Ramkumar Ramachandra
22426110c5 mlir/tblgen: use std::optional in generation
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.

A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.

See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716

Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>

Differential Revision: https://reviews.llvm.org/D138934
2022-12-17 11:13:26 +01:00
Lei Zhang
9bb633741a [mlir][bufferization] Support general Attribute as memory space
MemRef has been accepting a general Attribute as memory space for
a long time. This commits updates bufferization side to catch up,
which allows downstream users to plugin customized symbolic memory
space. This also eliminates quite a few `getMemorySpaceAsInt`
calls, which is deprecated.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D138330
2022-11-21 09:40:50 -05:00
Guray Ozen
6663f34704 [mlir] Introduce device mapper attribute for thread_dim_map and mapped to dims
`scf.foreach_thread` defines mapping its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it.

```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
	scf.foreach_thread (%i2, %j2) in (%c1, %c2)
	{...} { thread_dim_mapping = [0, 1]}
} { thread_dim_mapping = [0, 1]}
```

It first introduces a `DeviceMappingInterface` which is an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This way gives us a clear validation.

The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x,y,z>` ). After this change, the above code prints as below, as seen here, this way clarifies the loop mappings. The change also implements consuming of these two new attribute by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads.

```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
	scf.foreach_thread (%i2, %j2) in (%c1, %c2)
	{...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}
```

Reviewed By: ftynse, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D137413
2022-11-11 08:44:57 +01:00
Matthias Springer
c9b3638126 [mlir][scf][bufferize] Fix bufferizesToMemoryRead with 0 loop iterations
There was a bug in scf.for loop bufferization that could lead to a missing buffer copy (alloc was there, but not the copy).

Differential Revision: https://reviews.llvm.org/D135053
2022-10-24 14:34:41 +02:00
Mehdi Amini
23f989a2e3 Apply clang-tidy fixes for readability-simplify-boolean-expr in BufferizableOpInterfaceImpl.cpp (NFC) 2022-10-12 01:16:36 +00:00
Johannes Reifferscheid
eaf20c4fc2 [mlir] Fix a cast that should be a dyn_cast.
This fixes a crash for certain IR, see the new test case for an
example.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D134424
2022-09-22 13:13:21 +02:00
Johannes Reifferscheid
d1536ee48c Fix clang-format. 2022-09-08 11:05:12 +02:00
Johannes Reifferscheid
6247988e07 One-shot-bufferize: fix for inconsistent while arg types in before/after.
Currently, if the `before` and `after` regions of a while op have
tensor args in different indices, this leads to a crash.

This moves the pass-through check for args to the handling of the
condition block, since that is where the results are produced, so
it's also where copies must be made.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D133477
2022-09-08 10:24:11 +02:00
Johannes Reifferscheid
fb9fc79809 One-shot-bufferize: allow non-tensor arguments in scg.while/for.
Currently, one-shot-bufferize crashes as soon as there's
a mixture of tensor and non-tensor arguments. This seems
to happen for no good reason.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D133419
2022-09-07 15:54:25 +02:00
Matthias Springer
4cd7362083 [mlir][SCF] foreach_thread: Capture shared output tensors explicitly
This change refines the semantics of scf.foreach_thread. Tensors that are inserted into in the terminator must now be passed to the region explicitly via `shared_outs`. Inside of the body of the op, those tensors are then accessed via block arguments.

The body of a scf.foreach_thread is now treated as a repetitive region. I.e., op dominance can no longer be used in conflict detection when using a value that is defined outside of the body. Such uses may now be considered as conflicts (if there is at least one read and one write in the body), effectively privatizing the tensor. Shared outputs are not privatized when they are used via their corresponding block arguments.

As part of this change, it was also necessary to update the "tiling to scf.foreach_thread", such that the generated tensor.extract_slice ops use the scf.foreach_thread's block arguments. This is implemented by cloning the TilingInterface op inside the scf.foreach_thread, rewriting all of its outputs with block arguments and then calling the tiling implementation. Afterwards, the cloned op is deleted again.

Differential Revision: https://reviews.llvm.org/D133114
2022-09-02 14:54:04 +02:00
Matthias Springer
86974e32a4 [mlir][SCF][bufferize] Support different iter_arg/init_arg types (scf.while)
This change implements the same functionality as D132860, but for scf.while.

Differential Revision: https://reviews.llvm.org/D132927
2022-08-30 16:58:21 +02:00
Matthias Springer
9d6096c56f [mlir][SCF][bufferize][NFC] Move scf.if buffer type computation to getBufferType
A part of the functionality of `bufferize` is extracted into `getBufferType`. Also, bufferized scf.yields inside scf.if are now created with the correct bufferized type from the get-to.

Differential Revision: https://reviews.llvm.org/D132862
2022-08-30 16:48:10 +02:00
Matthias Springer
123c4b0251 [mlir][SCF][bufferize] Support different iter_arg/init_arg types (scf.for)
Even though iter_arg and init_arg of an scf.for loop may have the same tensor type, their bufferized memref types are not necessarily equal. It is sometimes necessary to insert a cast in case of differing layout maps.

Differential Revision: https://reviews.llvm.org/D132860
2022-08-30 16:35:32 +02:00
Matthias Springer
111c919665 [mlir][bufferization] Generalize getBufferType
This change generalizes getBufferType. This function can be used to predict the buffer type of any tensor value (not just BlockArguments) without changing any IR. It also subsumes getMemorySpace. This is useful for loop bufferization, where the precise buffer type of an iter_arg cannot be known without examining the loop body.

Differential Revision: https://reviews.llvm.org/D132859
2022-08-30 16:26:44 +02:00
Kazu Hirata
10bcfeebfa [mlir] Remove unused using (NFC)
Identified with misc-unused-using-decls.
2022-07-17 18:08:48 -07:00
Nicolas Vasilache
7fbf55c927 [mlir][Tensor] Move ParallelInsertSlice to the tensor dialect
This is moslty NFC and will allow tensor.parallel_insert_slice to gain
rank-reducing semantics by reusing the vast majority of the tensor.insert_slice impl.

Depends on D128857

Differential Revision: https://reviews.llvm.org/D128920
2022-07-04 01:53:12 -07:00
Nicolas Vasilache
b994d388ae [mlir][SCF] Add a ParallelCombiningOpInterface to decouple scf::PerformConcurrently from its contained operations
This allows purging references of scf.ForeachThreadOp and scf.PerformConcurrentlyOp from
ParallelInsertSliceOp.
This will allowmoving the op closer to tensor::InsertSliceOp with which it should share much more
code.

In the future, the decoupling will also allow extending the type of ops that can be used in the
parallel combinator as well as semantics related to multiple concurrent inserts to the same
result.

Differential Revision: https://reviews.llvm.org/D128857
2022-07-01 00:16:02 -07:00
Matthias Springer
76f7e4b7a3 [mlir][SCF][bufferize][NFC] Utilize recently added helper function
This should have been part of D128666.

Differential Revision: https://reviews.llvm.org/D128885
2022-06-30 09:54:52 +02:00
Matthias Springer
04dac2ca7c [mlir][SCF][bufferize][NFC] Implement resolveConflicts for ParallelInsertSliceOp
This was previous implemented as part of the BufferizableOpInterface of ForEachThreadOp. Moving the implementation to ParallelInsertSliceOp to be consistent with the remaining ops and to have a nice example op that can serve as a blueprint for other ops.

Differential Revision: https://reviews.llvm.org/D128666
2022-06-28 12:18:22 +02:00
Matthias Springer
f164814f2f [mlir][SCF][bufferize] Small simplification and more comments
Differential Revision: https://reviews.llvm.org/D128651
2022-06-27 17:04:29 +02:00
Matthias Springer
c0b0b6a00a [mlir][bufferize] Infer memory space in all bufferization patterns
This change updates all remaining bufferization patterns (except for scf.while) and the remaining bufferization infrastructure to infer the memory space whenever possible instead of falling back to "0". (If a default memory space is set in the bufferization options, we still fall back to that value if the memory space could not be inferred.)

Differential Revision: https://reviews.llvm.org/D128423
2022-06-27 16:32:52 +02:00
Matthias Springer
45b995cda4 [mlir][bufferize][NFC] Change signature of allocateTensorForShapedValue
Add a failure return value and bufferization options argument. This is to keep a subsequent change smaller.

Differential Revision: https://reviews.llvm.org/D128278
2022-06-27 16:00:06 +02:00
Nicolas Vasilache
a0f843fdaf [SCF] Add thread_dim_mapping attribute to scf.foreach_thread
An optional thread_dim_mapping index array attribute specifies for each
virtual thread dimension, how it remaps 1-1 to a set of concrete processing
element resources (e.g. a CUDA grid dimension or a level of concrete nested
async parallelism). At this time, the specification is backend-dependent and
is not verified by the op, beyond being an index array attribute.
It is the reponsibility of the lowering to interpret the index array in the
context of the concrete target the op is lowered to, or to ignore it when
the specification is ill-formed or unsupported for a particular target.

Differential Revision: https://reviews.llvm.org/D128633
2022-06-27 04:58:36 -07:00
Matthias Springer
5d50f51c97 [mlir][bufferization][NFC] Add error handling to getBuffer
This is in preparation of adding memory space support.

Differential Revision: https://reviews.llvm.org/D128277
2022-06-27 13:48:01 +02:00