439 Commits

Author SHA1 Message Date
Matthias Springer
217700baf7 [mlir][bufferization] Support bufferization of external functions (#113999)
This commit adds support for bufferizing external functions that have no
body. Such functions were previously rejected by One-Shot Bufferize if
they returned a tensor value.

This commit is in preparation of removing the deprecated
`func-bufferize` pass. That pass can bufferize external functions.

Also update a few comments.
2024-10-30 21:49:10 +09:00
Andrzej Warzyński
91c11574e8 Revert "[MLIR] Make OneShotModuleBufferize use OpInterface (#110322)" (#113124)
This reverts commit 2026501cf1.

Failing bot:
  * https://lab.llvm.org/staging/#/builders/125/builds/389
2024-10-22 13:28:44 +01:00
Simon Camphausen
70334081f7 [mlir][bufferization] Expose buffer alignment as a pass option in one-shot-bufferize (#112505) 2024-10-16 11:49:49 +02:00
Matthias Springer
206fad0e21 [mlir][NFC] Mark type converter in populate... functions as const (#111250)
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.

Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate patterns now have
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.

Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
2024-10-05 21:32:40 +02:00
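The `const` convention described in 206fad0e21 can be illustrated with a small standalone sketch. The names below (`ToyTypeConverter`, `populateToyPatterns`, `populateToyPatternsAndConversions`) are hypothetical stand-ins and not MLIR classes. The sketch only assumes that registering a rule is a non-`const` member function, so a function that receives a `const` converter cannot register one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a type converter; not the MLIR class.
class ToyTypeConverter {
public:
  using Rule = std::function<std::string(const std::string &)>;

  // Non-const: registers a new type conversion rule.
  void addConversion(Rule rule) { rules.push_back(std::move(rule)); }

  // Const: only inspects the rules registered so far.
  std::size_t getNumConversions() const { return rules.size(); }

private:
  std::vector<Rule> rules;
};

// Const converter: a call to addConversion here would not compile, so this
// function can only add patterns.
void populateToyPatterns(const ToyTypeConverter &converter,
                         std::vector<std::string> &patterns) {
  const std::size_t numRulesBefore = converter.getNumConversions();
  patterns.push_back("ToyPattern");
  // Holds by construction; the assert only documents the guarantee.
  assert(converter.getNumConversions() == numRulesBefore);
  (void)numRulesBefore;
}

// Mutable converter: the signature signals that rules may be registered.
void populateToyPatternsAndConversions(ToyTypeConverter &converter,
                                       std::vector<std::string> &patterns) {
  converter.addConversion(
      [](const std::string &type) { return "memref_of_" + type; });
  patterns.push_back("ToyPatternWithRule");
}
```

A reader looking for the place where a conversion rule was added only has to inspect functions with the second kind of signature.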
Tzung-Han Juang
2026501cf1 [MLIR] Make OneShotModuleBufferize use OpInterface (#110322)
**Description:** 
This PR replaces a part of `FuncOp` and `CallOp` with
`FunctionOpInterface` and `CallOpInterface` in `OneShotModuleBufferize`.
Also fix the error from an integration test in a previous PR
attempt. (https://github.com/llvm/llvm-project/pull/107295)

The below fixes skip `CallOpInterface` so that the assertions are not
triggered.


8d78000762/mlir/lib/Dialect/Bufferization/Transforms/OneShotModuleBufferize.cpp (L254-L259)


8d78000762/mlir/lib/Dialect/Bufferization/Transforms/OneShotModuleBufferize.cpp (L311-L315)

**Related Discord Discussion:**
[Link](https://discord.com/channels/636084430946959380/642426447167881246/1280556809911799900)

---------

Co-authored-by: erick-xanadu <110487834+erick-xanadu@users.noreply.github.com>
2024-10-01 15:58:52 +02:00
Matthias Springer
49df12c01e [mlir][NFC] Minor cleanup around ModuleOp usage (#110498)
Use `moduleOp.getBody()` instead of `moduleOp.getBodyRegion().front()`.
2024-09-30 21:20:48 +02:00
Andrzej Warzyński
bfde17834d [mlir] Update the return type of getNum{Dynamic|Scalable}Dims (#110472)
Updates the return type of `getNumDynamicDims` and `getNumScalableDims`
from `int64_t` to `size_t`. This is for consistency with other
helpers/methods that return "size" and to reduce the number of
`static_cast`s in various places.
2024-09-30 14:53:50 +01:00
Matthias Springer
0259f92711 [mlir][memref] Add builder that infers reinterpret_cast result type (#109432)
Add a convenience builder that infers the result type of
`memref.reinterpret_cast`.

Note: It is not possible to remove the result type from all builder
overloads because this op currently also allows certain
operand/attribute + result type combinations that do not match. The op
verifier should probably be made stricter, but that's a larger change
that requires additional `memref.cast` ops in some places that build
`reinterpret_cast` ops.
2024-09-25 09:33:15 +02:00
Matthias Springer
ae7b454f98 Revert "[MLIR] Make OneShotModuleBufferize use OpInterface" (#109919)
Reverts llvm/llvm-project#107295

This commit breaks an integration test:
```
build/bin/mlir-opt mlir/test/Integration/Dialect/Complex/CPU/correctness.mlir  -one-shot-bufferize="bufferize-function-boundaries"
```
2024-09-25 09:17:49 +02:00
Tzung-Han Juang
f586b1e3f4 [MLIR] Make OneShotModuleBufferize use OpInterface (#107295)
**Description:** 

`OneShotModuleBufferize` deals with the bufferization of `FuncOp`,
`CallOp` and `ReturnOp` but they are hard-coded. Any custom
function-like operations will not be handled. The PR replaces a part of
`FuncOp` and `CallOp` with `FunctionOpInterface` and `CallOpInterface`
in `OneShotModuleBufferize` so that custom function ops and call ops can
be bufferized.

**Related Discord Discussion:**
[Link](https://discord.com/channels/636084430946959380/642426447167881246/1280556809911799900)

---------

Co-authored-by: erick-xanadu <110487834+erick-xanadu@users.noreply.github.com>
2024-09-25 07:27:21 +02:00
Kazu Hirata
8d8bedef0d [Bufferization] Avoid repeated hash lookups (NFC) (#108925) 2024-09-17 00:18:23 -07:00
JOE1994
884221eddb [mlir] Tidy uses of llvm::raw_string_ostream (NFC)
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly

( 65b13610a5 for further reference )

* Don't call raw_string_ostream::flush(), which is essentially a no-op.
* Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
2024-09-16 23:23:25 -04:00
Kazu Hirata
7be6ea1244 [Dialect] Avoid repeated hash lookups (NFC) (#108137) 2024-09-11 06:39:30 -07:00
Henrich Lauko
d1cad2290c Reland [MLIR] Make resolveCallable customizable in CallOpInterface (#107989)
Relands #100361 with fixed dependencies.
2024-09-10 15:33:13 +02:00
Matthias Springer
7574042e2a Revert "[MLIR] Make resolveCallable customizable in CallOpInterface" (#107984)
Reverts llvm/llvm-project#100361

This commit caused some linker errors. (Missing `MLIRCallInterfaces`
dependency.)
2024-09-10 10:24:05 +02:00
Henrich Lauko
958f59d90f [MLIR] Make resolveCallable customizable in CallOpInterface (#100361)
Allow customization of the `resolveCallable` method in the
`CallOpInterface`. This change allows for operations implementing this
interface to provide their own logic for resolving callables.

- Introduce the `resolveCallable` method, which does not include the
optional symbol table parameter. This method replaces the previously
existing extra class declaration `resolveCallable`.

- Introduce the `resolveCallableInTable` method, which incorporates the
symbol table parameter. This method replaces the previous extra class
declaration `resolveCallable` that used the optional symbol table
parameter.
2024-09-10 10:08:41 +02:00
Menooker
26645ae2ee [mlir][memref] Fix hoist-static-allocs option of buffer-results-to-out-params when function parameters are returned (#102093)
The buffer-results-to-out-params pass hits a null pointer dereference
when the hoist-static-allocs option is on and the return value of a
function is a parameter of the function. This PR fixes this issue.
2024-09-04 20:36:19 +08:00
Longsheng Mou
7f04a8ad13 [mlir][func][bufferization] Fix cast incompatible when bufferize callOp (#105929)
Handle caller/callee type mismatch using `castOrReallocMemRefValue`
instead of just a `CastOp`. The method inserts a reallocation + copy if
it cannot be statically guaranteed that a direct cast would be valid.
Fix #105916.
2024-08-27 07:06:00 +08:00
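The cast-or-reallocate decision in 7f04a8ad13 can be sketched with a toy layout model. Everything below is hypothetical (`ToyLayout`, `ToyStrategy`, `chooseStrategy`), and the rule it uses is an assumption made for illustration: a cast from a dynamic offset or stride to a static one cannot be proven valid at compile time, so that case falls back to reallocation plus copy. The conditions checked by `castOrReallocMemRefValue` itself may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy layout: std::nullopt marks a dynamic (runtime-only) value.
struct ToyLayout {
  std::optional<int64_t> offset;
  std::vector<std::optional<int64_t>> strides;
};

enum class ToyStrategy { DirectCast, ReallocAndCopy };

// A dynamic value cannot be proven equal to a static one at compile time.
inline bool isDynamicToStatic(const std::optional<int64_t> &source,
                              const std::optional<int64_t> &target) {
  return !source.has_value() && target.has_value();
}

// Picks a direct cast only if no dynamic-to-static conversion is required.
ToyStrategy chooseStrategy(const ToyLayout &source, const ToyLayout &target) {
  assert(source.strides.size() == target.strides.size() && "rank mismatch");
  if (isDynamicToStatic(source.offset, target.offset))
    return ToyStrategy::ReallocAndCopy;
  for (std::size_t i = 0; i < source.strides.size(); ++i)
    if (isDynamicToStatic(source.strides[i], target.strides[i]))
      return ToyStrategy::ReallocAndCopy;
  return ToyStrategy::DirectCast;
}
```

Going from a static layout to a dynamic one only forgets information, so the sketch treats it as always castable.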
Dennis Filimonov
6de04e6fe8 [mlir][bufferization] Adding the optimize-allocation-liveness pass (#101827)
Adding a pass that is expected to run after the deallocation pipeline
and will move buffer deallocations right after their last user or
dependency, thus optimizing the allocation liveness.
2024-08-14 13:22:47 +02:00
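The idea behind the pass added in 6de04e6fe8 (place a deallocation right after the last user or dependency) can be sketched on a flat list of operations. This is a toy model with hypothetical names (`ToyOp`, `findLastUser`, `deallocInsertionIndex`); it assumes a single block and an alias set that is already known, which the real pass has to compute.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy operation: just the names of the SSA values it uses.
struct ToyOp {
  std::vector<std::string> operands;
};

// Returns the index of the last op in `block` that uses `buffer` or one of
// its aliases, or std::nullopt if there is no such user.
std::optional<std::size_t> findLastUser(const std::vector<ToyOp> &block,
                                        const std::string &buffer,
                                        const std::set<std::string> &aliases) {
  std::optional<std::size_t> last;
  for (std::size_t i = 0; i < block.size(); ++i)
    for (const std::string &operand : block[i].operands)
      if (operand == buffer || aliases.count(operand))
        last = i;
  return last;
}

// Index at which a dealloc of `buffer` would be inserted: right after the
// last user, or right after the allocation if the buffer is never used.
std::size_t deallocInsertionIndex(const std::vector<ToyOp> &block,
                                  std::size_t allocIndex,
                                  const std::string &buffer,
                                  const std::set<std::string> &aliases) {
  assert(allocIndex < block.size() && "alloc must be in the block");
  std::optional<std::size_t> last = findLastUser(block, buffer, aliases);
  return (last ? *last : allocIndex) + 1;
}
```

Treating uses of aliases as uses of the buffer is what keeps the deallocation from moving above a view that is still live.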
Giuseppe Rossini
441b672bbd [mlir] Fix block merging (#102038)
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.

What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument. I.e., if the two block arguments
refer to the same `Value`. The operands of operations in the block will
point to the argument we already inserted. This needs to happen to all
the arguments we pass to the different successors of the parent block.
- After merging the blocks, get rid of "unnecessary" arguments. I.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that the two
simplifications are clashing. I.e., `BufferDeallocationSimplification`
contains an analysis based on the block structure. If we simplify the
block structure (by merging and/or dropping block arguments) the
analysis is invalid. The solution I found is to do a more prudent
simplification when running that pass.

**Note-1**: I ran all the integration tests
(`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.
**Note-2**: I fixed a bug found by @Dinistro in #97697. The issue was
that, when looking for redundant arguments, I was not considering that
the block might already have some arguments. So the index (in the block
args list) of the i-th `newArgument` is `i+numOfOldArguments`.
2024-08-07 09:10:01 +01:00
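The "unnecessary argument" rule and the index fix from Note-2 in 441b672bbd can be sketched as follows. This is a toy model with hypothetical names (`ToyIncoming`, `findRedundantArguments`, `newArgumentIndex`); values are plain strings and only the comparison logic is shown, not the IR rewriting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// incoming[p][a] is the value that predecessor p passes for block argument a.
using ToyIncoming = std::vector<std::vector<std::string>>;

// Returns the indices of block arguments that receive the same value from
// every predecessor. Such arguments can be replaced by that value directly.
std::vector<std::size_t> findRedundantArguments(const ToyIncoming &incoming) {
  std::vector<std::size_t> redundant;
  if (incoming.empty())
    return redundant;
  const std::size_t numArgs = incoming.front().size();
  for (std::size_t arg = 0; arg < numArgs; ++arg) {
    bool allSame = true;
    for (const std::vector<std::string> &values : incoming) {
      assert(values.size() == numArgs && "predecessors must agree on arity");
      if (values[arg] != incoming.front()[arg])
        allSame = false;
    }
    if (allSame)
      redundant.push_back(arg);
  }
  return redundant;
}

// Position of the i-th argument added by merging when the block already had
// `numOldArguments` arguments (the offset that Note-2 refers to).
std::size_t newArgumentIndex(std::size_t i, std::size_t numOldArguments) {
  return i + numOldArguments;
}
```

With two predecessors passing `(%x, %a)` and `(%x, %b)`, only argument 0 is reported as redundant.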
Nikhil Kalra
84cc1865ef [mlir] Support DialectRegistry extension comparison (#101119)
`PassManager::run` loads the dependent dialects for each pass into the
current context prior to invoking the individual passes. If the
dependent dialect is already loaded into the context, this should be a
no-op. However, if there are extensions registered in the
`DialectRegistry`, the dependent dialects are unconditionally registered
into the context.

This poses a problem for dynamic pass pipelines, however, because they
will likely be executing while the context is in an immutable state
(because of the parent pass pipeline being run).

To solve this, we'll update the extension registration API on
`DialectRegistry` to require a type ID for each extension that is
registered. Then, instead of unconditionally registering dialects into a
context if extensions are present, we'll check against the extension
type IDs already present in the context's internal `DialectRegistry`.
The context will only be marked as dirty if there are net-new extension
types present in the `DialectRegistry` populated by
`PassManager::getDependentDialects`.

Note: this PR removes the `addExtension` overload that utilizes
`std::function` as the parameter. This is because `std::function` is
copyable and potentially allocates memory for the contained function so
we can't use the function pointer as the unique type ID for the
extension.

Downstream changes required:
- Existing `DialectExtension` subclasses will need a type ID to be
registered for each subclass. More details on how to register a type ID
can be found here:
8b68e06731/mlir/include/mlir/Support/TypeID.h (L30)
- Existing uses of the `std::function` overload of `addExtension` will
need to be refactored into dedicated `DialectExtension` classes with
associated type IDs. The attached `std::function` can either be inlined
into or called directly from `DialectExtension::apply`.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-08-06 01:32:36 +02:00
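The comparison by extension type ID described in 84cc1865ef can be sketched with the standard library alone. The names below (`ToyTypeID`, `getToyTypeID`, `ToyRegistry`, `needsContextUpdate`) are hypothetical, and the ID scheme (address of a per-type static) is an assumption chosen for brevity rather than a description of MLIR's `TypeID`.

```cpp
#include <cassert>
#include <set>

// Hypothetical type ID: the address of a per-type static is unique per type.
using ToyTypeID = const void *;

template <typename T> ToyTypeID getToyTypeID() {
  static char id;
  return &id;
}

// Toy registry that records only the IDs of the extensions added to it.
class ToyRegistry {
public:
  // Returns false if an extension of the same type was already registered.
  template <typename ExtensionT> bool addExtension() {
    return extensionIDs.insert(getToyTypeID<ExtensionT>()).second;
  }

  // True if every extension in this registry is also present in `other`.
  bool isSubsetOf(const ToyRegistry &other) const {
    for (ToyTypeID id : extensionIDs)
      if (!other.extensionIDs.count(id))
        return false;
    return true;
  }

private:
  std::set<ToyTypeID> extensionIDs;
};

// The context only needs to change if `dependent` brings extension types
// that `context` has not seen yet.
bool needsContextUpdate(const ToyRegistry &context,
                        const ToyRegistry &dependent) {
  return !dependent.isSubsetOf(context);
}

struct ToyExtensionA {};
struct ToyExtensionB {};

// Usage: an already-known extension type does not dirty the context.
inline void toyRegistryExample() {
  ToyRegistry context, dependent;
  context.addExtension<ToyExtensionA>();
  dependent.addExtension<ToyExtensionA>();
  assert(!needsContextUpdate(context, dependent));
  dependent.addExtension<ToyExtensionB>();
  assert(needsContextUpdate(context, dependent));
}
```

A `std::function` has no stable identity of this kind, which is consistent with the commit's reason for dropping that overload.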
Longsheng Mou
6867324eee [mlir][bufferization] Improve performance of DropEquivalentBufferResultsPass (#101281)
Use a DenseMap to minimize the traversal time of callOps, which greatly
improves the efficiency of this pass.
2024-08-02 09:22:20 +08:00
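The lookup-table idea in 6867324eee can be sketched as follows. This is a toy model: `std::unordered_map` stands in for `DenseMap`, and `ToyCall`, `buildCallMap` and `getCallsTo` are hypothetical names. How the pass actually keys and fills its map is not shown in the commit message and is not claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy call site: only the name of the called function.
struct ToyCall {
  std::string callee;
};

using ToyCallMap = std::unordered_map<std::string, std::vector<std::size_t>>;

// One pass over all call sites; afterwards, the calls to a function are
// found by a hash lookup instead of another walk over the module.
ToyCallMap buildCallMap(const std::vector<ToyCall> &calls) {
  ToyCallMap map;
  for (std::size_t i = 0; i < calls.size(); ++i) {
    assert(!calls[i].callee.empty() && "call must name its callee");
    map[calls[i].callee].push_back(i);
  }
  return map;
}

// Returns the indices of all calls to `callee` (empty if there are none).
std::vector<std::size_t> getCallsTo(const ToyCallMap &map,
                                    const std::string &callee) {
  auto it = map.find(callee);
  return it == map.end() ? std::vector<std::size_t>() : it->second;
}
```

With F functions and C call sites, the map replaces F walks over C calls with one walk plus F lookups.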
Christian Ulmann
6a5a64c56b Revert "[mlir] Fix block merging" (#100510)
Reverts llvm/llvm-project#97697

This commit introduced non-trivial bugs related to type consistency.
2024-07-25 10:42:25 +02:00
Giuseppe Rossini
c63125d453 [mlir] Fix block merging (#97697)
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.

What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument. I.e., if the two block arguments
refer to the same `Value`. The operands of operations in the block will
point to the argument we already inserted. This needs to happen to all
the arguments we pass to the different successors of the parent block.
- After merging the blocks, get rid of "unnecessary" arguments. I.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that the two
simplifications are clashing. I.e., `BufferDeallocationSimplification`
contains an analysis based on the block structure. If we simplify the
block structure (by merging and/or dropping block arguments) the
analysis is invalid. The solution I found is to do a more prudent
simplification when running that pass.

**Note**: this is a rework of #96871. I ran all the integration tests
(`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.
2024-07-17 17:05:40 +01:00
donald chen
662c6fc74c [mlir] [bufferize] fix bufferize deallocation error in nest symbol table (#98476)
In nested symbol tables, the dealloc_helper function generated by the lower
deallocations pass was incorrectly positioned, causing calls to fail. This
patch fixes this issue.
2024-07-15 12:52:46 +08:00
Nikhil Kalra
0ad6ac8c53 [NFC][MLIR] Fix: alloca promotion for AllocationOpInterface (#97672)
The std::optional returned by buildPromotedAlloc was directly
dereferenced and assumed to be non-null, even though the documentation
for AllocationOpInterface indicates that std::nullopt is a legal value
if buffer stack promotion is not supported (and is the default value
supplied by the TableGen interface file). This patch removes the direct
dereference so that the optional can be null-checked prior to use.

Co-authored-by: Nikhil Kalra <nkalra@apple.com>
2024-07-04 08:49:33 +02:00
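The pattern fixed in 0ad6ac8c53 (check a `std::optional` before dereferencing it) looks roughly like this in isolation. `buildToyPromotedAlloc` and `promoteOrKeep` are hypothetical stand-ins that return op names as strings; they do not reproduce the signature of `buildPromotedAlloc`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for an interface method that may decline to build a
// stack allocation; std::nullopt means "promotion not supported".
std::optional<std::string> buildToyPromotedAlloc(bool supportsPromotion) {
  if (!supportsPromotion)
    return std::nullopt;
  return std::string("memref.alloca");
}

// Returns the op to use: the promoted alloca if available, else the original.
std::string promoteOrKeep(const std::string &originalOp,
                          bool supportsPromotion) {
  std::optional<std::string> promoted =
      buildToyPromotedAlloc(supportsPromotion);
  // Check first: dereferencing an empty optional is undefined behavior.
  if (!promoted)
    return originalOp;
  assert(!promoted->empty() && "builder returned an empty op name");
  return *promoted;
}
```

When promotion is not supported, the original allocation is simply left in place.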
Mehdi Amini
28a11cc492 Revert "Fix block merging" (#97460)
Reverts llvm/llvm-project#96871

Bots are broken.
2024-07-02 20:57:16 +02:00
Giuseppe Rossini
6c3897d90e Fix block merging (#96871)
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.

What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument. I.e., if the two block arguments
refer to the same `Value`. The operands of operations in the block will
point to the argument we already inserted.
- After merging the blocks, get rid of "unnecessary" arguments. I.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that the two
simplifications are clashing. I.e., `BufferDeallocationSimplification`
contains an analysis based on the block structure. If we simplify the
block structure (by merging and/or dropping block arguments) the
analysis is invalid. The solution I found is to do a more prudent
simplification when running that pass.

**Note**: many tests are still not passing. But I wanted to submit the
code before changing all the tests (and probably adding a couple), so
that we can agree in principle on the algorithm/design.
2024-07-02 17:12:33 +01:00
Ramkumar Ramachandra
db791b278a mlir/LogicalResult: move into llvm (#97309)
This patch is part of a project to move the Presburger library into
LLVM.
2024-07-02 10:42:33 +01:00
Matthias Springer
cf9b77a636 [mlir][bufferization] Fix bug in bufferization of elementwise ops (#97209)
There is an optimization in One-Shot Bufferize wrt. ops that bufferize
to elementwise access. A copy can sometimes be avoided. E.g.:
```
%0 = tensor.empty()
%1 = tensor.fill ...
%2 = linalg.map ins(%1, ...) outs(%1)
```

In the above example, a buffer copy is not needed for %1, even though
the same buffer is read/written by two different operands (of the same
op). That's because the op bufferizes to elementwise access.

```c++
// Two equivalent operands of the same op are not conflicting if the op
// bufferizes to element-wise access. I.e., all loads at a position
// happen before all stores to the same position.
```

This optimization cannot be applied when op dominance cannot be used to
rule out conflicts. E.g., when the `linalg.map` is inside of a loop. In
such a case, the reads/writes happen multiple times and it is not
guaranteed that "all loads at a position happen before all stores to the
same position."

Fixes #90019.
2024-07-01 19:00:21 +02:00
zhicong zhong
1d4ce574a4 [mlir][bufferization] skip empty tensor elimination if they have different element type (#96998)
In the original implementation, empty tensor elimination adds a
`tensor.cast` and eliminates the tensor even if the two tensors have
different element types (e.g., f32 and bf16). This commit adds a check
for the element type and skips the elimination if the types differ.
2024-07-01 09:30:04 +08:00
McCowan Zhang
a159b36724 Bufferization with ControlFlow Asserts (#95868)
Fixed incorrect bufferization interaction with cf.assert
- reordered bufferization condition checking
- fixed hasNeitherAllocateNorFreeSideEffect checking bug
- implemented memory interface for cf.assert

---------

Co-authored-by: McCowan Zhang <mccowan.z@ssi.samsung.com>
2024-06-26 08:00:39 +02:00
donald chen
2c1ae801e1 [mlir][side effect] refactor(*): Include more precise side effects (#94213)
This patch adds more precise side effects to the current ops with memory
effects, allowing us to determine which OpOperand/OpResult/BlockArgument
the operation reads or writes, rather than just recording the reading and
writing of values. This allows for convenient use of precise side effects
to achieve analysis and optimization.

Related discussions:
https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243
2024-06-19 22:10:34 +08:00
Max191
d586372194 [mlir] Add bufferization option for parallel region check (#94645)
Handling parallel region RaW conflicts should usually be the
responsibility of the source program, rather than bufferization
analysis. However, to preserve current functionality, checks on parallel
regions are put behind a bufferization option in this PR, which is on by
default. Default functionality will not change, but this PR enables the
option to leave parallelism checks out of the bufferization analysis.
2024-06-11 10:31:06 -04:00
Matthias Springer
13896b6ce9 [mlir][bufferization] Fix handling of indirect function calls (#94896)
This commit fixes a crash in the ownership-based buffer deallocation
pass when indirectly calling a function via SSA value. Such functions
must be conservatively assumed to be public.

Fixes #94780.
2024-06-10 08:07:24 +02:00
Matthias Springer
9d4b20a44e [mlir][bufferization] Allow mixed static/dynamic shapes in materialize_in_destination op (#92681)
This commit relaxes the verifier of
`bufferization.materialize_in_destination` such that mixed
static/dynamic dimensions are allowed for the source and destination
operands. E.g., `tensor<5xf32>` and `tensor<?xf32>` are now compatible,
but it is assumed that the dynamic dimension is `5` at runtime.

This commit fixes #91265.
2024-06-01 12:04:56 +02:00
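The relaxed shape check from 9d4b20a44e can be sketched as a standalone predicate. `kToyDynamic` and `areShapesCompatible` are hypothetical names, and the use of `-1` as the dynamic marker is an assumption of this sketch only. The rule mirrors the commit message: a dynamic dimension is accepted against any static size, on the understanding that the sizes match at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy marker for a dynamic dimension (`?` in the textual IR).
constexpr int64_t kToyDynamic = -1;

// Two shapes are compatible if they have the same rank and every pair of
// dimensions is either equal or has at least one dynamic member.
bool areShapesCompatible(const std::vector<int64_t> &lhs,
                         const std::vector<int64_t> &rhs) {
  if (lhs.size() != rhs.size())
    return false;
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    assert(lhs[i] >= kToyDynamic && rhs[i] >= kToyDynamic && "invalid extent");
    if (lhs[i] != kToyDynamic && rhs[i] != kToyDynamic && lhs[i] != rhs[i])
      return false;
  }
  return true;
}
```

In this model `tensor<5xf32>` against `tensor<?xf32>` passes, while `tensor<5xf32>` against `tensor<6xf32>` is still rejected.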
Kunwar Grover
debdbeda15 [mlir] Remove dialect specific bufferization passes (Reland) (#93535)
These passes have been deprecated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.

Relands https://github.com/llvm/llvm-project/pull/93488 which failed on
buildbot. Fixes the failure by updating integration tests to use
one-shot-bufferize instead.
2024-05-28 20:04:27 +01:00
Kunwar Grover
39848d0a98 Revert "[mlir] Remove dialect specific bufferization passes" (#93528)
Reverts llvm/llvm-project#93488

Buildbot failure:
https://lab.llvm.org/buildbot/#/builders/220/builds/39911
2024-05-28 11:21:34 +01:00
Kunwar Grover
2fc5106437 [mlir] Remove dialect specific bufferization passes (#93488)
These passes have been deprecated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.
2024-05-28 11:12:58 +01:00
Jie Fu
1c8c2fdd28 [mlir] Fix -Wdeprecated-declarations in BufferResultsToOutParams.cpp (NFC)
/llvm-project/mlir/lib/Dialect/Bufferization/Transforms/BufferResultsToOutParams.cpp:124:26:
error: 'cast' is deprecated: Use mlir::cast<U>() instead [-Werror,-Wdeprecated-declarations]
  124 |           orig.getType().cast<MemRefType>().hasStaticShape()) {
      |
2024-05-08 10:38:34 +08:00
Menooker
0af448b711 [MLIR][Bufferization] BufferResultsToOutParams: Add an option to eliminate AllocOp and avoid Copy (#90011)
Add an option hoist-static-allocs to remove the unnecessary memref.alloc
and memref.copy after this pass, when the memref in ReturnOp is
allocated by memref.alloc and is statically shaped. Instead, it replaces
the uses of the allocated memref with the memref in the out argument.
By default, BufferResultsToOutParams will result in a memcpy operation
to copy the originally returned memref to the output argument memref.
This is inefficient when the source of memcpy (the returned memref in
the original ReturnOp) is from a local AllocOp. The pass can use the
output argument memref to replace the locally allocated memref for
better performance. hoist-static-allocs avoids dynamic allocation and
memory movement.
This option will be critical for performance-sensitive applications,
which require the BufferResultsToOutParams pass for a caller-owned output
buffer calling convention.
2024-05-08 10:14:52 +08:00
Rafael Ubal
a42a2ca19b Avoid buffer hoisting from parallel loops (#90735)
This change corrects an invalid behavior in pass
`--buffer-loop-hoisting`. The pass is in charge of extracting buffer
allocations (e.g., `memref.alloca`) from loop regions (e.g., `scf.for`)
when possible. This works OK for loops with sequential execution
semantics. However, a buffer allocated in the body of a parallel loop
may be concurrently accessed by multiple threads to store their local data.
Extracting such a buffer from the loop causes all threads to wrongly share
the same memory region.

In the following example, dimension 1 of the input tensor is reversed.
Dimension 0 is traversed with a parallel loop.

```
func.func @f(%input: memref<2x3xf32>) -> memref<2x3xf32> {
  %c0 = index.constant 0
  %c1 = index.constant 1
  %c2 = index.constant 2
  %c3 = index.constant 3

  %output = memref.alloc() : memref<2x3xf32>
  scf.parallel (%index) = (%c0) to (%c2) step (%c1) {
    // Create subviews for working input and output slices
    %input_slice = memref.subview %input[%index, 2][1, 3][1, -1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, -1], offset: ?>>
    %output_slice = memref.subview %output[%index, 0][1, 3][1, 1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>

    // Copy the input slice into this temporary buffer. This intermediate
    // copy is unnecessary, but is used for illustration purposes.
    %temp = memref.alloc() : memref<1x3xf32>
    memref.copy %input_slice, %temp : memref<1x3xf32, strided<[3, -1], offset: ?>> to memref<1x3xf32>

    // Copy temporary buffer into output slice
    memref.copy %temp, %output_slice : memref<1x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>
    scf.reduce
  }

  return %output : memref<2x3xf32>
}
```

The patch submitted here prevents `%temp = memref.alloc() :
memref<1x3xf32>` from being hoisted when the containing op is
`scf.parallel` or `scf.forall`. A new op trait called
`HasParallelRegion` is introduced and assigned to these two ops to
indicate that their regions have parallel execution semantics.

@joker-eph @ftynse @nicolasvasilache @sabauma
2024-05-04 08:35:36 +02:00
Matthias Springer
179e174945 [mlir][bufferization][NFC] More documentation for runOneShotBufferize (#90445) 2024-04-29 13:23:37 +02:00
Christian Sigg
a5757c5b65 Switch member calls to isa/dyn_cast/cast/... to free function calls. (#89356)
This change cleans up call sites. Next step is to mark the member
functions deprecated.

See https://mlir.llvm.org/deprecation and
https://discourse.llvm.org/t/preferred-casting-style-going-forward.
2024-04-19 15:58:27 +02:00
Matthias Gehre
c515c78024 [mlir][Bufferization] castOrReallocMemRefValue: Use BufferizationOptions (#89175)
This makes the ops used for allocating and copying memrefs
configurable.
It also changes the default behavior because the default allocation in
`BufferizationOptions` creates `memref.alloc` with `alignment = 64`
where we used to create `memref.alloca` without any alignment before.
Fixes
```
// TODO: Use alloc/memcpy callback from BufferizationOptions if called via
// BufferizableOpInterface impl of ToMemrefOp.
```
2024-04-18 15:47:08 +02:00
Kunwar Grover
6f1e23b47d [MLIR][Bufferization] Choose default memory space in tensor copy insertion (#88500)
Tensor copy insertion currently uses memory_space = 0 when creating a
tensor copy using alloc_tensor. This memory space should instead be the
default memory space provided in bufferization options.
2024-04-12 17:56:46 +02:00
Jakub Kuderski
971b852546 [mlir][NFC] Simplify type checks with isa predicates (#87183)
For more context on isa predicates, see:
https://github.com/llvm/llvm-project/pull/83753.
2024-04-01 11:40:09 -04:00
Jianbang Yang
4bb9f918ff [mlir][tensor] fix out-of-bound index in tensor.dim (#85901)
Fix a crash when folding tensor.dim with an out-of-bounds index.

Fixes: https://github.com/llvm/llvm-project/issues/70183
2024-03-25 21:08:18 +08:00
Matthias Springer
dbfc38ed6b [mlir][bufferization] Add BufferOriginAnalysis (#86461)
This commit adds the `BufferOriginAnalysis`, which can be queried to
check if two buffer SSA values originate from the same allocation. This
new analysis is used in the buffer deallocation pass to fold away or
simplify `bufferization.dealloc` ops more aggressively.

The `BufferOriginAnalysis` is based on the `BufferViewFlowAnalysis`,
which collects buffer SSA value "same buffer" dependencies. E.g., given
IR such as:
```
%0 = memref.alloc()
%1 = memref.subview %0
%2 = memref.subview %1
```
The `BufferViewFlowAnalysis` will report the following "reverse"
dependencies (`resolveReverse`) for `%2`: {`%2`, `%1`, `%0`}. I.e., all
buffer SSA values in the reverse use-def chain that originate from the
same allocation as `%2`. The `BufferOriginAnalysis` is built on top of
that. It handles only simple cases at the moment and may conservatively
return "unknown" around certain IR with branches, memref globals and
function arguments.

This analysis enables additional simplifications during
`-buffer-deallocation-simplification`. In particular, "regular" scf.for
loop nests, that yield buffers (or reallocations thereof) in the same
order as they appear in the iter_args, are now handled much more
efficiently. Such IR patterns are generated by the sparse compiler.
2024-03-25 18:57:53 +09:00
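The "reverse" dependency resolution that dbfc38ed6b describes for `%2` can be sketched as a plain graph traversal. `ToyReverseDeps` and `resolveReverseToy` are hypothetical names; the sketch assumes the dependency edges are already known and does not model the conservative "unknown" answers of the real analysis.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy dependency graph: maps a buffer value to the values it is a view of.
using ToyReverseDeps = std::map<std::string, std::vector<std::string>>;

// Collects `value` and everything reachable through reverse dependencies,
// i.e., all values in the reverse use-def chain of `value`.
std::set<std::string> resolveReverseToy(const ToyReverseDeps &deps,
                                        const std::string &value) {
  std::set<std::string> visited;
  std::vector<std::string> worklist = {value};
  while (!worklist.empty()) {
    std::string current = worklist.back();
    worklist.pop_back();
    // Skip values that were already visited (also guards against cycles).
    if (!visited.insert(current).second)
      continue;
    auto it = deps.find(current);
    if (it != deps.end())
      worklist.insert(worklist.end(), it->second.begin(), it->second.end());
  }
  assert(visited.count(value) == 1 && "result always contains the query");
  return visited;
}
```

For the three-op example in the commit message (`%1` is a view of `%0`, `%2` is a view of `%1`), the query for `%2` yields {`%0`, `%1`, `%2`}, and the query for `%0` yields only `%0`.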
Matthias Springer
a45e58af1b [mlir][bufferization] Add BufferViewFlowOpInterface (#78718)
This commit adds the `BufferViewFlowOpInterface` to the bufferization
dialect. This interface can be implemented by ops that operate on
buffers to indicate that a buffer op result and/or region entry block
argument may be the same buffer as a buffer operand (or a view thereof).
This interface is queried by the `BufferViewFlowAnalysis`.

The new interface has two interface methods:
* `populateDependencies`: Implementations use the provided callback to
declare dependencies between operands and op results/region entry block
arguments. E.g., for `%r = arith.select %c, %m1, %m2 : memref<5xf32>`,
the interface implementation should declare two dependencies: %m1 -> %r
and %m2 -> %r.
* `mayBeTerminalBuffer`: An SSA value is a terminal buffer if the buffer
view flow analysis stops at the specified value. E.g., because the value
is a newly allocated buffer or because no further information is
available about the origin of the buffer.

Ops that implement the `RegionBranchOpInterface` or `BranchOpInterface`
do not have to implement the `BufferViewFlowOpInterface`. The buffer
dependencies can be inferred from those two interfaces.

This commit makes the `BufferViewFlowAnalysis` more accurate. For
unknown ops, it conservatively used to declare all combinations of
operands and op results/region entry block arguments as dependencies
(false positives). This is no longer the case. While the analysis is
still a "maybe" analysis with false positives (e.g., when analyzing ops
such as `arith.select` or `scf.if` where the taken branch is not known
at compile time), results and region entry block arguments of unknown
ops are now marked as terminal buffers.

This commit addresses a TODO in `BufferViewFlowAnalysis.cpp`:
```
// TODO: We should have an op interface instead of a hard-coded list of
// interfaces/ops.
```
It is no longer needed to hard-code ops.
2024-03-24 12:48:19 +09:00
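The `populateDependencies` contract from a45e58af1b can be sketched for the `arith.select` example in the commit message. `ToyDependencyFn`, `ToySelectOp` and `collectDependencies` are hypothetical names, and the callback signature is an assumption of this sketch: values are plain strings rather than SSA values.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Callback type: declares that `result` may be the same buffer as `operand`.
using ToyDependencyFn =
    std::function<void(const std::string &operand, const std::string &result)>;

// Toy model of `%result = arith.select %condition, %trueValue, %falseValue`.
struct ToySelectOp {
  std::string condition, trueValue, falseValue, result;

  // Either branch may be forwarded, so both are declared as dependencies.
  // The condition is not a buffer and contributes nothing.
  void populateDependencies(const ToyDependencyFn &registerDependency) const {
    assert(registerDependency && "callback must be callable");
    registerDependency(trueValue, result);
    registerDependency(falseValue, result);
  }
};

// Collects the declared (operand, result) pairs, as an analysis might.
std::vector<std::pair<std::string, std::string>>
collectDependencies(const ToySelectOp &op) {
  std::vector<std::pair<std::string, std::string>> deps;
  op.populateDependencies(
      [&deps](const std::string &operand, const std::string &result) {
        deps.emplace_back(operand, result);
      });
  return deps;
}
```

For `%r = arith.select %c, %m1, %m2`, the collected pairs are `%m1 -> %r` and `%m2 -> %r`, matching the two dependencies named in the commit message.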