Commit Graph

195 Commits

Author SHA1 Message Date
Jay Foad
e87f94a6a8 [llvm-project] Fix typos mutli and mutliple. NFC. (#122880) 2025-01-14 11:59:41 +00:00
Amir Bishara
d9111f19d2 [mlir][bufferization]-Refactor findValueInReverseUseDefChain to accept opOperand (#121304)
Edit the `findValueInReverseUseDefChain` method to accept `OpOperand`
instead of the `Value` type, This change will make sure that the
populated `visitedOpOperands` argument is fully accurate and contains
the opOperand we have started the reverse chain from.
2024-12-30 21:18:38 +02:00
Amir Bishara
08aa956387 [mlir][bufferization]-Replace only one use in TensorEmptyElimination (#118958)
In many cases the emptyTensorElimination can not transform or eliminate
the empty tensor which is being inserted into the
`SubsetInsertionOpInterface`.

Two major reasons for that:

1- Failing when trying to find a legal/suitable insertion point for the
`subsetExtract` which is about to replace the empty tensor. However, we
may try to handle this issue by moving the needed values which
responsible on building the `subsetExtract` nearby the empty tensor
(which is about to be eliminated). Thus increasing the probability to
find a legal insertion point.

2-The EmptyTensorElimination transform replaces the tensor.empty's uses
all at once in one apply, rather than replacing only the specific use
which was visited in the use-def chain (when traversing from the
tensor.insert_slice). This scenario of replacing all the uses of the
tensor.empty may lead into additional read effects after bufferization
of the specific subset extract/subview which should not be the case.

Both cases may result in many copies in the coming bufferization which
can not be canonicalized.

The first case can be noticed when having a `tensor.empty` followed by
`SubsetInsertionOpInterface` (or in simple words `tensor.insert_slice`),
which have been lowered from `tensor/tosa.concat`.

The second case can be noticed when having a `tensor.empty`, with many
uses and leading to applying the transformation only once, since the
whole uses have been replaced at once.

The first commit in the PR only adds the lit tests for the cases shown
above (NFC), to emphasize how the transform works, in the coming MRs
will upload a slight changes to handle these case.

The second commit in this PR, we want to replace only the specific use
which was visited in the `use-def` chain (when traversing from the
`tensor.insert_slice`'s source).
2024-12-18 23:57:13 +02:00
Christopher Bate
ced2fc7819 [mlir][bufferization] Fix OneShotBufferize when defaultMemorySpaceFn is used (#91524)
As described in issue llvm/llvm-project#91518, a previous PR
llvm/llvm-project#78484 introduced the `defaultMemorySpaceFn` into
bufferization options, allowing one to inform OneShotBufferize that it
should use a specified function to derive the memory space attribute
from the encoding attribute attached to tensor types.

However, introducing this feature exposed unhandled edge cases,
examples of which are introduced by this change in the new test under

`test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`.

Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is
pretty simple. This change:

- Updates the `bufferization.to_memref` and `bufferization.to_tensor`
  operations to explicitly include operand and destination types,
  whereas previously they relied on type inference to deduce the
  tensor types. Since the type inference cannot recover the correct
  tensor encoding/memory space, the operand and result types must be
  explicitly included. This is a small assembly format change, but it
  touches a large number of test files.

- Makes minor updates to other bufferization functions to handle the
  changes in building the above ops.

- Updates bufferization of `tensor.from_elements` to handle memory
  space.


Integration/upgrade guide:

In downstream projects, if you have tests or MLIR files that explicitly
use
`bufferization.to_tensor` or `bufferization.to_memref`, then update
them to the new assembly format as follows:

```
%1 = bufferization.to_memref %0 : memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32>
```

becomes

```
%1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32>
%2 = bufferization.to_tensor %0 : memref<10xf32> to tensor<10xf32> 
```
2024-11-26 09:45:57 -07:00
Matthias Springer
cbc7802233 [mlir][bufferization] Remove finalizing-bufferize pass (#114154)
The dialect conversion-based bufferization passes have been migrated to
One-Shot Bufferize about two years ago. To clean up the code base, this
commit removes the `finalizing-bufferize` pass, one of the few remaining
parts of the old infrastructure. Most bufferization passes have already
been removed.

Note for LLVM integration: If you depend on this pass, migrate to
One-Shot Bufferize or copy the pass to your codebase.

Depends on #114152.
2024-11-21 10:51:23 +09:00
Matthias Springer
b0a4e958e8 [mlir][bufferization] Add support for non-unique func.return (#114017)
Multiple `func.return` ops inside of a `func.func` op are now supported
during bufferization. This PR extends the code base in 3 places:

- When inferring function return types, `memref.cast` ops are folded
away only if all `func.return` ops have matching buffer types. (E.g., we
don't fold if two `return` ops have operands with different layout
maps.)
- The alias sets of all `func.return` ops are merged. That's because
aliasing is a "may be" property.
- The equivalence sets of all `func.return` ops are taken only if they
match. If different `func.return` ops have different equivalence sets
for their operands, the equivalence information is dropped. That's
because equivalence is a "must be" property.

This commit is in preparation of removing the deprecated
`func-bufferize` pass. That pass can bufferize functions with multiple
`return` ops.
2024-11-13 08:51:39 +09:00
Matthias Springer
c271ba7f79 [mlir][bufferization] Add support for recursive function calls (#114003)
This commit adds support for recursive function calls to One-Shot
Bufferize.

The analysis does not support recursive function calls. The function
body itself can be analyzed, but we cannot make any assumptions about
the aliasing relation between function result and function arguments.
Similarly, when looking at a `call` op, we do not know whether the
operands will bufferize to a memory read/write. In the absence of such
information, we have to conservatively assume that they do.

This commit is in preparation of removing the deprecated
`func-bufferize` pass. That pass can bufferize recursive functions.
2024-11-05 10:18:35 +09:00
Matthias Springer
217700baf7 [mlir][bufferization] Support bufferization of external functions (#113999)
This commit adds support for bufferizing external functions that have no
body. Such functions were previously rejected by One-Shot Bufferize if
they returned a tensor value.

This commit is in preparation of removing the deprecated
`func-bufferize` pass. That pass can bufferize external functions.

Also update a few comments.
2024-10-30 21:49:10 +09:00
Matthias Springer
ea050ab1a9 [mlir][Transforms][NFC] Dialect conversion: Reformat materialization error message (#114176)
This commit changes the format of the materialization error message.

Previously: `failed to legalize unresolved materialization from ('f64')
to 'f32' that remained live after conversion`
Now: `failed to legalize unresolved materialization from ('f64') to
('f32') that remained live after conversion`

This commit is in preparation of merging the 1:1 and 1:N dialect
conversions. At that point, target materializations may create more than
one SSA value. I am sending this change as a separate PR to keep the
main PR smaller.
2024-10-30 21:36:39 +09:00
Andrzej Warzyński
91c11574e8 Revert "[MLIR] Make OneShotModuleBufferize use OpInterface (#110322)" (#113124)
This reverts commit 2026501cf1.

Failing bot:
  * https://lab.llvm.org/staging/#/builders/125/builds/389
2024-10-22 13:28:44 +01:00
Tzung-Han Juang
2026501cf1 [MLIR] Make OneShotModuleBufferize use OpInterface (#110322)
**Description:** 
This PR replaces a part of `FuncOp` and `CallOp` with
`FunctionOpInterface` and `CallOpInterface` in `OneShotModuleBufferize`.
Also fix the error from an integration test in the a previous PR
attempt. (https://github.com/llvm/llvm-project/pull/107295)

The below fixes skip `CallOpInterface` so that the assertions are not
triggered.


8d78000762/mlir/lib/Dialect/Bufferization/Transforms/OneShotModuleBufferize.cpp (L254-L259)


8d78000762/mlir/lib/Dialect/Bufferization/Transforms/OneShotModuleBufferize.cpp (L311-L315)

**Related Discord Discussion:**
[Link](https://discord.com/channels/636084430946959380/642426447167881246/1280556809911799900)

---------

Co-authored-by: erick-xanadu <110487834+erick-xanadu@users.noreply.github.com>
2024-10-01 15:58:52 +02:00
Matthias Springer
ae7b454f98 Revert "[MLIR] Make OneShotModuleBufferize use OpInterface" (#109919)
Reverts llvm/llvm-project#107295

This commit breaks an integration test:
```
build/bin/mlir-opt mlir/test/Integration/Dialect/Complex/CPU/correctness.mlir  -one-shot-bufferize="bufferize-function-boundaries"
```
2024-09-25 09:17:49 +02:00
Tzung-Han Juang
f586b1e3f4 [MLIR] Make OneShotModuleBufferize use OpInterface (#107295)
**Description:** 

`OneShotModuleBufferize` deals with the bufferization of `FuncOp`,
`CallOp` and `ReturnOp` but they are hard-coded. Any custom
function-like operations will not be handled. The PR replaces a part of
`FuncOp` and `CallOp` with `FunctionOpInterface` and `CallOpInterface`
in `OneShotModuleBufferize` so that custom function ops and call ops can
be bufferized.

**Related Discord Discussion:**
[Link](https://discord.com/channels/636084430946959380/642426447167881246/1280556809911799900)

---------

Co-authored-by: erick-xanadu <110487834+erick-xanadu@users.noreply.github.com>
2024-09-25 07:27:21 +02:00
Matthias Springer
3815f478bb [mlir][Transforms] Dialect conversion: Make materializations optional (#107109)
This commit makes source/target/argument materializations (via the
`TypeConverter` API) optional.

By default (`ConversionConfig::buildMaterializations = true`), the
dialect conversion infrastructure tries to legalize all unresolved
materializations right after the main transformation process has
succeeded. If at least one unresolved materialization fails to resolve,
the dialect conversion fails. (With an error message such as `failed to
legalize unresolved materialization ...`.) Automatic materializations
through the `TypeConverter` API can now be deactivated. In that case,
every unresolved materialization will show up as a
`builtin.unrealized_conversion_cast` op in the output IR.

There used to be a complex and error-prone analysis in the dialect
conversion that predicted the future uses of unresolved
materializations. Based on that logic, some casts (that were deemed to
unnecessary) were folded. This analysis was needed because folding
happened at a point of time when some IR changes (e.g., op replacements)
had not materialized yet.

This commit removes that analysis. Any folding of cast ops now happens
after all other IR changes have been materialized and the uses can
directly be queried from the IR. This simplifies the analysis
significantly. And certain helper data structures such as
`inverseMapping` are no longer needed for the analysis. The folding
itself is done by `reconcileUnrealizedCasts` (which also exists as a
standalone pass).

After casts have been folded, the remaining casts are materialized
through the `TypeConverter`, as usual. This last step can be deactivated
in the `ConversionConfig`.

`ConversionConfig::buildMaterializations = false` can be used to debug
error messages such as `failed to legalize unresolved materialization
...`. (It is also useful in case automatic materializations are not
needed.) The materializations that failed to resolve can then be seen as
`builtin.unrealized_conversion_cast` ops in the resulting IR. (This is
better than running with `-debug`, because `-debug` shows IR where some
IR changes have not been materialized yet.)

Note: This is a reupload of #104668, but with correct handling of cyclic
unrealized_conversion_casts that may be generated by the dialect
conversion.
2024-09-05 19:40:58 +02:00
Matthias Springer
5eda498811 Revert "[mlir][Transforms] Dialect conversion: Make materializations optional" (#106778)
Reverts llvm/llvm-project#104668

This commit triggers an edge case that can cause circular
`unrealized_conversion_cast` ops.
https://github.com/llvm/llvm-project/pull/106760 may fix it, but it is
has other issues. Reverting this PR for now, until I find a solution for
that problem.
2024-08-30 12:34:41 -07:00
Longsheng Mou
7f04a8ad13 [mlir][func][bufferization] Fix cast incompatible when bufferize callOp (#105929)
Handle caller/callee type mismatch using `castOrReallocMemRefValue`
instead of just a `CastOp`. The method insert a reallocation + copy if
it cannot be statically guaranteed that a direct cast would be valid.
Fix #105916.
2024-08-27 07:06:00 +08:00
Matthias Springer
d7073c5274 [mlir][Transforms] Dialect conversion: Make materializations optional (#104668)
This commit makes source/target/argument materializations (via the
`TypeConverter` API) optional.

By default (`ConversionConfig::buildMaterializations = true`), the
dialect conversion infrastructure tries to legalize all unresolved
materializations right after the main transformation process has
succeeded. If at least one unresolved materialization fails to resolve,
the dialect conversion fails. (With an error message such as `failed to
legalize unresolved materialization ...`.) Automatic materializations
through the `TypeConverter` API can now be deactivated. In that case,
every unresolved materialization will show up as a
`builtin.unrealized_conversion_cast` op in the output IR.

There used to be a complex and error-prone analysis in the dialect
conversion that predicted the future uses of unresolved
materializations. Based on that logic, some casts (that were deemed to
unnecessary) were folded. This analysis was needed because folding
happened at a point of time when some IR changes (e.g., op replacements)
had not materialized yet.

This commit removes that analysis. Any folding of cast ops now happens
after all other IR changes have been materialized and the uses can
directly be queried from the IR. This simplifies the analysis
significantly. And certain helper data structures such as
`inverseMapping` are no longer needed for the analysis. The folding
itself is done by `reconcileUnrealizedCasts` (which also exists as a
standalone pass).

After casts have been folded, the remaining casts are materialized
through the `TypeConverter`, as usual. This last step can be deactivated
in the `ConversionConfig`.

`ConversionConfig::buildMaterializations = false` can be used to debug
error messages such as `failed to legalize unresolved materialization
...`. (It is also useful in case automatic materializations are not
needed.) The materializations that failed to resolve can then be seen as
`builtin.unrealized_conversion_cast` ops in the resulting IR. (This is
better than running with `-debug`, because `-debug` shows IR where some
IR changes have not been materialized yet.)
2024-08-23 14:03:10 -07:00
Matthias Springer
2d50029f98 [mlir][Transforms] Dialect conversion: Build unresolved materialization for replaced ops (#101514)
When inserting an argument/source/target materialization, the dialect
conversion framework first inserts a "dummy"
`unrealized_conversion_cast` op (during the rewrite process) and then
(in the "finialize" phase) replaces these cast ops with the IR generated
by the type converter callback.

This is the case for all materializations, except when ops are being
replaced with values that have a different type. In that case, the
dialect conversion currently directly emits a source materialization.
This commit changes the implementation, such that a temporary
`unrealized_conversion_cast` is also inserted in that case.

This commit simplifies the code base: all materializations now happen in
`legalizeUnresolvedMaterialization`. This commit makes it possible to
decouple source/target/argument materializations from the dialect
conversion (to reduce the complexity of the code base). Such
materializations can then also be optional. This will be implemented in
a follow-up commit.

Depends on #101476.

---------

Co-authored-by: Jakub Kuderski <jakub@nod-labs.com>
2024-08-15 11:33:37 +02:00
Dennis Filimonov
6de04e6fe8 [mlir][bufferization] Adding the optimize-allocation-liveness pass (#101827)
Adding a pass that is expected to run after the deallocation pipeline
and will move buffer deallocations right after their last user or
dependency, thus optimizing the allocation liveness.
2024-08-14 13:22:47 +02:00
Giuseppe Rossini
441b672bbd [mlir] Fix block merging (#102038)
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.

What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument. I.e., if the two block arguments
refer to the same `Value`. The operations operands in the block will
point to the argument we already inserted. This needs to happen to all
the arguments we pass to the different successors of the parent block
- After merged the blocks, get rid of "unnecessary" arguments. I.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that the two
simplifications are clashing. I.e., `BufferDeallocationSimplification`
contains an analysis based on the block structure. If we simplify the
block structure (by merging and/or dropping block arguments) the
analysis is invalid . The solution I found is to do a more prudent
simplification when running that pass.

**Note-1**: I ran all the integration tests
(`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.
**Note-2**: I fixed a bug found by @Dinistro in #97697 . The issue was
that, when looking for redundant arguments, I was not considering that
the block might have already some arguments. So the index (in the block
args list) of the i-th `newArgument` is `i+numOfOldArguments`.
2024-08-07 09:10:01 +01:00
Christian Ulmann
6a5a64c56b Revert "[mlir] Fix block merging" (#100510)
Reverts llvm/llvm-project#97697

This commit introduced non-trivial bugs related to type consistency.
2024-07-25 10:42:25 +02:00
Giuseppe Rossini
c63125d453 [mlir] Fix block merging (#97697)
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.

What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument. I.e., if the two block arguments
refer to the same `Value`. The operations operands in the block will
point to the argument we already inserted. This needs to happen to all
the arguments we pass to the different successors of the parent block
- After merged the blocks, get rid of "unnecessary" arguments. I.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that the two
simplifications are clashing. I.e., `BufferDeallocationSimplification`
contains an analysis based on the block structure. If we simplify the
block structure (by merging and/or dropping block arguments) the
analysis is invalid . The solution I found is to do a more prudent
simplification when running that pass.

**Note**: this a rework of #96871 . I ran all the integration tests
(`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.
2024-07-17 17:05:40 +01:00
donald chen
662c6fc74c [mlir] [bufferize] fix bufferize deallocation error in nest symbol table (#98476)
In nested symbols, the dealloc_helper function generated by lower
deallocations pass was incorrectly positioned, causing calls fail. This
patch fixes this issue.
2024-07-15 12:52:46 +08:00
Mehdi Amini
28a11cc492 Revert "Fix block merging" (#97460)
Reverts llvm/llvm-project#96871

Bots are broken.
2024-07-02 20:57:16 +02:00
Giuseppe Rossini
6c3897d90e Fix block merging (#96871)
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.

What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument. I.e., if the two block arguments
refer to the same `Value`. The operations operands in the block will
point to the argument we already inserted
- After merged the blocks, get rid of "unnecessary" arguments. I.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that the two
simplifications are clashing. I.e., `BufferDeallocationSimplification`
contains an analysis based on the block structure. If we simplify the
block structure (by merging and/or dropping block arguments) the
analysis is invalid . The solution I found is to do a more prudent
simplification when running that pass.

**Note**: many tests are still not passing. But I wanted to submit the
code before changing all the tests (and probably adding a couple), so
that we can agree in principle on the algorithm/design.
2024-07-02 17:12:33 +01:00
zhicong zhong
1d4ce574a4 [mlir][bufferization] skip empty tensor elimination if they have different element type (#96998)
In the origin implementation, the empty tensor elimination will add a
`tensor.cast` and eliminate the tensor even if they have different
element type(f32, bf16). Here add a check for element type and skip the
elimination if they are different.
2024-07-01 09:30:04 +08:00
McCowan Zhang
a159b36724 Bufferization with ControlFlow Asserts (#95868)
Fixed incorrect bufferization interaction with cf.assert
- reordered bufferization condition checking
- fixed hasNeitherAllocateNorFreeSideEffect checking bug
- implemented memory interface for cf.assert

---------

Co-authored-by: McCowan Zhang <mccowan.z@ssi.samsung.com>
2024-06-26 08:00:39 +02:00
Matthias Springer
13896b6ce9 [mlir][bufferization] Fix handling of indirect function calls (#94896)
This commit fixes a crash in the ownership-based buffer deallocation
pass when indirectly calling a function via SSA value. Such functions
must be conservatively assumed to be public.

Fixes #94780.
2024-06-10 08:07:24 +02:00
klensy
f0b0c02504 [mlir][test] Fix filecheck annotation typos (#92897)
Moved fixes for mlir from
https://github.com/llvm/llvm-project/pull/91854, plus few additional in
second commit.

---------

Co-authored-by: klensy <nightouser@gmail.com>
2024-05-24 09:24:59 +02:00
Rafael Ubal
a42a2ca19b Avoid buffer hoisting from parallel loops (#90735)
This change corrects an invalid behavior in pass
`--buffer-loop-hoisting`. The pass is in charge of extracting buffer
allocations (e.g., `memref.alloca`) from loop regions (e.g., `scf.for`)
when possible. This works OK for looks with sequential execution
semantics. However, a buffer allocated in the body of a parallel loop
may be concurrently accessed by multiple thread to store its local data.
Extracting such buffer from the loop causes all threads to wrongly share
the same memory region.

In the following example, dimension 1 of the input tensor is reversed.
Dimension 0 is traversed with a parallel loop.

```
func.func @f(%input: memref<2x3xf32>) -> memref<2x3xf32> {
  %c0 = index.constant 0
  %c1 = index.constant 1
  %c2 = index.constant 2
  %c3 = index.constant 3

  %output = memref.alloc() : memref<2x3xf32>
  scf.parallel (%index) = (%c0) to (%c2) step (%c1) {
    // Create subviews for working input and output slices
    %input_slice = memref.subview %input[%index, 2][1, 3][1, -1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, -1], offset: ?>>
    %output_slice = memref.subview %output[%index, 0][1, 3][1, 1] : memref<2x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>

    // Copy the input slice into this temporary buffer. This intermediate
    // copy is unnecessary, but is used for illustration purposes.
    %temp = memref.alloc() : memref<1x3xf32>
    memref.copy %input_slice, %temp : memref<1x3xf32, strided<[3, -1], offset: ?>> to memref<1x3xf32>

    // Copy temporary buffer into output slice
    memref.copy %temp, %output_slice : memref<1x3xf32> to memref<1x3xf32, strided<[3, 1], offset: ?>>
    scf.reduce
  }

  return %output : memref<2x3xf32>
}
```

The patch submitted here prevents `%temp = memref.alloc() :
memref<1x3xf32>` from being hoisted when the containing op is
`scf.parallel` or `scf.forall`. A new op trait called
`HasParallelRegion` is introduced and assigned to these two ops to
indicate that their regions have parallel execution semantics.

@joker-eph @ftynse @nicolasvasilache @sabauma
2024-05-04 08:35:36 +02:00
Gaurav Shukla
97069a8619 [MLIR] Generalize expand_shape to take shape as explicit input (#90040)
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.

The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.

Differential Revision: https://reviews.llvm.org/D140821

---------

Signed-off-by: Gaurav Shukla<gaurav.shukla@amd.com>
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
2024-04-30 09:28:35 -07:00
Mehdi Amini
8c0341df02 Revert "[MLIR] Generalize expand_shape to take shape as explicit input" (#89540)
Reverts llvm/llvm-project#69267

this broke some bots.
2024-04-21 14:33:48 +02:00
Gaurav Shukla
e095d978ba [MLIR] Generalize expand_shape to take shape as explicit input (#69267)
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.

The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.

Differential Revision: https://reviews.llvm.org/D140821

Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
2024-04-21 07:37:02 -04:00
Matthias Gehre
c515c78024 [mlir][Bufferization] castOrReallocMemRefValue: Use BufferizationOptions (#89175)
This allows to configure both the op used for allocation and copy of
memrefs.
It also changes the default behavior because the default allocation in
`BufferizationOptions` creates `memref.alloc` with `alignment = 64`
where we used to create `memref.alloca` without any alignment before.
Fixes
```
// TODO: Use alloc/memcpy callback from BufferizationOptions if called via
// BufferizableOpInterface impl of ToMemrefOp.
```
2024-04-18 15:47:08 +02:00
Kunwar Grover
6f1e23b47d [MLIR][Bufferization] Choose default memory space in tensor copy insertion (#88500)
Tensor copy insertion currently uses memory_space = 0 when creating a
tensor copy using alloc_tensor. This memory space should instead be the
default memory space provided in bufferization options.
2024-04-12 17:56:46 +02:00
Matthias Springer
dbfc38ed6b [mlir][bufferization] Add BufferOriginAnalysis (#86461)
This commit adds the `BufferOriginAnalysis`, which can be queried to
check if two buffer SSA values originate from the same allocation. This
new analysis is used in the buffer deallocation pass to fold away or
simplify `bufferization.dealloc` ops more aggressively.

The `BufferOriginAnalysis` is based on the `BufferViewFlowAnalysis`,
which collects buffer SSA value "same buffer" dependencies. E.g., given
IR such as:
```
%0 = memref.alloc()
%1 = memref.subview %0
%2 = memref.subview %1
```
The `BufferViewFlowAnalysis` will report the following "reverse"
dependencies (`resolveReverse`) for `%2`: {`%2`, `%1`, `%0`}. I.e., all
buffer SSA values in the reverse use-def chain that originate from the
same allocation as `%2`. The `BufferOriginAnalysis` is built on top of
that. It handles only simple cases at the moment and may conservatively
return "unknown" around certain IR with branches, memref globals and
function arguments.

This analysis enables additional simplifications during
`-buffer-deallocation-simplification`. In particular, "regular" scf.for
loop nests, that yield buffers (or reallocations thereof) in the same
order as they appear in the iter_args, are now handled much more
efficiently. Such IR patterns are generated by the sparse compiler.
2024-03-25 18:57:53 +09:00
Matthias Springer
35d3b3430e [mlir][bufferization] Add "bottom-up from terminators" analysis heuristic (#83964)
One-Shot Bufferize currently does not support loops where a yielded
value bufferizes to a buffer that is different from the buffer of the
region iter_arg. In such a case, the bufferization fails with an error
such as:
```
Yield operand #0 is not equivalent to the corresponding iter bbArg
    scf.yield %0 : tensor<5xf32>
```

One common reason for non-equivalent buffers is that an op on the path
from the region iter_arg to the terminator bufferizes out-of-place. Ops
that are analyzed earlier are more likely to bufferize in-place.

This commit adds a new heuristic that gives preference to ops that are
reachable on the reverse SSA use-def chain from a region terminator and
are within the parent region of the terminator. This is expected to work
better than the existing heuristics for loops where an iter_arg is
written to multiple times within a loop, but only one write is fed into
the terminator.

Current users of One-Shot Bufferize are not affected by this change.
"Bottom-up" is still the default heuristic. Users can switch to the new
heuristic manually.

This commit also turns the "fuzzer" pass option into a heuristic,
cleaning up the code a bit.
2024-03-21 14:16:02 +09:00
Matthias Springer
0940be1581 [mlir][bufferization] Never pass ownership to functions (#80655)
Even when `private-function-dynamic-ownership` is set, ownership should
never be passed to the callee. This can lead to double deallocs (#77096)
or use-after-free in the caller because ownership is currently passed
regardless of whether there are any further uses of the buffer in the
caller or not.

Note: This is consistent with the fact that ownership is never passed to
nested regions.

This commit fixes #77096.
2024-02-05 12:11:49 +01:00
lorenzo chelini
84c8d0377d [MLIR][Vector] Implement memory effect for print (#80400)
Add write memory effect for the print operation. The exact memory
behavior is implemented in other print-like operations such as
`transform::PrintOp` or `gpu::printf`.

Providing memory behavior allows using the operation in passes like
buffer deallocation instead of emitting an error.
2024-02-02 11:05:35 +01:00
Matthias Springer
fbb62d449c [mlir][bufferization] Buffer deallocation: Make op preconditions stricter (#75127)
The buffer deallocation pass checks the IR ("operation preconditions")
to make sure that there is no IR that is unsupported. In such a case,
the pass signals a failure.

The pass now rejects all ops with unknown memory effects. We do not know
whether such an op allocates memory or not. Therefore, the buffer
deallocation pass does not know whether a deallocation op should be
inserted or not.

Memory effects are queried from the `MemoryEffectOpInterface` interface.
Ops that do not implement this interface but have the
`RecursiveMemoryEffects` trait do not have any side effects (apart from
the ones that their nested ops may have).

Unregistered ops are now rejected by the pass because they do not
implement the `MemoryEffectOpInterface` and neither do we know if they
have `RecursiveMemoryEffects` or not. All test cases that currently have
unregistered ops are updated to use registered ops.
2024-01-21 11:10:09 +01:00
Matthias Springer
b4f24be7ef [mlir][bufferization] Simplify helper potentiallyAliasesMemref (#78690)
This commit simplifies a helper function in the ownership-based buffer
deallocation pass. Fixes a potential double-free (depending on the
scheduling of patterns).
2024-01-19 13:22:02 +01:00
Sergio Afonso
8fb685fb7e [MLIR][LLVM] Add explicit target_cpu attribute to llvm.func (#78287)
This patch adds the target_cpu attribute to llvm.func MLIR operations
and updates the translation to/from LLVM IR to match "target-cpu"
function attributes.
2024-01-17 14:55:02 +00:00
Matthias Springer
a43641c9db [mlir][bufferization] Fix regionOperatesOnMemrefValues (#75016)
`Region::walk([](Block *b) {...})` does not enumerate blocks that are
direct children of the region. These blocks must be checked manually.
2023-12-12 08:56:23 +09:00
Nicolas Vasilache
3a223f4414 [mlir][Bufferization] Add support for controlled bufferization of alloc_tensor (#70957)
This revision adds support to
`transform.structured.bufferize_to_allocation` to bufferize
`bufferization.alloc_tensor()` ops.
    
This is useful as a means path to control the bufferization of
`tensor.empty` ops that have bene previously
`bufferization.empty_tensor_to_alloc_tensor`'ed.
2023-11-02 11:34:10 +01:00
Matthias Springer
4fbbb7ad7c [mlir][bufferization] Fix ownership computation of unknown ops (#70773)
No ownership is assumed for memref results of ops that implement none of
the relevant interfaces and have no memref operands. This fixes #68948.
2023-11-01 09:26:47 +09:00
Oleksandr "Alex" Zinenko
e4384149b5 [mlir] use transform-interpreter in test passes (#70040)
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter-pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR in to a module when necessary.

Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.

Only the test in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of testing coverage of the transform interpreter logic.
2023-10-24 16:12:34 +02:00
Matthias Springer
4d80eff861 [mlir][bufferization] Ownership-based deallocation: Allow manual (de)allocs (#68648)
Add a new attribute `bufferization.manual_deallocation` that can be
attached to allocation and deallocation ops. Buffers that are allocated
with this attribute are assigned an ownership of "false". Such buffers
can be deallocated manually (e.g., with `memref.dealloc`) if the
deallocation op also has the attribute set. Previously, the
ownership-based buffer deallocation pass used to reject IR with existing
deallocation ops. This is no longer the case if such ops have this new
attribute.

This change is useful for the sparse compiler, which currently
deallocates the sparse tensor buffers by itself.
2023-10-23 09:45:33 +09:00
Matthias Springer
6d88ac11d7 [mlir][bufferization] Transfer restrict during empty tensor elimination (#68729)
Empty tensor elimination is looking for
`bufferization.materialize_in_destination` ops with a `tensor.empty`
source. It replaces the `tensor.empty` with a `bufferization.to_tensor
restrict` of the memref destination. As part of this rewrite, the
`restrict` keyword should be removed, so that no second `to_tensor
restrict` op will be inserted. Such IR would be invalid.
`bufferization.materialize_in_destination` with memref destination and
without the `restrict` attribute are ignored by empty tensor
elimination.

Also relax the verifier of `materialize_in_destination`. The `restrict`
keyword is not generally needed because the op does not expose the
buffer as a tensor.
2023-10-11 08:51:16 -07:00
Matthias Springer
3d0ca2cfe3 [mlir][bufferization] Allow cyclic function graphs without tensors (#68632)
Cyclic function call graphs are generally not supported by One-Shot
Bufferize. However, they can be allowed when a function does not have
tensor arguments or results. This is because it is then no longer
necessary that the callee will be bufferized before the caller.
2023-10-09 17:52:04 -07:00
Matthias Springer
8ee38f3b32 [mlir][bufferization] Follow up for #68074 (#68488)
Address additional comments in #68074. This should have been part of
#68074.
2023-10-07 10:07:17 -07:00