This change:
- makes **scf.if** recursively speculatable like **affine.if** is.
- also introduces related LICM tests for both **scf.if** and
**affine.if**
The current implementation of LocationSnapshotPass takes an
OpPrintingFlags argument and stores it as a member, but does not use it
for printing.
Properly implement the printing flags, also supporting command line args.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
This commit updates the internal `ConversionValueMapping` data structure
in the dialect conversion driver to support 1:N replacements. This is
the last major commit for adding 1:N support to the dialect conversion
driver.
Since #116470, the infrastructure already supports 1:N replacements. But
the `ConversionValueMapping` still stored 1:1 value mappings. To that
end, the driver inserted temporary argument materializations (converting
N SSA values into 1 value). This is no longer the case. Argument
materializations are now entirely gone. (They will be deleted from the
type converter after some time, when we delete the old 1:N dialect
conversion driver.)
Note for LLVM integration: Replace all occurrences of
`addArgumentMaterialization` (except for 1:N dialect conversion passes)
with `addSourceMaterialization`.
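To illustrate what a 1:N value mapping means, here is a minimal,
self-contained sketch. It is not the actual `ConversionValueMapping`
implementation: `ToyValueMapping` and `ValueId` are hypothetical names,
and SSA values are modeled as plain integers.

```c++
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SSA value: just an integer id.
using ValueId = int;

// Toy 1:N mapping: each value maps to one or more replacement values.
class ToyValueMapping {
public:
  // Record that `from` was replaced by the values in `to`.
  void map(ValueId from, std::vector<ValueId> to) {
    assert(mapping.count(from) == 0 && "value was already replaced");
    mapping[from] = std::move(to);
  }

  // Follow replacements transitively until only unmapped values remain.
  // Assumes that the mapping is acyclic.
  std::vector<ValueId> lookup(ValueId v) const {
    auto it = mapping.find(v);
    if (it == mapping.end())
      return {v};
    std::vector<ValueId> result;
    for (ValueId repl : it->second) {
      std::vector<ValueId> nested = lookup(repl);
      result.insert(result.end(), nested.begin(), nested.end());
    }
    return result;
  }

private:
  std::map<ValueId, std::vector<ValueId>> mapping;
};
```

With a 1:1 mapping, the two replacement values of `from` would first have
to be packed into a single value (the role that argument
materializations used to play). Here they are stored directly.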
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
This commit adds a test case that performs two back-to-back 1:N
replacements: `(i16) -> (i16, i16) -> ((i16, i16), (i16, i16))`. For the
moment, 3 argument materializations are inserted. In the future (when
the conversion value mapping supports 1:N), a single target
materialization will be inserted. Addresses a
[comment](https://github.com/llvm/llvm-project/pull/116524#discussion_r1894629711)
in #116524.
The `vector::BroadcastOp` folder expects the source and the result to
have identical element types, and the SCCP pass crashes when it is given
a source with a different element type. We need to guard against this
crash when the element types are not identical but still compatible
(e.g., index vs. integer type).
https://github.com/llvm/llvm-project/issues/120193
During a 1:N replacement (`applySignatureConversion` or
`replaceOpWithMultiple`), the dialect conversion driver used to insert
two materializations:
* Argument materialization: convert N replacement values to 1 SSA value
of the original type `S`.
* Target materialization: convert original type to legalized type `T`.
The target materialization is unnecessary. Subsequent patterns receive
the replacement values via their adaptors. These patterns have their own
type converter. When they see a replacement value of type `S`, they will
automatically insert a target materialization to type `T`. There is no
reason to do this already during the 1:N replacement. (The functionality
used to be duplicated in `remapValues` and `insertNTo1Materialization`.)
Special case: If a subsequent pattern does not have a type converter, it
does *not* insert any target materializations. That's because the
absence of a type converter indicates that the pattern does not care
about type legality. Therefore, it is correct to pass an SSA value of
type `S` (or any other type) to the pattern.
Note: Most patterns in `TestPatterns.cpp` run without a type converter.
To make sure that the tests still behave the same, some of these
patterns now have a type converter.
This commit is in preparation of adding 1:N support to the conversion
value mapping. Before making any further changes to the mapping
infrastructure, I'd like to make sure that the code base around it (that
uses the mapping) is robust.
The greedy rewriter is used in many different flows and it offers a lot
of convenience (work list management, debugging actions, tracing, etc.).
But it combines two kinds of greedy behavior: 1) how ops are matched, and
2) folding wherever it can.
These are independent forms of greediness, and combining them leads to
inefficiency, e.g., when one needs to create different phases in a
lowering and is required to apply patterns in a specific order split
across different passes. Using the driver, one ends up needlessly
retrying folding or having multiple rounds of folding attempts where one
final run would have sufficed.
Of course, folks can locally avoid this behavior by building their own
driver, but this is also a commonly requested feature that folks keep
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated API should just be a find and replace (e.g., of the
`find ./ -type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety),
as the API arguments haven't changed between the two.
This allows for inlining to be somewhat controlled by the user instead
of always inlining everything. External heuristics may be used to place
`no_inline` attributes on individual calls or functions to prevent
inlining.
This PR fixes a bug in `RemoveDeadValues`: `FunctionOpInterface` does
not have an `isDeclaration` method, so the `isExternal` method should be
used instead. Fixes #116347.
This commit adds a new `matchAndRewrite` overload to `ConversionPattern`
to support 1:N replacements. This is the first of two main PRs that
merge the 1:1 and 1:N dialect conversion drivers.
The existing `matchAndRewrite` function supports only 1:1 replacements,
as can be seen from the `ArrayRef<Value>` parameter.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
Operation *op, ArrayRef<Value> operands /*adaptor values*/,
ConversionPatternRewriter &rewriter) const;
```
This commit adds a `matchAndRewrite` overload that is called by the
dialect conversion driver. By default, this new overload dispatches to
the original 1:1 `matchAndRewrite` implementation. Existing
`ConversionPattern`s do not need to be changed as long as there are no
1:N type conversions or value replacements.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
Operation *op, ArrayRef<ValueRange> operands /*adaptor values*/,
ConversionPatternRewriter &rewriter) const {
// Note: getOneToOneAdaptorOperands produces a fatal error if at least one
// ValueRange has 0 or more than 1 value.
return matchAndRewrite(op, getOneToOneAdaptorOperands(operands), rewriter);
}
```
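The default dispatch can be pictured with a small standalone model. This
is a sketch under stated assumptions, not the MLIR implementation:
`toOneToOneOperands` and `ValueId` are hypothetical names, values are
plain integers, and the sketch returns `std::nullopt` where the real
helper produces a fatal error.

```c++
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for an SSA value.
using ValueId = int;

// Flatten 1:N adaptor operands into 1:1 operands. Returns std::nullopt
// if any range holds 0 or more than 1 value.
std::optional<std::vector<ValueId>>
toOneToOneOperands(const std::vector<std::vector<ValueId>> &operands) {
  std::vector<ValueId> flat;
  flat.reserve(operands.size());
  for (const std::vector<ValueId> &range : operands) {
    if (range.size() != 1)
      return std::nullopt;
    flat.push_back(range.front());
  }
  assert(flat.size() == operands.size());
  return flat;
}
```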
The `ConversionValueMapping`, which keeps track of value replacements
and materializations, still does not support 1:N replacements. We still
rely on argument materializations to convert N replacement values back
into a single value. The `ConversionValueMapping` will be generalized to
1:N mappings in the second main PR.
Before handing the adaptor values to a `ConversionPattern`, all argument
materializations are "unpacked". The `ConversionPattern` receives N
replacement values and does not see any argument materializations. This
implementation strategy allows us to use the 1:N infrastructure/API in
`ConversionPattern`s even though some functionality is still missing in
the driver. This strategy was chosen to keep the sizes of the PRs
smaller and to make it easier for downstream users to adapt to API
changes.
This commit also updates the "decompose call graphs" transformation
and the "sparse tensor codegen" transformation to use the new 1:N
`ConversionPattern` API.
Note for LLVM conversion: If you are using a type converter with 1:N
type conversion rules or if your patterns are performing 1:N
replacements (via `replaceOpWithMultiple` or
`applySignatureConversion`), conversion pattern applications will start
failing (fatal LLVM error) with this error message: `pattern 'name' does
not support 1:N conversion`. The name of the failing pattern is shown in
the error message. These patterns must be updated to the new 1:N
`matchAndRewrite` API.
This change removes the restriction on `SymbolUserOpInterface` operations
so they can be used with operations that implement `SymbolOpInterface`.
For example, `memref.global` implements `SymbolOpInterface`, so it can be
used with `memref.get_global`, which implements `SymbolUserOpInterface`:
```
// Define a global constant array.
memref.global "private" constant @global_array : memref<10xi32> = dense<[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]> : tensor<10xi32>
// Access this global constant within a function.
func.func @use_global() {
  %0 = memref.get_global @global_array : memref<10xi32>
  return
}
```
Reference: https://github.com/llvm/llvm-project/pull/116519 and
https://discourse.llvm.org/t/question-on-criteria-for-acceptable-ir-in-removedeadvaluespass/83131
---------
Co-authored-by: Zeeshan Siddiqui <mzs@ntdev.microsoft.com>
This commit adds support for 1:N result type conversions for `func.call`
ops. In that case, argument materializations to the original result type
should be inserted (via `replaceOpWithMultiple`).
This commit is in preparation of merging the 1:1 and 1:N conversion
drivers.
This is intended as a fast pattern rewrite driver for the cases when a
simple walk gets the job done but we would still want to implement it in
terms of rewrite patterns (that can be used with the greedy pattern
rewrite driver downstream).
The new driver is inspired by the discussion in
https://github.com/llvm/llvm-project/pull/112454 and the LLVM Dev
presentation from @matthias-springer earlier this week.
This driver comes with some limitations:
* It does not repeat until a fixpoint or revisit ops modified in place
or newly created ops. In general, it only walks forward (in the
post-order).
* `matchAndRewrite` can only erase the matched op or its descendants.
This is verified under expensive checks.
* It does not perform folding / DCE.
We could probably relax some of these in the future without sacrificing
too much performance.
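The "single forward walk, no fixpoint" behavior can be sketched with a
toy model. Everything here is hypothetical (`ToyOp`, `ToyPattern`,
`walkAndApply`) and only illustrates the traversal order. It is not the
driver's implementation and does not model op erasure.

```c++
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy op: a name plus nested child ops.
struct ToyOp {
  std::string name;
  std::vector<std::unique_ptr<ToyOp>> children;
};

// A toy "pattern": returns true if it rewrote the op in place.
using ToyPattern = std::function<bool(ToyOp &)>;

// Visit every op exactly once in post-order and try the pattern on it.
// Ops modified by the pattern are not revisited and nothing is folded.
int walkAndApply(ToyOp &root, const ToyPattern &pattern) {
  int numRewrites = 0;
  for (std::unique_ptr<ToyOp> &child : root.children)
    numRewrites += walkAndApply(*child, pattern);
  if (pattern(root))
    ++numRewrites;
  return numRewrites;
}
```

With a pattern that renames `a` to `b` and `b` to `c`, one walk turns an
`a` op into `b` and stops there. A driver that iterates to a fixpoint
would continue until the op is named `c`.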
This commit changes the format of the materialization error message.
Previously: `failed to legalize unresolved materialization from ('f64')
to 'f32' that remained live after conversion`
Now: `failed to legalize unresolved materialization from ('f64') to
('f32') that remained live after conversion`
This commit is in preparation of merging the 1:1 and 1:N dialect
conversions. At that point, target materializations may create more than
one SSA value. I am sending this change as a separate PR to keep the
main PR smaller.
Fixes #107870.
We can allow the enclosing Module operation to have a symbol.
The check was likely not written with this case in mind: it was intended
to catch symbols inside the region, without accounting for the fact that
the walk also visits the enclosing operation.
For induction variables of index type, the indexing math is better
represented using affine ops such as `affine.delinearize_index`.
This also further demonstrates that some of these `affine` ops might
need to move to a different dialect. For one, these ops only support
`IndexType` when they should be able to work with any integer type.
This change also includes some canonicalization patterns for the
`affine.delinearize_index` operation to
1) Drop unit `basis` values
2) Remove the `delinearize_index` op when the `linear_index` is a loop
induction variable of a normalized loop and the `basis` is of size 1 and
is also the upper bound of the normalized loop.
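The arithmetic behind both canonicalizations can be checked with a small
standalone sketch. It assumes a row-major delinearization and an
in-bounds, non-negative linear index; `delinearizeIndex` is a
hypothetical helper, not an MLIR API.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split a linear index into one index per basis element (row-major).
// Assumes 0 <= linearIndex < product(basis) and positive basis values.
std::vector<int64_t> delinearizeIndex(int64_t linearIndex,
                                      const std::vector<int64_t> &basis) {
  std::vector<int64_t> result(basis.size());
  // Walk from the innermost (fastest-varying) dimension outwards.
  for (std::size_t i = basis.size(); i-- > 0;) {
    assert(basis[i] > 0 && "basis values must be positive");
    result[i] = linearIndex % basis[i];
    linearIndex /= basis[i];
  }
  return result;
}
```

A unit basis value always yields index 0, so it carries no information
and can be dropped. With a single basis value equal to the upper bound
of a normalized loop, the result is the induction variable itself, so
the op can be removed.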
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Handle dropped block arguments and dropped op results in the same way:
build a source materialization (that may fold away if unused). This
simplifies the code base a bit and makes it possible to merge
`legalizeConvertedArgumentTypes` and `legalizeConvertedOpResultTypes` in
a future commit. These two functions are almost doing the same thing
now.
As a side effect, this commit also changes the dialect conversion such
that temporary circular cast ops are no longer generated. (There was a
workaround in #107109 that can now be removed again.) Example:
```
%0 = "builtin.unrealized_conversion_cast"(%1) : (!a) -> !b
%1 = "builtin.unrealized_conversion_cast"(%0) : (!b) -> !a
// No further uses of %0, %1.
```
This happened when:
1. An op was erased. (No replacement values provided.)
2. A conversion pattern for another op builds a replacement value for
the erased op's results (first cast op) during `remapValues`, but that
SSA value is not used during the pattern application.
3. During the finalization phase, `legalizeConvertedOpResultTypes`
thinks that the erased op is alive because of the cast op that was built
in Step 2. It builds a cast from that replacement value to the original
type.
4. During the commit phase, all uses of the original op are replaced
with the casted value produced in Step 3. We have generated circular IR.
This problem can be avoided by making sure that source materializations
are generated for all dropped results. This ensures that we always have
some replacement SSA value in the mapping. Previously, we sometimes had
a value mapped and sometimes not. (No special casing is needed anymore
to distinguish between "value dropped" and "value replaced with SSA
value".)
This commit makes source/target/argument materializations (via the
`TypeConverter` API) optional.
By default (`ConversionConfig::buildMaterializations = true`), the
dialect conversion infrastructure tries to legalize all unresolved
materializations right after the main transformation process has
succeeded. If at least one unresolved materialization fails to resolve,
the dialect conversion fails. (With an error message such as `failed to
legalize unresolved materialization ...`.) Automatic materializations
through the `TypeConverter` API can now be deactivated. In that case,
every unresolved materialization will show up as a
`builtin.unrealized_conversion_cast` op in the output IR.
There used to be a complex and error-prone analysis in the dialect
conversion that predicted the future uses of unresolved
materializations. Based on that logic, some casts (that were deemed
unnecessary) were folded. This analysis was needed because folding
happened at a point in time when some IR changes (e.g., op replacements)
had not materialized yet.
This commit removes that analysis. Any folding of cast ops now happens
after all other IR changes have been materialized and the uses can
directly be queried from the IR. This simplifies the analysis
significantly. And certain helper data structures such as
`inverseMapping` are no longer needed for the analysis. The folding
itself is done by `reconcileUnrealizedCasts` (which also exists as a
standalone pass).
After casts have been folded, the remaining casts are materialized
through the `TypeConverter`, as usual. This last step can be deactivated
in the `ConversionConfig`.
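As a rough illustration of what folding a chain of casts means, consider
the following toy model. It is a sketch only and not the algorithm used
by `reconcileUnrealizedCasts`: `ToyCast`, `CastMap` and `foldCastChain`
are hypothetical names, values are integers and types are strings.

```c++
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical model of a single-operand, single-result cast op.
struct ToyCast {
  int input;           // id of the value being cast
  std::string inType;  // type of `input`
  std::string outType; // type of the cast result
};

// Casts keyed by the id of their result value.
using CastMap = std::map<int, ToyCast>;

// Walk backwards through the casts that define `value`. If a value in
// the chain already has type `wanted`, return it. Otherwise, return
// `value` unchanged.
int foldCastChain(const CastMap &casts, int value, const std::string &wanted) {
  int current = value;
  // Bound the walk by the number of casts so that cyclic casts terminate.
  for (std::size_t step = 0; step < casts.size(); ++step) {
    auto it = casts.find(current);
    if (it == casts.end())
      break;
    if (it->second.inType == wanted)
      return it->second.input;
    current = it->second.input;
  }
  return value;
}
```

For a chain `!a -> !b -> !a`, asking for the `!a` value at the end of
the chain returns the original value, so both casts become unused.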
`ConversionConfig::buildMaterializations = false` can be used to debug
error messages such as `failed to legalize unresolved materialization
...`. (It is also useful in case automatic materializations are not
needed.) The materializations that failed to resolve can then be seen as
`builtin.unrealized_conversion_cast` ops in the resulting IR. (This is
better than running with `-debug`, because `-debug` shows IR where some
IR changes have not been materialized yet.)
Note: This is a reupload of #104668, but with correct handling of cyclic
unrealized_conversion_casts that may be generated by the dialect
conversion.
This fixes #94520 by ensuring that a block is not considered a candidate
for merging if any of its block arguments are used outside of the
original block.
More details: the root cause of the issue described in #94520 was that
`^bb2` and `^bb5` were being merged even though `%4` (an argument to
`^bb2`) was used later in `^bb7`. When the block merge occurred, it
unintentionally changed the value of `%4` for all downstream code. This
change prevents that from happening.
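The added condition can be sketched as follows. This is a toy model with
hypothetical names (`ToyBlock`, `isMergeCandidate`), not the code in the
region simplification utilities.

```c++
#include <cassert>
#include <vector>

// Hypothetical stand-in for a block identifier.
using BlockId = int;

struct ToyBlock {
  BlockId id = 0;
  // For each block argument: the blocks that contain a use of it.
  std::vector<std::vector<BlockId>> argUserBlocks;
};

// A block is a merge candidate only if none of its arguments are used
// outside of the block itself.
bool isMergeCandidate(const ToyBlock &block) {
  for (const std::vector<BlockId> &users : block.argUserBlocks)
    for (BlockId user : users)
      if (user != block.id)
        return false;
  return true;
}
```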
The `buffer-results-to-out-params` pass dereferences a null pointer when
the `hoist-static-allocs` option is on and the return value of a
function is one of the function's parameters. This PR fixes this issue.
This commit changes the inlining to also update the locations of block
arguments. Not updating these locations leads to LLVM IR verification
issues when exporting converted block arguments to phi nodes. This lack
of location update was not visible due to ignoring the argument
locations until recently.
Relevant change: https://github.com/llvm/llvm-project/pull/105534
Mem2Reg assumes SSA dependencies but did not check for graph regions.
This fixes it.
---------
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
There was a typo in the code path that removes unnecessary
materializations.
Before: Update `opResult` (result of an op different from `user`) in
mapping and remove `user`.
```
replaceMaterialization(rewriterImpl, opResult, inputOperands,
inverseMapping);
necessaryMaterializations.remove(materializationOps.lookup(user));
```
After: Update `user->getResults()` in mapping and remove `user`.
```
replaceMaterialization(rewriterImpl, user->getResults(), inputOperands,
inverseMapping);
necessaryMaterializations.remove(materializationOps.lookup(user));
```
When inserting an argument/source/target materialization, the dialect
conversion framework first inserts a "dummy"
`unrealized_conversion_cast` op (during the rewrite process) and then
(in the "finalize" phase) replaces these cast ops with the IR generated
by the type converter callback.
This is the case for all materializations, except when ops are being
replaced with values that have a different type. In that case, the
dialect conversion currently directly emits a source materialization.
This commit changes the implementation, such that a temporary
`unrealized_conversion_cast` is also inserted in that case.
This commit simplifies the code base: all materializations now happen in
`legalizeUnresolvedMaterialization`. This commit makes it possible to
decouple source/target/argument materializations from the dialect
conversion (to reduce the complexity of the code base). Such
materializations can then also be optional. This will be implemented in
a follow-up commit.
Depends on #101476.
---------
Co-authored-by: Jakub Kuderski <jakub@nod-labs.com>
This commit adds three matchers that, unlike the m_NonZero matcher,
match not only constants but also operations that implement the
InferIntRangeInterface. These matchers can then match a non-zero value
or a value that is not minus one based on the inferred range. Additionally,
the commit uses the new matchers in the getSpeculatability functions of
Arith's signed and unsigned integer divisions. At the moment, the
matchers only look at the defining operation to avoid expensive IR walks.
These range-based matchers can be useful when hoisting divisions out of
a loop, which requires knowing that the divisor is non-zero and, for
signed divisions, not minus one. Just checking for a constant divisor may
not be sufficient if the divisor is, for example, the result of an
operation that returns the number of threads of a team of threads.
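The range reasoning can be sketched in isolation. This is a simplified
model, not the matcher implementation: `ToyRange` is a hypothetical
inclusive signed range standing in for an inferred integer range, and
the function names are made up for illustration.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical inclusive signed range of the values an SSA value can take.
struct ToyRange {
  int64_t min;
  int64_t max;
};

// True if `value` provably lies outside of the range.
bool excludes(const ToyRange &range, int64_t value) {
  assert(range.min <= range.max && "empty range");
  return value < range.min || value > range.max;
}

// A divisor is safe for hoisting a division if it cannot be zero.
bool isKnownNonZero(const ToyRange &divisor) { return excludes(divisor, 0); }

// Signed division must also rule out -1, because dividing the smallest
// integer by -1 overflows.
bool isSafeSignedDivisor(const ToyRange &divisor) {
  return excludes(divisor, 0) && excludes(divisor, -1);
}
```

For instance, a thread count with range `[1, 1024]` is a safe divisor
for both kinds of division even though it is not a constant.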
When a function argument is annotated with the `llvm.byval` attribute,
[LLVM expects](https://llvm.org/docs/LangRef.html#parameter-attributes)
the function argument type to be an `llvm.ptr`. For example:
```
func.func (%args0 : llvm.ptr {llvm.byval = !llvm.struct<(i32)>}) {
  ...
}
```
Unfortunately, this makes the type conversion context-dependent, which
is something that the type conversion infrastructure (i.e.,
`LLVMTypeConverter` in this particular case) doesn't support. For
example, we may want to convert `MyType` to `llvm.struct<(i32)>` in
general, but to an `llvm.ptr` type only when it's a function argument
passed by value.
To fix this problem, this PR changes the FuncToLLVM conversion logic to
generate an `llvm.ptr` when the function argument has a `llvm.byval`
attribute. An `llvm.load` is inserted into the function to retrieve the
value expected by the argument users.
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.
What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument, i.e., if the two block arguments
refer to the same `Value`. The operands of the operations in the block
will point to the argument we already inserted. This needs to happen for
all the arguments we pass to the different successors of the parent
block.
- After merging the blocks, get rid of "unnecessary" arguments, i.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that
`BufferDeallocationSimplification` contains an analysis based on the
block structure. If we simplify the block structure (by merging and/or
dropping block arguments), the analysis becomes invalid. The solution I
found is to do a more prudent simplification when running that pass.
**Note-1**: I ran all the integration tests
(`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.
**Note-2**: I fixed a bug found by @Dinistro in #97697. The issue was
that, when looking for redundant arguments, I was not considering that
the block might already have some arguments. So the index (in the block
args list) of the i-th `newArgument` is `i+numOfOldArguments`.
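The redundancy check and the index offset can be sketched with a toy
model. The names below (`findRedundantArgs`, `ValueId`) are hypothetical
and values are plain integers. This is not the code from the PR.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an SSA value.
using ValueId = int;

// predecessorOperands[p][i] is the value that predecessor p passes for
// the i-th newly added block argument. Returns the positions, in the
// block's full argument list, of the new arguments that receive the same
// value from every predecessor.
std::vector<std::size_t>
findRedundantArgs(const std::vector<std::vector<ValueId>> &predecessorOperands,
                  std::size_t numOldArguments) {
  std::vector<std::size_t> redundant;
  if (predecessorOperands.empty())
    return redundant;
  const std::size_t numNewArgs = predecessorOperands.front().size();
  for (std::size_t i = 0; i < numNewArgs; ++i) {
    bool allSame = true;
    for (const std::vector<ValueId> &operands : predecessorOperands) {
      assert(operands.size() == numNewArgs && "operand count mismatch");
      if (operands[i] != predecessorOperands.front()[i])
        allSame = false;
    }
    // The block may already have arguments, so offset the position.
    if (allSame)
      redundant.push_back(i + numOldArguments);
  }
  return redundant;
}
```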
This reverts commit 2aa96fcf75.
This was merged without a test. Also it seems it was only fixing an
issue for users which used a particular workaround that is not actually
needed anymore (skipping UnrealizedConversionCast operands).
This code got lost in #97213 and there was no test for it. Add it back
with an MLIR test.
When a pattern is run without a type converter, we can assume that the
new block argument types of a signature conversion are legal. That's
because they were specified by the user. This won't work for 1->N
conversions due to limitations in the dialect conversion infrastructure,
so the original `FIXME` has to stay in place.
While we have had a Properties.td that allowed for defining
non-attribute-backed properties, such properties were not plumbed
through the basic autogeneration facilities available to attributes,
forcing those who want to migrate to the new system to write such code
by hand.
## Potentially breaking changes
- The `setFoo()` methods on `Properties` struct no longer take their
inputs by const reference. Those wishing to pass non-owned values of a
property by reference to constructors and setters should set the
interface type to `const [storageType]&`
- Adapters and operations now define getters and setters for properties
listed in ODS, which may conflict with custom getters.
- Builders now include properties listed in ODS specifications,
potentially conflicting with custom builders with the same type
signature.
## Extensions to the `Property` class
This commit adds several fields to the `Property` class, including:
- `parser`, `optionalParser`, and `printer` (for parsing/printing
properties of a given type in ODS syntax)
- `storageTypeValueOverride`, an extension of `defaultValue` to allow
the storage and interface type defaults to differ
- `baseProperty` (allowing for classes like `DefaultValuedProperty`)
Existing fields have also had their documentation comments updated.
This commit does not add a `PropertyConstraint` analogous to
`AttrConstraint`, but this is a natural evolution of the work here.
This commit also adds the concrete property kinds `I32Property`,
`I64Property`, `UnitProperty` (and special handling for it like for
UnitAttr), and `BoolProperty`.
## Property combinators
`Properties.td` also now includes several ways to combine properties.
One is `ArrayProperty<Property elem>`, which now stores a
variable-length array of some property as
`SmallVector<elem.storageType>` and uses `ArrayRef<elem.storageType>` as
its interface type. It has `IntArrayProperty` subclasses that change its
conversion to attributes to use `DenseI[N]Attr`s instead of an
`ArrayAttr`.
Similarly, `OptionalProperty<Property p>` wraps a property's storage in
`std::optional<>` and adds a `std::nullopt` default value. In the case
where the underlying property can be parsed optionally but doesn't have
its own default value, `OptionalProperty` can piggyback off the optional
parser to produce a cleaner syntax, as opposed to its general form,
which is either `none` or `some<[value]>`.
(Note that `OptionalProperty` can be nested if desired).
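The general `none` / `some<[value]>` form, including nesting, can be
mimicked in plain C++ with `std::optional`. This is an illustrative
sketch only: `printValue` is a hypothetical helper and not the printer
generated by ODS.

```c++
#include <cassert>
#include <optional>
#include <string>

// Print a plain integer value.
std::string printValue(int value) { return std::to_string(value); }

// Print an optional value in the general `none` / `some<value>` form.
// Nested optionals recurse into this same overload.
template <typename T>
std::string printValue(const std::optional<T> &value) {
  if (!value)
    return "none";
  return "some<" + printValue(*value) + ">";
}
```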
## Autogeneration changes
Operations and adaptors now support getters and setters for properties
like those for attributes. Unlike for attributes, there aren't separate
value and attribute forms, since there is no `FooAttr()` available for a
`getFooAttr()` to return.
The largest change is to operation formats. Previously, properties could
only be used in custom directives. Now, they can be used anywhere an
attribute could be used, and have parsers and printers defined in their
tablegen records.
These updates include special `UnitProperty` logic like that used for
`UnitAttr`.
## Misc.
Some attempt has been made to test the new functionality.
This commit takes tentative steps towards updating the documentation to
account for properties. A full update will be in order once any followup
work has been completed and the interfaces have stabilized.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
This commit fixes a crash in the dialect conversion when applying a
signature conversion to a block inside of a detached region.
This fixes an issue reported in
4114d5be87 (r1691809730).
This commit simplifies the handling of dropped arguments and updates
some dialect conversion documentation that is outdated.
When converting a block signature, a `BlockTypeConversionRewrite` object
and potentially multiple `ReplaceBlockArgRewrite` are created. During
the "commit" phase, uses of the old block arguments are replaced with
the new block arguments, but the old implementation was written in an
inconsistent way: some block arguments were replaced in
`BlockTypeConversionRewrite::commit` and some were replaced in
`ReplaceBlockArgRewrite::commit`. The new
`BlockTypeConversionRewrite::commit` implementation is much simpler and
no longer modifies any IR; that is done only in `ReplaceBlockArgRewrite`
now. The `ConvertedArgInfo` data structure is no longer needed.
To that end, materializations of dropped arguments are now built in
`applySignatureConversion` instead of `materializeLiveConversions`; the
latter function no longer has to deal with dropped arguments.
Other minor improvements:
- Add more comments to `applySignatureConversion`.
Note: Error messages around failed materializations for dropped basic
block arguments changed slightly. That is because those materializations
are now built in `legalizeUnresolvedMaterialization` instead of
`legalizeConvertedArgumentTypes`.
This commit is in preparation of decoupling argument/source/target
materializations from the dialect conversion.
This is a re-upload of #96207.
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.
What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument, i.e., if the two block arguments
refer to the same `Value`. The operands of the operations in the block
will point to the argument we already inserted. This needs to happen for
all the arguments we pass to the different successors of the parent
block.
- After merging the blocks, get rid of "unnecessary" arguments, i.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that
`BufferDeallocationSimplification` contains an analysis based on the
block structure. If we simplify the block structure (by merging and/or
dropping block arguments), the analysis becomes invalid. The solution I
found is to do a more prudent simplification when running that pass.
**Note**: this is a rework of #96871. I ran all the integration tests
(`-DMLIR_INCLUDE_INTEGRATION_TESTS=ON`) and they passed.
This commit fixes a bug in the dialect conversion. During a 1:N
signature conversion, the dialect conversion did not insert a cast back
to the original block argument type, producing invalid IR.
See `test-block-legalization.mlir`: Without this commit, the operand
type of the op changes because an `unrealized_conversion_cast` is
missing:
```
"test.consumer_of_complex"(%v) : (!llvm.struct<(f64, f64)>) -> ()
```
To implement this fix, it was necessary to change the meaning of
argument materializations. An argument materialization now maps from the
new block argument types to the original block argument type. (It now
behaves almost like a source materialization.) This also addresses a
`FIXME` in the code base:
```
// FIXME: The current argument materialization hook expects the original
// output type, even though it doesn't use that as the actual output type
// of the generated IR. The output type is just used as an indicator of
// the type of materialization to do. This behavior is really awkward in
// that it diverges from the behavior of the other hooks, and can be
// easily misunderstood. We should clean up the argument hooks to better
// represent the desired invariants we actually care about.
```
It is no longer necessary to distinguish between the "output type" and
the "original output type".
Most type converters are already written according to the new API. (Most
implementations use the same conversion functions as for source
materializations.) One exception is the MemRef-to-LLVM type converter,
which materialized an `!llvm.struct` based on the elements of a memref
descriptor. It still does that, but casts the `!llvm.struct` back to the
original memref type. The dialect conversion inserts a target
materialization (to `!llvm.struct`) which cancels out with the other
cast.
This commit also fixes a bug in `computeNecessaryMaterializations`. The
implementation did not account for the possibility that a value was
replaced multiple times. E.g., replace `a` by `b`, then `b` by `c`.
This commit also adds a transform dialect op to populate SCF-to-CF
patterns. This transform op was needed to write a test case. The bug
described here appears only during a complex interplay of 1:N signature
conversions and op replacements. (I was not able to trigger it with ops
and patterns from the `test` dialect without duplicating the `scf.if`
pattern.)
Note for LLVM integration: Make sure that all
`addArgument/Source/TargetMaterialization` functions produce an SSA
value of the specified type.
Depends on #98743.
With this PR I am trying to address:
https://github.com/llvm/llvm-project/issues/63230.
What changed:
- While merging identical blocks, don't add a block argument if it is
"identical" to another block argument, i.e., if the two block arguments
refer to the same `Value`. The operands of the operations in the block
will point to the argument we already inserted.
- After merging the blocks, get rid of "unnecessary" arguments, i.e., if
all the predecessors pass the same block argument, there is no need to
pass it as an argument.
- This last simplification clashed with
`BufferDeallocationSimplification`. The reason, I think, is that
`BufferDeallocationSimplification` contains an analysis based on the
block structure. If we simplify the block structure (by merging and/or
dropping block arguments), the analysis becomes invalid. The solution I
found is to do a more prudent simplification when running that pass.
**Note**: many tests are still not passing. But I wanted to submit the
code before changing all the tests (and probably adding a couple), so
that we can agree in principle on the algorithm/design.