The proper layering here is that Inliner depends on InliningUtils, and
not the other way round. Maybe it's time to give InliningUtils a less
terrible file name.
The current inliner disables inlining when the caller is in a region with
the SingleBlock trait while the callee function contains multiple blocks.
The SingleBlock trait is used in operations such as do/while-style loops
(for example fir.do_loop and fir.iterate_while) and fir.if. Typically,
calls within loops are good candidates for inlining. However, functions
with multiple blocks are also common; for example, any function with
"if () then return" will result in multiple blocks in MLIR.
This change gives a customized inliner the flexibility to handle such
cases via two hooks:
doClone: clones instructions and other information from the callee
function into the caller function.
canHandleMultipleBlocks: checks if functions with multiple blocks can be
inlined into a region with the SingleBlock trait.
The default behavior of the inliner remains unchanged.
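As a minimal sketch, assuming the two hooks are exposed through
`InlinerConfig`-style setters (the setter names and the callback
signature below are assumptions, not verbatim from this change), a
custom inliner could opt in like this:
```c++
// Hedged sketch: opting a custom inliner into multi-block inlining.
// Setter names and the callback signature are assumptions.
InlinerConfig config;
config.setCloneCallback([](OpBuilder &builder, Region *src,
                           Block *inlineBlock, Block *postInsertBlock,
                           IRMapping &mapper, bool shouldCloneInlinedRegion) {
  // doClone: copy the callee's body into the caller, remapping operands
  // and results through `mapper`.
  src->cloneInto(inlineBlock->getParent(), mapper);
});
// canHandleMultipleBlocks: declare that the custom clone callback can
// inline multi-block callees into SingleBlock regions.
config.setCanHandleMultipleBlocks(true);
```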
---------
Co-authored-by: jeanPerier <jean.perier.polytechnique@gmail.com>
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
When folding an op during a conversion, first try to legalize all
generated constants, then replace the original operation. This is
slightly more efficient because fewer rewrites must be rolled back in
case a generated constant could not be legalized.
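A rough sketch of the reordering; the helper names below are
illustrative stand-ins for the driver internals, not actual API:
```c++
// Hedged sketch of the new ordering (illustrative helper names).
SmallVector<Value> foldResults;
if (succeeded(tryToFold(op, foldResults))) {
  // 1. Legalize the constants materialized by folding. If one of them
  //    cannot be legalized, only these few rewrites are rolled back.
  for (Operation *cst : generatedConstants)
    if (failed(legalize(cst)))
      return rollbackGeneratedConstants();
  // 2. Only then replace the original operation.
  rewriter.replaceOp(op, foldResults);
}
```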
Note: This is in preparation for the One-Shot Dialect Conversion
refactoring.
I observed that we have boundary comments in the codebase like:
```
//===----------------------------------------------------------------------===//
// ...
//===----------------------------------------------------------------------===//
```
I also observed that there are incomplete boundary comments, such as the
following, where the closing line is missing. The revision is generated
by a script that completes the boundary comments:
```
//===----------------------------------------------------------------------===//
// ...
...
```
Signed-off-by: hanhanW <hanhan0912@gmail.com>
This commit adds an additional overload to `replaceOpWithMultiple` that
accepts additional container types. This has been brought up by users of
the new `replaceOpWithMultiple` API.
In particular, one missing container type was
`SmallVector<SmallVector<Value>>`. The "default" `ArrayRef<ValueRange>`
container type can lead to use-after-scope errors in cases such as:
```c++
// Compute the replacement value ranges. Some replacements are single
// values, some are value ranges.
SmallVector<ValueRange> repl;
repl.push_back(someValueRange); // OK
for (...) {
  // push_back(Value) triggers an implicit conversion to ValueRange,
  // which does not own the range.
  repl.push_back(someValue); // triggers use-after-scope later
}
rewriter.replaceOpWithMultiple(op, repl);
```
In this example, users should use `SmallVector<SmallVector<Value>>`
instead, as sketched below.
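The safe variant, reusing the hypothetical `someValueRange`/`someValue`
names from the snippet above:
```c++
// Each inner SmallVector owns its values, so no range dangles.
SmallVector<SmallVector<Value>> repl;
repl.push_back(llvm::to_vector(someValueRange));
for (...) {
  repl.push_back({someValue}); // owning copy of the single value
}
rewriter.replaceOpWithMultiple(op, repl);
```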
We can use `*Set::insert_range` to collapse:
  for (auto Elem : Range)
    Set.insert(Elem);
down to:
  Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
When inlining a `callee` with a call site debug location, the inlining
infrastructure was trivially combining the `callee` and the `caller`
locations, forming a "tree" of call stacks. Because of this, the remarks
were printing an incomplete inlining stack.
This commit handles this case and appends the `caller` location at the
end of the `callee`'s stack, extending the chain.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
  Dest.insert(Src.begin(), Src.end());
with:
  Dest.insert_range(Src);
This patch does not touch ranges with custom iterators (like succ_begin)
for now.
Add runtime verification for `memref.dim`: check that the index is in
bounds.
Also simplify the pass pipeline for all memref runtime verification
checks.
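As a sketch of the generated check for `memref.dim %m, %i` (the exact IR
emitted by the pass is an assumption; the builders used are standard
upstream ops):
```c++
// Hedged sketch: assert `0 <= i < rank(m)` at runtime.
Value zero = builder.create<arith::ConstantIndexOp>(loc, 0);
Value rank = builder.create<memref::RankOp>(loc, builder.getIndexType(),
                                            memrefValue);
Value geZero = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::sge,
                                             index, zero);
Value ltRank = builder.create<arith::CmpIOp>(loc, arith::CmpIPredicate::slt,
                                             index, rank);
Value inBounds = builder.create<arith::AndIOp>(loc, geZero, ltRank);
builder.create<cf::AssertOp>(loc, inBounds,
                             "memref.dim: index is out of bounds");
```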
This commit adds 1:N support to `SignatureConversion::remapInputs`. This
API allows users to replace a block argument with multiple replacement
values. (And the block argument is dropped.) The API already supported
"bbarg --> multiple bbargs" mappings, but "bbarg --> multiple SSA
values" was missing.
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
205c5325b3
added a transform utility that moved all SSA dependences of an operation
before an insertion point. Similar to that, this PR adds a transform
utility function, `moveValueDefinitions`, to move the slice of operations
that define all values in a `ValueRange` before the insertion point.
While very similar to `moveOperationDependencies`, this method differs
in a few ways:
1. When computing the backward slice, since the start of the slice is a
value, the computed slice needs to be inclusive.
2. The combined backward slice needs to be sorted topologically before
moving the ops, to avoid SSA use-def violations while moving individual
ops.
The PR also adds a new transform op to test this new utility function.
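A minimal usage sketch, with the signature inferred from the description
above:
```c++
// Hedged sketch: move the ops defining all values in `values` (and
// their backward slice) to before `insertionPoint`.
if (failed(moveValueDefinitions(rewriter, values, insertionPoint)))
  return failure();
```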
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
The added utility method moves the definitions of all SSA values that an
operation depends upon to before an insertion point. This is useful
during transformations where such movement can make subsequent
transformations (like fusion) more powerful.
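A usage sketch analogous to the `moveValueDefinitions` one above (the
signature is assumed from the description):
```c++
// Hedged sketch: move every op that defines an SSA dependence of `op`
// to before `insertionPoint`; fails without modifying the IR if the
// move is not possible.
if (failed(moveOperationDependencies(rewriter, op, insertionPoint)))
  return rewriter.notifyMatchFailure(op, "failed to move dependencies");
```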
To test the utility, add a transform dialect op that calls the move
operation. To capture the `notifyMatchFailure` messages from the
transformation and to report/check them in the test, modify the
`ErrorCheckingTrackingListener` to capture the last match failure
notification.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Currently, when `GreedyPatternRewriteDriver` fails, the log output
contains nested failure messages:
```bash
} -> failure : pattern failed to match
} -> failure : pattern failed to match
```
This may seem redundant, but these messages refer to different aspects
of the pattern application logic. This patch clarifies the distinction
by separately logging:
* Success/failure for a specific pattern (e.g., "_this pattern_ failed
to match on the Op currently being processed").
* Success/failure for an operation as a whole (e.g., "_all patterns_
failed to match the Op currently being processed").
Before (example with success):
```bash
Processing operation : (...) {
* Pattern (...) -> ()' {
Trying to match "..."
** Match Failure : (...)
} -> failure : pattern failed to match
* Pattern (...) -> ()' {
Trying to match "..."
} -> success : pattern applied successfully
} -> success : pattern matched
```
After (example with success):
```bash
Processing operation : (...) {
* Pattern (...) -> ()' {
Trying to match "..."
** Match Failure : (...)
} -> failure : pattern failed to match
* Pattern (...) -> ()' {
Trying to match "..."
} -> success : pattern applied successfully
} -> success : at least one pattern matched
```
This improves log clarity, making it easier to distinguish pattern-level
failures from operation-level outcomes.
Enable ops with only read side effects in scf.for to be hoisted with an
scf.if guard that checks against the trip count.
This patch takes a step towards a less conservative LICM in MLIR as
discussed in the following discourse thread:
[Speculative LICM?](https://discourse.llvm.org/t/speculative-licm/80977)
This patch in particular does the following:
1. Relaxes the original constraint for hoisting, which only hoists ops
without any side effects. This patch additionally allows ops with only
read side effects to be hoisted into an scf.if guard, but only if every
op in the loop or its nested regions is side-effect free or has only
read side effects. This scf.if guard wraps the original scf.for and
checks for **trip_count > 0**.
2. To support this, two new interface methods are added to
**LoopLikeInterface**: _wrapInTripCountCheck_ and
_unwrapTripCountCheck_. The implementation starts by wrapping the
scf.for loop in an scf.if guard using _wrapInTripCountCheck_; if no op
is hoisted into this guard after we are done processing the worklist,
the guard is unwrapped by calling _unwrapTripCountCheck_ (see the
sketch below).
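A rough sketch of that flow; the signatures of the two hooks are
assumptions based on the description:
```c++
// Hedged sketch: wrap, hoist, and unwrap if nothing was hoisted.
loopLike.wrapInTripCountCheck(); // emit the scf.if (trip_count > 0) guard
bool hoistedAny = false;
// ... process the worklist, hoisting ops with only read side effects
//     into the guard; set `hoistedAny` when something moves ...
if (!hoistedAny)
  loopLike.unwrapTripCountCheck(); // remove the now-useless guard
```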
This patch improves the GraphViz output of ViewOpGraph
(--view-op-graph).
- Switch to rectangular record-based nodes, inspired by a similar
visualization in [Glow](https://github.com/pytorch/glow). Rectangles
make more efficient use of space when printing text.
- Add input and output ports for each operand and result, and remove
edge labels.
- Switch to a muted color palette to reduce eye strain.
Fix a crash in `ConversionPatternRewriter::replaceUsesOfBlockArgument`
when running with `-debug`. The block that owns the block argument can
be a detached block. In that case, do not attempt to print the name of
the owner op.
Following up on #122188, this PR adds support for poison indices to
`ExtractOp` and `InsertOp`. It also includes canonicalization patterns
to turn extract/insert ops with poison indices into `ub.poison`.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
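A small self-contained sketch of the migration, using a hypothetical
`PointerUnion` (the casts follow what the FIXME recommends):
```c++
#include "llvm/ADT/PointerUnion.h"

struct A { int x; };
struct B { int y; };

void migrate(llvm::PointerUnion<A *, B *> pu) {
  // Soft-deprecated member functions:
  //   if (pu.is<A *>()) { A *a = pu.get<A *>(); ... }
  // Preferred free-function casts:
  if (llvm::isa<A *>(pu)) {
    A *a = llvm::cast<A *>(pu);
    (void)a;
  }
  // When the union may be null, dyn_cast_if_present handles that case;
  // plain dyn_cast should be used when `pu` is known to be non-null.
  if (A *a = llvm::dyn_cast_if_present<A *>(pu))
    (void)a;
}
```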
The current implementation of LocationSnapshotPass takes an
OpPrintingFlags argument and stores it as a member, but does not use it
for printing.
Properly implement the printing flags, and also support command-line
arguments.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Fixes two minor issues in `findOrBuildReplacementValue`:
* Remove a redundant `mapping.map`.
* Map `repl` instead of `value`. We used to overwrite an existing
mapping, which could introduce extra materializations.
Note: We generally do not want to overwrite mappings, but create a chain
of mappings. There are still a few more places where a mapping is
overwritten. Once those are fixed, I will put an assertion into
`ConversionValueMapping::map`.
The `buildUnresolvedMaterialization` implementation used to check if a
materialization is necessary. A materialization is not necessary if the
desired types already match the input. However, this situation can never
happen: we look for mapped values with the desired type at the call
sites before requesting a new unresolved materialization.
The previous implementation seemed incorrect because
`buildUnresolvedMaterialization` created a mapping that is never rolled
back. (When in reality that code was never executed, so it is
technically not incorrect.)
Also fix an incorrect comment in `findOrBuildReplacementValue`.
This commit updates the internal `ConversionValueMapping` data structure
in the dialect conversion driver to support 1:N replacements. This is
the last major commit for adding 1:N support to the dialect conversion
driver.
Since #116470, the infrastructure already supports 1:N replacements. But
the `ConversionValueMapping` still stored 1:1 value mappings. To that
end, the driver inserted temporary argument materializations (converting
N SSA values into 1 value). This is no longer the case. Argument
materializations are now entirely gone. (They will be deleted from the
type converter after some time, when we delete the old 1:N dialect
conversion driver.)
Note for LLVM integration: Replace all occurrences of
`addArgumentMaterialization` (except for 1:N dialect conversion passes)
with `addSourceMaterialization`.
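The update is essentially a rename; a hedged before/after sketch (the
callback name is illustrative):
```c++
// Before: argument materializations are going away.
typeConverter.addArgumentMaterialization(materializeFromParts);
// After: register the same callback as a source materialization.
typeConverter.addSourceMaterialization(materializeFromParts);
```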
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
During a 1:N replacement (`applySignatureConversion` or
`replaceOpWithMultiple`), the dialect conversion driver used to insert
two materializations:
* Argument materialization: convert N replacement values to 1 SSA value
of the original type `S`.
* Target materialization: convert original type to legalized type `T`.
The target materialization is unnecessary. Subsequent patterns receive
the replacement values via their adaptors. These patterns have their own
type converter. When they see a replacement value of type `S`, they will
automatically insert a target materialization to type `T`. There is no
reason to do this already during the 1:N replacement. (The functionality
used to be duplicated in `remapValues` and `insertNTo1Materialization`.)
Special case: If a subsequent pattern does not have a type converter, it
does *not* insert any target materializations. That's because the
absence of a type converter indicates that the pattern does not care
about type legality. Therefore, it is correct to pass an SSA value of
type `S` (or any other type) to the pattern.
Note: Most patterns in `TestPatterns.cpp` run without a type converter.
To make sure that the tests still behave the same, some of these
patterns now have a type converter.
This commit is in preparation for adding 1:N support to the conversion
value mapping. Before making any further changes to the mapping
infrastructure, I'd like to make sure that the code base around it (that
uses the mapping) is robust.
The greedy rewriter is used in many different flows and it has a lot of
conveniences (worklist management, debugging actions, tracing, etc.).
But it combines two kinds of greedy behavior: 1) how ops are matched,
and 2) folding wherever it can.
These are independent forms of greediness, and combining them leads to
inefficiency. E.g., there are cases where one needs to create different
phases in lowering and apply patterns in a specific order split across
different passes. Using the driver, one ends up needlessly retrying
folding/having multiple rounds of folding attempts, where one final run
would have sufficed.
Of course folks can locally avoid this behavior by just building their
own driver, but this is also a commonly requested feature that folks
keep working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated API should just be a find-and-replace (e.g., of the
`find ./ -type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety),
as the API arguments haven't changed between the two.
The dialect conversion has various checks that detect incorrect API
usage in patterns. One of these checks turned out to be quite expensive
(N*M complexity where N is the number of block rewrites and M is the
total number of rewrites) in NVIDIA-internal workloads: Checking that a
block is not converted multiple times.
This check iterates over the stack of all rewrites, which can be large.
We saw `hasRewrite` being called around 45000 times with an average
rewrite stack size of 500000.
This PR moves the check to `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.
For consistency reasons, the other `hasRewrite`-based check is also
moved there.
This PR fixes a bug in `RemoveDeadValues`: `FunctionOpInterface` does
not have an `isDeclaration` method, so we should use the `isExternal`
method instead. Fixes #116347.
This commit adds a new `matchAndRewrite` overload to `ConversionPattern`
to support 1:N replacements. This is the first of two main PRs that
merge the 1:1 and 1:N dialect conversion drivers.
The existing `matchAndRewrite` function supports only 1:1 replacements,
as can be seen from the `ArrayRef<Value>` parameter.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
    Operation *op, ArrayRef<Value> operands /*adaptor values*/,
    ConversionPatternRewriter &rewriter) const;
```
This commit adds a `matchAndRewrite` overload that is called by the
dialect conversion driver. By default, this new overload dispatches to
the original 1:1 `matchAndRewrite` implementation. Existing
`ConversionPattern`s do not need to be changed as long as there are no
1:N type conversions or value replacements.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
    Operation *op, ArrayRef<ValueRange> operands /*adaptor values*/,
    ConversionPatternRewriter &rewriter) const {
  // Note: getOneToOneAdaptorOperands produces a fatal error if at least one
  // ValueRange has 0 or more than 1 value.
  return matchAndRewrite(op, getOneToOneAdaptorOperands(operands), rewriter);
}
```
The `ConversionValueMapping`, which keeps track of value replacements
and materializations, still does not support 1:N replacements. We still
rely on argument materializations to convert N replacement values back
into a single value. The `ConversionValueMapping` will be generalized to
1:N mappings in the second main PR.
Before handing the adaptor values to a `ConversionPattern`, all argument
materializations are "unpacked". The `ConversionPattern` receives N
replacement values and does not see any argument materializations. This
implementation strategy allows us to use the 1:N infrastructure/API in
`ConversionPattern`s even though some functionality is still missing in
the driver. This strategy was chosen to keep the sizes of the PRs
smaller and to make it easier for downstream users to adapt to API
changes.
This commit also updates the "decompose call graphs" transformation
and the "sparse tensor codegen" transformation to use the new 1:N
`ConversionPattern` API.
Note for LLVM integration: If you are using a type converter with 1:N
type conversion rules or if your patterns are performing 1:N
replacements (via `replaceOpWithMultiple` or
`applySignatureConversion`), conversion pattern applications will start
failing (fatal LLVM error) with this error message: `pattern 'name' does
not support 1:N conversion`. The name of the failing pattern is shown in
the error message. These patterns must be updated to the new 1:N
`matchAndRewrite` API.
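A hedged sketch of such an update; the pattern, op name, and rewrite
body are illustrative:
```c++
struct MyOpLowering : public ConversionPattern {
  MyOpLowering(const TypeConverter &converter, MLIRContext *ctx)
      : ConversionPattern(converter, "mydialect.my_op", /*benefit=*/1, ctx) {}

  LogicalResult
  matchAndRewrite(Operation *op, ArrayRef<ValueRange> operands,
                  ConversionPatternRewriter &rewriter) const override {
    // Each original operand now maps to zero or more replacement values.
    SmallVector<Value> flattened;
    for (ValueRange repl : operands)
      llvm::append_range(flattened, repl);
    // ... rewrite `op` using `flattened` instead of assuming 1:1 ...
    return success();
  }
};
```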
When an unresolved materialization (`unrealized_conversion_cast` op) is
rolled back, the mapping should be rolled back as well, regardless of
whether it is a source, target or argument materialization. Otherwise,
we accumulate pointers to erased IR in the `mapping`. This is harmless
in most cases, but can cause issues when a new operation is allocated at
the same memory location and the pointer is "reused".
It is not possible to write a test case for this because I cannot
trigger the pointer reuse programmatically.
This commit adds an extra assertion to `applySignatureConversion` to
prevent incorrect API usage: The same block cannot be converted multiple
times. That would mess with the underlying conversion value mapping.
(Mappings would be overwritten.) This is similar to op replacements: The
same op cannot be replaced multiple times.
To simplify the check, `BlockTypeConversionRewrite::block` now stores
the original block. The new block is stored in an extra field. (It used
to be the other way around.)
This commit is in preparation for adding 1:N support to the conversion
value mapping. Before making any further changes to the mapping
infrastructure, I'd like to make sure that the code base around it (that
uses the mapping) is robust.
This change removes the restriction on `SymbolUserOpInterface` ops so
they can be used with ops that implement `SymbolOpInterface`. For
example, `memref.global` implements `SymbolOpInterface`, so it can be
used with `memref.get_global`, which implements
`SymbolUserOpInterface`:
```
// Define a global constant array
memref.global "private" constant @global_array : memref<10xi32> = dense<[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]> : tensor<10xi32>
// Access this global constant within a function
func.func @use_global() {
  %0 = memref.get_global @global_array : memref<10xi32>
  return
}
```
Reference: https://github.com/llvm/llvm-project/pull/116519 and
https://discourse.llvm.org/t/question-on-criteria-for-acceptable-ir-in-removedeadvaluespass/83131
---------
Co-authored-by: Zeeshan Siddiqui <mzs@ntdev.microsoft.com>