`tensor.insert_slice` needs to have read semantics on its destination
operand. Since it has a return value, its semantics are:
- Copy dest to result.
- Copy source to subview of destination.
`tensor.parallel_insert_slice`, though, has no result, so it does not
need to have read semantics. The op description
[here](a3ac318e5f/mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td (L1524))
also says that it is expected to lower to a `memref.subview`, which does
not have read semantics on the destination (it's just a view).
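The difference in read semantics can be pictured with a small standard-library model. This is only an illustrative sketch: `insertSlice` and `parallelInsertSlice` are hypothetical helpers over 1-D `std::vector<float>` values with unit stride, not MLIR APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value-semantics model of tensor.insert_slice: the result is a fresh copy
// of `dest` with `source` written at `offset`.
std::vector<float> insertSlice(const std::vector<float> &source,
                               const std::vector<float> &dest,
                               std::size_t offset) {
  assert(offset + source.size() <= dest.size() && "slice out of bounds");
  std::vector<float> result = dest; // Step 1: every element of `dest` is read.
  for (std::size_t i = 0; i < source.size(); ++i)
    result[offset + i] = source[i]; // Step 2: overwrite the slice.
  return result;
}

// Model of the write done by tensor.parallel_insert_slice: only the slice of
// the shared buffer is written, and nothing is read from it.
void parallelInsertSlice(const std::vector<float> &source,
                         std::vector<float> &sharedOut, std::size_t offset) {
  assert(offset + source.size() <= sharedOut.size() && "slice out of bounds");
  for (std::size_t i = 0; i < source.size(); ++i)
    sharedOut[offset + i] = source[i];
}
```

In this model the first helper cannot produce its result without reading `dest`, while the second only writes through what is effectively a view.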
This patch drops the read semantics for destination of
`tensor.parallel_insert_slice` but also makes the `shared_outs` operands
of `scf.forall` have read semantics. Earlier it would rely indirectly on
read semantics of destination operand of `tensor.parallel_insert_slice`
to propagate the read semantics for `shared_outs`. Now that is specified
more directly.
Fixes #133964
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
We can use *Set::insert_range to collapse:
  for (auto Elem : Range)
    Set.insert(Elem);
down to:
Set.insert_range(Range);
In some cases, we can further fold that into the set declaration.
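A standard-library sketch of the same cleanup, assuming `std::set` as a stand-in for the LLVM set classes. `insertRange`, `insertLoop`, and `makeSet` are hypothetical helpers used only for illustration.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Element-by-element form that the cleanup collapses.
template <typename SetT, typename RangeT>
void insertLoop(SetT &set, const RangeT &range) {
  for (const auto &elem : range)
    set.insert(elem);
}

// Hypothetical free-function stand-in for the member `insert_range`.
template <typename SetT, typename RangeT>
void insertRange(SetT &set, const RangeT &range) {
  set.insert(std::begin(range), std::end(range));
}

// Folding the insertion into the set declaration itself.
template <typename RangeT>
std::set<int> makeSet(const RangeT &range) {
  return std::set<int>(std::begin(range), std::end(range));
}
```

All three forms leave the set with the same contents; the range forms simply state the intent in one line.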
This gets the consumer fusion method in sync with the corresponding
producer fusion method `tileAndFuseProducerOfSlice`. Not taking the
loops as input required a complicated analysis to retrieve the
surrounding loops, which is very fragile. Just like the producer fusion
method, the loops need to be taken in as an argument, with the loops
typically being created by the tiling methods.
Some utilities are added to check that the loops passed in are perfectly
nested (in the case of an `scf.for` loop nest).
This is change 1 of N to simplify the implementation of tile and fuse
consumers.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Why? This option can lead to incorrect IR if used in isolation, for
example, consider the IR below:
```mlir
func.func @loop_with_aliasing(%arg0: tensor<5xf32>, %arg1: index, %arg2: index) -> tensor<5xf32> {
%c1 = arith.constant 1 : index
%cst = arith.constant 1.000000e+00 : f32
%0 = tensor.empty() : tensor<5xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<5xf32>) -> tensor<5xf32>
// The BufferizableOpInterface says that %2 aliases with %arg0 or is a newly
// allocated buffer
%2 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (tensor<5xf32>) {
scf.yield %1 : tensor<5xf32>
}
%cst_0 = arith.constant 1.000000e+00 : f32
%inserted = tensor.insert %cst_0 into %1[%c1] : tensor<5xf32>
return %2 : tensor<5xf32>
}
```
If we bufferize with `enforce-aliasing-invariants=false`, we get:
```
func.func @loop_with_aliasing(%arg0: memref<5xf32, strided<[?], offset: ?>>, %arg1: index, %arg2: index) -> memref<5xf32, strided<[?], offset: ?>> {
%c1 = arith.constant 1 : index
%cst = arith.constant 1.000000e+00 : f32
%alloc = memref.alloc() {alignment = 64 : i64} : memref<5xf32>
linalg.fill ins(%cst : f32) outs(%alloc : memref<5xf32>)
%0 = scf.for %arg3 = %arg1 to %arg2 step %c1 iter_args(%arg4 = %arg0) -> (memref<5xf32, strided<[?], offset: ?>>) {
%cast = memref.cast %alloc : memref<5xf32> to memref<5xf32, strided<[?], offset: ?>>
scf.yield %cast : memref<5xf32, strided<[?], offset: ?>>
}
%cst_0 = arith.constant 1.000000e+00 : f32
memref.store %cst_0, %alloc[%c1] : memref<5xf32>
return %0 : memref<5xf32, strided<[?], offset: ?>>
}
```
This is not correct IR since the loop yields the allocation.
I am using this option. What do I need to do now?
If you are using this option in isolation, you are possibly generating
incorrect IR, so you need to revisit your bufferization strategy. If you
are using it together with `copyBeforeWrite`, you simply need to retire
the `enforceAliasingInvariants` option.
Co-authored-by: Matthias Springer <mspringer@nvidia.com>
This is a code cleanup. Update a few places in MLIR that should use
`hasSingleElement`/`getSingleElement`.
Note: `hasSingleElement` is faster than `.getSize() == 1` when it is
used with linked lists etc.
Depends on #131508.
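The performance note above can be illustrated with a standard-library sketch. `hasSingleElementSketch` and `hasSingleElementByCounting` are hypothetical names; the first mirrors the idea of checking at most two iterator positions instead of counting every node.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>

// Constant-time check: the range is non-empty and the position right after
// the first element is already the end.
template <typename ContainerT>
bool hasSingleElementSketch(const ContainerT &c) {
  auto it = std::begin(c), end = std::end(c);
  return it != end && std::next(it) == end;
}

// Comparison point: for a linked list without an O(1) size, computing the
// size walks every node before the comparison with 1 can happen.
template <typename ContainerT>
bool hasSingleElementByCounting(const ContainerT &c) {
  return std::distance(std::begin(c), std::end(c)) == 1;
}
```

Both helpers return the same answer; only the amount of traversal differs for list-like containers.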
Fix a bug in the method `getUntiledProducerFromSliceSource` where
AddressSanitizer reports a heap buffer overflow caused by accessing a
value outside the iteration range.
This PR fixes the issue and adds a lit test to reproduce it.
This PR adds a new interface method to PartialReductionOpInterface which
allows it to query the result tile position for the partial result.
Previously, tiling the reduction dimension with
SplitReductionOuterReduction when the result has transposed parallel
dimensions would produce wrong results.
Other fixes that were needed to make this PR work:
- Instead of ad-hoc logic to decide where to place the new reduction
dimensions in the partial result based on the iteration space, the
reduction dimensions are always appended to the partial result tensor.
- Remove usage of PartialReductionOpInterface in Mesh dialect. The
implementation was trying to just get a neutral element, but ended up
trying to use PartialReductionOpInterface for it, which is not right. It
was also passing the wrong sizes to it.
This PR makes TileAndFuse explicitly track replacements using a listener
instead of assuming that the results always come from the outermost
tiling loop. scf::tileUsingInterface can introduce merge operations
whose results are the actual replacements to use, instead of the
outermost loop results.
The greedy rewriter is used in many different flows and it has a lot of
conveniences (work list management, debugging actions, tracing, etc.).
But it combines two kinds of greedy behavior: 1) how ops are matched, 2)
folding wherever it can.
These are independent forms of greediness, and combining them leads to
inefficiency, e.g., in cases where one needs to create different phases
in lowering and is required to apply patterns in a specific order split
across different passes. Using the driver, one ends up needlessly
retrying folding or having multiple rounds of folding attempts, where
one final run would have sufficed.
Of course folks can locally avoid this behavior by just building their
own, but this is also a commonly requested feature that folks keep on
working around locally in suboptimal ways.
For downstream users, there should be no behavioral change. Updating
from the deprecated API should just be a find and replace (e.g., of the
`find ./ -type f -exec sed -i
's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety),
as the API arguments haven't changed between the two.
This patch unifies the tiling implementation for tileUsingFor and
tileReductionUsingFor. This is done by passing an additional option to
SCFTilingOptions, allowing it to set how reduction dimensions should be
tiled. Currently, there are 3 different options for reduction tiling:
FullReduction (old tileUsingFor), PartialReductionOuterReduction (old
tileReductionUsingFor) and PartialReductionOuterParallel
(linalg::tileReductionUsingForall, this isn't implemented in this
patch).
The patch makes tileReductionUsingFor use the tileUsingFor
implementation with the new reduction tiling options.
There are no test changes because the implementation was doing almost
exactly the same thing. This was also tested in IREE (which uses both
these APIs heavily) and there were no test changes.
This commit adds a new `matchAndRewrite` overload to `ConversionPattern`
to support 1:N replacements. This is the first of two main PRs that
merge the 1:1 and 1:N dialect conversion drivers.
The existing `matchAndRewrite` function supports only 1:1 replacements,
as can be seen from the `ArrayRef<Value>` parameter.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
Operation *op, ArrayRef<Value> operands /*adaptor values*/,
ConversionPatternRewriter &rewriter) const;
```
This commit adds a `matchAndRewrite` overload that is called by the
dialect conversion driver. By default, this new overload dispatches to
the original 1:1 `matchAndRewrite` implementation. Existing
`ConversionPattern`s do not need to be changed as long as there are no
1:N type conversions or value replacements.
```c++
LogicalResult ConversionPattern::matchAndRewrite(
Operation *op, ArrayRef<ValueRange> operands /*adaptor values*/,
ConversionPatternRewriter &rewriter) const {
// Note: getOneToOneAdaptorOperands produces a fatal error if at least one
// ValueRange has 0 or more than 1 value.
return matchAndRewrite(op, getOneToOneAdaptorOperands(operands), rewriter);
}
```
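The default dispatch can be pictured with plain standard-library types. This is a hedged sketch only: `Value`, `ValueRange`, and `toOneToOne` are hypothetical stand-ins, and an `assert` plays the role of the fatal error mentioned in the note above.

```cpp
#include <cassert>
#include <vector>

using Value = int;                     // hypothetical stand-in for mlir::Value
using ValueRange = std::vector<Value>; // hypothetical stand-in for a 1:N range

// Flattens 1:N adaptor operands into 1:1 operands. Every range must hold
// exactly one value; anything else means the pattern would need the 1:N
// overload.
std::vector<Value> toOneToOne(const std::vector<ValueRange> &operands) {
  std::vector<Value> result;
  result.reserve(operands.size());
  for (const ValueRange &range : operands) {
    assert(range.size() == 1 && "pattern does not support 1:N conversion");
    result.push_back(range.front());
  }
  return result;
}
```

Existing 1:1 patterns keep working through this kind of unwrapping as long as every operand maps to exactly one replacement value.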
The `ConversionValueMapping`, which keeps track of value replacements
and materializations, still does not support 1:N replacements. We still
rely on argument materializations to convert N replacement values back
into a single value. The `ConversionValueMapping` will be generalized to
1:N mappings in the second main PR.
Before handing the adaptor values to a `ConversionPattern`, all argument
materializations are "unpacked". The `ConversionPattern` receives N
replacement values and does not see any argument materializations. This
implementation strategy allows us to use the 1:N infrastructure/API in
`ConversionPattern`s even though some functionality is still missing in
the driver. This strategy was chosen to keep the sizes of the PRs
smaller and to make it easier for downstream users to adapt to API
changes.
This commit also updates the "decompose call graphs" transformation
and the "sparse tensor codegen" transformation to use the new 1:N
`ConversionPattern` API.
Note for LLVM conversion: If you are using a type converter with 1:N
type conversion rules or if your patterns are performing 1:N
replacements (via `replaceOpWithMultiple` or
`applySignatureConversion`), conversion pattern applications will start
failing (fatal LLVM error) with this error message: `pattern 'name' does
not support 1:N conversion`. The name of the failing pattern is shown in
the error message. These patterns must be updated to the new 1:N
`matchAndRewrite` API.
As described in issue llvm/llvm-project#91518, a previous PR
llvm/llvm-project#78484 introduced the `defaultMemorySpaceFn` into
bufferization options, allowing one to inform OneShotBufferize that it
should use a specified function to derive the memory space attribute
from the encoding attribute attached to tensor types.
However, introducing this feature exposed unhandled edge cases,
examples of which are introduced by this change in the new test under
`test/Dialect/Bufferization/Transforms/one-shot-bufferize-encodings.mlir`.
Fixing the inconsistencies introduced by `defaultMemorySpaceFn` is
pretty simple. This change:
- Updates the `bufferization.to_memref` and `bufferization.to_tensor`
operations to explicitly include operand and destination types,
whereas previously they relied on type inference to deduce the
tensor types. Since the type inference cannot recover the correct
tensor encoding/memory space, the operand and result types must be
explicitly included. This is a small assembly format change, but it
touches a large number of test files.
- Makes minor updates to other bufferization functions to handle the
changes in building the above ops.
- Updates bufferization of `tensor.from_elements` to handle memory
space.
Integration/upgrade guide:
In downstream projects, if you have tests or MLIR files that explicitly
use `bufferization.to_tensor` or `bufferization.to_memref`, then update
them to the new assembly format as follows:
```
%1 = bufferization.to_memref %0 : memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32>
```
becomes
```
%1 = bufferization.to_memref %0 : tensor<10xf32> to memref<10xf32>
%2 = bufferization.to_tensor %1 : memref<10xf32> to tensor<10xf32>
```
The `getIterationDomainTileFromOperandTile` implementation for
tensor.unpack did not clamp sizes when the unpack op had extract_slice
semantics. This PR fixes the bug.
The PR also makes a minor change to `tileAndFuseConsumerOfSlice`. When
replacing DPS inits, the iteration domain is needed, and it is computed
from the tiled version of the operation after the initial tiling
transformation. This can result in some extra indexing computation, so
the PR changes it to use the original full sized cloned consumer op.
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
The dialect conversion-based bufferization passes were migrated to
One-Shot Bufferize about two years ago. To clean up the code base, this
commit removes the `scf-bufferize` pass, one of the few remaining parts
of the old infrastructure. Most bufferization passes have already been
removed.
Note for LLVM integration: If you depend on this pass, migrate to
One-Shot Bufferize or copy the pass to your codebase.
When pipelining an `scf.for` with dynamic loop bounds, the epilogue
ramp-down must align with the prologue when num_stages >
total_iterations.
For example:
```
scf.for (0..ub) {
load(i)
add(i)
store(i)
}
```
When num_stages=3 the pipeline follows:
```
load(0) - add(0) - scf.for (0..ub-2) - store(ub-2)
load(1) - - add(ub-1) - store(ub-1)
```
The trailing `store(ub-2)`, `i=ub-2`, must align with the ramp-up for
`i=0` when `ub < num_stages-1`, so the index `i` should be `max(0,
ub-2)` and each subsequent index is an increment. The predicate must
also handle this scenario, so it becomes `predicate[0] =
total_iterations > epilogue_stage`.
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.
Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate patterns now have
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.
This commit also includes some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
The SCF helper for tiling an operation implementing the TilingInterface
and greedily fusing consumers requires an uninterrupted chain of
operations implementing the tiling interface to succeed. There can be
cases with intermediate ops that don't implement the interface but have
producers that could be fused if various canonicalization/simplification
patterns could run in between fusion steps.
This adds an option to SCFTileAndFuseOptions for a pattern set to run
between fusion steps to the ops that result from fusion/tiling. Removed
and newly inserted slices are tracked for continued fusion applications.
See this RFC for more discussion:
https://discourse.llvm.org/t/rfc-split-fusion-portions-of-the-tilinginterface-into-a-new-interface/81155
There are some spurious library dependencies which can be removed.
I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We use a CMake function to recursively collect the
MLIR/LLVM-related dependencies, and in doing so we identified certain
library dependencies as redundant and safe to remove.
This commit extends consumer fusion to take place even if the producer
has multiple uses. Besides the consumer op in question, the only other
uses of the producer that are allowed are in:
1. scf.yield
2. tensor.parallel_insert_slice
Signed-off-by: Abhishek Varma <abhvarma@amd.com>
For index type of induction variable, the indexing math is better
represented using affine ops such as `affine.delinearize_index`.
This also further demonstrates that some of these `affine` ops might
need to move to a different dialect. For one, these ops only support
`IndexType` when they should be able to work with any integer type.
This change also includes some canonicalization patterns for the
`affine.delinearize_index` operation to
1) Drop unit `basis` values
2) Remove the `delinearize_index` op when the `linear_index` is a loop
induction variable of a normalized loop and the `basis` is of size 1 and
is also the upper bound of the normalized loop.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This fixes loop iteration count calculation if the step is
a negative value, where we should adjust the added
delta from `step-1` to `step+1` when doing the ceil div.
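The rounding adjustment can be checked with a small standalone sketch. `tripCount` is a hypothetical helper written for illustration, not the function touched by the fix; it assumes truncating integer division, as in C++.

```cpp
#include <cassert>
#include <cstdint>

// Number of iterations of a loop that starts at `lb`, advances by `step`,
// and stops before reaching `ub` (counting up for step > 0, down for
// step < 0).
std::int64_t tripCount(std::int64_t lb, std::int64_t ub, std::int64_t step) {
  assert(step != 0 && "step must be non-zero");
  std::int64_t range = ub - lb;
  // A range that is empty or points away from the step direction runs zero
  // times.
  if ((step > 0 && range <= 0) || (step < 0 && range >= 0))
    return 0;
  // Ceil division via truncating division: the delta that rounds away from
  // zero is `step - 1` for positive steps and `step + 1` for negative ones.
  std::int64_t delta = step > 0 ? step - 1 : step + 1;
  return (range + delta) / step;
}
```

For `lb = 9`, `ub = 0`, `step = -3` the loop visits 9, 6, 3. With the `step + 1` delta the formula gives `(-9 - 2) / -3 = 3`, whereas the `step - 1` delta would give `(-9 - 4) / -3 = 4`.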
Current implementation of `scf::tileConsumerAndFuseProducerUsingSCF`
looks at operands of tiled/tiled+fused operations to see if they are
produced by `extract_slice` operations to populate the worklist used to
continue fusion. This implicit assumption does not always work. Instead
make the implementations of `getTiledImplementation` return the slices
to use to continue fusion.
This is a breaking change:
- To continue to get the same behavior of
`scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree
implementations of `TilingInterface::getTiledImplementation` to return
the slices to continue fusion on. All in-tree implementations have been
adapted to this.
- This change touches parts that required a simplification to the
`ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a
`std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that
should be `std::nullopt` if fusion is not to be performed.
Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, e.g.:
```
%0 = scf.for() {
%1 = scf.for() {
tiledProducer
}
yield %1
}
%2 = consumer ins(%0)
```
Compared with #94190, this PR fixes the build failure by making C++17 happy.
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, e.g.:
```
%0 = scf.for() {
%1 = scf.for() {
tiledProducer
}
yield %1
}
%2 = consumer ins(%0)
```
The SCFLoopPipelining allows predication on peeled or loop ops. When the
`predicateFn` returns a nullptr, this signifies that the op type is
unsupported and the pipeliner fails, except in `emitPrologue` where it
asserts.
This patch fixes handling in the prologue to gracefully fail.
Previously the values in the peeled prologue that weren't treated with
the `predicateFn` were passed to the loop body without any other
predication. If those values are later used outside of the loop body,
they may be incorrect if the num iterations is smaller than num stages -
1. We need similar masking for those, as is done in the main loop body,
using already existing predicates.
The implementations of these methods are legacy and they are removed in
favor of using the `scf::tileUsingSCF` methods as replacements. To get
the latter on par with requirements of the deprecated methods, the
tiling allows one to specify the maximum number of tiles to use instead
of specifying the tile sizes. When tiling to `scf.forall` this
specification is used to generate the `num_threads` version of the
operation.
A slight deviation from previous implementation is that the deprecated
method always generated the `num_threads` variant of the `scf.forall`
operation. Instead now this is driven by the tiling options specified.
This reduces the indexing math generated when the tile sizes are
specified.
**Moving from `linalg::tileToForallOp` to `scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> numThreads;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOp(b, op, numThreads, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setNumThreads(numThreads);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
This generates the `numThreads` version of the `scf.forall` for the
inter-tile loops, i.e.
```
... = scf.forall (%arg0, %arg1) in (%nt0, %nt1) shared_outs(...)
```
**Moving from `linalg::tileToForallOpUsingTileSizes` to
`scf::tileUsingSCF`**
```
OpBuilder b;
TilingInterface op;
ArrayRef<OpFoldResult> tileSizes;
ArrayAttr mapping;
FailureOr<ForallTilingResult> result = linalg::tileToForallOpUsingTileSizes(b, op, tileSizes, mapping);
```
can be replaced by
```
scf::SCFTilingOptions options;
options.setTileSizes(tileSizes);
options.setLoopType(scf::SCFTilingOptions::LoopType::ForallOp);
options.setMapping(mapping.getValue()); /*note the difference that setMapping takes an ArrayRef<Attribute> */
FailureOr<scf::SCFTilingResult> result = scf::tileUsingSCF(b, op, options);
```
Also note that `linalg::tileToForallOpUsingTileSizes` would effectively
call the `linalg::tileToForallOp` by computing the `numThreads` from the
`op` and `tileSizes` and generate the `numThreads` version of the
`scf.forall`. That is not the case anymore. Instead this will directly
generate the `tileSizes` version of the `scf.forall` op:
```
... = scf.forall(%arg0, %arg1) = (%lb0, %lb1) to (%ub0, %ub1) step(%step0, %step1) shared_outs(...)
```
If you actually want to use the `numThreads` version, it is up to the
caller to compute the `numThreads` and set `options.setNumThreads`
instead of `options.setTileSizes`. Note that there is a slight
difference in the num threads version and tile size version. The former
requires an additional `affine.max` on the tile size to ensure
non-negative tile sizes. When lowering to `numThreads` version this
`affine.max` is not needed since by construction the tile sizes are
non-negative. In previous implementations, the `numThreads` version
generated when using the `linalg::tileToForallOpUsingTileSizes` method
would avoid generating the `affine.max` operation. To get the same
state, downstream users will have to additionally normalize the
`scf.forall` operation.
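A rough way to see where a clamp such as `affine.max` can come from is the following sketch. It assumes a simplified model in which each thread gets a tile of `ceilDiv(ub, numThreads)` iterations; `ceilDiv`, `rawTileSize`, and `clampedTileSize` are hypothetical helpers, not the affine maps that the tiling implementation emits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Ceiling division for a non-negative numerator and positive denominator.
std::int64_t ceilDiv(std::int64_t a, std::int64_t b) {
  assert(a >= 0 && b > 0 && "expected a >= 0 and b > 0");
  return (a + b - 1) / b;
}

// Unclamped size of the tile owned by thread `tid` when `ub` iterations are
// split across `numThreads` threads.
std::int64_t rawTileSize(std::int64_t ub, std::int64_t numThreads,
                         std::int64_t tid) {
  std::int64_t tile = ceilDiv(ub, numThreads);
  return std::min(tile, ub - tid * tile);
}

// Same size with a clamp that plays the role of `affine.max(0, ...)`.
std::int64_t clampedTileSize(std::int64_t ub, std::int64_t numThreads,
                             std::int64_t tid) {
  return std::max<std::int64_t>(0, rawTileSize(ub, numThreads, tid));
}
```

In this model, an arbitrary thread count can leave trailing threads past the end of the range (for example `ub = 10`, `numThreads = 8`, `tid = 6`), so the clamp matters. When the thread count is itself derived as `ceilDiv(ub, tileSize)`, every thread starts inside the range and the unclamped size is already non-negative.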
**Changes to `transform.structured.tile_using_forall`**
The transform dialect op that called into `linalg::tileToForallOp` and
`linalg::tileToForallOpUsingTileSizes` has been modified to call
`scf::tileUsingSCF`. The transform dialect op always generates the
`numThreads` version of the `scf.forall` op. So when `tile_sizes` are
specified for the transform dialect op, first the `tile_sizes` version
of the `scf.forall` is generated by the `scf::tileUsingSCF` method which
is then further normalized to get back to the same state. So there is no
functional change to `transform.structured.tile_using_forall`. It always
generates the `numThreads` version of the `scf.forall` op (as it did
before this change).
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
Define SCF dialect patterns rotating `scf.while` loops leveraging
existing `mlir::scf::wrapWhileLoopInZeroTripCheck`. `forceCreateCheck`
is always `false` as the pattern would lead to an infinite recursion
otherwise.
This pattern rotates `scf.while` ops, mutating them from "while" loops to
"do-while" loops. A guard checking the condition for the first iteration
is inserted. Note this guard can be optimized away if the compiler can
prove the loop will be executed at least once.
Using this pattern, the following while loop:
```mlir
scf.while (%arg0 = %init) : (i32) -> i64 {
%val = .., %arg0 : i64
%cond = arith.cmpi .., %arg0 : i32
scf.condition(%cond) %val : i64
} do {
^bb0(%arg1: i64):
%next = .., %arg1 : i32
scf.yield %next : i32
}
```
Can be transformed into:
```mlir
%pre_val = .., %init : i64
%pre_cond = arith.cmpi .., %init : i32
scf.if %pre_cond -> i64 {
%res = scf.while (%arg1 = %pre_val) : (i64) -> i64 {
// Original after block
%next = .., %arg1 : i32
// Original before block
%val = .., %next : i64
%cond = arith.cmpi .., %next : i32
scf.condition(%cond) %val : i64
} do {
^bb0(%arg2: i64):
scf.yield %arg2 : i64
}
scf.yield %res : i64
} else {
scf.yield %pre_val : i64
}
```
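The same rotation can be pictured in plain C++, as a guard followed by a do-while loop. This is only an analogy for the IR above; `halvingStepsWhile` and `halvingStepsRotated` are hypothetical functions chosen for illustration.

```cpp
#include <cassert>

// "while" form: the condition is evaluated before every execution of the
// body, including the first one.
int halvingStepsWhile(int init) {
  int x = init, steps = 0;
  while (x > 0) {
    x /= 2;
    ++steps;
  }
  return steps;
}

// Rotated form: a guard checks the condition for the first iteration, then
// a do-while loop evaluates the condition at the end of each iteration.
int halvingStepsRotated(int init) {
  int x = init, steps = 0;
  if (x > 0) {
    do {
      x /= 2;
      ++steps;
    } while (x > 0);
  }
  return steps;
}
```

If the guard can be proven true, it can be dropped and only the do-while loop remains.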
The test pass for `wrapWhileLoopInZeroTripCheck` has been modified to
use the new pattern when `forceCreateCheck=false`.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
This reverts commit edbc0e30a9.
Reason for rollback: ASAN complains about this PR:
==4320==ERROR: AddressSanitizer: heap-use-after-free on address 0x502000006cd8 at pc 0x55e2978d63cf bp 0x7ffe6431c2b0 sp 0x7ffe6431c2a8
READ of size 8 at 0x502000006cd8 thread T0
#0 0x55e2978d63ce in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr> mlir/include/mlir/IR/IRMapping.h:40:11
#1 0x55e2978d63ce in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:156:11
#2 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
#3 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
#4 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
#5 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
#6 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
#7 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
#8 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
#9 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
#10 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
#11 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
#12 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
#13 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
#14 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
#15 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
#16 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#17 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#18 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
#19 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
#20 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
#21 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
#22 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
#23 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
#24 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
#25 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
#26 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#27 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#28 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
#29 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
#30 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
#31 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
#32 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
#33 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
#34 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
#35 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
#36 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
#37 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
0x502000006cd8 is located 8 bytes inside of 16-byte region [0x502000006cd0,0x502000006ce0)
freed by thread T0 here:
#0 0x55e29130b7e2 in operator delete(void*, unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:155:3
#1 0x55e2979eb657 in __libcpp_operator_delete<void *, unsigned long>
#2 0x55e2979eb657 in __do_deallocate_handle_size<>
#3 0x55e2979eb657 in __libcpp_deallocate
#4 0x55e2979eb657 in deallocate
#5 0x55e2979eb657 in deallocate
#6 0x55e2979eb657 in operator()
#7 0x55e2979eb657 in ~vector
#8 0x55e2979eb657 in mlir::Block::~Block() mlir/lib/IR/Block.cpp:24:1
#9 0x55e2979ebc17 in deleteNode llvm/include/llvm/ADT/ilist.h:42:39
#10 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:205:5
#11 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:209:39
#12 0x55e2979ebc17 in mlir::Block::erase() mlir/lib/IR/Block.cpp:67:28
#13 0x55e297aef978 in mlir::RewriterBase::eraseBlock(mlir::Block*) mlir/lib/IR/PatternMatch.cpp:245:10
#14 0x55e297af0563 in mlir::RewriterBase::inlineBlockBefore(mlir::Block*, mlir::Block*, llvm::ilist_iterator<llvm::ilist_detail::node_options<mlir::Operation, false, false, void, false, void>, false, false>, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:331:3
#15 0x55e297af06d8 in mlir::RewriterBase::mergeBlocks(mlir::Block*, mlir::Block*, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:341:3
#16 0x55e297036608 in mlir::scf::ForOp::replaceWithAdditionalYields(mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) mlir/lib/Dialect/SCF/IR/SCF.cpp:575:12
#17 0x55e2970673ca in mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Model<mlir::scf::ForOp>::replaceWithAdditionalYields(mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.h.inc:658:56
#18 0x55e2978d5feb in replaceWithAdditionalYields blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.cpp.inc:105:14
#19 0x55e2978d5feb in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:135:14
#20 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
#21 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
#22 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
#23 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
#24 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
#25 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
#26 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
#27 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
#28 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
#29 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
#30 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
#31 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
#32 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
#33 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
#34 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#35 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#36 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
#37 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
#38 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
#39 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
#40 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
#41 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
#42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
#43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
#44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
#47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
#48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
#49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
#50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
#51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
previously allocated by thread T0 here:
#0 0x55e29130ab5d in operator new(unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:86:3
#1 0x55e2979ed5d4 in __libcpp_operator_new<unsigned long>
#2 0x55e2979ed5d4 in __libcpp_allocate
#3 0x55e2979ed5d4 in allocate
#4 0x55e2979ed5d4 in __allocate_at_least<std::__u::allocator<mlir::BlockArgument> >
#5 0x55e2979ed5d4 in __split_buffer
#6 0x55e2979ed5d4 in mlir::BlockArgument* std::__u::vector<mlir::BlockArgument, std::__u::allocator<mlir::BlockArgument>>::__push_back_slow_path<mlir::BlockArgument const&>(mlir::BlockArgument const&)
#7 0x55e2979ec0f2 in push_back
#8 0x55e2979ec0f2 in mlir::Block::addArgument(mlir::Type, mlir::Location) mlir/lib/IR/Block.cpp:154:13
#9 0x55e29796e457 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2172:34
#10 0x55e29796e457 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
#11 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
#12 0x55e297035742 in mlir::scf::ForOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/SCF/IR/SCF.cpp:521:14
#13 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
#14 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
#15 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#16 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#17 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
#18 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
#19 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
#20 0x55e297971d20 in parseBlockBody mlir/lib/AsmParser/Parser.cpp:2296:9
#21 0x55e297971d20 in (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) mlir/lib/AsmParser/Parser.cpp:2226:12
#22 0x55e29796e4f5 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2184:7
#23 0x55e29796e4f5 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
#24 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
#25 0x55e29796b2cf in (anonymous namespace)::CustomOpAsmParser::parseOptionalRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1796:12
#26 0x55e2978d89ff in mlir::function_interface_impl::parseFunctionOp(mlir::OpAsmParser&, mlir::OperationState&, bool, mlir::StringAttr, llvm::function_ref<mlir::Type (mlir::Builder&, llvm::ArrayRef<mlir::Type>, llvm::ArrayRef<mlir::Type>, mlir::function_interface_impl::VariadicFlag, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>&)>, mlir::StringAttr, mlir::StringAttr) mlir/lib/Interfaces/FunctionImplementation.cpp:232:14
#27 0x55e2969ba41d in mlir::func::FuncOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/Func/IR/FuncOps.cpp:203:10
#28 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
#29 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
#30 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#31 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#32 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
#33 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
#34 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
#35 0x55e297959b78 in parse mlir/lib/AsmParser/Parser.cpp:2725:20
#36 0x55e297959b78 in mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) mlir/lib/AsmParser/Parser.cpp:2785:41
#37 0x55e29790d5c2 in mlir::parseSourceFile(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::Block*, mlir::ParserConfig const&, mlir::LocationAttr*) mlir/lib/Parser/Parser.cpp:46:10
#38 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp, const std::__u::shared_ptr<llvm::SourceMgr> &> mlir/include/mlir/Parser/Parser.h:159:14
#39 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp> mlir/include/mlir/Parser/Parser.h:189:10
#40 0x55e291ebbfe2 in mlir::parseSourceFileForTool(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::ParserConfig const&, bool) mlir/include/mlir/Tools/ParseUtilities.h:31:12
#41 0x55e291ebb263 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:383:33
#42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
#43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
#44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
#47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
#48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
#49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
#50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
#51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
#52 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
#53 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
#54 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
#55 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120
SUMMARY: AddressSanitizer: heap-use-after-free mlir/include/mlir/IR/IRMapping.h:40:11 in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr>
Shadow bytes around the buggy address:
0x502000006a00: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
0x502000006a80: fa fa 00 fa fa fa 00 00 fa fa 00 00 fa fa 00 00
0x502000006b00: fa fa 00 00 fa fa 00 00 fa fa 00 fa fa fa 00 fa
0x502000006b80: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 00
0x502000006c00: fa fa 00 00 fa fa 00 00 fa fa 00 00 fa fa fd fa
=>0x502000006c80: fa fa fd fa fa fa fd fd fa fa fd[fd]fa fa fd fd
0x502000006d00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
0x502000006d80: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
0x502000006e00: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 fa
0x502000006e80: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
0x502000006f00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==4320==ABORTING
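One way to read the report above: the argument storage allocated in `Block::addArgument` was freed by `Block::~Block` while `replaceWithAdditionalYields` was running, and `IRMapping::map` later read block arguments through a `MutableArrayRef` that still pointed at that storage. The sketch below is a toy model in plain C++ of that hazard and of one general way to avoid it (take an owned copy before the replacement). All names in it (`ToyBlock`, `ToyLoop`, `replaceWithExtraArgs`, `snapshotThenReplace`) are hypothetical and are not MLIR API, and it is not meant to describe the fix that eventually landed.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for a block that owns its argument list (not mlir::Block).
struct ToyBlock {
  std::vector<int> args;
};

// Toy stand-in for a loop op that owns its body block.
struct ToyLoop {
  std::unique_ptr<ToyBlock> body;
};

// Models "replace the loop with one that carries extra values": a new body
// is built and the old body, together with its argument storage, is freed.
inline void replaceWithExtraArgs(ToyLoop &loop, const std::vector<int> &extra) {
  assert(loop.body && "loop must have a body");
  auto newBody = std::make_unique<ToyBlock>();
  newBody->args = loop.body->args;
  newBody->args.insert(newBody->args.end(), extra.begin(), extra.end());
  loop.body = std::move(newBody); // the old ToyBlock is destroyed here
}

// Safe ordering: copy the old arguments into owned storage before the
// replacement, so nothing refers to the freed block afterwards. Holding a
// pointer or view into loop.body->args across the call would dangle.
inline std::vector<int> snapshotThenReplace(ToyLoop &loop,
                                            const std::vector<int> &extra) {
  assert(loop.body && "loop must have a body");
  std::vector<int> oldArgs = loop.body->args; // owned copy
  replaceWithExtraArgs(loop, extra);
  return oldArgs;
}
```

Calling `snapshotThenReplace` on a loop whose body holds two arguments returns those two arguments by value, while the loop's new body holds the originals plus the extras.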
The refactor had a bug where the fused loop was inserted in an incorrect
location. This patch fixes the bug and relands the original PR
https://github.com/llvm/llvm-project/pull/94391.
This patch refactors code related to the LoopFuseSiblingOp transform in an
attempt to reduce duplicated common code. The aim is to move as much
as possible into functions on LoopLikeOpInterfaces, but this is still a
work in progress. A full refactor will require more additions to the
LoopLikeOpInterface.
In addition, scf.parallel fusion support has been added.
This patch refactors code related to the `LoopFuseSiblingOp` transform in
an attempt to reduce duplicated common code. The aim is to move as much
as possible into functions on `LoopLikeOpInterface`s, but this is still
a work in progress. A full refactor will require more additions to the
`LoopLikeOpInterface`.
In addition, `scf.parallel` fusion support has been added.
This patch extends yielding of replacement values to the multiple-result
case and adds an optional argument called `yieldResultNumber` that
indicates which result(s) need to be yielded. If it is not given, all
results are yielded by default.
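The default described above can be sketched in plain C++. This is only an illustration of the "empty list means all results" convention. The helper name `resultsToYield` and the use of `std::vector` are assumptions made for the example, not the signature used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices of the results that should be yielded.
// An empty `yieldResultNumber` is treated as "yield every result".
inline std::vector<std::size_t>
resultsToYield(std::size_t numResults,
               const std::vector<std::size_t> &yieldResultNumber = {}) {
  if (yieldResultNumber.empty()) {
    // Default: 0, 1, ..., numResults - 1.
    std::vector<std::size_t> all(numResults);
    std::iota(all.begin(), all.end(), std::size_t{0});
    return all;
  }
  // Explicit selection: every requested index must name an existing result.
  for (std::size_t idx : yieldResultNumber)
    assert(idx < numResults && "result index out of range");
  return yieldResultNumber;
}
```

With three results, calling the helper without a selection gives indices 0, 1, and 2, while passing `{1}` restricts the yield to the second result.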
The `TilingInterface` methods have return values that allow the
interface implementation to return multiple operations, and also return
tiled values explicitly. This is to avoid the assumption that the
interface needs to return a single operation and that this operation's
results are the expected tiled values. Make the
`PartialReductionOpInterface::tileToPartialReduction` return
`TilingResult` as well for the same reason.
Similarly make the `PartialReductionOpInterface::mergeReductions` also
return a list of generated operations and values to use as replacements.
This is just a refactoring to allow deprecating
`linalg::tileReductionUsingForall` in favor of the
`scf::tileReductionUsingSCF` method.
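A rough sketch of why returning both the operations and the values helps, written as a toy model in plain C++. `ToyTilingResult`, `tileToPartialReductionSketch`, `onlyTiledValue`, and the string names standing in for ops and values are all hypothetical. The real `TilingResult` holds MLIR operations and values.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a tiling result: the created operations and the values
// to use as tiled results are reported separately.
struct ToyTilingResult {
  std::vector<std::string> tiledOps;    // every operation that was created
  std::vector<std::string> tiledValues; // values callers should use
};

// Hypothetical implementation that needs two operations to produce a single
// tiled value, so "the one op and its results" would not describe it.
inline ToyTilingResult tileToPartialReductionSketch() {
  ToyTilingResult result;
  result.tiledOps = {"init_accumulator", "partial_reduce"};
  result.tiledValues = {"partial_reduce.result0"};
  return result;
}

// Callers read replacements from tiledValues instead of assuming they are
// the results of a single returned operation.
inline const std::string &onlyTiledValue(const ToyTilingResult &result) {
  assert(result.tiledValues.size() == 1 && "expected exactly one tiled value");
  return result.tiledValues.front();
}
```

The caller never has to guess which of the created operations defines the tiled value, because the implementation states it explicitly.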
This patch adds the `getLoopInductionVars`, `getLoopLowerBounds`,
`getLoopBounds`, and `getLoopSteps` interface methods to
`LoopLikeOpInterface`. The corresponding single-value versions have been
moved to the shared class declaration and are now implemented on top of
the new interface methods.
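The layering described here, where a multi-value getter is the primitive and the single-value version is derived from it, could look roughly like the following toy model. `ToyLoopNest` and its members are stand-ins for illustration. The real interface works on MLIR values rather than `int`, and the exact return types are an assumption.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy loop-like object with one lower bound per loop dimension.
struct ToyLoopNest {
  std::vector<int> lowerBounds;

  // Multi-value getter: the primitive that each loop op would implement.
  std::optional<std::vector<int>> getLoopLowerBounds() const {
    return lowerBounds;
  }

  // Single-value helper built only on the multi-value getter. It yields a
  // value only when the loop has exactly one dimension.
  std::optional<int> getSingleLowerBound() const {
    std::optional<std::vector<int>> bounds = getLoopLowerBounds();
    if (!bounds || bounds->size() != 1)
      return std::nullopt;
    return bounds->front();
  }
};
```

A one-dimensional loop reports its bound through the single-value helper, while a multi-dimensional loop makes the helper return an empty optional, so callers have to use the multi-value getter.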
There is currently no path to lower scf.forall to scf.parallel with the
goal of targeting the OpenMP dialect.
In the SCF->ControlFlow conversion, scf.forall is briefly converted to
scf.parallel, but the scf.parallel is lowered directly to a sequential
loop. This makes experimenting with scf.forall for CPU execution
difficult.
This change factors out the rewrite in the SCF->ControlFlow pass into a
utility function that can then be used in the SCF->ControlFlow lowering
and via a separate -scf-forall-to-parallel pass.
---------
Co-authored-by: Spenser Bauman <sabauma@fastmail>